Natural Language Interaction
Overview of the CMU SLM Toolkit, Rev 1.0
Microsoft researchers are using data mined from the Internet to develop Engkoo, an online Chinese-to-English dictionary and language-practice service. The technology could be used in similar tools to learn any language. Engkoo has a core of translation data drawn from Microsoft-licensed dictionaries. That content is mixed with data from Web sites with parallel Chinese and English versions. When an Engkoo user types a word or sentence into the Web site's input bar, in either Chinese or English, the site draws on statistics from its data to translate it.
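The blurb does not say how Engkoo turns parallel data into translations. As a hypothetical illustration only, the Python sketch below uses one classic technique, the Dice coefficient, to rank candidate English translations of a Chinese word by how often the two appear together in aligned sentence pairs. The corpus, the function names (`dice_scores`, `best_translations`), and the assumption that the Chinese text is already segmented into space-separated words are inventions for this example, not part of Engkoo.

```python
from collections import Counter
from itertools import product


def dice_scores(parallel_pairs):
    """Score each (source word, target word) pair with the Dice coefficient.

    parallel_pairs: list of (source_sentence, target_sentence) strings.
    Assumption: both sides are already tokenized into space-separated words
    (real Chinese text would first need word segmentation).
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in parallel_pairs:
        # Count each word at most once per sentence pair.
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))
    # Dice = 2 * c(s, t) / (c(s) + c(t)); 1.0 means the words always co-occur.
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint.items()
    }


def best_translations(word, scores, k=3):
    """Return the k highest-scoring target words for a source word."""
    candidates = [(t, score) for (s, t), score in scores.items() if s == word]
    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))[:k]


if __name__ == "__main__":
    # Tiny hypothetical parallel corpus: (pre-segmented Chinese, English).
    corpus = [
        ("我 喜欢 猫", "i like cats"),
        ("我 喜欢 狗", "i like dogs"),
        ("猫 很 可爱", "cats are cute"),
    ]
    scores = dice_scores(corpus)
    print(best_translations("猫", scores))
```

With this toy corpus, `cats` ranks first for 猫 with a score of 1.0. On so little data, unrelated words such as `are` and `cute` also score well, which is one reason practical systems need far larger corpora and stronger alignment models.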
Evaluating Natural Language Generation (NLG) systems is a notoriously hard problem: unlike natural language interpretation, where annotated corpora may provide a gold standard against which a system can be measured, there are generally multiple equally good outputs that an NLG system might produce. At the same time, access to human subjects who could judge the quality of a system's output is usually too expensive for large-scale use. Nevertheless, there has recently been increased interest in shared tasks and new methodologies for evaluating and comparing NLG systems.
Language Analysis Tool to Ascertain Age and Gender, The Engineer (United Kingdom), 06/22/10
The Linguistic Data Consortium supports language-related education, research and technology development by creating and sharing linguistic resources: data, tools and standards.
LUNA is a three-year project focused on real-time understanding of spontaneous speech in the context of advanced telecom services. Its main objective is to create a robust toolkit for natural spoken language understanding in multilingual dialogue services, one able to carry out human-computer communication with a good degree of user satisfaction.
Carnegie Mellon University (CMU) researchers are developing the Never-Ending Language Learning (NELL) system, a computer that can master semantics by learning more like a human. NELL was provided with basic knowledge in various categories and connected to the Web with a mission to teach itself. "For all the advances in computer science, we still don't have a computer that can learn as humans do, cumulatively, over the long term," says CMU professor Tom M. Mitchell.
The Rosetta Project is The Long Now Foundation's first exploration into very long-term archiving. It serves as a means to focus attention on the problem of digital obsolescence, and ways we might address that problem through creative archival storage methods.
SIL International: Partners in Language Development. SIL serves language communities worldwide, building their capacity for sustainable language development through research, translation, training and materials development.
European researchers working on the Statistical Multilingual Analysis for Retrieval and Translation (SMART) project have developed technology that will enable machine translation using statistical analysis. SMART researchers were inspired by the Pascal Network of Excellence, which sought to develop cooperative ties among Europe's leaders in pattern analysis, statistical modeling, and computational learning.
RDF is a directed, labeled graph data format for representing information in the Web. This specification defines the syntax and semantics of the SPARQL query language for RDF. SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware. SPARQL contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. SPARQL also supports extensible value testing and constraining queries by source RDF graph.
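To make "required and optional graph patterns along with their conjunctions and disjunctions" and value testing concrete, here is a toy Python model of how those operators combine variable bindings over a small set of triples. It is a sketch of the semantics only, not a SPARQL implementation: the graph, the prefixed names such as `ex:` and `foaf:` (treated here as plain strings), and the helper names (`match_bgp`, `optional`, `union`, `filter_bindings`) are all hypothetical. Each result is a dict mapping variable names like `?x` to values, roughly one row of a SPARQL result table.

```python
# Toy triple store: (subject, predicate, object) tuples. Hypothetical data.
GRAPH = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:alice", "ex:age", 30),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "ex:age", 25),
]


def is_var(term):
    """Terms written as '?name' are variables; all others must match exactly."""
    return isinstance(term, str) and term.startswith("?")


def match_triple(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to a different value
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new


def match_bgp(patterns, graph, bindings=None):
    """Conjunction: every triple pattern must match (a join on shared variables)."""
    results = [{}] if bindings is None else list(bindings)
    for pattern in patterns:
        next_results = []
        for b in results:
            for triple in graph:
                extended = match_triple(pattern, triple, b)
                if extended is not None:
                    next_results.append(extended)
        results = next_results
    return results


def optional(bindings, patterns, graph):
    """OPTIONAL as a left join: keep each row, extended where the pattern matches."""
    out = []
    for b in bindings:
        extended = match_bgp(patterns, graph, [b])
        out.extend(extended if extended else [b])
    return out


def union(left, right):
    """Disjunction: rows from either alternative."""
    return left + right


def filter_bindings(bindings, predicate):
    """Value testing: keep only rows for which predicate(row) is true."""
    return [b for b in bindings if predicate(b)]
```

For example, the query `SELECT ?name ?mbox WHERE { ?x foaf:name ?name . OPTIONAL { ?x foaf:mbox ?mbox } }` corresponds roughly to `optional(match_bgp([("?x", "foaf:name", "?name")], GRAPH), [("?x", "foaf:mbox", "?mbox")], GRAPH)`, which returns a row for both people, with `?mbox` bound only for Alice. Real SPARQL has many details this sketch ignores, such as blank nodes, typed literals, and solution modifiers like ORDER BY.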
SRILM is a toolkit for building and applying statistical language models (LMs), primarily for use in speech recognition, statistical tagging and segmentation, and machine translation. It has been under development in the SRI Speech Technology and Research Laboratory since 1995. The toolkit has also benefited greatly from its use and enhancement during the Johns Hopkins University/CLSP summer workshops in 1995, 1996, 1997, and 2002.
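SRILM's own estimators and smoothing methods are far more elaborate, but the core idea of a statistical n-gram language model fits in a few lines. The Python sketch below is not SRILM code: it trains a bigram model with add-one (Laplace) smoothing and scores sentences by log-probability. The function names and the toy corpus are hypothetical, and add-one smoothing is chosen only because it is the simplest to show.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count histories and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens[:-1])  # counts of each word used as a history
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    # Predictable tokens: every training word plus the end marker.
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return unigrams, bigrams, vocab


def bigram_prob(w_prev, w, unigrams, bigrams, vocab):
    """P(w | w_prev) with add-one smoothing over the vocabulary."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + len(vocab))


def sentence_logprob(sentence, unigrams, bigrams, vocab):
    """Natural-log probability of a whole sentence under the bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log(bigram_prob(a, b, unigrams, bigrams, vocab))
        for a, b in zip(tokens[:-1], tokens[1:])
    )


if __name__ == "__main__":
    model = train_bigram_lm(["the cat sat", "the dog sat", "a cat ran"])
    print(sentence_logprob("the cat sat", *model))
    print(sentence_logprob("sat the cat", *model))
```

Trained on this tiny corpus, the model gives "the cat sat" a higher log-probability than the scrambled "sat the cat"; ranking candidate word sequences in this way is what makes LMs useful in recognition and translation.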
University of Washington researchers have developed an automated information extraction software engine that mines meaning out of more than 500 million Web pages, contributed by Google, by analyzing fundamental relationships between words. The project expands the scale of the TextRunner application in terms of the number of pages and the breadth of topics it can examine.
The aim of the project is to study and develop automated reasoning techniques for both offline and online tasks associated with ontologies, whether considered in isolation or as a community of interoperating systems. It also aims to devise methodologies for deploying such techniques, both in advanced tools supporting ontology design and management and in applications that support software agents operating with ontologies.
We present a visualization of all the nouns in the English language arranged by semantic meaning. Each of the tiles in the mosaic is an arithmetic average of images relating to one of 53,464 nouns. The images for each word were obtained using Google's Image Search and other engines. A total of 7,527,697 images were used, each tile being the average of 140 images. The average reveals the dominant visual characteristics of each word. For some, the average turns out to be a recognizable image; for others the average is a colored blob.
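Each tile is described as an arithmetic average of images. The minimal Python sketch below shows what that means pixel by pixel. It assumes the images have already been resized to a common tile size (the source does not say how this was done) and represents each image as rows of (R, G, B) tuples; the function name `average_images` is hypothetical and this is not the authors' code.

```python
def average_images(images):
    """Return the pixel-wise arithmetic mean of equally sized RGB images.

    images: non-empty list of images; each image is a list of rows,
    each row a list of (R, G, B) tuples. Output values are floats.
    """
    if not images:
        raise ValueError("need at least one image")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images must have the same dimensions")
    n = len(images)
    return [
        [
            # Average each color channel separately across all images.
            tuple(sum(img[y][x][c] for img in images) / n for c in range(3))
            for x in range(width)
        ]
        for y in range(height)
    ]
```

Because every output pixel is a mean over all inputs, detail that varies from image to image tends to wash out while features shared by most images survive, which is consistent with some tiles being recognizable and others being colored blobs.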