Searching interoperability between Linguistic Coding and Ontologies for Language Description: Language Acquisition Data.

Barbara C. Lust
Suzanne Flynn
Jon Corson-Rikert
Brian Lowe
Maria Blume, University of Texas at El Paso


In this paper we present an overview of a project that has been developing over the last several years. It is centered at Cornell University and integrates several national and international institutions (among them MIT, Rutgers/New Brunswick, Rutgers/Newark, California State University, Southern Illinois University, City University of New York, Columbia University, and several sites in India, Taiwan and Peru). Funded by planning grants from programs at NSF, and working in collaboration with Cornell’s Albert R. Mann Library, the project is now building an infrastructure for the shared collection, representation, preservation, access and dissemination of large amounts of cross-linguistic data in the field of language acquisition. It involves the creation of materials at several levels: (i) best practice manuals for scientific research in the field; (ii) software for the markup of metadata and data and their seamless integration in a coherent relational database; (iii) a multi-level Web-based ontology tool for managing diverse interdisciplinary resources within and beyond the university library, as a platform for disseminating metadata about, and ultimately access to, detailed linguistic resources.
We will begin to investigate the possibility of integrating this work with current developments in GOLD (General Ontology for Linguistic Description). At first glance, realizing this potential appears to require developing interoperability with upper level ontologies (ULOs) as well as with lower level ontologies (LLOs). With regard to LLOs, for example, our current coding of language attempts to verify and calibrate metadata and data at levels ranging from subject to session to utterance transcription. Below the utterance level, it begins to code linguistic elements in a manner that allows comparability across widely varying language data (English, Romance languages, Hebrew, and several East Asian and South Asian languages) and across widely varying stages of language acquisition, from the initial state to the adult state. In particular, we are now attempting to develop morphological coding in this system. For this challenging process, we welcome the independent developments in GOLD for morphological markup, and we seek formats that can link to universal and standardized annotation systems at this critical morphological level.
At each level (i-iii above) we will articulate both the promises and the problems involved in the current work, in its possible integration with GOLD, and in linking cross-linguistic language acquisition data to the general purpose of making “our combined knowledge of the world’s languages fully accessible and interoperable.”
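As an illustration only, the coding levels described above (subject, session, utterance transcription, and morphological coding below the utterance) can be sketched as a nested data structure in which each morpheme may carry an optional link to an ontology concept. This is a minimal sketch and not the project's actual schema: all class names, field names, and the `gold:PastTense` label are hypothetical, and any real concept identifier would need to be checked against the published GOLD ontology.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Morpheme:
    form: str                            # surface form as transcribed
    gloss: str                           # project-internal gloss
    gold_concept: Optional[str] = None   # assumed ontology label, e.g. "gold:PastTense"


@dataclass
class Utterance:
    speaker: str
    transcription: str
    morphemes: List[Morpheme] = field(default_factory=list)


@dataclass
class Session:
    session_id: str
    language: str
    utterances: List[Utterance] = field(default_factory=list)


@dataclass
class Subject:
    subject_id: str
    sessions: List[Session] = field(default_factory=list)


def unlinked_morphemes(subject: Subject) -> List[Morpheme]:
    """Collect every morpheme that has no ontology link yet."""
    return [
        m
        for session in subject.sessions
        for utterance in session.utterances
        for m in utterance.morphemes
        if m.gold_concept is None
    ]


# Hypothetical example record: one subject, one session, one utterance.
example = Subject(
    subject_id="S001",
    sessions=[
        Session(
            session_id="S001-01",
            language="English",
            utterances=[
                Utterance(
                    speaker="child",
                    transcription="walked",
                    morphemes=[
                        Morpheme(form="walk", gloss="walk"),
                        Morpheme(form="-ed", gloss="PST", gold_concept="gold:PastTense"),
                    ],
                )
            ],
        )
    ],
)

pending = unlinked_morphemes(example)
print([m.form for m in pending])
```

Keeping the ontology link as an optional field on each morpheme is one possible design choice: it lets existing project-internal glosses stand unchanged while links to a standardized annotation system are added incrementally, and a helper such as `unlinked_morphemes` reports what remains to be mapped.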