Language Documentation

Language diversity, its documentation and analysis, have always interested linguists around the world, especially those working on language typology. However, language documentation as it is known today began in the 1990s. Several factors contributed to the emergence of this "new" linguistic discipline. First, technological developments made it possible to record, process, and store large amounts of high-quality linguistic data with portable devices and lower storage requirements (e.g. through more efficient codecs). These developments opened up new perspectives and possibilities for fieldwork in and with language communities. Second, interest in linguistic diversity, and more specifically in endangered languages, spread beyond the academic world and became a public issue. This happened mainly through regular reports on the subject published by the press and by well-known institutions, such as UNESCO with its Atlas of the World's Languages in Danger. This media coverage also helped increase funding for the documentation and study of undocumented or poorly documented languages. Finally, the need to standardize the study and documentation of endangered languages became a topic of academic discussion.

In this context, documentary linguistics established itself with the aim of producing a "lasting, multipurpose record of a language" [1]. What distinguishes documentary linguistics from descriptive linguistics is its focus on collecting, distributing, and preserving primary data from a variety of communicative events, i.e. real situations of language use in different contexts. Primary data in this sense include the notes linguists take while working with the language community, whether from elicitation or not. Above all, however, they include audio and video recordings, as well as photos and text collections. These data are normally transcribed and translated, and they should also be annotated. This task requires linguistic annotations (e.g. morpho-syntactic, semantic, pragmatic, and/or phonetic annotations). It also calls for a broad range of non-linguistic annotations (e.g. anthropological, sociolinguistic, musical, or gestural annotations) whenever possible and/or if they are important to the language community. Even if the researcher does not produce a full annotation of this kind, making the primary data available has a clear advantage. Researchers from the same or other disciplines can then use the data for their own purposes and complement it with their own annotations.

Typical end products of language documentation projects are:

  • Properly archived multimedia corpora (with audio, video, photos, and annotations);
  • Dictionaries (frequently multimedia dictionaries);
  • Sketch grammars of the documented language, which describe the main characteristics of its grammatical system and serve as a kind of user manual for the corpus. All examples in the grammar should be taken from the collected data.

Documentary linguistics has brought a new perspective to collecting, analyzing, and distributing linguistic data. This perspective has proven to be an important step towards interdisciplinary research in the humanities and towards greater accountability of linguistic research results.

[1] Himmelmann, Nikolaus. 2006. Language documentation: What is it and what is it good for? In Gippert, Jost, Nikolaus Himmelmann & Ulrike Mosel (eds.), Essentials of Language Documentation, 1-30. Berlin, New York: Mouton de Gruyter.

Language Typology

Language typology is a subfield of linguistics that compares and classifies languages according to linguistic parameters. Three kinds of classification are usually distinguished: genetic, geographic, and typological. Genetic classification concentrates on the genealogical affiliation of languages (language families) and on their relation to, and evolution from, a possible proto-language. Geographic classification, by contrast, focuses on similarities between languages that are geographically close but not genetically related (or only distantly related). Such languages may share features because of language contact and cultural ties. They may also share specific areal features that arose through convergence and set them structurally apart from the languages they are genetically related to. A group of languages of this kind is known as a Sprachbund (area of linguistic convergence).

Typological classification, finally, classifies languages by their structural types rather than by their genetic or geographic relationships. Within it, one distinguishes between morphological (or classical), phonological, and syntactic typology. Each uses different basic criteria. Morphological typology looks at the arrangement of morphemes at the word level and the marking of morpho-syntactic categories at the sentence level (e.g. isolating vs. agglutinating vs. fusional languages). Phonological typology looks at phonological characteristics. Syntactic typology looks at word order types (e.g. SOV vs. SVO).

Language documentation provides the data needed to enrich and extend existing typological analyses. In doing so, it also furthers research on linguistic universals, which is closely tied to the study of language typology.
