A Fala Database
Publications
Experience Sharing

A Fala Database

Name: A Fala_Database_ver01_Sep2020


This database is a result of the project Community-Driven Documentation and Description of A Fala, carried out at CIDLeS in cooperation with the Technical University of Liberec, Czech Republic. The database contains 225 000 tokens (words) documented in 156 texts. It has been compiled from transcribed recordings, which contributed 110 315 words (49% of the database), and from published and unpublished texts written in one of the varieties of A Fala, which contributed the remaining 114 690 words (51% of the database). However, due to copyright issues, 10 of the written texts had to be removed, so this public version contains only 146 accessible texts with over 204 000 words.

The objective of the project was to create a database reflecting both the spoken and the written aspects of the language, taking into account a variety of factors: equal representation of the three varieties (lagarteiru, mañegu and valverdeñu), participation of both genders (women and men), participation of speakers from different age groups rather than only the oldest speakers, and a variety of interview topics, ranging from traditional ones such as local agriculture to European funds and how they are used locally. The community of speakers contributed to all stages of the database compilation.

Community participation: approx. 175 participants, 4% of the population of the three villages.

Technical requirements

You will need the latest version of FLEx (FieldWorks Language Explorer) to open the database.

The database is password protected. It is available to everyone, but to get the password, please contact: miroslav.vales@tul.cz

Content specifications


Recordings (all varieties)

Total tokens/words registered: 110 315
Total number of recordings: 63 (in 37 interview sessions)
Total time: 705 min (11 hrs 45 min)
Video recordings: 61 (94%)
Audio-only recordings: 2 (6%)
Total number of participants: 67 (37 women, 30 men; 20 of them acted as interviewers with limited participation)

Recordings Lagarteiru

Number of recordings: 16 (in 12 interview sessions)
Time: 238 min (3 hrs 58 min)
Tokens/words registered: 38 709
Participants: 22 (12 women, 10 men; 6 of them acted as interviewers)

Recordings Mañegu

Number of recordings: 26 (in 12 interview sessions)
Time: 248 min (4 hrs 8 min)
Tokens/words registered: 37 703
Participants: 19 (11 women, 8 men; 4 of them acted as interviewers)

Recordings Valverdeñu

Number of recordings: 21 (in 13 interview sessions)
Time: 219 min (3 hrs 39 min)
Tokens/words registered: 33 903
Participants: 26 (14 women, 12 men; 10 of them acted as interviewers)

Written texts

Total tokens/words registered: 114 690
Total number of written texts: 93
Larger texts (books, theatre plays): 6 (49 305 words)
Shorter texts (magazine articles, short stories, etc.): 80 (55 308 words)
Translations: 5 (9 518 words)
Web texts: 1 (408 words)
Public announcements: 1 (151 words)
Total authors: 71
Authors lagarteiru: 33
Authors mañegu: 12
Authors valverdeñu: 26
Texts not available in the public version of the database: 10 (20 443 words)
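The subtotals above can be cross-checked against the totals quoted in the introduction. The following is a minimal Python sketch, assuming only the figures printed on this page; they are copied by hand, and the dictionary layout and names (RECORDINGS, WRITTEN, recording_total) are illustrative, not part of the FLEx database or any export of it.

```python
# Figures copied by hand from the content specifications on this page.
# The dictionary layout is illustrative, not a FLEx export format.
RECORDINGS = {
    "lagarteiru": {"recordings": 16, "sessions": 12, "minutes": 238, "tokens": 38_709},
    "mañegu": {"recordings": 26, "sessions": 12, "minutes": 248, "tokens": 37_703},
    "valverdeñu": {"recordings": 21, "sessions": 13, "minutes": 219, "tokens": 33_903},
}
# category -> (number of texts, number of words)
WRITTEN = {
    "larger texts": (6, 49_305),
    "shorter texts": (80, 55_308),
    "translations": (5, 9_518),
    "web texts": (1, 408),
    "public announcements": (1, 151),
}
WITHHELD_TEXTS, WITHHELD_WORDS = 10, 20_443  # removed for copyright reasons


def recording_total(field: str) -> int:
    """Sum one field over the three varieties."""
    return sum(variety[field] for variety in RECORDINGS.values())


spoken_tokens = recording_total("tokens")
written_texts = sum(texts for texts, _ in WRITTEN.values())
written_tokens = sum(words for _, words in WRITTEN.values())
all_texts = recording_total("recordings") + written_texts
all_tokens = spoken_tokens + written_tokens
hours, minutes = divmod(recording_total("minutes"), 60)

print(f"Recordings: {recording_total('recordings')} in "
      f"{recording_total('sessions')} sessions, {hours} h {minutes} min")
print(f"Spoken: {spoken_tokens} tokens ({spoken_tokens / all_tokens:.0%})")
print(f"Written: {written_texts} texts, {written_tokens} tokens "
      f"({written_tokens / all_tokens:.0%})")
print(f"Whole database: {all_texts} texts, {all_tokens} tokens")
print(f"Public version: {all_texts - WITHHELD_TEXTS} texts, "
      f"{all_tokens - WITHHELD_WORDS} tokens")
```

Run as-is, it reproduces the introduction's figures: 63 recordings in 37 sessions (11 h 45 min), 156 texts with 225 005 tokens in total (rounded to 225 000 in the text, split 49% spoken and 51% written), and 146 texts with 204 562 tokens in the public version.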

FLEx specifications

This is the first version of the FLEx database, and for that reason some sections have not yet been completed adequately. This is the case with the sociolinguistic information related to the participants of the recordings, which will be added to the database in the near future. The Lexicon section is also under construction and will be substantially revised before the dictionary is published.

Lexicon - Entry section:

• General note - this line is used for extended comments on usage, since the Usages line only offers predefined categories.

• Semantic domains - the only semantic domains marked so far are Animals (1.6), Plants (1.5) and Tools (6.7). The categorization is simplified and will be subject to future correction and completion.

• Restrictions - this line reflects word frequency. This section is also yet to be completed.

no mark = frequent words (5 000)

A = less frequent words (10 000) (not marked yet)
B = rare words (15 000) (not marked yet)
C = very infrequent words related to the traditional culture, often no longer used, e.g. corsetería
D = very infrequent words related to Castilian (castellano), e.g. lasaña, paracetamol
E = adverbs in -menti; they will not be included in the dictionary, but they appear in the database
F = words that will either be excluded from the dictionary or added only after verification
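The marks A and B have not been assigned yet. If the numbers in parentheses are read as cumulative frequency-rank cutoffs (an assumption: the page does not say how frequency ranks are computed), assigning them could be automated along the lines of the hypothetical Python helper below. Marks C to F depend on cultural, etymological and editorial judgement rather than frequency, so they would still be set by hand, and the helper returns None for ranks beyond the last band so that those words are left for manual review.

```python
from typing import Optional

# Hypothetical band table: assumes "(5 000)", "(10 000)" and "(15 000)"
# are cumulative rank cutoffs. An empty string means "no mark".
BANDS = [(5_000, ""), (10_000, "A"), (15_000, "B")]


def frequency_mark(rank: int) -> Optional[str]:
    """Return the Restrictions mark for a frequency rank (1 = most frequent).

    Returns "" for frequent words (no mark), "A" or "B" for the next
    bands, and None for ranks beyond the last band, which need manual
    classification. Marks C to F are never derived from rank here.
    """
    if rank < 1:
        raise ValueError("rank must be >= 1")
    for cutoff, mark in BANDS:
        if rank <= cutoff:
            return mark
    return None


if __name__ == "__main__":
    for rank in (120, 7_500, 14_999, 20_000):
        print(rank, repr(frequency_mark(rank)))
```

Keeping the cutoffs in a single table means the bands can be adjusted without touching the lookup logic if the dictionary's frequency thresholds change.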



Publications

VALEŠ, Miroslav. 2020. Recopilación de datos primarios para la descripción y documentación de la lengua [Collecting primary data for the description and documentation of the language]. Études romanes de Brno, vol. 41, no. 1, pp. 87-98. ISSN 1803-7399 (print), ISSN 2336-4416 (online).



Experience sharing

Course: Sociolinguistics and research methodology

This one-semester course (28 hours of lectures and seminars) is a compulsory subject for all MA students of Spanish, English or German at the Faculty of Science, Arts and Education, Technical University of Liberec. The course reflects the methodological part of the project: students learn how to collect linguistic data, process them in ELAN and create their own FLEx database.

• Lecture: Project of minority language documentation

Date: 8 October 2020

The objective of the lecture was to share the project experience with colleagues, especially those from the language departments (English, German, Romance languages), and to motivate them to apply for their own linguistic projects supporting minority languages or linguistic research in general. The methodology of the project was discussed in detail, as were its outputs.


CIDLeS on Facebook: cidles.eu/fb