AINA is an artificial intelligence (AI) initiative launched in 2020 by the Department of Vice-Presidency and Digital and Territorial Policies and the Barcelona Supercomputing Center (BSC-CNS) so that machines can understand and speak Catalan in 2022 and hold a fluent, natural conversation with people. Three million euros will be allocated to the project, and voice collection for the Mozilla Foundation's Common Voice corpus will be stepped up. That corpus recently exceeded 1,500 recorded hours in Catalan and is expected to reach 2,000 hours before the end of the year.
Although the Catalan text corpus already exceeds 10 GB and the voice corpus exceeds 25 GB, these figures are still far behind those of languages such as English, which has the largest corpus at more than 825 GB of data. There is also a lack of variety: 76% of the Common Voice voices correspond to the central dialect of Catalan, and women are underrepresented, since 63% of the voices are male.
How can you participate?
To encourage the public to join the project, the Government has launched the "Our language is your voice" campaign, and specific actions will be carried out across the territory to encourage participation from speakers of the variants with the fewest samples. You can collaborate with the AINA project by validating voice clips, writing sentences to be incorporated into the corpus, or validating written sentences. You will find more information at this link.
In this video you will find the public presentation of the project: