the Local Language Speech Technology Initiative
 
 

TTS-related Database

The TTS-related language database has formed the foundation of the local language TTS program. It encapsulates our knowledge of languages and scripts used worldwide, and specifically identifies the language and script features which add complexity when building TTS systems.
 
The database can be searched by language features, by script features, or by both. An advanced search covers many other features.
 
From these language and script features we can derive an approximate complexity score for any language the project is considering developing. The procedure is not automatic, because assigning scores is to some extent subjective. Some of the languages we have scored, and the basis for the scoring, are shown here.
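As a rough illustration of how a score might be derived from features, the sketch below sums a weight for each feature that is present and then allows a manual adjustment to reflect the subjective part of the assessment. The feature names, the weights and the `manual_adjustment` parameter are all hypothetical; the actual scoring basis used by the project is not reproduced here.

```python
# Hypothetical feature weights, chosen only for illustration.
# They are NOT the weights used by the project.
FEATURE_WEIGHTS = {
    "unpredictable_lexical_stress": 2.0,
    "homographs_present": 1.5,
    "unwritten_vowels": 2.5,
    "tonal": 1.0,
}


def complexity_score(features, manual_adjustment=0.0):
    """Return an approximate complexity score.

    `features` maps a feature name to True, False or None (unknown).
    Unknown or missing features contribute nothing, which mirrors the
    gaps in the data. `manual_adjustment` stands in for the subjective
    judgement of whoever assigns the score.
    """
    base = sum(
        weight
        for name, weight in FEATURE_WEIGHTS.items()
        if features.get(name)
    )
    return base + manual_adjustment


# Example: a tonal language with homographs, stress data unknown.
example = {
    "tonal": True,
    "homographs_present": True,
    "unpredictable_lexical_stress": None,
}
score = complexity_score(example, manual_adjustment=0.5)
```

Treating unknown values as zero keeps the sketch simple, but it also means a language with little data looks easier than it may be, which is one reason a manual review step is still needed.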
 
The database is neither complete nor 100% accurate. Data is notably lacking for some of the required linguistic information, in particular the place of lexical stress, the place of secondary stress (if any) and the presence or absence of homographs. Identifying lexical and secondary stress in certain languages appears to need more research. In fact, we are finding that the process of producing a TTS system for a language is in itself a very effective way of refining our understanding of the language. We touch on this in our paper Issues in Porting TTS to Minority Languages.
 
Three languages in the database (Malay, Panjabi and Sindhi) can each be written in two scripts. From the TTS development point of view, each language-script combination is treated as a separate entry.
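One way to picture this is to key each entry on the pair of language and script, so that a search by language, by script, or by both falls out naturally. The sketch below is only an assumption about how such records could be modelled; the `Entry` type, the `search` function and the sample rows are illustrative and do not describe the real database schema.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """One database entry: a language written in a particular script."""
    language: str
    script: str


# Illustrative sample rows only, not an extract from the database.
ENTRIES = [
    Entry("Malay", "Latin"),
    Entry("Malay", "Arabic"),
    Entry("Sindhi", "Arabic"),
    Entry("Sindhi", "Devanagari"),
]


def search(entries: List[Entry],
           language: Optional[str] = None,
           script: Optional[str] = None) -> List[Entry]:
    """Filter entries by language, by script, or by both.

    A criterion left as None matches everything. Matching ignores case.
    """
    return [
        e for e in entries
        if (language is None or e.language.lower() == language.lower())
        and (script is None or e.script.lower() == script.lower())
    ]


malay_entries = search(ENTRIES, language="malay")    # two entries, one per script
arabic_entries = search(ENTRIES, script="Arabic")    # entries from two languages
```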
Your feedback on the data is very welcome. Please send comments, suggestions or corrections to Ksenia Shalonova or Roger Tucker.
 

References

1. A great deal of information about language characteristics at the main levels (phonology, morphology and syntax) was taken from The Rosetta Project.
2. Some information about the place of lexical stress was taken from the StressTyp Database.



 

 

© Local Language Speech Technology Initiative. All Rights Reserved.