With rapid progress in IP telephony and mobile phone technology, ITU-T and 3GPP/ETSI have been standardizing the transmission of speech signals with the same frequency bandwidth (14 kHz) and format (stereo) as music signals. In response to requests for speech samples suitable for performance tests of such systems, we have released a new multilingual speech database to the public.
Benefits / Features
The basic recording procedures comply with ITU-T Rec. P.800, while a pair of high-quality studio condenser microphones and a 48 kHz sampling rate provide stereophonic recordings with spectral content up to 20 kHz and a sufficient signal-to-noise ratio. Six languages are available: Japanese, North American (NA) English, British English, French, German, and Chinese (Mandarin). Each speaker reads more than 100 sentences.
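The 20 kHz figure is consistent with the sampling theorem: a signal sampled at rate $f_s$ can represent frequency components only up to the Nyquist frequency $f_N = f_s/2$. For the 48 kHz rate used here:

$$
f_N = \frac{f_s}{2} = \frac{48\ \text{kHz}}{2} = 24\ \text{kHz} > 20\ \text{kHz}
$$

The margin between 20 kHz and 24 kHz typically leaves room for the transition band of the anti-aliasing filter.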
Specifications / Details
All speech samples are recorded with digital equipment, and all recording complies with ITU-T Recommendation P.800. Recording uses a pair of studio condenser microphones to ensure a flat frequency response up to 14 kHz. Puff noise is prevented by an acoustic screen.
Each of the five languages other than NA English is recorded by 3 male and 3 female native speakers; NA English is recorded by 4 male and 4 female native speakers. The Japanese speakers include one male and one female voice actor. Speakers are given no special instructions regarding their utterances, and their native accents are preserved unless they result in incorrect readings or lexical meanings.
Each speech sample contains one short sentence spoken within a 4-second time slot. Almost all of the 120 sentences (NA English) or 100 sentences (other languages) are the same for every speaker, with minor differences due to the removal of samples containing incorrect utterances or extraneous noise. By merging files, users can build speech samples with the sentence-pair structure and timing specified in ITU-T Recommendation P.800 and the 3GPP subjective test plan.
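The merging step can be sketched with Python's standard `wave` module. This is a minimal sketch, not a tool supplied with the database: `concat_wavs` and its `gap_seconds` parameter are hypothetical, and any pause length should be taken from P.800 or the 3GPP test plan rather than from this example. It assumes all inputs share the same channel count, sample width, and sampling rate, and checks that they do.

```python
import wave


def concat_wavs(in_paths, out_path, gap_seconds=0.0):
    """Concatenate PCM WAV files of identical format into one file.

    Hypothetical helper: gap_seconds inserts digital silence between files
    (0.0 means the 4-second slots are simply joined end to end).
    """
    params = None
    chunks = []
    for path in in_paths:
        with wave.open(path, "rb") as wf:
            p = wf.getparams()
            if params is None:
                params = p
            # Compare (nchannels, sampwidth, framerate) only; lengths may differ.
            elif p[:3] != params[:3]:
                raise ValueError(f"{path}: format differs from {in_paths[0]}")
            chunks.append(wf.readframes(wf.getnframes()))
    if params is None:
        raise ValueError("no input files")

    nchannels, sampwidth, framerate = params[:3]
    # Zero bytes are silence for 16-bit signed PCM (the format of this database).
    gap_frames = round(gap_seconds * framerate)
    silence = b"\x00" * (gap_frames * nchannels * sampwidth)

    with wave.open(out_path, "wb") as out:
        out.setnchannels(nchannels)
        out.setsampwidth(sampwidth)
        out.setframerate(framerate)
        out.writeframes(silence.join(chunks))
```

For example, `concat_wavs(["s01.wav", "s02.wav"], "pair.wav")` (file names hypothetical) joins two 4-second sentence slots into one 8-second sentence pair.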
Speech data are sampled at 48 kHz with 16-bit resolution and stored as Windows WAVE files. The active speech level of each sample is normalized to -26 dBov using the ITU-T Rec. P.56 algorithm.
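To check a file's format and get a rough idea of its level relative to the -26 dBov target, a simplified Python sketch follows. It rests on a loudly stated assumption: `rms_level_dbov` computes plain RMS over every sample, silence included, relative to the 16-bit full-scale value 32768. That is not the P.56 active speech level, which excludes pauses and therefore reads higher for speech with silent gaps. For actual measurements, use a P.56 implementation such as the speech voltmeter in the ITU-T G.191 Software Tool Library. All function names here are hypothetical.

```python
import math
import struct
import wave

FULL_SCALE = 32768.0  # overload point of 16-bit PCM


def wav_format(path):
    """Return (channels, bytes per sample, sampling rate in Hz, duration in s)."""
    with wave.open(path, "rb") as wf:
        return (wf.getnchannels(), wf.getsampwidth(), wf.getframerate(),
                wf.getnframes() / wf.getframerate())


def rms_level_dbov(path):
    """Whole-file RMS level in dBov over all channels (NOT the P.56 active level)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        raw = wf.readframes(wf.getnframes())
    n = len(raw) // 2
    if n == 0:
        return float("-inf")
    samples = struct.unpack("<%dh" % n, raw)  # WAV PCM is little-endian
    mean_sq = sum(s * s for s in samples) / n
    if mean_sq == 0:
        return float("-inf")
    return 10 * math.log10(mean_sq / FULL_SCALE ** 2)


def gain_to_target(level_dbov, target_dbov=-26.0):
    """Linear gain that would move a measured level to the target level."""
    return 10 ** ((target_dbov - level_dbov) / 20)
```

For a file measuring -20 dBov, `gain_to_target(-20.0)` returns about 0.5, a 6 dB reduction.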
Super Wideband Stereo Speech Database: Please contact us.