
Super Wideband Stereo Speech Database

With the rapid progress in IP telephony and mobile phone technology, ITU-T and 3GPP/ETSI have carried out standardization work on transmitting speech signals with the same frequency bandwidth (14 kHz) and format (stereo) as music signals. To meet the demand for speech samples suitable for performance tests of such systems, we have released a new multilingual speech database to the public.

Feature

The fundamental recording procedures comply with ITU-T Rec. P.800, while a pair of high-quality studio condenser microphones and a 48 kHz sampling rate ensure stereophonic recordings with a spectrum up to 20 kHz and a sufficient signal-to-noise ratio. Six languages are available: Japanese, North American (NA) English, British English, French, German, and Chinese (Mandarin). More than 100 sentences are spoken by each speaker.

Price

  • For any set of two languages excluding North American English, the price is 200,000 JPY
  • For any set of two languages including North American English, the price is 250,000 JPY
  • For the set of all six languages, the price is 500,000 JPY

Note: All prices include delivery charges, but clients are requested to pay any domestic taxes or customs duties themselves. A 5% sales tax will be added for shipments within Japan.



Specification

Recording

All speech samples are recorded with digital equipment, and all recording complies with ITU-T Recommendation P.800. A pair of studio condenser microphones is used to ensure a flat response up to 14 kHz. Puff (pop) noise is prevented by an acoustic screen.

Speaker

Three male and three female native speakers are employed for each of the five languages other than NA-English, and four male and four female native speakers are employed for NA-English. For Japanese, the speakers include one male and one female voice actor. Speakers are given no special instructions regarding their utterances, and their native accent is preserved unless it results in incorrect readings or lexical meanings.

Sentences

Each speech sample contains one short sentence spoken within a 4-second time slot. Almost all of the 120 (for NA-English) or 100 (for other languages) sentences are the same for each speaker, but there are some minor differences because samples with incorrect utterances or extraneous noise have been removed. By merging the files, users can create speech samples comprising sentence pairs with the time structure given in ITU-T Recommendation P.800 and the subjective test plan of 3GPP.
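
As an illustration of the merging step, the following is a minimal sketch that concatenates single-sentence files into one longer sample using Python's standard `wave` module. The function name `merge_wav_files` and the file names in the usage note are hypothetical and not part of the database. The sketch only joins the files end to end; any additional silence or timing required by a particular test plan would have to be added separately.

```python
import wave


def merge_wav_files(input_paths, output_path):
    """Concatenate WAV files that share one format into a single WAV file.

    Hypothetical helper for illustration; not supplied with the database.
    """
    fmt = None      # (channels, sample width in bytes, sample rate in Hz)
    chunks = []     # raw PCM frames of each input file, in order

    for path in input_paths:
        with wave.open(path, "rb") as src:
            current = (src.getnchannels(), src.getsampwidth(), src.getframerate())
            if fmt is None:
                fmt = current
            elif current != fmt:
                # Refuse to join files whose formats differ.
                raise ValueError(f"{path}: format {current} differs from {fmt}")
            chunks.append(src.readframes(src.getnframes()))

    if fmt is None:
        raise ValueError("no input files given")

    with wave.open(output_path, "wb") as dst:
        dst.setnchannels(fmt[0])
        dst.setsampwidth(fmt[1])
        dst.setframerate(fmt[2])
        for chunk in chunks:
            dst.writeframes(chunk)

    return output_path
```

For example, `merge_wav_files(["sentence_a.wav", "sentence_b.wav"], "pair.wav")` (placeholder file names) would produce one file in which the second 4-second slot directly follows the first.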

Media

Speech data are sampled at 48 kHz with 16-bit resolution and stored as Windows WAVE files. All files are supplied on ISO 9660-formatted CD-ROMs. The active power level of each sample is normalized to -26 dBov according to the ITU-T Rec. P.56 algorithm.
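
A file copied from the discs can be checked against this format with a few lines of Python. The sketch below uses only the standard `wave` module. The function names are hypothetical, and the expected channel count of 2 is an assumption based on the database being described as stereo. It does not measure the active level, which would require an implementation of the P.56 algorithm.

```python
import wave

EXPECTED_SAMPLE_RATE_HZ = 48000   # stated above: 48 kHz
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # stated above: 16-bit samples
EXPECTED_CHANNELS = 2             # assumption: stereo, i.e. two channels


def describe_wav(path):
    """Read the basic format parameters and duration of a WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": wav.getnframes() / rate,
        }


def matches_expected_format(info):
    """Return True if the parameters match the format described above."""
    return (
        info["channels"] == EXPECTED_CHANNELS
        and info["sample_width_bytes"] == EXPECTED_SAMPLE_WIDTH_BYTES
        and info["sample_rate_hz"] == EXPECTED_SAMPLE_RATE_HZ
    )
```

Calling `matches_expected_format(describe_wav(path))` on each file is a quick way to catch a file that was resampled or converted to mono by accident before it is used in a test.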


