Speaker recognition methods can be divided into text-independent and text-dependent methods. In a text-independent system, speaker models capture characteristics of a person's speech that show up irrespective of what is being said. This paper is based on a text-independent speaker recognition system and uses mel-frequency cepstral coefficients (MFCCs) to process the input signal and a vector quantization (VQ) approach to identify the speaker. The system is implemented in MATLAB.
The main method of speech recognition is to decode the speech signal sequentially, based on the observed acoustic features of the signal and the known relations between acoustic features and phonetic symbols. In the pattern recognition approach, the speech patterns are used directly; it involves training on speech patterns and recognizing patterns via pattern comparison. Through training, the machine learns which acoustic properties of a speech class are reliable and repeatable across all training tokens of the pattern.
The performance of a speech recognition system is given in terms of a word error rate (%), measured for a specified technology, for a given task, with a specified task syntax, in a specified mode, and for a specified word vocabulary. Continuous speech recognition systems find applications in voice repertory dialers, where eyes-free, hands-free dialing of numbers is possible. Speaker recognition involves either speaker identification, which outputs the identity of the person most likely to have spoken from among a given population, or speaker verification, which checks from a given speech input whether a person is who he or she claims to be. Automatic speaker verification (ASV) requires only a comparison between the test pattern and one reference template, and involves a binary decision of whether to accept or reject a speaker.
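The word error rate mentioned above is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that convention, not the report's own evaluation code; the function name `word_error_rate` and the example digit strings are assumptions made for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Return the word error rate in percent, using word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Hypothetical dialing example: one substituted and one deleted word
# out of six reference words.
wer = word_error_rate("call five five five one two", "call five nine five two")
```

Because insertions count as errors, this measure can exceed 100% when the recognizer outputs many spurious words.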
Feature based on Cepstrum
The short-time spectrum of voiced speech has two components:
1) peaks due to the periodicity of voiced speech, and
2) glottal pulse shape.
The periodicity of voiced speech is determined by the excitation source. The variations among speakers are indicated by formant locations and bandwidths.
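To illustrate how the cepstrum exposes the periodicity component, the sketch below computes a real cepstrum (the inverse DFT of the log magnitude spectrum) for a toy frame and locates its strongest peak. This is an illustrative assumption rather than the report's MATLAB code: the names `dft`, `real_cepstrum`, and `peak_quefrency`, the 64-sample frame, and the 8-sample period are all made up for the example, and a naive DFT is used to keep it free of dependencies.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform; fine for short illustrative frames."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(frame, eps=1e-12):
    """Inverse DFT of the log magnitude spectrum of a real-valued frame."""
    n = len(frame)
    # eps guards against log(0) when a spectral bin is exactly zero.
    log_mag = [math.log(abs(v) + eps) for v in dft(frame)]
    return [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def peak_quefrency(cepstrum, lowest=1):
    """Index (in samples) of the largest cepstral value in the first half."""
    half = len(cepstrum) // 2
    return max(range(lowest, half + 1), key=lambda q: cepstrum[q])


# Toy frame: a pulse followed by a weaker copy 8 samples later, standing in
# for the repetition that a periodic excitation source introduces.
frame = [0.0] * 64
frame[0] = 1.0
frame[8] = 0.5
period = peak_quefrency(real_cepstrum(frame))
```

In this toy case the peak falls at the repetition lag, which is why the high-quefrency part of the cepstrum is associated with the excitation, while the low-quefrency part describes the smooth spectral envelope.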
Speech recognition based on proposed features
This involves preparing training and testing data, building VQ codebook models for all digits and for the continuous speech of the speakers, and testing each utterance against a certain number of speech models to identify the speech of that utterance from among the speech models. For isolated digit recognition, the speech database contains isolated digits from TI digits_1 and TI digits_2. This is evaluated using digits pronounced by other speakers in the database. The speaker-independent continuous speech recognition system uses training data formed by concatenating the dialect sentences of 24 speakers and test data from 100 speakers in the TIMIT database.
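The codebook training and testing steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the report's implementation: it uses plain k-means with a deterministic initialization in place of whatever codebook design procedure the authors used, and the two-dimensional feature vectors and model names are hypothetical stand-ins for real cepstral features.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(vector, codebook):
    """Index of the codeword closest to the given feature vector."""
    return min(range(len(codebook)), key=lambda i: squared_distance(vector, codebook[i]))


def train_codebook(vectors, size, iterations=20):
    """Build a VQ codebook with k-means (an assumed, simplified design method)."""
    if len(vectors) < size:
        raise ValueError("need at least as many training vectors as codewords")
    # Deterministic start: evenly spaced training vectors become the codewords.
    step = len(vectors) // size
    codebook = [list(vectors[i * step]) for i in range(size)]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in vectors:
            cells[nearest_codeword(v, codebook)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # keep the old codeword if no vector was assigned to it
                codebook[i] = [sum(dim) / len(cell) for dim in zip(*cell)]
    return codebook


def average_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    total = sum(squared_distance(v, codebook[nearest_codeword(v, codebook)]) for v in vectors)
    return total / len(vectors)


def identify(vectors, models):
    """Name of the model whose codebook gives the lowest average distortion."""
    return min(models, key=lambda name: average_distortion(vectors, models[name]))


# Hypothetical training data: one codebook per speech class.
models = {
    "digit_one": train_codebook([[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [1.2, 0.9]], size=2),
    "digit_two": train_codebook([[5.0, 5.1], [5.2, 4.9], [6.0, 6.1], [6.2, 5.9]], size=2),
}
best_match = identify([[0.1, 0.0], [1.1, 1.0]], models)
```

Each test utterance is scored against every codebook, and the model with the smallest average distortion is taken as the match.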