Biometric modality: Voice – what is it?
A person’s voice – i.e. the way they sound when they speak – is the result of a combination of distinctive physical attributes (such as the length of the vocal cords and the shape of the throat) and distinctive behavioural attributes (such as the accent with which a person speaks).
The human voice produces sound waves whose characteristics can be measured. The voice is captured and analysed by software that uses artificial intelligence and machine learning techniques to extract a large set of features derived from factors such as speech modulation, tone, accent and frequency. From these features the system creates a reference template of the voice (known as a ‘voice print’ or ‘voice model’) that can be used to authenticate the speaker in subsequent transactions. Similar technology allows devices such as smart speakers, mobile devices, domestic appliances and virtual assistants to understand, translate and respond to spoken commands and questions.
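The enrol-then-authenticate flow described above can be illustrated with a deliberately simplified sketch. It is not how any particular product works: instead of an AI/ML model it uses a hand-built spectral feature (log energy in equal-width frequency bands) and cosine similarity, and it replaces real recordings with synthetic "voices" made of harmonics, where the pitch loosely stands in for the vocal cords and the harmonic weights for the shape of the throat and mouth. All names (`synth_utterance`, `voice_print`, `enrol`, `verify`) and parameters (8 kHz sample rate, 16 bands, a 0.9 acceptance threshold) are hypothetical choices made for illustration.

```python
import math
import random

# Illustrative parameters (assumptions, not taken from any real system).
SAMPLE_RATE = 8000   # samples per second
FRAME_LEN = 256      # samples per analysis frame (32 ms at 8 kHz)
N_BANDS = 16         # equal-width frequency bands in the feature vector
THRESHOLD = 0.9      # hypothetical acceptance threshold for verification


def synth_utterance(f0, weights, rng, seconds=0.25):
    """Hypothetical stand-in for a recording: harmonics of f0 plus a little noise."""
    jittered_f0 = f0 + rng.uniform(-3.0, 3.0)  # no two utterances are identical
    n = int(SAMPLE_RATE * seconds)
    return [
        sum(w * math.sin(2 * math.pi * (h + 1) * jittered_f0 * t / SAMPLE_RATE)
            for h, w in enumerate(weights))
        + rng.gauss(0.0, 0.01)
        for t in range(n)
    ]


def frame_features(frame):
    """Log energy in N_BANDS equal-width bands of a Hann-windowed DFT of one frame."""
    n = len(frame)
    cos_t = [math.cos(2 * math.pi * i / n) for i in range(n)]
    sin_t = [math.sin(2 * math.pi * i / n) for i in range(n)]
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    power = []
    for k in range(n // 2):  # naive DFT, positive frequencies only
        re = sum(x * cos_t[(k * i) % n] for i, x in enumerate(windowed))
        im = sum(x * sin_t[(k * i) % n] for i, x in enumerate(windowed))
        power.append(re * re + im * im)
    width = len(power) // N_BANDS
    return [math.log(sum(power[b * width:(b + 1) * width]) + 1e-12) for b in range(N_BANDS)]


def voice_print(samples):
    """Average the per-frame features over the utterance, then mean-centre them."""
    frames = [samples[i:i + FRAME_LEN]
              for i in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN)]
    feats = [frame_features(f) for f in frames]
    avg = [sum(col) / len(feats) for col in zip(*feats)]
    mean = sum(avg) / len(avg)
    return [v - mean for v in avg]


def enrol(utterances):
    """Build the reference template by averaging the voice prints of several utterances."""
    prints = [voice_print(u) for u in utterances]
    return [sum(col) / len(prints) for col in zip(*prints)]


def similarity(a, b):
    """Cosine similarity between two feature vectors (1.0 means the same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def verify(template, utterance, threshold=THRESHOLD):
    """Accept the claimed identity if the new voice print is close enough to the template."""
    return similarity(template, voice_print(utterance)) >= threshold


if __name__ == "__main__":
    rng = random.Random(0)
    speaker_a = (118.0, [1.0 / h for h in range(1, 8)])  # lower pitch, energy at low frequencies
    speaker_b = (210.0, [math.exp(-((h - 7) / 3.0) ** 2) for h in range(1, 13)])

    template_a = enrol([synth_utterance(*speaker_a, rng) for _ in range(3)])
    genuine = synth_utterance(*speaker_a, rng)
    impostor = synth_utterance(*speaker_b, rng)
    print("genuine :", round(similarity(template_a, voice_print(genuine)), 3), verify(template_a, genuine))
    print("impostor:", round(similarity(template_a, voice_print(impostor)), 3), verify(template_a, impostor))
```

Averaging several utterances at enrolment makes the template less sensitive to the natural variation between any two recordings of the same person, and the threshold sets the trade-off between falsely accepting an impostor and falsely rejecting the genuine speaker.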
NB There is a difference between speaker recognition (recognising who is speaking), which is used in biometric applications, and speech recognition (recognising what is being said), which is used in applications such as machine dictation, voice command systems and integrated telephony automation. These two terms are frequently confused, as is the term ‘voice recognition’. In simple terms, ‘voice’ is a synonym for speaker, not speech: voice recognition means recognising the speaker.