Language is man's most important means of communication, and speech is its primary medium. Speech research brings together the many disciplines that contribute to our understanding of how speech is produced, perceived, processed, learned and used. Here we deal with the interaction between man and machine through speech synthesis and recognition applications.
This article covers speech technology and the conversion of speech into analog and digital waveforms that machines can process. Speech recognition, or speech-to-text, involves capturing and digitizing the sound waves, converting them to basic language units or phonemes, constructing words from phonemes, and contextually analyzing the words to ensure correct spelling for words that sound alike.
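The last two steps of that pipeline, building words from phonemes and using context to choose between words that sound alike, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular recognizer works: the lexicon, the ARPAbet-style phoneme symbols and the bigram counts are hypothetical toy data, word boundaries are found by greedy longest match, and "contextual analysis" is reduced to picking the spelling sequence whose adjacent word pairs were seen most often.

```python
from itertools import product

# Hypothetical pronunciation lexicon: phoneme sequence -> possible spellings.
LEXICON = {
    ("R", "AY", "T"): ["right", "write"],   # homophones
    ("DH", "AH"): ["the"],
    ("L", "EH", "T", "ER"): ["letter"],
    ("T", "ER", "N"): ["turn"],
}

# Hypothetical bigram counts standing in for a language model.
BIGRAM_COUNTS = {
    ("write", "the"): 8,
    ("the", "letter"): 6,
    ("turn", "right"): 9,
    ("right", "the"): 1,
}

def segment(phonemes):
    """Split a phoneme list into lexicon entries by greedy longest match."""
    words, i = [], 0
    longest = max(len(key) for key in LEXICON)
    while i < len(phonemes):
        for n in range(min(longest, len(phonemes) - i), 0, -1):
            key = tuple(phonemes[i:i + n])
            if key in LEXICON:
                words.append(LEXICON[key])  # list of candidate spellings
                i += n
                break
        else:
            raise ValueError(f"no lexicon entry at position {i}")
    return words

def best_spelling(candidates):
    """Choose the spelling sequence with the highest total bigram count."""
    def score(seq):
        return sum(BIGRAM_COUNTS.get(pair, 0) for pair in zip(seq, seq[1:]))
    return max(product(*candidates), key=score)

print(best_spelling(segment("R AY T DH AH L EH T ER".split())))
print(best_spelling(segment("T ER N R AY T".split())))
```

With these toy counts the same phonemes R AY T are spelled "write" before "the letter" but "right" after "turn". Greedy longest match can fail when a shorter split is the correct one; practical decoders typically search over segmentations jointly with the language model instead.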
Speech recognition is the ability of a computer to recognize general, naturally flowing utterances from a wide variety of users. In a telephone application, for example, it recognizes the caller's answers in order to move the call along its flow.
We focus on modeling speech units and grammar with Hidden Markov Models (HMMs). Speech recognition allows you to provide input to an application with your voice. Studying the applications and limitations of the subject shows the impact of speech processing on modern technology.
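To make the HMM idea concrete, the sketch below uses the Viterbi algorithm, a decoding method commonly used with HMM-based recognizers, to find the most likely hidden state sequence for a series of observations. It is a toy under stated assumptions: the two states ("sil" for silence, "vowel" for a voiced sound), the discrete "low"/"high" energy observations and every probability are hypothetical. Real recognizers typically model continuous acoustic feature vectors (for example with Gaussian mixtures or neural networks) and chain many phoneme-level HMMs together under a grammar. Probabilities are kept as logarithms to avoid numerical underflow on long utterances.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state path and its log probability."""
    # Each column maps a state to (best log prob of a path ending there, previous state).
    columns = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
        for s in states
    }]
    for obs in observations[1:]:
        prev = columns[-1]
        column = {}
        for s in states:
            # Best predecessor r for state s at this time step.
            score, best_prev = max(
                (prev[r][0] + math.log(trans_p[r][s]), r) for r in states
            )
            column[s] = (score + math.log(emit_p[s][obs]), best_prev)
        columns.append(column)
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: columns[-1][s][0])
    path = [last]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, columns[-1][last][0]

# Hypothetical toy model: silence vs. a voiced (vowel-like) sound,
# observed through a crude "low"/"high" energy measurement.
STATES = ("sil", "vowel")
START = {"sil": 0.8, "vowel": 0.2}
TRANS = {"sil": {"sil": 0.7, "vowel": 0.3},
         "vowel": {"sil": 0.2, "vowel": 0.8}}
EMIT = {"sil": {"low": 0.9, "high": 0.1},
        "vowel": {"low": 0.2, "high": 0.8}}

path, log_prob = viterbi(["low", "low", "high", "high", "low"], STATES, START, TRANS, EMIT)
print(path, math.exp(log_prob))
```

For the observations low, low, high, high, low the decoded path is sil, sil, vowel, vowel, sil: the model "hears" a short voiced segment between two stretches of silence.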
While there is still much room for improvement, current speech recognition systems perform remarkably well. We are only human, but as we develop this technology and make remarkable changes, we attain real achievements. Rather than asking what is still deficient, we ask instead what should be done to make it efficient.
SPEECH TECHNOLOGY
- Three primary speech technologies are used in voice processing applications: stored speech, text-to-speech and speech recognition. Stored speech involves the production of computer speech from an actual human voice that is stored in a computer’s memory and used in any of several ways.
- Speech can also be synthesized from plain text in a process known as text-to-speech, which also enables voice processing applications to read from a textual database.
- Speech recognition is the process of deriving either a textual transcription or some form of meaning from a spoken input.
- Speech analysis can be thought of as that part of voice processing that converts human speech to digital forms suitable for transmission or storage by computers.
- Speech synthesis functions are essentially the inverse of speech analysis: they reconvert speech data from a digital form to one that is similar to the original recording and suitable for playback.
- Speech analysis processes can also be referred to as digital speech encoding (or simply coding) and
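The analysis (coding) and synthesis (decoding) pair described in the list can be illustrated with a minimal round trip in Python: a signal is sampled, compressed with the continuous μ-law curve, quantised to 8-bit codes, and then decoded back to sample values. This is a sketch under assumptions, not a standard-conforming codec: the 8 kHz rate and μ = 255 match common telephone practice, but the ITU-T G.711 μ-law codec uses a segmented approximation of this curve and a specific bit layout that the sketch does not reproduce, and a 200 Hz sine tone stands in for real speech.

```python
import math

MU = 255     # compression parameter used in North American/Japanese telephony
RATE = 8000  # samples per second (telephone-band sampling rate)

def mu_law_encode(sample, bits=8):
    """Analysis: compress a sample in [-1, 1] with the mu-law curve, then quantise."""
    compressed = math.copysign(math.log1p(MU * abs(sample)) / math.log1p(MU), sample)
    levels = 2 ** (bits - 1) - 1          # 127 positive levels for 8 bits
    return round(compressed * levels)     # integer code in [-levels, levels]

def mu_law_decode(code, bits=8):
    """Synthesis: undo the quantisation and expand with the inverse mu-law curve."""
    levels = 2 ** (bits - 1) - 1
    y = code / levels
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

# Sample 10 ms of a 200 Hz tone at half of full scale (stand-in for speech).
signal = [0.5 * math.sin(2 * math.pi * 200 * n / RATE) for n in range(RATE // 100)]
codes = [mu_law_encode(x) for x in signal]
restored = [mu_law_decode(c) for c in codes]
max_error = max(abs(a - b) for a, b in zip(signal, restored))
print(f"{len(codes)} samples, max reconstruction error {max_error:.4f}")
```

Because the μ-law curve spends more code values on quiet samples, the reconstruction error stays small near zero and grows with amplitude, which suits speech, where low-level samples are common.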
About Author
Sachin
CUSAT
SPEECH RECOGNITION
Speech recognition is the process of deriving either a textual transcription or some form of meaning from a spoken input. It is the inverse of speech synthesis: the conversion of speech to text. The task is complex. It involves the computer taking the user's speech and interpreting what has been said. This allows the user to control the computer (or certain aspects of it) by voice rather than with the mouse and keyboard, or simply to dictate the contents of a document. The task would be complicated enough if every speaker pronounced every word identically each time, but this does not happen.
POTENTIAL APPLICATIONS FOR SPEECH RECOGNITION
The specific use of speech recognition technology will depend on the application. Some target applications that are good candidates for integrating speech recognition include:
Games and Edutainment
Speech recognition offers game and edutainment developers the potential to bring their applications to a new level of play. With games, for example, traditional computer-based characters could evolve into characters that the user can actually talk to.
While speech recognition enhances the realism and fun in many computer games, it also provides a useful alternative to keyboard-based control, and voice commands provide new freedom for the user in any sort of application, from entertainment to office productivity.
Data Entry
Applications that require users to keyboard paper-based data into the computer (such as database front-ends and spreadsheets) are good candidates for a speech recognition application. Reading data directly to the computer is much easier for most users and can significantly speed up data entry.
While speech recognition technology cannot effectively be used to enter names, it can enter numbers or items selected from a small list (fewer than 100 items). Some recognizers can even handle spelling fairly well. If an application has fields with mutually exclusive data types (for example, one field allows "male" or "female", another is for age, and a third is for city), the speech recognition engine can process the command and automatically determine which field to fill in.
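The field-selection idea in this example can be sketched as a small routing function. It assumes, hypothetically, that the recognizer has already returned each utterance as text, with spoken numbers rendered as digits; the field names and the short city list are illustrative only. Because the three fields accept mutually exclusive kinds of value, a simple type test is enough to decide where each utterance belongs.

```python
import re

# Hypothetical closed vocabularies for the form's fields.
GENDERS = {"male", "female"}
CITIES = {"boston", "chicago", "denver"}  # a small list, as the text suggests

def route(utterance, form):
    """Put a recognised utterance into whichever field its data type matches."""
    word = utterance.strip().lower()
    if word in GENDERS:
        form["gender"] = word
    elif re.fullmatch(r"\d{1,3}", word):   # assumes numbers arrive as digits
        form["age"] = int(word)
    elif word in CITIES:
        form["city"] = word
    else:
        raise ValueError(f"cannot place {utterance!r} in any field")
    return form

form = {}
for spoken in ["Female", "42", "Denver"]:
    route(spoken, form)
print(form)
```

The user can speak the three values in any order; the routing depends only on which vocabulary or pattern each utterance matches.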
Document Editing
This is a scenario in which one or both modes of speech recognition could be used to dramatically improve productivity. Dictation would allow users to dictate entire documents without typing. Command and control would allow users to modify formatting or change views without using the mouse or keyboard. For example, a word processor might provide commands like "bold", "italic", "change to Times New Roman font", "use bullet list text style," and "use 18 point type." A paint package might have "select eraser" or "choose a wider brush."
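The split between dictation and command and control can be sketched as a dispatcher that checks each recognised phrase against a fixed command table and treats anything else as dictated text. The phrases come from the word-processor example above; the action names (`toggle_style`, `set_font` and so on) and the document representation are hypothetical, and exact string matching is a simplification of the grammars that command-and-control systems typically use.

```python
# Hypothetical command table: spoken phrase -> (editor action, argument).
COMMANDS = {
    "bold": ("toggle_style", "bold"),
    "italic": ("toggle_style", "italic"),
    "change to times new roman font": ("set_font", "Times New Roman"),
    "use bullet list text style": ("set_style", "bullet list"),
    "use 18 point type": ("set_size", 18),
}

def handle(utterance, document):
    """Apply a formatting command if the phrase matches one; otherwise dictate it."""
    key = utterance.strip().lower()
    if key in COMMANDS:
        document["actions"].append(COMMANDS[key])
    else:
        document["text"].append(utterance.strip())
    return document

doc = {"text": [], "actions": []}
for spoken in ["Dear Sir", "bold", "Use 18 point type", "Thank you"]:
    handle(spoken, doc)
print(doc)
```

One design consequence is visible immediately: a user who wants to dictate the word "bold" as text cannot do so with this table, which is one reason a practical system needs some way to switch between, or disambiguate, the two modes.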
Limitations
Each of the speech technologies, recognition and synthesis, has its limitations. For speech recognition, these limitations center on variability. A major stumbling block in developing the technology has been the tendency of automatic speech recognition (ASR) systems to assign completely different labels to speech signals that a human listener would judge to be variants of the same signal. The task has therefore been viewed as one of de-sensitising recognisers to variability, although it is not entirely clear that this idea adequately models the parallel process in human speech perception.
Human beings are extremely good at spotting similarities between input signals, whether speech signals or some other kind of sensory input such as visual signals. The human being is essentially a pattern-seeking device, attempting all the while to spot identity rather than difference.
By contrast, traditional computer programming techniques make it relatively easy to spot differences but surprisingly difficult to spot similarity, even when the variability is only slight. Much effort is currently being devoted to developing techniques that can reverse this situation and turn the computer into an efficient pattern-spotting device.
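One classic technique for making a recogniser less sensitive to one kind of variability, differences in speaking rate, is dynamic time warping (DTW); the text does not name it, so treat it here only as an illustration. DTW compares two sequences while allowing either to be locally stretched, so a slow and a fast rendition of the same word can be judged identical even though an element-by-element comparison reports a difference. The feature values below are made-up one-dimensional numbers standing in for per-frame acoustic features.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend by stretching a, stretching b, or advancing both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical feature tracks: the same "word" spoken slowly and quickly, plus a different word.
slow = [0, 1, 1, 2, 3, 3, 2, 0]
fast = [0, 1, 2, 3, 2, 0]
other = [3, 2, 0, 0, 1, 3]

print(slow == fast)               # exact comparison spots a difference
print(dtw_distance(slow, fast))   # warping finds the two renditions identical
print(dtw_distance(slow, other))  # a genuinely different pattern scores higher
```

The quadratic table makes this sketch fine for short sequences; practical template matchers usually restrict the warping path to a band around the diagonal to save work.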
MERITS
The uses of speech technology are wide-ranging. Most current effort centers on providing voice input and output for information systems, for example over the telephone network.
A relatively new refinement here is the provision of speech systems for accessing distributed information of the kind presented on the Internet. The idea is to make this information available to people who do not have, or do not want to have, access to screens and keyboards. Essentially, researchers are trying to harness the more natural use of speech as a means of direct access to systems more normally associated with the technological paraphernalia of computers.
Clearly a major use of the technology is to assist people who are disadvantaged in one way or another with respect to producing or perceiving normal speech.
The "eavesdropping" potential of speech recognition is not sinister. It simply means providing, say, a speech recognition system as an input to a computer when the speaker's hands are engaged in some other task and cannot operate a keyboard: for example, a surgeon giving a running commentary on what he or she is doing. Another example might be a car mechanic, on his or her back underneath a vehicle, querying a stores computer about the availability of a particular spare part.
CONCLUSION
Speech recognition is a truly amazing human capacity, especially when you consider that normal conversation requires the recognition of 10 to 15 phonemes per second. It should come as little surprise, then, that attempts to build machine (computer) recognition systems have proven difficult. Despite these problems, a variety of systems are becoming available that achieve some success, usually by addressing one or two particular aspects of speech recognition. A variety of speech synthesis systems, on the other hand, have been available for some time now. Though limited in capabilities and generally lacking the "natural" quality of human speech, these systems are now a common component of our lives.