Evolution appears to be inbuilt in nature, since the natural world one sees today is the result of millions of years of development from primitive life forms. The human being, too, has evolved from earlier species into the Homo sapiens of today. This inbuilt drive has also led humans to evolve whatever they create. For example, some ancient genius invented the wheel, and it has now evolved into the sophisticated wheels and tyres seen today on automobiles, planes, and trains. Man has the capability to invent and refine such things continuously until a product becomes redundant. Information technology, too, has been invented and has evolved into highly sophisticated systems that can compute and store data in terabytes. Man’s quest to soar like a bird ultimately resulted in the invention and evolution of airplanes (as well as microlights and helicopters). Similarly, the quest to make a machine that can talk and converse intelligently with humans has been a subject of exploration and evolution. This paper studies the history of the talking computer, its evolution, the present scenario, and the future of the technology, following a timeline from 1968 to 1988 and from 1988 to 2008. Though considerable progress has been made, there is still a long way to go. It remains to be seen whether a computer can fool a person into thinking that he or she is conversing with a real human (without coming face to face); that would be the ultimate test that the talking computer has arrived.
Talking computers/machines and science fiction
Jules Verne was one of the most celebrated of the early science fiction writers, with the ability to predict the arrival of submarines and space travel. His counterpart in the world of information technology would be Arthur C. Clarke, who conjured up the perfect talking machine in the film 2001: A Space Odyssey. This landmark film, made in 1968, was directed by Stanley Kubrick (the screenplay was jointly written by the director and Clarke) and was based on Clarke’s short story The Sentinel. A book of the same name as the film was published later. The film was also technically stunning, with realistic shots of spaceships and visuals of travelling in space. “Viewers are left to experience the non-verbal, mystical vastness of the film, and to subjectively reach into their own subconscious and into the film’s pure imagery to speculate about its meaning” (2001: A Space Odyssey, Tim Dirks, Greatest Films, Top 100). The film features an intelligent talking computer called HAL with a soft, human-like voice. “The name HAL is an amalgam of ‘heuristic’ and ‘algorithmic’, the two main processes of learning” (Fast Facts, 2001: A Space Odyssey). Talking computers have since appeared in many such films, like the Star Wars movies and their sequels. Forty years later, technology still has not matched the fictitious HAL, either in artificial intelligence or in the quality of voice produced.
Talking computers and machines – The early years (Pre 1968)
There are references to talking machines in literature as early as the latter half of the eighteenth century. In 1779, C. G. Kratzenstein built a machine that could produce the sounds of the vowels by blowing air through holes; he is also credited as the inventor of the mouth organ (58, Page 24, Notes and References, Robert Thomas Beyer, Sounds of Our Times, Springer). Many devices using the same principles followed this invention. During the electrical and electronic ages, many machines that could “talk” (actually, reproduce sound) were also invented, including the audio devices invented down the ages, such as the television, the phonograph, and the multimedia computer. But these were just storage devices that could store and reproduce, very realistically, the pre-recorded voices of humans and other animate and inanimate sources. Machines that could generate their own sound were another matter altogether, because it was extremely challenging to reproduce the workings of the human voice box in a machine. Mechanical devices like the Voder could generate, on their own, sounds distantly similar to the human voice, but all of them needed to be operated mechanically. As a result, the quality (however poor) depended on the skill of the operator, and the speech was therefore not considered self-generated.
The telephone and the phonograph are among the earliest instances of talking machines. “At the turn of the century, two devices could convert an acoustic speech signal (air vibrations) into an electrical signal by a microphone and change an electrical signal back to an acoustic signal through a loud speaker” (Electroacoustic Models, Joseph P. Olive, “The Talking Computer”: Text to Speech Synthesis, in HAL’s Legacy, ed. David G. Stork, The MIT Press, 1996). In one early experimental device, light was directed at fifty concentric circles, and the light that passed through was projected onto a transparent spectrogram; the light passing through this spectrogram was converted back into sound using a photoelectric cell and amplified through a loudspeaker. Even though this device did produce sound, it was considered too complicated for practical use. Moreover, it used pre-recorded sounds (converted into a spectrograph) and cannot be considered a speech generator as such. It was soon found that machines could generate speech if it was possible to manipulate resonances in sound. Storing words and speech segments was another way researchers tried to make a machine speak, but this was applicable only in certain areas, such as announcing the arrival of a train at a platform.
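The stored-segment approach described above can be sketched in a few lines. This is a toy illustration of slot-filling announcements; the segment names and phrases are my own inventions, and a real system would splice pre-recorded audio clips rather than strings of text.

```python
# Slot-filling announcement built from canned segments: fixed carrier
# phrases with variable slots spliced in between them.
SEGMENTS = {
    "arrive": "The train now arriving at platform",
    "is": "is the",
    "service": "service to",
}

def train_announcement(platform, train_time, destination):
    """Concatenate fixed segments and variable slots into one utterance.
    A real system would splice audio recordings; here we splice text."""
    return " ".join([
        SEGMENTS["arrive"], str(platform),
        SEGMENTS["is"], train_time,
        SEGMENTS["service"], destination + ".",
    ])
```

For example, `train_announcement(4, "10:15", "Oxford")` yields "The train now arriving at platform 4 is the 10:15 service to Oxford." The approach sounds natural precisely because nothing is generated; it fails as soon as an utterance outside the fixed template is needed.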
Talking computer – 1968 to 1988
Research in phonetics during this time resulted in great advances in the field. A notable example is the set of speech-synthesis rules developed by D. H. Klatt, this time using a digitized version of the electroacoustic synthesizer. Great advances were also made in synthesizing audio stored as data on computers and other storage devices. Two types of synthesis were developed: one based on stored waveforms, the other on mathematical conversion of the data into speech, the latter referred to as linear predictive coding (LPC). Faster computational speeds enabled a smoother flow between words, making the listening experience less jarring.
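Linear predictive coding models each speech sample as a weighted sum of the previous few samples; the weights describe the vocal-tract filter, and a simple excitation signal drives it. A minimal sketch of the estimation step, using the standard autocorrelation method and Levinson-Durbin recursion (the function name is my own; only NumPy is assumed):

```python
import numpy as np

def lpc_coefficients(signal, order):
    """Estimate all-pole (LPC) coefficients a[1..order] such that
    signal[n] ~ -a[1]*signal[n-1] - ... - a[order]*signal[n-order],
    via the autocorrelation method and Levinson-Durbin recursion."""
    n = len(signal)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(signal[:n - k], signal[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this model order
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a

# Demo: a known 2nd-order resonance driven by white noise
# (a crude stand-in for glottal excitation shaping a vocal tract)
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for t in range(2, len(e)):
    x[t] = 0.9 * x[t - 1] - 0.5 * x[t - 2] + e[t]

est = lpc_coefficients(x, 2)  # should recover approximately [1, -0.9, 0.5]
```

Because only the filter coefficients and a simple excitation need to be stored or transmitted, LPC was dramatically more compact than stored waveforms, which is why it also became the basis of early speech compression.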
Joseph P. Olive, on whose article this paper is mainly based, is an expert in the field of speech synthesis. LPC technology made the sound produced of a very high quality, even though it still sounded a bit mechanical. The use of computer-readable phonetic notations also became common, and elements of phonetics such as the schwa and phonemes began to be incorporated more and more into speech synthesis. The ability to read text aloud (until now, synthesis had reproduced stored audio) was also developed during this time. Even though the computer could clearly voice words, figures, and so on, it had difficulty intonating the sentence correctly. The machine also cannot understand the meaning of the sentence it is voicing, which makes it difficult to convey the meaning intended by the author.

Olive gives an instance quoting a sentence from HAL. Speaking about human error in the antenna, HAL says, “this sort of thing has cropped up before”. The meaning is clear only if the words are grouped as ‘this sort of thing’, ‘has cropped up’, and ‘before’; any other grouping makes the meaning confusing. Stress and intonation also create problems in text-to-speech synthesis. The example given is HAL’s line “I enjoy working with people”. HAL could use the correct intonation according to the situation, but this is not possible with current technology. If the sentence is programmed to stress “I”, it gives the meaning that the computer enjoys working with people, while the people may not enjoy it. If “working” is stressed, it means that the speaker enjoys working rather than playing. It would be extremely difficult to program the computer to use the correct intonation for the circumstance, since it cannot understand the situation itself. All that can be done is to have it say the sentence in a monotone, without any intonation, and let the listener grasp the meaning.
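Since the machine cannot choose the stress itself, the choice must be supplied by the text author. A toy sketch of such markup, loosely modelled on the emphasis tags of modern speech-markup languages (the function and tag usage here are illustrative assumptions, not any particular system’s API):

```python
def mark_stress(sentence, stressed_word):
    """Wrap the chosen word in emphasis markup so a downstream
    synthesizer knows which word to stress; the author, not the
    machine, decides where the stress belongs."""
    out = []
    for word in sentence.split():
        bare = word.strip(".,!?")          # ignore trailing punctuation
        if bare.lower() == stressed_word.lower():
            out.append(word.replace(bare, f"<emphasis>{bare}</emphasis>"))
        else:
            out.append(word)
    return " ".join(out)
```

Marking stress on "I" versus "working" in HAL’s line produces two different renderings, and hence the two different meanings discussed above; the point is that nothing in the text itself tells the machine which one is intended.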
This is fine in simple situations, such as announcing train timings or billing information, where intonations are fixed in every case. But in complex situations that require intonation to make the meaning clear, the technology is not developed enough. Even more complex is the stress placed within a word. HAL was able to do both of the above according to the situation; for computers in real life, this will only be possible once the computer understands the emotions and feelings of a given situation. In this connection, the different accents heard within a single language are also a problem. A human voice takes on one pitch when angry and another when sad; perhaps making the computer analyse the pitch of the human voice is a step in the right direction.
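Analysing the pitch of a voice, the step suggested above, can be sketched with a simple autocorrelation pitch tracker. This is a bare-bones illustration: real systems refine it considerably, and the frequency bounds below are assumptions about the plausible range of human pitch.

```python
import numpy as np

def estimate_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of a voiced frame by
    locating the autocorrelation peak inside the plausible range
    of human pitch (fmin..fmax Hz)."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_lo = int(sample_rate / fmax)   # shortest plausible period
    lag_hi = int(sample_rate / fmin)   # longest plausible period
    lag = lag_lo + int(np.argmax(corr[lag_lo:lag_hi]))
    return sample_rate / lag

# Demo: a synthetic 150 Hz "voice" sampled at 8 kHz
sr = 8000
t = np.arange(int(0.1 * sr)) / sr
voiced = np.sin(2 * np.pi * 150 * t)
pitch = estimate_pitch(voiced, sr)  # close to 150 Hz
```

Tracking whether such a pitch estimate is raised or lowered over an utterance is one crude way a machine might begin to distinguish an angry voice from a sad one, though mapping pitch to emotion reliably remains an open problem.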
Talking computers – 1988 to 2008
Apart from improvements in the quality of the synthesized voice and more seamless movement from one word to another, no significant solutions to the above-mentioned challenges have emerged even now. In other words, understanding emotion and intonating the same sentence according to the emotion or feeling are still not practical. Differentiating between dates and numbers is also difficult unless the input is provided in a format the system is programmed to understand. Modern robots can recognise certain pitches in the human voice, but with too much variation the capability is lost.
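The date-versus-number ambiguity described above belongs to what a text-to-speech front end calls text normalization. A naive sketch follows; the preceding-word cues and the two-pair year reading are simplifying assumptions of mine, and real systems use far richer context models.

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven "
        "twelve thirteen fourteen fifteen sixteen seventeen eighteen "
        "nineteen").split()
TENS = ("", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety")

def two_digits(n):
    """Read a number 0-99 out in words."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")

def classify_numeric(token, word_before=""):
    """Guess whether a 4-digit token is a year or a plain number,
    using a crude preceding-word cue of the kind a TTS front end
    must rely on when it cannot understand the sentence."""
    if re.fullmatch(r"1[5-9]\d\d|20\d\d", token) and \
       word_before.lower() in {"in", "since", "by", "until", "from"}:
        return "year"
    return "number"

def speak_year(token):
    """Read a year as two pairs, e.g. 1984 -> 'nineteen eighty-four'."""
    return two_digits(int(token[:2])) + " " + two_digits(int(token[2:]))
```

So "in 1984" would be read "nineteen eighty-four", while "page 1984" would fall back to a plain-number reading. The fragility is the point: a cue list like this breaks on any phrasing its author did not anticipate, which is why input formats had to be fixed in advance.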
Future of talking computers

Literacy will be unnecessary by 2050, claims William Crossman, an expert on information technology: computers will respond to our voices and tell us what we want to know. In his view, the voice-in/voice-out (VIVO) computer will be the last nail in written language’s coffin. By enabling us to access stored information orally-aurally, talking computers will finally make it possible to replace all written language with spoken language. We will be able to store and retrieve information simply by talking, listening, and looking at graphics, not at text. With this giant step forward into the past, he argues, we are about to recreate oral culture on a more efficient and reliable technological foundation.
Crossman continues that, by 2050, if large numbers of students have gained access to talking computers, all this negativity and failure concerning writing and reading will be a distant memory. All education in the electronically developed countries will be oral-aural and non-text visual; students will use talking computers with optional monitors displaying icons, graphics, and visuals to store and retrieve information.
Instead of the “three R’s” of reading, ’riting, and ’rithmetic, students will focus on the “four C’s”: critical thinking, creative thinking, comp-speak, and calculators. He calls this “VIVO-lutionary learning”.
Nor, he predicts, will we have to wait until 2050: by 2005, when Jenny is assigned to write an essay, she will be able to speak it into a VIVO computer, use VIVO’s grammar-check to organize and correct it, “proofread” it by listening to VIVO repeat it, print it out, and submit it to the teacher for a grade.
The canned, machine-generated prompts used in telephones and other public interfaces will give way to talking computers. The future promises a revolution in this field, bringing economy to organisations and better service to the public.
As for the development of talking computers between 1988 and 2008, the drawbacks described above have been overcome to a certain extent through advances in Artificial Intelligence. Combining AI with sound-producing software makes the computer talk. One example, Alice Talker version 1.4, is a Swing-based Java client application which enables the user to interact via spoken words and synthesized speech with an HTTP interface server running on the same or a remote machine. It uses Cloudgarden’s implementation of Sun’s Java Speech API.
Automated dialogue systems delivered over the telephone also offer a promising approach to delivering health-related interventions to populations of individuals at low cost (Journal of Bioinformatics, 2006).
Another milestone is the development of the WT-5 talking robot, which has vocal cords and lips modelled on human biological structures. Its thermoplastic rubber has an elasticity similar to human tissue, its vocal cords vibrate like human vocal cords, and its lips act like human lips. This helped it pronounce vowels well and produce voices closer to the human voice.
HAL of 2001: A Space Odyssey
As mentioned earlier, the fictional technology behind the humanlike HAL had not been reproduced even by 2008, and it appears that it will not be in the near future. While technology may have produced computers that converse intelligibly with humans, HAL has the capacity for rational thinking and even feelings (elation, motivation, and so on) of the kind felt by human beings. This level of artificial intelligence has become the holy grail of computer scientists: something that can be conceived, but has not (to date) been proved and built. HAL is virtually human except that it was artificially made and its internal structure consists of computers and other electronic circuits. In other ways it resembles a human, in that it can see, hear, think, analyse, and feel emotions. A series of dialogues between HAL and the humans proves this point (all the dialogues have been sourced or rephrased from the Internet Movie Database site). When an antenna malfunctions on the mother ship in which HAL is situated, the computer blames human error; HAL sees itself as a human being, or at least as something with a human mind. In the movie, its circuits become damaged just as a human brain might be. When Dave Bowman finds that HAL has killed his crew, HAL tells Bowman that he should not be upset but should sit down, think things over, and maybe even take a pill to calm his mind. The highlight of HAL’s dialogue comes when Bowman manages to shut down the supercomputer: “I’m afraid. I’m afraid, Dave. Dave, my mind is going. I can feel it. I can feel it. My mind is going. There is no question about it. I can feel it. I can feel it. I can feel it. I’m a… fraid” (HAL’s shutdown, Memorable Quotes for 2001: A Space Odyssey, IMDb – The Internet Movie Database).
Reactions to the film range from boring and incomprehensible at one extreme to stunning and thought-provoking at the other. Whatever critics and moviegoers may say about the movie, the level of (artificial) intelligence attributed to HAL had never been portrayed at that point in time (1968), either in fiction or in real life. In that sense, HAL is a path-breaking concept, one that human beings will take a long time to reproduce in real life.
The literature studied so far reveals that the fictitious technology of HAL is far ahead of what is available even in 2008. The software of today can produce clear sound quite close to the human voice. It can mimic different pitches, such as those of a young or old woman, a boy or a girl, an old man or a young man, and it can observe pauses where punctuation marks have been included. But understanding emotion is still well out of reach for today’s computers. Changing intonation, or stressing words and syllables within a single sentence differently in different circumstances, is also not possible. Nor can the software interpret numbers accurately: for example, it cannot tell whether 1984 is a date or just a number. At present, the humans who enter text to be synthesized must take care to provide formats and punctuation the computer is programmed to understand, and the same conventions must be followed each time text is entered.

HAL must have seemed so revolutionary to the viewers of 1968 that they could not grasp the significance of the creation. It must be remembered that computers in 1968 were the prerogative of a chosen few who were technically qualified to handle the difficult commands and processes needed to make them work. Viewers back then may have assumed that talking was something easy for computers to do. It would have been like telling a person who has never seen even a picture of an elephant that an elephant is as big as a football field: he would easily accept it and picture an unnaturally huge elephant. The difficulty became apparent only when computers became common and users felt first hand the lack of adaptability and versatility of today’s talking computers. Even now, synthesized speech is used mainly for mundane tasks like announcements or reading out a text.
Even these features are mainly useful for people who are visually impaired; for the common man, the use of synthesized voice is very limited. Otherwise, one would see people using the talking capability of computers more often. Until some new technology or a paradigm shift arrives, the talking computer HAL will remain unmatched in artificial intelligence for years to come. A day may come when a person feeling bored can switch on his PC and start an intelligent and engaging conversation, when a computer can pacify an angry person or act as a counsellor to a troubled mind. At present, that is the stuff of science fiction, not reality.