Speech Synthesis
........ Speech synthesis is the artificial production of human speech. A system that does this is called a speech synthesizer, and it can be implemented in software or hardware. A speech synthesis program takes a document as written input and converts it into automatically generated synthetic speech as spoken output. For this reason speech synthesis is sometimes called "Text-to-Speech" (TTS) conversion .......
Speech synthesis can be defined as the automatic generation of speech waveforms using mechanical devices, electronic circuits, or computer simulation. It was studied earlier than any other speech-related technology. Most early research on speech synthesis simulated the human vocal organs with mechanical devices or electronic circuits. Modeling the human vocal organs remains the ultimate goal of speech synthesis research, but as computing speed and memory capacity advanced rapidly, the research expanded beyond vocal organ modeling into text-to-speech conversion, which also includes text processing. Delivering a message by synthesized speech has the following advantages.
① Anyone can easily understand the content without special attention or training.
② It can be heard while moving or working, so information can be delivered at any time even when the listener is not paying close attention.
③ No special device is needed and an ordinary telephone can be used as is, so it is economical and messages can easily reach distant places.
④ No paper is needed.
Speech synthesis technology can be broadly divided into two kinds according to how it is applied. Limited-vocabulary synthesis, or automatic response systems (ARS ; Automatic Response System), synthesizes only sentences drawn from a restricted vocabulary and a restricted set of syntactic structures. Unlimited-vocabulary synthesis, or text-to-speech (TTS ; Text-to-Speech) systems, accepts arbitrary sentences as input and synthesizes them as speech. ............ (오영환 1998)
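The limited-vocabulary case described above can be illustrated with a minimal sketch. It assumes a hypothetical table of prerecorded units (here just short placeholder sample lists, not real audio) and a hypothetical function name, synthesize_limited; none of this comes from the quoted source. The point is only that such a system can join stored units but cannot speak a word it has no recording for, which is the gap an unlimited-vocabulary TTS system is meant to close.

```python
# Hypothetical sketch of limited-vocabulary (ARS-style) synthesis.
# PRERECORDED maps each allowed word to placeholder waveform samples;
# a real system would store recorded audio instead.
PRERECORDED = {
    "your": [0.1, 0.2],
    "balance": [0.3, 0.1, -0.2],
    "is": [0.05],
    "ten": [0.4, -0.4],
}


def synthesize_limited(words):
    """Concatenate stored units; reject words outside the fixed vocabulary."""
    missing = [w for w in words if w not in PRERECORDED]
    if missing:
        # An unlimited-vocabulary (TTS) system would have to generate
        # these words from text instead of failing here.
        raise KeyError(f"out of vocabulary: {missing}")
    samples = []
    for w in words:
        samples.extend(PRERECORDED[w])
    return samples


if __name__ == "__main__":
    print(len(synthesize_limited(["your", "balance", "is", "ten"])))
```

Calling synthesize_limited with a word that is not in the table raises KeyError, which mirrors the restriction to a fixed vocabulary and sentence structure.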
term :
Speech, Speech Recognition, Speech Synthesis, Speech Understanding, Understanding, Natural Language Understanding, Natural Language Processing, Artificial Intelligence
site :
Speech synthesis FAQ : CMU, Andrew Hunt's speech synthesis-related web page
Bell Labs text-to-speech synthesis, with overview and demo
paper :
Speech production : Peter Denes, Elliot Pinson
Current status and challenges of speech synthesis technology development : 이양희, Phonetic Society of Korea, 1994
Current status and prospects of speech recognition and synthesis technology : 오영환, Yeungnam University International Symposium on Next-Generation Information and Communications, 2000
The role of phonetics and phonology in speech recognition and speech synthesis : 김기호, Phonetic Society of Korea, 1994
Design and Implementation of a Speech Synthesis Engine and a Plug-in for Internet Web Page : 이희만, 김지영, Korea Information Processing Society, 2000
Implementation of Text-to-Audio Visual Speech Synthesis Using Key Frames of Face Images : 김진영, 김명곤, 백성준, Phonetic Society of Korea, 2002
Speaker-Adaptive Speech Synthesis based on Fuzzy Vector Quantizer Mapping and Neural Networks : 이광형, 이진이, Korea Information Processing Society, 1997
Segmental duration modeling for Korean text-to-speech synthesis : 이양희, Phonetic Society of Korea, 1996