K-means Clustering Algorithm
K-means (MacQueen, 1967) Àº À¯¸íÇÑ ±ºÁýÈ (Clustering) ¹®Á¦¸¦ ÇØ°áÇÏ´Â °¡Àå °£´ÜÇÑ ÀÚÀ²ÇнÀ (Unsupervised Learning) ¾Ë°í¸®ÁòÁß ÇϳªÀÌ´Ù. »çÀü¿¡ Á¤ÇØÁø ¾î¶²¼öÀÇ Å¬·¯½ºÅ͸¦ ÅëÇØ¼ ÁÖ¾îÁø µ¥ÀÌÅÍ ÁýÇÕÀ» ºÐ·ùÇÏ´Â °£´ÜÇÏ°í ½¬¿î ¹æ¹ýÀÌ´Ù. k-means ´Â partitional clustering ¿¡ ¼ÓÇÑ´Ù.
data À̿ܿ¡ cluster ÀÇ ¼ö ¸¦ input À¸·Î Çϸç À̶§
¸¦ seed point ¶ó°í ÇÑ´Ù. seed point ´Â ÀÓÀÇ·Î ¼±ÅÃµÇ¸ç ¹Ù¶÷Á÷ÇÑ cluster
±¸Á¶¿¡ °üÇÑ ¾î¶² Áö½ÄµéÀÌ seed point¸¦ ¼±ÅÃÇϴµ¥¿¡ »ç¿ëµÉ ¼ö ÀÖ´Ù. Forgy' algorithm °ú ´Ù¸¥Á¡Àº ÇϳªÀÇ
sample ÀÌ ÇϳªÀÇ cluster ¿¡ ÇÕ·ùÇÏÀÚ¸¶ÀÚ °ð cluster ÀÇ centroid °¡ ´Ù½Ã °è»êµÈ´Ù´Â
°ÍÀÌ´Ù. ¶ÇÇÑ Forgy' algorithm ÀÌ ¹Ýº¹Àû(iterative) ÇÑ ¹Ý¸é¿¡
-means algorithm Àº data set¿¡¼ ´ÜÁö µÎ ¹ø¸¸ÀÇ pass °¡ ÀÌ·ç¾îÁø´Ù. ±× °úÁ¤Àº
´ÙÀ½°ú °°´Ù.
1. óÀ½¿¡ cluster ·Î¼ ½ÃÀÛÇÑ´Ù. ³²¾ÆÀÖ´Â
sampleµé¿¡ ´ëÇØ¼´Â °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ ã´Â´Ù. À̰Ϳ¡ °¡Àå
°¡±îÀÌ ÀÖ´Â centroid¸¦ °¡Áö´Â °ÍÀÌ È®ÀÎµÈ cluster ¿¡ sampleÀ» Æ÷ÇÔ½ÃŲ´Ù. °¢°¢ÀÇ
sample µéÀÌ ÇÒ´çµÈ ÈÄ¿¡ ÇÒ´çµÈ cluster ÀÇ centroid °¡ ´Ù½Ã °è»êµÈ´Ù.
2. ±× data¸¦ µÎ ¹ø ó¸®ÇÑ´Ù. °¢ sample¿¡ ´ëÇÏ¿© °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ ã´Â´Ù. °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ °¡Áø °ÍÀ¸·Î È®ÀÎµÈ cluster ¿¡ sampleÀ» À§Ä¡½ÃŲ´Ù. (ÀÌ step ¿¡¼´Â ¾î¶² centroid µµ ´Ù½Ã °è»êÇÏÁö ¾Ê´Â´Ù.)
site :
A Tutorial on Clustering Algorithms : K-means : Applet
paper :
The k-means Algorithm : Earl Gose ¿Ü
K-Æò±Õ±ºÁýÈ(K-Means Clustering) : Àå³²½Ä
K-Æò±Õ ±ºÁý ¹æ¹ýÀ» ÀÌ¿ëÇÑ °¡Áß Ä¿³ÎºÐ·ù±â (Kernel Pattern Recognition using K-means Clustering Method) : ½ÉÁ¤¿í, ¹éÀå¼±, Çѱ¹Åë°èÇÐȸ ÀÀ¿ëÅë°è¿¬±¸ 13±Ç 2È£, 2000
K-Æò±Õ ±ºÁýÈÀÇ ÀçÇö¼º Æò°¡ ¹× ÀÀ¿ë (Reproducibility Assessment of K-Means Clustering and Applications) : Çã¸íÈñ, À̿뱸, Çѱ¹Åë°èÇÐȸ ÀÀ¿ëÅë°è¿¬±¸ 17±Ç 1È£, 2004