Clustering
Pattern Recognition and Image Analysis : Earl Gose. Richard Johnsonbaugh. Steve Jost Àú¼, Prentice Hall, 1996, Page 199~219
±¸ºÐÇÏ·Á°í ÇÏ´Â °¢ class¿¡ ´ëÇÑ ¾Æ¹«·± Áö½ÄÀÌ ¾ø´Â »óÅ¿¡¼ classifyÇÏ´Â °ÍÀ̹ǷΠunsupervised learning¿¡ ÇØ´çÇÑ´Ù. Áï sampleµé¿¡ ´ëÇÑ Áö½Ä¾øÀÌ similarity (À¯»çµµ) ¿¡ ±Ù°ÅÇÏ¿© cluster µéÀ» ±¸ºÐÇÑ´Ù. ÆÐÅÏ °ø°£¿¡ ÁÖ¾îÁø À¯ÇÑ °³ÀÇ ÆÐÅϵéÀÌ ¼·Î °¡±õ°Ô ¸ð¿©¼ ¹«¸®¸¦ ÀÌ·ç°í ÀÖ´Â ÆÐÅÏ ÁýÇÕÀ» cluster (±ºÁý) À̶óÇÏ°í ¹«¸®Áö¿ö ³ª°¡´Â ó¸® °úÁ¤À» clustering À̶ó ÇÑ´Ù.
cluster °£ÀÇ À¯»çµµ¸¦ Æò°¡Çϱâ À§ÇØ ¿©·¯ °¡ÁöÀÇ °Å¸® ÃøÁ¤ ÇÔ¼ö¸¦ »ç¿ëÇϴµ¥ ¿¹¸¦µé¸é Euclidean distance, Mahalanobis distance, Lance-Williams distance, Hamming distance µîÀÌ »ç¿ëµÈ´Ù.
hierarchy ´Â ´ÙÀ½ ±×¸²°ú °°ÀÌ tree ±¸Á¶·Î Ç¥ÇöµÉ ¼ö ÀÖ´Ù. µ¿¹° º´¿ø¿¡¼ ȯÀÚµéÀº Å©°Ô °³¿Í °í¾çÀÌ·Î ±¸¼ºµÇ¸ç °¢ÀÚ subgroup À¸·Î ³ª´²Áö¸ç ±×°ÍÀº ´Ù½Ã subgroup À¸·Î ³ª´²Áø´Ù. °¢°³ÀÇ µ¿¹°Àº 1 ~ 5 ·Î ±¸ºÐµÇ¸ç °¡Àå ¾Æ·¡¿¡ À§Ä¡ÇÑ´Ù. hierarchical clustering Àº data¸¦ ¸¹Àº ÀÛÀº ±×·ìÀ¸·Î ±¸¼ºµÇ´Â Å« ±×·ìÀ» ±¸¼ºÇÏ´Â °úÁ¤À» ÀǹÌÇÑ´Ù. ÈçÈ÷ tree ³ª dendrogram À¸·Î ±×·Á¼ Ç¥ÇöÇÏ¸ç °¡Àå ¹Ì¼¼ÇÑ ±×·ìÀº °¡Àå ¾Æ·¡¿¡ À§Ä¡ÇÏ¸ç °¢ sample Àº ÇϳªÀÇ cluster¸¦ Çü¼ºÇÑ´Ù. °¡Àå Å« ±×·ìÀÌ Á¦ÀÏ À§¿¡ À§Ä¡ÇÏ¸ç °¢ sample µéÀº ÇϳªÀÇ cluster ±×·ì¿¡ ¼ÓÇÑ´Ù.
À§ÀÇ ±×¸²¿¡¼ level ¿¡ µû¶ó ´ÙÀ½ÀÇ sample µé·Î ±¸¼ºµÇ´Â cluster ·Î ±¸¼ºµÈ´Ù.
level 0 : {1}, {2}, {3}, {4}, {5},
level 1 : {1, 2}, {3}, {4}, {5}.
level 2 : {1, 2}, {3}, {4, 5}.
level 3 : {1, 2, 3}, {4, 5}.
level 4 : {1, 2, 3, 4, 5} : ÇϳªÀÇ cluster ·Î ±¸¼ºµÈ´Ù.
hierarchical clustering ¿¡¼´Â ¾î¶² level ¿¡ ÇϳªÀÇ cluster ¿¡ ¼ÓÇÏ´Â µÎ sample ÀÌ ÀÖ´Ù¸é ±×°ÍÀº »óÀ§ level ¿¡¼´Â °°Àº cluster ¿¡ ¼ÓÇÑ´Ù. Áï À§ÀÇ ±×¸²¿¡¼ level 2 ¿¡¼ÀÇ 4 ¿Í 5 ´Â °°Àº cluster ¿¡ ¼ÓÇÏ¸ç ±×°ÍÀº level 3 °ú 4 ¿¡¼µµ °°Àº cluster ¿¡ ¼ÓÇÑ´Ù.
hierarchical clustering algorithm Àº °èÃþ±¸Á¶¸¦ bottom up ¹æ½ÄÀ¸·Î ¸¸µé °æ¿ì agglomerative (º´ÇÕ½Ä) À̶ó°í Çϰí, top down ¹æ½ÄÀ̸é divisive (ºÐÇÒ½Ä) À̶ó°í ÇÑ´Ù.
cluster °£ÀÇ À¯»çµµ¸¦ ÃøÁ¤ÇÏ´Â ´Ù¸¥ ¹æ¹ýµéÀ» »ç¿ëÇÏ¿© ´Ù¸¥ hierarchical clustering algorithmÀ» ±¸ÇÒ ¼ö ÀÖ´Ù. ±× À¯»çµµ¸¦ ÃøÁ¤ÇÏ´Â ÇϳªÀÇ ¹æ¹ýÀÌ cluster °£ °Å¸®¸¦ ÃøÁ¤ÇÏ´Â ÇÔ¼ö¸¦ Á¤ÀÇÇÏ´Â °ÍÀÌ´Ù. °Å¸® ÇÔ¼ö´Â sample ÀÇ ½Öµé°£ °Å¸®¸¦ ÃøÁ¤ÇÏ´Â ÀáÀç ÇÔ¼ö¿¡ ÀÇÇØ À¯µµµÈ´Ù. nearest neighbor ÀÇ °æ¿ìó·³ cluster ¹æ¹ý¿¡¼µµ °¡Àå ÀϹÝÀûÀÎ °Å¸® ÃøÁ¤Àº Eucledian distance ¿Í city block distance ¹æ¹ýÀÌ´Ù.
ÀϹÝÀûÀÎ agglomerative clustering algorithm Àº ¹¦»çÇϱⰡ ¼ö¿ùÇÏ´Ù. sample ÀÇ Àüü ¼ö¸¦ n À̶ó°í ÇÒ °æ¿ì ´ÙÀ½°ú °°Àº °úÁ¤À» °ÅÄ£´Ù.
1. n °³ÀÇ cluster ·Î ½ÃÀÛÇÑ´Ù. °¢°¢Àº ÇϳªÀÇ sampleÀ» Æ÷ÇÔÇÑ´Ù.
2. step 3À» n - 1 ¹ø ¹Ýº¹ÇÑ´Ù.
3. °¡Àå À¯»çÇÑ cluster ¿Í
À» ã¾Æ¼ ÇϳªÀÇ cluster ·Î º´ÇÕÇÑ´Ù. ¸¸ÀÏ °°Àº °ªÀÌ ÀÖÀ¸¸é ¸ÕÀú ¹ß°ßµÈ
½ÖÀ» º´ÇÕÇÑ´Ù.
´Ù¸¥ À̸§À¸·Î´Â minimum method ¿Í nearest neighbor method ·Îµµ ºÒ¸®¿î´Ù. ÈÄÀÚÀÇ À̸§Àº nearest neighbor classification °ú ¹ÐÁ¢ÇÑ °ü°è¸¦ º¸¿©ÁØ´Ù. °¢ cluster¿¡ Çϳª¾¿ÀÇ Á¡ÀÌ ÀÖÀ» ¶§ µÎ Á¡ °£ÀÇ °¡Àå °¡±î¿î °Å¸®¸¦ µÎ cluster °£ÀÇ °Å¸®·Î¼ Á¤ÀÇÇÏ´Â ¾Ë°í¸®ÁòÀÌ´Ù.
µÎ cluster ¿Í
°¡ ÀÖÀ» ¶§ ±×µé »çÀÌÀÇ °Å¸®´Â ´ÙÀ½°ú °°ÀÌ Á¤ÀǵȴÙ. ¿©±â¼
´Â sample
¿Í
»çÀÌÀÇ °Å¸®¸¦ ÀǹÌÇÑ´Ù.
Example 5.1 single-linkage algorithmÀ» »ç¿ëÇÑ Hierarchical clustering
5 °³ÀÇ sample °ú 2 °³ÀÇ feature ¿Í
¸¦ °¡Áö´Â °æ¿ì¸¦ º¸ÀÚ. 5 °³ÀÇ sampleÀº ´ÙÀ½ ±×¸²°ú °°ÀÌ ºÐÆ÷ÇØ ÀÖ´Ù°í ÇÏÀÚ. sample
°£ÀÇ °Å¸® ÃøÁ¤À» À§ÇØ Euclidean distance¸¦ »ç¿ëÇ϶ó
´ÙÀ½ Ç¥´Â °¢ sample ÀÇ feature °ª°ú sample ½Öµé
°£ÀÇ °Å¸® ¸¦ º¸¿©ÁØ´Ù.
|
|
|
|
|
1 |
2 |
3 |
4 |
5 |
1 2 3 4 5 |
4 8 15 24 24 |
4 4 8 4 12 |
|
1 2 3 4 5 |
- 4.0 11.7 20.0 21.5 |
4.0 - 8.1 16.0 17.9 |
11.7 8.1 - 9.8 9.8 |
20.0 16.0 9.8 - 8.0 |
21.5 17.9 9.8 8.0 - |
ÇϳªÀÇ sampleÀ» °¡Áö´Â cluster ¿Í
°¡ ÀÖ°í ±×µé°£ÀÇ °Å¸®´Â
ÀÌ´Ù.
¾Ë°í¸®ÁòÀº °¢ÀÚ ÇϳªÀÇ sampleÀ» °¡Áö´Â 5 °³ÀÇ cluster ·Î ½ÃÀÛÇÑ´Ù. ±×¸®°í ³ª¼ 2 °³ÀÇ °¡Àå °¡±î¿î cluster °¡ º´ÇյȴÙ. À§ÀÇ Ç¥¿¡¼ °¡Àå ÀÛÀº ¼ö´Â 4 À̰í sample 1 °ú 2 »çÀÌÀÇ °Å¸®ÀÌ´Ù. µû¶ó¼ cluster {1} °ú {2} °¡ º´ÇյȴÙ. À̶§´Â ´ÙÀ½ÀÇ 4 °³ÀÇ cluster °¡ µÈ´Ù.
{1, 2}, {3}, {4}, {5}
´ÙÀ½¿¡´Â 4 °³ÀÇ cluster °£ÀÇ °Å¸®°¡ ´ÙÀ½ Çà·Ä°ú °°ÀÌ ÁÖ¾îÁø´Ù.
|
{1, 2} |
3 |
4 |
5 |
{1, 2} 3 4 5 |
- 8.1 16.0 17.9 |
8.1 - 9.8 9.8 |
16.0 9.8 - 8.0 |
17.9 9.8 8.0 - |
Çà {1, 2} °ú ¿ 3 À§Ä¡¿¡ ÀÖ´Â °ª 8.1 Àº cluster
{1, 2} °ú {3} »çÀÌÀÇ °Å¸®ÀÌ¸ç ´ÙÀ½°ú °°ÀÌ °è»êµÈ´Ù. À§ÀÇ Ã¹ ¹øÂ° Ç¥¿¡¼ À̸ç
À» º¸¿©ÁØ´Ù. single cluster algorithm ¿¡¼´Â µÎ °ªÁß¿¡¼ ÃÖ¼Ò°ª 8.1À» cluster
°£ÀÇ °Å¸®·Î ¼±ÅÃÇÑ´Ù. ù° ÇàÀÇ ´Ù¸¥ °ªµéµµ °°Àº ¹æ¹ýÀ¸·Î °è»êµÈ´Ù. ù° Çà°ú
¿ ÀÌ¿ÜÀÇ ´Ù¸¥ °ªµéÀº ±× ´ë·Î ½Â°èÇÑ´Ù. ´ÙÀ½À¸·Î Çà·ÄÇ¥¿¡¼ÀÇ ÃÖ¼Ò°ªÀÌ 8 À̱â
¶§¹®¿¡ cluster {4} ¿Í {5} °¡ º´ÇյȴÙ. µû¶ó¼ 3 °³ÀÇ cluster °¡ µÈ´Ù.
{1, 2}, {3}, {4, 5}
3 °³ÀÇ cluster °£ÀÇ °Å¸®´Â ´ÙÀ½ Çà·ÄÇ¥¿Í °°´Ù.
|
{1, 2} |
3 |
{4, 5} |
{1, 2} 3 {4, 5} |
- 8.1 16.0 |
8.1 - 9.8 |
16.0 9.8 - |
À§ Ç¥¿¡¼ÀÇ ÃÖ¼Ò°ªÀº 8.1 À̱⠶§¹®¿¡ cluster {1, 2} ¿Í {3} ÀÌ º´ÇյǾî 2 °³ÀÇ cluster ·Î µÈ´Ù.
{1, 2, 3}, {4, 5}
´ÙÀ½À¸·Î °Å¸® 9.8¿¡¼ ³²¾ÆÀÖ´Â cluster µÎ °³¸¦ º´ÇÕÇÑ´Ù. ÀÌ·¡¼ hierarchical clustering ÀÌ ¿Ï¼ºµÇ¸ç ±× tree Àº ´ÙÀ½°ú °°´Ù.
º´ÇÕÇÏ´Â cluster °£ÀÇ °Å¸® Àº vertical Ãà¿¡¼ º¼ ¼ö ÀÖ´Ù.
´Ù¸¥ À̸§À¸·Î´Â maximum method
¶Ç´Â farthest neighbor method ¶ó°íµµ ºÒ¸°´Ù. ¼·Î ´Ù¸¥ cluster
¿¡ À§Ä¡ÇÏ´Â sample µé °£¿¡ °¡Àå Å« °Å¸®¸¦ µÎ cluster °£ÀÇ °Å¸®·Î Á¤ÀÇÇÏ¿©
±¸ÇÏ¿©Áø´Ù. µÎ cluster ¿Í
»çÀÌÀÇ °Å¸®´Â ´ÙÀ½°ú °°ÀÌ Á¤ÀǵȴÙ.
Example 5.2 complete linkage algorithmÀ» »ç¿ëÇÑ hierarchical clustering
´ÙÀ½ ±×¸²°ú °°ÀÌ data °¡ ºÐÆ÷ÇÑ´Ù°í ÇÏÀÚ. sample °£ÀÇ °Å¸® ÃøÁ¤À» À§ÇØ Euclidean distance¸¦ »ç¿ëÇ϶ó
´ÙÀ½ Ç¥´Â °¢ sample ÀÇ feature °ª°ú sample ½Öµé
°£ÀÇ °Å¸® ¸¦ º¸¿©ÁØ´Ù.
|
|
|
|
|
1 |
2 |
3 |
4 |
5 |
1 2 3 4 5 |
4 8 15 24 24 |
4 4 8 4 12 |
|
1 2 3 4 5 |
- 4.0 11.7 20.0 21.5 |
4.0 - 8.1 16.0 17.9 |
11.7 8.1 - 9.8 9.8 |
20.0 16.0 9.8 - 8.0 |
21.5 17.9 9.8 8.0 - |
ÇϳªÀÇ sampleÀ» Æ÷ÇÔÇÏ´Â ´Ù¼¸ °³ÀÇ cluster ·Î ½ÃÀÛÇÑ´Ù. °¡Àå °¡±îÀÌ ÀÖ´Â cluster {1} ¿Í {2} ´Â º´ÇÕµÇ¾î »õ·Î¿î cluster¸¦ ¸¸µç´Ù.
{1, 2}, {3}, {4}, {5}
ÀÌ cluster °£ÀÇ °Å¸®¸¦ ±¸ÇÑ Çà·ÄÀÌ ´ÙÀ½°ú °°´Ù.
|
{1, 2} |
3 |
4 |
5 |
{1, 2} 3 4 5 |
- 11.7 20.0 21.5 |
11.7 - 9.8 9.8 |
20.0 9.8 - 8.0 |
21.5 9.8 8.0 - |
Çà {1, 2} °ú ¿ 3 À§Ä¡ÀÇ °ªÀº 11.7 À̸ç À̰ÍÀº cluster {1, 2} ¿Í {3} »çÀÌÀÇ °Å¸®¸¦ ÀǹÌÇÏ¸ç ±×°ÍÀº ´ÙÀ½°ú °°ÀÌ ±¸ÇØÁø´Ù.
¿ø·¡ ÁÖ¾îÁø data¿¡¼ ¿Í
ÀÇ °ªÀÌ ÁÖ¾îÁö¸ç complete linkage ¾Ë°í¸®Áò¿¡¼´Â ÃÖ´ë°ªÀÎ
11.7À» cluster °£ÀÇ °Å¸®¸¦ ¼±ÅÃÇÑ´Ù. ù ¹øÂ° ÇàÀÇ ´Ù¸¥ °ªµéµµ °°Àº ¹æ¹ýÀ¸·Î
±¸ÇØÁø´Ù. ù° Çà°ú «R° ¿ ÀÌ¿ÜÀÇ °ªÀº ±×·¡µµ ½Â°èÇÑ´Ù. ±× ¶§ Çà·Ä¿¡¼ÀÇ ÃÖ¼Ò°ªÀº
8 ÀÌ¸ç µû¶ó¼ cluster {4} ¿Í {5} °¡ º´ÇÕÇÑ´Ù. À̶§ÀÇ cluster ´Â ´ÙÀ½°ú °°´Ù.
{1, 2}, {3}, {4, 5}
ÀÌ cluster °£ÀÇ °Å¸®´Â ´ÙÀ½ Çà·Ä°ú °°ÀÌ ±¸ÇØÁø´Ù.
|
{1, 2} |
3 |
{4, 5} |
{1, 2} 3 {4, 5} |
- 11.7 21.5 |
11.7 - 9.8 |
21.5 9.8 - |
À§ Çà·ÄÀÇ ÃÖ¼Ò°ªÀº 9.8 À̱⠶§¹®¿¡ cluster {3} °ú {4, 5} °¡ º´ÇյǾî cluster´Â ´ÙÀ½°ú °°´Ù.
{1, 2}, {3, 4, 5}
À§ÀÇ cluster ´Â single linkage algorithm ÀÇ µ¿µîÇÑ À§Ä¡¿¡¼ÀÇ ±¸ÇØÁø cluster ¿Í´Â ´Ù¸¥ °ÍÀ» ¾Ë ¼ö ÀÖ´Ù. ´ÙÀ½ °úÁ¤À¸·Î ³²¾ÆÀÖ´Â cluster µéÀÌ º´ÇյǾî hierarchical clustering ÀÌ ¿Ï¼ºµÇ¸ç ±× tree Àº ´ÙÀ½°ú °°´Ù.
single linkage algorithm Àº tree »ó¿¡¼ cluster µéÀÌ ±æ°í °¡´Â(long and thin) ¸ð¾çÀ» º¸ÀÏ ¼ö ÀÖ´Ù. complete linkage algorithm ¿¡¼´Â º¸´Ù compact ÇÑ ¸ð¾çÀ» º¸ÀδÙ. µÎ clustering ¹æ¹ý ¸ðµÎ ºñÁ¤»óÀûÀÎ °üÂû¿¡ ¹Î°¨ÇÏ¿© º¯ÇüµÉ ¼ö ÀÖ´Ù. µÎ ¾Ë°í¸®ÁòÀÇ ±Ø´ÜÀ» ÀýÃæÇϱâ À§ÇÑ ¹æ¹ýÀÌ average linkage algorithm ÀÌ´Ù.
average linkage algorithm Àº ´Ù¸¥ À̸§À¸·Î´Â
unweighted pair group method using arithmetic average (UPGMA) ¶ó°íµµ ºÒ¸®¿ì¸ç
°¡Àå ³Î¸® »ç¿ëµÇ´Â hierarchical clustering algorithm Áß ÇϳªÀÌ´Ù. ¼·Î
´Ù¸¥ cluster ¿¡ ¼ÓÇÑ µÎ Á¡ »çÀÌÀÇ Æò±Õ °Å¸®¸¦ µÎ cluster °£ÀÇ °Å¸®·Î Á¤ÀÇ ÇÔÀ¸·Î½á
average linkage algorithm ÀÌ ¼öÇàµÈ´Ù. ¸¸ÀÏ cluster °¡
°³ÀÇ ¸â¹ö°¡ ÀÖ°í, cluster
°¡
°³ÀÇ ¸â¹ö¸¦ °¡Áú °æ¿ì µÎ cluster °£ÀÇ °Å¸®´Â ´ÙÀ½°ú °°´Ù.
Example 5.3 average linkage algorithmÀ» »ç¿ëÇÑ hierarchical clustering
´ÙÀ½ ±×¸²°ú °°ÀÌ data °¡ ºÐÆ÷ÇÑ´Ù°í ÇÏÀÚ. sample °£ÀÇ °Å¸® ÃøÁ¤À» À§ÇØ Euclidean distance¸¦ »ç¿ëÇ϶ó
´ÙÀ½ Ç¥´Â °¢ sample ÀÇ feature °ª°ú sample ½Öµé
°£ÀÇ °Å¸® ¸¦ º¸¿©ÁØ´Ù.
|
|
|
|
|
1 |
2 |
3 |
4 |
5 |
1 2 3 4 5 |
4 8 15 24 24 |
4 4 8 4 12 |
|
1 2 3 4 5 |
- 4.0 11.7 20.0 21.5 |
4.0 - 8.1 16.0 17.9 |
11.7 8.1 - 9.8 9.8 |
20.0 16.0 9.8 - 8.0 |
21.5 17.9 9.8 8.0 - |
ÇϳªÀÇ sampleÀ» Æ÷ÇÔÇÏ´Â ´Ù¼¸ °³ÀÇ cluster ·Î ½ÃÀÛÇÑ´Ù. °¡Àå °¡±îÀÌ ÀÖ´Â cluster {1} ¿Í {2} ´Â º´ÇÕµÇ¾î »õ·Î¿î cluster¸¦ ¸¸µç´Ù.
{1, 2}, {3}, {4}, {5}
ÀÌ cluster °£ÀÇ °Å¸®¸¦ ±¸ÇÑ Çà·ÄÀÌ ´ÙÀ½°ú °°´Ù.
|
{1, 2} |
3 |
4 |
5 |
{1, 2} 3 4 5 |
- 9.9 18 19.7 |
9.9 - 9.8 9.8 |
18.0 9.8 - 8.0 |
19.7 9.8 8.0 - |
Çà {1, 2} °ú ¿ 3 À§Ä¡ÀÇ °ªÀº 9.9 À̸ç À̰ÍÀº cluster {1, 2} ¿Í {3} »çÀÌÀÇ °Å¸®¸¦ ÀǹÌÇÏ¸ç ±×°ÍÀº ´ÙÀ½°ú °°ÀÌ ±¸ÇØÁø´Ù.
¿ø·¡ ÁÖ¾îÁø data¿¡¼ ¿Í
ÀÇ °ªÀÌ ÁÖ¾îÁö¸ç average linkage ¾Ë°í¸®Áò¿¡¼´Â Æò±Õ°ªÀÎ
9.9¸¦ cluster °£ÀÇ °Å¸®·Î ¼±ÅÃÇÑ´Ù. ù ¹øÂ° ÇàÀÇ ´Ù¸¥ °ªµéµµ °°Àº ¹æ¹ýÀ¸·Î
±¸ÇØÁø´Ù. ù° Çà°ú «R° ¿ ÀÌ¿ÜÀÇ °ªÀº ±×·¡µµ ½Â°èÇÑ´Ù. ±× ¶§ Çà·Ä¿¡¼ÀÇ ÃÖ¼Ò°ªÀº
8 ÀÌ¸ç µû¶ó¼ cluster {4} ¿Í {5} °¡ º´ÇÕÇÑ´Ù. À̶§ÀÇ cluster ´Â ´ÙÀ½°ú °°´Ù.
{1, 2}, {3}, {4, 5}
ÀÌ cluster °£ÀÇ °Å¸®´Â ´ÙÀ½ Çà·Ä°ú °°ÀÌ ±¸ÇØÁø´Ù.
|
{1, 2} |
3 |
{4, 5} |
{1, 2} 3 {4, 5} |
- 9.9 18.9 |
9.9 - 9.8 |
18.9 9.8 - |
À§ Çà·ÄÀÇ ÃÖ¼Ò°ªÀº 9.8 À̱⠶§¹®¿¡ cluster {3} °ú {4, 5} °¡ º´ÇյǾî cluster´Â ´ÙÀ½°ú °°´Ù.
{1, 2}, {3, 4, 5}
´ÙÀ½ °úÁ¤À¸·Î ³²¾ÆÀÖ´Â cluster µéÀÌ º´ÇյǾî hierarchical clustering ÀÌ ¿Ï¼ºµÈ´Ù.
´Ù¸¥ À̸§À¸·Î´Â minimum variance method ¶ó°íµµ ÇÑ´Ù. ´Ù¸¥ ¾Ë°í¸®Áò ó·³ °¢ sampleÀ» À§ÇÑ ÇϳªÀÇ cluster ·Î½á ½ÃÀÛÇÑ´Ù. ¸ðµç cluster ½Ö »çÀÌ¿¡¼ ¹Ýº¹À» ÅëÇØ °¡Àå ÀÛÀº squared error¸¦ °¡Áö´Â ½ÖÀ» º´ÇÕÇÏ¿© »õ·Î¿î cluster¸¦ ¸¸µç´Ù. °¢ cluster¸¦ À§ÇÑ squared error´Â ´ÙÀ½°ú °°ÀÌ Á¤ÀǵȴÙ.
ÇϳªÀÇ cluster °¡ °³ÀÇ sample (
, ¿©±â¼
´Â feature vector
) À» Æ÷ÇÔÇÑ´Ù¸é, sample
ÀÇ squared error (Æò±Õ¿¡¼ÀÇ squared Euclidean distance)´Â ´ÙÀ½°ú °°´Ù.
¿©±â¼ ´Â cluster¿¡¼ sampleµéÀ» À§ÇÑ feature
ÀÇ Æò±Õ°ªÀÌ´Ù.
Àüü cluster ¿¡ ´ëÇÑ squared error ´Â sample µéÀÇ squared error µéÀÇ ÇÕÀÌ´Ù.
°¢ feature ÀÇ Æò±Õ°ªÀ¸·Î ±¸¼ºµÈ vector Àº ±× cluster ÀÇ mean vector ¶Ç´Â centroid ¶ó°í ºÒ¸®¿î´Ù. ÇϳªÀÇ cluster¸¦
À§ÇÑ squared error ´Â °¢ feature¿¡¼ cluster ¸â¹ö·ÎºÎÅÍ ±×µéÀÇ Æò±Õ°ª±îÁöÀÇ
squared distance ÀÇ ÇÕÀÌ´Ù. squared error´Â cluster ÀÇ total variance
¿¡´Ù°¡ cluster ÀÇ sampleÀÇ ¼ö
À» °öÇÑ °Í°ú °°´Ù. ¿©±â¼ total variance´Â °¢ feature ÀÇ variance ÀÇ ÇÕÀ¸·Î½á
·Î Á¤ÀǵȴÙ. ÀÏ·ÃÀÇ clusterµéÀ» À§ÇÑ squared error ´Â °¢ clusterµéÀ» À§ÇÑ
squared error µéÀÇ ÇÕÀ¸·Î Á¤ÀǵȴÙ.
Example 5.4 Ward's method¸¦ »ç¿ëÇÑ hierarchical clustering
´ÙÀ½ ±×¸²°ú °°ÀÌ data °¡ ºÐÆ÷ÇÑ´Ù°í ÇÏÀÚ.
´ÙÀ½ Ç¥´Â °¢ sample ÀÇ feature °ª°ú sample ½Öµé
°£ÀÇ °Å¸® ¸¦ º¸¿©ÁØ´Ù.
|
|
|
|
|
1 |
2 |
3 |
4 |
5 |
1 2 3 4 5 |
4 8 15 24 24 |
4 4 8 4 12 |
|
1 2 3 4 5 |
- 4.0 11.7 20.0 21.5 |
4.0 - 8.1 16.0 17.9 |
11.7 8.1 - 9.8 9.8 |
20.0 16.0 9.8 - 8.0 |
21.5 17.9 9.8 8.0 - |
À̶§ squared error ´Â zero ÀÌ´Ù. cluster ½ÖÀ» º´ÇÕÇÏ´Â ¹æ¹ýÀº 10 °¡Áö°¡ ÀÖ´Ù. cluster {1} °ú {2} º´ÇÕ, {1} °ú {3} º´ÇÕµîµî.
´ÙÀ½ Ç¥´Â °¡´ÉÇÑ ¹æ¹ýÀÇ squared error¸¦ º¸¿©ÁØ´Ù. ¿¹¸¦µé¸é cluster {1} ¿Í {2}¸¦ º´ÇÕÇÑ´Ù°í ÇÏÀÚ. sample 1 ÀÇ feature vector ´Â (4,4) À̰í sample 2 ´Â (8,4) ÀÌ¸ç µû¶ó¼ feature Æò±ÕÀº 6 °ú 4 ÀÌ´Ù. cluster {1, 2}¸¦ À§ÇÑ squared error ´Â ´ÙÀ½°ú °°ÀÌ ±¸ÇØÁø´Ù.
(4 - 6)2 + (8 - 6)2 + (4 - 4)2 + (4 - 4)2 = 8
´Ù¸¥ °¢°¢ÀÇ cluster {3}, {4}, {5} ÀÇ squared error ´Â 0 ÀÌ´Ù. µû¶ó¼ cluster {1, 2}, {3}, {4}, {5} ÀÇ total squared error ´Â ´ÙÀ½°ú °°´Ù.
8 + 0 + 0 + 0 = 8
´ÙÀ½ Ç¥¿¡¼ °¡Àå ÀÛÀº squared error ´Â 8 À̱⠶§¹®¿¡ cluster {1} ¿Í {2} ´Â º´ÇյǾî 4 °³ÀÇ cluster {1, 2}, {3}, {4}, {5}¸¦ ¸¸µç´Ù.
Clusters |
Squared |
{1, 2}, {3}, {4}, {5} {1, 3}, {2}, {4}, {5} {1, 4}, {2}, {3}, {5} {1, 5}, {2}, {3}, {4} {2, 3}, {1}, {4}, {5} {2, 4}, {1}, {3}, {5} {2, 5}, {1}, {3}, {5} {3, 4}, {1}, {2}, {5} {3, 5}, {1}, {2}, {4} {4, 5}, {1}, {2}, {3} |
8.0 68.5 200.0 232.0 32.5 128.0 160.0 48.5 48.5 32.0 |
4 °³ÀÇ cluster¸¦ ¸¸µé °æ¿ìÀÇ °¢°¢ÀÇ squared error
´ÙÀ½ Ç¥¿¡¼´Â {1, 2}, {3}, {4}, {5} Áß¿¡¼ 2 °³¸¦ º´ÇÕÇÏ´Â °¡´ÉÇÑ °æ¿ìÀÇ squared error¸¦ º¸¿©ÁØ´Ù. ´ÙÀ½Ç¥¿¡¼ °¡Àå ÀÛÀº squared error ´Â 40 À̱⠶§¹®¿¡ cluster {4} ¿Í {5} °¡ º´ÇյǾî 3 °³ÀÇ cluster¸¦ ¸¸µç´Ù.
{1, 2}, {3}, {4, 5}
Clusters |
Squared |
{1, 2, 3}, {4}, {5} {1, 2, 4}, {3}, {5} {1, 2, 5}, {3}, {4} {1, 2}, {3, 4}, {5} {1, 2}, {3, 5}, {4} {1, 2}, {4, 5}, {3} |
72.7 224.0 266.7 56.5 56.5 40.0 |
3 °³ÀÇ cluster¸¦ ¸¸µé °æ¿ìÀÇ °¢°¢ÀÇ squared error
´ÙÀ½ Ç¥¿¡¼´Â {1, 2}, {3}, {4, 5} Áß¿¡¼ 2 °³¸¦ º´ÇÕÇÏ´Â °¡´ÉÇÑ °æ¿ìÀÇ squared error¸¦ º¸¿©ÁØ´Ù. ´ÙÀ½Ç¥¿¡¼ °¡Àå ÀÛÀº squared error ´Â 94 À̱⠶§¹®¿¡ cluster {3} °ú {4, 5} °¡ º´ÇյǾî 2 °³ÀÇ cluster¸¦ ¸¸µç´Ù.
{1, 2}, {3, 4, 5}
Clusters |
Squared |
{1, 2, 3}, {4, 5} {1, 2, 4, 5}, {3} {1, 2}, {3, 4, 5} |
104.7 380.0 94.0 |
2 °³ÀÇ cluster¸¦ ¸¸µé °æ¿ìÀÇ °¢°¢ÀÇ squared error
´ÙÀ½À¸·Î ³²Àº 2 °³ÀÇ cluster °¡ º´ÇյǾî hierarchical clustering ÀÌ ¿Ï¼ºµÈ´Ù. ´ÙÀ½ ±×¸²Àº ¿Ï¼ºµÈ tree¸¦ º¸¿©ÁØ´Ù.
Ward's ¹æ¹ý¿¡ ÀÇÇÑ Æ®¸®
cluster ÀÇ °èÃþÀ» °í·ÁÇÏÁö ¾Ê°í Æò¸éÀûÀ¸·Î clustering ÇÏ´Â ¹æ¹ýÀ¸·Î ÀϹÝÀûÀ¸·Î ¹Ì¸® ¸î °³ÀÇ cluster ·Î ³ª´©¾î Áú °ÍÀ̶ó°í ¿¹»óÇϰí cluster ÀÇ °³¼ö¸¦ Á¤ÇÏ´Â °ÍÀÌ´Ù.
°¡Àå °£´ÜÇÑ partitional clustering algorithm
ÁßÀÇ Çϳª°¡ Forgy's algorithm ÀÌ´Ù. data À̿ܿ¡ cluster ÀÇ ¼ö ¸¦ input À¸·Î Çϸç À̶§
¸¦ seed point ¶ó°í ÇÑ´Ù. seed point ´Â ÀÓÀÇ·Î ¼±ÅÃµÇ¸ç ¹Ù¶÷Á÷ÇÑ cluster
±¸Á¶¿¡ °üÇÑ ¾î¶² Áö½ÄµéÀÌ seed point¸¦ ¼±ÅÃÇϴµ¥¿¡ »ç¿ëµÉ ¼ö ÀÖ´Ù. ±× °úÁ¤Àº
´ÙÀ½°ú °°´Ù.
1. ÀÓÀÇÀÇ °¹¼öÀÇ seed point¸¦ cluster centroid ·Î¼ ÃʱâÈ ÇÑ´Ù.
2. °¢ sample ¿¡ ´ëÇØ °¡Àå °¡±îÀÌ ÀÖ´Â cluster centroid¸¦ ã¾Æ¼ ÇØ´ç cluster ¿¡ sampleÀ» ¹èÁ¤ÇÑ´Ù.
3. ¸¸ÀÏ step 2¿¡¼ sample ÀÌ cluster¸¦ º¯È½ÃŰÁö ¸øÇϸé Á¾·áÇÑ´Ù.
4. º¯ÈµÈ cluster µé¿¡ ´ëÇÑ centroid¸¦ ´Ù½Ã °è»êÇØ¼ ´Ù½Ã step 2 ·Î °£´Ù.
Example 5.5 Forgy's algorithmÀ» »ç¿ëÇÑ partitional clustering
´ÙÀ½ ±×¸²°ú °°ÀÌ data °¡ ºÐÆ÷ÇÑ´Ù°í ÇÏÀÚ.
´ÙÀ½ Ç¥´Â °¢ sample ÀÇ feature ,
°ª°ú sample ½Öµé
°£ÀÇ °Å¸®
¸¦ º¸¿©ÁØ´Ù.
|
|
|
|
|
1 |
2 |
3 |
4 |
5 |
1 2 3 4 5 |
4 8 15 24 24 |
4 4 8 4 12 |
|
1 2 3 4 5 |
- 4.0 11.7 20.0 21.5 |
4.0 - 8.1 16.0 17.9 |
11.7 8.1 - 9.8 9.8 |
20.0 16.0 9.8 - 8.0 |
21.5 17.9 9.8 8.0 - |
step 1, óÀ½¿¡ 2 °³ÀÇ cluster¸¦ ·Î µÎ°í seed point ·Î¼ °¢ cluster ´Â ÃÖÃÊÀÇ µÎ sample (4,4), (8,4)¸¦ »ç¿ëÇÑ´Ù.
Forgy's algorithm¿¡¼´Â °è»ê¿¡¼ÀÇ ÆíÀǸ¦ À§ÇØ sampleµéÀÇ ¼öº¸´Ù´Â feature vector
·Î¼ sample µéÀ» Ç¥±âÇÑ´Ù.
step 2, °¢ sample¿¡¼ °¡Àå °¡±îÀÌ ÀÖ´Â cluster centroid¸¦ ã´Â´Ù. ´ÙÀ½ Ç¥¿¡¼´Â ±× °á°ú¸¦ º¸¿©ÁØ´Ù. cluster 2 °³ {(4, 4)} ¿Í {(8, 4), (15, 8), (24, 4), (24, 12)} °¡ ¸¸µé¾î Á³´Ù.
step 4, cluster µéÀ» À§ÇÑ centroid¸¦ °è»êÇÑ´Ù. ù ¹øÂ° cluster ÀÇ centroid ´Â (4,4) ÀÌ¸ç µÎ ¹øÂ°´Â (17.75, 7) ÀÌ¸ç ±×°ÍÀº ´ÙÀ½°ú °°ÀÌ °è»êµÇ¾ú´Ù.
(8 + 15 + 24 + 24) / 4 = 17.75
(4 + 8 + 4 + 12) / 4 = 7
sampleµéÀÌ cluster µéÀ» º¯È½ÃÄ×±â (óÀ½¿¡´Â cluster °¡ ¾ø¾úÀ¸´Ï±î) ¶§¹®¿¡ step 2 ·Î µ¹¾Æ°£´Ù.
Sample |
Nearest |
(4, 4) (8, 4) (15, 8) (24, 4) (24, 12) |
(4, 4) (8, 4) (8, 4) (8, 4) (8, 4) |
Forgy's algorithm ÀÇ Ã¹ ¹øÂ° ¹Ýº¹
´ÙÀ½ Ç¥¿¡¼´Â °¢ sample µé¿¡ °¡Àå °¡±î¿î cluster centroid¸¦ ã´Â´Ù. cluster {(4, 4), (8, 4)} ¿Í {(15, 8), (24, 4), (24, 12)} ÀÌ ¸¸µé¾î Á³´Ù.
step 4, cluster ÀÇ centroid (6, 4) ¿Í (21, 8) ÀÌ °è»êµÈ´Ù. sample (8, 4) °¡ cluster µéÀ» º¯È½ÃÄ×°í µû¶ó¼ step 2 ·Î °£´Ù.
Sample |
Nearest |
(4, 4) (8, 4) (15, 8) (24, 4) (24, 12) |
(4, 4) (4, 4) (17.75, 7) (17.75, 7) (17.75, 7) |
Forgy's algorithm ÀÇ µÎ ¹øÂ° ¹Ýº¹
´ÙÀ½ Ç¥¿¡¼´Â °¢ sample µé¿¡ °¡Àå °¡±î¿î cluster centroid¸¦ ã´Â´Ù. cluster {(4, 4), (8, 4)} ¿Í {(15, 8), (24, 4), (24, 12)} ÀÌ ¸¸µé¾î Á³´Ù.
step 4, cluster ÀÇ centroid (6, 4) ¿Í (21, 8) ÀÌ °è»êµÈ´Ù. ¾î¶² sample µµ cluster µéÀ» º¯È½ÃŰÁö ¾Ê¾Ò±â ¶§¹®¿¡ ¾Ë°í¸®ÁòÀº Á¾·áµÈ´Ù.
Sample |
Nearest |
(4, 4) (8, 4) (15, 8) (24, 4) (24, 12) |
(6, 4) (6, 4) (21, 8) (21, 8) (21, 8) |
Forgy's algorithm ÀÇ ¼¼ ¹øÂ° ¹Ýº¹
À§ ¿¹Á¦¿¡¼´Â seed point¸¦ óÀ½ÀÇ
µÎ sampleµé·Î ÀÓÀÇ·Î ¼±ÅõǾú´Ù. ±×·¯³ª ´Ù¸¥ °¡´É¼ºÀÌ Á¦¾ÈµÉ ¼ö ÀÖ´Ù. ÇϳªÀÇ
°¡´É¼ºÀº hierarchical clustering algorithm ÁßÀÇ Çϳª·Î ¸¸µé¾îÁø cluster ·Î ½ÃÀÛÇÏ¿© ÃÖÃÊÀÇ seed point ·Î¼ ±× centroid¸¦ »ç¿ëÇÏ´Â °ÍÀÌ´Ù.
Forgy's algorithm °ú À¯»çÇÑ ¹æ¹ýÀÌ´Ù. data À̿ܿ¡ cluster ÀÇ ¼ö ¸¦ input À¸·Î Çϸç À̶§
¸¦ seed point ¶ó°í ÇÑ´Ù. Forgy' algorithm °ú ´Ù¸¥Á¡Àº ÇϳªÀÇ
sample ÀÌ ÇϳªÀÇ cluster ¿¡ ÇÕ·ùÇÏÀÚ¸¶ÀÚ °ð cluster ÀÇ centroid °¡ ´Ù½Ã °è»êµÈ´Ù´Â
°ÍÀÌ´Ù. ¶ÇÇÑ Forgy' algorithm ÀÌ ¹Ýº¹Àû(iterative) ÇÑ ¹Ý¸é¿¡
-means algorithm Àº data set¿¡¼ ´ÜÁö µÎ ¹ø¸¸ÀÇ pass °¡ ÀÌ·ç¾îÁø´Ù. ±× °úÁ¤Àº
´ÙÀ½°ú °°´Ù.
1. óÀ½¿¡ cluster ·Î¼ ½ÃÀÛÇÑ´Ù. ³²¾ÆÀÖ´Â
sampleµé¿¡ ´ëÇØ¼´Â °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ ã´Â´Ù. À̰Ϳ¡ °¡Àå
°¡±îÀÌ ÀÖ´Â centroid¸¦ °¡Áö´Â °ÍÀÌ È®ÀÎµÈ cluster ¿¡ sampleÀ» Æ÷ÇÔ½ÃŲ´Ù. °¢°¢ÀÇ
sample µéÀÌ ÇÒ´çµÈ ÈÄ¿¡ ÇÒ´çµÈ cluster ÀÇ centroid °¡ ´Ù½Ã °è»êµÈ´Ù.
2. ±× data¸¦ µÎ ¹ø ó¸®ÇÑ´Ù. °¢ sample¿¡ ´ëÇÏ¿© °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ ã´Â´Ù. °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ °¡Áø °ÍÀ¸·Î È®ÀÎµÈ cluster ¿¡ sampleÀ» À§Ä¡½ÃŲ´Ù. (ÀÌ step ¿¡¼´Â ¾î¶² centroid µµ ´Ù½Ã °è»êÇÏÁö ¾Ê´Â´Ù.)
Example 5.6
-means algorithmÀ» »ç¿ëÇÑ partitional clustering
´ÙÀ½ ±×¸²°ú °°ÀÌ data °¡ ºÐÆ÷ÇÑ´Ù°í ÇÏÀÚ.
´ÙÀ½ Ç¥´Â °¢ sample ÀÇ feature ,
°ª°ú sample ½Öµé
°£ÀÇ °Å¸®
¸¦ º¸¿©ÁØ´Ù.
|
|
|
|
|
1 |
2 |
3 |
4 |
5 |
1 2 3 4 5 |
4 8 15 24 24 |
4 4 8 4 12 |
|
1 2 3 4 5 |
- 4.0 11.7 20.0 21.5 |
4.0 - 8.1 16.0 17.9 |
11.7 8.1 - 9.8 9.8 |
20.0 16.0 9.8 - 8.0 |
21.5 17.9 9.8 8.0 - |
óÀ½¿¡ 2 °³ÀÇ cluster¸¦ ·Î µÎ°í óÀ½ÀÇ µÎ sample µéÀº (8,4) ¿Í (24,4) ·Î ÇÑ´Ù.
step 1
µÎ °³ÀÇ cluster {(8, 4} ¿Í {(24, 4)} ·Î ½ÃÀÛÇÏ¸ç ±×°ÍÀº (8, 4) ¿Í (24, 4) ¿¡ centroid¸¦ °®´Â´Ù. ³ª¸ÓÁö 3 °³ÀÇ sample °¢°¢¿¡ ´ëÇØ¼´Â °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ ã°í ±× cluster ¿¡ sampleÀ» µÎ°í cluster ÀÇ centroid¸¦ ´Ù½Ã °è»êÇÑ´Ù.
´ÙÀ½ sample (15, 8) Àº centroid (8, 4) ¿¡ °¡Àå °¡±õ°í µû¶ó¼ cluster {(8, 4)} ¿¡ ÇÕ·ùÇÑ´Ù. À̶§¿¡ cluster µéÀº {(8, 4), (15, 8)} ¿Í {(24, 4)} ÀÌ´Ù. ù ¹øÂ° cluster ÀÇ centroid ´Â (11.5, 6) ·Î ¹Ù²î´Âµ¥ ±×°ÍÀº ´ÙÀ½°ú °°ÀÌ °è»êµÈ °ÍÀÌ´Ù.
(8 + 15) / 2 = 11.5, (4 + 8) / 2 = 6
´ÙÀ½ sample (4, 4) Àº centroid (11.5, 6) ¿¡ °¡Àå °¡±õ°í µû¶ó¼ cluster {(8, 4), (15, 8)} ¿¡ ÇÕ·ùÇÑ´Ù. À̶§¿¡ cluster µéÀº {(8, 4), (15, 8), (4, 4)} ¿Í {(24, 4)} ÀÌ´Ù. ù ¹øÂ° cluster ÀÇ centroid ´Â (9, 5.3) ·Î ¹Ù²ï´Ù.
´ÙÀ½ sample (24, 12) Àº centroid (24, 4) ¿¡ °¡Àå °¡±õ°í µû¶ó¼ cluster {(24, 4)} ¿¡ ÇÕ·ùÇÑ´Ù. À̶§¿¡ cluster µéÀº {(8, 4), (15, 8), (4, 4)} ¿Í {(24, 12), (24, 4)} ÀÌ´Ù. À̶§ µÎ ¹øÂ° cluster ÀÇ centroid ´Â (24, 8) ·Î ¹Ù²ï´Ù. À̶§¿¡ ¾Ë°í¸®ÁòÀÇ step 1 Àº ¿Ï¼ºµÈ´Ù.
step 2
sampleµéÀ» Çϳª¾¿ °Ë»çÇÏ°í °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ °¡Áö´Â °ÍÀ¸·Î È®ÀÎµÈ cluster ¿¡ °¢ sampleÀ» À§Ä¡½ÃŲ´Ù. ´ÙÀ½ Ç¥¿¡¼ º¸¿©Áö´Â °Íó·³ sample µéÀÌ cluster µéÀ» º¯È½ÃŰÁö ¸øÇÒ °æ¿ì¿¡ ÃÖÁ¾ÀûÀ¸·Î ´ÙÀ½ÀÇ cluster µé·Î ºÐ·ùµÈ´Ù.
{(8, 4), (15, 8), (4, 4)} ¿Í {(24, 12), (24, 4)}
Sample |
Distance to |
Distance to |
(8, 4) (24, 4) (15, 8) (4, 4) (24, 12) |
1.6 15.1 6.6 6.6 16.4 |
16.5 4.0 9.0 40.4 4.0 |
-means algorithm ÀÇ step 2¿¡¼ »ç¿ëÇϱâ À§ÇÑ distance
-means algorithm
(8, (24, 12)
{(4, 4), (8, 4), (15, 8)}, {(24, 4), (24, 12)}
Isodata Algorithm Àº Forgy's algorithm °ú -means algorithm À» º¸°ÇÑ ¹æ¹ýÀ¸·Î »ý°¢ÇÒ ¼ö ÀÖ´Ù. °°Àº Á¡Àº °¡Àå °¡±îÀÌ
ÀÖ´Â centroid ¿¡ sampleµéÀ» ÇÒ´çÇÏ¿© squared error¸¦ ÃÖ¼ÒÈ ½ÃŲ´Ù´Â °ÍÀÌ´Ù.
´Ù¸¥Á¡Àº °íÁ¤µÈ ¼öÀÇ cluster µéÀ» ó¸®ÇÏ´Â °ÍÀÌ ¾Æ´Ï¶ó, »ç¿ëÀÚ¿¡
ÀÇÇØ ¿ä±¸µÇ´Â cluster ÀÇ ¼ö¸¦ Æ÷ÇÔÇÏ´Â ¹üÀ§±îÁö Çã¿ëµÇ´Â
°³ÀÇ cluster¸¦ ´Ù·é´Ù´Â °ÍÀÌ´Ù. ¸¸ÀÏ cluster µéÀÇ ¼ö°¡ ³Ê¹« ¸¹¾ÆÁö°Å³ª
cluster µéÀÌ ³Ê¹« °¡±îÀÌ ÀÖ°Ô µÇ¸é cluster ´Â º´ÇյȴÙ.
¸¸ÀÏ cluster µéÀÇ ¼ö°¡ ³Ê¹« Àû°Å³ª cluster °¡ ¾ÆÁÖ ´Ù¸¥ Á¾·ùÀÇ sample µéÀ» Æ÷ÇÔÇϰí
ÀÖ´Ù¸é cluster ´Â ºÐ¸®µÈ´Ù. ÀÚ¼¼ÇÑ °ÍÀº ´ÙÀ½¿¡ ¼¼úÇÑ´Ù.
isodata algorithm ÀÇ °æ¿ì data ¿Í seed point À̿ܿ¡ ´ÙÀ½ÀÇ parameter µéÀÌ ÇÊ¿äÇÏ´Ù.
no_clusters : cluster ÀÇ ¹Ù¶÷Á÷ÇÑ ¼ö·Î¼ seed point ÀÇ ¼ö¿Í °°´Ù.
min_elements : °¢ cluster ¸¶´Ù Çã¿ëµÇ´Â sample µéÀÇ ÃÖ¼Ò °¹¼ö.
min_dist : º´ÇÕÀÌ ÀϾÁö ¾Ê´Â, cluster centroid »çÀÌ¿¡ Çã¿ëµÇ´Â ÃÖ¼Ò °Å¸®
split_size : cluster ÀÇ ºÐ¸®¸¦ Á¶ÀýÇÏ´Â parameter
iter_start : ¾Ë°í¸®ÁòÀÇ first part¿¡¼ ¹Ýº¹(iteration) ÀÇ ÃÖ´ë¼ö
max_merge : °¢ ¹Ýº¹(iteration) ¿¡¼ÀÇ cluster º´ÇÕÀÇ ÃÖ´ë¼ö
iter_body : ¾Ë°í¸®ÁòÀÇ main part ³»¿¡¼ ¹Ýº¹ÀÇ ÃÖ´ë¼ö
ÀÌ·¯ÇÑ parameter µéÀº ¾Ë°í¸®Áò °úÁ¤¿¡¼ ÀÚ¼¼È÷ ¼³¸íµÈ´Ù.
Isodata Algorithm ÀÇ °úÁ¤Àº ´ÙÀ½°ú °°´Ù.
1. ÀÓÀÇÀÇ °¹¼öÀÇ seed point¸¦ cluster centroid ·Î¼ ÃʱâÈ ÇÑ´Ù.(step 1¿¡¼ 4 ±îÁö´Â Forgy's algorithm °ú °°´Ù)
2. °¢ sample ¿¡ ´ëÇØ °¡Àå °¡±îÀÌ ÀÖ´Â cluster centroid¸¦ ã¾Æ¼ ÇØ´ç cluster ¿¡ sampleÀ» ¹èÁ¤ÇÑ´Ù.
3. º¯ÈµÈ cluster ÀÇ centroid¸¦ °è»êÇÑ´Ù..
4. ¸¸ÀÏ Àû¾îµµ ÇϳªÀÇ sample ÀÌ cluster¸¦ º¯È½ÃŰ°í ¹Ýº¹µÇ´Â ¼ö°¡ iter_start º¸´Ù ÀÛ´Ù¸é, step 2 ·Î °£´Ù.
5. min_elements º¸´Ù ÀûÀº¼öÀÇ sampleÀ» °¡Áø cluster ´Â Æó±âÇÑ´Ù. ¶ÇÇÑ Æ÷ÇÔµÈ sample µéµµ Æó±âÇÑ´Ù.
6. ¸¸ÀÏ cluster ÀÇ ¼ö°¡ 2 * no_clusters º¸´Ù Å©°Å³ª °°°í, ¶Ç´Â ¹Ýº¹µÇ´Â ¼ö°¡ ¦¼ö(even)À̶ó¸é step 7 (º´ÇÕ µ¿ÀÛ)À» ½ÇÇàÇÏ°í ±×·¸Áö ¾ÊÀ¸¸é step 8 ·Î °£´Ù.
7. ¸¸ÀÏ µÎ centroid °£ÀÇ °Å¸®°¡ min_dist º¸´Ù ÀÛ´Ù¸é µÎ cluster¸¦ º´ÇÕÇϰí centroid¸¦ º¯È½ÃŲ´Ù. ±×·¸Áö ¾ÊÀ¸¸é step 7 ·Î °£´Ù. ÀÌ·¯ÇÑ stepÀ» max_merge ¹øÀ» ¹Ýº¹Çϰí step 8 ·Î °£´Ù.
8. ¸¸ÀÏ cluster ÀÇ ¼ö°¡ no_clusters / 2 º¸´Ù À۰ųª °°°í, ¶Ç´Â ¹Ýº¹ÀÇ ¼ö°¡ Ȧ¼ö(odd) ¶ó¸é step 9 (ºÐ¸® µ¿ÀÛ)À» ¼öÇàÇÏ°í ±×·¸Áö ¾ÊÀ¸¸é step 10 À¸·Î °£´Ù.
9. split_size * ¸¦ ÃʰúÇϴ ǥÁØÆíÂ÷¸¦ °¡Áö´Â cluster¸¦ ã´Â´Ù (¿©±â¼
´Â ¾î¶² º¯¼öÀ̰í
´Â ¿ø·¡ÀÇ sample ÁýÇÕ¿¡¼
ÀÇ Ç¥ÁØÆíÂ÷ÀÌ´Ù). ¸¸ÀÏ ¾øÀ¸¸é step 10 À¸·Î °£´Ù. cluster ³»¿¡¼
ÀÇ Æò±ÕÀ» °è»êÇÑ´Ù. ÀÌ cluster ¿¡ ÀÖ´Â sample µéÀ» µÎ ÁýÇÕÀ¸·Î ³ª´«´Ù(
°¡ Æò±Õº¸´Ù Å©°Å³ª °°Àº ÁýÇÕ°ú
°¡ Æò±Õº¸´Ù ÀÛÀº ÁýÇÕ). ÀÌ·¯ÇÑ µÎ cluster ÀÇ centroid¸¦ °è»êÇÑ´Ù. ¸¸ÀÏ ÀÌ·¯ÇÑ
centroid »çÀÌÀÇ °Å¸®°¡ 1.1 * min_dist º¸´Ù Å©°Å³ª °°À¸¸é ¿ø·¡ÀÇ cluster¸¦ ÀÌ·¯ÇÑ
µÎ °³ÀÇ cluster ·Î ¹Ù²Ù°í, ±×·¸Áö ¾ÊÀ¸¸é cluster¸¦ ºÐ¸®ÇÏÁö ¾Ê´Â´Ù.
10. ¸¸ÀÏ step 10 ÀÌ iter_body Ƚ¼ö¸¸Å ¼öÇàµÇ°Å³ª, ¸¶Áö¸· step 10 ÀÌ ¼öÇàµÈ ÀÌÈÄ cluster ¿¡ ¾Æ¹«·± º¯Èµµ ¹ß»ýÇÏÁö ¾ÊÀ¸¸é Áß´ÜÇÑ´Ù. ±×·¸Áö ¾ÊÀ¸¸é »õ·Î¿î seed point ·Î¼ cluster ÀÇ centroid¸¦ ÃëÇϰí step 2 ·Î °£´Ù.
Example 5.7 isodata algorithmÀ» »ç¿ëÇÑ partitional clustering
´ÙÀ½°ú °°Àº data ¿Í parameter µéÀÌ ÀÖÀ» °æ¿ì data¸¦ cluster ÇϱâÀ§ÇØ isodata algorithmÀ» »ç¿ëÇØ º¸ÀÚ.
Number |
x |
y |
Number |
x |
y |
1 2 3 4 5 6 7 |
0.0 0.0 0.0 0.5 0.5 1.0 1.0 |
0.0 1.0 3.0 0.5 3.5 1.0 3.0 |
8 9 10 11 12 13 14 |
6.0 6.0 6.0 6.0 6.2 6.2 8.0 |
0.75 1.00 2.00 2.10 0.80 2.05 12.0 |
no_clusters = |
3 |
seed point ´Â sample 1, 3, 13 À̸ç, óÀ½ÀÇ ¹Ýº¹ Ƚ¼ö´Â 0 ÀÌ´Ù.
¾Ë°í¸®ÁòÀÇ Forgy part (step 1 ~4) ÀÇ ¼ö·ÅÀ» À§ÇØ data ÀÇ ÇѹøÀÇ ¼öÇàÀÌ ÇÊ¿äÇÏ´Ù. À̶§¿¡´Â ´ÙÀ½°ú °°ÀÌ 3 °³ÀÇ cluster °¡ Á¸ÀçÇÑ´Ù.
{1, 2, 4, 6}, {3, 5, 7}, {8, 9, 10, 12, 13, 14}
step 5, ¾î¶² cluster µµ min_elements º¸´Ù ¸â¹ö¼ö°¡ ÀÛÁö ¾Ê±â ¶§¹®¿¡ Æó±âµÇ´Â cluster ´Â ¾ø´Ù.
step 6, cluster ÀÇ ¼ö°¡ 2 * no_clusters º¸´Ù Å©°Å³ª °°Áö ¾Ê°í, ¶Ç´Â ¹Ýº¹µÇ´Â ¼ö (0) °¡ ¦¼ö(even)À̹ǷÎ, step 7 (º´ÇÕ µ¿ÀÛ)À» ½ÇÇàÇÑ´Ù. ÇÏ°í ±×·¸Áö ¾ÊÀ¸¸é step 8 ·Î °£´Ù.
cluster {1, 2, 4, 6} ¿Í {3, 5, 7} ÀÇ centroid °£ÀÇ °Å¸®°¡ min_dist º¸´Ù À۱⠶§¹®¿¡ µÎ cluster¸¦ º´ÇÕÇϰí centroid¸¦ º¯È½ÃŲ´Ù. À̶§¿¡ 2 °³ÀÇ cluster °¡ ´ÙÀ½°ú °°ÀÌ µÈ´Ù.
{1, 2, 3, 4, 5, 6, 7}, {8, 9, 10, 11, 12, 13, 14}
º´ÇÕ step ÀÌ ¹Ýº¹µÇÁö ¾ÊÀ¸¹Ç·Î (max_merge = 1 À̱⠶§¹®) step 8 ·Î ³ª¾Æ°£´Ù. ( ÀÌ °æ¿ì ³²¾ÆÀÖ´Â cluster µéÀº º´ÇÕµÉ ¼ö ¾ø´Ù. ¿Ö³ÄÇϸé centroid °£ÀÇ °Å¸®°¡ min_dist º¸´Ù Å©±â ¶§¹®ÀÌ´Ù)
step 8, cluster ÀÇ ¼ö (2) °¡ no_clusters / 2 (1.5) º¸´Ù Å©±â ¶§¹®¿¡, ¶ÇÇÑ ¹Ýº¹ÀÇ ¼ö°¡ Ȧ¼ö(odd) °¡ ¾Æ´Ï±â ¶§¹®¿¡ step 10 À¸·Î °£´Ù.
step 10, ¹Ýº¹ÀÇ È½¼ö°¡ ¿ä±¸µÇ´Â ¼ö (5) º¸´Ù ÀÚ°í cluster µéÀº º¯ÈµÇ¾ú´Ù. µû¶ó¼ step 2 ·Î ³ª¾Æ°£´Ù.
À̶§¿¡ ¾Ë°í¸®ÁòÀÇ Forgy part (step 1~4) ´Â cluster¸¦ º¯È½ÃŰÁö ¾Ê´Â´Ù.
step 5, ¾î¶² cluster µµ min_elements º¸´Ù ¸â¹ö¼ö°¡ ÀÛÁö ¾Ê±â ¶§¹®¿¡ Æó±âµÇ´Â cluster ´Â ¾ø´Ù.
step 6, cluster ÀÇ ¼ö°¡ 2 * no_clusters º¸´Ù Å©°Å³ª °°Áö ¾Ê°í, ¶Ç´Â ¹Ýº¹µÇ´Â ¼ö (1) °¡ Ȧ¼ö(odd)À̹ǷÎ, step 8 ·Î °£´Ù.
step 8, cluster ÀÇ ¼ö (2) °¡ no_clusters / 2 º¸´Ù Å©±â ¶§¹®¿¡, ¶ÇÇÑ ¹Ýº¹ÀÇ ¼ö°¡ Ȧ¼ö(odd) À̹ǷÎ, ºÐ¸® µ¿ÀÛ (step 9) °¡ ¼öÇàµÈ´Ù.
step 9, split_size * ¸¦ ÃʰúÇÏ´Â (º¯¼ö
¸¦ À§ÇÑ) Ç¥ÁØÆíÂ÷¸¦ °¡Áö´Â cluster {8, 9, 10, 11, 12, 13, 14} °¡ ÀÖ´Ù. sample
µéÀº cluster ¿¡¼ÀÇ
ÀÇ Æò±Õº¸´Ù À۰ųª Å«
°ªÀ» °¡Áö´Â µÎ ÁýÇÕÀ¸·Î ³ª´¶´Ù.
{8, 9, 10, 11, 12, 13}, {14}
±×µéÀÇ centroid »çÀÌÀÇ °Å¸®°¡ 1.1 * mid_dist º¸´Ù Å©°Å³ª °°±â ¶§¹®¿¡, cluster ´Â ºÐ¸®»óŸ¦ À¯ÁöÇÑ´Ù. À̶§¿¡´Â 3 °³ÀÇ cluster °¡ Á¸ÀçÇÑ´Ù.
{1, 2, 3, 4, 5, 6, 7}, {8, 9, 10, 11, 12, 13}, {14}
step 10, ¹Ýº¹ Ƚ¼ö°¡ ¿ä±¸µÇ´Â ¼öº¸´Ù ÀÛ°í cluster °¡ º¯ÈµÇ¾úÀ¸¹Ç·Î step 2 ·Î °£´Ù.
´Ù½Ã À̶§¿¡ ¾Ë°í¸®ÁòÀÇ Forgy part (step 1~4) ´Â cluster¸¦ º¯È½ÃŰÁö ¾Ê´Â´Ù.
step 5, cluster {14} ´Â min_elements ¸â¹ö¼öº¸´Ù À۱⠶§¹®¿¡ Æó±âµÈ´Ù. À̶§¿¡´Â µÎ °³ÀÇ cluster °¡ µÈ´Ù.
{1, 2, 3, 4, 5, 6, 7}, {8, 9, 10, 11, 12, 13}
step 6, cluster ÀÇ ¼ö°¡ 2 * no_clusters º¸´Ù ÀÛ°í ¹Ýº¹È½¼ö (2) °¡ ¦¼öÀ̱⠶§¹®¿¡, º´ÇÕµ¿ÀÛ (step 7) ÀÌ ¼öÇàµÈ´Ù.
cluster {1, 2, 3, 4, 5, 6, 7} °ú {8, 9, 10, 11, 12, 13} ÀÇ centroid »çÀÌÀÇ °Å¸®°¡ min_dist º¸´Ù À۱⠶§¹®¿¡, ÀÌ cluster µéÀº º´ÇÕµÇÁö ¾Ê´Â´Ù. step 8 ·Î ³ª¾Æ°£´Ù.
step 8, cluster ÀÇ ¼ö (2) °¡ no_clusters / 2 º¸´Ù Å©°í ¹Ýº¹ Ƚ¼ö°¡ ¦¼öÀ̱⠶§¹®¿¡ step 10 À¸·Î °£´Ù.
step 10, ¹Ýº¹ Ƚ¼ö°¡ ¿ä±¸µÇ´Â ¼öº¸´Ù ÀÛ°í cluster µéÀÌ º¯ÈµÇ¾ú±â ¶§¹®¿¡ step 2 ·Î °£´Ù.
´Ù½Ã À̶§¿¡ ¾Ë°í¸®ÁòÀÇ Forgy part (step 1~4) ´Â cluster¸¦ º¯È½ÃŰÁö ¾Ê´Â´Ù.
step 5, ¾î¶² cluster µµ min_elements º¸´Ù ¸â¹ö¼ö°¡ ÀÛÁö ¾Ê±â ¶§¹®¿¡ Æó±âµÇ´Â cluster ´Â ¾ø´Ù.
step 6, cluster ÀÇ ¼ö°¡ 2 * no_clusters º¸´Ù ÀÛ°í ¹Ýº¹È½¼ö (3) °¡ Ȧ¼öÀ̱⠶§¹®¿¡ step 8 ·Î ³ª¾Æ°£´Ù.
step 8, cluster ÀÇ ¼ö (2) °¡ no_clusters / 2 º¸´Ù Å©°í ¹Ýº¹È½¼ö°¡ Ȧ¼öÀ̱⠶§¹®¿¡, ºÐ¸® µ¿ÀÛ (step 9) ÀÌ ¼öÇàµÈ´Ù.
step 9, ¾î¶² cluster µµ Ç¥ÁØÆíÂ÷°¡ split_size * ¸¦ ÃʰúÇÏ´Â º¯¼ö¸¦ °¡Áö°í ÀÖÁö ¾Ê±â ¶§¹®¿¡ step 10 À¸·Î ³ª¾Æ°£´Ù.
step 10, ¹Ýº¹ Ƚ¼ö°¡ ¿ä±¸µÇ´Â ¼öº¸´Ù ÀÛ´Ù, ±×·¯³ª ¾î¶² cluster µµ º¯ÈµÇÁö ¾Ê¾Ò´Ù. µû¶ó¼ ¾Ë°í¸®ÁòÀº Á¾·áµÈ´Ù.