1 00:00:05,050 --> 00:00:08,400 Recall from our last video that it was impossible for us 2 00:00:08,400 --> 00:00:10,770 to use hierarchical clustering because 3 00:00:10,770 --> 00:00:13,380 of the high resolution of our image. 4 00:00:13,380 --> 00:00:15,660 So in this video, we will try to segment 5 00:00:15,660 --> 00:00:20,250 the MRI image using the k-means clustering algorithm. 6 00:00:20,250 --> 00:00:22,210 The first step in k-means clustering 7 00:00:22,210 --> 00:00:25,250 involves specifying the number of clusters, k. 8 00:00:25,250 --> 00:00:27,650 But how do we select k? 9 00:00:27,650 --> 00:00:31,370 Well, our clusters would ideally assign each point in the image 10 00:00:31,370 --> 00:00:32,390 to a tissue class. 11 00:00:32,390 --> 00:00:35,340 Or a particular substance, for instance, 12 00:00:35,340 --> 00:00:38,460 grey matter or white matter, and so on. 13 00:00:38,460 --> 00:00:41,650 And these substances are known to the medical community. 14 00:00:41,650 --> 00:00:44,120 So setting the number of clusters 15 00:00:44,120 --> 00:00:46,070 depends on exactly what you're trying 16 00:00:46,070 --> 00:00:48,310 to extract from the image. 17 00:00:48,310 --> 00:00:50,770 For the sake of our example, let's set the number 18 00:00:50,770 --> 00:00:54,740 of clusters here, k, to five. 19 00:00:54,740 --> 00:00:57,000 And since the k-means clustering algorithm 20 00:00:57,000 --> 00:01:00,100 starts by randomly assigning points to clusters, 21 00:01:00,100 --> 00:01:02,090 we should set the seed, so that we all 22 00:01:02,090 --> 00:01:03,720 obtain the same clusters. 23 00:01:03,720 --> 00:01:08,660 So let's type set.seed, and give it a value of 1. 24 00:01:08,660 --> 00:01:13,350 To run the k-means clustering algorithm, or KMC in short, 25 00:01:13,350 --> 00:01:16,140 we need to use the k-means function in R. 26 00:01:16,140 --> 00:01:19,490 And the first input is whatever we are trying to cluster. 27 00:01:19,490 --> 00:01:23,430 In this case it is the healthy vector. 28 00:01:23,430 --> 00:01:26,670 The second argument is the number of clusters, 29 00:01:26,670 --> 00:01:30,840 and we can specify it using the argument centers, 30 00:01:30,840 --> 00:01:33,060 and that would be equal to k. 31 00:01:33,060 --> 00:01:36,190 And then finally, since the k-means is an iterative method 32 00:01:36,190 --> 00:01:38,880 that could take very long to converge, 33 00:01:38,880 --> 00:01:41,570 we need to set a maximum number of iterations. 34 00:01:41,570 --> 00:01:45,200 And we can do this by typing iter.max, 35 00:01:45,200 --> 00:01:48,539 and give it, for instance, the value 1,000. 36 00:01:48,539 --> 00:01:51,620 And now let's run the k-means algorithm. 37 00:01:51,620 --> 00:01:54,289 The k-means algorithm is actually quite fast, 38 00:01:54,289 --> 00:01:57,680 even though we have a high resolution image. 39 00:01:57,680 --> 00:02:00,980 Now to see the result of the k-means clustering algorithm, 40 00:02:00,980 --> 00:02:05,160 we can output the structure of the KMC variable. 41 00:02:05,160 --> 00:02:07,640 The first, and most important, piece of information 42 00:02:07,640 --> 00:02:09,889 that we get, is the cluster vector. 43 00:02:09,889 --> 00:02:13,020 Which assigns each intensity value in the healthy vector 44 00:02:13,020 --> 00:02:14,250 to a cluster. 45 00:02:14,250 --> 00:02:16,690 In this case, it will be giving them values 1 46 00:02:16,690 --> 00:02:19,950 through 5, since we have 5 clusters. 47 00:02:19,950 --> 00:02:22,329 Now recall that to output the segmented image, 48 00:02:22,329 --> 00:02:24,440 we need to extract this vector. 49 00:02:24,440 --> 00:02:27,470 The way to do this is by using the dollar notation. 50 00:02:27,470 --> 00:02:31,900 For instance, let us define healthyClusters, 51 00:02:31,900 --> 00:02:33,860 and then set it equal to KMC$cluster. 52 00:02:37,380 --> 00:02:39,050 And what we're basically doing here 53 00:02:39,050 --> 00:02:41,410 is that we are taking the information, 54 00:02:41,410 --> 00:02:44,340 extracting the information of the cluster vector, 55 00:02:44,340 --> 00:02:46,630 and putting it in the new variable that 56 00:02:46,630 --> 00:02:49,200 is called healthyClusters. 57 00:02:49,200 --> 00:02:52,310 Now how can we obtain the mean intensity value 58 00:02:52,310 --> 00:02:54,850 within each of our 5 clusters? 59 00:02:54,850 --> 00:02:58,480 In hierarchical clustering, we needed to do some manual work, 60 00:02:58,480 --> 00:03:02,350 and use the t-apply function to extract this information. 61 00:03:02,350 --> 00:03:05,030 In this case, we have the answers ready, 62 00:03:05,030 --> 00:03:07,360 under the vector centers. 63 00:03:07,360 --> 00:03:10,490 In fact, for instance, the mean intensity value 64 00:03:10,490 --> 00:03:14,460 of the first cluster is 0.48, and the mean intensity value 65 00:03:14,460 --> 00:03:17,510 of the last cluster is 0.18. 66 00:03:17,510 --> 00:03:20,230 We can also extract this information using the dollar 67 00:03:20,230 --> 00:03:20,730 sign. 68 00:03:20,730 --> 00:03:22,160 For instance, KMC$centers[2]. 69 00:03:27,010 --> 00:03:29,540 This should give us the mean intensity value 70 00:03:29,540 --> 00:03:32,390 of the second cluster, which is 0.1. 71 00:03:32,390 --> 00:03:35,340 And indeed, this is what we obtain. 72 00:03:35,340 --> 00:03:38,280 Before we move on, I would like to point your attention 73 00:03:38,280 --> 00:03:41,020 to one last interesting piece of information 74 00:03:41,020 --> 00:03:42,380 that we can get here. 75 00:03:42,380 --> 00:03:45,060 And that is the size of the cluster. 76 00:03:45,060 --> 00:03:47,870 For instance, the largest cluster that we have 77 00:03:47,870 --> 00:03:53,060 is the third one, which combines 133,000 values in it. 78 00:03:53,060 --> 00:03:55,329 And interestingly, it's the one that 79 00:03:55,329 --> 00:03:58,540 has the smallest mean intensity value, which 80 00:03:58,540 --> 00:04:02,370 means that it corresponds to the darkest shade in our image. 81 00:04:02,370 --> 00:04:05,390 Actually, if we look at all the mean intensity values, 82 00:04:05,390 --> 00:04:08,390 we can see that they are all less than 0.5. 83 00:04:08,390 --> 00:04:10,660 So they're all pretty close to 0. 84 00:04:10,660 --> 00:04:13,330 And this means that our images is pretty dark. 85 00:04:13,330 --> 00:04:16,730 If we look at our image again, it's indeed very dark. 86 00:04:16,730 --> 00:04:20,730 And we have very few points that are actually white. 87 00:04:20,730 --> 00:04:22,760 Now the exciting part. 88 00:04:22,760 --> 00:04:26,290 Let us output the segmented image and see what we get. 89 00:04:26,290 --> 00:04:28,050 Recall that we first need to convert 90 00:04:28,050 --> 00:04:30,720 the vector healthy clusters to a matrix. 91 00:04:30,720 --> 00:04:33,630 To do this, we will use the dimension function, 92 00:04:33,630 --> 00:04:37,140 that takes as an input the healthy clusters vector. 93 00:04:37,140 --> 00:04:40,490 And now we're going to turn it into a matrix. 94 00:04:40,490 --> 00:04:44,510 So we have to specify using the combined function, the number 95 00:04:44,510 --> 00:04:48,100 of rows, and the number of columns that we want. 96 00:04:48,100 --> 00:04:49,820 We should make sure that it corresponds 97 00:04:49,820 --> 00:04:52,420 to the same size as the healthy matrix. 98 00:04:52,420 --> 00:04:55,040 And since we've forgot the number of rows and the number 99 00:04:55,040 --> 00:04:56,960 columns in the healthy matrix, we 100 00:04:56,960 --> 00:05:01,360 can simply use the nrow and ncol function to get them. 101 00:05:01,360 --> 00:05:03,620 So the first input right now would 102 00:05:03,620 --> 00:05:08,070 be nrow of healthy matrix. 103 00:05:08,070 --> 00:05:10,470 And then the second input would be the number 104 00:05:10,470 --> 00:05:14,340 of columns of the healthy matrix. 105 00:05:14,340 --> 00:05:17,670 And now we are assigning these numbers of rows and columns 106 00:05:17,670 --> 00:05:21,670 to our new matrix, healthy clusters. 107 00:05:21,670 --> 00:05:23,910 And now we can visualize our clusters 108 00:05:23,910 --> 00:05:27,790 by using the function image, which takes as an input 109 00:05:27,790 --> 00:05:30,420 the healthy cluster's matrix. 110 00:05:30,420 --> 00:05:32,870 And let's turn off the axes. 111 00:05:32,870 --> 00:05:36,450 And then let's be creative and use a fancy color scheme. 112 00:05:36,450 --> 00:05:41,070 We're going to invoke for color here, the rainbow palette in R. 113 00:05:41,070 --> 00:05:44,670 And the rainbow palette, or the function rainbow, 114 00:05:44,670 --> 00:05:47,880 takes as an input the number of colors that we want. 115 00:05:47,880 --> 00:05:49,730 In this case, the number of colors 116 00:05:49,730 --> 00:05:52,290 would correspond to the number of clusters. 117 00:05:52,290 --> 00:05:55,800 So the input would be k. 118 00:05:55,800 --> 00:05:59,060 And now let's output the segmented image. 119 00:05:59,060 --> 00:06:00,880 Going back to the graphics window, 120 00:06:00,880 --> 00:06:03,170 we see that k-means algorithm was 121 00:06:03,170 --> 00:06:06,730 able to segment the image in 5 different clusters. 122 00:06:06,730 --> 00:06:09,850 More refinement maybe needs to be made to our clustering 123 00:06:09,850 --> 00:06:12,360 algorithm to appropriately capture 124 00:06:12,360 --> 00:06:14,620 all the anatomical structures. 125 00:06:14,620 --> 00:06:17,380 But this seems like a good starting point. 126 00:06:17,380 --> 00:06:21,760 The question now is, can we use the clusters, or the classes, 127 00:06:21,760 --> 00:06:25,830 found by our k-means algorithm on the healthy MRI image 128 00:06:25,830 --> 00:06:31,070 to identify tumors in another MRI image of a sick patient? 129 00:06:31,070 --> 00:06:35,409 We will see if this is possible in the next video.