1 00:00:10,230 --> 00:00:13,340 In the previous video we identified clusters, or tissue 2 00:00:13,340 --> 00:00:16,050 substances, in a healthy brain image. 3 00:00:16,050 --> 00:00:17,780 It would be really helpful if we can 4 00:00:17,780 --> 00:00:20,460 use these clusters to automatically detect 5 00:00:20,460 --> 00:00:23,950 tumors in MRI images of sick patients. 6 00:00:23,950 --> 00:00:27,620 The tumor.csv file corresponds to an MRI brain image 7 00:00:27,620 --> 00:00:30,560 of a patient with oligodendroglioma, 8 00:00:30,560 --> 00:00:34,540 a tumor that commonly occurs in the front lobe of the brain. 9 00:00:34,540 --> 00:00:37,500 Since brain biopsy is the only definite diagnosis 10 00:00:37,500 --> 00:00:42,260 of this tumor, MRI guidance is key in determining its location 11 00:00:42,260 --> 00:00:43,680 and geometry. 12 00:00:43,680 --> 00:00:46,450 Now, make sure that the tumor.csv file 13 00:00:46,450 --> 00:00:48,450 is in your current directory. 14 00:00:48,450 --> 00:00:51,280 Let's go to the console, and clear it, and then 15 00:00:51,280 --> 00:00:53,990 read the data, and save it to a data frame 16 00:00:53,990 --> 00:00:58,560 that we're going to call tumor, and use the read.csv function, 17 00:00:58,560 --> 00:01:01,330 which takes as an input the tumor dataset, 18 00:01:01,330 --> 00:01:04,040 and make sure to turn off the header using header 19 00:01:04,040 --> 00:01:05,790 equals FALSE. 20 00:01:05,790 --> 00:01:09,350 And let's quickly create our tumorMatrix, 21 00:01:09,350 --> 00:01:13,690 using the as.matrix function over the tumor data 22 00:01:13,690 --> 00:01:18,600 frame, and the tumorVector, using the as.vector 23 00:01:18,600 --> 00:01:20,650 function over the tumorMatrix. 24 00:01:25,560 --> 00:01:28,210 Now, we will not run the k-means algorithm again 25 00:01:28,210 --> 00:01:29,880 on the tumor vector. 26 00:01:29,880 --> 00:01:33,259 Instead, we will apply the k-means clustering results 27 00:01:33,259 --> 00:01:35,370 that we found using the healthy brain 28 00:01:35,370 --> 00:01:37,650 image on the tumor vector. 29 00:01:37,650 --> 00:01:40,180 In other words, we treat the healthy vector 30 00:01:40,180 --> 00:01:44,430 as training set and the tumor vector as a testing set. 31 00:01:44,430 --> 00:01:46,520 To do this, we first need to install 32 00:01:46,520 --> 00:01:49,090 a new package that is called flexclust. 33 00:01:49,090 --> 00:01:50,920 So let us type install.packages("flexclus"). 34 00:01:57,550 --> 00:01:59,300 And then the first thing R will ask 35 00:01:59,300 --> 00:02:01,790 us is to set the region that is closest 36 00:02:01,790 --> 00:02:03,970 to our geographical location. 37 00:02:03,970 --> 00:02:07,280 And after that, press OK, and the installation 38 00:02:07,280 --> 00:02:11,310 shouldn't take more than two seconds to complete. 39 00:02:11,310 --> 00:02:11,810 OK. 40 00:02:11,810 --> 00:02:13,710 Now that the package is installed, 41 00:02:13,710 --> 00:02:15,920 let us load it by typing library(flexclust). 42 00:02:22,020 --> 00:02:25,920 The flexclust package contains the object class KCCA, 43 00:02:25,920 --> 00:02:29,890 which stands for K-Centroids Cluster Analysis. 44 00:02:29,890 --> 00:02:32,450 We need to convert the information from the clustering 45 00:02:32,450 --> 00:02:36,500 algorithm to an object of the class KCCA. 46 00:02:36,500 --> 00:02:38,480 And this conversion is needed before we 47 00:02:38,480 --> 00:02:41,770 can use the predict function on the test set tumorVector. 48 00:02:41,770 --> 00:02:46,670 So calling our new variable KMC.kcca 49 00:02:46,670 --> 00:02:50,120 and then using the as.kcca function, which 50 00:02:50,120 --> 00:02:55,500 takes as a first input the original KMC variable that 51 00:02:55,500 --> 00:02:58,230 stored all the information from the k-means clustering 52 00:02:58,230 --> 00:03:02,220 function, and the second input is the data that we clustered. 53 00:03:02,220 --> 00:03:04,460 And in this case, it's the training set, 54 00:03:04,460 --> 00:03:05,810 which is the healthyVector. 55 00:03:09,000 --> 00:03:11,300 And now, be aware that this data conversion 56 00:03:11,300 --> 00:03:14,350 will take some time to run. 57 00:03:14,350 --> 00:03:19,070 Now that R finally finished creating the object KMC.kcca, 58 00:03:19,070 --> 00:03:21,900 we can cluster the pixels in the tumorVector 59 00:03:21,900 --> 00:03:23,930 using the predict function. 60 00:03:23,930 --> 00:03:26,130 Let us call the cluster vector tumorClusters = 61 00:03:26,130 --> 00:03:27,750 predict(KMC.kcca, newdata=tumorVector). 62 00:03:47,070 --> 00:03:49,870 And now, the tumorClusters is a vector 63 00:03:49,870 --> 00:03:53,940 that assigns a value 1 through 5 to each of the intensity 64 00:03:53,940 --> 00:03:56,550 values in the tumorVector, as predicted 65 00:03:56,550 --> 00:03:58,880 by the k-means algorithm. 66 00:03:58,880 --> 00:04:01,340 To output the segmented image, we first 67 00:04:01,340 --> 00:04:04,270 need to convert the tumor clusters to a matrix. 68 00:04:04,270 --> 00:04:06,860 So let's use the dimension function, 69 00:04:06,860 --> 00:04:10,150 and then the input is simply tumorClusters, 70 00:04:10,150 --> 00:04:12,640 and then using the c function with the first input 71 00:04:12,640 --> 00:04:17,060 as the number of rows in the tumorMatrix 72 00:04:17,060 --> 00:04:18,800 and the second input as the number 73 00:04:18,800 --> 00:04:22,540 of columns in the tumorMatrix. 74 00:04:22,540 --> 00:04:24,780 And now, we can visualize the clusters 75 00:04:24,780 --> 00:04:29,640 by using the image function with the input tumorClusters matrix, 76 00:04:29,640 --> 00:04:32,570 and make sure to set the axes to FALSE, 77 00:04:32,570 --> 00:04:36,750 and let's use again these fancy rainbow colors, here. 78 00:04:36,750 --> 00:04:38,050 So col=rainbow(k). 79 00:04:42,010 --> 00:04:44,850 Again, k is equal to 5. 80 00:04:44,850 --> 00:04:45,520 Alright. 81 00:04:45,520 --> 00:04:47,500 Let's navigate to the graphics window, 82 00:04:47,500 --> 00:04:50,540 now, to see if we can detect the tumor. 83 00:04:50,540 --> 00:04:51,840 Oh, and yes, we do! 84 00:04:51,840 --> 00:04:54,010 It is this abnormal substance here 85 00:04:54,010 --> 00:04:56,070 that is highlighted in blue that was not 86 00:04:56,070 --> 00:04:58,700 present in the healthy MRI image. 87 00:04:58,700 --> 00:05:02,800 So we were successfully able to identify, more or less, 88 00:05:02,800 --> 00:05:06,390 the geometry of the malignant structure. 89 00:05:06,390 --> 00:05:09,160 We see that we did a good job capturing the major tissue 90 00:05:09,160 --> 00:05:11,040 substances of the brain. 91 00:05:11,040 --> 00:05:14,230 The grey matter is highlighted in purple and the white matter 92 00:05:14,230 --> 00:05:15,480 in yellow. 93 00:05:15,480 --> 00:05:17,860 For the sick patient, the substance highlighted in blue 94 00:05:17,860 --> 00:05:20,440 is the oligodendroglioma tumor. 95 00:05:20,440 --> 00:05:23,200 Notice that we do not see substantial blue regions 96 00:05:23,200 --> 00:05:25,200 in the healthy brain image, apart 97 00:05:25,200 --> 00:05:27,610 from the region around the eyes. 98 00:05:27,610 --> 00:05:30,260 Actually, looking at the eyes regions, 99 00:05:30,260 --> 00:05:33,409 we notice that the two images were not taken precisely 100 00:05:33,409 --> 00:05:35,620 at the same section of the brain. 101 00:05:35,620 --> 00:05:37,340 This might explain some differences 102 00:05:37,340 --> 00:05:40,230 in shapes between the two images. 103 00:05:40,230 --> 00:05:43,390 Let's see how the images look like originally. 104 00:05:43,390 --> 00:05:47,260 We see that the tumor region has a lighter color intensity, 105 00:05:47,260 --> 00:05:49,740 which is very similar to the region around the eyes 106 00:05:49,740 --> 00:05:51,620 in the healthy brain image. 107 00:05:51,620 --> 00:05:55,070 This might explain highlighting this region in blue. 108 00:05:55,070 --> 00:05:58,140 Of course, we cannot claim that we did a wonderful job 109 00:05:58,140 --> 00:06:01,670 obtaining the exact geometries of all the tissue substances, 110 00:06:01,670 --> 00:06:04,440 but we are definitely on the right track. 111 00:06:04,440 --> 00:06:08,010 In fact, to do so, we need to use more advanced algorithms 112 00:06:08,010 --> 00:06:10,600 and fine-tune our clustering technique. 113 00:06:10,600 --> 00:06:14,490 MRI image segmentation is an ongoing field of research. 114 00:06:14,490 --> 00:06:17,350 While k-means clustering is a good starting point, 115 00:06:17,350 --> 00:06:18,850 more advanced techniques have been 116 00:06:18,850 --> 00:06:22,270 proposed in the literature, such as the modified fuzzy k-means 117 00:06:22,270 --> 00:06:23,970 clustering method. 118 00:06:23,970 --> 00:06:26,430 Also, if you are interested, R has 119 00:06:26,430 --> 00:06:31,300 packages that are specialized for analyzing medical images. 120 00:06:31,300 --> 00:06:34,100 Now, if we had MRI axial images taken 121 00:06:34,100 --> 00:06:36,140 at different sections of the brain, 122 00:06:36,140 --> 00:06:39,230 we could segment each image and capture the geometries 123 00:06:39,230 --> 00:06:41,800 of the substances at different levels. 124 00:06:41,800 --> 00:06:45,490 Then, by interpolating between the segmented images, 125 00:06:45,490 --> 00:06:48,320 we can estimate the missing slices, 126 00:06:48,320 --> 00:06:51,400 and we can then obtain a 3D reconstruction 127 00:06:51,400 --> 00:06:55,900 of the anatomy of the brain from 2D MRI cross-sections. 128 00:06:55,900 --> 00:06:59,440 In fact, 3D reconstruction is particularly 129 00:06:59,440 --> 00:07:02,290 important in the medical field for diagnosis, 130 00:07:02,290 --> 00:07:06,640 surgical planning, and biological research purposes. 131 00:07:06,640 --> 00:07:08,780 I hope that this recitation gave you 132 00:07:08,780 --> 00:07:12,770 a flavor of this fascinating field of image segmentation. 133 00:07:12,770 --> 00:07:16,110 In our next video, we will review all the analytics tools 134 00:07:16,110 --> 00:07:18,160 we have covered so far in this class 135 00:07:18,160 --> 00:07:22,240 and discuss their uses, pros, and cons.