1
00:00:05,050 --> 00:00:08,400
Recall from our last video
that it was impossible for us

2
00:00:08,400 --> 00:00:10,770
to use hierarchical
clustering because

3
00:00:10,770 --> 00:00:13,380
of the high resolution
of our image.

4
00:00:13,380 --> 00:00:15,660
So in this video, we
will try to segment

5
00:00:15,660 --> 00:00:20,250
the MRI image using the
k-means clustering algorithm.

6
00:00:20,250 --> 00:00:22,210
The first step in
k-means clustering

7
00:00:22,210 --> 00:00:25,250
involves specifying the
number of clusters, k.

8
00:00:25,250 --> 00:00:27,650
But how do we select k?

9
00:00:27,650 --> 00:00:31,370
Well, our clusters would ideally
assign each point in the image

10
00:00:31,370 --> 00:00:32,390
to a tissue class,

11
00:00:32,390 --> 00:00:35,340
or a particular
substance, for instance,

12
00:00:35,340 --> 00:00:38,460
grey matter or white
matter, and so on.

13
00:00:38,460 --> 00:00:41,650
And these substances are known
to the medical community.

14
00:00:41,650 --> 00:00:44,120
So setting the
number of clusters

15
00:00:44,120 --> 00:00:46,070
depends on exactly
what you're trying

16
00:00:46,070 --> 00:00:48,310
to extract from the image.

17
00:00:48,310 --> 00:00:50,770
For the sake of our example,
let's set the number

18
00:00:50,770 --> 00:00:54,740
of clusters here, k, to five.

19
00:00:54,740 --> 00:00:57,000
And since the k-means
clustering algorithm

20
00:00:57,000 --> 00:01:00,100
starts by randomly assigning
points to clusters,

21
00:01:00,100 --> 00:01:02,090
we should set the
seed, so that we all

22
00:01:02,090 --> 00:01:03,720
obtain the same clusters.

23
00:01:03,720 --> 00:01:08,660
So let's type set.seed,
and give it a value of 1.

24
00:01:08,660 --> 00:01:13,350
To run the k-means clustering
algorithm, or KMC for short,

25
00:01:13,350 --> 00:01:16,140
we need to use the
kmeans function in R.

26
00:01:16,140 --> 00:01:19,490
And the first input is whatever
we are trying to cluster.

27
00:01:19,490 --> 00:01:23,430
In this case it is
the healthy vector.

28
00:01:23,430 --> 00:01:26,670
The second argument is
the number of clusters,

29
00:01:26,670 --> 00:01:30,840
and we can specify it
using the argument centers,

30
00:01:30,840 --> 00:01:33,060
and that would be equal to k.

31
00:01:33,060 --> 00:01:36,190
And then finally, since
k-means is an iterative method

32
00:01:36,190 --> 00:01:38,880
that could take very
long to converge,

33
00:01:38,880 --> 00:01:41,570
we need to set a maximum
number of iterations.

34
00:01:41,570 --> 00:01:45,200
And we can do this
by typing iter.max,

35
00:01:45,200 --> 00:01:48,539
and give it, for
instance, the value 1,000.

36
00:01:48,539 --> 00:01:51,620
And now let's run the
k-means algorithm.
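
The call described so far can be sketched in R as follows. This is a minimal sketch, not the exact session from the video: the MRI data is not part of this transcript, so `healthyVector` here is a hypothetical stand-in filled with random intensities between 0 and 1.

```r
# Hypothetical stand-in for the flattened MRI intensities (values in [0, 1]).
# The real healthy vector comes from the image loaded in the previous video.
set.seed(42)
healthyVector <- runif(2000)

# Number of clusters, as chosen in the video.
k <- 5

# Fix the seed so the random initialization is reproducible.
set.seed(1)

# Run k-means with at most 1000 iterations.
KMC <- kmeans(healthyVector, centers = k, iter.max = 1000)

# Inspect the structure of the result (cluster, centers, size, ...).
str(KMC)
```

Because the starting centers are chosen at random, a different seed may number the clusters differently, even when the grouping itself is the same.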

37
00:01:51,620 --> 00:01:54,289
The k-means algorithm
is actually quite fast,

38
00:01:54,289 --> 00:01:57,680
even though we have a
high resolution image.

39
00:01:57,680 --> 00:02:00,980
Now to see the result of the
k-means clustering algorithm,

40
00:02:00,980 --> 00:02:05,160
we can output the structure
of the KMC variable.

41
00:02:05,160 --> 00:02:07,640
The first, and most important,
piece of information

42
00:02:07,640 --> 00:02:09,889
that we get is
the cluster vector,

43
00:02:09,889 --> 00:02:13,020
which assigns each intensity
value in the healthy vector

44
00:02:13,020 --> 00:02:14,250
to a cluster.

45
00:02:14,250 --> 00:02:16,690
In this case, it will
be giving them values 1

46
00:02:16,690 --> 00:02:19,950
through 5, since
we have 5 clusters.

47
00:02:19,950 --> 00:02:22,329
Now recall that to output
the segmented image,

48
00:02:22,329 --> 00:02:24,440
we need to extract this vector.

49
00:02:24,440 --> 00:02:27,470
The way to do this is by
using the dollar notation.

50
00:02:27,470 --> 00:02:31,900
For instance, let us
define healthyClusters,

51
00:02:31,900 --> 00:02:33,860
and then set it
equal to KMC$cluster.

52
00:02:37,380 --> 00:02:39,050
And what we're
basically doing here

53
00:02:39,050 --> 00:02:41,410
is that we are taking
the information,

54
00:02:41,410 --> 00:02:44,340
extracting the information
of the cluster vector,

55
00:02:44,340 --> 00:02:46,630
and putting it in
the new variable that

56
00:02:46,630 --> 00:02:49,200
is called healthyClusters.
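
As a rough illustration of the dollar notation, the sketch below pulls the cluster vector out of a k-means result. The data is again a hypothetical random stand-in for the healthy vector, so the actual assignments will differ from those in the video.

```r
# Hypothetical stand-in data; the real healthy vector comes from the MRI image.
set.seed(42)
healthyVector <- runif(2000)

set.seed(1)
KMC <- kmeans(healthyVector, centers = 5, iter.max = 1000)

# Extract the cluster assignment of every intensity value.
healthyClusters <- KMC$cluster

# Peek at the first few assignments and count the points per cluster.
head(healthyClusters)
table(healthyClusters)
```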

57
00:02:49,200 --> 00:02:52,310
Now how can we obtain
the mean intensity value

58
00:02:52,310 --> 00:02:54,850
within each of our 5 clusters?

59
00:02:54,850 --> 00:02:58,480
In hierarchical clustering, we
needed to do some manual work,

60
00:02:58,480 --> 00:03:02,350
and use the tapply function
to extract this information.

61
00:03:02,350 --> 00:03:05,030
In this case, we have
the answers ready,

62
00:03:05,030 --> 00:03:07,360
under the vector centers.

63
00:03:07,360 --> 00:03:10,490
In fact, for instance,
the mean intensity value

64
00:03:10,490 --> 00:03:14,460
of the first cluster is 0.48,
and the mean intensity value

65
00:03:14,460 --> 00:03:17,510
of the last cluster is 0.18.

66
00:03:17,510 --> 00:03:20,230
We can also extract this
information using the dollar

67
00:03:20,230 --> 00:03:20,730
sign.

68
00:03:20,730 --> 00:03:22,160
For instance, KMC$centers[2].

69
00:03:27,010 --> 00:03:29,540
This should give us the
mean intensity value

70
00:03:29,540 --> 00:03:32,390
of the second
cluster, which is 0.1.

71
00:03:32,390 --> 00:03:35,340
And indeed, this
is what we obtain.
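
To make the comparison with tapply concrete, here is a small sketch that computes the cluster means by hand and checks them against the centers reported by kmeans. It assumes a hypothetical random vector in place of the MRI data, so the values 0.48, 0.18, and 0.1 quoted in the video will not be reproduced.

```r
# Hypothetical stand-in data; the real healthy vector comes from the MRI image.
set.seed(42)
healthyVector <- runif(2000)

set.seed(1)
KMC <- kmeans(healthyVector, centers = 5, iter.max = 1000)

# Mean intensity of each cluster, as stored by kmeans.
centersFromKMC <- as.vector(KMC$centers)

# The same quantity computed manually, as we did for hierarchical clustering.
centersByHand <- as.vector(tapply(healthyVector, KMC$cluster, mean))

# The two should agree up to floating-point error.
all.equal(centersFromKMC, centersByHand)

# Mean intensity of the second cluster only.
KMC$centers[2]
```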

72
00:03:35,340 --> 00:03:38,280
Before we move on, I would
like to draw your attention

73
00:03:38,280 --> 00:03:41,020
to one last interesting
piece of information

74
00:03:41,020 --> 00:03:42,380
that we can get here.

75
00:03:42,380 --> 00:03:45,060
And that is the
size of the cluster.

76
00:03:45,060 --> 00:03:47,870
For instance, the largest
cluster that we have

77
00:03:47,870 --> 00:03:53,060
is the third one, which
contains 133,000 values.

78
00:03:53,060 --> 00:03:55,329
And interestingly,
it's the one that

79
00:03:55,329 --> 00:03:58,540
has the smallest mean
intensity value, which

80
00:03:58,540 --> 00:04:02,370
means that it corresponds to
the darkest shade in our image.
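
One way to check this kind of observation is sketched below: it looks up the largest cluster and the cluster with the smallest mean intensity. The data is a hypothetical random stand-in, so on it the two need not coincide the way they do for the MRI image.

```r
# Hypothetical stand-in data; the real healthy vector comes from the MRI image.
set.seed(42)
healthyVector <- runif(2000)

set.seed(1)
KMC <- kmeans(healthyVector, centers = 5, iter.max = 1000)

# Number of points in each cluster.
KMC$size

# Index of the largest cluster and of the darkest (lowest mean) cluster.
largest <- which.max(KMC$size)
darkest <- which.min(KMC$centers)

largest
darkest
```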

81
00:04:02,370 --> 00:04:05,390
Actually, if we look at all
the mean intensity values,

82
00:04:05,390 --> 00:04:08,390
we can see that they
are all less than 0.5.

83
00:04:08,390 --> 00:04:10,660
So they're all
pretty close to 0.

84
00:04:10,660 --> 00:04:13,330
And this means that our
image is pretty dark.

85
00:04:13,330 --> 00:04:16,730
If we look at our image
again, it's indeed very dark.

86
00:04:16,730 --> 00:04:20,730
And we have very few points
that are actually white.

87
00:04:20,730 --> 00:04:22,760
Now the exciting part.

88
00:04:22,760 --> 00:04:26,290
Let us output the segmented
image and see what we get.

89
00:04:26,290 --> 00:04:28,050
Recall that we first
need to convert

90
00:04:28,050 --> 00:04:30,720
the vector healthyClusters
to a matrix.

91
00:04:30,720 --> 00:04:33,630
To do this, we will use
the dim function,

92
00:04:33,630 --> 00:04:37,140
that takes as an input the
healthyClusters vector.

93
00:04:37,140 --> 00:04:40,490
And now we're going to
turn it into a matrix.

94
00:04:40,490 --> 00:04:44,510
So we have to specify using the
combine function, c, the number

95
00:04:44,510 --> 00:04:48,100
of rows, and the number
of columns that we want.

96
00:04:48,100 --> 00:04:49,820
We should make sure
that it corresponds

97
00:04:49,820 --> 00:04:52,420
to the same size as
the healthy matrix.

98
00:04:52,420 --> 00:04:55,040
And since we've forgotten the
number of rows and the number

99
00:04:55,040 --> 00:04:56,960
of columns in the
healthy matrix, we

100
00:04:56,960 --> 00:05:01,360
can simply use the nrow and
ncol functions to get them.

101
00:05:01,360 --> 00:05:03,620
So the first input
right now would

102
00:05:03,620 --> 00:05:08,070
be nrow of healthy matrix.

103
00:05:08,070 --> 00:05:10,470
And then the second
input would be the number

104
00:05:10,470 --> 00:05:14,340
of columns of the
healthy matrix.

105
00:05:14,340 --> 00:05:17,670
And now we are assigning these
numbers of rows and columns

106
00:05:17,670 --> 00:05:21,670
to our new matrix,
healthyClusters.

107
00:05:21,670 --> 00:05:23,910
And now we can
visualize our clusters

108
00:05:23,910 --> 00:05:27,790
by using the function image,
which takes as an input

109
00:05:27,790 --> 00:05:30,420
the healthyClusters matrix.

110
00:05:30,420 --> 00:05:32,870
And let's turn off the axes.

111
00:05:32,870 --> 00:05:36,450
And then let's be creative
and use a fancy color scheme.

112
00:05:36,450 --> 00:05:41,070
For the color, we're going to invoke
the rainbow palette in R.

113
00:05:41,070 --> 00:05:44,670
And the rainbow palette,
or the function rainbow,

114
00:05:44,670 --> 00:05:47,880
takes as an input the number
of colors that we want.

115
00:05:47,880 --> 00:05:49,730
In this case, the
number of colors

116
00:05:49,730 --> 00:05:52,290
would correspond to
the number of clusters.

117
00:05:52,290 --> 00:05:55,800
So the input would be k.

118
00:05:55,800 --> 00:05:59,060
And now let's output
the segmented image.
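
The reshaping and plotting steps can be sketched as follows. This assumes a hypothetical 40 by 50 matrix of random intensities named `healthyMatrix` in place of the MRI image, so the plot will look like colored noise rather than a brain scan.

```r
# Hypothetical stand-in for the image matrix (40 rows, 50 columns).
set.seed(42)
healthyMatrix <- matrix(runif(40 * 50), nrow = 40, ncol = 50)
healthyVector <- as.vector(healthyMatrix)

k <- 5
set.seed(1)
KMC <- kmeans(healthyVector, centers = k, iter.max = 1000)
healthyClusters <- KMC$cluster

# Turn the cluster vector into a matrix with the same shape as the image.
dim(healthyClusters) <- c(nrow(healthyMatrix), ncol(healthyMatrix))

# Plot the segmented image without axes, one rainbow color per cluster.
image(healthyClusters, axes = FALSE, col = rainbow(k))
```

Reusing nrow and ncol of the original matrix avoids hard-coding the image size, so the same lines work for an image of any resolution.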

119
00:05:59,060 --> 00:06:00,880
Going back to the
graphics window,

120
00:06:00,880 --> 00:06:03,170
we see that the k-means
algorithm was

121
00:06:03,170 --> 00:06:06,730
able to segment the image
into 5 different clusters.

122
00:06:06,730 --> 00:06:09,850
More refinement may need
to be made to our clustering

123
00:06:09,850 --> 00:06:12,360
algorithm to
appropriately capture

124
00:06:12,360 --> 00:06:14,620
all the anatomical structures.

125
00:06:14,620 --> 00:06:17,380
But this seems like a
good starting point.

126
00:06:17,380 --> 00:06:21,760
The question now is, can we use
the clusters, or the classes,

127
00:06:21,760 --> 00:06:25,830
found by our k-means algorithm
on the healthy MRI image

128
00:06:25,830 --> 00:06:31,070
to identify tumors in another
MRI image of a sick patient?

129
00:06:31,070 --> 00:06:35,409
We will see if this is
possible in the next video.