1 00:00:01,730 --> 00:00:03,700 PROFESSOR: The previous video was 2 00:00:03,700 --> 00:00:06,480 about positive definite matrices. 3 00:00:06,480 --> 00:00:12,420 This video is also linear algebra, a very interesting way 4 00:00:12,420 --> 00:00:17,660 to break up a matrix called the singular value decomposition. 5 00:00:17,660 --> 00:00:23,760 And everybody says SVD for singular value decomposition. 6 00:00:23,760 --> 00:00:25,560 And what is that factoring? 7 00:00:25,560 --> 00:00:28,790 What are the three pieces of the SVD? 8 00:00:28,790 --> 00:00:34,600 So this is the fact is every matrix, rectangular, 9 00:00:34,600 --> 00:00:40,970 every matrix factors into-- these are the three pieces. 10 00:00:40,970 --> 00:00:43,580 U sigma V transpose. 11 00:00:43,580 --> 00:00:48,150 People use those letters for the three factors. 12 00:00:48,150 --> 00:00:54,430 The factor U is an orthogonal matrix, an orthogonal matrix. 13 00:00:54,430 --> 00:00:58,780 The factor sigma in the middle is a diagonal matrix. 14 00:00:58,780 --> 00:01:01,020 The factor V transpose on the right 15 00:01:01,020 --> 00:01:03,610 is also an orthogonal matrix. 16 00:01:03,610 --> 00:01:09,890 So I have orthogonal, diagonal, orthogonal, or physically, 17 00:01:09,890 --> 00:01:13,770 rotation, stretching, rotation. 18 00:01:13,770 --> 00:01:18,015 Now we have seen three factors for 19 00:01:18,015 --> 00:01:22,030 a matrix, V, lambda, V inverse. 20 00:01:22,030 --> 00:01:23,740 What's the difference? 21 00:01:23,740 --> 00:01:30,750 What's the difference between this SVD, this, and the V, 22 00:01:30,750 --> 00:01:35,120 lambda, V transpose, V inverse, V lambda, 23 00:01:35,120 --> 00:01:39,610 V inverse for diagonalizing other matrices? 24 00:01:39,610 --> 00:01:43,050 So the lambda is diagonal and the sigma is diagonal, 25 00:01:43,050 --> 00:01:44,460 but they're different. 
26 00:01:44,460 --> 00:01:49,970 The key point is I now have two different matrices, not just 27 00:01:49,970 --> 00:01:53,210 V and V inverse, but two different matrices. 28 00:01:53,210 --> 00:01:56,810 But the new great advantage is they 29 00:01:56,810 --> 00:02:00,900 are orthogonal matrices, both of them. 30 00:02:00,900 --> 00:02:09,229 So by going to-- and I can do it for rectangular matrices also. 31 00:02:09,229 --> 00:02:13,230 Eigenvalues really worked for square matrices. 32 00:02:13,230 --> 00:02:15,160 Now we really are-- we have two. 33 00:02:15,160 --> 00:02:19,780 We have an input matrix and an output matrix. 34 00:02:19,780 --> 00:02:25,150 In those spaces m and n can have different dimensions. 35 00:02:25,150 --> 00:02:29,090 So by allowing two separate bases, 36 00:02:29,090 --> 00:02:35,630 we get rectangular matrices, and we get orthogonal factors 37 00:02:35,630 --> 00:02:37,240 with, again, a diagonal. 38 00:02:37,240 --> 00:02:39,120 And this is called-- these numbers 39 00:02:39,120 --> 00:02:44,560 sigma instead of eigenvalues, are called singular values. 40 00:02:44,560 --> 00:02:46,850 So these are the singular values. 41 00:02:46,850 --> 00:02:50,470 These are the singular vectors, the right singular vectors 42 00:02:50,470 --> 00:02:52,910 and the left singular vectors. 43 00:02:52,910 --> 00:02:55,430 That's the statement of the factorization. 44 00:02:55,430 --> 00:03:01,820 But we have to think a little bit, what are those factors? 45 00:03:01,820 --> 00:03:06,100 What are the-- can we see why this works? 46 00:03:06,100 --> 00:03:07,980 So I want that. 47 00:03:07,980 --> 00:03:12,450 And let me do, as you see this coming, 48 00:03:12,450 --> 00:03:18,310 I'll look at A transpose A. I like A transpose A. 49 00:03:18,310 --> 00:03:21,500 So A transpose will be, I transpose this. 50 00:03:21,500 --> 00:03:27,980 V sigma transpose U transpose, right? 51 00:03:27,980 --> 00:03:29,160 That's A transpose. 
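A numerical check of the statement of the factorization above may help here. This is only a sketch: the 4 by 3 matrix is made up for illustration, and NumPy's `np.linalg.svd` stands in for the board work. It factors a rectangular matrix and checks that the two outer factors are orthogonal and the middle factor is diagonal.

```python
import numpy as np

# Hypothetical 4x3 rectangular matrix (values chosen only for illustration).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

# full_matrices=True returns a square U (4x4) and a square V transpose (3x3).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Sigma has the shape of A (4x3), with the singular values on its diagonal.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)


def is_orthogonal(Q):
    """Return True when Q^T Q is the identity, up to rounding error."""
    return np.allclose(Q.T @ Q, np.eye(Q.shape[1]))


# Multiply the three factors back together: U Sigma V^T should give A again.
reconstructed = U @ Sigma @ Vt

print("U orthogonal:", is_orthogonal(U))
print("V orthogonal:", is_orthogonal(Vt.T))
print("singular values:", s)
print("U Sigma V^T equals A:", np.allclose(reconstructed, A))
```

Note that U is 4 by 4 and V is 3 by 3 here: the two orthogonal factors live in spaces of different dimensions, which is what lets the factorization handle a rectangular A.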
52 00:03:29,160 --> 00:03:34,980 Then I multiply by A, U sigma V transpose. 53 00:03:34,980 --> 00:03:38,410 And what do I have? 54 00:03:38,410 --> 00:03:40,900 Well, I've got six matrices. 55 00:03:40,900 --> 00:03:45,140 But U transpose U in here is the identity, 56 00:03:45,140 --> 00:03:47,720 because U is an orthogonal matrix. 57 00:03:47,720 --> 00:03:52,330 So I really have just the V on one side, a sigma transpose 58 00:03:52,330 --> 00:03:59,420 sigma, that'll be diagonal, and a V transpose on the right. 59 00:03:59,420 --> 00:04:01,070 This I recognize. 60 00:04:01,070 --> 00:04:02,120 This I recognize. 61 00:04:02,120 --> 00:04:07,990 Here is a single V, a diagonal matrix, a V transpose. 62 00:04:07,990 --> 00:04:10,350 What I'm showing you here, what we 63 00:04:10,350 --> 00:04:14,240 reached is the eigenvalue, the diagonalization, 64 00:04:14,240 --> 00:04:17,810 the usual eigenvalues are in here 65 00:04:17,810 --> 00:04:21,029 and the eigenvectors are in here. 66 00:04:21,029 --> 00:04:24,740 But the matrix is A transpose A. 67 00:04:24,740 --> 00:04:28,610 Once again, A was rectangular and completely general 68 00:04:28,610 --> 00:04:32,710 and we couldn't see perfect results. 69 00:04:32,710 --> 00:04:34,510 But when we went to A transpose A, 70 00:04:34,510 --> 00:04:37,970 that gave us a positive semidefinite matrix, 71 00:04:37,970 --> 00:04:39,860 symmetric for sure. 72 00:04:39,860 --> 00:04:42,930 Its eigenvectors will be orthogonal. 73 00:04:42,930 --> 00:04:46,840 That's how I know this V matrix, the eigenvectors 74 00:04:46,840 --> 00:04:49,960 for this symmetric matrix, are orthogonal 75 00:04:49,960 --> 00:04:53,030 and the eigenvalues are positive. 76 00:04:53,030 --> 00:04:55,830 And they're the squares of the singular values. 77 00:04:55,830 --> 00:04:58,450 So this is telling me the lambdas 78 00:04:58,450 --> 00:05:07,620 for A transpose A are the sigma squareds for s-- for A. 79 00:05:07,620 --> 00:05:08,205 For A itself.
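The claim that the lambdas of A transpose A are the sigma squareds of A can be checked numerically. A minimal sketch, assuming a made-up 2 by 3 matrix and NumPy's `eigvalsh` and `svd` routines (neither appears in the lecture):

```python
import numpy as np

# Hypothetical 2x3 matrix, chosen only for illustration.
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# Singular values of A, returned in descending order.
singular_values = np.linalg.svd(A, compute_uv=False)

# Eigenvalues of the symmetric matrix A^T A. eigvalsh returns them in
# ascending order, so reverse to match the singular-value ordering.
eigenvalues = np.linalg.eigvalsh(A.T @ A)[::-1]

print("sigma squared:     ", singular_values**2)
print("lambda of A^T A:   ", eigenvalues)
```

A transpose A is 3 by 3 here while A has only two singular values, so the third eigenvalue comes out as zero (up to rounding). That is the "semidefinite" part: the eigenvalues are never negative, but some can be zero.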
80 00:05:12,860 --> 00:05:14,790 Lambda is the same. 81 00:05:14,790 --> 00:05:21,210 Lambda for A transpose A is sigma squared for the matrix A. 82 00:05:21,210 --> 00:05:25,410 Well that tells me V, that tells me sigma, 83 00:05:25,410 --> 00:05:31,850 and U disappeared here because U transpose U was the identity. 84 00:05:31,850 --> 00:05:33,250 It just went away. 85 00:05:33,250 --> 00:05:36,570 How would I get hold of U? 86 00:05:36,570 --> 00:05:39,330 Well, here's one way to see it. 87 00:05:39,330 --> 00:05:44,730 I multiply A times A transpose in that order, in that order. 88 00:05:44,730 --> 00:05:48,790 So now I have U sigma V transpose 89 00:05:48,790 --> 00:05:51,890 times the transpose, which is the V sigma 90 00:05:51,890 --> 00:05:55,260 transpose U transpose-- I'm having a lot of fun 91 00:05:55,260 --> 00:05:56,570 here with transposes. 92 00:05:56,570 --> 00:06:00,970 But V transpose V is now the identity in the middle. 93 00:06:00,970 --> 00:06:03,190 So what do I learn here? 94 00:06:03,190 --> 00:06:07,440 I learn that U is the eigenvector 95 00:06:07,440 --> 00:06:12,330 matrix for AA transpose. 96 00:06:12,330 --> 00:06:16,670 So these have the same eigenvalues, 97 00:06:16,670 --> 00:06:18,610 A times B has the same eigenvalues 98 00:06:18,610 --> 00:06:22,680 as B times A in this case, it comes out here. 99 00:06:22,680 --> 00:06:23,940 Same eigenvalues. 100 00:06:23,940 --> 00:06:28,120 This has eigenvectors V, this has eigenvectors U, 101 00:06:28,120 --> 00:06:32,580 and those are the V and the U in the singular value 102 00:06:32,580 --> 00:06:33,580 decomposition. 103 00:06:33,580 --> 00:06:36,130 Well, I have to show you an example 104 00:06:36,130 --> 00:06:38,710 and an application, 105 00:06:38,710 --> 00:06:40,390 and that's it. 106 00:06:40,390 --> 00:06:41,440 So here's an example.
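To see that the columns of U are eigenvectors of A A transpose and the columns of V are eigenvectors of A transpose A, with the same nonzero eigenvalues, one can test the eigenvector equations directly. This is a sketch under assumptions: the 2 by 3 matrix is hypothetical, and the factors come from NumPy's `np.linalg.svd` rather than from hand computation.

```python
import numpy as np

# Hypothetical 2x3 matrix, chosen only for illustration.
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

U, s, Vt = np.linalg.svd(A)        # U is 2x2, Vt is 3x3, s has 2 entries
V_leading = Vt.T[:, :len(s)]       # the columns of V paired with nonzero sigmas

# Eigenvector equations: (A A^T) U = U diag(sigma^2)
#                        (A^T A) V = V diag(sigma^2)
left_residual = (A @ A.T) @ U - U @ np.diag(s**2)
right_residual = (A.T @ A) @ V_leading - V_leading @ np.diag(s**2)

# Eigenvalues of both products, largest first, for a side-by-side look.
eig_AAt = np.linalg.eigvalsh(A @ A.T)[::-1]
eig_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]

print("max |(A A^T) U - U sigma^2| =", np.abs(left_residual).max())
print("max |(A^T A) V - V sigma^2| =", np.abs(right_residual).max())
print("eigenvalues of A A^T:", eig_AAt)
print("eigenvalues of A^T A:", eig_AtA)
```

Both residuals should be zero up to rounding. The two eigenvalue lists agree except for the extra zero that the larger 3 by 3 product carries.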
107 00:06:44,760 --> 00:06:50,060 Suppose A, I'll make it a square matrix, 2, 2, minus 1, 108 00:06:50,060 --> 00:06:53,590 1, not symmetric. 109 00:06:53,590 --> 00:06:55,090 Certainly not positive definite. 110 00:06:55,090 --> 00:06:58,360 I wouldn't use the word because that matrix is not symmetric. 111 00:06:58,360 --> 00:07:04,010 But it's got an SVD, three factors. 112 00:07:08,890 --> 00:07:12,190 And I work them out. 113 00:07:12,190 --> 00:07:16,960 This is the orthogonal matrix. 114 00:07:16,960 --> 00:07:22,810 I have to divide by square root of 5 to make it unit vectors. 115 00:07:22,810 --> 00:07:25,680 Oops, that's not going to work. 116 00:07:25,680 --> 00:07:27,250 How about that? 117 00:07:27,250 --> 00:07:32,140 The two columns are orthogonal and that's a perfectly good U. 118 00:07:32,140 --> 00:07:36,360 And then in the sigma, I got, well that's a-- oh, 119 00:07:36,360 --> 00:07:37,930 I did want 1 and 1. 120 00:07:37,930 --> 00:07:40,880 I did want 1 and 1, yes. 121 00:07:40,880 --> 00:07:46,970 So I have a singular matrix, determinant 0, singular matrix. 122 00:07:46,970 --> 00:07:52,300 So my eigenvalues will be 0 and it turns out square root of 10 123 00:07:52,300 --> 00:07:57,020 is the other eigenvalue for that-- other singular value 124 00:07:57,020 --> 00:07:58,160 for this guy. 125 00:07:58,160 --> 00:08:01,920 And now I'll put in the V transpose matrix, which 126 00:08:01,920 --> 00:08:09,160 is 1, 1, and 1, minus 1 is it? 127 00:08:09,160 --> 00:08:12,160 And those have length square root of 2, 128 00:08:12,160 --> 00:08:13,725 which I have to divide by. 129 00:08:17,290 --> 00:08:20,500 Well, I didn't do that so smoothly, 130 00:08:20,500 --> 00:08:22,880 but the result is clear. 131 00:08:22,880 --> 00:08:27,010 U, sigma, V transpose, so here's the sigma. 132 00:08:29,660 --> 00:08:35,020 And the singular values of this matrix are square root of 10 133 00:08:35,020 --> 00:08:39,559 and then 0 because it's a singular matrix. 
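The board work is hard to follow from the spoken words alone, so here is a reconstruction of the example as a sketch. Assumptions: the matrix is taken to be A = [[2, 2], [1, 1]] (reading "I did want 1 and 1" as the corrected second row, which matches "determinant 0"), the first column of U is (2, 1) divided by square root of 5, and the second column of U is a guess, since any unit vector orthogonal to the first works when the second singular value is 0.

```python
import numpy as np

# Assumed corrected example matrix: singular, determinant 0.
A = np.array([[2.0, 2.0],
              [1.0, 1.0]])

# Left singular vectors, divided by sqrt(5) to make unit vectors.
# The second column is an assumption (it multiplies sigma = 0 anyway).
U = np.array([[2.0, 1.0],
              [1.0, -2.0]]) / np.sqrt(5)

# Singular values: sqrt(10), and 0 because A is singular.
Sigma = np.diag([np.sqrt(10), 0.0])

# Right singular vectors (as rows of V^T), divided by sqrt(2).
Vt = np.array([[1.0, 1.0],
               [1.0, -1.0]]) / np.sqrt(2)

# Multiply the three factors back together.
product = U @ Sigma @ Vt

print(product)
print("matches A:", np.allclose(product, A))
print("NumPy singular values:", np.linalg.svd(A, compute_uv=False))
```

As a cross-check on the square root of 10: A transpose A is [[5, 5], [5, 5]], whose eigenvalues are 10 and 0.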
134 00:08:39,559 --> 00:08:45,080 And the eigenvectors, well the singular vectors of the matrix 135 00:08:45,080 --> 00:08:50,830 are the left singular vectors and the right singular vectors. 136 00:08:50,830 --> 00:08:53,510 That looks good to me. 137 00:08:53,510 --> 00:08:56,320 And now the application to finish. 138 00:08:56,320 --> 00:09:01,750 A first application is, well, very important. 139 00:09:01,750 --> 00:09:04,090 All the time in this century, we're 140 00:09:04,090 --> 00:09:08,580 getting matrices with data in them. 141 00:09:08,580 --> 00:09:13,150 Maybe in life sciences, we test a bunch 142 00:09:13,150 --> 00:09:19,100 of sample people for genes. 143 00:09:19,100 --> 00:09:23,890 So I have a-- my data comes somehow-- I 144 00:09:23,890 --> 00:09:27,460 have a gene expression matrix. 145 00:09:27,460 --> 00:09:38,320 I have samples, people, people 1, 2, 3 in those columns. 146 00:09:38,320 --> 00:09:46,200 And I have in the rows, let me say four rows, 147 00:09:46,200 --> 00:09:53,008 I have genes, gene expressions. 148 00:09:53,008 --> 00:09:54,950 That would be completely normal. 149 00:09:54,950 --> 00:09:58,210 A rectangular matrix, because the number of people 150 00:09:58,210 --> 00:10:00,970 and the number of genes is not the same. 151 00:10:00,970 --> 00:10:04,980 And in reality, those are both very, very big numbers, 152 00:10:04,980 --> 00:10:06,850 so I have a large matrix. 153 00:10:06,850 --> 00:10:10,370 And out of it, I want to-- and each number in the matrix 154 00:10:10,370 --> 00:10:17,190 is telling me how much the gene is expressed by that person. 155 00:10:17,190 --> 00:10:21,610 We may be searching for genes causing some disease. 156 00:10:21,610 --> 00:10:25,470 So we take several people, some well, some with the disease, 157 00:10:25,470 --> 00:10:27,220 we check on the genes. 158 00:10:27,220 --> 00:10:30,850 We get a big matrix and we look to understand 159 00:10:30,850 --> 00:10:32,140 something about it.
160 00:10:32,140 --> 00:10:33,660 What can we understand? 161 00:10:33,660 --> 00:10:35,050 What are we looking for? 162 00:10:35,050 --> 00:10:39,630 We're looking for the correlation, the connection, 163 00:10:39,630 --> 00:10:45,590 between some combination maybe of genes and some-- 164 00:10:45,590 --> 00:10:49,650 we're looking for a gene-people connection here. 165 00:10:49,650 --> 00:10:53,480 But it's not going to be person number one. 166 00:10:53,480 --> 00:10:55,660 We're not looking for one person. 167 00:10:55,660 --> 00:10:58,350 We're going to find a mixture of those people, 168 00:10:58,350 --> 00:11:05,060 so we're going to have sort of an eigensample, eigenpeople. 169 00:11:05,060 --> 00:11:09,130 Oh, that's a terrible-- eigenperson would be better. 170 00:11:09,130 --> 00:11:12,530 So I think I'm seeing an eigenperson. 171 00:11:12,530 --> 00:11:15,260 Let me see where I'm going to put this. 172 00:11:15,260 --> 00:11:21,380 So yeah, I think my matrix would be written-- oh, here 173 00:11:21,380 --> 00:11:23,840 is the main point. 174 00:11:23,840 --> 00:11:26,600 That just as I see in this example, 175 00:11:26,600 --> 00:11:30,970 it's the first vector and the first vector 176 00:11:30,970 --> 00:11:35,290 and the biggest sigma that are all-important. 177 00:11:35,290 --> 00:11:39,060 Well, in that example the other sigma was 0, nothing. 178 00:11:39,060 --> 00:11:40,990 But in this example, I'll probably 179 00:11:40,990 --> 00:11:43,270 have three different sigmas. 180 00:11:43,270 --> 00:11:50,600 But the largest sigma, the first, the U1 and the V1, it's 181 00:11:50,600 --> 00:11:52,230 that combination that I want. 182 00:11:52,230 --> 00:12:03,700 I want U1 sigma 1 V1 transpose, the first eigenvector 183 00:12:03,700 --> 00:12:06,680 of A transpose A and of AA transpose. 184 00:12:06,680 --> 00:12:09,690 And the first singular, the biggest singular value, 185 00:12:09,690 --> 00:12:11,830 that's the information.
186 00:12:11,830 --> 00:12:17,560 That's the best sort of put together 187 00:12:17,560 --> 00:12:21,710 person, eigenperson, combination of these people 188 00:12:21,710 --> 00:12:24,190 and the best combination of genes. 189 00:12:24,190 --> 00:12:26,660 It has the-- in statistics, I would 190 00:12:26,660 --> 00:12:28,460 say the greatest variance. 191 00:12:28,460 --> 00:12:31,880 In ordinary English, I would say the most information. 192 00:12:31,880 --> 00:12:35,620 The most information in this big matrix 193 00:12:35,620 --> 00:12:40,620 is in this very special matrix with only rank one, 194 00:12:40,620 --> 00:12:43,630 only a single column repeated. 195 00:12:43,630 --> 00:12:47,030 A single row repeated, and a number 196 00:12:47,030 --> 00:12:50,080 sigma 1, the number that tells me that. 197 00:12:50,080 --> 00:12:53,120 Because remember, U is a unit vector. 198 00:12:53,120 --> 00:12:54,870 V is a unit vector. 199 00:12:54,870 --> 00:12:57,110 It's that number sigma 1 that's telling me. 200 00:12:57,110 --> 00:13:02,830 So it's like that unit vector times that number, key number, 201 00:13:02,830 --> 00:13:07,170 times that unit vector, that's this. 202 00:13:07,170 --> 00:13:12,280 I'm talking here about principal component analysis. 203 00:13:12,280 --> 00:13:16,420 I'm looking for the principal component, this part. 204 00:13:16,420 --> 00:13:20,100 Principal component analysis. 205 00:13:20,100 --> 00:13:26,580 A big application in applied statistics. 206 00:13:26,580 --> 00:13:32,640 You know, in large scale drug tests, 207 00:13:32,640 --> 00:13:38,268 statisticians really have a central place here. 208 00:13:38,268 --> 00:13:41,450 And this is on the research side, 209 00:13:41,450 --> 00:13:44,870 to find the-- get the information out 210 00:13:44,870 --> 00:13:48,150 of a big sample. 211 00:13:48,150 --> 00:13:52,330 So U1 is sort of a combination of people. 212 00:13:52,330 --> 00:13:54,920 V1 is a combination of genes.
213 00:13:54,920 --> 00:13:57,610 Sigma 1 is the biggest number I can get. 214 00:13:57,610 --> 00:14:02,760 So that's PCA, all coming from the singular value 215 00:14:02,760 --> 00:14:04,370 decomposition. 216 00:14:04,370 --> 00:14:06,090 Thank you.
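The rank-one piece described in the application, U1 sigma 1 V1 transpose, can be computed in a few lines. A minimal sketch, assuming invented numbers for a tiny gene expression matrix (4 genes in the rows, 3 people in the columns) and NumPy's `np.linalg.svd`. Following the lecture, the data is used as given; statistical PCA would normally subtract the mean first, and that step is left out here.

```python
import numpy as np

# Hypothetical gene expression data: rows are genes, columns are people.
A = np.array([[5.0, 4.0, 6.0],
              [1.0, 2.0, 1.0],
              [9.0, 8.0, 11.0],
              [2.0, 1.0, 3.0]])

# Reduced SVD: U is 4x3, s has 3 singular values, Vt is 3x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Leading triple. u1 has one entry per row of A, v1 one entry per column.
u1, sigma1, v1 = U[:, 0], s[0], Vt[0, :]

# The principal rank-one piece: unit vector times sigma 1 times unit vector.
A1 = sigma1 * np.outer(u1, v1)

# What the rank-one piece misses, measured in the Frobenius norm.
error = np.linalg.norm(A - A1)

# Fraction of the total sum of sigma squared carried by the first piece.
share = sigma1**2 / np.sum(s**2)

print("singular values:", s)
print("rank-one piece:\n", np.round(A1, 2))
print("error:", error)
print("share of sigma squared in first piece:", share)
```

The error equals the square root of the sum of the remaining sigma squareds, so when sigma 1 dominates the others, the single piece A1 carries almost all of the matrix.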