1 00:00:00,090 --> 00:00:02,490 The following content is provided under a Creative 2 00:00:02,490 --> 00:00:04,030 Commons license. 3 00:00:04,030 --> 00:00:06,330 Your support will help MIT OpenCourseWare 4 00:00:06,330 --> 00:00:10,720 continue to offer high-quality educational resources for free. 5 00:00:10,720 --> 00:00:13,320 To make a donation, or view additional materials 6 00:00:13,320 --> 00:00:17,280 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,280 --> 00:00:18,450 at ocw.mit.edu. 8 00:00:26,554 --> 00:00:28,220 DENNIS FREEMAN: So last time, we started 9 00:00:28,220 --> 00:00:30,500 to think about sampling. 10 00:00:30,500 --> 00:00:34,020 And that's what I want to finish up today. 11 00:00:34,020 --> 00:00:36,844 I think sampling is a very important issue. 12 00:00:36,844 --> 00:00:38,510 It's one of the strengths of this course 13 00:00:38,510 --> 00:00:42,560 because we can think about on equal footing the way signals 14 00:00:42,560 --> 00:00:46,490 work in a CT system, or in a DT system, when the signals are 15 00:00:46,490 --> 00:00:48,470 CT, when the signals are DT. 16 00:00:48,470 --> 00:00:50,990 And specifically, when you convert between them. 17 00:00:50,990 --> 00:00:55,250 Converting between them, like we saw last time, that's 18 00:00:55,250 --> 00:00:57,890 a very important process because many of the kinds of signals 19 00:00:57,890 --> 00:01:01,610 that we want to think about occur in physical-- 20 00:01:01,610 --> 00:01:04,849 have a physical origin where they are naturally 21 00:01:04,849 --> 00:01:08,210 continuous time or continuous space kinds of signals, 22 00:01:08,210 --> 00:01:10,970 but we would like to use inexpensive digital electronics 23 00:01:10,970 --> 00:01:12,540 in order to process them. 24 00:01:12,540 --> 00:01:15,590 So it's important to understand how we can take a CT signal 25 00:01:15,590 --> 00:01:21,900 and represent the information that's there in a DT manner. 
26 00:01:21,900 --> 00:01:28,280 And it's completely remarkable that you can even do that. 27 00:01:28,280 --> 00:01:32,320 CT signals are in some sense arbitrarily more complicated 28 00:01:32,320 --> 00:01:33,300 than DT signals. 29 00:01:33,300 --> 00:01:37,490 DT signals only exist at integer multiples of time, 30 00:01:37,490 --> 00:01:40,360 at integer values of time. 31 00:01:40,360 --> 00:01:44,170 CT signals, in principle, can do anything 32 00:01:44,170 --> 00:01:47,270 between two consecutive samples of a DT signal. 33 00:01:47,270 --> 00:01:49,650 So in some sense, they're arbitrarily more complicated. 34 00:01:49,650 --> 00:01:52,920 So it's kind of remarkable at all 35 00:01:52,920 --> 00:01:55,720 that we can talk meaningfully about how you can represent 36 00:01:55,720 --> 00:01:58,970 the information that's in a CT system with a DT equivalent 37 00:01:58,970 --> 00:01:59,470 system. 38 00:01:59,470 --> 00:02:01,595 And the point is, and the reason we're doing it now 39 00:02:01,595 --> 00:02:04,110 in this part of the course, is that by thinking 40 00:02:04,110 --> 00:02:07,870 about Fourier transforms, everything's very simple. 41 00:02:07,870 --> 00:02:10,539 Something that could be conceptually quite complicated 42 00:02:10,539 --> 00:02:13,750 is in fact, extremely simple to think about. 43 00:02:13,750 --> 00:02:16,990 So last time, we saw that the way to think about the signal, 44 00:02:16,990 --> 00:02:19,450 if you want to sample it, if you want 45 00:02:19,450 --> 00:02:21,839 to convert a CT signal to a DT signal, 46 00:02:21,839 --> 00:02:24,130 the way to think about it is to think about the Fourier 47 00:02:24,130 --> 00:02:26,570 transform. 48 00:02:26,570 --> 00:02:29,690 So then, the example that we talked about last time, 49 00:02:29,690 --> 00:02:32,610 you think about a CT signal, x of t. 50 00:02:32,610 --> 00:02:35,280 You think about its samples taken uniformly in time.
51 00:02:37,840 --> 00:02:40,619 And then in order to think about the information and whether 52 00:02:40,619 --> 00:02:42,910 or not you've captured it all, the question is, can you 53 00:02:42,910 --> 00:02:46,750 reconstruct the original thing that you started 54 00:02:46,750 --> 00:02:50,250 with from the samples only? 55 00:02:50,250 --> 00:02:50,750 OK. 56 00:02:50,750 --> 00:02:52,270 Well, in general, no. 57 00:02:52,270 --> 00:02:55,300 So what we're really asking is, what are the rules, 58 00:02:55,300 --> 00:02:58,180 what are the conditions under which you can do that? 59 00:02:58,180 --> 00:03:00,779 And are they useful conditions or not? 60 00:03:00,779 --> 00:03:03,070 So the first way you can think about taking the samples 61 00:03:03,070 --> 00:03:05,200 and turning them back into a continuous time signal 62 00:03:05,200 --> 00:03:08,545 is something that we called impulse reconstruction. 63 00:03:08,545 --> 00:03:11,100 In impulse reconstruction, we substitute 64 00:03:11,100 --> 00:03:16,140 for every sample an impulse appropriately located in time 65 00:03:16,140 --> 00:03:18,210 and appropriately scaled in amplitude. 66 00:03:18,210 --> 00:03:20,010 The appropriate scale in amplitude 67 00:03:20,010 --> 00:03:23,610 is that you take the samples and you weight the impulses. 68 00:03:23,610 --> 00:03:28,650 You weight the impulse at the n-th time step 69 00:03:28,650 --> 00:03:34,570 by the sample value for time n. 70 00:03:34,570 --> 00:03:39,772 And you put the n-th one at time nT, n cap T. 71 00:03:39,772 --> 00:03:40,855 So impulse reconstruction. 72 00:03:40,855 --> 00:03:41,770 It's really easy. 73 00:03:41,770 --> 00:03:43,630 Take all the samples that you got 74 00:03:43,630 --> 00:03:49,150 by uniform sampling, substitute for every sample one impulse-- 75 00:03:49,150 --> 00:03:52,660 appropriately timed, appropriately weighted. 76 00:03:52,660 --> 00:03:54,490 OK, that's great.
77 00:03:54,490 --> 00:03:55,990 It's especially nice because there's 78 00:03:55,990 --> 00:04:00,730 a simple Fourier representation for that process. 79 00:04:00,730 --> 00:04:03,580 That process, if we think about just taking x of t 80 00:04:03,580 --> 00:04:09,480 and turning it into this impulse reconstruction, 81 00:04:09,480 --> 00:04:11,760 that impulse reconstruction is precisely the same 82 00:04:11,760 --> 00:04:16,029 as if I had multiplied the original signal x of t 83 00:04:16,029 --> 00:04:18,390 by an impulse train. 84 00:04:18,390 --> 00:04:22,780 Impulses separated by capital T, unit height. 85 00:04:22,780 --> 00:04:25,270 So that means the transformation can be thought of in terms 86 00:04:25,270 --> 00:04:27,550 of Fourier transforms as the convolution 87 00:04:27,550 --> 00:04:30,250 of the original spectrum, the original Fourier 88 00:04:30,250 --> 00:04:34,180 transform, with the Fourier transform of the impulse train, 89 00:04:34,180 --> 00:04:37,270 which is just another impulse train. 90 00:04:37,270 --> 00:04:42,460 So the rule is you can represent all the information 91 00:04:42,460 --> 00:04:46,340 in the signal if the signal started out being bandlimited. 92 00:04:46,340 --> 00:04:46,840 OK. 93 00:04:46,840 --> 00:04:51,250 If this signal had a region of frequency over which it is 94 00:04:51,250 --> 00:04:55,420 non-0 and for the rest of frequency the signal is 0, 95 00:04:55,420 --> 00:04:59,260 then when you do the aliasing, you can arrange the period 96 00:04:59,260 --> 00:05:01,180 so that the aliased copy-- 97 00:05:01,180 --> 00:05:05,950 so that the convolved copies don't overlap with each other. 98 00:05:05,950 --> 00:05:06,680 OK. 99 00:05:06,680 --> 00:05:10,850 So that was a simple way of thinking about 100 00:05:10,850 --> 00:05:13,050 how much information was in the samples, 101 00:05:13,050 --> 00:05:15,410 by thinking about the impulse reconstruction.
102 00:05:15,410 --> 00:05:19,730 Of course, the signal that we reconstruct by this convolution 103 00:05:19,730 --> 00:05:24,530 process has multiple copies of the same frequency content. 104 00:05:24,530 --> 00:05:26,030 So we don't like that. 105 00:05:26,030 --> 00:05:28,760 So you can throw away those extra copies 106 00:05:28,760 --> 00:05:30,830 by doing a low-pass filtering operation. 107 00:05:30,830 --> 00:05:33,590 And we call that reconstruction-- the xr, 108 00:05:33,590 --> 00:05:36,870 we call that the bandlimited reconstruction. 109 00:05:36,870 --> 00:05:38,780 It's like the impulse reconstruction, 110 00:05:38,780 --> 00:05:41,050 except that it's bandlimited. 111 00:05:41,050 --> 00:05:42,410 OK. 112 00:05:42,410 --> 00:05:45,530 So we think of two ways of doing the reconstruction 113 00:05:45,530 --> 00:05:46,310 from the samples-- 114 00:05:46,310 --> 00:05:49,100 the impulse reconstruction, the bandlimited reconstruction. 115 00:05:49,100 --> 00:05:50,780 And the key is the sampling theorem. 116 00:05:50,780 --> 00:05:58,010 The sampling theorem says that if the original signal had 117 00:05:58,010 --> 00:06:02,720 non-zero frequency content over only some particular range 118 00:06:02,720 --> 00:06:07,160 of frequencies, you can sample fast enough so that you 119 00:06:07,160 --> 00:06:09,140 can represent all of the information that's 120 00:06:09,140 --> 00:06:13,400 in the continuous time signal with the samples. 121 00:06:13,400 --> 00:06:13,900 OK. 122 00:06:13,900 --> 00:06:15,080 Is that all clear? 123 00:06:15,080 --> 00:06:17,330 The point is we're trying to represent the information 124 00:06:17,330 --> 00:06:20,930 in a CT signal using DT. 125 00:06:20,930 --> 00:06:26,210 And that the Fourier transform is a way to visualize when you 126 00:06:26,210 --> 00:06:29,090 can do that and when you cannot do that. 
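The bandlimit condition in the sampling theorem can be illustrated with a small numerical sketch. The 8 samples-per-second rate and the 3 Hz and 5 Hz tones below are illustrative assumptions, not values from the lecture. The sketch shows that a tone above half the sampling rate produces exactly the same samples as a lower-frequency tone, so the samples alone cannot tell the two apart.

```python
import math

def sample_cosine(freq_hz, rate_hz, count):
    """Return `count` uniform samples of cos(2*pi*freq*t) taken at rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

rate = 8.0                            # samples per second (illustrative choice)
low = sample_cosine(3.0, rate, 16)    # 3 Hz is below rate/2, so it is representable
high = sample_cosine(5.0, rate, 16)   # 5 Hz is above rate/2 and aliases onto 8 - 5 = 3 Hz

# The two sample sequences agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(low, high))
print(max_diff)
```

If the signal is known to contain nothing above half the sampling rate, the 5 Hz reading is ruled out and the samples determine the signal.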
127 00:06:29,090 --> 00:06:32,720 You still end up in a physical system, 128 00:06:32,720 --> 00:06:35,780 perhaps generating signals whose frequency content 129 00:06:35,780 --> 00:06:38,410 falls out of that range. 130 00:06:38,410 --> 00:06:41,920 We saw an illustration of that last time. 131 00:06:41,920 --> 00:06:43,420 So for example, if you were to try 132 00:06:43,420 --> 00:06:47,080 to represent a signal with this transform using 133 00:06:47,080 --> 00:06:52,420 a sampling period T, so that the impulses in frequency 134 00:06:52,420 --> 00:06:54,910 were separated by 2 pi over T, which 135 00:06:54,910 --> 00:07:00,610 happened to be less than twice this distance, 136 00:07:00,610 --> 00:07:03,320 then it would alias. 137 00:07:03,320 --> 00:07:05,150 That's bad. 138 00:07:05,150 --> 00:07:06,980 So we would typically also include 139 00:07:06,980 --> 00:07:11,030 an anti-aliasing filter, pre-filter the signal 140 00:07:11,030 --> 00:07:13,940 from physics, get rid of the parts 141 00:07:13,940 --> 00:07:17,450 that you know are going to be a problem when you try to sample. 142 00:07:17,450 --> 00:07:20,450 Then, go ahead and do the regular sampling, 143 00:07:20,450 --> 00:07:22,790 the regular uniform sampling, the regular bandlimited 144 00:07:22,790 --> 00:07:24,380 reconstruction. 145 00:07:24,380 --> 00:07:26,840 And the signal that you reconstruct 146 00:07:26,840 --> 00:07:29,840 won't be an identical copy, but it 147 00:07:29,840 --> 00:07:34,070 will be as close as you can given the sampling theorem. 148 00:07:34,070 --> 00:07:35,170 OK. 149 00:07:35,170 --> 00:07:36,650 So that's what we did last time. 150 00:07:36,650 --> 00:07:39,260 What I want to do today is think about some other issues 151 00:07:39,260 --> 00:07:44,290 that come up when you try to represent a continuous signal 152 00:07:44,290 --> 00:07:46,880 in a discrete domain.
153 00:07:46,880 --> 00:07:49,150 So in addition to thinking about discretizing time, 154 00:07:49,150 --> 00:07:53,530 we also have to think about discretizing amplitude. 155 00:07:53,530 --> 00:07:57,140 Because if we want to represent a signal by bits-- 156 00:07:57,140 --> 00:07:59,470 so we have to represent not only the time, 157 00:07:59,470 --> 00:08:03,250 but also the amplitude in bits. 158 00:08:03,250 --> 00:08:04,930 I'll talk about several different kinds 159 00:08:04,930 --> 00:08:06,580 of schemes for that. 160 00:08:06,580 --> 00:08:09,100 In the simplest kinds of schemes, 161 00:08:09,100 --> 00:08:12,400 the code for the representation in amplitude 162 00:08:12,400 --> 00:08:14,680 is separately derived from the code 163 00:08:14,680 --> 00:08:17,300 for the representation in time. 164 00:08:17,300 --> 00:08:21,670 So we can think of it as two boxes, a sampling box followed 165 00:08:21,670 --> 00:08:24,470 by a quantization box. 166 00:08:24,470 --> 00:08:27,620 The first box, the sampling box, takes the CT signal of time 167 00:08:27,620 --> 00:08:29,420 and turns it into a DT signal. 168 00:08:29,420 --> 00:08:32,179 The second box takes the samples, 169 00:08:32,179 --> 00:08:35,840 which have a continuous domain, and turn them into samples 170 00:08:35,840 --> 00:08:37,640 from a finite domain-- 171 00:08:37,640 --> 00:08:39,770 from a discrete domain. 172 00:08:39,770 --> 00:08:41,600 OK. 173 00:08:41,600 --> 00:08:45,867 So if you're doing that kind of a quantization scheme, 174 00:08:45,867 --> 00:08:47,450 then the thing you have to think about 175 00:08:47,450 --> 00:08:49,040 is how many bits you're willing to use 176 00:08:49,040 --> 00:08:50,375 to represent each sample. 177 00:08:50,375 --> 00:08:52,250 I mean, this is the simplest kind of a scheme 178 00:08:52,250 --> 00:08:52,820 that you could use. 179 00:08:52,820 --> 00:08:55,361 There's much more complicated schemes by the end of the hour. 
180 00:08:55,361 --> 00:08:57,770 I'll tell you about a scheme that is 181 00:08:57,770 --> 00:08:59,892 much more efficient than this. 182 00:08:59,892 --> 00:09:01,350 But this is kind of the base level. 183 00:09:01,350 --> 00:09:02,600 This is where you would start. 184 00:09:02,600 --> 00:09:05,810 So if you wanted to represent an amplitude 185 00:09:05,810 --> 00:09:08,177 in a discrete representation, one way 186 00:09:08,177 --> 00:09:10,510 you could do about it-- one way you could think about it 187 00:09:10,510 --> 00:09:15,830 is to think about the map between the continuous values 188 00:09:15,830 --> 00:09:22,260 that the sample could acquire and map it to a discrete output 189 00:09:22,260 --> 00:09:23,040 set. 190 00:09:23,040 --> 00:09:28,980 So for example, if you were using 2 bits per sample, 191 00:09:28,980 --> 00:09:32,930 then you might represent any voltage between minus 1/2 192 00:09:32,930 --> 00:09:37,040 and 1/2 by some code 0, 1. 193 00:09:37,040 --> 00:09:42,260 Any voltage that's in the range 1/2 to 1 as the code 1, 0. 194 00:09:42,260 --> 00:09:46,550 And any voltage in the range minus 1 to minus 1/2 as 0, 0. 195 00:09:46,550 --> 00:09:49,239 That would be a way of taking a continuous range 196 00:09:49,239 --> 00:09:50,780 of possible amplitudes and turning it 197 00:09:50,780 --> 00:09:56,170 into a discrete number using just 2 bits. 198 00:09:56,170 --> 00:09:59,290 Obviously if you use more bits, you can get greater precision. 199 00:09:59,290 --> 00:10:02,680 What's showed below here is, what if my signal was 200 00:10:02,680 --> 00:10:04,200 a function of time-- 201 00:10:04,200 --> 00:10:07,350 looked like the red waveform. 202 00:10:07,350 --> 00:10:11,040 My discrete representation might look like the blue waveform, 203 00:10:11,040 --> 00:10:11,610 right? 204 00:10:11,610 --> 00:10:14,070 If I'm imagining that I only have 2 bits, 205 00:10:14,070 --> 00:10:19,440 then I only have 3 possible symmetric outputs. 
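The 2-bit mapping described above can be written out as a small sketch. The codes 00, 01, and 10 and the range boundaries come from the lecture. The output levels -1, 0, and 1, the handling of values exactly on a boundary, and the function name `quantize_3level` are assumptions made for illustration.

```python
def quantize_3level(v):
    """Map a voltage in [-1, 1] to a (code, level) pair, following the 2-bit example.

    The levels -1.0, 0.0, 1.0 and the treatment of the boundary values
    +/-0.5 are assumptions; the lecture only names the codes and ranges.
    """
    if v < -0.5:
        return "00", -1.0
    if v <= 0.5:
        return "01", 0.0
    return "10", 1.0

# A few sample amplitudes, the code each one receives, and the resulting error.
for v in [-0.9, -0.3, 0.0, 0.4, 0.8]:
    code, level = quantize_3level(v)
    print(v, code, level, v - level)
```

The last printed column plays the role of the green error waveform: the difference between the original amplitude and its quantized version, which here never exceeds half a step in magnitude.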
206 00:10:19,440 --> 00:10:21,150 So that might be represented by the blue. 207 00:10:21,150 --> 00:10:23,108 And the difference between the red and the blue 208 00:10:23,108 --> 00:10:24,250 is showed in the green. 209 00:10:24,250 --> 00:10:26,550 And as you can see as you go to more bits, 210 00:10:26,550 --> 00:10:28,330 you obviously get errors-- 211 00:10:28,330 --> 00:10:30,330 the green signal is getting smaller, right? 212 00:10:30,330 --> 00:10:32,610 So the key thing then is, how many bits 213 00:10:32,610 --> 00:10:38,730 do you need for the thing that you're trying to represent? 214 00:10:38,730 --> 00:10:40,710 So I like hearing. 215 00:10:40,710 --> 00:10:44,910 So I'll illustrate the number of bits by thinking about sound. 216 00:10:44,910 --> 00:10:48,390 You can hear sounds that range in amplitude over a range 217 00:10:48,390 --> 00:10:51,490 of about a million to 1. 218 00:10:51,490 --> 00:10:55,990 So if you were to put a person with good ears-- not me, 219 00:10:55,990 --> 00:10:56,800 one of you. 220 00:10:56,800 --> 00:10:59,680 If you were to put one of you into a quiet room 221 00:10:59,680 --> 00:11:02,200 and let you sit there until you adapted, and then played 222 00:11:02,200 --> 00:11:06,490 the faintest sound that you could possibly hear, 223 00:11:06,490 --> 00:11:09,820 then multiplied by 10, multiplied by 10, 224 00:11:09,820 --> 00:11:14,320 multiplied by 10, you could make it a million times 225 00:11:14,320 --> 00:11:16,480 more intense in pressure. 226 00:11:16,480 --> 00:11:18,610 You could amplify the pressure by a million 227 00:11:18,610 --> 00:11:20,125 before it'd start to hurt. 228 00:11:20,125 --> 00:11:21,910 It wouldn't damage yet. 229 00:11:21,910 --> 00:11:25,420 You'd have to go to about 8 million, 230 00:11:25,420 --> 00:11:27,580 and then it would start to damage.
231 00:11:27,580 --> 00:11:30,190 But you could do about a million to 1 over the range 232 00:11:30,190 --> 00:11:34,940 from just barely audible to starts to hurt. 233 00:11:34,940 --> 00:11:36,980 So how many bits would it take to do that range? 234 00:12:12,764 --> 00:12:14,780 So how many bits would it take? 235 00:12:14,780 --> 00:12:15,500 Raise your hands. 236 00:12:15,500 --> 00:12:16,666 Show me a number of fingers. 237 00:12:16,666 --> 00:12:20,850 How many bits would it take to represent a million to 1? 238 00:12:20,850 --> 00:12:21,710 OK. 239 00:12:21,710 --> 00:12:22,210 100%. 240 00:12:22,210 --> 00:12:23,670 I think it's 100%. 241 00:12:23,670 --> 00:12:25,950 So easy question. 242 00:12:25,950 --> 00:12:30,360 So if you use 1 bit, you can represent 2 levels. 243 00:12:30,360 --> 00:12:32,480 If you use 2 bits, you can do 4. 244 00:12:32,480 --> 00:12:33,826 8, 16, 32. 245 00:12:33,826 --> 00:12:35,950 By the time you get to 10 bits, you're up to 1,024. 246 00:12:35,950 --> 00:12:39,270 By the time you're up to 20 bits, you're up to 1,024 247 00:12:39,270 --> 00:12:41,970 squared. 248 00:12:41,970 --> 00:12:45,810 OK, 20 bits ought to do it. 249 00:12:45,810 --> 00:12:48,320 And in fact, 20 bits-- 250 00:12:48,320 --> 00:12:51,590 if you were to buy a high-end audio system, 251 00:12:51,590 --> 00:12:52,910 it would be 24-bits. 252 00:12:52,910 --> 00:12:55,340 There are people who claim you need 32. 253 00:12:55,340 --> 00:12:56,840 I think they're kind of crazy. 254 00:12:56,840 --> 00:13:01,070 But a high-end audio system would be a 24-bit system. 255 00:13:01,070 --> 00:13:04,490 Now, if you were to listen to sort of CD quality, 256 00:13:04,490 --> 00:13:07,100 CDs are 16 bits. 257 00:13:07,100 --> 00:13:09,920 So there are people, even me, who 258 00:13:09,920 --> 00:13:13,100 claim that they can tell the difference between a concert 259 00:13:13,100 --> 00:13:15,500 and a CD representation of a concert. 260 00:13:15,500 --> 00:13:16,220 OK. 
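The bit count worked out above (1 bit gives 2 levels, 10 bits give 1,024, 20 bits give 1,024 squared) can be checked with a short calculation. The variable names are illustrative.

```python
import math

dynamic_range = 1_000_000                    # loudest-to-faintest ratio, about a million to 1

# Smallest number of bits whose level count reaches the dynamic range.
bits = math.ceil(math.log2(dynamic_range))
levels = 2 ** bits

print(bits, levels)
print(levels == 1024 ** 2)                   # 20 bits is 10 bits "squared"
```

Nineteen bits give only 524,288 levels, which falls short of a million, so 20 is the smallest count that covers the range.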
261 00:13:16,220 --> 00:13:18,980 So there might be some limitations of representing 262 00:13:18,980 --> 00:13:21,920 audio with 16 bits. 263 00:13:21,920 --> 00:13:24,170 But what I'll show you is a demo where 264 00:13:24,170 --> 00:13:26,120 I've showed the same piece of music 265 00:13:26,120 --> 00:13:30,110 at 16 bits, 8 bits, 6 bits, 4 bits, 2 bits, and 1 266 00:13:30,110 --> 00:13:33,680 bit per sample, so that you get the idea of what a quantization 267 00:13:33,680 --> 00:13:34,816 error sounds like. 268 00:13:34,816 --> 00:13:35,316 Yes. 269 00:13:35,316 --> 00:13:36,774 AUDIENCE: So I think the difference 270 00:13:36,774 --> 00:13:40,144 between a concert and a CD, it's mainly because [INAUDIBLE]. 271 00:13:40,144 --> 00:13:42,560 DENNIS FREEMAN: There's lots of things that are different. 272 00:13:42,560 --> 00:13:44,600 And you're raising a very good point. 273 00:13:44,600 --> 00:13:49,730 You certainly don't get the spatial aspects of a concert. 274 00:13:49,730 --> 00:13:51,310 We try to fake you out. 275 00:13:51,310 --> 00:13:53,540 We put false cues in, so the violin 276 00:13:53,540 --> 00:13:55,820 sounds like it's on the right side. 277 00:13:55,820 --> 00:13:57,380 But those are all fake, usually. 278 00:13:57,380 --> 00:14:00,170 Well, they're not completely fake. 279 00:14:00,170 --> 00:14:01,580 And we have stereo. 280 00:14:01,580 --> 00:14:04,400 And we have 5 plus 1. 281 00:14:04,400 --> 00:14:06,350 So we have lots of different representations. 282 00:14:06,350 --> 00:14:08,915 But if you were to imagine listening in a concert 283 00:14:08,915 --> 00:14:11,030 monaurally. 284 00:14:11,030 --> 00:14:15,200 So plug your ear, clamp your head so you can't turn, 285 00:14:15,200 --> 00:14:21,650 and compare that to listening with a mono headphone, that's 286 00:14:21,650 --> 00:14:23,409 what I'm talking about. 287 00:14:23,409 --> 00:14:25,700 So if you didn't get spatial cues and things like that. 288 00:14:28,410 --> 00:14:29,470 OK. 
289 00:14:29,470 --> 00:14:33,020 So the issue then is to listen to different levels 290 00:14:33,020 --> 00:14:36,685 of quantization. 291 00:14:36,685 --> 00:14:40,178 [MUSIC PLAYING] 292 00:15:44,827 --> 00:15:47,160 DENNIS FREEMAN: So it's actually kind of amazing, right? 293 00:15:47,160 --> 00:15:50,336 You can sort of tell what the piece is the whole way down 294 00:15:50,336 --> 00:15:52,710 to-- how many of you could tell the difference between 16 295 00:15:52,710 --> 00:15:53,210 and 8? 296 00:15:55,838 --> 00:15:56,840 AUDIENCE: [INAUDIBLE] 297 00:15:56,840 --> 00:15:58,590 DENNIS FREEMAN: How many of you could tell 298 00:15:58,590 --> 00:15:59,881 the difference between 8 and 6? 299 00:16:02,700 --> 00:16:05,430 How many of you could tell any difference whatever? 300 00:16:05,430 --> 00:16:07,830 Just joking. 301 00:16:07,830 --> 00:16:11,350 What's the difference in the sound quality? 302 00:16:11,350 --> 00:16:13,467 What's the effect of quantizing? 303 00:16:13,467 --> 00:16:15,050 AUDIENCE: Fuzziness in the background. 304 00:16:15,050 --> 00:16:17,810 DENNIS FREEMAN: Kind of fuzzy. 305 00:16:17,810 --> 00:16:20,806 So could you simulate the fuzzy sound? 306 00:16:20,806 --> 00:16:22,430 What would you do if you wanted to sort 307 00:16:22,430 --> 00:16:24,859 of simulate the fuzzy sound? 308 00:16:24,859 --> 00:16:26,400 Besides, of course, quantizing, which 309 00:16:26,400 --> 00:16:30,246 would be a perfect simulation. 310 00:16:30,246 --> 00:16:31,650 AUDIENCE: [INAUDIBLE] 311 00:16:31,650 --> 00:16:32,790 DENNIS FREEMAN: Noise. 312 00:16:32,790 --> 00:16:34,920 It kind of sounds hissy. 313 00:16:34,920 --> 00:16:37,980 [HISSING] It sounds kind of noisy and that's 314 00:16:37,980 --> 00:16:41,000 kind of the point. 315 00:16:41,000 --> 00:16:42,680 And that's an important issue because it 316 00:16:42,680 --> 00:16:45,450 affects how much music you can put on any given medium. 
317 00:16:45,450 --> 00:16:48,590 So for example, in a CD, CDs are 16 bits 318 00:16:48,590 --> 00:16:53,030 per sample, 2 channels, 44.1 kilosamples per second, 319 00:16:53,030 --> 00:16:54,500 60 seconds per minute. 320 00:16:54,500 --> 00:16:59,660 74 minutes is a typical recording time for a CD. 321 00:16:59,660 --> 00:17:02,234 So you end up with about a gigabyte. 322 00:17:02,234 --> 00:17:03,650 And that's what you can put on one 323 00:17:03,650 --> 00:17:06,099 of those little plastic things. 324 00:17:06,099 --> 00:17:12,130 If you were willing to live with 8-bit instead of 16-bit, 325 00:17:12,130 --> 00:17:17,589 you could obviously put on 148 minutes. 326 00:17:17,589 --> 00:17:22,480 So people don't make these decisions lightly. 327 00:17:22,480 --> 00:17:24,970 It's how many people do you make angry 328 00:17:24,970 --> 00:17:26,916 for one reason or the other, right? 329 00:17:26,916 --> 00:17:29,290 You can make them angry because they don't get much music 330 00:17:29,290 --> 00:17:30,915 or you can make them angry because they 331 00:17:30,915 --> 00:17:34,030 don't get high quality, right? 332 00:17:34,030 --> 00:17:36,040 So you get to sort of trade-off the kind 333 00:17:36,040 --> 00:17:38,910 of people who hate you. 334 00:17:38,910 --> 00:17:40,660 But that's the kind of idea. 335 00:17:40,660 --> 00:17:42,460 So if you have a piece of plastic 336 00:17:42,460 --> 00:17:46,600 on which you can put 1 gigabyte, you 337 00:17:46,600 --> 00:17:50,140 have to think about how you're going to represent it. 338 00:17:50,140 --> 00:17:53,270 And it matters how frequently you sample. 339 00:17:53,270 --> 00:17:58,180 And also, with what quantization you represent each sample. 340 00:17:58,180 --> 00:18:00,460 Same sort of thing happens for pictures. 341 00:18:00,460 --> 00:18:04,060 Here's a relatively high-quality picture, 342 00:18:04,060 --> 00:18:09,250 where it's 280 by 280 pixels. 
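The CD storage figures quoted a moment ago (16 bits per sample, 2 channels, 44.1 kilosamples per second, 74 minutes) can be multiplied out directly. This is only a check of the arithmetic, with illustrative variable names.

```python
bits_per_sample = 16
channels = 2
sample_rate = 44_100          # samples per second, per channel
minutes = 74

bytes_per_second = (bits_per_sample // 8) * channels * sample_rate
total_bytes = bytes_per_second * minutes * 60
print(total_bytes)            # a bit under 10**9 bytes, i.e. "about a gigabyte"

# Same storage budget at 8 bits (1 byte) per sample.
bytes_per_minute_8bit = 1 * channels * sample_rate * 60
minutes_at_8_bits = total_bytes // bytes_per_minute_8bit
print(minutes_at_8_bits)
```

Halving the bits per sample doubles the playing time, which is where the 148 minutes comes from.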
343 00:18:09,250 --> 00:18:13,692 And it's an 8-bit representation in amplitude. 344 00:18:13,692 --> 00:18:15,400 The point's just that the kinds of things 345 00:18:15,400 --> 00:18:16,960 that happen when you quantize a picture 346 00:18:16,960 --> 00:18:19,293 are very similar to the same sorts of things that happen 347 00:18:19,293 --> 00:18:21,070 when you quantized audio. 348 00:18:21,070 --> 00:18:24,010 So if we take this picture and compare it to-- 349 00:18:24,010 --> 00:18:27,640 substitute for each pixel a quantized version 350 00:18:27,640 --> 00:18:29,380 of the amplitude. 351 00:18:29,380 --> 00:18:33,980 Quantized here to 8 bits and here to 7 bits. 352 00:18:33,980 --> 00:18:35,930 You might be able to see the difference. 353 00:18:35,930 --> 00:18:38,270 If I come up really close, I can certainly 354 00:18:38,270 --> 00:18:41,210 see quantization effects. 355 00:18:41,210 --> 00:18:58,740 If I drop the right one to 6, 5, 4, 3, 2, 1. 356 00:18:58,740 --> 00:18:59,240 OK. 357 00:18:59,240 --> 00:19:02,240 So here is 8 bits and 4 bits. 358 00:19:02,240 --> 00:19:05,630 Remember that when we thought about the audio example, 359 00:19:05,630 --> 00:19:07,130 it sounded fuzzy. 360 00:19:07,130 --> 00:19:08,500 It sounded hissy. 361 00:19:08,500 --> 00:19:12,514 [HISSING] What's the effect of quantizing here? 362 00:19:12,514 --> 00:19:14,462 Yeah. 363 00:19:14,462 --> 00:19:16,886 AUDIENCE: [INAUDIBLE] 364 00:19:16,886 --> 00:19:18,010 DENNIS FREEMAN: Sharp and-- 365 00:19:18,010 --> 00:19:18,890 say again? 366 00:19:18,890 --> 00:19:20,330 AUDIENCE: The contrast. 367 00:19:20,330 --> 00:19:22,413 DENNIS FREEMAN: Well, there's certainly a problem. 368 00:19:22,413 --> 00:19:25,930 So both of these pictures have high contrast, right? 369 00:19:25,930 --> 00:19:29,710 How would I see contrast in the pictures? 370 00:19:29,710 --> 00:19:32,320 Contrast refers to having big steps, 371 00:19:32,320 --> 00:19:34,300 step changes in brightness. 
372 00:19:34,300 --> 00:19:37,480 So like, I might see a high contrast between this petal 373 00:19:37,480 --> 00:19:39,740 and that leaf. 374 00:19:39,740 --> 00:19:42,820 And I still have a high contrast at the analogous place 375 00:19:42,820 --> 00:19:44,210 over here. 376 00:19:44,210 --> 00:19:46,870 So there is some contrast effects. 377 00:19:46,870 --> 00:19:48,970 A little more subtly, the contrast 378 00:19:48,970 --> 00:19:53,420 affects how well you see the quantization. 379 00:19:53,420 --> 00:19:55,820 So if I changed the picture to have 380 00:19:55,820 --> 00:19:57,470 different amounts of contrast, I could 381 00:19:57,470 --> 00:20:03,960 affect whether you could see the quantization well or poorly. 382 00:20:03,960 --> 00:20:06,890 So in audio, the effect of quantizing-- 383 00:20:06,890 --> 00:20:09,080 as I quantized more and more and more, 384 00:20:09,080 --> 00:20:12,950 I caused more and more hiss [HISSING] in the background. 385 00:20:12,950 --> 00:20:14,272 What's the effect here? 386 00:20:14,272 --> 00:20:15,605 What's the effect of quantizing? 387 00:20:15,605 --> 00:20:16,350 Yeah. 388 00:20:16,350 --> 00:20:18,472 AUDIENCE: You have less grays to work with. 389 00:20:18,472 --> 00:20:19,930 DENNIS FREEMAN: I have fewer grays. 390 00:20:19,930 --> 00:20:22,355 AUDIENCE: So 1-bit was just black and white. 391 00:20:22,355 --> 00:20:25,646 So as you increase bits, you get more grays-- 392 00:20:25,646 --> 00:20:26,770 DENNIS FREEMAN: Absolutely. 393 00:20:26,770 --> 00:20:30,010 Could you give me sort of a qualitative assessment 394 00:20:30,010 --> 00:20:31,990 of the kinds of errors that you see here 395 00:20:31,990 --> 00:20:34,822 compared to the kinds of errors that you don't see there? 396 00:20:34,822 --> 00:20:35,322 Yeah. 397 00:20:35,322 --> 00:20:36,146 AUDIENCE: [INAUDIBLE] 398 00:20:36,146 --> 00:20:37,479 DENNIS FREEMAN: There's banding. 399 00:20:37,479 --> 00:20:39,040 Why would there be banding?
400 00:20:39,040 --> 00:20:41,140 Nobody said the audio sounded like it was banded. 401 00:20:43,830 --> 00:20:46,270 We just don't hear that way, right? 402 00:20:46,270 --> 00:20:48,560 Even though we're doing a similar process, 403 00:20:48,560 --> 00:20:51,880 why do we see banding in pictures? 404 00:20:51,880 --> 00:20:53,290 What's causing the banding? 405 00:20:53,290 --> 00:20:53,854 Yeah. 406 00:20:53,854 --> 00:20:57,920 AUDIENCE: [INAUDIBLE] 407 00:20:57,920 --> 00:20:59,170 DENNIS FREEMAN: Yeah, exactly. 408 00:20:59,170 --> 00:21:02,410 So the pixels that are nearby-- 409 00:21:02,410 --> 00:21:06,940 so take the pixels here, which came from pixels over here. 410 00:21:06,940 --> 00:21:09,700 They have nearly the same gray value, 411 00:21:09,700 --> 00:21:14,350 but the quantizer is making up its mind 412 00:21:14,350 --> 00:21:15,940 at a very precise level. 413 00:21:15,940 --> 00:21:18,159 It's deciding, oh, you're between these two levels. 414 00:21:18,159 --> 00:21:19,075 Turn into this number. 415 00:21:19,075 --> 00:21:20,533 If you're between these two levels, 416 00:21:20,533 --> 00:21:22,100 turn into this other number. 417 00:21:22,100 --> 00:21:24,190 So you get the bands because there's 418 00:21:24,190 --> 00:21:28,660 correlations in the brightnesses of pixels that are nearby. 419 00:21:28,660 --> 00:21:30,520 So you get this banding thing that 420 00:21:30,520 --> 00:21:33,730 can be objectionable whenever the quantization is not 421 00:21:33,730 --> 00:21:35,070 sufficient. 422 00:21:35,070 --> 00:21:37,000 OK. 423 00:21:37,000 --> 00:21:41,560 So one way you can reduce that is called dithering. 424 00:21:41,560 --> 00:21:44,040 Dithering means add noise. 425 00:21:44,040 --> 00:21:45,770 So that's kind of weird. 426 00:21:45,770 --> 00:21:47,770 So I want to get rid of the bands. 427 00:21:47,770 --> 00:21:49,000 So what do I do? 428 00:21:49,000 --> 00:21:51,550 I take every pixel. 
429 00:21:51,550 --> 00:21:56,880 And before I quantize it, I add noise to it. 430 00:21:56,880 --> 00:22:00,220 Then even if the pixels came from a region that were nearly 431 00:22:00,220 --> 00:22:04,270 the same amplitude to start with, 432 00:22:04,270 --> 00:22:08,210 each individual pixel gets a different amount of noise 433 00:22:08,210 --> 00:22:11,680 so they quantize differently. 434 00:22:11,680 --> 00:22:15,160 And if I choose my noise in a clever way, 435 00:22:15,160 --> 00:22:18,310 I could use my noise to be plus or minus 1 quantum. 436 00:22:18,310 --> 00:22:20,500 So I could choose a random number generator 437 00:22:20,500 --> 00:22:23,440 that gave me numbers that were evenly 438 00:22:23,440 --> 00:22:27,130 distributed over the range minus 1/2 quantum 439 00:22:27,130 --> 00:22:29,410 to plus 1/2 quantum. 440 00:22:29,410 --> 00:22:31,630 And if I do that, then I can generate 441 00:22:31,630 --> 00:22:35,650 a picture that is quantized but was dithered 442 00:22:35,650 --> 00:22:37,600 before it was quantized. 443 00:22:37,600 --> 00:22:41,140 So the two pictures are both quantized at the level 444 00:22:41,140 --> 00:22:44,710 of 7 bits, but the one on the right 445 00:22:44,710 --> 00:22:46,550 had dither added to it first. 446 00:22:46,550 --> 00:22:51,430 So I'm adding noise before I do the quantization. 447 00:22:51,430 --> 00:22:53,494 And you can't see too much at 7. 448 00:22:53,494 --> 00:23:01,181 6, 5, 4, 3. 449 00:23:01,181 --> 00:23:01,680 OK. 450 00:23:01,680 --> 00:23:03,790 So what's the difference between the two? 451 00:23:03,790 --> 00:23:09,240 Well, over here I had these bands 452 00:23:09,240 --> 00:23:12,300 because the amplitudes were such that they all got 453 00:23:12,300 --> 00:23:14,940 converted into the same output. 454 00:23:14,940 --> 00:23:16,635 The bands have disappeared over there. 455 00:23:19,365 --> 00:23:21,150 2. 456 00:23:21,150 --> 00:23:25,440 Even 1 the bands have disappeared, right? 
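The dithering step described above (add noise spread evenly over minus 1/2 quantum to plus 1/2 quantum, then quantize) can be sketched as follows. The rounding quantizer, the step size of 1, the slowly varying ramp, and the fixed random seed are all assumptions chosen for illustration. This is not the code behind the lecture's pictures.

```python
import random

def quantize(v, step):
    """Round v to the nearest multiple of step (a uniform quantizer)."""
    return step * round(v / step)

def dithered_quantize(v, step, rng):
    """Add noise uniform over +/- half a step, then quantize."""
    noise = rng.uniform(-step / 2, step / 2)
    return quantize(v + noise, step)

rng = random.Random(0)                           # fixed seed so the run is repeatable
step = 1.0
ramp = [0.30 + 0.001 * i for i in range(100)]    # nearly constant "brightness" values

plain = [quantize(v, step) for v in ramp]
dithered = [dithered_quantize(v, step, rng) for v in ramp]

print(set(plain))                        # one level only: the whole ramp is a single band
print(sorted(set(dithered)))             # neighbouring values now land on different levels
print(sum(dithered) / len(dithered))     # the local average sits near the ramp's mean
```

Without dither every value in the ramp collapses to the same output, which is the band. With dither the outputs scatter between two levels, so the band breaks up at the cost of a noisy-looking result.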
457 00:23:25,440 --> 00:23:27,380 But that's obviously not a good solution. 458 00:23:27,380 --> 00:23:28,760 So what's wrong with dither? 459 00:23:33,750 --> 00:23:34,674 AUDIENCE: Noisy. 460 00:23:34,674 --> 00:23:35,840 DENNIS FREEMAN: Noisy, yeah. 461 00:23:35,840 --> 00:23:38,540 I'm kind of going back to the hiss thing, right? 462 00:23:38,540 --> 00:23:42,045 Now, I've taken a picture that had had bands 463 00:23:42,045 --> 00:23:44,170 and I've turned it into a picture that looks noisy. 464 00:23:47,250 --> 00:23:50,570 There's a way to think about how the noise works. 465 00:23:50,570 --> 00:23:53,710 Imagine that I had a smoothly-varying signal showed 466 00:23:53,710 --> 00:23:59,330 in blue that was being turned from a continuous range 467 00:23:59,330 --> 00:24:02,220 of amplitudes into a discrete range of amplitudes. 468 00:24:02,220 --> 00:24:04,350 So let's represent the discrete amplitudes 469 00:24:04,350 --> 00:24:07,340 by the dashed red lines. 470 00:24:07,340 --> 00:24:09,710 Then, the signal that I might quantize 471 00:24:09,710 --> 00:24:12,184 could look like the red signal. 472 00:24:12,184 --> 00:24:13,850 And that's a very graphic representation 473 00:24:13,850 --> 00:24:16,910 of where the bands come from. 474 00:24:16,910 --> 00:24:20,480 So the bands come from the fact that the original signal 475 00:24:20,480 --> 00:24:27,390 sliced through a small number of quantized outputs. 476 00:24:27,390 --> 00:24:30,400 Everybody see where the bands are? 477 00:24:30,400 --> 00:24:34,780 Then, if I add dither, I can think about-- 478 00:24:34,780 --> 00:24:37,160 so this transformation from blue to red, 479 00:24:37,160 --> 00:24:40,550 I can think about that as being y equals Q of x. 480 00:24:40,550 --> 00:24:45,100 So x is the blue line, Q of x is the red line. 481 00:24:45,100 --> 00:24:47,020 Down here, what I've done is I've 482 00:24:47,020 --> 00:24:48,760 taken x and added noise to it. 
483 00:24:48,760 --> 00:24:52,330 Then, I ran it through the same quantizer. 484 00:24:52,330 --> 00:24:55,134 And you can see that I've broken up the bands, 485 00:24:55,134 --> 00:24:57,175 but you can see that I've added a bunch of noise. 486 00:24:59,755 --> 00:25:01,380 So there's a slightly more clever thing 487 00:25:01,380 --> 00:25:03,510 that we can do that's called Roberts' technique. 488 00:25:03,510 --> 00:25:07,560 Larry Roberts was a master's student here. 489 00:25:07,560 --> 00:25:09,660 He was here before I was here, which 490 00:25:09,660 --> 00:25:11,790 is kind of a remarkable thing. 491 00:25:11,790 --> 00:25:15,480 But they actually wrote theses back then and they used paper. 492 00:25:15,480 --> 00:25:18,870 And you can go to the library and it's still there. 493 00:25:18,870 --> 00:25:22,440 So Larry thought of a method for dealing 494 00:25:22,440 --> 00:25:29,340 with this where what you do is you take the original signal x, 495 00:25:29,340 --> 00:25:31,350 you add n to it and quantize it, but then you 496 00:25:31,350 --> 00:25:32,350 subtract n back off. 497 00:25:35,659 --> 00:25:37,200 And that's called Roberts' technique. 498 00:25:37,200 --> 00:25:40,920 And that's illustrated by this transformation. 499 00:25:40,920 --> 00:25:45,800 The good thing about this transformation is that this-- 500 00:25:45,800 --> 00:25:48,420 so here, the quantization error was clearly 501 00:25:48,420 --> 00:25:50,850 correlated with the signal. 502 00:25:50,850 --> 00:25:53,100 That's what banding is, right? 503 00:25:53,100 --> 00:25:55,410 Something about the signal turned into something 504 00:25:55,410 --> 00:25:58,580 about the error. 505 00:25:58,580 --> 00:26:05,150 Here, the error is still correlated with the signal. 506 00:26:05,150 --> 00:26:07,880 The correlation is less obvious, right? 507 00:26:07,880 --> 00:26:12,980 But here is a range of errors that are all positive.
508 00:26:12,980 --> 00:26:16,860 And here is a range of errors that are all negative. 509 00:26:16,860 --> 00:26:20,280 So the errors are still correlated 510 00:26:20,280 --> 00:26:23,220 with the original signal. 511 00:26:23,220 --> 00:26:25,470 So the result-- and when you do Roberts' technique, 512 00:26:25,470 --> 00:26:28,000 you destroy the correlation. 513 00:26:28,000 --> 00:26:31,240 So with Roberts' technique, you end up with-- it's still noisy. 514 00:26:31,240 --> 00:26:34,000 Because after all, I added noise to it. 515 00:26:34,000 --> 00:26:35,920 But I've added it in a very clever way 516 00:26:35,920 --> 00:26:39,280 that removes the correlation between the error 517 00:26:39,280 --> 00:26:40,360 and the signal. 518 00:26:40,360 --> 00:26:45,130 And the result is that the noise seems less. 519 00:26:45,130 --> 00:26:48,030 So if you compare 6 bits with dither 520 00:26:48,030 --> 00:26:51,400 to 6 bits with Roberts' method, both pictures 521 00:26:51,400 --> 00:26:54,310 are represented by 6 bits. 522 00:26:54,310 --> 00:26:55,870 5 bits, 5 bits. 523 00:26:55,870 --> 00:26:57,415 4, 3. 524 00:27:01,040 --> 00:27:04,160 So the interesting thing is that the Roberts method 525 00:27:04,160 --> 00:27:06,650 looks like less noise. 526 00:27:06,650 --> 00:27:08,720 It's mathematically not. 527 00:27:08,720 --> 00:27:11,120 Mathematically, you can show that Roberts' technique 528 00:27:11,120 --> 00:27:14,570 has the same energy in the noise as was in the dither 529 00:27:14,570 --> 00:27:15,620 technique. 530 00:27:15,620 --> 00:27:19,730 If you just calculate the energy in the error, 531 00:27:19,730 --> 00:27:21,410 they're identical. 532 00:27:21,410 --> 00:27:24,350 But in Roberts' technique, he destroys the correlation 533 00:27:24,350 --> 00:27:27,380 and that makes the noise seem smaller. 534 00:27:27,380 --> 00:27:30,350 It's like physically less objectionable. 535 00:27:32,860 --> 00:27:34,992 What's the problem with Roberts' technique?
536 00:27:38,610 --> 00:27:40,980 If I told you to implement a scheme 537 00:27:40,980 --> 00:27:48,440 that quantized according to Roberts' technique. 538 00:27:48,440 --> 00:27:52,010 And say you're here and you're supposed to quantize a message, 539 00:27:52,010 --> 00:27:56,477 send it over the ethernet, and receive it in California. 540 00:27:56,477 --> 00:27:58,310 And you're only supposed to be sending, say, 541 00:27:58,310 --> 00:28:02,060 a 6-bit representation instead of a 16-bit representation. 542 00:28:02,060 --> 00:28:05,920 What's hard about Roberts' technique compared to dither? 543 00:28:05,920 --> 00:28:07,220 Quantizing is easy, right? 544 00:28:07,220 --> 00:28:10,104 I take my 16-bit CD. 545 00:28:10,104 --> 00:28:11,270 I take off the first sample. 546 00:28:11,270 --> 00:28:12,270 I quantize it. 547 00:28:12,270 --> 00:28:15,000 I send it across the internet. 548 00:28:15,000 --> 00:28:16,190 I take off my second sample. 549 00:28:16,190 --> 00:28:16,830 I quantize it. 550 00:28:16,830 --> 00:28:20,670 I send those 6 bits over the internet, et cetera. 551 00:28:20,670 --> 00:28:22,666 Dither is sort of the same thing. 552 00:28:22,666 --> 00:28:23,790 I pick up the first sample. 553 00:28:23,790 --> 00:28:24,750 I add noise to it. 554 00:28:24,750 --> 00:28:25,440 I quantize it. 555 00:28:25,440 --> 00:28:28,505 I send those 6 bits over the internet. 556 00:28:28,505 --> 00:28:29,880 What's the hard part of Roberts'? 557 00:28:35,127 --> 00:28:37,450 Yeah. 558 00:28:37,450 --> 00:28:38,950 AUDIENCE: Do you send the noise too? 559 00:28:38,950 --> 00:28:41,880 DENNIS FREEMAN: I have to send the noise, too. 560 00:28:41,880 --> 00:28:44,530 I have to know the precise value of the noise 561 00:28:44,530 --> 00:28:50,590 that I added to sample n, so I can subtract it back out. 562 00:28:50,590 --> 00:28:54,730 So Roberts' technique says, I take the value x 563 00:28:54,730 --> 00:28:56,890 and I add some amount of noise n.
564 00:28:56,890 --> 00:28:58,330 n was a random number. 565 00:28:58,330 --> 00:29:01,030 I chose it by throwing a die or something. 566 00:29:01,030 --> 00:29:03,520 I quantize that, and then I subtract that same number 567 00:29:03,520 --> 00:29:04,270 back out. 568 00:29:04,270 --> 00:29:06,700 Well, that number has to be precise compared 569 00:29:06,700 --> 00:29:09,560 to the quantization levels. 570 00:29:09,560 --> 00:29:11,740 So for example, people would normally use-- 571 00:29:11,740 --> 00:29:14,110 if I'm doing 16-bit audio, people would normally 572 00:29:14,110 --> 00:29:18,490 use a 16-bit representation for n, 573 00:29:18,490 --> 00:29:21,280 which means that I take a 16-bit number off the CD. 574 00:29:21,280 --> 00:29:24,210 I take a random number. 575 00:29:24,210 --> 00:29:26,436 I add it, quantize it. 576 00:29:26,436 --> 00:29:30,190 And now, I can send the 6-bit number. 577 00:29:30,190 --> 00:29:32,480 But in order for that guy to reproduce the answer, 578 00:29:32,480 --> 00:29:35,865 he has to know n too. 579 00:29:35,865 --> 00:29:38,440 Everybody see that? 580 00:29:38,440 --> 00:29:40,637 So the problem is, how do you send the noise? 581 00:29:40,637 --> 00:29:42,220 And the trick is that we use something 582 00:29:42,220 --> 00:29:43,510 called pseudo random noise. 583 00:29:43,510 --> 00:29:46,090 Pseudo random noise is an algorithm 584 00:29:46,090 --> 00:29:50,890 that generates a sequence of numbers that looks random, 585 00:29:50,890 --> 00:29:52,840 but they were made algorithmically. 586 00:29:52,840 --> 00:29:55,900 So you can independently manufacture the same sequence 587 00:29:55,900 --> 00:29:58,711 here and there. 588 00:29:58,711 --> 00:30:00,210 That way, if you're using the same-- 589 00:30:00,210 --> 00:30:02,320 if you pre-agree that you're going to use the same 590 00:30:02,320 --> 00:30:06,850 algorithm, you can independently generate the same sequence 591 00:30:06,850 --> 00:30:07,480 of n's.
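The add-noise, quantize, subtract-noise scheme with a pre-agreed pseudo-random sequence might look roughly like the sketch below. The names `transmit`, `receive`, and `noise_stream`, the 3-bit quantizer, and the seed value are all hypothetical choices for illustration, and Python's `random.Random` stands in for whatever generator the two ends would agree on. Only the small integer indices cross the channel; the receiver regenerates the noise from the shared seed.

```python
import math
import random

STEP = 1.0 / 2 ** 3  # assumed 3-bit quantizer on [0, 1)

def quantize_index(v):
    """Integer bin index (0..7) that the transmitter would send."""
    return min(7, max(0, math.floor(v / STEP)))

def index_to_value(i):
    """Amplitude at the centre of bin i."""
    return (i + 0.5) * STEP

def noise_stream(seed):
    """Pseudo-random noise, uniform on [-STEP/2, +STEP/2], reproducible from seed."""
    rng = random.Random(seed)
    while True:
        yield rng.uniform(-STEP / 2, STEP / 2)

def transmit(samples, seed):
    """Add noise n to each sample x, quantize, and send only the bin index."""
    noise = noise_stream(seed)
    return [quantize_index(x + next(noise)) for x in samples]

def receive(indices, seed):
    """Regenerate the same noise from the shared seed and subtract it back out."""
    noise = noise_stream(seed)
    return [index_to_value(i) - next(noise) for i in indices]

SHARED_SEED = 6003  # hypothetical value both ends agreed on in advance
samples = [0.2 + 0.6 * i / 499 for i in range(500)]

sent = transmit(samples, SHARED_SEED)
recovered = receive(sent, SHARED_SEED)
errors = [r - x for r, x in zip(recovered, samples)]

# The reconstruction error never exceeds half a quantization step.
print(max(abs(e) for e in errors) <= STEP / 2 + 1e-12)
```

If the receiver used a different seed, it would subtract the wrong noise, which is why the algorithm and its starting state have to be agreed on beforehand.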
592 00:30:10,580 --> 00:30:12,548 OK. 593 00:30:12,548 --> 00:30:14,785 Yeah, so I jumped back to explain-- 594 00:30:17,281 --> 00:30:17,780 OK. 595 00:30:17,780 --> 00:30:23,920 So the point is that just like in audio, in pictures 596 00:30:23,920 --> 00:30:28,000 it's important how many bits you quantize to. 597 00:30:28,000 --> 00:30:31,840 That affects drastically the performance of communications 598 00:30:31,840 --> 00:30:33,030 or storage devices. 599 00:30:33,030 --> 00:30:34,780 How many pictures can you store someplace? 600 00:30:34,780 --> 00:30:37,270 How many pictures can you put on your iPhone? 601 00:30:37,270 --> 00:30:39,580 So all of that matters quite a bit. 602 00:30:39,580 --> 00:30:44,320 And the code that you use is very important. 603 00:30:44,320 --> 00:30:46,549 And you're not limited to just-- 604 00:30:46,549 --> 00:30:47,590 I have two more examples. 605 00:30:50,120 --> 00:30:52,990 So the simplest possible schemes are the ones 606 00:30:52,990 --> 00:30:54,490 that I've showed so far where you 607 00:30:54,490 --> 00:30:59,680 think about the sampling in time and the quantization 608 00:30:59,680 --> 00:31:02,780 in amplitude as separate processes. 609 00:31:02,780 --> 00:31:04,277 You don't have to do that. 610 00:31:04,277 --> 00:31:06,110 In fact, you can get much higher performance 611 00:31:06,110 --> 00:31:08,340 if you combine the two. 612 00:31:08,340 --> 00:31:10,310 So the first combination I want to think about 613 00:31:10,310 --> 00:31:13,940 is trading off precision for speed. 614 00:31:13,940 --> 00:31:17,010 And that's something that we call progressive refinement. 615 00:31:17,010 --> 00:31:19,610 The idea is, imagine that I want to make 616 00:31:19,610 --> 00:31:24,370 a digital representation of all the paintings in the Louvre. 617 00:31:24,370 --> 00:31:24,920 OK. 
618 00:31:24,920 --> 00:31:30,434 It doesn't make sense to do 200 by 200 at 6-bit resolution 619 00:31:30,434 --> 00:31:32,350 if you were looking at pictures in the Louvre. 620 00:31:32,350 --> 00:31:33,808 That doesn't make any sense, right? 621 00:31:33,808 --> 00:31:36,930 You would like to see a high-resolution version. 622 00:31:36,930 --> 00:31:38,290 OK. 623 00:31:38,290 --> 00:31:39,880 And now you're a user, and what you'd 624 00:31:39,880 --> 00:31:42,670 like to do is leaf through them and find 625 00:31:42,670 --> 00:31:45,250 photos of something or other. 626 00:31:45,250 --> 00:31:46,930 Scenes of some type. 627 00:31:46,930 --> 00:31:47,680 OK. 628 00:31:47,680 --> 00:31:50,650 Well if you've got a high-resolution representation 629 00:31:50,650 --> 00:31:53,260 and you're trying to thumb through a lot of images. 630 00:31:53,260 --> 00:31:55,480 The problem is, if each one is represented 631 00:31:55,480 --> 00:31:58,960 with high resolution, that can take a long time. 632 00:31:58,960 --> 00:32:01,000 So if you didn't do something clever, 633 00:32:01,000 --> 00:32:05,720 basically you would have to download the Louvre before you 634 00:32:05,720 --> 00:32:07,710 could do your search. 635 00:32:07,710 --> 00:32:09,920 So the idea in progressive refinement 636 00:32:09,920 --> 00:32:15,410 is first send me a crude representation. 637 00:32:15,410 --> 00:32:18,740 And if I haven't changed in my browser, 638 00:32:18,740 --> 00:32:21,620 if I'm still looking at the same picture three seconds later, 639 00:32:21,620 --> 00:32:23,840 continue to load the information that 640 00:32:23,840 --> 00:32:27,670 makes the picture increasingly precise. 641 00:32:27,670 --> 00:32:30,480 Give me a crude representation as soon as you can. 642 00:32:30,480 --> 00:32:36,740 And then if I sit there, give me a more refined representation. 
643 00:32:36,740 --> 00:32:40,640 But if I leaf to someplace else, stop downloading that one 644 00:32:40,640 --> 00:32:42,332 and give me a crude representation 645 00:32:42,332 --> 00:32:43,040 of the new place. 646 00:32:43,040 --> 00:32:44,580 That's the idea. 647 00:32:44,580 --> 00:32:48,530 So the way you can do that is with discrete sampling. 648 00:32:48,530 --> 00:32:51,680 I started with a digital representation of a painting 649 00:32:51,680 --> 00:32:52,700 in the Louvre. 650 00:32:52,700 --> 00:32:59,870 Maybe it was 20,000 by 20,000 with 24 bits of color-- 651 00:32:59,870 --> 00:33:02,180 some huge picture. 652 00:33:02,180 --> 00:33:04,540 So what I'll do is I'll sample it. 653 00:33:04,540 --> 00:33:07,070 But this time, it's DT sampling. 654 00:33:07,070 --> 00:33:10,400 DT sampling-- you'll be completely shocked to hear 655 00:33:10,400 --> 00:33:11,510 this-- 656 00:33:11,510 --> 00:33:14,300 is completely analogous to CT sampling. 657 00:33:14,300 --> 00:33:15,980 It's almost the same thing. 658 00:33:18,650 --> 00:33:20,451 That shouldn't be too big of a surprise, 659 00:33:20,451 --> 00:33:21,950 all of the different transforms, all 660 00:33:21,950 --> 00:33:24,116 the different Fourier representations that we looked 661 00:33:24,116 --> 00:33:26,760 at, are almost the same thing. 662 00:33:26,760 --> 00:33:29,270 So DT sampling turns out to work almost exactly 663 00:33:29,270 --> 00:33:31,310 like CT sampling. 664 00:33:31,310 --> 00:33:36,770 So think about what you would do if you wanted to take a picture 665 00:33:36,770 --> 00:33:39,560 and represent it with a factor of 3 fewer 666 00:33:39,560 --> 00:33:43,850 pixels in the horizontal and a factor of 3 fewer 667 00:33:43,850 --> 00:33:45,890 pixels in the vertical. 668 00:33:45,890 --> 00:33:47,570 Well, you would sample it. 669 00:33:47,570 --> 00:33:51,470 In CT, we would think about multiplying the CT signal 670 00:33:51,470 --> 00:33:54,500 x of t by an impulse train.
671 00:33:54,500 --> 00:33:57,650 Here, we use a unit-sample train. 672 00:33:57,650 --> 00:34:00,780 So we think about an original signal x of n. 673 00:34:00,780 --> 00:34:03,340 And we think about a sampling waveform 674 00:34:03,340 --> 00:34:09,350 that's now an infinite unit-sample train. 675 00:34:09,350 --> 00:34:11,960 We used to use an infinite impulse train, 676 00:34:11,960 --> 00:34:14,840 now we're using an infinite unit-sample train. 677 00:34:14,840 --> 00:34:17,270 So we preserve every third sample 678 00:34:17,270 --> 00:34:20,389 and throw away the ones between. 679 00:34:20,389 --> 00:34:25,190 So that's a way of generating a new picture that 680 00:34:25,190 --> 00:34:27,334 only has one third of the information that 681 00:34:27,334 --> 00:34:28,500 was in the original picture. 682 00:34:28,500 --> 00:34:31,730 And as I said before, it should come as no surprise 683 00:34:31,730 --> 00:34:34,550 that the math for thinking about this sampling process 684 00:34:34,550 --> 00:34:36,679 is virtually identical to the math 685 00:34:36,679 --> 00:34:40,310 that you need to think about the CT sampling problem. 686 00:34:40,310 --> 00:34:43,040 In particular, the key is to think about the Fourier 687 00:34:43,040 --> 00:34:45,020 representation. 688 00:34:45,020 --> 00:34:48,500 If this were the original Fourier signal, 689 00:34:48,500 --> 00:34:53,210 if this were the Fourier representation of this signal, 690 00:34:53,210 --> 00:34:55,969 we have to think about the Fourier representation 691 00:34:55,969 --> 00:35:02,930 for the sampling signal, the infinite unit-sample train. 692 00:35:02,930 --> 00:35:05,597 An infinite unit-sample train, not surprisingly, 693 00:35:05,597 --> 00:35:08,180 the transform of that's going to be an infinite impulse train. 694 00:35:10,730 --> 00:35:13,260 All DT transforms are periodic in 2 pi. 695 00:35:13,260 --> 00:35:15,560 That's a property of DT signals.
696 00:35:15,560 --> 00:35:18,170 That's a property of the unit circle. 697 00:35:18,170 --> 00:35:20,480 So we're not surprised to see that this signal was 698 00:35:20,480 --> 00:35:22,760 periodic in 2 pi. 699 00:35:22,760 --> 00:35:25,070 This signal is also periodic in 2 pi. 700 00:35:25,070 --> 00:35:26,200 That's because it's DT. 701 00:35:26,200 --> 00:35:29,950 But it's also periodic in one third of that. 702 00:35:29,950 --> 00:35:35,630 That's because of the periodicity here. 703 00:35:35,630 --> 00:35:36,520 OK. 704 00:35:36,520 --> 00:35:40,930 So if we had had a sample at each one of these, 705 00:35:40,930 --> 00:35:45,336 then the base periodicity would have been 2 pi. 706 00:35:45,336 --> 00:35:48,720 But here, because of the periodicity 707 00:35:48,720 --> 00:35:54,980 being 1 every third sample, we get 3 times that many impulses. 708 00:35:54,980 --> 00:35:59,090 So just like in CT sampling, we think 709 00:35:59,090 --> 00:36:01,310 about multiplying the original waveform 710 00:36:01,310 --> 00:36:04,610 by a sampling waveform that preserves only the information 711 00:36:04,610 --> 00:36:06,120 at the samples. 712 00:36:06,120 --> 00:36:07,610 We do the same thing here. 713 00:36:07,610 --> 00:36:10,950 Multiplication in time is convolution in frequency. 714 00:36:10,950 --> 00:36:12,920 So we take the original signal, we convolve it, 715 00:36:12,920 --> 00:36:17,540 and this is what comes out of that sampling process. 716 00:36:17,540 --> 00:36:23,570 We get the same rule for the sampling theorem 717 00:36:23,570 --> 00:36:25,290 that we got for CT. 718 00:36:28,100 --> 00:36:31,580 This process has to be such that when you do the convolution, 719 00:36:31,580 --> 00:36:37,500 the resulting nearest neighbors shouldn't overlap. 720 00:36:37,500 --> 00:36:42,870 So there is a maximum frequency for the discrete system, 721 00:36:42,870 --> 00:36:46,070 just like there was a maximum frequency for the CT system. 
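The statement that multiplying by a unit-sample train convolves the spectrum with an impulse train can be checked numerically. The sketch below is only a finite-length stand-in: it uses a 12-point DFT in place of the DT Fourier transform (so the convolution is circular), and the test row `x` is an arbitrary made-up signal. Keeping every third sample should turn the spectrum into the average of three copies of the original, shifted by one third of the period each.

```python
import cmath

def dft(x):
    """Plain O(N^2) discrete Fourier transform, no scaling."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

N = 12
x = [1.0, 0.9, 0.7, 0.4, 0.2, 0.1, 0.05, 0.1, 0.2, 0.4, 0.7, 0.9]  # arbitrary test row
p = [1.0 if n % 3 == 0 else 0.0 for n in range(N)]  # unit-sample train with period 3
xp = [a * b for a, b in zip(x, p)]                  # keep every third sample, zero the rest

X = dft(x)
Xp = dft(xp)

# Multiplication in time <-> convolution in frequency: the sampled spectrum is
# the average of three copies of X, shifted by 0, N/3 and 2N/3 bins.
shift = N // 3
replicas = [sum(X[(k - m * shift) % N] for m in range(3)) / 3 for k in range(N)]

print(max(abs(a - b) for a, b in zip(Xp, replicas)) < 1e-9)
```

If the original spectrum is wider than one third of the period, the shifted copies overlap and add, which is the aliasing condition stated in the lecture.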
722 00:36:48,810 --> 00:36:49,890 There's one more step. 723 00:36:49,890 --> 00:36:52,800 Obviously, if I sample the picture at the Louvre, 724 00:36:52,800 --> 00:36:54,300 I don't want to send the 0's. 725 00:36:54,300 --> 00:36:57,070 That doesn't make any sense. 726 00:36:57,070 --> 00:37:00,640 So in order to not send the 0's, I smash together 727 00:37:00,640 --> 00:37:03,240 the non-0 samples. 728 00:37:03,240 --> 00:37:05,550 That's illustrated here. 729 00:37:05,550 --> 00:37:10,320 Smashing in time does what in frequency? 730 00:37:10,320 --> 00:37:11,320 AUDIENCE: [INAUDIBLE] 731 00:37:11,320 --> 00:37:16,210 DENNIS FREEMAN: Squish in time, stretch in frequency. 732 00:37:16,210 --> 00:37:18,630 They're reciprocal spaces, right? 733 00:37:18,630 --> 00:37:20,850 Frequency and time are reciprocal spaces. 734 00:37:20,850 --> 00:37:23,830 Smash in time, stretch in frequency. 735 00:37:23,830 --> 00:37:31,610 So the result is that when you smash the 0 entries out 736 00:37:31,610 --> 00:37:34,820 of the signal, you stretch the frequency representation 737 00:37:34,820 --> 00:37:36,680 by a factor of 3. 738 00:37:36,680 --> 00:37:38,510 And when you stretch by a factor of 3, 739 00:37:38,510 --> 00:37:41,540 this peak, which was at 1/3 of 2 pi, 740 00:37:41,540 --> 00:37:45,000 moves the whole way out to 2 pi. 741 00:37:45,000 --> 00:37:46,080 OK. 742 00:37:46,080 --> 00:37:49,650 So the idea then is that I've got this beautiful picture 743 00:37:49,650 --> 00:37:52,270 in the Louvre. 744 00:37:52,270 --> 00:37:52,770 Maybe. 745 00:37:56,490 --> 00:38:01,830 In order to send a lower resolution version of that, 746 00:38:01,830 --> 00:38:03,840 what I do is I low-pass filter it 747 00:38:03,840 --> 00:38:07,550 because I don't want the frequencies to alias. 748 00:38:07,550 --> 00:38:09,890 So I low-pass filter it. 749 00:38:09,890 --> 00:38:16,630 That gives me a representation that I can then downsample. 750 00:38:16,630 --> 00:38:17,290 OK. 
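Why the low-pass filter has to come before the downsampling can be seen with a single sinusoid. In this sketch the three-point circular moving average is just a crude stand-in for a proper anti-aliasing filter, and the signal and helper names are assumptions for illustration. A component at frequency 2 pi / 3 is too fast to survive downsampling by 3. Without filtering it masquerades as a constant; with filtering it is removed first.

```python
import math

def downsample(x, factor):
    """Keep every factor-th sample and squeeze out the rest."""
    return x[::factor]

def moving_average(x, width):
    """Crude low-pass filter: circular moving average over `width` samples."""
    N = len(x)
    return [sum(x[(n + i) % N] for i in range(width)) / width for n in range(N)]

N = 12
fast = [math.cos(2 * math.pi * n / 3) for n in range(N)]  # frequency 2*pi/3

aliased = downsample(fast, 3)                      # no filtering first
filtered = downsample(moving_average(fast, 3), 3)  # low-pass, then downsample

print([round(v, 6) for v in aliased])        # every kept sample is 1: looks like DC
print([abs(v) < 1e-9 for v in filtered])     # the fast component was removed first
```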
751 00:38:17,290 --> 00:38:20,840 So this had the same size, but this one 752 00:38:20,840 --> 00:38:24,270 has fewer high-frequency components. 753 00:38:24,270 --> 00:38:26,090 So I can downsample, which gives me 754 00:38:26,090 --> 00:38:27,740 something that can be represented 755 00:38:27,740 --> 00:38:31,340 in the squeezed version with fewer pixels. 756 00:38:31,340 --> 00:38:34,190 I did a downsample by a factor of 2 in both, so that picture 757 00:38:34,190 --> 00:38:37,400 has 1/4 the number of pixels in it. 758 00:38:37,400 --> 00:38:44,150 Then, I can low-pass filter that one and downsample. 759 00:38:44,150 --> 00:38:47,060 And low-pass filter that one and downsample. 760 00:38:47,060 --> 00:38:50,810 And I end up with a very low-resolution image 761 00:38:50,810 --> 00:38:54,960 of this beautiful scene that I started with. 762 00:38:54,960 --> 00:38:55,770 OK. 763 00:38:55,770 --> 00:39:00,740 So that means that I start with some number of pixels. 764 00:39:00,740 --> 00:39:02,090 Here I have 1/4 as many. 765 00:39:02,090 --> 00:39:04,460 Here I have 1/4 of that. 766 00:39:04,460 --> 00:39:06,260 And here I have 1/4 of that. 767 00:39:06,260 --> 00:39:10,250 So I have a fourth cubed the original number of pixels. 768 00:39:10,250 --> 00:39:13,280 So it will go 4 cubed faster. 769 00:39:13,280 --> 00:39:17,060 So it'll take me a lot less time to get the low-res picture. 770 00:39:17,060 --> 00:39:18,230 So the result then-- 771 00:39:20,870 --> 00:39:22,200 skip this for the moment. 772 00:39:22,200 --> 00:39:24,830 So here's my low-res picture. 773 00:39:24,830 --> 00:39:30,680 With a lot of imagination, you can clearly see what that is. 774 00:39:30,680 --> 00:39:33,440 At the next level of refinement, you get this. 775 00:39:33,440 --> 00:39:35,780 At the next level of refinement, you get this. 776 00:39:35,780 --> 00:39:38,150 At the next level of refinement, you get this.
777 00:39:38,150 --> 00:39:39,571 By now, you're tired so you flick 778 00:39:39,571 --> 00:39:40,820 on something more interesting. 779 00:39:40,820 --> 00:39:41,360 No. 780 00:39:41,360 --> 00:39:43,490 You would continue to look at this, right? 781 00:39:43,490 --> 00:39:45,350 And finally, you get the original picture. 782 00:39:45,350 --> 00:39:49,650 So the idea then is that I want to not only transmit. 783 00:39:49,650 --> 00:39:55,170 But then the question is, how many bits do I need to do this? 784 00:39:55,170 --> 00:40:00,570 And the answer is that having transmitted this, 785 00:40:00,570 --> 00:40:03,870 I can use that information to help me generate this. 786 00:40:06,820 --> 00:40:07,320 OK. 787 00:40:07,320 --> 00:40:13,150 So what I do, I run the process backwards. 788 00:40:13,150 --> 00:40:16,760 Let me back up. 789 00:40:16,760 --> 00:40:22,040 So in order to go forwards, I thought about squishing this 790 00:40:22,040 --> 00:40:24,469 into a smaller representation. 791 00:40:24,469 --> 00:40:25,510 Well, I can go backwards. 792 00:40:25,510 --> 00:40:27,550 I can up-sample. 793 00:40:27,550 --> 00:40:30,410 When I up-sample, all I do is I take all the pixels 794 00:40:30,410 --> 00:40:32,570 in the shrunken version, I stretch them, 795 00:40:32,570 --> 00:40:35,210 and I put 0's between them. 796 00:40:35,210 --> 00:40:37,184 That gets me here. 797 00:40:37,184 --> 00:40:38,600 But that's not where I want to be. 798 00:40:38,600 --> 00:40:40,260 I want to be up here. 799 00:40:40,260 --> 00:40:44,040 So how do I go from here to here? 800 00:40:44,040 --> 00:40:45,990 So when I put the 0's in it. 801 00:40:45,990 --> 00:40:47,910 So I started with this, I put the 0's in it. 802 00:40:47,910 --> 00:40:49,020 That stretched it in time. 803 00:40:49,020 --> 00:40:51,390 That compressed it in frequency.
804 00:40:51,390 --> 00:40:53,580 When I compress this waveform in frequency, 805 00:40:53,580 --> 00:40:57,884 this 2 pi peak ended up at 2 pi over 3. 806 00:40:57,884 --> 00:41:00,300 So now if I want to get back to the original contribution, 807 00:41:00,300 --> 00:41:03,520 I have to low-pass filter. 808 00:41:03,520 --> 00:41:04,420 OK. 809 00:41:04,420 --> 00:41:07,210 Everybody see what I'm doing? 810 00:41:07,210 --> 00:41:10,620 So the final scheme then is that-- 811 00:41:10,620 --> 00:41:12,260 whoops. 812 00:41:12,260 --> 00:41:15,560 The final scheme is that I low-pass 813 00:41:15,560 --> 00:41:17,960 filter, downsample, low-pass, downsample, low-pass, 814 00:41:17,960 --> 00:41:19,190 downsample. 815 00:41:19,190 --> 00:41:23,900 Downsample, I can up-sample by putting 0's between all 816 00:41:23,900 --> 00:41:25,730 the rows and columns. 817 00:41:25,730 --> 00:41:29,190 Then, low-pass filter and that gives me this picture. 818 00:41:29,190 --> 00:41:30,950 So what I need to do is also transmit 819 00:41:30,950 --> 00:41:35,560 the high-pass information that I threw away. 820 00:41:35,560 --> 00:41:38,380 So if I separately transmit this picture 821 00:41:38,380 --> 00:41:41,512 and the high-pass part of this picture, 822 00:41:41,512 --> 00:41:43,345 then I can combine them to get that picture. 823 00:41:45,990 --> 00:41:49,709 And I don't actually need to transmit this one. 824 00:41:49,709 --> 00:41:51,500 So I don't need to transmit this one either 825 00:41:51,500 --> 00:41:52,458 because I can generate it. 826 00:41:52,458 --> 00:41:55,420 So I only need to send this and this. 827 00:41:55,420 --> 00:41:57,760 Then, I do the same thing here. 828 00:41:57,760 --> 00:42:00,340 If I take this, I put 0's between it, low-pass filter. 829 00:42:00,340 --> 00:42:04,050 I can generate this picture, so I don't need to send it. 830 00:42:04,050 --> 00:42:06,710 But I do send this.
831 00:42:06,710 --> 00:42:10,400 Then, I combine these to get that, and recurse. 832 00:42:10,400 --> 00:42:11,270 OK. 833 00:42:11,270 --> 00:42:13,420 So the result is that I send-- 834 00:42:13,420 --> 00:42:17,480 so I don't send this, but I do send this. 835 00:42:17,480 --> 00:42:21,080 I don't send that because I'm going to regenerate it. 836 00:42:21,080 --> 00:42:22,010 I don't send that. 837 00:42:22,010 --> 00:42:22,940 I do send this. 838 00:42:22,940 --> 00:42:25,430 I only send this, this, this, and that. 839 00:42:25,430 --> 00:42:29,460 And that's enough information to reconstruct the picture. 840 00:42:29,460 --> 00:42:30,600 Right. 841 00:42:30,600 --> 00:42:33,640 And notice it has the hierarchy that you would expect. 842 00:42:33,640 --> 00:42:36,320 You start with a low-res. 843 00:42:36,320 --> 00:42:37,911 It takes more bits to make this one. 844 00:42:37,911 --> 00:42:39,410 It takes more bits to make that one. 845 00:42:39,410 --> 00:42:42,520 And it takes more bits to make that one. 846 00:42:42,520 --> 00:42:44,440 You're worse off if you didn't do 847 00:42:44,440 --> 00:42:49,810 something clever by-- so I'm sending the full number of bits 848 00:42:49,810 --> 00:42:50,950 here. 849 00:42:50,950 --> 00:42:52,720 Then, I'm sending another 1/4. 850 00:42:52,720 --> 00:42:54,400 And then, another 1/16. 851 00:42:54,400 --> 00:42:56,250 Then, another 1/64. 852 00:42:56,250 --> 00:42:59,950 So I'm sending about 33% more bits total. 853 00:42:59,950 --> 00:43:01,930 But there's tricks. 854 00:43:01,930 --> 00:43:04,930 The trick is that the eye is less 855 00:43:04,930 --> 00:43:07,060 sensitive to these high frequencies 856 00:43:07,060 --> 00:43:08,200 than it is to these. 857 00:43:08,200 --> 00:43:14,240 So I really don't need to send the same resolution for this. 858 00:43:14,240 --> 00:43:15,770 So people use this all the time.
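The send-the-coarse-picture-then-the-missing-detail scheme can be sketched in one dimension. Everything in this block is an illustrative assumption: the helper names, the pair-averaging low-pass filter, the sample-and-hold interpolation (which is zero insertion followed by a two-tap filter), and the made-up 16-sample row. The decoder regenerates each finer level from the coarser one and only needs the transmitted difference to make it exact.

```python
def shrink(x):
    """Low-pass (average neighbouring pairs), then downsample by 2."""
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]

def expand(x):
    """Up-sample by 2: insert zeros, then low-pass with a two-tap hold filter."""
    out = []
    for v in x:
        out.extend([v, v])
    return out

def encode(x, levels):
    """Return the coarsest version plus the detail needed at each refinement."""
    details = []
    for _ in range(levels):
        coarse = shrink(x)
        # Detail = what the up-sampled coarse version fails to predict.
        details.append([a - b for a, b in zip(x, expand(coarse))])
        x = coarse
    return x, details[::-1]  # coarsest first, then successively finer details

def decode(coarse, details):
    """Rebuild the signal; each loop pass is one refinement a viewer could show."""
    x = coarse
    for d in details:
        x = [a + b for a, b in zip(expand(x), d)]
    return x

row = [float(v) for v in
       [3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 121, 60, 30, 15, 7, 3]]
coarse, details = encode(row, 3)
restored = decode(coarse, details)
print(restored == row)

# Cost of the 2-D scheme in the lecture: 1 + 1/4 + 1/16 + 1/64 of the original.
overhead_2d = sum(0.25 ** k for k in range(4))
print(overhead_2d)
```

In this 1-D sketch each level halves the sample count, so the overhead is larger than in the picture case, where each level has a quarter as many pixels and the total comes to about 1.33 times the original.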
859 00:43:15,770 --> 00:43:17,810 If you go to a slow website, you may 860 00:43:17,810 --> 00:43:20,870 notice that you get that kind of low-res 861 00:43:20,870 --> 00:43:22,460 morphing into a higher-res. 862 00:43:22,460 --> 00:43:26,250 And that's exactly this kind of a scheme. 863 00:43:26,250 --> 00:43:28,667 But there are cleverer things you can do. 864 00:43:28,667 --> 00:43:30,000 So that's already pretty clever. 865 00:43:30,000 --> 00:43:32,910 And that's already something you see in today's technology, 866 00:43:32,910 --> 00:43:35,200 but there are even cleverer things that you can do. 867 00:43:35,200 --> 00:43:37,950 And so the last thing I want to talk about is JPEG. 868 00:43:37,950 --> 00:43:42,240 99% of the images that you download on the web are JPEG. 869 00:43:42,240 --> 00:43:46,800 JPEG is a clever technique that does quantization 870 00:43:46,800 --> 00:43:47,730 in the Fourier domain. 871 00:43:50,490 --> 00:43:52,050 And that's similar to what you would 872 00:43:52,050 --> 00:43:54,790 want to do in that progressive refinement 873 00:43:54,790 --> 00:43:56,790 because you would like to separate the frequency 874 00:43:56,790 --> 00:43:58,831 components and use less resolution for the higher 875 00:43:58,831 --> 00:44:01,320 frequency components because you can't see them as well. 876 00:44:01,320 --> 00:44:05,190 JPEG is a formalization of that idea. 877 00:44:05,190 --> 00:44:07,260 So this was made by the Joint Photographic Experts 878 00:44:07,260 --> 00:44:08,770 Group, which was very successful. 879 00:44:08,770 --> 00:44:10,800 It has four layers of coding. 880 00:44:10,800 --> 00:44:13,960 First thing you worry about is color. 881 00:44:13,960 --> 00:44:14,460 OK. 882 00:44:14,460 --> 00:44:16,650 We think we see a broad range of colors. 883 00:44:16,650 --> 00:44:17,160 Wrong. 884 00:44:17,160 --> 00:44:19,800 We only see three. 885 00:44:19,800 --> 00:44:23,740 So you can throw away the ones that we can't see.
886 00:44:23,740 --> 00:44:26,160 So that's the first step is taking advantage of the fact 887 00:44:26,160 --> 00:44:28,243 that we really can't see all the different colors. 888 00:44:28,243 --> 00:44:30,480 We can really only see three colors. 889 00:44:30,480 --> 00:44:32,580 So there are tricks that you can do 890 00:44:32,580 --> 00:44:35,250 to make the person think he's seeing 891 00:44:35,250 --> 00:44:39,210 the exact shade of yellow, which we don't see very well, 892 00:44:39,210 --> 00:44:42,880 by mixing together a different combination of red, green, 893 00:44:42,880 --> 00:44:44,460 and blue. 894 00:44:44,460 --> 00:44:48,290 So you get to move the colors around. 895 00:44:48,290 --> 00:44:52,760 And you can make it perceptually indistinguishable, 896 00:44:52,760 --> 00:44:53,916 but easier to code. 897 00:44:53,916 --> 00:44:55,790 We won't talk about how you do that, but it's 898 00:44:55,790 --> 00:44:58,940 a very straightforward process by which you start with one 899 00:44:58,940 --> 00:45:01,850 picture and you change all the colors to make them easier 900 00:45:01,850 --> 00:45:02,611 to send. 901 00:45:02,611 --> 00:45:03,110 OK. 902 00:45:03,110 --> 00:45:04,880 So that's the color coding. 903 00:45:04,880 --> 00:45:07,290 Then, they do a discrete cosine transform, 904 00:45:07,290 --> 00:45:10,460 which is really a kind of Fourier series. 905 00:45:10,460 --> 00:45:14,240 Then, they quantize the Fourier series, the DCT. 906 00:45:14,240 --> 00:45:18,380 And then, they code the resulting sequence 907 00:45:18,380 --> 00:45:20,104 using a lossless Huffman code. 908 00:45:20,104 --> 00:45:21,770 So we'll talk about the middle two steps 909 00:45:21,770 --> 00:45:22,978 because that's the fun stuff. 910 00:45:22,978 --> 00:45:25,220 That's the Fourier stuff. 911 00:45:25,220 --> 00:45:30,950 So the way DCT works is you take the image 912 00:45:30,950 --> 00:45:34,790 and you break it into 8 by 8 pixel squares. 
913 00:45:34,790 --> 00:45:37,720 And then you do the same processing on each 8 by 8. 914 00:45:40,540 --> 00:45:42,880 So here is an example of an 8 by 8 image. 915 00:45:42,880 --> 00:45:44,860 This is a completely trivial one where 916 00:45:44,860 --> 00:45:47,536 I have linear taper from black to white, linear taper 917 00:45:47,536 --> 00:45:48,910 from black to white, the product. 918 00:45:48,910 --> 00:45:51,760 And all I want to think about is, what's the DCT? 919 00:45:51,760 --> 00:45:57,820 And why do they use a DCT instead of a Fourier transform? 920 00:45:57,820 --> 00:46:00,629 So just like you would expect from the other two-dimensional 921 00:46:00,629 --> 00:46:02,920 image processing, the examples that we've talked about, 922 00:46:02,920 --> 00:46:07,480 the way you do this is you do the DCT on all the rows. 923 00:46:07,480 --> 00:46:10,356 Then, you do the DCT on all the columns. 924 00:46:10,356 --> 00:46:11,230 And then you're done. 925 00:46:11,230 --> 00:46:14,260 That's a two-dimensional DCT. 926 00:46:14,260 --> 00:46:15,740 So here's an example. 927 00:46:15,740 --> 00:46:19,420 What if I took my sample image, which had this linear taper. 928 00:46:19,420 --> 00:46:22,300 So if I think about just one row and I 929 00:46:22,300 --> 00:46:25,960 plot brightness on the vertical, then this 930 00:46:25,960 --> 00:46:28,160 might be my image right here. 931 00:46:28,160 --> 00:46:34,060 And what I do is think about periodically repeating it. 932 00:46:34,060 --> 00:46:36,910 The original signal only had 8 numbers in it. 933 00:46:36,910 --> 00:46:39,340 I'm going to periodically repeat it because then I 934 00:46:39,340 --> 00:46:41,517 can take a Fourier series. 935 00:46:41,517 --> 00:46:43,600 It's a periodic signal, and it's a Fourier series. 936 00:46:43,600 --> 00:46:46,450 The reason I do that is that the Fourier series only 937 00:46:46,450 --> 00:46:51,090 has 8 coefficients. 
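A minimal sketch of the row-then-column procedure described above, assuming the unnormalized DCT-II (the JPEG standard applies additional scale factors that are omitted here). The helper names `dct_1d` and `dct_2d` and the tapered 8 by 8 test block are illustrative and do not come from the lecture.

```python
import math

def dct_1d(x):
    """Unnormalized DCT-II of a sequence of real numbers."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def dct_2d(block):
    """2D DCT: transform every row, then every column of the result."""
    rows = [dct_1d(row) for row in block]        # DCT along each row
    cols = [dct_1d(col) for col in zip(*rows)]   # transpose, DCT along each column
    return [list(r) for r in zip(*cols)]         # transpose back

# Hypothetical 8x8 block: the product of two linear tapers from black (0) to white (1).
taper = [i / 7 for i in range(8)]
block = [[a * b for b in taper] for a in taper]
coeffs = dct_2d(block)

# The (0, 0) coefficient is just the sum of all 64 pixel values.
print(round(coeffs[0][0], 6))
```

Because the transform is separable, doing the columns first and the rows second gives the same 64 coefficients.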
938 00:46:51,090 --> 00:46:53,520 The Fourier series of an eight-long sequence 939 00:46:53,520 --> 00:46:57,780 has eight Fourier coefficients. 940 00:46:57,780 --> 00:47:02,370 So the idea is that by taking a signal that's 941 00:47:02,370 --> 00:47:03,907 only 8 samples long-- 942 00:47:03,907 --> 00:47:05,490 I mean, the obvious thing you could do 943 00:47:05,490 --> 00:47:12,270 is take the eight-long signal and take a discrete time 944 00:47:12,270 --> 00:47:14,460 Fourier transform. 945 00:47:14,460 --> 00:47:17,610 Problem with that is that that's a continuous function 946 00:47:17,610 --> 00:47:21,790 of omega over 2 pi, over the entire unit circle. 947 00:47:21,790 --> 00:47:27,000 So you take 8 samples and turn it into a function of omega 948 00:47:27,000 --> 00:47:29,130 which has lots of samples. 949 00:47:29,130 --> 00:47:31,170 By thinking about the 8 samples as having 950 00:47:31,170 --> 00:47:35,820 come from a periodic extension, then I 951 00:47:35,820 --> 00:47:39,210 don't get a continuous range of frequencies between minus pi 952 00:47:39,210 --> 00:47:40,260 to pi. 953 00:47:40,260 --> 00:47:44,790 I get exactly 8 of them, a0 through a7. 954 00:47:44,790 --> 00:47:45,660 OK. 955 00:47:45,660 --> 00:47:49,110 So the first step is to do periodic extension 956 00:47:49,110 --> 00:47:50,580 on the 8 samples. 957 00:47:50,580 --> 00:47:53,460 Then, I can represent it by 8 Fourier coefficients. 958 00:47:53,460 --> 00:47:55,860 In the DCT, they almost do that. 959 00:47:55,860 --> 00:47:59,130 But instead of writing down the numbers 1, 2, 3, 4, 5, 6, 7, 8. 960 00:47:59,130 --> 00:48:01,830 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8. 961 00:48:01,830 --> 00:48:03,950 Instead, they write 1, 2, 3, 4, 5, 6, 7, 8. 962 00:48:03,950 --> 00:48:07,356 8, 7, 6, 5, 4, 3, 2, 1. 963 00:48:07,356 --> 00:48:08,314 1, 2, 3, 4, 5, 6, 7, 8. 964 00:48:08,314 --> 00:48:09,830 8, 7, 6, 5, 4, 3, 2, 1. 
965 00:48:09,830 --> 00:48:11,370 That seems like a dumb thing to do. 966 00:48:11,370 --> 00:48:12,930 I took an eight-long sequence, which 967 00:48:12,930 --> 00:48:16,530 could be represented with 8 coefficients, 968 00:48:16,530 --> 00:48:18,840 and I turned it into a 16-long sequence, which 969 00:48:18,840 --> 00:48:21,905 now takes 16 coefficients. 970 00:48:21,905 --> 00:48:24,890 Wow, that's brain dead. 971 00:48:24,890 --> 00:48:29,120 Except that it's actually very clever. 972 00:48:29,120 --> 00:48:31,055 Of these two signals, which has the higher 973 00:48:31,055 --> 00:48:32,170 high-frequency content? 974 00:48:36,670 --> 00:48:38,254 [INAUDIBLE] 975 00:48:38,254 --> 00:48:39,170 AUDIENCE: [INAUDIBLE]. 976 00:48:39,170 --> 00:48:41,360 DENNIS FREEMAN: Sharp drop, large amount 977 00:48:41,360 --> 00:48:42,830 of high frequencies. 978 00:48:42,830 --> 00:48:45,270 That's the trick. 979 00:48:45,270 --> 00:48:48,030 So because there's a large amount of high frequencies, 980 00:48:48,030 --> 00:48:52,770 this signal is hard to represent with Fourier series. 981 00:48:52,770 --> 00:48:55,980 This signal is easier because there's fewer high frequencies. 982 00:48:55,980 --> 00:48:58,500 You need fewer of those high frequencies 983 00:48:58,500 --> 00:49:01,830 to do a good job of representing the signal. 984 00:49:01,830 --> 00:49:03,960 You can throw away the high-frequency stuff 985 00:49:03,960 --> 00:49:06,120 and nobody will notice. 986 00:49:06,120 --> 00:49:07,610 OK. 987 00:49:07,610 --> 00:49:14,100 So the idea then is that you use this 16-long sequence, 988 00:49:14,100 --> 00:49:20,590 but then you know that whatever x of 8 was, 989 00:49:20,590 --> 00:49:23,180 it's the same as x of 9 because you always repeat it. 990 00:49:23,180 --> 00:49:26,580 And x of 7, that's the same as x of 10. 991 00:49:26,580 --> 00:49:29,100 So if you take advantage of knowing 992 00:49:29,100 --> 00:49:31,770 that there's a symmetry. 
993 00:49:31,770 --> 00:49:34,350 And if you notice, they made it symmetric. 994 00:49:34,350 --> 00:49:37,230 So there's an even-odd kind of symmetry about a weird point. 995 00:49:37,230 --> 00:49:41,760 It's off by 1/2, but there's a symmetry this way, too. 996 00:49:41,760 --> 00:49:44,500 If you take those two things into account, 997 00:49:44,500 --> 00:49:50,040 you can actually represent the 16-length sequence 998 00:49:50,040 --> 00:49:52,810 with 8 numbers. 999 00:49:52,810 --> 00:49:55,020 That's the DCT. 1000 00:49:55,020 --> 00:49:57,630 It's exactly the same as a Fourier series, 1001 00:49:57,630 --> 00:50:02,460 except that we're taking the 8 non-trivial numbers 1002 00:50:02,460 --> 00:50:05,880 and putting them together in a funny periodic fashion. 1003 00:50:05,880 --> 00:50:07,350 That's what a DCT does. 1004 00:50:07,350 --> 00:50:12,990 And the point is the DCT maps 8 real numbers, 1005 00:50:12,990 --> 00:50:16,870 which are these yn values. 1006 00:50:16,870 --> 00:50:24,470 It maps 8 real numbers into 8 DCT coefficients. 1007 00:50:24,470 --> 00:50:27,980 And the DCT coefficients, unlike the Fourier coefficients, 1008 00:50:27,980 --> 00:50:30,430 have real values. 1009 00:50:30,430 --> 00:50:32,540 So because of the trick with all the symmetries 1010 00:50:32,540 --> 00:50:34,206 and all that sort of stuff, they arrange 1011 00:50:34,206 --> 00:50:36,640 to make a transform whose imaginary part 1012 00:50:36,640 --> 00:50:38,650 is guaranteed to be 0. 1013 00:50:38,650 --> 00:50:40,870 So there's no information explosion 1014 00:50:40,870 --> 00:50:43,540 in going from the 8 to 16. 1015 00:50:46,990 --> 00:50:49,390 Here's the Fourier representation 1016 00:50:49,390 --> 00:50:50,990 for a 2D picture. 1017 00:50:50,990 --> 00:50:55,150 The Fourier coefficients are falling off like 1 over k. 1018 00:50:55,150 --> 00:50:58,800 Here's the DCT where they're falling off like 1 over k squared.
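The mirror trick can be checked numerically. The sketch below, with hypothetical helper names, extends an eight-long sequence to its 16-long mirrored version, takes an ordinary DFT, and removes the half-sample phase shift (the "off by 1/2" symmetry point). Under these assumptions the first 8 values should be purely real and equal to twice the unnormalized DCT-II of the original 8 samples.

```python
import cmath
import math

def dft(x):
    """Plain O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]

def dct_via_mirror(x):
    """First len(x) DFT values of the mirrored sequence, with the half-sample phase removed."""
    n = len(x)
    mirrored = list(x) + list(reversed(x))   # 1..8 followed by 8..1
    spectrum = dft(mirrored)                 # 16 complex values
    return [spectrum[k] * cmath.exp(-1j * math.pi * k / (2 * n)) for k in range(n)]

ramp = [1, 2, 3, 4, 5, 6, 7, 8]
shifted = dct_via_mirror(ramp)

# Direct cosine sum (unnormalized DCT-II), doubled to match the mirrored DFT.
direct = [2 * sum(ramp[i] * math.cos(math.pi * (i + 0.5) * k / 8) for i in range(8))
          for k in range(8)]

max_imag = max(abs(c.imag) for c in shifted)
max_diff = max(abs(c.real - d) for c, d in zip(shifted, direct))
print(max_imag, max_diff)   # both should be at floating-point noise level
```

The 16 complex DFT values carry no more information than the 8 real numbers in `direct`, which is why the extension costs nothing.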
1019 00:50:58,800 --> 00:51:02,460 And the point is you can throw those away in the picture 1020 00:51:02,460 --> 00:51:04,890 and barely tell that they're even there. 1021 00:51:04,890 --> 00:51:06,480 That they're even gone. 1022 00:51:06,480 --> 00:51:11,430 So what they do then is they quantize the Fourier 1023 00:51:11,430 --> 00:51:14,490 coefficients at different levels. 1024 00:51:14,490 --> 00:51:17,760 So you divide the 0, 0 coefficient by 16 1025 00:51:17,760 --> 00:51:19,680 and send the integer part. 1026 00:51:19,680 --> 00:51:22,020 You divide the 1, 0 by 11. 1027 00:51:22,020 --> 00:51:25,680 You divide this guy by 61, so you use much less resolution, 1028 00:51:25,680 --> 00:51:26,940 by about a factor of 4. 1029 00:51:29,940 --> 00:51:31,830 Those numbers were chosen 1030 00:51:31,830 --> 00:51:34,650 so that they give rise to coefficients that 1031 00:51:34,650 --> 00:51:38,310 are equally visually distinct. 1032 00:51:38,310 --> 00:51:45,300 The result is that you get very high resolution 1033 00:51:45,300 --> 00:51:48,810 with a very small number of bits. 1034 00:51:48,810 --> 00:51:52,770 So here's an original. 1035 00:51:52,770 --> 00:51:58,930 This picture has 47 kilobytes of data in it. 1036 00:51:58,930 --> 00:52:01,440 And when you change Q, the quality 1037 00:52:01,440 --> 00:52:06,550 of JPEG, what you're really doing is choosing those tables. 1038 00:52:06,550 --> 00:52:10,600 So when you use a high Q, you get a good representation. 1039 00:52:10,600 --> 00:52:13,380 When you use a low Q, you're throwing away more data. 1040 00:52:13,380 --> 00:52:15,330 And you can see that you can throw away-- 1041 00:52:17,850 --> 00:52:20,250 so 47k down to 2k. 1042 00:52:20,250 --> 00:52:25,110 You can throw away 19 pieces of data out of 20 1043 00:52:25,110 --> 00:52:27,820 and you still get a very good resolution picture.
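A rough sketch of the quantization step, assuming the example luminance table commonly quoted from Annex K of the JPEG standard (its first row contains the 16, 11, and 61 mentioned above). The coefficient block, the `scale` knob, and the function names are hypothetical, and this sketch rounds to the nearest integer rather than truncating. Real encoders derive their tables from the quality setting in encoder-specific ways.

```python
# Example luminance quantization table (JPEG standard, Annex K).
# Row index = vertical frequency, column index = horizontal frequency (assumed convention).
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize(coeffs, table, scale=1.0):
    """Divide each coefficient by its (scaled) step size and round to an integer."""
    return [[round(c / (q * scale)) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]

def dequantize(levels, table, scale=1.0):
    """Multiply the integer levels back by the step sizes (what a decoder would do)."""
    return [[v * q * scale for v, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]

# Hypothetical DCT coefficient block whose magnitude decays with frequency.
coeffs = [[1000.0 / (1 + u + v) ** 2 for v in range(8)] for u in range(8)]

levels = quantize(coeffs, Q_LUMA)
restored = dequantize(levels, Q_LUMA)
nonzero = sum(1 for row in levels for v in row if v != 0)
print(nonzero, "of 64 quantized coefficients are nonzero")
```

Raising `scale` coarsens every step, which zeroes more of the high-frequency entries; that is the kind of trade the Q setting makes. The long runs of zeros are what the final lossless coding stage compresses well.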
1044 00:52:27,820 --> 00:52:30,330 And that's because the quantization is happening 1045 00:52:30,330 --> 00:52:31,740 in the Fourier domain. 1046 00:52:31,740 --> 00:52:34,620 And you can match the Fourier resolution better 1047 00:52:34,620 --> 00:52:37,050 to the psychophysical properties of the eye. 1048 00:52:37,050 --> 00:52:40,650 So the point is to tell you how to represent signals 1049 00:52:40,650 --> 00:52:44,730 in discrete time in a way that the errors are 1050 00:52:44,730 --> 00:52:47,760 as imperceptible as possible. 1051 00:52:47,760 --> 00:52:50,230 And to demonstrate how the Fourier transform 1052 00:52:50,230 --> 00:52:51,920 lets you do that. 1053 00:52:51,920 --> 00:52:53,280 OK, thanks. 1054 00:52:53,280 --> 00:52:55,130 See you later.