1
00:00:00,090 --> 00:00:02,490
The following content is
provided under a Creative

2
00:00:02,490 --> 00:00:04,030
Commons license.

3
00:00:04,030 --> 00:00:06,330
Your support will help
MIT OpenCourseWare

4
00:00:06,330 --> 00:00:10,720
continue to offer high-quality
educational resources for free.

5
00:00:10,720 --> 00:00:13,320
To make a donation, or
view additional materials

6
00:00:13,320 --> 00:00:17,280
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:17,280 --> 00:00:18,450
at ocw.mit.edu.

8
00:00:26,554 --> 00:00:28,220
DENNIS FREEMAN: So
last time, we started

9
00:00:28,220 --> 00:00:30,500
to think about sampling.

10
00:00:30,500 --> 00:00:34,020
And that's what I want
to finish up today.

11
00:00:34,020 --> 00:00:36,844
I think sampling is a
very important issue.

12
00:00:36,844 --> 00:00:38,510
It's one of the
strengths of this course

13
00:00:38,510 --> 00:00:42,560
because we can think about on
equal footing the way signals

14
00:00:42,560 --> 00:00:46,490
work in a CT system, or in a
DT system, when the signals are

15
00:00:46,490 --> 00:00:48,470
CT, when the signals are DT.

16
00:00:48,470 --> 00:00:50,990
And specifically, when
you convert between them.

17
00:00:50,990 --> 00:00:55,250
Converting between them,
like we saw last time, that's

18
00:00:55,250 --> 00:00:57,890
a very important process because
many of the kinds of signals

19
00:00:57,890 --> 00:01:01,610
that we want to think
about occur in physical--

20
00:01:01,610 --> 00:01:04,849
have a physical origin
where they are naturally

21
00:01:04,849 --> 00:01:08,210
continuous time or continuous
space kinds of signals,

22
00:01:08,210 --> 00:01:10,970
but we would like to use
inexpensive digital electronics

23
00:01:10,970 --> 00:01:12,540
in order to process them.

24
00:01:12,540 --> 00:01:15,590
So it's important to understand
how we can take a CT signal

25
00:01:15,590 --> 00:01:21,900
and represent the information
that's there in a DT manner.

26
00:01:21,900 --> 00:01:28,280
And it's completely remarkable
that you can even do that.

27
00:01:28,280 --> 00:01:32,320
CT signals are in some sense
arbitrarily more complicated

28
00:01:32,320 --> 00:01:33,300
than DT signals.

29
00:01:33,300 --> 00:01:37,490
DT signals only exist at
integer multiples of time,

30
00:01:37,490 --> 00:01:40,360
at integer values of time.

31
00:01:40,360 --> 00:01:44,170
CT signals, in principle,
can do anything

32
00:01:44,170 --> 00:01:47,270
between two consecutive
samples of a DT signal.

33
00:01:47,270 --> 00:01:49,650
So in some sense, they're
arbitrarily more complicated.

34
00:01:49,650 --> 00:01:52,920
So it's kind of
remarkable at all

35
00:01:52,920 --> 00:01:55,720
that we can talk meaningfully
about how you can represent

36
00:01:55,720 --> 00:01:58,970
the information that's in a
CT system with a DT equivalent

37
00:01:58,970 --> 00:01:59,470
system.

38
00:01:59,470 --> 00:02:01,595
And the point is, and the
reason we're doing it now

39
00:02:01,595 --> 00:02:04,110
in this part of the
course, is that by thinking

40
00:02:04,110 --> 00:02:07,870
about Fourier transforms,
everything's very simple.

41
00:02:07,870 --> 00:02:10,539
Something that could be
conceptually quite complicated

42
00:02:10,539 --> 00:02:13,750
is in fact, extremely
simple to think about.

43
00:02:13,750 --> 00:02:16,990
So last time, we saw that the
way to think about the signal,

44
00:02:16,990 --> 00:02:19,450
if you want to sample
it, if you want

45
00:02:19,450 --> 00:02:21,839
to convert a CT
signal to a DT signal,

46
00:02:21,839 --> 00:02:24,130
the way to think about it is
to think about the Fourier

47
00:02:24,130 --> 00:02:26,570
transform.

48
00:02:26,570 --> 00:02:29,690
So then, the example that
we talked about last time,

49
00:02:29,690 --> 00:02:32,610
you think about a
CT signal, x of t.

50
00:02:32,610 --> 00:02:35,280
You think about its samples,
taken uniformly in time.

51
00:02:37,840 --> 00:02:40,619
And then in order to think about
the information and whether

52
00:02:40,619 --> 00:02:42,910
or not you've captured it
all, the question is, can you

53
00:02:42,910 --> 00:02:46,750
reconstruct the original
thing that you started

54
00:02:46,750 --> 00:02:50,250
with from the samples only?

55
00:02:50,250 --> 00:02:50,750
OK.

56
00:02:50,750 --> 00:02:52,270
Well, in general, no.

57
00:02:52,270 --> 00:02:55,300
So what we're really asking
is, what are the rules,

58
00:02:55,300 --> 00:02:58,180
what are the conditions
under which you can do that?

59
00:02:58,180 --> 00:03:00,779
And are they useful
conditions or not?

60
00:03:00,779 --> 00:03:03,070
So the first way you can
think about taking the samples

61
00:03:03,070 --> 00:03:05,200
and turning them back into
a continuous time signal

62
00:03:05,200 --> 00:03:08,545
is something that we called
impulse reconstruction.

63
00:03:08,545 --> 00:03:11,100
In impulse reconstruction,
we substitute

64
00:03:11,100 --> 00:03:16,140
for every sample an impulse
appropriately located in time

65
00:03:16,140 --> 00:03:18,210
and appropriately
scaled in amplitude.

66
00:03:18,210 --> 00:03:20,010
The appropriate
scale and amplitude

67
00:03:20,010 --> 00:03:23,610
is that you take the samples
and you weight the impulses.

68
00:03:23,610 --> 00:03:28,650
You weight the impulse
at the n-th time step

69
00:03:28,650 --> 00:03:34,570
by the sample value for time n.

70
00:03:34,570 --> 00:03:39,772
And you put the n-th
one at time nT, n cap T.

71
00:03:39,772 --> 00:03:40,855
So impulse reconstruction.

72
00:03:40,855 --> 00:03:41,770
It's really easy.

73
00:03:41,770 --> 00:03:43,630
Take all the
samples that you got

74
00:03:43,630 --> 00:03:49,150
by uniform sampling, substitute
for every sample one impulse--

75
00:03:49,150 --> 00:03:52,660
appropriately timed,
appropriately weighted.

76
00:03:52,660 --> 00:03:54,490
OK, that's great.

77
00:03:54,490 --> 00:03:55,990
It's especially
nice because there's

78
00:03:55,990 --> 00:04:00,730
a simple Fourier representation
for that process.

79
00:04:00,730 --> 00:04:03,580
That process, if we think
about just taking x of t

80
00:04:03,580 --> 00:04:09,480
and turning it into this
impulse reconstruction,

81
00:04:09,480 --> 00:04:11,760
that impulse reconstruction
is precisely the same

82
00:04:11,760 --> 00:04:16,029
as if I had multiplied
the original signal x of t

83
00:04:16,029 --> 00:04:18,390
by an impulse train.

84
00:04:18,390 --> 00:04:22,780
Impulses separated by
capital T, unit height.

85
00:04:22,780 --> 00:04:25,270
So that means the transformation
can be thought of in terms

86
00:04:25,270 --> 00:04:27,550
of Fourier transforms
as the convolution

87
00:04:27,550 --> 00:04:30,250
of the original spectrum,
the original Fourier

88
00:04:30,250 --> 00:04:34,180
transform, with the Fourier
transform of the impulse train,

89
00:04:34,180 --> 00:04:37,270
which is just another
impulse train.
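
[Editor's note: the multiplication-by-an-impulse-train picture described above can be written out as follows. This is a sketch in standard notation; the symbol x_p for the impulse reconstruction and the band edge omega_m are assumed names, not taken from the recording.]

```latex
% Time domain: multiply x(t) by a unit-height impulse train with spacing T
x_p(t) = x(t)\sum_{n=-\infty}^{\infty}\delta(t-nT)
       = \sum_{n=-\infty}^{\infty} x[n]\,\delta(t-nT),
\qquad x[n] = x(nT)

% Frequency domain: convolve X with the transform of the impulse train,
% which is another impulse train with spacing 2\pi/T
X_p(j\omega) = \frac{1}{2\pi}\,X(j\omega) *
               \frac{2\pi}{T}\sum_{k=-\infty}^{\infty}\delta\!\left(\omega-\frac{2\pi k}{T}\right)
             = \frac{1}{T}\sum_{k=-\infty}^{\infty}
               X\!\left(j\left(\omega-\frac{2\pi k}{T}\right)\right)

% The shifted copies do not overlap when X(j\omega)=0 for |\omega|\ge\omega_m
% and the spacing satisfies 2\pi/T > 2\omega_m
```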

90
00:04:37,270 --> 00:04:42,460
So the rule is you can
represent all the information

91
00:04:42,460 --> 00:04:46,340
in the signal if the signal
started out being bandlimited.

92
00:04:46,340 --> 00:04:46,840
OK.

93
00:04:46,840 --> 00:04:51,250
If this signal had a region
of frequency over which it is

94
00:04:51,250 --> 00:04:55,420
non-0 and for the rest of
frequency the signal is 0,

95
00:04:55,420 --> 00:04:59,260
then when you do the aliasing,
you can arrange the period

96
00:04:59,260 --> 00:05:01,180
so that the aliased copy--

97
00:05:01,180 --> 00:05:05,950
so that the convolved copies
don't overlap with each other.

98
00:05:05,950 --> 00:05:06,680
OK.

99
00:05:06,680 --> 00:05:10,850
So that was a simple
way of thinking about

100
00:05:10,850 --> 00:05:13,050
how much information
was in the samples,

101
00:05:13,050 --> 00:05:15,410
by thinking about the
impulse reconstruction.

102
00:05:15,410 --> 00:05:19,730
Of course, the signal that we
reconstruct by this convolution

103
00:05:19,730 --> 00:05:24,530
process has multiple copies
of the same frequency content.

104
00:05:24,530 --> 00:05:26,030
So we don't like that.

105
00:05:26,030 --> 00:05:28,760
So you can throw away
those extra copies

106
00:05:28,760 --> 00:05:30,830
by doing a low-pass
filtering operation.

107
00:05:30,830 --> 00:05:33,590
And we call that
reconstruction-- the xr,

108
00:05:33,590 --> 00:05:36,870
we call that the
bandlimited reconstruction.

109
00:05:36,870 --> 00:05:38,780
It's like the impulse
reconstruction,

110
00:05:38,780 --> 00:05:41,050
except that it's bandlimited.

111
00:05:41,050 --> 00:05:42,410
OK.

112
00:05:42,410 --> 00:05:45,530
So we think of two ways of
doing the reconstruction

113
00:05:45,530 --> 00:05:46,310
from the samples--

114
00:05:46,310 --> 00:05:49,100
the impulse reconstruction,
the bandlimited reconstruction.

115
00:05:49,100 --> 00:05:50,780
And the key is the
sampling theorem.

116
00:05:50,780 --> 00:05:58,010
The sampling theorem says that
if the original signal had

117
00:05:58,010 --> 00:06:02,720
non-zero frequency content
over only some particular range

118
00:06:02,720 --> 00:06:07,160
of frequencies, you can
sample fast enough so that you

119
00:06:07,160 --> 00:06:09,140
can represent all of
the information that's

120
00:06:09,140 --> 00:06:13,400
in the continuous time
signal with the samples.
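
[Editor's note: a minimal sketch of why the bandlimit matters, assuming a 10 Hz sample rate and cosine test signals chosen only for illustration. A 7 Hz cosine lies above half the sample rate, and its samples come out the same as those of a 3 Hz cosine, so the samples alone cannot tell the two apart.]

```python
import math

def sample_cosine(freq_hz, sample_rate_hz, num_samples):
    """Uniform samples x[n] = cos(2*pi*f*n*T), where T = 1/sample_rate_hz."""
    period = 1.0 / sample_rate_hz
    return [math.cos(2 * math.pi * freq_hz * n * period) for n in range(num_samples)]

def max_abs_difference(a, b):
    """Largest sample-by-sample difference between two equal-length sequences."""
    return max(abs(x - y) for x, y in zip(a, b))

sample_rate = 10.0                           # assumed value, for illustration only
low = sample_cosine(3.0, sample_rate, 50)    # 3 Hz is below sample_rate / 2
high = sample_cosine(7.0, sample_rate, 50)   # 7 Hz is above sample_rate / 2
other = sample_cosine(2.0, sample_rate, 50)  # a different in-band frequency

aliased_gap = max_abs_difference(low, high)   # essentially zero: 7 Hz aliases onto 3 Hz
in_band_gap = max_abs_difference(low, other)  # clearly nonzero: in-band tones stay distinct
print(aliased_gap, in_band_gap)
```

Sampling faster than twice the highest frequency present removes the ambiguity, which is the condition the sampling theorem states.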

121
00:06:13,400 --> 00:06:13,900
OK.

122
00:06:13,900 --> 00:06:15,080
Is that all clear?

123
00:06:15,080 --> 00:06:17,330
The point is we're trying
to represent the information

124
00:06:17,330 --> 00:06:20,930
in a CT signal using DT.

125
00:06:20,930 --> 00:06:26,210
And that the Fourier transform
is a way to visualize when you

126
00:06:26,210 --> 00:06:29,090
can do that and when
you cannot do that.

127
00:06:29,090 --> 00:06:32,720
You still end up in
a physical system,

128
00:06:32,720 --> 00:06:35,780
perhaps generating signals
whose frequency content

129
00:06:35,780 --> 00:06:38,410
falls out of that range.

130
00:06:38,410 --> 00:06:41,920
We saw an illustration
of that last time.

131
00:06:41,920 --> 00:06:43,420
So for example,
if you were to try

132
00:06:43,420 --> 00:06:47,080
to represent a signal
with this transform using

133
00:06:47,080 --> 00:06:52,420
a sampling period T, so that
the impulses in frequency

134
00:06:52,420 --> 00:06:54,910
were separated by
2 pi over T, which

135
00:06:54,910 --> 00:07:00,610
happened to be less than
twice this distance,

136
00:07:00,610 --> 00:07:03,320
then it would alias.

137
00:07:03,320 --> 00:07:05,150
That's bad.

138
00:07:05,150 --> 00:07:06,980
So we would typically
also include

139
00:07:06,980 --> 00:07:11,030
an anti-aliasing filter,
pre-filter the signal

140
00:07:11,030 --> 00:07:13,940
from physics, get
rid of the parts

141
00:07:13,940 --> 00:07:17,450
that you know are going to be a
problem when you try to sample.

142
00:07:17,450 --> 00:07:20,450
Then, go ahead and do
the regular sampling,

143
00:07:20,450 --> 00:07:22,790
the regular uniform sampling,
the regular bandlimited

144
00:07:22,790 --> 00:07:24,380
reconstruction.

145
00:07:24,380 --> 00:07:26,840
And the signal that
you reconstruct

146
00:07:26,840 --> 00:07:29,840
won't be an identical
copy, but it

147
00:07:29,840 --> 00:07:34,070
will be as close as you can
given the sampling theorem.

148
00:07:34,070 --> 00:07:35,170
OK.

149
00:07:35,170 --> 00:07:36,650
So that's what we did last time.

150
00:07:36,650 --> 00:07:39,260
What I want to do today is
think about some other issues

151
00:07:39,260 --> 00:07:44,290
that come up when you try to
represent a continuous signal

152
00:07:44,290 --> 00:07:46,880
in a discrete domain.

153
00:07:46,880 --> 00:07:49,150
So in addition to thinking
about discretizing time,

154
00:07:49,150 --> 00:07:53,530
we also have to think about
discretizing amplitude.

155
00:07:53,530 --> 00:07:57,140
Because if we want to
represent a signal by bits--

156
00:07:57,140 --> 00:07:59,470
so we have to represent
not only the time,

157
00:07:59,470 --> 00:08:03,250
but also the amplitude in bits.

158
00:08:03,250 --> 00:08:04,930
I'll talk about
several different kinds

159
00:08:04,930 --> 00:08:06,580
of schemes for that.

160
00:08:06,580 --> 00:08:09,100
In the simplest
kinds of schemes,

161
00:08:09,100 --> 00:08:12,400
the code for the
representation in amplitude

162
00:08:12,400 --> 00:08:14,680
is separately
derived from the code

163
00:08:14,680 --> 00:08:17,300
for the representation in time.

164
00:08:17,300 --> 00:08:21,670
So we can think of it as two
boxes, a sampling box followed

165
00:08:21,670 --> 00:08:24,470
by a quantization box.

166
00:08:24,470 --> 00:08:27,620
The first box, the sampling
box, takes the CT signal of time

167
00:08:27,620 --> 00:08:29,420
and turns it into a DT signal.

168
00:08:29,420 --> 00:08:32,179
The second box
takes the samples,

169
00:08:32,179 --> 00:08:35,840
which have a continuous domain,
and turns them into samples

170
00:08:35,840 --> 00:08:37,640
from a finite domain--

171
00:08:37,640 --> 00:08:39,770
from a discrete domain.

172
00:08:39,770 --> 00:08:41,600
OK.

173
00:08:41,600 --> 00:08:45,867
So if you're doing that kind
of a quantization scheme,

174
00:08:45,867 --> 00:08:47,450
then the thing you
have to think about

175
00:08:47,450 --> 00:08:49,040
is how many bits
you're willing to use

176
00:08:49,040 --> 00:08:50,375
to represent each sample.

177
00:08:50,375 --> 00:08:52,250
I mean, this is the
simplest kind of a scheme

178
00:08:52,250 --> 00:08:52,820
that you could use.

179
00:08:52,820 --> 00:08:55,361
There's much more complicated
schemes. By the end of the hour,

180
00:08:55,361 --> 00:08:57,770
I'll tell you about
a scheme that is

181
00:08:57,770 --> 00:08:59,892
much more efficient than this.

182
00:08:59,892 --> 00:09:01,350
But this is kind
of the base level.

183
00:09:01,350 --> 00:09:02,600
This is where you would start.

184
00:09:02,600 --> 00:09:05,810
So if you wanted to
represent an amplitude

185
00:09:05,810 --> 00:09:08,177
in a discrete
representation, one way

186
00:09:08,177 --> 00:09:10,510
you could do about it-- one
way you could think about it

187
00:09:10,510 --> 00:09:15,830
is to think about the map
between the continuous values

188
00:09:15,830 --> 00:09:22,260
that the sample could acquire
and map it to a discrete output

189
00:09:22,260 --> 00:09:23,040
set.

190
00:09:23,040 --> 00:09:28,980
So for example, if you were
using 2 bits per sample,

191
00:09:28,980 --> 00:09:32,930
then you might represent any
voltage between minus 1/2

192
00:09:32,930 --> 00:09:37,040
and 1/2 by some code 0, 1.

193
00:09:37,040 --> 00:09:42,260
Any voltage that's in the range
1/2 to 1 as the code 1, 0.

194
00:09:42,260 --> 00:09:46,550
And any voltage in the range
minus 1 to minus 1/2 as 0, 0.

195
00:09:46,550 --> 00:09:49,239
That would be a way of
taking a continuous range

196
00:09:49,239 --> 00:09:50,780
of possible amplitudes
and turning it

197
00:09:50,780 --> 00:09:56,170
into a discrete number
using just 2 bits.
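
[Editor's note: the 2-bit map just described can be sketched as below. The three codes and voltage ranges are the ones given in the lecture. The output levels -1, 0, +1, the handling of the exact boundaries at plus and minus 1/2, and the clipping of out-of-range inputs are assumptions made for this sketch.]

```python
def quantize_2bit(voltage):
    """Map a voltage to (code, output_level) using the three ranges in the lecture.

    Assumptions of this sketch: output levels are -1, 0, +1, the boundary
    values -1/2 and +1/2 fall in the middle range, and inputs outside
    [-1, 1] are clipped to the nearest end range.
    """
    if voltage < -0.5:
        return "00", -1.0   # range -1 to -1/2
    if voltage <= 0.5:
        return "01", 0.0    # range -1/2 to 1/2
    return "10", 1.0        # range 1/2 to 1

# Quantization error is the difference between the input and its output level.
inputs = [-0.9, -0.2, 0.3, 0.8]
errors = [v - quantize_2bit(v)[1] for v in inputs]
print([quantize_2bit(v)[0] for v in inputs], errors)
```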

198
00:09:56,170 --> 00:09:59,290
Obviously if you use more bits,
you can get greater precision.

199
00:09:59,290 --> 00:10:02,680
What's shown below here
is, what if my signal was

200
00:10:02,680 --> 00:10:04,200
a function of time--

201
00:10:04,200 --> 00:10:07,350
looked like the red waveform.

202
00:10:07,350 --> 00:10:11,040
My discrete representation might
look like the blue waveform,

203
00:10:11,040 --> 00:10:11,610
right?

204
00:10:11,610 --> 00:10:14,070
If I'm imagining that
I only have 2 bits,

205
00:10:14,070 --> 00:10:19,440
then I only have 3
possible symmetric outputs.

206
00:10:19,440 --> 00:10:21,150
So that might be
represented by the blue.

207
00:10:21,150 --> 00:10:23,108
And the difference between
the red and the blue

208
00:10:23,108 --> 00:10:24,250
is shown in the green.

209
00:10:24,250 --> 00:10:26,550
And as you can see as
you go to more bits,

210
00:10:26,550 --> 00:10:28,330
you obviously get errors--

211
00:10:28,330 --> 00:10:30,330
the green signal is
getting smaller, right?

212
00:10:30,330 --> 00:10:32,610
So the key thing then
is, how many bits

213
00:10:32,610 --> 00:10:38,730
do you need for the thing that
you're trying to represent?

214
00:10:38,730 --> 00:10:40,710
So I like hearing.

215
00:10:40,710 --> 00:10:44,910
So I'll illustrate the number
of bits by thinking about sound.

216
00:10:44,910 --> 00:10:48,390
You can hear sounds that range
in amplitude over a range

217
00:10:48,390 --> 00:10:51,490
of about a million to 1.

218
00:10:51,490 --> 00:10:55,990
So if you were to put a person
with good ears-- not me,

219
00:10:55,990 --> 00:10:56,800
one of you.

220
00:10:56,800 --> 00:10:59,680
If you were to put one
of you into a quiet room

221
00:10:59,680 --> 00:11:02,200
and let you sit there until
you adapted, and then played

222
00:11:02,200 --> 00:11:06,490
the faintest sound that
you could possibly hear,

223
00:11:06,490 --> 00:11:09,820
then multiplied by
10, multiplied by 10,

224
00:11:09,820 --> 00:11:14,320
multiplied by 10, you could
make it a million times

225
00:11:14,320 --> 00:11:16,480
more intense in pressure.

226
00:11:16,480 --> 00:11:18,610
You could amplify the
pressure by a million

227
00:11:18,610 --> 00:11:20,125
before it'd start to hurt.

228
00:11:20,125 --> 00:11:21,910
It wouldn't damage yet.

229
00:11:21,910 --> 00:11:25,420
You'd have to go
to about 8 million,

230
00:11:25,420 --> 00:11:27,580
and then it would
start to damage.

231
00:11:27,580 --> 00:11:30,190
But you could do about a
million to 1 over the range

232
00:11:30,190 --> 00:11:34,940
from just barely audible
to starts to hurt.

233
00:11:34,940 --> 00:11:36,980
So how many bits would
it take to do that range?

234
00:12:12,764 --> 00:12:14,780
So how many bits would it take?

235
00:12:14,780 --> 00:12:15,500
Raise your hands.

236
00:12:15,500 --> 00:12:16,666
Show me a number of fingers.

237
00:12:16,666 --> 00:12:20,850
How many bits would it take
to represent a million to 1?

238
00:12:20,850 --> 00:12:21,710
OK.

239
00:12:21,710 --> 00:12:22,210
100%.

240
00:12:22,210 --> 00:12:23,670
I think it's 100%.

241
00:12:23,670 --> 00:12:25,950
So easy question.

242
00:12:25,950 --> 00:12:30,360
So if you use 1 bit, you
can represent 2 levels.

243
00:12:30,360 --> 00:12:32,480
If you use 2 bits, you can do 4.

244
00:12:32,480 --> 00:12:33,826
8, 16, 32.

245
00:12:33,826 --> 00:12:35,950
By the time you get to 10
bits, you're up to 1,024.

246
00:12:35,950 --> 00:12:39,270
By the time you're up to
20 bits, you're up to 1,024

247
00:12:39,270 --> 00:12:41,970
squared.

248
00:12:41,970 --> 00:12:45,810
OK, 20 bits ought to do it.
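
[Editor's note: the doubling argument above can be checked with a short sketch. The function name is hypothetical. It counts the smallest number of bits b for which 2 to the power b is at least the number of levels.]

```python
def bits_needed(num_levels):
    """Smallest b with 2**b >= num_levels, using exact integer arithmetic."""
    if num_levels < 2:
        return 0
    return (num_levels - 1).bit_length()

million_to_one = bits_needed(1_000_000)   # the amplitude range quoted for hearing
ten_bits_levels = 2 ** 10                 # 1,024 levels
twenty_bits_levels = 2 ** 20              # 1,024 squared levels
print(million_to_one, ten_bits_levels, twenty_bits_levels)
```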

249
00:12:45,810 --> 00:12:48,320
And in fact, 20 bits--

250
00:12:48,320 --> 00:12:51,590
if you were to buy a
high-end audio system,

251
00:12:51,590 --> 00:12:52,910
it would be 24 bits.

252
00:12:52,910 --> 00:12:55,340
There are people who
claim you need 32.

253
00:12:55,340 --> 00:12:56,840
I think they're kind of crazy.

254
00:12:56,840 --> 00:13:01,070
But a high-end audio system
would be a 24-bit system.

255
00:13:01,070 --> 00:13:04,490
Now, if you were to listen
to sort of CD quality,

256
00:13:04,490 --> 00:13:07,100
CDs are 16 bits.

257
00:13:07,100 --> 00:13:09,920
So there are
people, even me, who

258
00:13:09,920 --> 00:13:13,100
claim that they can tell the
difference between a concert

259
00:13:13,100 --> 00:13:15,500
and a CD representation
of a concert.

260
00:13:15,500 --> 00:13:16,220
OK.

261
00:13:16,220 --> 00:13:18,980
So there might be some
limitations of representing

262
00:13:18,980 --> 00:13:21,920
audio with 16 bits.

263
00:13:21,920 --> 00:13:24,170
But what I'll show
you is a demo where

264
00:13:24,170 --> 00:13:26,120
I've showed the
same piece of music

265
00:13:26,120 --> 00:13:30,110
at 16 bits, 8 bits, 6
bits, 4 bits, 2 bits, and 1

266
00:13:30,110 --> 00:13:33,680
bit per sample, so that you get
the idea of what a quantization

267
00:13:33,680 --> 00:13:34,816
error sounds like.

268
00:13:34,816 --> 00:13:35,316
Yes.

269
00:13:35,316 --> 00:13:36,774
AUDIENCE: So I
think the difference

270
00:13:36,774 --> 00:13:40,144
between a concert and a CD,
it's mainly because [INAUDIBLE].

271
00:13:40,144 --> 00:13:42,560
DENNIS FREEMAN: There's lots
of things that are different.

272
00:13:42,560 --> 00:13:44,600
And you're raising
a very good point.

273
00:13:44,600 --> 00:13:49,730
You certainly don't get the
spatial aspects of a concert.

274
00:13:49,730 --> 00:13:51,310
We try to fake you out.

275
00:13:51,310 --> 00:13:53,540
We put false cues
in, so the violin

276
00:13:53,540 --> 00:13:55,820
sounds like it's
on the right side.

277
00:13:55,820 --> 00:13:57,380
But those are all fake, usually.

278
00:13:57,380 --> 00:14:00,170
Well, they're not
completely fake.

279
00:14:00,170 --> 00:14:01,580
And we have stereo.

280
00:14:01,580 --> 00:14:04,400
And we have 5 plus 1.

281
00:14:04,400 --> 00:14:06,350
So we have lots of
different representations.

282
00:14:06,350 --> 00:14:08,915
But if you were to imagine
listening in a concert

283
00:14:08,915 --> 00:14:11,030
monaurally.

284
00:14:11,030 --> 00:14:15,200
So plug your ear, clamp
your head so you can't turn,

285
00:14:15,200 --> 00:14:21,650
and compare that to listening
with a mono headphone, that's

286
00:14:21,650 --> 00:14:23,409
what I'm talking about.

287
00:14:23,409 --> 00:14:25,700
So if you didn't get spatial
cues and things like that.

288
00:14:28,410 --> 00:14:29,470
OK.

289
00:14:29,470 --> 00:14:33,020
So the issue then is to
listen to different levels

290
00:14:33,020 --> 00:14:36,685
of quantization.

291
00:14:36,685 --> 00:14:40,178
[MUSIC PLAYING]

292
00:15:44,827 --> 00:15:47,160
DENNIS FREEMAN: So it's
actually kind of amazing, right?

293
00:15:47,160 --> 00:15:50,336
You can sort of tell what the
piece is the whole way down

294
00:15:50,336 --> 00:15:52,710
to-- how many of you could
tell the difference between 16

295
00:15:52,710 --> 00:15:53,210
and 8?

296
00:15:55,838 --> 00:15:56,840
AUDIENCE: [INAUDIBLE]

297
00:15:56,840 --> 00:15:58,590
DENNIS FREEMAN: How
many of you could tell

298
00:15:58,590 --> 00:15:59,881
the difference between 8 and 6?

299
00:16:02,700 --> 00:16:05,430
How many of you could tell
any difference whatever?

300
00:16:05,430 --> 00:16:07,830
Just joking.

301
00:16:07,830 --> 00:16:11,350
What's the difference
in the sound quality?

302
00:16:11,350 --> 00:16:13,467
What's the effect of quantizing?

303
00:16:13,467 --> 00:16:15,050
AUDIENCE: Fuzziness
in the background.

304
00:16:15,050 --> 00:16:17,810
DENNIS FREEMAN: Kind of fuzzy.

305
00:16:17,810 --> 00:16:20,806
So could you simulate
the fuzzy sound?

306
00:16:20,806 --> 00:16:22,430
What would you do if
you wanted to sort

307
00:16:22,430 --> 00:16:24,859
of simulate the fuzzy sound?

308
00:16:24,859 --> 00:16:26,400
Besides, of course,
quantizing, which

309
00:16:26,400 --> 00:16:30,246
would be a perfect simulation.

310
00:16:30,246 --> 00:16:31,650
AUDIENCE: [INAUDIBLE]

311
00:16:31,650 --> 00:16:32,790
DENNIS FREEMAN: Noise.

312
00:16:32,790 --> 00:16:34,920
It kind of sounds hissy.

313
00:16:34,920 --> 00:16:37,980
[HISSING] It sounds
kind of noisy and that's

314
00:16:37,980 --> 00:16:41,000
kind of the point.

315
00:16:41,000 --> 00:16:42,680
And that's an important
issue because it

316
00:16:42,680 --> 00:16:45,450
affects how much music you
can put on any given medium.

317
00:16:45,450 --> 00:16:48,590
So for example, in a
CD, CDs are 16 bits

318
00:16:48,590 --> 00:16:53,030
per sample, 2 channels,
44.1 kilosamples per second,

319
00:16:53,030 --> 00:16:54,500
60 seconds per minute.

320
00:16:54,500 --> 00:16:59,660
74 minutes is a typical
recording time for a CD.

321
00:16:59,660 --> 00:17:02,234
So you end up with
about a gigabyte.

322
00:17:02,234 --> 00:17:03,650
And that's what
you can put on one

323
00:17:03,650 --> 00:17:06,099
of those little plastic things.

324
00:17:06,099 --> 00:17:12,130
If you were willing to live
with 8-bit instead of 16-bit,

325
00:17:12,130 --> 00:17:17,589
you could obviously
put on 148 minutes.
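
[Editor's note: the arithmetic behind these figures, as a sketch using only the numbers quoted in the lecture. The function names are hypothetical. The product comes to roughly 6.3 billion bits, or about 0.78 billion bytes, which the lecture rounds to "about a gigabyte".]

```python
def recording_bits(bits_per_sample, channels, samples_per_second, minutes):
    """Total bits for an uncompressed recording of the given length."""
    return bits_per_sample * channels * samples_per_second * 60 * minutes

def minutes_that_fit(total_bits, bits_per_sample, channels, samples_per_second):
    """Recording time, in minutes, that fits in total_bits at the given format."""
    return total_bits / (bits_per_sample * channels * samples_per_second * 60)

cd_bits = recording_bits(16, 2, 44_100, 74)   # 16 bits, 2 channels, 44.1 kHz, 74 minutes
cd_bytes = cd_bits // 8
minutes_at_8_bits = minutes_that_fit(cd_bits, 8, 2, 44_100)
print(cd_bits, cd_bytes, minutes_at_8_bits)
```

Halving the bits per sample doubles the playing time on the same medium, which is the trade-off between quantity and quality described here.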

326
00:17:17,589 --> 00:17:22,480
So people don't make
these decisions lightly.

327
00:17:22,480 --> 00:17:24,970
It's how many people
do you make angry

328
00:17:24,970 --> 00:17:26,916
for one reason or
the other, right?

329
00:17:26,916 --> 00:17:29,290
You can make them angry because
they don't get much music

330
00:17:29,290 --> 00:17:30,915
or you can make them
angry because they

331
00:17:30,915 --> 00:17:34,030
don't get high quality, right?

332
00:17:34,030 --> 00:17:36,040
So you get to sort
of trade-off the kind

333
00:17:36,040 --> 00:17:38,910
of people who hate you.

334
00:17:38,910 --> 00:17:40,660
But that's the kind of idea.

335
00:17:40,660 --> 00:17:42,460
So if you have a
piece of plastic

336
00:17:42,460 --> 00:17:46,600
on which you can
put 1 gigabyte, you

337
00:17:46,600 --> 00:17:50,140
have to think about how
you're going to represent it.

338
00:17:50,140 --> 00:17:53,270
And it matters how
frequently you sample.

339
00:17:53,270 --> 00:17:58,180
And also, with what quantization
you represent each sample.

340
00:17:58,180 --> 00:18:00,460
Same sort of thing
happens for pictures.

341
00:18:00,460 --> 00:18:04,060
Here's a relatively
high-quality picture,

342
00:18:04,060 --> 00:18:09,250
where it's 280 by 280 pixels.

343
00:18:09,250 --> 00:18:13,692
And it's an 8-bit
representation in amplitude.

344
00:18:13,692 --> 00:18:15,400
The point's just that
the kinds of things

345
00:18:15,400 --> 00:18:16,960
that happen when you
quantize a picture

346
00:18:16,960 --> 00:18:19,293
are very similar to the same
sorts of things that happen

347
00:18:19,293 --> 00:18:21,070
when you quantize audio.

348
00:18:21,070 --> 00:18:24,010
So if we take this picture
and compare it to--

349
00:18:24,010 --> 00:18:27,640
substitute for each
pixel a quantized version

350
00:18:27,640 --> 00:18:29,380
of the amplitude.

351
00:18:29,380 --> 00:18:33,980
Quantized here to 8
bits and here to 7 bits.

352
00:18:33,980 --> 00:18:35,930
You might be able to
see the difference.

353
00:18:35,930 --> 00:18:38,270
If I come up really
close, I can certainly

354
00:18:38,270 --> 00:18:41,210
see quantization effects.

355
00:18:41,210 --> 00:18:58,740
If I drop the right one
to 6, 5, 4, 3, 2, 1.

356
00:18:58,740 --> 00:18:59,240
OK.

357
00:18:59,240 --> 00:19:02,240
So here is 8 bits and 4 bits.

358
00:19:02,240 --> 00:19:05,630
Remember that when we thought
about the audio example,

359
00:19:05,630 --> 00:19:07,130
it sounded fuzzy.

360
00:19:07,130 --> 00:19:08,500
It sounded hissy.

361
00:19:08,500 --> 00:19:12,514
[HISSING] What's the
effect of quantizing here?

362
00:19:12,514 --> 00:19:14,462
Yeah.

363
00:19:14,462 --> 00:19:16,886
AUDIENCE: [INAUDIBLE]

364
00:19:16,886 --> 00:19:18,010
DENNIS FREEMAN: Sharp and--

365
00:19:18,010 --> 00:19:18,890
say again?

366
00:19:18,890 --> 00:19:20,330
AUDIENCE: The contrast.

367
00:19:20,330 --> 00:19:22,413
DENNIS FREEMAN: Well,
there's certainly a problem.

368
00:19:22,413 --> 00:19:25,930
So both of these pictures
have high contrast, right?

369
00:19:25,930 --> 00:19:29,710
How would I see contrast
in the pictures?

370
00:19:29,710 --> 00:19:32,320
Contrast refers to
having big steps,

371
00:19:32,320 --> 00:19:34,300
step changes in brightness.

372
00:19:34,300 --> 00:19:37,480
So like, I might see a high
contrast between this petal

373
00:19:37,480 --> 00:19:39,740
and that leaf.

374
00:19:39,740 --> 00:19:42,820
And I still have a high
contrast at the analogous place

375
00:19:42,820 --> 00:19:44,210
over here.

376
00:19:44,210 --> 00:19:46,870
So there is some
contrast effects.

377
00:19:46,870 --> 00:19:48,970
A little more
subtly, the contrast

378
00:19:48,970 --> 00:19:53,420
affects how well you
see the quantization.

379
00:19:53,420 --> 00:19:55,820
So if I changed
the picture to have

380
00:19:55,820 --> 00:19:57,470
different amounts
of contrast, I could

381
00:19:57,470 --> 00:20:03,960
effect whether you could see
the quantization well or poorly.

382
00:20:03,960 --> 00:20:06,890
So in audio, the
effect of quantizing--

383
00:20:06,890 --> 00:20:09,080
as I quantized more
and more and more,

384
00:20:09,080 --> 00:20:12,950
I caused more and more hiss
[HISSING] in the background.

385
00:20:12,950 --> 00:20:14,272
What's the effect here?

386
00:20:14,272 --> 00:20:15,605
What's the effect of quantizing?

387
00:20:15,605 --> 00:20:16,350
Yeah.

388
00:20:16,350 --> 00:20:18,472
AUDIENCE: You have less
grays to work with.

389
00:20:18,472 --> 00:20:19,930
DENNIS FREEMAN: I
have fewer grays.

390
00:20:19,930 --> 00:20:22,355
AUDIENCE: So 1-bit was
just black and white.

391
00:20:22,355 --> 00:20:25,646
So as you increase bits,
you get more grays--

392
00:20:25,646 --> 00:20:26,770
DENNIS FREEMAN: Absolutely.

393
00:20:26,770 --> 00:20:30,010
Could you give me sort of
a qualitative assessment

394
00:20:30,010 --> 00:20:31,990
of the kinds of errors
that you see here

395
00:20:31,990 --> 00:20:34,822
compared to the kinds of errors
that you don't see there?

396
00:20:34,822 --> 00:20:35,322
Yeah.

397
00:20:35,322 --> 00:20:36,146
AUDIENCE: [INAUDIBLE]

398
00:20:36,146 --> 00:20:37,479
DENNIS FREEMAN: There's banding.

399
00:20:37,479 --> 00:20:39,040
Why would there be banding?

400
00:20:39,040 --> 00:20:41,140
Nobody said the audio
sounded like it was banded.

401
00:20:43,830 --> 00:20:46,270
We just don't hear
that way, right?

402
00:20:46,270 --> 00:20:48,560
Even though we're doing
a similar process,

403
00:20:48,560 --> 00:20:51,880
why do we see
banding in pictures?

404
00:20:51,880 --> 00:20:53,290
What's causing the banding?

405
00:20:53,290 --> 00:20:53,854
Yeah.

406
00:20:53,854 --> 00:20:57,920
AUDIENCE: [INAUDIBLE]

407
00:20:57,920 --> 00:20:59,170
DENNIS FREEMAN: Yeah, exactly.

408
00:20:59,170 --> 00:21:02,410
So the pixels that are nearby--

409
00:21:02,410 --> 00:21:06,940
so take the pixels here, which
came from pixels over here.

410
00:21:06,940 --> 00:21:09,700
They have nearly
the same gray value,

411
00:21:09,700 --> 00:21:14,350
but the quantizer is
making up its mind

412
00:21:14,350 --> 00:21:15,940
at a very precise level.

413
00:21:15,940 --> 00:21:18,159
It's deciding, oh, you're
between these two levels.

414
00:21:18,159 --> 00:21:19,075
Turn into this number.

415
00:21:19,075 --> 00:21:20,533
If you're between
these two levels,

416
00:21:20,533 --> 00:21:22,100
turn into this other number.

417
00:21:22,100 --> 00:21:24,190
So you get the bands
because there's

418
00:21:24,190 --> 00:21:28,660
correlations in the brightnesses
of pixels that are nearby.

419
00:21:28,660 --> 00:21:30,520
So you get this
banding thing that

420
00:21:30,520 --> 00:21:33,730
can be objectionable whenever
the quantization is not

421
00:21:33,730 --> 00:21:35,070
sufficient.

422
00:21:35,070 --> 00:21:37,000
OK.

423
00:21:37,000 --> 00:21:41,560
So one way you can reduce
that is called dithering.

424
00:21:41,560 --> 00:21:44,040
Dithering means add noise.

425
00:21:44,040 --> 00:21:45,770
So that's kind of weird.

426
00:21:45,770 --> 00:21:47,770
So I want to get
rid of the bands.

427
00:21:47,770 --> 00:21:49,000
So what do I do?

428
00:21:49,000 --> 00:21:51,550
I take every pixel.

429
00:21:51,550 --> 00:21:56,880
And before I quantize
it, I add noise to it.

430
00:21:56,880 --> 00:22:00,220
Then even if the pixels came
from a region that were nearly

431
00:22:00,220 --> 00:22:04,270
the same amplitude
to start with,

432
00:22:04,270 --> 00:22:08,210
each individual pixel gets
a different amount of noise

433
00:22:08,210 --> 00:22:11,680
so they quantize differently.

434
00:22:11,680 --> 00:22:15,160
And if I choose my
noise in a clever way,

435
00:22:15,160 --> 00:22:18,310
I could use my noise to be
plus or minus 1 quantum.

436
00:22:18,310 --> 00:22:20,500
So I could choose a
random number generator

437
00:22:20,500 --> 00:22:23,440
that gave me numbers
that were evenly

438
00:22:23,440 --> 00:22:27,130
distributed over the
range minus 1/2 quantum

439
00:22:27,130 --> 00:22:29,410
to plus 1/2 quantum.

440
00:22:29,410 --> 00:22:31,630
And if I do that,
then I can generate

441
00:22:31,630 --> 00:22:35,650
a picture that is
quantized but was dithered

442
00:22:35,650 --> 00:22:37,600
before it was quantized.
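
A minimal sketch of dithering as described here, assuming a rounding quantizer with step size one quantum and noise drawn uniformly from minus 1/2 quantum to plus 1/2 quantum. The seed, the value 0.3, and the function names are illustrative assumptions:

```python
import random

def quantize(x, step):
    """Round x to the nearest multiple of step (one quantum)."""
    return step * round(x / step)

def dither_then_quantize(x, step, rng):
    """Add noise uniform over [-step/2, +step/2], then quantize."""
    noise = rng.uniform(-step / 2, step / 2)
    return quantize(x + noise, step)

rng = random.Random(0)        # fixed seed only so the sketch is repeatable
step = 1.0
flat_region = [0.3] * 1000    # neighbouring pixels with the same gray value

plain = [quantize(v, step) for v in flat_region]
dithered = [dither_then_quantize(v, step, rng) for v in flat_region]

# Without dither every pixel lands on the same level (a band).
# With dither the pixels split between two levels, and their
# average stays close to the original value.
mean_dithered = sum(dithered) / len(dithered)
```

The band is gone because identical inputs no longer produce identical outputs, but each individual pixel is now noisier.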

443
00:22:37,600 --> 00:22:41,140
So the two pictures are
both quantized at the level

444
00:22:41,140 --> 00:22:44,710
of 7 bits, but the
one on the right

445
00:22:44,710 --> 00:22:46,550
had dither added to it first.

446
00:22:46,550 --> 00:22:51,430
So I'm adding noise before
I do the quantization.

447
00:22:51,430 --> 00:22:53,494
And you can't see too much at 7.

448
00:22:53,494 --> 00:23:01,181
6, 5, 4, 3.

449
00:23:01,181 --> 00:23:01,680
OK.

450
00:23:01,680 --> 00:23:03,790
So what's the difference
between the two?

451
00:23:03,790 --> 00:23:09,240
Well, over here
I had these bands

452
00:23:09,240 --> 00:23:12,300
because the amplitudes
were such that they all got

453
00:23:12,300 --> 00:23:14,940
converted into the same output.

454
00:23:14,940 --> 00:23:16,635
The bands have
disappeared over there.

455
00:23:19,365 --> 00:23:21,150
2.

456
00:23:21,150 --> 00:23:25,440
Even at 1, the bands have
disappeared, right?

457
00:23:25,440 --> 00:23:27,380
But that's obviously
not a good solution.

458
00:23:27,380 --> 00:23:28,760
So what's wrong with dither?

459
00:23:33,750 --> 00:23:34,674
AUDIENCE: Noisy.

460
00:23:34,674 --> 00:23:35,840
DENNIS FREEMAN: Noisy, yeah.

461
00:23:35,840 --> 00:23:38,540
I'm kind of going back
to the hiss thing, right?

462
00:23:38,540 --> 00:23:42,045
Now, I've taken a picture
that had had bands

463
00:23:42,045 --> 00:23:44,170
and I've turned it into a
picture that looks noisy.

464
00:23:47,250 --> 00:23:50,570
There's a way to think
about how the noise works.

465
00:23:50,570 --> 00:23:53,710
Imagine that I had a
smoothly-varying signal shown

466
00:23:53,710 --> 00:23:59,330
in blue that was being turned
from a continuous range

467
00:23:59,330 --> 00:24:02,220
of amplitudes into a
discrete range of amplitudes.

468
00:24:02,220 --> 00:24:04,350
So let's represent the
discrete amplitudes

469
00:24:04,350 --> 00:24:07,340
by the dashed red lines.

470
00:24:07,340 --> 00:24:09,710
Then, the signal
that I might quantize

471
00:24:09,710 --> 00:24:12,184
could look like the red signal.

472
00:24:12,184 --> 00:24:13,850
And that's a very
graphic representation

473
00:24:13,850 --> 00:24:16,910
of where the bands come from.

474
00:24:16,910 --> 00:24:20,480
So the bands come from the
fact that the original signal

475
00:24:20,480 --> 00:24:27,390
sliced through a small
number of quantized outputs.

476
00:24:27,390 --> 00:24:30,400
Everybody see where
the bands are?

477
00:24:30,400 --> 00:24:34,780
Then, if I add dither,
I can think about--

478
00:24:34,780 --> 00:24:37,160
so this transformation
from blue to red,

479
00:24:37,160 --> 00:24:40,550
I can think about that
as being y equals Q of x.

480
00:24:40,550 --> 00:24:45,100
So x is the blue line,
Q of x is the red line.

481
00:24:45,100 --> 00:24:47,020
Down here, what
I've done is I've

482
00:24:47,020 --> 00:24:48,760
taken x and added noise to it.

483
00:24:48,760 --> 00:24:52,330
Then, I ran it through
the same quantizer.

484
00:24:52,330 --> 00:24:55,134
And you can see that
I've broken up the bands,

485
00:24:55,134 --> 00:24:57,175
but you can see that I've
added a bunch of noise.

486
00:24:59,755 --> 00:25:01,380
So there's a slightly
more clever thing

487
00:25:01,380 --> 00:25:03,510
that we can do that's
called Roberts' technique.

488
00:25:03,510 --> 00:25:07,560
Larry Roberts was a
master's student here.

489
00:25:07,560 --> 00:25:09,660
He was here before
I was here, which

490
00:25:09,660 --> 00:25:11,790
is kind of a remarkable thing.

491
00:25:11,790 --> 00:25:15,480
But they actually wrote theses
back then and they used paper.

492
00:25:15,480 --> 00:25:18,870
And you can go to the
library and it's still there.

493
00:25:18,870 --> 00:25:22,440
So Larry thought of
a method for dealing

494
00:25:22,440 --> 00:25:29,340
with this where what you do is
you take the original signal x,

495
00:25:29,340 --> 00:25:31,350
you add n to it and
quantize it, but then you

496
00:25:31,350 --> 00:25:32,350
subtract n back off.
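
A minimal sketch of the add, quantize, subtract sequence, assuming the same rounding quantizer and uniform noise over plus or minus 1/2 quantum as in the dither case. Names, seed, and test values are illustrative. The error works out to Q(x + n) - (x + n), so it always stays within half a quantum and averages to about zero whatever the input value is:

```python
import random

def quantize(x, step):
    """Round x to the nearest multiple of step (one quantum)."""
    return step * round(x / step)

def roberts(x, step, rng):
    """Add noise n, quantize, then subtract the same n back off."""
    n = rng.uniform(-step / 2, step / 2)
    return quantize(x + n, step) - n

rng = random.Random(1)   # fixed seed only so the sketch is repeatable
step = 1.0

# For several different input values, collect the mean error and the
# largest error magnitude over many trials.
error_stats = {}
for value in (0.1, 0.3, 0.45):
    errs = [roberts(value, step, rng) - value for _ in range(5000)]
    error_stats[value] = (sum(errs) / len(errs), max(abs(e) for e in errs))
```

With plain quantization an input of 0.3 always gives an error of exactly -0.3, which is the sense in which the error depends on the signal; here the error statistics look the same for every input.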

497
00:25:35,659 --> 00:25:37,200
And that's called
Roberts' technique.

498
00:25:37,200 --> 00:25:40,920
And that's illustrated
by this transformation.

499
00:25:40,920 --> 00:25:45,800
The good thing about this
transformation is that this--

500
00:25:45,800 --> 00:25:48,420
so here, the quantization
error was clearly

501
00:25:48,420 --> 00:25:50,850
correlated with the signal.

502
00:25:50,850 --> 00:25:53,100
That's what banding is, right?

503
00:25:53,100 --> 00:25:55,410
Something about the signal
turned into something

504
00:25:55,410 --> 00:25:58,580
about the error.

505
00:25:58,580 --> 00:26:05,150
Here, the error is still
correlated with the signal.

506
00:26:05,150 --> 00:26:07,880
The correlation is
less obvious, right?

507
00:26:07,880 --> 00:26:12,980
But here is a range of
errors that are all positive.

508
00:26:12,980 --> 00:26:16,860
And here is a range of
errors that are all negative.

509
00:26:16,860 --> 00:26:20,280
So the errors are
still correlated

510
00:26:20,280 --> 00:26:23,220
with the original signal.

511
00:26:23,220 --> 00:26:25,470
So the result-- and when
you do Roberts' technique,

512
00:26:25,470 --> 00:26:28,000
you destroy the correlation.

513
00:26:28,000 --> 00:26:31,240
So with Roberts' technique, you
end up with-- it's still noisy.

514
00:26:31,240 --> 00:26:34,000
Because after all,
I added noise to it.

515
00:26:34,000 --> 00:26:35,920
But I've added it
in a very clever way

516
00:26:35,920 --> 00:26:39,280
that removes the correlation
between the error

517
00:26:39,280 --> 00:26:40,360
and the signal.

518
00:26:40,360 --> 00:26:45,130
And the result is that
the noise seems less.

519
00:26:45,130 --> 00:26:48,030
So if you compare
6 bits with dither

520
00:26:48,030 --> 00:26:51,400
to 6 bits with Roberts'
method, both pictures

521
00:26:51,400 --> 00:26:54,310
are represented by 6 bits.

522
00:26:54,310 --> 00:26:55,870
5 bits, 5 bits.

523
00:26:55,870 --> 00:26:57,415
4, 3.

524
00:27:01,040 --> 00:27:04,160
So the interesting thing
is that Roberts' method

525
00:27:04,160 --> 00:27:06,650
looks like less noise.

526
00:27:06,650 --> 00:27:08,720
It's mathematically not.

527
00:27:08,720 --> 00:27:11,120
Mathematically, you can
show that Roberts' technique

528
00:27:11,120 --> 00:27:14,570
has the same energy in the
noise as was in the dither

529
00:27:14,570 --> 00:27:15,620
technique.

530
00:27:15,620 --> 00:27:19,730
If you just calculate
the energy in the error,

531
00:27:19,730 --> 00:27:21,410
they're identical.

532
00:27:21,410 --> 00:27:24,350
But in Roberts' technique,
he destroys the correlation

533
00:27:24,350 --> 00:27:27,380
and that makes the
noise seem smaller.

534
00:27:27,380 --> 00:27:30,350
It's like physically
less objectionable.

535
00:27:32,860 --> 00:27:34,992
What's the problem with
Roberts' technique?

536
00:27:38,610 --> 00:27:40,980
If I told you to
implement a scheme

537
00:27:40,980 --> 00:27:48,440
that quantized according
to Roberts' technique.

538
00:27:48,440 --> 00:27:52,010
And say you're here and you're
supposed to quantize a message,

539
00:27:52,010 --> 00:27:56,477
send it over the ethernet,
and receive it in California.

540
00:27:56,477 --> 00:27:58,310
And you're only supposed
to be sending, say,

541
00:27:58,310 --> 00:28:02,060
a 6-bit representation instead
of a 16-bit representation.

542
00:28:02,060 --> 00:28:05,920
What's hard about Roberts'
technique compared to dither?

543
00:28:05,920 --> 00:28:07,220
Quantizing is easy, right?

544
00:28:07,220 --> 00:28:10,104
I take my 16-bit CD.

545
00:28:10,104 --> 00:28:11,270
I take off the first sample.

546
00:28:11,270 --> 00:28:12,270
I quantize it.

547
00:28:12,270 --> 00:28:15,000
I send it across the internet.

548
00:28:15,000 --> 00:28:16,190
I take off my second sample.

549
00:28:16,190 --> 00:28:16,830
I quantize it.

550
00:28:16,830 --> 00:28:20,670
I send those 6 bits over
the internet, et cetera.

551
00:28:20,670 --> 00:28:22,666
Dither is sort of
the same thing.

552
00:28:22,666 --> 00:28:23,790
I pick up the first sample.

553
00:28:23,790 --> 00:28:24,750
I add noise to it.

554
00:28:24,750 --> 00:28:25,440
I quantize it.

555
00:28:25,440 --> 00:28:28,505
I send those 6 bits
over the internet.

556
00:28:28,505 --> 00:28:29,880
What's the hard
part of Roberts'?

557
00:28:35,127 --> 00:28:37,450
Yeah.

558
00:28:37,450 --> 00:28:38,950
AUDIENCE: Do you
send the noise too?

559
00:28:38,950 --> 00:28:41,880
DENNIS FREEMAN: I have
to send the noise, too.

560
00:28:41,880 --> 00:28:44,530
I have to know the
precise value of the noise

561
00:28:44,530 --> 00:28:50,590
that I added to sample n, so
I can subtract it back out.

562
00:28:50,590 --> 00:28:54,730
So Roberts' technique
says, I take the value x

563
00:28:54,730 --> 00:28:56,890
and I add some
amount of noise n.

564
00:28:56,890 --> 00:28:58,330
n was a random number.

565
00:28:58,330 --> 00:29:01,030
I chose it by throwing
a die or something.

566
00:29:01,030 --> 00:29:03,520
I quantize that, and then
I subtract that same number

567
00:29:03,520 --> 00:29:04,270
back out.

568
00:29:04,270 --> 00:29:06,700
Well, that number has
to be precise compared

569
00:29:06,700 --> 00:29:09,560
to the quantization levels.

570
00:29:09,560 --> 00:29:11,740
So for example, people
would normally use--

571
00:29:11,740 --> 00:29:14,110
if I'm doing 16-bit audio,
people would normally

572
00:29:14,110 --> 00:29:18,490
use a 16-bit
representation for n,

573
00:29:18,490 --> 00:29:21,280
which means that I take a
16-bit number off the CD.

574
00:29:21,280 --> 00:29:24,210
I take a random number.

575
00:29:24,210 --> 00:29:26,436
I add it, quantize it.

576
00:29:26,436 --> 00:29:30,190
And now, I can send
the 6-bit number.

577
00:29:30,190 --> 00:29:32,480
But in order for that guy
to reproduce the answer,

578
00:29:32,480 --> 00:29:35,865
he has to know n too.

579
00:29:35,865 --> 00:29:38,440
Everybody see that?

580
00:29:38,440 --> 00:29:40,637
So the problem is, how
do you send the noise?

581
00:29:40,637 --> 00:29:42,220
And the trick is
that we use something

582
00:29:42,220 --> 00:29:43,510
called pseudorandom noise.

583
00:29:43,510 --> 00:29:46,090
Pseudorandom noise
is an algorithm

584
00:29:46,090 --> 00:29:50,890
that generates a sequence of
numbers that looks random,

585
00:29:50,890 --> 00:29:52,840
but they were made
algorithmically.

586
00:29:52,840 --> 00:29:55,900
So you can independently
manufacture the same sequence

587
00:29:55,900 --> 00:29:58,711
here and there.

588
00:29:58,711 --> 00:30:00,210
That way, if you're
using the same--

589
00:30:00,210 --> 00:30:02,320
if you pre-agree that
you're going to use the same

590
00:30:02,320 --> 00:30:06,850
algorithm, you can independently
generate the same sequence

591
00:30:06,850 --> 00:30:07,480
of n's.
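
A minimal sketch of the pre-agreed pseudorandom sequence, assuming both ends seed Python's `random.Random` with the same value. The seed, the step size of 1/64 (roughly a 6-bit code for samples between 0 and 1), and the function names are illustrative assumptions. Only the small integer codes cross the network; the receiver regenerates the same n values locally and subtracts them:

```python
import random

STEP = 1.0 / 64        # one quantum; roughly 6 bits for samples in [0, 1)
SHARED_SEED = 12345    # hypothetical value both ends agree on in advance

def send(samples, seed=SHARED_SEED):
    """Sender: add pseudorandom noise, quantize, emit integer codes."""
    rng = random.Random(seed)
    codes = []
    for x in samples:
        n = rng.uniform(-STEP / 2, STEP / 2)
        codes.append(round((x + n) / STEP))  # only this integer is transmitted
    return codes

def receive(codes, seed=SHARED_SEED):
    """Receiver: regenerate the same noise sequence and subtract it."""
    rng = random.Random(seed)
    out = []
    for c in codes:
        n = rng.uniform(-STEP / 2, STEP / 2)
        out.append(c * STEP - n)
    return out

samples = [0.10, 0.11, 0.12, 0.50, 0.51]
received = receive(send(samples))
```

The scheme only works if the two generators stay in step, so both sides have to draw exactly one noise value per sample, in the same order.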

592
00:30:10,580 --> 00:30:12,548
OK.

593
00:30:12,548 --> 00:30:14,785
Yeah, so I jumped
back to explain--

594
00:30:17,281 --> 00:30:17,780
OK.

595
00:30:17,780 --> 00:30:23,920
So the point is that just
like in audio, in pictures

596
00:30:23,920 --> 00:30:28,000
it's important how many
bits you quantize to.

597
00:30:28,000 --> 00:30:31,840
That affects drastically the
performance of communications

598
00:30:31,840 --> 00:30:33,030
or storage devices.

599
00:30:33,030 --> 00:30:34,780
How many pictures can
you store someplace?

600
00:30:34,780 --> 00:30:37,270
How many pictures can
you put on your iPhone?

601
00:30:37,270 --> 00:30:39,580
So all of that
matters quite a bit.

602
00:30:39,580 --> 00:30:44,320
And the code that you
use is very important.

603
00:30:44,320 --> 00:30:46,549
And you're not limited to just--

604
00:30:46,549 --> 00:30:47,590
I have two more examples.

605
00:30:50,120 --> 00:30:52,990
So the simplest possible
schemes are the ones

606
00:30:52,990 --> 00:30:54,490
that I've shown
so far where you

607
00:30:54,490 --> 00:30:59,680
think about the sampling in
time and the quantization

608
00:30:59,680 --> 00:31:02,780
in amplitude as
separate processes.

609
00:31:02,780 --> 00:31:04,277
You don't have to do that.

610
00:31:04,277 --> 00:31:06,110
In fact, you can get
much higher performance

611
00:31:06,110 --> 00:31:08,340
if you combine the two.

612
00:31:08,340 --> 00:31:10,310
So the first combination
I want to think about

613
00:31:10,310 --> 00:31:13,940
is trading off
precision for speed.

614
00:31:13,940 --> 00:31:17,010
And that's something that we
call progressive refinement.

615
00:31:17,010 --> 00:31:19,610
The idea is, imagine
that I want to make

616
00:31:19,610 --> 00:31:24,370
a digital representation of all
the paintings in the Louvre.

617
00:31:24,370 --> 00:31:24,920
OK.

618
00:31:24,920 --> 00:31:30,434
It doesn't make sense to do
200 by 200 at 6-bit resolution

619
00:31:30,434 --> 00:31:32,350
if you were looking at
pictures in the Louvre.

620
00:31:32,350 --> 00:31:33,808
That doesn't make
any sense, right?

621
00:31:33,808 --> 00:31:36,930
You would like to see a
high-resolution version.

622
00:31:36,930 --> 00:31:38,290
OK.

623
00:31:38,290 --> 00:31:39,880
And now you're a
user, and what you'd

624
00:31:39,880 --> 00:31:42,670
like to do is leaf
through them and find

625
00:31:42,670 --> 00:31:45,250
photos of something or other.

626
00:31:45,250 --> 00:31:46,930
Scenes of some type.

627
00:31:46,930 --> 00:31:47,680
OK.

628
00:31:47,680 --> 00:31:50,650
Well if you've got a
high-resolution representation

629
00:31:50,650 --> 00:31:53,260
and you're trying to thumb
through a lot of images.

630
00:31:53,260 --> 00:31:55,480
The problem is, if
each one is represented

631
00:31:55,480 --> 00:31:58,960
with high resolution,
that can take a long time.

632
00:31:58,960 --> 00:32:01,000
So if you didn't do
something clever,

633
00:32:01,000 --> 00:32:05,720
basically you would have to
download the Louvre before you

634
00:32:05,720 --> 00:32:07,710
could do your search.

635
00:32:07,710 --> 00:32:09,920
So the idea in
progressive refinement

636
00:32:09,920 --> 00:32:15,410
is first send me a
crude representation.

637
00:32:15,410 --> 00:32:18,740
And if I haven't
changed in my browser,

638
00:32:18,740 --> 00:32:21,620
if I'm still looking at the same
picture three seconds later,

639
00:32:21,620 --> 00:32:23,840
continue to load
the information that

640
00:32:23,840 --> 00:32:27,670
makes the picture
increasingly precise.

641
00:32:27,670 --> 00:32:30,480
Give me a crude representation
as soon as you can.

642
00:32:30,480 --> 00:32:36,740
And then if I sit there, give me
a more refined representation.

643
00:32:36,740 --> 00:32:40,640
But if I leaf to someplace
else, stop downloading that one

644
00:32:40,640 --> 00:32:42,332
and give me a crude
representation

645
00:32:42,332 --> 00:32:43,040
of the new place.

646
00:32:43,040 --> 00:32:44,580
That's the idea.

647
00:32:44,580 --> 00:32:48,530
So the way you can do that
is with discrete sampling.

648
00:32:48,530 --> 00:32:51,680
I started with a digital
representation of a painting

649
00:32:51,680 --> 00:32:52,700
in the Louvre.

650
00:32:52,700 --> 00:32:59,870
Maybe it was 20,000 by 20,000
with 24 levels of color--

651
00:32:59,870 --> 00:33:02,180
some huge picture.

652
00:33:02,180 --> 00:33:04,540
So what I'll do
is I'll sample it.

653
00:33:04,540 --> 00:33:07,070
But this time, it's DT sampling.

654
00:33:07,070 --> 00:33:10,400
DT sampling-- you'll be
completely shocked to hear

655
00:33:10,400 --> 00:33:11,510
this--

656
00:33:11,510 --> 00:33:14,300
is completely analogous
to CT sampling.

657
00:33:14,300 --> 00:33:15,980
It's almost the same thing.

658
00:33:18,650 --> 00:33:20,451
That shouldn't be too
big of a surprise,

659
00:33:20,451 --> 00:33:21,950
all of the different
transforms, all

660
00:33:21,950 --> 00:33:24,116
the different Fourier
representations that we looked

661
00:33:24,116 --> 00:33:26,760
at, are almost the same thing.

662
00:33:26,760 --> 00:33:29,270
So DT sampling turns out
to work almost exactly

663
00:33:29,270 --> 00:33:31,310
like CT sampling.

664
00:33:31,310 --> 00:33:36,770
So think about what you would do
if you wanted to take a picture

665
00:33:36,770 --> 00:33:39,560
and represent it with
a factor of 3 fewer

666
00:33:39,560 --> 00:33:43,850
pixels in the horizontal
and a factor of 3 fewer

667
00:33:43,850 --> 00:33:45,890
pixels in the vertical.

668
00:33:45,890 --> 00:33:47,570
Well, you would sample it.

669
00:33:47,570 --> 00:33:51,470
In CT, we would think about
multiplying the CT signal

670
00:33:51,470 --> 00:33:54,500
x of t by an impulse train.

671
00:33:54,500 --> 00:33:57,650
Here, we use a
unit sample train.

672
00:33:57,650 --> 00:34:00,780
So we think about an
original signal x of n.

673
00:34:00,780 --> 00:34:03,340
And we think about
a sampling waveform

674
00:34:03,340 --> 00:34:09,350
that's now an infinite
unit-sample train.

675
00:34:09,350 --> 00:34:11,960
We used to use an
infinite impulse train,

676
00:34:11,960 --> 00:34:14,840
now we're using an infinite
unit-sample train.

677
00:34:14,840 --> 00:34:17,270
So we preserve
every third sample

678
00:34:17,270 --> 00:34:20,389
and throw away the ones between.
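
A minimal sketch of DT sampling as multiplication by a unit-sample train, using one row of made-up pixel values and a period of 3 (the function name is illustrative):

```python
def unit_sample_train(length, period):
    """p[n] = 1 when n is a multiple of period, 0 otherwise."""
    return [1 if n % period == 0 else 0 for n in range(length)]

x = [5, 7, 9, 11, 13, 15, 17]             # original signal x[n]
p = unit_sample_train(len(x), period=3)    # sampling waveform p[n]

# Sampled signal: every third sample preserved, the rest set to 0.
x_p = [xn * pn for xn, pn in zip(x, p)]
```

Note that the sampled signal still has the same length as the original; the zeros are still there at this stage.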

679
00:34:20,389 --> 00:34:25,190
So that's a way of
generating a new picture that

680
00:34:25,190 --> 00:34:27,334
only has one third of
the information that

681
00:34:27,334 --> 00:34:28,500
was in the original picture.

682
00:34:28,500 --> 00:34:31,730
And as I said before, it
should come as no surprise

683
00:34:31,730 --> 00:34:34,550
that the math for thinking
about this sampling process

684
00:34:34,550 --> 00:34:36,679
is virtually
identical to the math

685
00:34:36,679 --> 00:34:40,310
that you need to think about
the CT sampling problem.

686
00:34:40,310 --> 00:34:43,040
In particular, the key is
to think about the Fourier

687
00:34:43,040 --> 00:34:45,020
representation.

688
00:34:45,020 --> 00:34:48,500
If this were the
original Fourier signal,

689
00:34:48,500 --> 00:34:53,210
if this were the Fourier
representation of this signal,

690
00:34:53,210 --> 00:34:55,969
we have to think about
the Fourier representation

691
00:34:55,969 --> 00:35:02,930
for the sampling signal, the
infinite unit-sample train.

692
00:35:02,930 --> 00:35:05,597
An infinite unit-sample
train, not surprisingly,

693
00:35:05,597 --> 00:35:08,180
the transform of that's going
to be an infinite impulse train.

694
00:35:10,730 --> 00:35:13,260
All DT transforms are
periodic in 2 pi.

695
00:35:13,260 --> 00:35:15,560
That's a property of DT signals.

696
00:35:15,560 --> 00:35:18,170
That's a property
of the unit circle.

697
00:35:18,170 --> 00:35:20,480
So we're not surprised to
see that this signal was

698
00:35:20,480 --> 00:35:22,760
periodic in 2 pi.

699
00:35:22,760 --> 00:35:25,070
This signal is also
periodic in 2 pi.

700
00:35:25,070 --> 00:35:26,200
That's because it's DT.

701
00:35:26,200 --> 00:35:29,950
But it's also periodic
in one third of that.

702
00:35:29,950 --> 00:35:35,630
That's because of
the periodicity here.

703
00:35:35,630 --> 00:35:36,520
OK.

704
00:35:36,520 --> 00:35:40,930
So if we had had a sample
at each one of these,

705
00:35:40,930 --> 00:35:45,336
then the base periodicity
would have been 2 pi.

706
00:35:45,336 --> 00:35:48,720
But here, because
of the periodicity

707
00:35:48,720 --> 00:35:54,980
being 1 every third sample, we
get 3 times that many impulses.

708
00:35:54,980 --> 00:35:59,090
So just like in CT
sampling, we think

709
00:35:59,090 --> 00:36:01,310
about multiplying
the original waveform

710
00:36:01,310 --> 00:36:04,610
by a sampling waveform that
preserves only the information

711
00:36:04,610 --> 00:36:06,120
at the samples.

712
00:36:06,120 --> 00:36:07,610
We do the same thing here.

713
00:36:07,610 --> 00:36:10,950
Multiplication in time is
convolution in frequency.

714
00:36:10,950 --> 00:36:12,920
So we take the original
signal, we convolve it,

715
00:36:12,920 --> 00:36:17,540
and this is what comes out
of that sampling process.

716
00:36:17,540 --> 00:36:23,570
We get the same rule
for the sampling theorem

717
00:36:23,570 --> 00:36:25,290
that we got for CT.

718
00:36:28,100 --> 00:36:31,580
This process has to be such that
when you do the convolution,

719
00:36:31,580 --> 00:36:37,500
the resulting nearest
neighbors shouldn't overlap.

720
00:36:37,500 --> 00:36:42,870
So there is a maximum frequency
for the discrete system,

721
00:36:42,870 --> 00:36:46,070
just like there was a maximum
frequency for the CT system.
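
The argument above can be written out for the factor-of-3 case. This is a sketch in notation assumed here rather than taken from the slides: p[n] is the unit-sample train, X and P are DT Fourier transforms, and Omega is DT frequency in radians per sample.

```latex
p[n] = \sum_{k=-\infty}^{\infty} \delta[n - 3k]
\quad\longleftrightarrow\quad
P(e^{j\Omega}) = \frac{2\pi}{3} \sum_{k=-\infty}^{\infty}
    \delta\!\left(\Omega - \frac{2\pi k}{3}\right)

x_p[n] = x[n]\,p[n]
\quad\longleftrightarrow\quad
X_p(e^{j\Omega})
  = \frac{1}{2\pi} \int_{2\pi} X(e^{j\theta})\,P(e^{j(\Omega-\theta)})\,d\theta
  = \frac{1}{3} \sum_{k=0}^{2} X\!\left(e^{j(\Omega - 2\pi k/3)}\right)

\text{no overlap if } X(e^{j\Omega}) = 0
\ \text{ for } \ \frac{\pi}{3} \le |\Omega| \le \pi
```

The copies of X sit 2 pi / 3 apart, so they stay separate only if the original signal has no content at or above pi / 3.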

722
00:36:48,810 --> 00:36:49,890
There's one more step.

723
00:36:49,890 --> 00:36:52,800
Obviously, if I sample
the picture at the Louvre,

724
00:36:52,800 --> 00:36:54,300
I don't want to send the 0's.

725
00:36:54,300 --> 00:36:57,070
That doesn't make any sense.

726
00:36:57,070 --> 00:37:00,640
So in order to not send
the 0's, I smash together

727
00:37:00,640 --> 00:37:03,240
the non-0 samples.

728
00:37:03,240 --> 00:37:05,550
That's illustrated here.

729
00:37:05,550 --> 00:37:10,320
Smashing in time does
what in frequency?

730
00:37:10,320 --> 00:37:11,320
AUDIENCE: [INAUDIBLE]

731
00:37:11,320 --> 00:37:16,210
DENNIS FREEMAN: Squish in
time, stretch in frequency.

732
00:37:16,210 --> 00:37:18,630
They're reciprocal
spaces, right?

733
00:37:18,630 --> 00:37:20,850
Frequency and time
are reciprocal spaces.

734
00:37:20,850 --> 00:37:23,830
Smash in time,
stretch in frequency.

735
00:37:23,830 --> 00:37:31,610
So the result is that when
you smash the 0 entries out

736
00:37:31,610 --> 00:37:34,820
of the signal, you stretch
the frequency representation

737
00:37:34,820 --> 00:37:36,680
by a factor of 3.

738
00:37:36,680 --> 00:37:38,510
And when you stretch
by a factor of 3,

739
00:37:38,510 --> 00:37:41,540
this peak, which
was at 1/3 of 2 pi,

740
00:37:41,540 --> 00:37:45,000
moves the whole way out to 2 pi.
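
A minimal sketch of the stretch in frequency, assuming a single cosine as the test signal (the frequency 2 pi / 24 and the function name are illustrative). Keeping every third sample and dropping the zeros turns a cosine at frequency omega0 into a cosine at 3 times omega0:

```python
import math

def sample_and_compress(x, factor):
    """Keep every factor-th sample and drop the zeros in between."""
    return x[::factor]

omega0 = 2 * math.pi / 24   # illustrative frequency, well below pi / 3
x = [math.cos(omega0 * n) for n in range(48)]

y = sample_and_compress(x, 3)

# The compressed signal is the same cosine at three times the frequency.
expected = [math.cos(3 * omega0 * m) for m in range(len(y))]
```

Squishing in time by 3 multiplies every frequency by 3, which is the same statement as the peak at 1/3 of 2 pi moving out to 2 pi.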

741
00:37:45,000 --> 00:37:46,080
OK.

742
00:37:46,080 --> 00:37:49,650
So the idea then is that I've
got this beautiful picture

743
00:37:49,650 --> 00:37:52,270
in the Louvre.

744
00:37:52,270 --> 00:37:52,770
Maybe.

745
00:37:56,490 --> 00:38:01,830
In order to send a lower
resolution version of that,

746
00:38:01,830 --> 00:38:03,840
what I do is I
low-pass filter it

747
00:38:03,840 --> 00:38:07,550
because I don't want the
frequencies to alias.

748
00:38:07,550 --> 00:38:09,890
So I low-pass filter it.

749
00:38:09,890 --> 00:38:16,630
That gives me a representation
that I can then downsample.

750
00:38:16,630 --> 00:38:17,290
OK.

751
00:38:17,290 --> 00:38:20,840
So this had the same
size, but this one

752
00:38:20,840 --> 00:38:24,270
has fewer high-frequency
components.

753
00:38:24,270 --> 00:38:26,090
So I can downsample,
which gives me

754
00:38:26,090 --> 00:38:27,740
something that
can be represented

755
00:38:27,740 --> 00:38:31,340
in the squeezed version
with fewer pixels.

756
00:38:31,340 --> 00:38:34,190
I did a downsample by a factor
of 2 in both, so that picture

757
00:38:34,190 --> 00:38:37,400
has 1/4 the number
of pixels in it.

758
00:38:37,400 --> 00:38:44,150
Then, I can low-pass filter
that one and downsample.

759
00:38:44,150 --> 00:38:47,060
And low-pass filter
that one and downsample.

760
00:38:47,060 --> 00:38:50,810
And I end up with a very
low-resolution image

761
00:38:50,810 --> 00:38:54,960
of this beautiful scene
that I started with.

762
00:38:54,960 --> 00:38:55,770
OK.

763
00:38:55,770 --> 00:39:00,740
So that means that I start
with some number of pixels.

764
00:39:00,740 --> 00:39:02,090
Here I have 1/4 as many.

765
00:39:02,090 --> 00:39:04,460
Here I have 1/4 of that.

766
00:39:04,460 --> 00:39:06,260
And here I have 1/4 of that.

767
00:39:06,260 --> 00:39:10,250
So I have a fourth cubed the
original number of pixels.

768
00:39:10,250 --> 00:39:13,280
So it will go 4 cubed faster.

769
00:39:13,280 --> 00:39:17,060
So it'll take me a lot less
time to get the low-res picture.
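
A minimal sketch of the low-pass and downsample chain, assuming a 2-by-2 block average as a crude low-pass filter (the lecture does not say which filter was used) and a small made-up 8-by-8 image. Three rounds leave 1/64 of the original pixels:

```python
def lowpass_and_downsample(image):
    """Average each 2x2 block: a crude low-pass filter followed by
    keeping one sample per block (downsampling by 2 in each direction)."""
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]

# Illustrative 8x8 image (a diagonal gradient); sizes are assumed even.
image = [[r + c for c in range(8)] for r in range(8)]

pyramid = [image]
for _ in range(3):
    pyramid.append(lowpass_and_downsample(pyramid[-1]))

# Pixel count drops by a factor of 4 at every level.
pixel_counts = [len(level) * len(level[0]) for level in pyramid]
```

A better low-pass filter would suppress aliasing more effectively; the block average is used here only because it keeps the sketch short.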

770
00:39:17,060 --> 00:39:18,230
So the result then--

771
00:39:20,870 --> 00:39:22,200
skip this for the moment.

772
00:39:22,200 --> 00:39:24,830
So here's my low-res picture.

773
00:39:24,830 --> 00:39:30,680
With a lot of imagination, you
can clearly see what that is.

774
00:39:30,680 --> 00:39:33,440
At the next level of
refinement, you get this.

775
00:39:33,440 --> 00:39:35,780
At the next level of
refinement, you get this.

776
00:39:35,780 --> 00:39:38,150
At the next level of
refinement, you get this.

777
00:39:38,150 --> 00:39:39,571
By now, you're
tired so you flick

778
00:39:39,571 --> 00:39:40,820
on something more interesting.

779
00:39:40,820 --> 00:39:41,360
No.

780
00:39:41,360 --> 00:39:43,490
You would continue to
look at this, right?

781
00:39:43,490 --> 00:39:45,350
And finally, you get
the original picture.

782
00:39:45,350 --> 00:39:49,650
So the idea then is that I
want to not only transmit.

783
00:39:49,650 --> 00:39:55,170
But then the question is, how
many bits do I need to do this?

784
00:39:55,170 --> 00:40:00,570
And the answer is that
having transmitted this,

785
00:40:00,570 --> 00:40:03,870
I can use that information
to help me generate this.

786
00:40:06,820 --> 00:40:07,320
OK.

787
00:40:07,320 --> 00:40:13,150
So what I do, I run
the process backwards.

788
00:40:13,150 --> 00:40:16,760
Let me back up.

789
00:40:16,760 --> 00:40:22,040
So in order to go forwards, I
thought about squishing this

790
00:40:22,040 --> 00:40:24,469
into a smaller representation.

791
00:40:24,469 --> 00:40:25,510
Well, I can go backwards.

792
00:40:25,510 --> 00:40:27,550
I can up-sample.

793
00:40:27,550 --> 00:40:30,410
When I up-sample, all I do
is I take all the pixels

794
00:40:30,410 --> 00:40:32,570
in the shrunken
version, I stretch them,

795
00:40:32,570 --> 00:40:35,210
and I put 0's between them.

796
00:40:35,210 --> 00:40:37,184
That gets me here.

797
00:40:37,184 --> 00:40:38,600
But that's not
where I want to be.

798
00:40:38,600 --> 00:40:40,260
I want to be up here.

799
00:40:40,260 --> 00:40:44,040
So how do I go
from here to here?

800
00:40:44,040 --> 00:40:45,990
So when I put the 0's in it.

801
00:40:45,990 --> 00:40:47,910
So I started with this,
I put the 0's in it.

802
00:40:47,910 --> 00:40:49,020
That stretched it in time.

803
00:40:49,020 --> 00:40:51,390
That compressed it in frequency.

804
00:40:51,390 --> 00:40:53,580
When I compress this
waveform in frequency,

805
00:40:53,580 --> 00:40:57,884
this 2 pi peak ended
up at 2 pi over 3.

806
00:40:57,884 --> 00:41:00,300
So now if I want to get back
to the original contribution,

807
00:41:00,300 --> 00:41:03,520
I have to low-pass filter.
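
A minimal sketch of up-sampling followed by low-pass filtering, in one dimension with a factor of 2. The kernel (0.5, 1, 0.5) is an assumption chosen because it amounts to linear interpolation; samples beyond the ends are treated as 0:

```python
def upsample(x, factor=2):
    """Stretch x by inserting factor - 1 zeros after every sample."""
    out = []
    for v in x:
        out.append(v)
        out.extend([0] * (factor - 1))
    return out

def lowpass(x, kernel=(0.5, 1.0, 0.5)):
    """Convolve with a short centred kernel; samples past the ends count as 0."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out

coarse = [1, 3, 5]
stretched = upsample(coarse)   # zeros between the original samples
smooth = lowpass(stretched)    # zeros replaced by interpolated values
```

The original samples pass through unchanged and each inserted zero becomes the average of its two neighbors, except at the right edge where there is no neighbor to average with.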

808
00:41:03,520 --> 00:41:04,420
OK.

809
00:41:04,420 --> 00:41:07,210
Everybody see what I'm doing?

810
00:41:07,210 --> 00:41:10,620
So the final scheme
then is that--

811
00:41:10,620 --> 00:41:12,260
whoops.

812
00:41:12,260 --> 00:41:15,560
The final scheme
is that I low-pass

813
00:41:15,560 --> 00:41:17,960
filter, downsample, low-pass,
downsample, low-pass,

814
00:41:17,960 --> 00:41:19,190
downsample.

815
00:41:19,190 --> 00:41:23,900
Downsample, I can up-sample
by putting 0's between all

816
00:41:23,900 --> 00:41:25,730
the rows and columns.

817
00:41:25,730 --> 00:41:29,190
Then, low-pass filter and
that gives me this picture.

818
00:41:29,190 --> 00:41:30,950
So what I need to
do is also transmit

819
00:41:30,950 --> 00:41:35,560
the high-pass information
that I threw away.

820
00:41:35,560 --> 00:41:38,380
So if I separately
transmit this picture

821
00:41:38,380 --> 00:41:41,512
and the high-pass
part of this picture,

822
00:41:41,512 --> 00:41:43,345
then I can combine them
to get that picture.

823
00:41:45,990 --> 00:41:49,709
And I don't actually need
to transmit this one.

824
00:41:49,709 --> 00:41:51,500
So I don't need to
transmit this one either

825
00:41:51,500 --> 00:41:52,458
because I can generate.

826
00:41:52,458 --> 00:41:55,420
So I only need to
send this and this.

827
00:41:55,420 --> 00:41:57,760
Then, I do the same thing here.

828
00:41:57,760 --> 00:42:00,340
If I take this, I put 0's
between it, low-pass filter.

829
00:42:00,340 --> 00:42:04,050
I can generate this picture,
so I don't need to send it.

830
00:42:04,050 --> 00:42:06,710
But I do send this.

831
00:42:06,710 --> 00:42:10,400
Then, I combine these
to get that. Recurse.

832
00:42:10,400 --> 00:42:11,270
OK.

833
00:42:11,270 --> 00:42:13,420
So the result is that I send--

834
00:42:13,420 --> 00:42:17,480
so I don't send this,
but I do send this.

835
00:42:17,480 --> 00:42:21,080
I don't send that because
I'm going to regenerate it.

836
00:42:21,080 --> 00:42:22,010
I don't send that.

837
00:42:22,010 --> 00:42:22,940
I do send this.

838
00:42:22,940 --> 00:42:25,430
I only send this,
this, this, and that.

839
00:42:25,430 --> 00:42:29,460
And that's enough information
to reconstruct the picture.
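
A minimal one-dimensional sketch of the whole scheme, assuming pair averaging as the low-pass filter and sample repetition as the up-sample-and-filter step (both are simplifications, and all names are illustrative). Only the coarsest version and the detail thrown away at each level are kept, and that is enough to rebuild the original:

```python
def shrink(x):
    """Low-pass (average neighbouring pairs) and downsample by 2."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]

def expand(x):
    """Up-sample by 2, filling in by repeating each sample."""
    return [v for v in x for _ in range(2)]

def encode(signal, levels):
    """Return the coarsest version plus the detail lost at each level."""
    details = []
    current = signal
    for _ in range(levels):
        smaller = shrink(current)
        prediction = expand(smaller)   # what the receiver can regenerate
        details.append([c - p for c, p in zip(current, prediction)])
        current = smaller
    return current, details

def decode(coarsest, details):
    """Rebuild from coarse to fine, adding the detail back at each level."""
    current = coarsest
    for detail in reversed(details):
        current = [p + d for p, d in zip(expand(current), detail)]
    return current

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 1.0]
coarsest, details = encode(signal, levels=3)
restored = decode(coarsest, details)
```

A receiver can display `coarsest` immediately and refine it one level at a time as each detail list arrives, which is the progressive behavior described above.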

840
00:42:29,460 --> 00:42:30,600
Right.

841
00:42:30,600 --> 00:42:33,640
And notice it has the hierarchy
that you would expect.

842
00:42:33,640 --> 00:42:36,320
You start with a low-res.

843
00:42:36,320 --> 00:42:37,911
It takes more bits
to make this one.

844
00:42:37,911 --> 00:42:39,410
It takes more bits
to make that one.

845
00:42:39,410 --> 00:42:42,520
And it takes more
bits to make that one.

846
00:42:42,520 --> 00:42:44,440
You're worse off
if you didn't do

847
00:42:44,440 --> 00:42:49,810
something clever by-- so I'm
sending the full number of bits

848
00:42:49,810 --> 00:42:50,950
here.

849
00:42:50,950 --> 00:42:52,720
Then, I'm sending another 1/4.

850
00:42:52,720 --> 00:42:54,400
And then, another 1/16.

851
00:42:54,400 --> 00:42:56,250
Then, another 1/64.

852
00:42:56,250 --> 00:42:59,950
So I'm sending about
33% more bits total.
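
The arithmetic behind that figure, taking the full-size picture as 1 unit and assuming the same number of bits per pixel at every level:

```latex
1 + \frac{1}{4} + \frac{1}{16} + \frac{1}{64} = \frac{85}{64} \approx 1.33,
\qquad
\sum_{k=0}^{\infty} \left(\frac{1}{4}\right)^{k} = \frac{4}{3}
```

So however many levels are used, the overhead never exceeds one third of the original size.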

853
00:42:59,950 --> 00:43:01,930
But there's tricks.

854
00:43:01,930 --> 00:43:04,930
The trick is that
the eye is less

855
00:43:04,930 --> 00:43:07,060
sensitive to these
high frequencies

856
00:43:07,060 --> 00:43:08,200
than it is to these.

857
00:43:08,200 --> 00:43:14,240
So I really don't need to send
the same resolution for this.

858
00:43:14,240 --> 00:43:15,770
So people use this all the time.

859
00:43:15,770 --> 00:43:17,810
If you go to a slow
website, you may

860
00:43:17,810 --> 00:43:20,870
notice that you get
that kind of low-res

861
00:43:20,870 --> 00:43:22,460
morphing into a higher-res.

862
00:43:22,460 --> 00:43:26,250
And that's exactly
this kind of a scheme.

863
00:43:26,250 --> 00:43:28,667
But there are cleverer
things you can do.

864
00:43:28,667 --> 00:43:30,000
So that's already pretty clever.

865
00:43:30,000 --> 00:43:32,910
And that's already something
you see in today's technology,

866
00:43:32,910 --> 00:43:35,200
but there are even cleverer
things that you can do.

867
00:43:35,200 --> 00:43:37,950
And so the last thing I
want to talk about is JPEG.

868
00:43:37,950 --> 00:43:42,240
99% of the images that you
download on the web are JPEG.

869
00:43:42,240 --> 00:43:46,800
JPEG is a clever technique
that does quantization

870
00:43:46,800 --> 00:43:47,730
in the Fourier domain.

871
00:43:50,490 --> 00:43:52,050
And that's similar
to what you would

872
00:43:52,050 --> 00:43:54,790
want to do in that
progressive refinement

873
00:43:54,790 --> 00:43:56,790
because you would like
to separate the frequency

874
00:43:56,790 --> 00:43:58,831
components and use less
resolution for the higher

875
00:43:58,831 --> 00:44:01,320
frequency components because
you can't see them as well.

876
00:44:01,320 --> 00:44:05,190
JPEG is a formalization
of that idea.

877
00:44:05,190 --> 00:44:07,260
So this was made by the Joint
Photographic Experts

878
00:44:07,260 --> 00:44:08,770
group that was very successful.

879
00:44:08,770 --> 00:44:10,800
It has four layers of coding.

880
00:44:10,800 --> 00:44:13,960
First thing you
worry about is color.

881
00:44:13,960 --> 00:44:14,460
OK.

882
00:44:14,460 --> 00:44:16,650
We think we see a
broad range of colors.

883
00:44:16,650 --> 00:44:17,160
Wrong.

884
00:44:17,160 --> 00:44:19,800
We only see three.

885
00:44:19,800 --> 00:44:23,740
So you can throw away the
ones that we can't see.

886
00:44:23,740 --> 00:44:26,160
So the first step is
taking advantage of the fact

887
00:44:26,160 --> 00:44:28,243
that we really can't see
all the different colors.

888
00:44:28,243 --> 00:44:30,480
We can really only
see three colors.

889
00:44:30,480 --> 00:44:32,580
So there are tricks
that you can do

890
00:44:32,580 --> 00:44:35,250
to make the person
think he's seeing

891
00:44:35,250 --> 00:44:39,210
the exact shade of yellow,
which we don't see very well,

892
00:44:39,210 --> 00:44:42,880
by mixing together a different
combination of red, green,

893
00:44:42,880 --> 00:44:44,460
and blue.

894
00:44:44,460 --> 00:44:48,290
So you get to move
the colors around.

895
00:44:48,290 --> 00:44:52,760
And you can make it
perceptually indistinguishable,

896
00:44:52,760 --> 00:44:53,916
but easier to code.

897
00:44:53,916 --> 00:44:55,790
We won't talk about how
you do that, but it's

898
00:44:55,790 --> 00:44:58,940
a very straightforward process
by which you start with one

899
00:44:58,940 --> 00:45:01,850
picture and you change all
the colors to make them easier

900
00:45:01,850 --> 00:45:02,611
to send.

901
00:45:02,611 --> 00:45:03,110
OK.

902
00:45:03,110 --> 00:45:04,880
So that's the color coding.

903
00:45:04,880 --> 00:45:07,290
Then, they do a discrete
cosine transform,

904
00:45:07,290 --> 00:45:10,460
which is really a kind
of Fourier series.

905
00:45:10,460 --> 00:45:14,240
Then, they quantize the
Fourier series, the DCT.

906
00:45:14,240 --> 00:45:18,380
And then, they code
the resulting sequence

907
00:45:18,380 --> 00:45:20,104
using a lossless Huffman code.

908
00:45:20,104 --> 00:45:21,770
So we'll talk about
the middle two steps

909
00:45:21,770 --> 00:45:22,978
because that's the fun stuff.

910
00:45:22,978 --> 00:45:25,220
That's the Fourier stuff.

911
00:45:25,220 --> 00:45:30,950
So the way DCT works
is you take the image

912
00:45:30,950 --> 00:45:34,790
and you break it into
8 by 8 pixel squares.

913
00:45:34,790 --> 00:45:37,720
And then you do the same
processing on each 8 by 8.

914
00:45:40,540 --> 00:45:42,880
So here is an example
of an 8 by 8 image.

915
00:45:42,880 --> 00:45:44,860
This is a completely
trivial one where

916
00:45:44,860 --> 00:45:47,536
I have a linear taper from
black to white, linear taper

917
00:45:47,536 --> 00:45:48,910
from black to
white, the product.

918
00:45:48,910 --> 00:45:51,760
And all I want to think
about is, what's the DCT?

919
00:45:51,760 --> 00:45:57,820
And why do they use a DCT
instead of a Fourier transform?

920
00:45:57,820 --> 00:46:00,629
So just like you would expect
from the other two-dimensional

921
00:46:00,629 --> 00:46:02,920
image processing, the examples
that we've talked about,

922
00:46:02,920 --> 00:46:07,480
the way you do this is you
do the DCT on all the rows.

923
00:46:07,480 --> 00:46:10,356
Then, you do the DCT
on all the columns.

924
00:46:10,356 --> 00:46:11,230
And then you're done.

925
00:46:11,230 --> 00:46:14,260
That's a two-dimensional DCT.
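
The rows-then-columns recipe can be sketched as follows. This is an illustration under stated assumptions, not the JPEG reference implementation: it uses an unnormalized DCT-II, the helper names `dct_1d` and `dct_2d` are made up for this example, and the test block takes black as 0 and white as 1.

```python
import math

def dct_1d(x):
    """Unnormalized DCT-II of a list of N real samples."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]

def dct_2d(block):
    """2D DCT of a square block: 1D DCT on every row, then on every column."""
    rows_done = [dct_1d(row) for row in block]
    # zip(*rows_done) walks the columns of the row-transformed block.
    cols_done = [dct_1d(list(col)) for col in zip(*rows_done)]
    # Transpose back so the result is indexed [vertical freq][horizontal freq].
    return [list(r) for r in zip(*cols_done)]

# Test image like the one described: the product of two linear tapers.
ramp = [i / 7 for i in range(8)]
block = [[ramp[r] * ramp[c] for c in range(8)] for r in range(8)]
coeffs = dct_2d(block)
print(round(coeffs[0][0], 6))
```

Because this test image is a product of a row pattern and a column pattern, each 2D coefficient is just the product of two 1D coefficients of the ramp.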

926
00:46:14,260 --> 00:46:15,740
So here's an example.

927
00:46:15,740 --> 00:46:19,420
What if I took my sample image,
which had this linear taper.

928
00:46:19,420 --> 00:46:22,300
So if I think about
just one row and I

929
00:46:22,300 --> 00:46:25,960
plot brightness on the
vertical, then this

930
00:46:25,960 --> 00:46:28,160
might be my image right here.

931
00:46:28,160 --> 00:46:34,060
And what I do is think about
periodically repeating it.

932
00:46:34,060 --> 00:46:36,910
The original signal only
had 8 numbers in it.

933
00:46:36,910 --> 00:46:39,340
I'm going to periodically
repeat it because then I

934
00:46:39,340 --> 00:46:41,517
can take a Fourier series.

935
00:46:41,517 --> 00:46:43,600
It's a periodic signal,
and it's a Fourier series.

936
00:46:43,600 --> 00:46:46,450
The reason I do that is
that the Fourier series only

937
00:46:46,450 --> 00:46:51,090
has 8 coefficients.

938
00:46:51,090 --> 00:46:53,520
The Fourier series of
an eight-long sequence

939
00:46:53,520 --> 00:46:57,780
has eight Fourier coefficients.

940
00:46:57,780 --> 00:47:02,370
So the idea is that by
taking a signal that's

941
00:47:02,370 --> 00:47:03,907
only 8 samples long--

942
00:47:03,907 --> 00:47:05,490
I mean, the obvious
thing you could do

943
00:47:05,490 --> 00:47:12,270
is take the eight-long signal
and take a discrete time

944
00:47:12,270 --> 00:47:14,460
Fourier transform.

945
00:47:14,460 --> 00:47:17,610
The problem with that is that
that's a continuous function

946
00:47:17,610 --> 00:47:21,790
of omega over 2 pi, over
the entire unit circle.

947
00:47:21,790 --> 00:47:27,000
So you take 8 samples and turn
it into a function of omega

948
00:47:27,000 --> 00:47:29,130
which has lots of samples.

949
00:47:29,130 --> 00:47:31,170
By thinking about the
8 samples as having

950
00:47:31,170 --> 00:47:35,820
come from a periodic
extension, then I

951
00:47:35,820 --> 00:47:39,210
don't get a continuous range
of frequencies from minus pi

952
00:47:39,210 --> 00:47:40,260
to pi.

953
00:47:40,260 --> 00:47:44,790
I get exactly 8 of
them, a0 through a7.

954
00:47:44,790 --> 00:47:45,660
OK.

955
00:47:45,660 --> 00:47:49,110
So the first step is to
do periodic extension

956
00:47:49,110 --> 00:47:50,580
on the 8 samples.

957
00:47:50,580 --> 00:47:53,460
Then, I can represent it
by 8 Fourier coefficients.

958
00:47:53,460 --> 00:47:55,860
In the DCT, they almost do that.

959
00:47:55,860 --> 00:47:59,130
But instead of writing down the
numbers 1, 2, 3, 4, 5, 6, 7, 8.

960
00:47:59,130 --> 00:48:01,830
1, 2, 3, 4, 5, 6, 7, 8,
1, 2, 3, 4, 5, 6, 7, 8.

961
00:48:01,830 --> 00:48:03,950
Instead, they write 1,
2, 3, 4, 5, 6, 7, 8.

962
00:48:03,950 --> 00:48:07,356
8, 7, 6, 5, 4, 3, 2, 1.

963
00:48:07,356 --> 00:48:08,314
1, 2, 3, 4, 5, 6, 7, 8.

964
00:48:08,314 --> 00:48:09,830
8, 7, 6, 5, 4, 3, 2, 1.

965
00:48:09,830 --> 00:48:11,370
That seems like a
dumb thing to do.

966
00:48:11,370 --> 00:48:12,930
I took an eight-long
sequence, which

967
00:48:12,930 --> 00:48:16,530
could be represented
with 8 coefficients,

968
00:48:16,530 --> 00:48:18,840
and I turned it into a
16-long sequence, which

969
00:48:18,840 --> 00:48:21,905
now takes 16 coefficients.

970
00:48:21,905 --> 00:48:24,890
Wow, that's brain dead.

971
00:48:24,890 --> 00:48:29,120
Except that it's
actually very clever.

972
00:48:29,120 --> 00:48:31,055
Of these two signals,
which has the higher

973
00:48:31,055 --> 00:48:32,170
high-frequency content?

974
00:48:36,670 --> 00:48:38,254
[INAUDIBLE]

975
00:48:38,254 --> 00:48:39,170
AUDIENCE: [INAUDIBLE].

976
00:48:39,170 --> 00:48:41,360
DENNIS FREEMAN: Sharp
drop, large amount

977
00:48:41,360 --> 00:48:42,830
of high frequencies.

978
00:48:42,830 --> 00:48:45,270
That's the trick.

979
00:48:45,270 --> 00:48:48,030
So because there's a large
amount of high frequencies,

980
00:48:48,030 --> 00:48:52,770
this signal is hard to
represent with Fourier series.

981
00:48:52,770 --> 00:48:55,980
This signal is easier because
there are fewer high frequencies.

982
00:48:55,980 --> 00:48:58,500
You need fewer of
those high frequencies

983
00:48:58,500 --> 00:49:01,830
to do a good job of
representing the signal.

984
00:49:01,830 --> 00:49:03,960
You can throw away the
high-frequency stuff

985
00:49:03,960 --> 00:49:06,120
and nobody will notice.
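
One way to see the sharp drop numerically is to measure the largest step between neighboring samples in one period, wrapping around at the end. This is a minimal sketch; `max_jump` is a hypothetical helper, and the samples 1 through 8 are the ones written out above.

```python
def max_jump(period):
    """Largest step between neighboring samples of a periodic sequence."""
    n = len(period)
    return max(abs(period[(i + 1) % n] - period[i]) for i in range(n))

ramp = [1, 2, 3, 4, 5, 6, 7, 8]

periodic = ramp                 # 1..8, 1..8, ... : falls from 8 back to 1
mirrored = ramp + ramp[::-1]    # 1..8, 8..1, ... : no sudden drop

print(max_jump(periodic), max_jump(mirrored))
```

The plain periodic extension has a jump of 7 at every period boundary, while the mirrored one never steps by more than 1, which is why it needs less high-frequency content.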

986
00:49:06,120 --> 00:49:07,610
OK.

987
00:49:07,610 --> 00:49:14,100
So the idea then is that you
use this 16-long sequence,

988
00:49:14,100 --> 00:49:20,590
but then you know that
whatever x of 8 was,

989
00:49:20,590 --> 00:49:23,180
it's the same as x of 9
because you always repeat it.

990
00:49:23,180 --> 00:49:26,580
And x of 7, that's
the same as x of 10.

991
00:49:26,580 --> 00:49:29,100
So if you take
advantage of knowing

992
00:49:29,100 --> 00:49:31,770
that there's a symmetry.

993
00:49:31,770 --> 00:49:34,350
And if you notice,
they made it symmetric.

994
00:49:34,350 --> 00:49:37,230
So there's an even-odd kind of
symmetry about a weird point.

995
00:49:37,230 --> 00:49:41,760
It's off by 1/2, but there's
a symmetry this way, too.

996
00:49:41,760 --> 00:49:44,500
If you take those two
things into account,

997
00:49:44,500 --> 00:49:50,040
you can actually represent
the 16-length sequence

998
00:49:50,040 --> 00:49:52,810
with 8 numbers.

999
00:49:52,810 --> 00:49:55,020
That's the DCT.

1000
00:49:55,020 --> 00:49:57,630
It's exactly the
same as a Fourier series,

1001
00:49:57,630 --> 00:50:02,460
except that we're taking
the 8 non-trivial numbers

1002
00:50:02,460 --> 00:50:05,880
and putting them together
in a funny periodic fashion.

1003
00:50:05,880 --> 00:50:07,350
That's what a DCT does.

1004
00:50:07,350 --> 00:50:12,990
And the point is the
DCT maps 8 real numbers,

1005
00:50:12,990 --> 00:50:16,870
which are these yn values.

1006
00:50:16,870 --> 00:50:24,470
It maps 8 real numbers
into 8 DCT coefficients.

1007
00:50:24,470 --> 00:50:27,980
And the DCT coefficients,
unlike the Fourier coefficients,

1008
00:50:27,980 --> 00:50:30,430
have real values.

1009
00:50:30,430 --> 00:50:32,540
So because of the trick
with all the symmetries

1010
00:50:32,540 --> 00:50:34,206
and all that sort of
stuff, they arrange

1011
00:50:34,206 --> 00:50:36,640
to make a transform
whose imaginary part

1012
00:50:36,640 --> 00:50:38,650
is guaranteed to be 0.
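
The symmetry argument can be checked with a short sketch. It assumes the unnormalized DCT-II convention and a half-sample phase correction to account for the symmetry point being off by 1/2; `dct_via_mirror` is a name invented for this illustration.

```python
import cmath
import math

def dct_via_mirror(x):
    """DFT of the mirrored 2N-long sequence, with a half-sample phase shift.

    For real x, each returned value is (up to rounding) real and equals
    twice the unnormalized DCT-II coefficient of x.
    """
    n = len(x)
    y = list(x) + list(reversed(x))      # e.g. 1..8 followed by 8..1
    m = len(y)                           # m = 2N
    out = []
    for k in range(n):                   # only N coefficients are needed
        s = sum(y[j] * cmath.exp(-2j * math.pi * k * j / m) for j in range(m))
        s *= cmath.exp(-1j * math.pi * k / m)   # shift by half a sample
        out.append(s)
    return out

x = [1, 2, 3, 4, 5, 6, 7, 8]
coeffs = dct_via_mirror(x)
print(max(abs(c.imag) for c in coeffs))  # essentially zero
```

The imaginary parts vanish to within rounding error, so the 8 real samples map to 8 real coefficients even though the extended sequence is 16 long.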

1013
00:50:38,650 --> 00:50:40,870
So there's no
information explosion

1014
00:50:40,870 --> 00:50:43,540
in going from the 8 to 16.

1015
00:50:46,990 --> 00:50:49,390
Here's the Fourier
representation

1016
00:50:49,390 --> 00:50:50,990
for a 2D picture.

1017
00:50:50,990 --> 00:50:55,150
The Fourier coefficients
are falling off like 1 over k.

1018
00:50:55,150 --> 00:50:58,800
Here's the DCT where they're
falling off like 1 over k squared.

1019
00:50:58,800 --> 00:51:02,460
And the point is you can throw
those away in the picture

1020
00:51:02,460 --> 00:51:04,890
and barely tell that
they're even there.

1021
00:51:04,890 --> 00:51:06,480
That they're even gone.

1022
00:51:06,480 --> 00:51:11,430
So what they do then is
they quantize the Fourier

1023
00:51:11,430 --> 00:51:14,490
coefficients at
different levels.

1024
00:51:14,490 --> 00:51:17,760
So you divide the 0,
0 coefficient by 16

1025
00:51:17,760 --> 00:51:19,680
and send the whole part.

1026
00:51:19,680 --> 00:51:22,020
You divide the 1, 0 by 11.

1027
00:51:22,020 --> 00:51:25,680
You divide this guy by 61, so
you use much less resolution

1028
00:51:25,680 --> 00:51:26,940
by a factor of 4.

1029
00:51:29,940 --> 00:51:31,830
Because then those
numbers were chosen

1030
00:51:31,830 --> 00:51:34,650
so that they give rise
to coefficients that

1031
00:51:34,650 --> 00:51:38,310
are equally visually distinct.
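
The divide-and-round step can be sketched like this. The divisors are the first row of the example luminance table in the JPEG standard, which matches the 16, 11, and 61 quoted above; the coefficient values are hypothetical, and `quantize` and `dequantize` are names chosen for this illustration.

```python
# First row of the example luminance quantization table from the JPEG
# standard: divisors grow with horizontal frequency.
Q_ROW0 = [16, 11, 10, 16, 24, 40, 51, 61]

def quantize(coeffs, table):
    """Divide each coefficient by its table entry and keep the whole part."""
    return [round(c / q) for c, q in zip(coeffs, table)]

def dequantize(levels, table):
    """Decoder side: multiply the integers back by the table entries."""
    return [level * q for level, q in zip(levels, table)]

# Hypothetical DCT coefficients for one row of a block (not from the lecture).
coeffs = [240.0, -35.0, 22.0, -9.0, 6.0, -4.0, 3.0, -2.0]

levels = quantize(coeffs, Q_ROW0)
restored = dequantize(levels, Q_ROW0)
print(levels)
print(restored)
```

The small high-frequency coefficients all round to zero, so the long run of zeros costs very few bits in the lossless coding step, while the low-frequency values are recovered closely.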

1032
00:51:38,310 --> 00:51:45,300
The result is that you
get very high resolution

1033
00:51:45,300 --> 00:51:48,810
with a very small
number of bits.

1034
00:51:48,810 --> 00:51:52,770
So here's an original.

1035
00:51:52,770 --> 00:51:58,930
This picture has 47
kilobytes of data in it.

1036
00:51:58,930 --> 00:52:01,440
And when you change
Q, the quality

1037
00:52:01,440 --> 00:52:06,550
of JPEG, what you're really
doing is choosing those tables.

1038
00:52:06,550 --> 00:52:10,600
So when you use a high Q, you
get a good representation.

1039
00:52:10,600 --> 00:52:13,380
When you use a low Q, you're
throwing away more data.

1040
00:52:13,380 --> 00:52:15,330
And you can see that
you can throw away--

1041
00:52:17,850 --> 00:52:20,250
so 47k down to 2k.

1042
00:52:20,250 --> 00:52:25,110
You can throw away 19
pieces of data out of 20

1043
00:52:25,110 --> 00:52:27,820
and you still get a very
good resolution picture.

1044
00:52:27,820 --> 00:52:30,330
And that's because the
quantization is happening

1045
00:52:30,330 --> 00:52:31,740
in the Fourier domain.

1046
00:52:31,740 --> 00:52:34,620
And you can match the
Fourier resolution better

1047
00:52:34,620 --> 00:52:37,050
to the psychophysical
properties of the eye.

1048
00:52:37,050 --> 00:52:40,650
So the point is to tell you
how to represent signals

1049
00:52:40,650 --> 00:52:44,730
in discrete time in a
way that the errors are

1050
00:52:44,730 --> 00:52:47,760
as imperceptible as possible.

1051
00:52:47,760 --> 00:52:50,230
And to demonstrate how
the Fourier transform

1052
00:52:50,230 --> 00:52:51,920
lets you do that.

1053
00:52:51,920 --> 00:52:53,280
OK, thanks.

1054
00:52:53,280 --> 00:52:55,130
See you later.