The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: I'm going to spend a couple of minutes reviewing the major things that we talked about last time and then get into discrete source coding, which is the major topic for today.

The first major thing that we talked about last time, along with all of the philosophy and all those other things, was the sense of what digital communication really is. I said that digital communication is communication where there's a binary interface between source and destination. The source is very often analog; the most interesting sources are analog. The channel is often analog too; the most interesting channels are analog, and we'll say more about what I mean by analog later. What's important is that you have this binary interface between source and channel coding.

We said a little bit about why we want a binary interface, aside from the fact that it's there now and there's nothing you can do about it even if you don't like it. One reason is standardization, which simplifies implementation: you can do everything in the same way. If you have ten different kinds of channel coding and ten different kinds of source coding and you have a binary interface, you need to develop 20 different things, ten for the sources and ten for the channels. If you don't have that standardization with a binary interface between them, you need 100 different things: you need to match every kind of source coding with every kind of channel coding. That raises the price of all chips enormously. One of the other things we said is that the price of a chip is very much the cost of development divided by the number of them that you stamp out. That's not quite true, but it's a good first approximation. In other words, standardization is important.
Layering. Layering is in many ways very similar to standardization, because this binary interface is also a layer between source and destination. But the idea there is not that it standardizes things to make them cheaper; it's that it simplifies the conceptualization of what's going on. You can look at a source and focus on only one thing: how do I take that source and turn it into the smallest number of binary digits possible? We'll talk a good deal about what that means later, because there's something stochastic involved in there, and it will take us a while to really understand that.

Finally, using a binary interface loses nothing in performance. That's what Shannon said, and it's what he proved. There are some questions there when you get to networks, but the important thing is this: in the places where you want to study non-binary interfaces, you will never get a clue of what you're looking at, or why, if you don't first understand very well why you want a binary interface to start with. In other words, if you look at these other cases, they're exceptions to the rule, and if you don't know what the rule is, you certainly can't understand what the exceptions are.

So for today we're going to start out by studying this part of the problem in here. Namely, how do you take a general source input and turn it into binary digits that you're going to put into the channel? How do I study this without studying that? Well, one thing is that these are binary digits here. The other thing is that we're going to assume that whatever binary digits go in here come out here. In other words, there aren't any errors; it's an error-free system. Part of the purpose of studying channel encoding and channel decoding is to explain how it is that you get that error-free performance. You can't quite get error-free performance, you get almost error-free performance, but the idea is that when errors come out here, it's not this guy's fault, it's this guy's fault. Therefore, what we're going to study here is how we do our job over here.
Namely, how we encode a string of symbols into a string of bits, and how we decode that same string of bits coming out. So that's where we'll be for the next three weeks or so.

We talked a little bit last time about how you layer source coding itself. I want to come back, because we were talking about so many things last time, and emphasize what this means a little bit. We're going to break source coding up into three different layers again. You start out with some kind of input waveform or image or video or whatever the heck it is. You're going to do something like sampling it or expanding it in some kind of expansion, and we'll talk a great deal about that later; it's not an obvious thing, how to do that. When you finish doing that, you wind up with an analog sequence. In other words, you wind up with a sequence of real numbers or a sequence of complex numbers.

Those go into a quantizer. What the quantizer does is turn an uncountably infinite set of things into a finite set of things. When you turn an uncountably infinite set of possibilities into a finite set of possibilities, you get distortion. There's no way you can avoid it. So that's part of what happens there.

At this point you have a finite alphabet of symbols. That goes into the discrete coder, goes through what we're now calling a reliable binary channel, and comes out here. What we're going to be studying for the next two weeks or so is this piece of the system right in here. Again, we're assuming a reliable binary channel to the right of this, which is what we already assumed, and we're going to assume that these other pieces do whatever they have to. But this isolated problem is important because it deals with the entire problem of text, and you know what text is: it's computer files, it's English language text, it's Chinese text, it's whatever kind of text.
If we understand how to do that, we can then go on to talk about quantization, because we'll have some idea of what we're trying to accomplish with quantization. Without that, we won't know what the purpose of quantization is; and without the quantization, we won't know what we're trying to accomplish over here.

There's another reason for studying this problem, which is that virtually all the ideas that come into this whole bunch of things are tucked into this one subject in the simplest possible way. One of the nice things about information theory, which as I said we're only going to touch on in this course, is that it's really like a symphony. You see themes coming out, those themes get repeated, they get repeated again with more and more complexity each time, and when you understand the simple idea of the theme, you understand what's going on. That's one of the reasons for studying these simple things first, and it's the other reason for dealing with this topic.

To summarize these things -- most of this I already said. Examples of analog sources are voice, music, video, and images. We're going to restrict ourselves to waveform sources, which means voice and music. An image, by contrast, is something where you're mapping from two dimensions, this way and this way, into a sequence of binary digits. So after you get done sampling, it's a mapping from R squared, this axis and this axis, into your output: for each point in the plane, there's some real number that represents the amplitude at that point. Video is a three-dimensional to one-dimensional thing: you have time, and you have this way and this way, so you're mapping from R cubed into R. We're not going to deal with those, because really all the ideas are contained in dealing with waveform sources. In other words, the conventional functions that you're used to seeing -- things that you can draw on a piece of paper and understand what's going on with them.
These waveforms are almost invariably sampled, or expanded in series expansions, and we'll understand why later. That, in fact, is a major portion of the course. That's where all of the stuff from signals and systems comes in. We'll have to expand on that a whole lot, because you didn't learn enough there; we need a lot of other things, and that's what we need to deal with waveforms.

We'll take the sequence of numbers that comes out of the sampler, and we're then going to quantize that sequence of numbers. That's the next thing we're going to study. Then we're going to get into analog and discrete sources; the discrete sources are the topic we will study right now. So we're going to study this; after we get done with this, we're going to study this also. When we study this, we'll have what we know about this as a way of knowing how to deal with the whole problem from here out to here. Finally, we'll deal with waveforms and deal with the whole problem from here out to here. So that's our plan.

In fact, this whole course is devoted to studying this problem, then this problem, then this problem -- that's the source part of the course -- and then dealing with, if I can find it again, the various parts of this problem. So first we study sources, then we study channels. Because of the binary interface, when we're all done with that we understand digital communication. When we get toward the end of the term we'll be looking at more sophisticated kinds of channels than we look at earlier, which are really models for wireless channels. So that's where we're going to end up.

So, discrete source coding, which is what we want to deal with now. What's the objective? We're going to map a sequence of symbols into a binary sequence, and we're going to do it with unique decodability. I'm not going to define unique decodability at this point; I'll define it a little bit later. But roughly what it means is this. We have a sequence of symbols which comes into the encoder.
The symbols go through this binary channel and come out as a sequence of binary digits. Unique decodability says: if this guy does his job, can this guy do his job? If this guy can always do his job when these digits are correct, then you have something called unique decodability. Namely, you can guarantee that whatever comes in here will turn into a sequence of binary digits, that sequence of binary digits goes through here, and these symbols are the same as these symbols. In other words, you are reproducing things error-free if, in fact, this reproduces things error-free. So that's our objective.

There's a very trivial approach to this, and I hope all of you will agree that it really is, in fact, trivial. You map each source symbol into an l-tuple of binary digits. If you have an alphabet of size m, how many different binary strings are there of length l? Well, there are 2 to the l of them. If l is equal to 2, you have 00, 01, 10, and 11. If l is equal to 3, you have strings of length 3, which are 000, 001, 010, and so on, and there are 2 to the 3, which is equal to 8, of them. So what we need if we're going to use this approach -- the simplest possible approach, which is called the fixed-length approach -- is for the alphabet size m to be less than or equal to 2 to the l, the number of binary strings of length l.

Now, is that trivial or isn't it trivial? I hope it's trivial. We don't want to waste bits when we're doing this, so we don't want to make l any bigger than we have to, because for every symbol that comes in, we get l binary digits coming out. So we'd like to minimize l subject to this constraint that 2 to the l has to be greater than or equal to m. So what we want to do is choose l as the smallest integer which satisfies this.
In other words, when you take the logarithm to the base 2 of this, log to the base 2 of m has to be less than or equal to l, and l is then going to be less than log to the base 2 of m plus 1. That last part is the constraint which says you don't make l any bigger than you have to make it. In other words, we're going to choose l equal to the ceiling function of log to the base 2 of m -- the smallest integer which is greater than or equal to log to the base 2 of m.

So let me give you a couple of examples of that. Excuse me for boring you with something which really is trivial, but there's notation here you have to get used to. You get confused with this because there's the alphabet size, which we call m, and there's the string length, which we call l, and you keep getting mixed up between these two. Everybody gets mixed up between them. I had a doctoral student the other day who got mixed up in it, and I read what she had written four times and I didn't catch it either. So this does get confusing at times.

Suppose you have an alphabet which is five different kinds of the letter a. That's one reason why these source codes get messy: you have too many different kinds of each letter, and technical people who like a lot of jargon use all of them. In fact, when people start writing papers and books you find many more than five there. In terms of LaTeX, you get mathcal, you get mathbf, you get math blah, blah, blah, everything in little and big. You get the Greek version, you get the Roman version and the Arabic version, if you're smart enough to know those languages.

What we mean by a code here is: alpha gets mapped into 000, a gets mapped into 001, capital A into the next string, and so forth. Does it make any difference what mapping you use here? Can you find any possible reason why it would make a difference whether I map alpha into 000 and a into 001, or vice versa? I can't find any reason for that.
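To make this concrete, here is a minimal sketch in Python of building such a fixed-length code. It is just an illustration, not something from the lecture; the particular symbol names are placeholders, since, as just argued, only the alphabet size matters.

    import math

    def fixed_length_code(alphabet):
        """Assign each symbol an l-bit string, with l = ceil(log2(m))."""
        m = len(alphabet)
        l = math.ceil(math.log2(m))  # smallest l with 2**l >= m
        # Number the symbols and use each index, written with l bits, as the code word.
        return {sym: format(i, '0{}b'.format(l)) for i, sym in enumerate(alphabet)}

    code = fixed_length_code(['alpha', 'a', 'A', 'script_a', 'a_bar'])  # m = 5, so l = 3
    print(code)  # {'alpha': '000', 'a': '001', 'A': '010', 'script_a': '011', 'a_bar': '100'}

Any permutation of the right-hand sides would work just as well, which is exactly the point being made here.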
Would it make any difference if instead of having this alphabet I had beta, b, capital B, script b, and capital B with a line over it? I can't see any reason why that would make a difference either. In other words, when we're talking about fixed-length codes, there are only two things of importance. One of them is how big the alphabet is; that's why we talk about alphabets all the time. After you know how big the alphabet is, and after you know you want to do a fixed-length binary encoding, you just assign a binary string to each of these letters. In other words, there's nothing important in these symbols.

This is a very important principle of information theory; it sort of underlies the whole subject. I'm not really talking about information theory here -- as I said, we're talking about communication -- but communication these days is built on these information-theoretic ideas. Symbols don't have any inherent meaning. As far as communication is concerned, all you're interested in is what the set of things is. I could call these a1, a2, a3, a4, a5, and we're going to start doing that after a while, because we will recognize that the names of the symbols don't make any difference.

If you listen to a political speech by a Republican, there are n different things he might say, and you might as well number them a1 to a sub n. If you listen to one of the Democratic candidates, there are m different things they might say. You can number them 1 to m, and you can talk to other people about it and say, oh, he said a1 today, which is how do we get out of the war in Iraq. Or he said number 2 today, which is we need more taxes, or less taxes, and so forth. So it's not what they say that matters as far as communication is concerned, it's just distinguishing the different possible symbols.

So, you can easily decode this: you see three bits and you decode them. Can I? Is this right, or is there something missing here? Of course, there's something missing.
You need synchronization if you're going to do this. If I see a very long string of binary digits and I'm going to decode them into these letters, I need to know where the beginning is. In other words, if it's a semi-infinite string of binary digits, I don't know how to look at it. So, inherently, we assume that somebody else gives us synchronization. This is one of those things we always assume. When you start building a system, after you decide how to do this kind of coding, somebody at some point has to go through and decide where you get the synchronization from. But you shouldn't think of the synchronization first. If I'm encoding 10 million symbols and it takes me 1,000 bits to achieve synchronization, that 1,000 bits gets amortized over 10 million different symbols, and therefore it doesn't make any difference, and therefore we're going to ignore it. It's an important problem, but we ignore it.

The ASCII code is a more important example of this. It was invented many, many years ago. It was a mapping from 256 different symbols -- all the letters, all the numbers, all the things that people used on typewriters. Anybody remember what a typewriter is? Well, it's something people used to use before they had computers, and these typewriters had a lot of different keys on them and a lot of special things you could do with them. And somebody dreamed up 256 different things that they might want to do. Why did they use l equals 8? It had nothing to do with communication or with information theory or with any of these things. It was that 8 is a nice number; it's 2 to the 3. In other words, this was a standard length of both computer words and lots of other things. Everybody likes to deal with 8 bits, which you call a byte, rather than 7 bits, which is sort of awkward, or 6 bits, which was an earlier standard and would have been perfectly adequate for most things that people wanted. But no, they had to go to 8 bits because it just sounded nicer.
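As a quick aside, you can see this fixed-length idea directly in the way character codes are handled today. Here is a small illustration (not from the lecture) using Python's built-in ord to show each character of a short text as an 8-bit code word:

    def eight_bit_code(text):
        """Map each character to an 8-bit string: a fixed-length code with l = 8."""
        return [format(ord(ch), '08b') for ch in text]

    print(eight_bit_code("Hi!"))  # ['01001000', '01101001', '00100001']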
These codes are called fixed-length codes. I'd like to say more about them, but there really isn't much more to say about them.

There is a more general version of them, which we'll call generalized fixed-length codes. The idea there is to segment the source sequence. In other words, we're always visualizing now a sequence of symbols which starts at time zero and runs forever. We want to segment that into blocks of length n. Namely, you pick off the first n symbols and find the code word for those n symbols, then you find the code word for the next n symbols, then the code word for the next n symbols, and so forth. So it's really the same problem that we looked at before; it's just that before, the alphabet was the set of individual symbols and its size was the number of symbols.

Now, instead of having an alphabet of size m, we're looking at blocks of n symbols, and asking how many possible combinations there are of blocks where every symbol is one of m different things. Well, if you have two symbols, the first one can be any one of m things and the second one can be any one of m things, so there are m squared possible combinations for the first two symbols; there are m cubed possible combinations for the first three symbols, and so forth. So on blocks we have an alphabet of m to the n different n-tuples of source letters.

Well, once you see that, we're done, because what we're going to do is find a binary sequence for every one of these m to the n blocks. As I said before, the only thing that's important is how many there are. It doesn't matter that they're blocks, or that they're stacked this way, or stacked around in a circle, or anything else. All you're interested in is how many of them there are, and there are m to the n of them. So what we want to do is make the binary length l that we're dealing with equal to the ceiling function of log to the base 2 of m to the n. That says that log to the base 2 of m is less than or equal to l bar, where l bar equals l over n and is going to be the bits per source symbol.
I'm going to abbreviate that as bits per source symbol. I would like to abbreviate it bps, but I and everyone else would keep thinking that bps means bits per second. We don't have to worry about seconds here; seconds have nothing to do with this problem. We're just dealing with sequences of things, and we don't care how often they occur. They might just be sitting in a computer file and we're processing them offline, so seconds have nothing to do with this problem.

So, log to the base 2 of m is less than or equal to l over n, which is less than log to the base 2 of m plus 1 over n. In other words, the ceiling function puts l between log to the base 2 of m to the n and log to the base 2 of m to the n plus 1; we're just taking that and dividing by n, and when we divide by n, that plus 1 becomes plus 1 over n. What happens when you make n large? l bar approaches log to the base 2 of m from above. Therefore, fixed-length coding requires log to the base 2 of m bits per source symbol if, in fact, you make n large enough.

In other words, for the example of five different kinds of a's, we had m equal to 5. If you take m equal to 5 and blocks of two symbols, that leads to m squared equals 25 possible blocks, and that leads to l equals -- what's the ceiling function of the log of this? It's 5. And l bar is equal to -- what's half of 5? Two and a half, yes. As you get older you can't do arithmetic anymore.

So look what we've accomplished. We've gone from three bits per symbol down to two and a half bits per symbol -- isn't that exciting? Well, you look at it and you say no, that's not very exciting. I mean, yes, you can do it, but most people don't do that. So why do we bother with this? Well, it's the same reason we bother with a lot of things in this course, and the whole first two weeks of this course will be dealing with things where, when you look at them and ask is this important, you have to answer no, it's not important, it doesn't really have much to do with anything -- it's a mathematical idea.
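Still, it is worth checking the arithmetic. Here is a quick numerical sketch (just an illustration in Python, not from the lecture) of how l bar, the bits per source symbol, approaches log to the base 2 of m as the block length n grows, for the m equals 5 example:

    import math

    m = 5  # alphabet size
    for n in (1, 2, 4, 10, 100):
        l = math.ceil(n * math.log2(m))  # bits per block of n symbols
        print(n, l, l / n)               # l_bar = l/n: 3, 2.5, 2.5, 2.4, 2.33
    print(math.log2(m))                  # the limit, about 2.32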
What it does have to do with is that the principle involved here is important. It says that the lower limit of what you can do with fixed-length coding is log to the base 2 of m. You have an alphabet of size m, and you can get as close to this as you want to. When we get to talking about probability, we will find out that if the symbols are equally likely, nothing in the world can do any better than this. That's the more important thing, because what we're eventually interested in is the best you can do if you're willing to do things that are very complicated. Why do you want to know what the best is if you do something very complicated? Because if you can achieve it simply, then you know you don't have to look any further. So that's the important thing: it lets you do something simple and know that, in fact, what you're doing makes sense. That's why we do all of that.

But then, after we say well, there's no place else to go on fixed-length codes, we say well, let's look at variable-length codes. The motivation for variable-length codes is that probable symbols should probably have shorter code words than very unlikely symbols. Morse thought of this a long, long time ago when Morse code came along. Probably other people thought of it earlier, but he actually developed the system and it worked. Everyone since then has understood that if you have a symbol that only occurs very, very, very rarely, you would like to give it a code word which is very long, so it doesn't interfere with the other code words.

Namely, one of the things that you often do when you're developing a code is think of a whole bunch of things which are sort of exceptions; they hardly ever happen. You use a fixed-length code for all the things that happen all the time, and you make one extra code word that says "this is an exception." Then, pasted on at the end of that exception code word, is a number which represents which exception you're looking at.
Presto, you have a variable-length code. Namely, you have two different possible code lengths: one for all of the likely things plus the indication that there is an exception, and a second, longer one for all the unlikely things. There's an important feature there. You can't drop the code word that says "this is an exception." If you just have a bunch of short code words and a bunch of long code words, then when you see a short string of bits you don't know whether it's a short code word or the start of a long code word, and you're stuck.

So, one example of a variable-length code -- we'll use some jargon here. We'll call the code a script C. We'll think of script C as a mapping which goes from the symbols onto binary strings. In other words, C of x is the code word corresponding to the symbol x, for each x in the alphabet, capital X. We have to think about what the capital X is, but as we said, the only thing we're really interested in is how big this alphabet is; that's the only thing of importance.

So if we have an alphabet which consists of the three letters a, b, and c, we might make a code where the code word for a is 0, the code word for b is 10, and the code word for c is 11. Now it turns out that's a perfectly fine code, and it works.

Let me show you another example of a code, so we can see that not everything works. Suppose C of a is 0, C of b is 1, and C of c -- this is a script C, that's a little c -- is 10. Does that work? Well, all of the symbols have different code words, but this is an incredibly stupid thing to do. It's incredibly stupid because if I send a b followed by an a, what the poor decoder sees is a 1 followed by a 0. In other words, one of the things that I didn't tell you about is that when we're using variable-length codes, we're just concatenating all of these code words together. We don't put any spaces between them, and we don't put any commas between them.
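Here is a minimal sketch of that failure (my own illustration in Python, not part of the lecture): with the second code, the concatenated bits for "b then a" are exactly the bits for "c", so the decoder has no way to tell them apart, while the first code never runs into this.

    good = {'a': '0', 'b': '10', 'c': '11'}   # this code works
    bad  = {'a': '0', 'b': '1', 'c': '10'}    # distinct code words, but still broken

    def encode(message, code):
        """Concatenate the code words -- no commas, no spaces."""
        return ''.join(code[symbol] for symbol in message)

    print(encode('ba', bad), encode('c', bad))    # '10' and '10': the decoder is stuck
    print(encode('ba', good), encode('c', good))  # '100' and '11': no confusion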
If, in fact, I put a space between them, I would really have not a binary alphabet but a ternary alphabet: I would have 0's, I would have 1's, and I would have spaces, and you don't like to do that because it's much harder. When we start to study channels, we'll see that ternary alphabets are much more difficult to work with than binary alphabets. So this second code doesn't work, and the first one does work. Part of what we're going to be interested in is the conditions under which one works and the other doesn't. Again, when you understand this problem you will say it's very simple, and then you come back to look at it again and you'll say it's complicated, and then it looks simple again. It's one of these problems that looks simple when you look at it in the right way, and looks complicated when you get turned around and look at it backwards.

So the successive code words of a variable-length code are all transmitted just as a continuing sequence of bits. You don't have any of these commas or spaces in them. If I have a sequence of symbols coming into the encoder, those get mapped into variable-length sequences of bits, which all get pushed together and just come out one after the other.

Buffering can be a problem here, because when you have a variable-length code -- I mean, look at what happens here. If I've got a very long string of a's coming in, I get a very short string of bits coming out. If I have a long string of b's and c's coming in, I get a very long string of bits coming out. Now, usually the way channels work is that you put in bits at a fixed rate in time, and usually the way sources work is that symbols arrive at a fixed rate in time. Therefore, if symbols are coming in at a fixed rate in time here, bits are going out at a non-fixed rate in time. We have to bring them into the channel at a fixed rate in time, so we need a buffer to take care of the difference between the rate at which they come out and the rate at which they go in.
We will talk about that problem later, but for now we just say OK, we have a buffer, we'll put them all in a buffer. If the buffer ever empties out -- well, that's sort of like the problem of initial synchronization. It's something that doesn't happen very often, and we'll put some junior engineer on it because it's a hard problem, and senior engineers never deal with the hard problems; they always give those to the junior engineers so that they can assert their superiority over the junior engineers. It's a standard thing you find in industry.

We also require unique decodability. Namely, the encoded bit stream has to be uniquely parsed at the decoder. I have to have some way of taking that long string of bits and figuring out where the commas would have gone if I had put commas in, and then from that I have to decode things. In other words, it means that every symbol in the alphabet has to have a distinct code word connected with it. We have that here, and we have that here: every symbol has a distinct code word. But it has to be more than that, and I'm not going to talk about precisely what that "more" means for a little bit.

We also assume, to make life easy for the decoder, that it has initial synchronization. There's another obvious property that we assume: both the encoder and the decoder know what the code is to start with. In other words, the code is built into these devices. When you design an encoder and a decoder, you figure out what an appropriate code should be, you give it to both the encoder and the decoder, both of them know what the code is, and therefore the decoder can start decoding.

A piece of possible confusion: we have an alphabet here which has a list of symbols in it, a symbol a1, a2, a3, up to a sub m. We're sending a sequence of symbols, and we usually call the sequence of symbols we're sending x1, x2, x3, x4, x5, and so forth.
The difference is that the symbols in the alphabet are all distinct; we're listing them one after the other, and usually there's a finite number of them. Incidentally, we could have a countable number of symbols. You could try to do everything we're doing here with, say, the integers, and there's a countable number of integers. All of this theory pretty much carries through, with various little complications. We're leaving that out here because after you understand what we're doing, making it apply to the integers is straightforward. If you put in the integers to start with, you'll always be fussing about various silly little special cases, and I don't know a single situation where anybody deals with a countable alphabet except by truncating it. When you truncate an infinite alphabet you get a finite alphabet. So, we'll assume initial synchronization, and we'll also assume that there's a finite alphabet.

You should always make sure that you know whether you're talking about a listing of the symbols in the alphabet or a listing of the symbols in a sequence. The symbols in a sequence can all be the same, they can all be different, they can be anything at all. In the listing of symbols in the alphabet, there's just one entry for each symbol.

We're going to talk about a very simple class of uniquely decodable codes, which are called prefix-free codes. A code is prefix-free if no code word is a prefix of any other code word. A code word is a string of binary digits, so we need to say what a prefix of a string of binary digits is. For example, if we have the binary string 1, 0, 1, 1, 1, what are the prefixes of that? Well, one prefix is 1, 0, 1, 1. Another one is 1, 0, 1. Another one is 1, 0. Another is 1. In other words, a prefix is what you get by starting out at the beginning and not quite getting to the end. All of these things are called prefixes. If you want to be general you could call 1, 0, 1, 1, 1 a prefix of itself.
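As a small aside, here is what that definition looks like spelled out in a few lines of Python (again just an illustration, not from the lecture): list the proper prefixes of each code word and check that none of them is itself a code word.

    def prefixes(word):
        """Proper prefixes: start at the beginning, stop before the end."""
        return [word[:i] for i in range(1, len(word))]

    def is_prefix_free(code):
        words = set(code.values())
        return not any(p in words for w in words for p in prefixes(w))

    print(prefixes('10111'))                                  # ['1', '10', '101', '1011']
    print(is_prefix_free({'a': '0', 'b': '10', 'c': '11'}))   # True
    print(is_prefix_free({'a': '0', 'b': '1', 'c': '10'}))    # False: '1' is a prefix of '10'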
We won't bother to count a string as a prefix of itself, because that's just the kind of thing that mathematicians do to save a few words in the proofs that they give, and we won't bother with it. We will rely a little more on common sense.

Incidentally, I prove a lot of things in these notes, and I will ask you to prove a lot of things. One of the questions that people always have is: what does a proof really mean? What is a proof and what isn't a proof? When you take mathematics courses you get one idea of what a proof is, which is appropriate for mathematics courses. Namely, you prove things using the correct terminology: everything that you deal with, you define ahead of time, so that all of the terminology you're using has correct definitions. Then everything should follow from those definitions, and you should be able to follow a proof through without any insight at all about what is going on. You should be able to follow a mathematical proof step by step without knowing anything about what it's going to be used for, or why anybody is interested in it, or anything else, and that's an important thing to learn.

That's not what we're interested in here. What we're interested in here for a proof is this: you know all of the things around the particular statement we're dealing with, and what you're trying to do is construct an argument that covers all possible cases. You're going to use insight for that, you're going to use common sense, you're going to use whatever you have to use. And eventually you start to get some sort of second sense about when you're leaving something out that really should be there. That's what we're going to be focusing on when we worry about trying to be precise here. When I start proving things about prefix-free codes, I think you'll see this, because you will look at it and say that's not a proof, and, in fact, it really is a proof. Any good mathematician would look at it and say yes, that is a proof.
750 00:41:46,130 --> 00:41:48,370 Bad mathematicians sometimes look at it and say well, it 751 00:41:48,370 --> 00:41:52,170 doesn't look like a proof so it can't be a proof. 752 00:41:52,170 --> 00:41:53,990 But they are. 753 00:41:53,990 --> 00:41:56,360 So here we have prefix-free codes. 754 00:41:56,360 --> 00:41:59,360 The definition is no code word is a prefix of 755 00:41:59,360 --> 00:42:01,020 any other code word. 756 00:42:01,020 --> 00:42:04,510 If you have a prefix-free code, you can express it in 757 00:42:04,510 --> 00:42:06,620 terms of a binary tree. 758 00:42:06,620 --> 00:42:10,570 Now a binary tree starts at a root, this is the beginning, 759 00:42:10,570 --> 00:42:12,730 moves off to the right -- you might have it start at the 760 00:42:12,730 --> 00:42:16,820 bottom and move up or whatever direction you want to go in, 761 00:42:16,820 --> 00:42:19,400 it doesn't make any difference. 762 00:42:19,400 --> 00:42:23,430 If you take the zero path you come to some leaf. 763 00:42:23,430 --> 00:42:26,150 If you take the one path you come to some 764 00:42:26,150 --> 00:42:28,760 intermediate node here. 765 00:42:28,760 --> 00:42:31,510 From the intermediate node, you either go 766 00:42:31,510 --> 00:42:32,700 up or you go down. 767 00:42:32,700 --> 00:42:35,540 Namely, you have a 1 or a zero. 768 00:42:35,540 --> 00:42:38,740 From this intermediate node you go up and you go down. 769 00:42:38,740 --> 00:42:42,740 In other words, a binary tree, every node in it is either an 770 00:42:42,740 --> 00:42:46,580 intermediate node, which means there are two branches going 771 00:42:46,580 --> 00:42:50,330 out from it, or it's a leaf which means there aren't any 772 00:42:50,330 --> 00:42:52,220 branches going out from it. 773 00:42:52,220 --> 00:42:56,480 You can't, in a binary tree, have just one branch coming 774 00:42:56,480 --> 00:42:57,910 out of a node. 775 00:42:57,910 --> 00:43:00,950 There are either no branches or two branches, just by 776 00:43:00,950 --> 00:43:04,590 definition of what we mean by a binary tree -- 777 00:43:04,590 --> 00:43:06,890 binary says two. 778 00:43:06,890 --> 00:43:13,320 So, here this tree corresponds to a code, where we label 779 00:43:13,320 --> 00:43:15,160 various ones of the leaves. 780 00:43:15,160 --> 00:43:22,000 It corresponds to the code where a corresponds to the 781 00:43:22,000 --> 00:43:27,930 string zero, b corresponds to the string 1, 1, and c 782 00:43:27,930 --> 00:43:32,000 corresponds to the string 1, 0, 1. 783 00:43:32,000 --> 00:43:35,460 Now when you look at this as a code, it's not 784 00:43:35,460 --> 00:43:37,970 obvious that there's something 785 00:43:37,970 --> 00:43:41,650 really stupid about it. 786 00:43:41,650 --> 00:43:44,600 When you look at the tree, it's pretty obvious that 787 00:43:44,600 --> 00:43:49,840 there's something stupid about it, because here we have this 788 00:43:49,840 --> 00:43:55,460 c here, which is sitting off on this leaf, and here we have 789 00:43:55,460 --> 00:44:00,070 this leaf here which isn't doing anything for us at all. 790 00:44:00,070 --> 00:44:03,690 We say gee, we could still keep this prefix condition if 791 00:44:03,690 --> 00:44:09,030 we moved this into here and we drop this off. 792 00:44:12,050 --> 00:44:15,790 So any time that there's something hanging here without 793 00:44:15,790 --> 00:44:18,830 corresponding to a symbol, you would really 794 00:44:18,830 --> 00:44:21,050 like to shorten it.
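To make the prefix condition concrete, here is a small Python sketch, assuming the example code above (a maps to 0, b to 1 1, c to 1 0 1); the dictionary layout and function name are just one illustrative way to write it, not anything from the lecture slides.

```python
def is_prefix_free(code: dict[str, str]) -> bool:
    """True if no codeword is a proper prefix of any other codeword."""
    words = list(code.values())
    return not any(
        i != j and v.startswith(u)
        for i, u in enumerate(words)
        for j, v in enumerate(words)
    )

print(is_prefix_free({"a": "0", "b": "11", "c": "101"}))  # True
# Moving c up to the unused leaf, i.e. shortening it to 1 0, keeps the prefix condition:
print(is_prefix_free({"a": "0", "b": "11", "c": "10"}))   # True
```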
795 00:44:21,050 --> 00:44:24,320 When you shorten these things and you can't shorten anything 796 00:44:24,320 --> 00:44:27,430 else, namely, when every leaf has a symbol on it you call it 797 00:44:27,430 --> 00:44:29,220 a full tree. 798 00:44:29,220 --> 00:44:32,680 So a full tree is more than a tree, a full tree is a code 799 00:44:32,680 --> 00:44:37,440 tree where the leaves correspond to symbols. 800 00:44:37,440 --> 00:44:39,770 So a full tree has no empty leaves. 801 00:44:39,770 --> 00:44:43,520 Empty leaves can be shortened just like I showed you here, 802 00:44:43,520 --> 00:44:46,920 so we'll talk about full trees, and full trees are sort 803 00:44:46,920 --> 00:44:47,990 of the good trees. 804 00:44:47,990 --> 00:44:53,120 But prefix-free codes don't necessarily have to worry 805 00:44:53,120 --> 00:44:55,740 about that. 806 00:44:55,740 --> 00:45:00,870 Well, now I'm going to prove something to you, and at this 807 00:45:00,870 --> 00:45:04,810 point you really should object, but I don't care. 808 00:45:04,810 --> 00:45:06,320 We will come back and you'll get 809 00:45:06,320 --> 00:45:07,940 straightened out on it later. 810 00:45:07,940 --> 00:45:10,600 I'm going to prove that prefix-free codes are uniquely 811 00:45:10,600 --> 00:45:16,230 decodable, and you should cry foul because I really haven't 812 00:45:16,230 --> 00:45:18,520 defined what uniquely decodable means yet. 813 00:45:21,290 --> 00:45:23,590 You think you know what uniquely decodable means, 814 00:45:23,590 --> 00:45:25,270 which is good. 815 00:45:25,270 --> 00:45:28,120 It means physically that you can look at a string of code 816 00:45:28,120 --> 00:45:31,930 words and you can pick out what all of them are. 817 00:45:31,930 --> 00:45:34,610 We will define it later and you'll find out 818 00:45:34,610 --> 00:45:37,080 it's not that simple. 819 00:45:37,080 --> 00:45:40,310 As we move on, when we start talking about Lempel Ziv codes 820 00:45:40,310 --> 00:45:41,740 and things like that. 821 00:45:41,740 --> 00:45:43,730 You will start to really wonder what 822 00:45:43,730 --> 00:45:46,110 uniquely decodable means. 823 00:45:46,110 --> 00:45:49,030 So it's not quite as simple as it looks. 824 00:45:49,030 --> 00:45:52,640 But anyway, let's prove that prefix-free codes are uniquely 825 00:45:52,640 --> 00:45:56,800 decodable anyway, because prefix-free codes are a 826 00:45:56,800 --> 00:46:00,810 particularly simple example of uniquely decodable codes, and 827 00:46:00,810 --> 00:46:05,280 it's sort of clear that you can, in fact, decode them 828 00:46:05,280 --> 00:46:08,230 because of one of the properties that they have. 829 00:46:08,230 --> 00:46:11,790 The way we're going to prove this is we want to look at a 830 00:46:11,790 --> 00:46:15,710 sequence of symbols or a string of symbols that come 831 00:46:15,710 --> 00:46:18,250 out of the source. 832 00:46:18,250 --> 00:46:23,110 As that string of symbols come out of the source, each symbol 833 00:46:23,110 --> 00:46:29,080 in the string gets mapped into a binary string, and then we 834 00:46:29,080 --> 00:46:32,670 concatenate all those binary strings together. 835 00:46:32,670 --> 00:46:34,180 That's a big mouthful. 836 00:46:34,180 --> 00:46:39,600 So let's look at this code we were just talking about where 837 00:46:39,600 --> 00:46:45,010 the code words are b, c and a. 838 00:46:45,010 --> 00:46:50,060 So if a 1 comes out of the source and then another 1, it 839 00:46:50,060 --> 00:46:52,520 corresponds to the first letter b. 
840 00:46:52,520 --> 00:46:55,050 If a 1, zero comes out, it corresponds to the 841 00:46:55,050 --> 00:46:56,410 first letter c. 842 00:46:56,410 --> 00:47:00,170 If a zero comes out, that corresponds to the letter a. 843 00:47:00,170 --> 00:47:04,540 Well now the second symbol comes in and what happens on 844 00:47:04,540 --> 00:47:08,580 that second symbol is if the first symbol was an a, the 845 00:47:08,580 --> 00:47:14,150 second symbol could be a b or a c or an a, which gives rise 846 00:47:14,150 --> 00:47:15,880 to this little sub-tree here. 847 00:47:19,060 --> 00:47:22,700 If the first letter is a b, the second letter could be 848 00:47:22,700 --> 00:47:26,720 either an a, b or a c, which gives rise to this little 849 00:47:26,720 --> 00:47:29,030 sub-tree here. 850 00:47:29,030 --> 00:47:33,480 If we have a c followed by anything, that gives rise to 851 00:47:33,480 --> 00:47:36,300 this little sub-tree here. 852 00:47:36,300 --> 00:47:40,370 You can imagine growing this tree as far as you want to, 853 00:47:40,370 --> 00:47:42,950 although it gets hard to write down. 854 00:47:42,950 --> 00:47:45,420 How do you decode this? 855 00:47:45,420 --> 00:47:50,290 Well, as many things, you want to start at the beginning, and 856 00:47:50,290 --> 00:47:53,270 we know where the beginning is. 857 00:47:53,270 --> 00:47:56,260 That's a basic assumption on all of this source coding. 858 00:47:56,260 --> 00:47:59,530 So knowing where the beginning is, you sit there and you look 859 00:47:59,530 --> 00:48:04,050 at it, and you see a zero as the first letter as a first 860 00:48:04,050 --> 00:48:09,690 binary digit, and zero says I move this way in the tree, and 861 00:48:09,690 --> 00:48:15,440 presto, I say gee, an a must have occurred as the first 862 00:48:15,440 --> 00:48:17,940 source letter. 863 00:48:17,940 --> 00:48:19,070 So what do I do? 864 00:48:19,070 --> 00:48:23,900 I remove the a, I print out a, and then I start to look at 865 00:48:23,900 --> 00:48:25,070 this point. 866 00:48:25,070 --> 00:48:29,730 At this point I'm back where I started at, so if I can decode 867 00:48:29,730 --> 00:48:31,470 the first letter, I can certainly 868 00:48:31,470 --> 00:48:34,050 decode everything else. 869 00:48:34,050 --> 00:48:37,000 If the first letter is a b, what I see is a 1 870 00:48:37,000 --> 00:48:38,710 followed by a 1. 871 00:48:38,710 --> 00:48:43,840 Namely, when I see the first binary 1 come out of the 872 00:48:43,840 --> 00:48:46,660 channel, I don't know what was said. 873 00:48:46,660 --> 00:48:48,940 I know either a b or c was sent. 874 00:48:48,940 --> 00:48:52,800 I have to look at the second letter, the second binary 875 00:48:52,800 --> 00:48:55,230 digit resolves my confusion. 876 00:48:55,230 --> 00:48:59,020 I know that the first source letter was in a b, if it's 1 877 00:48:59,020 --> 00:49:01,940 1, or a c, if it's 1 zero. 878 00:49:01,940 --> 00:49:05,610 I decode that first source letter and then where am I? 879 00:49:05,610 --> 00:49:09,610 I'm either on this tree or on this tree, each of which goes 880 00:49:09,610 --> 00:49:13,490 extending off into the wild blue yonder. 881 00:49:13,490 --> 00:49:18,410 So this says if I know where the beginning is, I can decode 882 00:49:18,410 --> 00:49:19,940 the first letter. 883 00:49:19,940 --> 00:49:23,510 But if I can decode the first letter, I know where the 884 00:49:23,510 --> 00:49:26,710 beginning is for everything else. 885 00:49:26,710 --> 00:49:30,570 Therefore, I can decode that also. 
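The decoding procedure just described can be written out directly; here is a minimal sketch, again for the example code a to 0, b to 1 1, c to 1 0 1 (names and layout are illustrative). It just reads binary digits from the known beginning until the digits read so far form a code word, emits that symbol, and starts over, which is exactly the walk down the tree described above.

```python
def decode_prefix_free(bits: str, code: dict[str, str]) -> list[str]:
    """Decode a bit string produced by a prefix-free code, starting from the known beginning."""
    inverse = {word: symbol for symbol, word in code.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:            # reached a leaf of the code tree
            symbols.append(inverse[current])
            current = ""                  # back at the root for the next code word
    if current:
        raise ValueError("bit string ends in the middle of a code word")
    return symbols

code = {"a": "0", "b": "11", "c": "101"}
print(decode_prefix_free("0111010", code))  # ['a', 'b', 'c', 'a']
```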
886 00:49:30,570 --> 00:49:33,210 Well, aside from any small amount of confusion about what 887 00:49:33,210 --> 00:49:36,540 uniquely decodable means, that's a perfectly fine 888 00:49:36,540 --> 00:49:39,120 mathematical proof. 889 00:49:39,120 --> 00:49:45,010 So, prefix-free codes are, in fact, uniquely decodable and 890 00:49:45,010 --> 00:49:47,170 that's nice. 891 00:49:47,170 --> 00:49:48,550 So then there's a question. 892 00:49:51,670 --> 00:50:01,290 What is the condition on the lengths of a prefix-free code 893 00:50:01,290 --> 00:50:03,270 which allows you to have unique decodability? 894 00:50:06,020 --> 00:50:11,650 The Kraft inequality is a test on whether there are 895 00:50:11,650 --> 00:50:16,850 prefix-free codes or there are not prefix-free codes 896 00:50:16,850 --> 00:50:20,340 connected with any given set of code word lengths. 897 00:50:20,340 --> 00:50:23,890 This is a very interesting inequality. 898 00:50:23,890 --> 00:50:27,350 This is one of the relatively few things in information 899 00:50:27,350 --> 00:50:30,780 theory that was not invented by Claude Shannon. 900 00:50:30,780 --> 00:50:33,670 You sit there and you wonder why didn't Claude Shannon 901 00:50:33,670 --> 00:50:35,300 realize this? 902 00:50:35,300 --> 00:50:38,820 Well, it's because I think he sort of 903 00:50:38,820 --> 00:50:41,230 realized that it was trivial. 904 00:50:41,230 --> 00:50:44,030 He sort of understood it and he was really eager to get on 905 00:50:44,030 --> 00:50:47,540 to the meat of things, which is unusual for him because he 906 00:50:47,540 --> 00:50:52,140 was somebody, more than anyone else I know, who really 907 00:50:52,140 --> 00:50:54,900 understood why you should understand the simple things 908 00:50:54,900 --> 00:50:57,660 before you go on to the more complicated thing. 909 00:50:57,660 --> 00:50:59,890 But anyway, he missed this. 910 00:50:59,890 --> 00:51:02,930 Bob Fano, who some of you might know, who was a 911 00:51:02,930 --> 00:51:08,280 professor emeritus over in LCS, was interested in 912 00:51:08,280 --> 00:51:09,310 information theory. 913 00:51:09,310 --> 00:51:12,030 Then he was teaching a graduate course back in the 914 00:51:12,030 --> 00:51:18,280 '50s here at MIT, and as he often did, he threw out these 915 00:51:18,280 --> 00:51:21,420 problems and said nobody knows how to figure this out. 916 00:51:21,420 --> 00:51:25,650 What kinds of lengths can you have on prefix-free codes, and 917 00:51:25,650 --> 00:51:28,700 what kinds of lengths can't you have? 918 00:51:28,700 --> 00:51:31,880 Kraft was a graduate student at the time. 919 00:51:31,880 --> 00:51:36,330 The next day he came in with this beautiful, elegant proof 920 00:51:36,330 --> 00:51:40,850 and everybody's always known who Kraft is ever since then. 921 00:51:40,850 --> 00:51:44,170 Nobody's ever known what he did after that. 922 00:51:44,170 --> 00:51:46,200 But at least he made his mark on the world 923 00:51:46,200 --> 00:51:47,640 as a graduate student. 924 00:51:47,640 --> 00:51:53,500 So, in a sense, those were good days to be around, 925 00:51:53,500 --> 00:51:58,670 because all the obvious things hadn't been done yet. 926 00:51:58,670 --> 00:52:01,180 But the other thing is you never know what the obvious 927 00:52:01,180 --> 00:52:04,060 things are until you do them. 928 00:52:04,060 --> 00:52:06,940 This didn't look like an obvious problem ahead of time.
929 00:52:06,940 --> 00:52:09,520 The same is true of a number of other obvious things, 930 00:52:09,520 --> 00:52:11,980 because somebody was looking at it in a slightly 931 00:52:11,980 --> 00:52:15,340 different way than other people were looking at it. 932 00:52:15,340 --> 00:52:19,440 You see, back then people said we want to look at these 933 00:52:19,440 --> 00:52:24,370 variable length codes because we want to have some 934 00:52:24,370 --> 00:52:29,130 capability of mapping improbable symbols into long 935 00:52:29,130 --> 00:52:33,820 code words and probable symbols into short code words. 936 00:52:33,820 --> 00:52:36,420 You'll notice that I've done something strange here. 937 00:52:36,420 --> 00:52:39,360 That was our motivation for looking at variable length 938 00:52:39,360 --> 00:52:43,360 codes, but I haven't said a thing about probability. 939 00:52:43,360 --> 00:52:46,530 All I'm dealing with now is the question of what is 940 00:52:46,530 --> 00:52:49,850 possible and what is not possible. 941 00:52:49,850 --> 00:52:52,770 We'll bring in probability later, but now all we're 942 00:52:52,770 --> 00:52:55,840 trying to figure out is what are the sets of code word 943 00:52:55,840 --> 00:52:58,580 lengths you can use, and what are the sets of code word 944 00:52:58,580 --> 00:53:01,350 lengths you can't use. 945 00:53:01,350 --> 00:53:04,530 So what Kraft said is every prefix-free code for an 946 00:53:04,530 --> 00:53:10,090 alphabet x with code word lengths l of x for each letter 947 00:53:10,090 --> 00:53:15,950 in the alphabet x satisfies the sum of 2 to the minus length 948 00:53:15,950 --> 00:53:17,840 less than or equal to 1. 949 00:53:17,840 --> 00:53:21,050 In other words, you take all of the code words in the 950 00:53:21,050 --> 00:53:26,640 alphabet, you take the length of each of those code words, 951 00:53:26,640 --> 00:53:31,530 you take 2 to the minus that length, and you add them all up. 952 00:53:31,530 --> 00:53:35,950 And if this inequality is not satisfied, your code does not 953 00:53:35,950 --> 00:53:40,640 satisfy the prefix condition, there's no way you can create 954 00:53:40,640 --> 00:53:46,280 a prefix-free code which has these lengths, so 955 00:53:46,280 --> 00:53:47,360 you're out of luck. 956 00:53:47,360 --> 00:53:50,390 So you better create a new set of lengths which satisfies 957 00:53:50,390 --> 00:53:51,720 this inequality. 958 00:53:51,720 --> 00:53:54,420 There's also a simple procedure you can go through 959 00:53:54,420 --> 00:53:58,350 which lets you construct a code which has these lengths. 960 00:53:58,350 --> 00:54:01,100 So, in other words, this, in a sense, is a necessary and 961 00:54:01,100 --> 00:54:05,600 sufficient condition on the possibility of constructing 962 00:54:05,600 --> 00:54:08,080 codes with a particular set of lengths. 963 00:54:08,080 --> 00:54:11,600 It has nothing to do with probability. 964 00:54:11,600 --> 00:54:15,520 So it's, in a sense, cleaner than these other results. 965 00:54:18,500 --> 00:54:22,000 So, conversely, if this inequality is satisfied, you 966 00:54:22,000 --> 00:54:25,850 can construct a prefix-free code, and even more strangely, 967 00:54:25,850 --> 00:54:29,270 you can construct it very, very easily, as we'll see. 968 00:54:29,270 --> 00:54:32,920 Finally, a prefix-free code is full -- you remember what a 969 00:54:32,920 --> 00:54:34,470 full prefix-free code is?
970 00:54:34,470 --> 00:54:39,180 It's a code where the tree has nothing that's unused if and 971 00:54:39,180 --> 00:54:43,850 only if this inequality is satisfied with equality. 972 00:54:43,850 --> 00:54:47,250 So it's a neat result. 973 00:54:47,250 --> 00:54:52,520 It's useful in a lot of places other than source coding. 974 00:54:52,520 --> 00:54:55,420 If you ever get involved with designing protocols for 975 00:54:55,420 --> 00:54:59,500 computer networks or protocols for any kind of computer 976 00:54:59,500 --> 00:55:02,460 communication, you'll find that you use this all the 977 00:55:02,460 --> 00:55:06,320 time, because this says you can do some things, you can't 978 00:55:06,320 --> 00:55:08,590 do other things. 979 00:55:08,590 --> 00:55:11,590 So let's see why it's true. 980 00:55:11,590 --> 00:55:14,750 I'll give you another funny proof that doesn't look like a 981 00:55:14,750 --> 00:55:17,090 proof but it really is. 982 00:55:17,090 --> 00:55:22,680 What I'm going to do is to associate code words with base 983 00:55:22,680 --> 00:55:23,930 2 expansions. 984 00:55:26,330 --> 00:55:29,400 There's a little Genie that early in the morning leaves 985 00:55:29,400 --> 00:55:33,090 things out of these slides when I make them. 986 00:55:33,090 --> 00:55:36,410 It wasn't me, I put it in. 987 00:55:36,410 --> 00:55:38,810 So we're going to prove this by associating code words with 988 00:55:38,810 --> 00:55:42,760 base 2 expansions, which are like decimals, but decimals to 989 00:55:42,760 --> 00:55:44,070 the base 2. 990 00:55:44,070 --> 00:55:50,180 In other words, we're going to take a code word, y1, y2 up to 991 00:55:50,180 --> 00:55:54,510 y sub m where y1 is a binary digit, y2 is a binary digit. 992 00:55:54,510 --> 00:55:57,870 This is a string of binary digits, and we're going to 993 00:55:57,870 --> 00:56:00,510 represent this as a real number. 994 00:56:00,510 --> 00:56:03,490 The real number is the decimal, but it's not a 995 00:56:03,490 --> 00:56:11,160 decimal, it's a becimal, if you will, which is dot y1, y2 996 00:56:11,160 --> 00:56:12,130 up to y sub m. 997 00:56:12,130 --> 00:56:17,430 Which means y1 over 2 plus y2 over 4 plus dot dot dot plus y 998 00:56:17,430 --> 00:56:20,030 sub m over 2 to the m. 999 00:56:20,030 --> 00:56:24,850 If you think of it, an ordinary decimal, y1, y2 up to 1000 00:56:24,850 --> 00:56:31,770 y sub m, means y1 over 10 plus y2 over 100 plus y3 over 1,000 1001 00:56:31,770 --> 00:56:32,800 and so forth. 1002 00:56:32,800 --> 00:56:38,200 So this is what people would have developed for decimals 1003 00:56:38,200 --> 00:56:42,700 if, in fact, we lived in a base 2 world instead 1004 00:56:42,700 --> 00:56:44,770 of a base 10 world. 1005 00:56:44,770 --> 00:56:47,680 If you weren't born with ten fingers and you only had two 1006 00:56:47,680 --> 00:56:51,130 fingers, this is the number system you would use. 1007 00:56:54,400 --> 00:56:56,040 When we think about decimals there's 1008 00:56:56,040 --> 00:56:57,530 something more involved. 1009 00:56:57,530 --> 00:57:01,740 We use decimals all the time to approximate things. 1010 00:57:01,740 --> 00:57:09,840 Namely, if I say that a number is 0.12, I don't mean usually 1011 00:57:09,840 --> 00:57:13,540 that it's exactly 12 one hundredths. 1012 00:57:13,540 --> 00:57:17,340 Usually I mean it's about 12 one hundredths. 1013 00:57:17,340 --> 00:57:21,340 The easiest way to do this is to round things down to two 1014 00:57:21,340 --> 00:57:22,440 decimal places.
1015 00:57:22,440 --> 00:57:26,210 In other words, when I say 0.12, what I really mean is I 1016 00:57:26,210 --> 00:57:30,050 am talking about a real number which lies between 12 one 1017 00:57:30,050 --> 00:57:32,730 hundredths and 13 one hundredths. 1018 00:57:32,730 --> 00:57:35,570 It's greater than or equal to 12 one hundredths and it's 1019 00:57:35,570 --> 00:57:37,650 less than 13 one hundredths. 1020 00:57:37,650 --> 00:57:40,740 I'll do the same thing in base 2. 1021 00:57:40,740 --> 00:57:43,570 As soon as I do this you'll see where the Kraft inequality 1022 00:57:43,570 --> 00:57:44,820 comes from. 1023 00:57:47,070 --> 00:57:50,810 So I'm going to have this interval here, which is the 1024 00:57:50,810 --> 00:57:57,730 interval associated with a binary expansion to m digits; 1025 00:57:57,730 --> 00:57:59,720 there's a number associated with it which 1026 00:57:59,720 --> 00:58:02,230 is this number here. 1027 00:58:02,230 --> 00:58:04,840 There's also an interval associated with it, which is 2 1028 00:58:04,840 --> 00:58:06,570 to the minus m. 1029 00:58:06,570 --> 00:58:19,450 So if I have a code consisting of 0, 0, 0, 1 and 1, what I'm 1030 00:58:19,450 --> 00:58:24,870 going to do is represent zero zero as a binary expansion, so 1031 00:58:24,870 --> 00:58:30,450 0, 0, as a binary expansion is 0.00, which is zero. 1032 00:58:30,450 --> 00:58:35,290 But also as an approximation it's between zero and 1/4. 1033 00:58:35,290 --> 00:58:39,330 So I have this interval associated with 0, 0, which is 1034 00:58:39,330 --> 00:58:43,690 the interval from zero up to 1/4. 1035 00:58:43,690 --> 00:58:47,660 For the code word zero 1, if I'm trying to see whether that 1036 00:58:47,660 --> 00:58:53,750 is part of a prefix code, I map it into a number, 0.01 as 1037 00:58:53,750 --> 00:58:55,460 a binary expansion. 1038 00:58:55,460 --> 00:59:01,460 This number corresponds to the number 1/4, and since the 1039 00:59:01,460 --> 00:59:06,060 code word has length 2, it also corresponds to an interval of size 1/4. 1040 00:59:06,060 --> 00:59:10,530 So we go from 1/4 up to 1/2. 1041 00:59:10,530 --> 00:59:17,170 Finally, I have 1, which corresponds to the number 1/2, 1042 00:59:17,170 --> 00:59:20,600 and since it's only one binary digit long, it corresponds to 1043 00:59:20,600 --> 00:59:22,460 the interval 1/2 to 1. 1044 00:59:22,460 --> 00:59:26,890 Namely, if I truncate things to one binary digit, I'm talking 1045 00:59:26,890 --> 00:59:29,380 about the entire interval from 1/2 to 1. 1046 00:59:33,600 --> 00:59:38,330 So where does the Kraft inequality come from and what 1047 00:59:38,330 --> 00:59:39,660 does it have to do with this? 1048 00:59:39,660 --> 00:59:43,560 Incidentally, this isn't the way that Kraft proved it. 1049 00:59:43,560 --> 00:59:45,570 Kraft was very smart. 1050 00:59:45,570 --> 00:59:50,200 He did this as his Master's thesis, too, I believe, and 1051 00:59:50,200 --> 00:59:52,800 since he wanted it to be his Master's thesis he didn't want 1052 00:59:52,800 --> 00:59:55,960 to make it look quite that trivial or Bob Fano would have 1053 00:59:55,960 --> 00:59:58,200 said oh, you ought to do something else for a Master's 1054 00:59:58,200 --> 00:59:59,730 thesis also. 1055 00:59:59,730 --> 01:00:04,270 So he was cagey and made his proof look a little more 1056 01:00:04,270 --> 01:00:05,810 complicated.
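Here is a small sketch of that correspondence (function names are illustrative): each code word of length m is mapped to the value of its base 2 expansion and to an interval of size 2 to the minus m, using the example code 0 0, 0 1, 1 from above. Note that the total length of the intervals is exactly the sum appearing in the Kraft inequality.

```python
from fractions import Fraction

def expansion_value(word: str) -> Fraction:
    """Base 2 expansion .y1 y2 ... ym = y1/2 + y2/4 + ... + ym/2^m."""
    return sum(Fraction(int(bit), 2 ** (k + 1)) for k, bit in enumerate(word))

def expansion_interval(word: str) -> tuple[Fraction, Fraction]:
    """Half-open interval [value, value + 2^-m) covered by an m-digit expansion."""
    low = expansion_value(word)
    return low, low + Fraction(1, 2 ** len(word))

code_words = ["00", "01", "1"]                 # the example code from above
for w in code_words:
    print(w, expansion_interval(w))            # [0,1/4), [1/4,1/2), [1/2,1)

print(sum(Fraction(1, 2 ** len(w)) for w in code_words))  # 1: the intervals exactly fill [0,1)
```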
1057 01:00:05,810 --> 01:00:10,350 So, if a code word x is a prefix of code word y, in 1058 01:00:10,350 --> 01:00:16,950 other words, y has some binary expansion, and x has some binary 1059 01:00:16,950 --> 01:00:21,920 expansion which is the first few letters of y. 1060 01:00:21,920 --> 01:00:25,550 Then look at the number corresponding to x and the interval 1061 01:00:25,550 --> 01:00:32,240 corresponding to x -- namely, x covers that entire range of 1062 01:00:32,240 --> 01:00:37,140 binary expansions which start with x and go up to 1063 01:00:37,140 --> 01:00:43,820 something which differs from x only in that mth binary digit. 1064 01:00:43,820 --> 01:00:46,940 In other words, let me show you what that 1065 01:00:46,940 --> 01:00:52,660 means in terms of the picture here. 1066 01:00:52,660 --> 01:01:01,990 If I tried to create a code word 0, 0, 0, 1, then 0, 0, 0, 1 1067 01:01:01,990 --> 01:01:04,880 would correspond to the number 1/16. 1068 01:01:08,970 --> 01:01:13,540 1/16 lies in that interval there. 1069 01:01:13,540 --> 01:01:16,980 In other words, any time I create a code word which lies 1070 01:01:16,980 --> 01:01:20,830 in the interval corresponding to another code word, it means 1071 01:01:20,830 --> 01:01:26,200 that this code word has that other code word as a prefix. 1072 01:01:26,200 --> 01:01:26,990 Sure enough it does -- 1073 01:01:26,990 --> 01:01:31,530 0, 0, 0, 1, this has this as a prefix. 1074 01:01:31,530 --> 01:01:34,240 In other words, there is a perfect mapping between 1075 01:01:34,240 --> 01:01:39,060 intervals associated with code words and 1076 01:01:39,060 --> 01:01:42,140 prefixes of code words. 1077 01:01:42,140 --> 01:01:47,570 So in other words, if we have a prefix-free code, the 1078 01:01:47,570 --> 01:01:52,740 intervals for each of these code words have to be distinct. 1079 01:01:52,740 --> 01:01:56,610 Well, now we're in nice shape because we know what the size 1080 01:01:56,610 --> 01:01:58,610 of each of these intervals is. 1081 01:01:58,610 --> 01:02:01,630 The size of the interval associated with a code word of 1082 01:02:01,630 --> 01:02:06,610 length 2 is 2 to the minus 2. 1083 01:02:06,610 --> 01:02:08,450 To be a prefix-free code, all these 1084 01:02:08,450 --> 01:02:11,430 intervals have to be disjoint. 1085 01:02:11,430 --> 01:02:15,030 But everything is contained here between zero and 1, and 1086 01:02:15,030 --> 01:02:17,920 therefore, when we add up all of these intervals we get a 1087 01:02:17,920 --> 01:02:20,050 number which is at most 1. 1088 01:02:22,970 --> 01:02:24,510 That's the Kraft inequality. 1089 01:02:24,510 --> 01:02:28,450 That's all there is to it. 1090 01:02:28,450 --> 01:02:30,410 There was one more thing in it. 1091 01:02:30,410 --> 01:02:34,970 It's a full code if and only if the Kraft inequality is 1092 01:02:34,970 --> 01:02:36,780 satisfied with equality. 1093 01:02:39,470 --> 01:02:40,720 Where was that? 1094 01:02:43,270 --> 01:02:46,660 The code is full if and only if the expansion intervals 1095 01:02:46,660 --> 01:02:48,470 fill up zero to 1. 1096 01:02:48,470 --> 01:02:58,910 In other words, suppose this was 1 zero, which would lead 1097 01:02:58,910 --> 01:03:09,520 into 0.10 with an interval 1/2 to 3/4, and this was all you 1098 01:03:09,520 --> 01:03:14,830 had, then this interval up here would be empty, and, in 1099 01:03:14,830 --> 01:03:17,540 fact, since this interval is empty you could 1100 01:03:17,540 --> 01:03:20,200 shorten the code down.
1101 01:03:20,200 --> 01:03:24,290 In other words, you'd have intervals which weren't full 1102 01:03:24,290 --> 01:03:27,390 which means that you would have code words that could be 1103 01:03:27,390 --> 01:03:30,240 put in there which are not there. 1104 01:03:30,240 --> 01:03:33,060 So, that completes the proof. 1105 01:03:42,090 --> 01:03:44,570 So now finally, it's time to define unique decodability. 1106 01:03:48,630 --> 01:03:53,910 The definition in the notes is a mouthful, so I broke it 1107 01:03:53,910 --> 01:03:58,290 apart into a bunch of different pieces here. 1108 01:03:58,290 --> 01:04:04,670 A code c for a discrete source is uniquely decodable if for 1109 01:04:04,670 --> 01:04:09,370 each string of source letters, x1 up to x sub m, these are 1110 01:04:09,370 --> 01:04:12,100 not distinct letters of the alphabet, these are just the 1111 01:04:12,100 --> 01:04:15,390 things that might come out of the source. x1 could be the 1112 01:04:15,390 --> 01:04:18,510 same as x2, it could be different from x2. 1113 01:04:18,510 --> 01:04:22,800 All of these letters coming out of the source 1114 01:04:22,800 --> 01:04:27,810 correspond to some concatenation of these code 1115 01:04:27,810 --> 01:04:34,130 words, namely, c of x1, c of x2 up to c of x sub m. 1116 01:04:34,130 --> 01:04:38,220 So I have this coming out of the source, this is a string 1117 01:04:38,220 --> 01:04:42,900 of binary digits that come out corresponding to this, and I 1118 01:04:42,900 --> 01:04:46,870 require that this differs from the concatenation of the code 1119 01:04:46,870 --> 01:04:50,750 words c of x1 prime up to c of x n prime, 1120 01:04:50,750 --> 01:04:54,850 for any other string, x1 prime, x2 prime, up to x n prime of 1121 01:04:54,850 --> 01:04:56,060 source letters. 1122 01:04:56,060 --> 01:04:59,740 Example of this, the thing that we were trying to 1123 01:04:59,740 --> 01:05:11,910 construct before, c of a equals 1, c of b equals zero, c of c 1124 01:05:11,910 --> 01:05:20,950 equals 1 zero, doesn't work because the concatenation of a 1125 01:05:20,950 --> 01:05:26,940 and b yields 1 zero, c of x1 -- take x1 to be a, 1126 01:05:26,940 --> 01:05:28,680 take x2 to be b. 1127 01:05:28,680 --> 01:05:36,630 This concatenation, c of x1, c of x2 is c of a, c 1128 01:05:36,630 --> 01:05:40,770 of b equals 1 zero. 1129 01:05:40,770 --> 01:05:46,200 C of c equals 1 zero, and therefore, you don't have 1130 01:05:46,200 --> 01:05:47,110 something that works. 1131 01:05:47,110 --> 01:05:52,150 Note that n here can be different from m here. 1132 01:05:52,150 --> 01:05:55,550 You'll deal with that in the homework a little bit, not 1133 01:05:55,550 --> 01:05:58,300 this week's set. 1134 01:05:58,300 --> 01:06:02,490 But that's what unique decodability says. 1135 01:06:06,540 --> 01:06:08,230 Let me give you an example. 1136 01:06:15,760 --> 01:06:17,010 Here's an example. 1137 01:06:20,430 --> 01:06:25,170 Turns out that all uniquely decodable codes have to 1138 01:06:25,170 --> 01:06:28,580 satisfy the Kraft inequality also. 1139 01:06:28,580 --> 01:06:31,150 Kraft didn't prove this. 1140 01:06:31,150 --> 01:06:35,440 In fact, it's a bit of a bear to prove it, and 1141 01:06:35,440 --> 01:06:37,540 we'll prove it later. 1142 01:06:37,540 --> 01:06:41,790 I suspect that about 2/3 of you will see the proof and say 1143 01:06:41,790 --> 01:06:45,550 ugh, and 1/3 of you will say oh, this is really, really 1144 01:06:45,550 --> 01:06:47,720 interesting.
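Going back for a moment to the example above, with c of a equal to 1, c of b equal to zero, and c of c equal to 1 zero: here is a minimal sketch of that failure (the variable names are just for illustration). Two different source strings encode into the same bit string, so no decoder, however clever, can tell them apart.

```python
code = {"a": "1", "b": "0", "c": "10"}

def encode(source: str) -> str:
    """Concatenate the code words c(x1) c(x2) ... c(xm)."""
    return "".join(code[x] for x in source)

print(encode("ab"))  # '10'
print(encode("c"))   # '10' -- same encoding as 'ab', so the code is not uniquely decodable
```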
1145 01:06:47,720 --> 01:06:51,870 I sort of say gee, this is interesting sometimes, and 1146 01:06:51,870 --> 01:06:56,240 more often I say ugh, why do we have to do this? 1147 01:06:56,240 --> 01:07:03,420 But one example of a code which is uniquely decodable is 1148 01:07:03,420 --> 01:07:07,660 first code word is 1, second code word is 1, 0, third is 1, 1149 01:07:07,660 --> 01:07:12,850 0, 0, and the fourth is 1, 0, 0, 0. 1150 01:07:12,850 --> 01:07:15,760 It doesn't satisfy the Kraft inequality with equality, 1151 01:07:15,760 --> 01:07:18,110 it satisfies it with strict inequality. 1152 01:07:18,110 --> 01:07:19,980 It is uniquely decodable. 1153 01:07:19,980 --> 01:07:21,850 How do I know it's uniquely decodable by 1154 01:07:21,850 --> 01:07:24,290 just looking at it? 1155 01:07:24,290 --> 01:07:26,550 Because any time I see a 1 I know it's the 1156 01:07:26,550 --> 01:07:28,070 beginning of a code word. 1157 01:07:28,070 --> 01:07:31,450 So I look at some long binary string, it starts out with a 1158 01:07:31,450 --> 01:07:36,270 1, I just read digits till I come to the next 1, I say 1159 01:07:36,270 --> 01:07:39,690 ah-ha, that next 1 is the first binary digit in the 1160 01:07:39,690 --> 01:07:44,590 second code word, the third 1 that I see is the first digit 1161 01:07:44,590 --> 01:07:47,570 in the third code word and so forth. 1162 01:07:47,570 --> 01:07:51,040 You might say why don't I make the 1 the end of the code word 1163 01:07:51,040 --> 01:07:53,610 instead of the beginning of the code word and then we'll 1164 01:07:53,610 --> 01:07:57,410 have the prefix condition again. 1165 01:07:57,410 --> 01:08:02,960 All I can say is because I want to be perverse and I want 1166 01:08:02,960 --> 01:08:06,240 to give you an example of something that is uniquely 1167 01:08:06,240 --> 01:08:08,610 decodable but doesn't satisfy the prefix condition. 1168 01:08:11,180 --> 01:08:12,040 So it's a question. 1169 01:08:12,040 --> 01:08:15,120 Why don't we just stick to prefix-free codes and forget 1170 01:08:15,120 --> 01:08:18,160 about unique decodability? 1171 01:08:18,160 --> 01:08:21,880 You won't understand the answer to that really until we 1172 01:08:21,880 --> 01:08:25,510 start looking at things like Lempel Ziv codes, which are, 1173 01:08:25,510 --> 01:08:29,850 in fact, a bunch of different things all put together which 1174 01:08:29,850 --> 01:08:33,680 are, in fact, very, very practical codes. 1175 01:08:33,680 --> 01:08:36,990 But they're not prefix-free codes, and you'll see why 1176 01:08:36,990 --> 01:08:40,550 they're not prefix-free codes when we study them. 1177 01:08:40,550 --> 01:08:44,060 Then you will see why we want to have a definition of 1178 01:08:44,060 --> 01:08:46,500 something which is more involved than that. 1179 01:08:46,500 --> 01:08:48,950 So don't worry about that for the time being. 1180 01:08:48,950 --> 01:08:52,380 For the time being, the correct idea to take away from 1181 01:08:52,380 --> 01:08:57,280 this is that why not just use prefix-free codes, and the 1182 01:08:57,280 --> 01:09:00,560 answer is for quite a while we will. 1183 01:09:00,560 --> 01:09:04,440 We know that anything we can do with prefix-free codes we 1184 01:09:04,440 --> 01:09:07,180 can also do with uniquely decodable codes, anything we 1185 01:09:07,180 --> 01:09:10,610 can do with uniquely decodable codes, we can do with 1186 01:09:10,610 --> 01:09:11,640 prefix-free codes.
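Going back to the code 1, 10, 100, 1000 for a moment, here is a sketch of the decoding rule just described (the symbol names w1 through w4 are made up for the illustration). Every 1 starts a code word, so the decoder can only emit a symbol once it sees the next 1 -- or the end of the string -- which is exactly why this code is uniquely decodable but not instantaneous like a prefix-free code.

```python
code = {"w1": "1", "w2": "10", "w3": "100", "w4": "1000"}
inverse = {word: symbol for symbol, word in code.items()}

def decode(bits: str) -> list[str]:
    """Split the string at each 1: a code word is a 1 plus the zeros that follow it."""
    symbols, current = [], ""
    for bit in bits:
        if bit == "1" and current:        # a new 1 closes off the previous code word
            symbols.append(inverse[current])
            current = ""
        current += bit
    if current:
        symbols.append(inverse[current])  # the last code word ends with the string
    return symbols

print(decode("1101001000"))  # ['w1', 'w2', 'w3', 'w4']
```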
1187 01:09:11,640 --> 01:09:16,490 Namely, any old code that you invent has like certain set of 1188 01:09:16,490 --> 01:09:21,210 lengths associated with the code words, and if it 1189 01:09:21,210 --> 01:09:24,650 satisfies the Kraft inequality, you can easily 1190 01:09:24,650 --> 01:09:28,490 develop a prefix-free code which has those lengths and 1191 01:09:28,490 --> 01:09:31,410 you might as well do it because then it makes the 1192 01:09:31,410 --> 01:09:34,100 coding a lot easier. 1193 01:09:34,100 --> 01:09:37,530 Namely, if we have a prefix-free code -- let's go 1194 01:09:37,530 --> 01:09:40,690 back and look at that because I never mentioned it and it 1195 01:09:40,690 --> 01:09:45,240 really is one of the important advantages 1196 01:09:45,240 --> 01:09:47,960 of prefix-free codes. 1197 01:09:47,960 --> 01:09:51,360 When I look at this picture and I look at the proof of how 1198 01:09:51,360 --> 01:09:55,570 I saw that this was uniquely decodable, what we said was 1199 01:09:55,570 --> 01:10:00,930 you start at the beginning and as soon as the decoder sees 1200 01:10:00,930 --> 01:10:06,400 the last binary digit of a code word, the decoder can say 1201 01:10:06,400 --> 01:10:10,340 ah-ah, it's that code word. 1202 01:10:10,340 --> 01:10:13,120 So it's instantaneously decodable. 1203 01:10:13,120 --> 01:10:15,940 In other words, all you need to see is the end of the code 1204 01:10:15,940 --> 01:10:19,710 word and at that point you know it's the end. 1205 01:10:19,710 --> 01:10:23,310 Incidentally, that makes figuring out when you have a 1206 01:10:23,310 --> 01:10:25,880 long sequence of code words and you want to stop the whole 1207 01:10:25,880 --> 01:10:28,730 thing, it makes things a little bit easier. 1208 01:10:28,730 --> 01:10:32,610 This example we started out with of -- 1209 01:10:32,610 --> 01:10:35,180 I can't find it anymore -- 1210 01:10:35,180 --> 01:10:40,200 but the example of a uniquely decodable, but non-prefix-free 1211 01:10:40,200 --> 01:10:43,710 code, you always had to look at the first digit of the next 1212 01:10:43,710 --> 01:10:47,460 code word to know that the old code word was finished. 1213 01:10:47,460 --> 01:10:52,210 So, prefix-free codes have that advantage also. 1214 01:10:56,140 --> 01:11:00,650 The next topic that we're going to take up is discrete 1215 01:11:00,650 --> 01:11:02,020 memoryless sources. 1216 01:11:02,020 --> 01:11:05,690 Namely, at this point we have gone as far as we can in 1217 01:11:05,690 --> 01:11:09,600 studying prefix-free codes and uniquely decodable codes 1218 01:11:09,600 --> 01:11:13,410 strictly in terms of their non-probabalistic properties. 1219 01:11:13,410 --> 01:11:17,420 Namely, the question of what set of lengths can you use in 1220 01:11:17,420 --> 01:11:21,380 a prefix-free code or uniquely decodable code, and what sets 1221 01:11:21,380 --> 01:11:22,920 of lengths can't you use. 1222 01:11:22,920 --> 01:11:26,380 So the next thing we want to do is to start looking at the 1223 01:11:26,380 --> 01:11:30,060 probabilities of these different symbols and looking 1224 01:11:30,060 --> 01:11:32,580 at the probabilities of the different symbols. 1225 01:11:32,580 --> 01:11:36,500 We want to find out what sort of lengths we want to choose. 1226 01:11:39,480 --> 01:11:42,430 There will be a simple answer to that. 
1227 01:11:42,430 --> 01:11:45,370 In fact, there'll be two ways of looking at it, one of which 1228 01:11:45,370 --> 01:11:48,350 will lead to the idea of entropy, and the other which 1229 01:11:48,350 --> 01:11:51,610 will lead to the idea of generating an optimal code. 1230 01:11:51,610 --> 01:11:54,850 Both of those approaches are extremely interesting. 1231 01:11:54,850 --> 01:11:58,300 But to do that we have to think about a very 1232 01:11:58,300 --> 01:12:00,150 simple kind of source. 1233 01:12:00,150 --> 01:12:02,820 The simple kind of source is called a 1234 01:12:02,820 --> 01:12:05,050 discrete memoryless source. 1235 01:12:05,050 --> 01:12:08,510 We know what a discrete source is -- it's a source which 1236 01:12:08,510 --> 01:12:12,970 spews out a sequence of symbols from this finite 1237 01:12:12,970 --> 01:12:16,130 alphabet that we know and the decoder knows. 1238 01:12:20,210 --> 01:12:23,650 The next thing we have to do is to put a probability 1239 01:12:23,650 --> 01:12:27,120 measure on the output of the source. 1240 01:12:27,120 --> 01:12:30,550 There's a little review of probability at 1241 01:12:30,550 --> 01:12:32,610 the end of this lecture. 1242 01:12:32,610 --> 01:12:35,510 You should read it carefully. 1243 01:12:35,510 --> 01:12:39,200 When you study probability, you have undoubtedly studied 1244 01:12:39,200 --> 01:12:44,650 it like most students do, as a way of learning 1245 01:12:44,650 --> 01:12:47,690 how to do the problems. 1246 01:12:47,690 --> 01:12:53,260 You don't necessarily think of the generalizations of this, 1247 01:12:53,260 --> 01:12:57,040 you don't necessarily think of why is it that when you define 1248 01:12:57,040 --> 01:13:01,760 a probability space you start out with a sample space and 1249 01:13:01,760 --> 01:13:04,320 you talk about elements in the sample space, what's their 1250 01:13:04,320 --> 01:13:06,100 sample points. 1251 01:13:06,100 --> 01:13:08,380 What do those sample points have to do with random 1252 01:13:08,380 --> 01:13:11,360 variables and all of that stuff? 1253 01:13:11,360 --> 01:13:15,050 That's the first thing you forget when you haven't been 1254 01:13:15,050 --> 01:13:19,270 looking at probability for a while. 1255 01:13:19,270 --> 01:13:22,870 Unfortunately, it's something you have to understand when 1256 01:13:22,870 --> 01:13:26,190 we're dealing with this because we have a bunch of 1257 01:13:26,190 --> 01:13:28,290 things which are not random variables here. 1258 01:13:28,290 --> 01:13:31,540 These letters here are things which we 1259 01:13:31,540 --> 01:13:33,980 will call chance variables. 1260 01:13:33,980 --> 01:13:38,290 A chance variable is just like a random variable but the set 1261 01:13:38,290 --> 01:13:43,110 of possible values that it has are not necessarily numbers, 1262 01:13:43,110 --> 01:13:46,430 they're just events, as it turns out. 1263 01:13:46,430 --> 01:13:50,500 So the sample space is just some set of letters, as we 1264 01:13:50,500 --> 01:13:52,400 call them, which are really events in 1265 01:13:52,400 --> 01:13:54,060 this probability space. 1266 01:13:54,060 --> 01:13:57,030 The probability space assigns probabilities to 1267 01:13:57,030 --> 01:13:59,510 sequences of letters. 1268 01:13:59,510 --> 01:14:03,200 What we're assuming here is that the sequence of letters 1269 01:14:03,200 --> 01:14:07,230 are all statistically independent of each other. 
1270 01:14:07,230 --> 01:14:16,210 So for example, if you go to Las Vegas and you're reporting 1271 01:14:16,210 --> 01:14:20,550 the outcome of some gambling game and you're sending it 1272 01:14:20,550 --> 01:14:23,940 back your home computer and your home computer is figuring 1273 01:14:23,940 --> 01:14:28,120 out what your odds are in black jack or something, then 1274 01:14:28,120 --> 01:14:31,970 every time the dice are rolled you get an independent -- we 1275 01:14:31,970 --> 01:14:36,260 hope it's independent if the game is fair -- 1276 01:14:36,260 --> 01:14:39,870 outcome of the dice. 1277 01:14:39,870 --> 01:14:41,980 So that what we're sending then, what we're going to 1278 01:14:41,980 --> 01:14:47,450 encode is a sequence of independent, random -- not 1279 01:14:47,450 --> 01:14:50,720 random variables because it's not necessarily numbers that 1280 01:14:50,720 --> 01:14:57,340 you're interested in, it's this sequence of symbols. 1281 01:14:57,340 --> 01:15:04,620 But if we deal with the English text, for example, the 1282 01:15:04,620 --> 01:15:08,250 idea that the letters in English text are independent 1283 01:15:08,250 --> 01:15:13,140 of each other is absolutely ludicrous. 1284 01:15:13,140 --> 01:15:17,820 If it's early enough in the term that you're not 1285 01:15:17,820 --> 01:15:21,680 overloaded already, I would suggest that those of you with 1286 01:15:21,680 --> 01:15:25,800 a little time go back and read at least the first part of 1287 01:15:25,800 --> 01:15:29,780 Shannon's original article about information theory where 1288 01:15:29,780 --> 01:15:33,010 he talks about the problem of modeling English. 1289 01:15:33,010 --> 01:15:36,770 It's a beautiful treatment, because he starts out same way 1290 01:15:36,770 --> 01:15:41,150 we are, dealing with sources which are independent, 1291 01:15:41,150 --> 01:15:44,160 identically distributed chance variables. 1292 01:15:44,160 --> 01:15:49,180 Then he goes from there, as we will, to looking at Markov 1293 01:15:49,180 --> 01:15:54,210 chains of source variables. 1294 01:15:54,210 --> 01:15:57,590 Some of you will cringe at this because you might have 1295 01:15:57,590 --> 01:16:00,820 seen Markov chains and forgotten about them or you 1296 01:16:00,820 --> 01:16:02,680 might have never seen them. 1297 01:16:02,680 --> 01:16:05,620 Don't worry about it, there's not that much that's peculiar 1298 01:16:05,620 --> 01:16:06,670 about them. 1299 01:16:06,670 --> 01:16:09,290 Then he goes on from there to talk about 1300 01:16:09,290 --> 01:16:12,810 actual English language. 1301 01:16:12,810 --> 01:16:17,370 But the point that he makes is that when you want to study 1302 01:16:17,370 --> 01:16:21,060 something as complicated as the English language, the way 1303 01:16:21,060 --> 01:16:24,470 that you do it is not to start out by taking a lot of 1304 01:16:24,470 --> 01:16:27,390 statistics about English. 1305 01:16:27,390 --> 01:16:31,150 If you want to encode English, you start out by making highly 1306 01:16:31,150 --> 01:16:34,060 simplifying assumptions, like the assumption that we're 1307 01:16:34,060 --> 01:16:35,770 making here that we're dealing with a 1308 01:16:35,770 --> 01:16:37,990 discrete memoryless source. 1309 01:16:37,990 --> 01:16:41,760 You then learn how to encode discrete memoryless sources. 
1310 01:16:41,760 --> 01:16:46,190 You then look at blocks of letters out of these sources, 1311 01:16:46,190 --> 01:16:49,010 and if they're not independent you look at the probabilities 1312 01:16:49,010 --> 01:16:50,450 of these blocks. 1313 01:16:50,450 --> 01:16:57,240 If you know how to generate an optimal code for IID letters, 1314 01:16:57,240 --> 01:17:00,920 then all you have to do is take these blocks of length m 1315 01:17:00,920 --> 01:17:05,070 where you'd have a probability on each possible block, and 1316 01:17:05,070 --> 01:17:07,640 you generate a code for the block. 1317 01:17:07,640 --> 01:17:11,100 You don't worry about the statistical relationships 1318 01:17:11,100 --> 01:17:12,330 between different blocks. 1319 01:17:12,330 --> 01:17:15,630 You just say well, if I make my block long enough I don't 1320 01:17:15,630 --> 01:17:18,940 care about what happens at the edges, and I'm going to get 1321 01:17:18,940 --> 01:17:21,560 everything of interest. 1322 01:17:21,560 --> 01:17:26,285 So the idea is by starting out here you have all the clues 1323 01:17:26,285 --> 01:17:30,150 you need to start looking at the more interesting cases. 1324 01:17:30,150 --> 01:17:32,390 As it turns out with source coding there's another 1325 01:17:32,390 --> 01:17:37,380 advantage involved -- looking at independent letters is in 1326 01:17:37,380 --> 01:17:39,530 some sense a worst case. 1327 01:17:39,530 --> 01:17:45,010 When you look at this worst case, in fact, presto, you 1328 01:17:45,010 --> 01:17:49,200 will say if the letters are statistically related, fine. 1329 01:17:49,200 --> 01:17:51,470 I'd do even better. 1330 01:17:51,470 --> 01:17:54,820 I could do better if I took that into account, but if I'm 1331 01:17:54,820 --> 01:17:57,620 not taking it into account, I know exactly 1332 01:17:57,620 --> 01:17:59,150 how well I can do. 1333 01:17:59,150 --> 01:18:01,200 So what's the definition of that? 1334 01:18:01,200 --> 01:18:05,620 Source output is an unending sequence -- x1, x2, x3 -- 1335 01:18:05,620 --> 01:18:09,510 of randomly selected letters, and these randomly selected 1336 01:18:09,510 --> 01:18:12,240 letters are called chance variables. 1337 01:18:12,240 --> 01:18:16,580 Each source output is selected from the alphabet using a 1338 01:18:16,580 --> 01:18:18,280 common probability measure. 1339 01:18:18,280 --> 01:18:21,930 In other words, they're identically distributed. 1340 01:18:21,930 --> 01:18:25,200 Each source output, x k plus 1, is statistically independent of 1341 01:18:25,200 --> 01:18:29,820 the earlier source outputs, x1 up to x k. 1342 01:18:29,820 --> 01:18:34,610 We will call that independent identically distributed, and 1343 01:18:34,610 --> 01:18:37,190 we'll abbreviate it IID. 1344 01:18:37,190 --> 01:18:42,780 It doesn't mean that the probability measure is 1 over 1345 01:18:42,780 --> 01:18:47,250 m for each letter, it's not what we were assuming before. 1346 01:18:47,250 --> 01:18:49,490 It means you can have an arbitrary probability 1347 01:18:49,490 --> 01:18:52,610 assignment on the different letters, but every letter has 1348 01:18:52,610 --> 01:18:55,730 the same probability assignment on it and they're 1349 01:18:55,730 --> 01:18:57,930 all independent of each other. 1350 01:18:57,930 --> 01:18:59,500 So that's the kind of source we're going to be 1351 01:18:59,500 --> 01:19:01,140 dealing with first.
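As a sketch of what that definition means operationally -- the alphabet and the probability assignment below are made-up numbers for illustration, not anything from the lecture -- every output is drawn from the same, not necessarily uniform, probability measure, independently of all the other outputs.

```python
import random

alphabet = ["a", "b", "c"]         # made-up finite alphabet
probabilities = [0.7, 0.2, 0.1]    # an arbitrary probability assignment, the same for every letter

def dms_output(n: int) -> list[str]:
    """Emit n letters from a discrete memoryless source: IID draws from one measure."""
    return random.choices(alphabet, weights=probabilities, k=n)

print(dms_output(20))  # e.g. ['a', 'a', 'b', 'a', 'c', ...], each draw independent of the rest
```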
1352 01:19:01,140 --> 01:19:04,760 We will find out everything we want to know about how we deal 1353 01:19:04,760 --> 01:19:06,090 with that source. 1354 01:19:06,090 --> 01:19:08,280 You will understand that source completely and the 1355 01:19:08,280 --> 01:19:13,090 other sources you will half understand a little later.