1
00:00:00,090 --> 00:00:02,490
The following content is
provided under a Creative

2
00:00:02,490 --> 00:00:04,030
Commons license.

3
00:00:04,030 --> 00:00:06,330
Your support will help
MIT OpenCourseWare

4
00:00:06,330 --> 00:00:10,720
continue to offer high quality
educational resources for free.

5
00:00:10,720 --> 00:00:13,320
To make a donation or
view additional materials

6
00:00:13,320 --> 00:00:15,780
from hundreds of
MIT courses, visit

7
00:00:15,780 --> 00:00:17,670
MIT OpenCourseWare at ocw.mit.edu.

8
00:00:27,210 --> 00:00:28,680
PROFESSOR: Welcome.

9
00:00:28,680 --> 00:00:31,140
One quick announcement--
if you have not yet

10
00:00:31,140 --> 00:00:34,680
picked up your graded
exams, you can do so

11
00:00:34,680 --> 00:00:38,338
by seeing the TAs
after the hour.

12
00:00:38,338 --> 00:00:41,170
OK?

13
00:00:41,170 --> 00:00:43,650
So today I want to
continue to think

14
00:00:43,650 --> 00:00:48,660
about what we started last week,
thinking about Fourier series.

15
00:00:48,660 --> 00:00:54,780
The idea is to
develop a theory that

16
00:00:54,780 --> 00:00:59,610
lets us look at signals on the
basis of frequency content,

17
00:00:59,610 --> 00:01:02,820
much as we looked at
frequency responses

18
00:01:02,820 --> 00:01:05,850
as a characterization
of systems,

19
00:01:05,850 --> 00:01:08,610
according to the way
they process frequencies.

20
00:01:08,610 --> 00:01:11,850
And we saw last time that
there were a number of kinds

21
00:01:11,850 --> 00:01:15,360
of signals, for example,
musical signals,

22
00:01:15,360 --> 00:01:17,520
where that kind of
an approach--

23
00:01:17,520 --> 00:01:20,940
thinking about the signal
according to the frequencies

24
00:01:20,940 --> 00:01:21,870
that are in it--

25
00:01:21,870 --> 00:01:25,290
makes a lot of sense
and can lead to insight.

26
00:01:25,290 --> 00:01:27,600
We also developed
some formalism.

27
00:01:27,600 --> 00:01:32,310
We figured out how you can
break a signal into components

28
00:01:32,310 --> 00:01:37,090
and then assemble the components
to generate the signal.

29
00:01:37,090 --> 00:01:39,810
And what I want to mention at
the beginning of the hour today

30
00:01:39,810 --> 00:01:42,870
is just how to think
about this operation

31
00:01:42,870 --> 00:01:45,810
in a more familiar way.

32
00:01:45,810 --> 00:01:50,430
We do this kind of a
thing, breaking something

33
00:01:50,430 --> 00:01:53,826
into components all the time.

34
00:01:53,826 --> 00:01:55,200
One of the more
familiar examples

35
00:01:55,200 --> 00:01:58,380
might be thinking
about 3-space, right?

36
00:01:58,380 --> 00:02:02,220
The Cartesian analysis of
3-space is based on the idea

37
00:02:02,220 --> 00:02:05,070
that you can think about a
vector location in 3-space

38
00:02:05,070 --> 00:02:06,090
as having components.

39
00:02:06,090 --> 00:02:07,965
There's a component in
the x direction, the y

40
00:02:07,965 --> 00:02:09,180
direction, the z direction.

41
00:02:09,180 --> 00:02:10,919
That's completely
analogous to the way

42
00:02:10,919 --> 00:02:15,330
we're thinking about Fourier
representations for signals.

43
00:02:15,330 --> 00:02:20,070
So just like we would
think about synthesizing

44
00:02:20,070 --> 00:02:24,570
the location of a point by
adding together three pieces,

45
00:02:24,570 --> 00:02:28,290
and we would think about
analyzing a point to figure out

46
00:02:28,290 --> 00:02:30,930
how big the components are
in each of those directions,

47
00:02:30,930 --> 00:02:34,770
it's exactly the same when we
think about Fourier series.

48
00:02:34,770 --> 00:02:38,860
We think about representing
a signal as a sum of things.

49
00:02:38,860 --> 00:02:41,970
So the sum is
precisely the same.

50
00:02:41,970 --> 00:02:44,750
This one happens to have an
infinite number of terms,

51
00:02:44,750 --> 00:02:49,680
[? eh. ?] The top one has
three terms, [? eh. ?]

52
00:02:49,680 --> 00:02:52,230
The principles are very similar.

53
00:02:52,230 --> 00:02:55,830
So we think about representing
a signal as a sum of components.

54
00:02:55,830 --> 00:02:58,380
We think about representing
a point in 3-space

55
00:02:58,380 --> 00:03:00,180
as a sum of components,
and we think

56
00:03:00,180 --> 00:03:06,470
about analyzing the signal
or the vector in 3-space,

57
00:03:06,470 --> 00:03:11,040
so that we figure out what
each of those components are.
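
The 3-space half of the analogy can be sketched in a few lines of Python. This is only an illustration, not something shown in the lecture: the helper names `dot`, `analyze`, and `synthesize` and the example point are made up here, and the x, y, z reference directions are assumed to be the usual orthonormal unit vectors.

```python
# Analysis and synthesis in 3-space, using plain tuples as vectors.
# All names here are hypothetical helpers chosen for illustration.

def dot(u, v):
    """Dot product: multiply componentwise, then sum."""
    return sum(a * b for a, b in zip(u, v))

# Orthonormal reference directions (the x, y, z unit vectors).
X_HAT = (1.0, 0.0, 0.0)
Y_HAT = (0.0, 1.0, 0.0)
Z_HAT = (0.0, 0.0, 1.0)
BASIS = (X_HAT, Y_HAT, Z_HAT)

def analyze(r):
    """Analysis: how big is the component along each reference direction?"""
    return [dot(e, r) for e in BASIS]

def synthesize(components):
    """Synthesis: add the scaled reference directions back together."""
    return tuple(
        sum(c * e[i] for c, e in zip(components, BASIS))
        for i in range(3)
    )

point = (2.0, -1.0, 5.0)          # arbitrary example point
components = analyze(point)       # dot the point with each direction
rebuilt = synthesize(components)  # sum of three pieces gives the point back
```

Analysis followed by synthesis returns the original point, which is the same round trip a Fourier series makes with an infinite number of terms in place of three.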

58
00:03:11,040 --> 00:03:13,980
And we do it in an operation
where it's actually

59
00:03:13,980 --> 00:03:18,240
very convenient to think
about the decomposition

60
00:03:18,240 --> 00:03:21,270
of the Fourier components using
precisely the same language

61
00:03:21,270 --> 00:03:24,240
that we would use for
thinking about vector spaces.

62
00:03:24,240 --> 00:03:26,701
So we would think about--

63
00:03:26,701 --> 00:03:28,200
in the case of the
Fourier, we think

64
00:03:28,200 --> 00:03:32,112
about integrating over the
period sifts out a component.

65
00:03:32,112 --> 00:03:33,570
The analogous
operation for 3-space

66
00:03:33,570 --> 00:03:35,700
is to think about a dot product.

67
00:03:35,700 --> 00:03:37,606
The way you take a
vector and figure out

68
00:03:37,606 --> 00:03:39,480
the component in the x
direction is to dot it

69
00:03:39,480 --> 00:03:42,720
with it, the dot product.

70
00:03:42,720 --> 00:03:44,670
In the Fourier case,
we think about it

71
00:03:44,670 --> 00:03:45,810
as being an inner product.

72
00:03:48,550 --> 00:03:50,880
The idea is
completely analogous.

73
00:03:50,880 --> 00:03:54,720
So we think about having the
inner product of two things--

74
00:03:54,720 --> 00:03:59,880
the reference direction
and the vector.

75
00:03:59,880 --> 00:04:02,460
So reference
direction and vector--

76
00:04:02,460 --> 00:04:04,830
we think about it exactly
the same way, except now it's

77
00:04:04,830 --> 00:04:07,316
an inner product, which means
that after we've multiplied,

78
00:04:07,316 --> 00:04:08,190
we have to integrate.

79
00:04:08,190 --> 00:04:11,070
That's the only difference
between inner product and dot product.

80
00:04:11,070 --> 00:04:13,830
Inner product implies sum or
integrate after you've

81
00:04:13,830 --> 00:04:17,029
done the multiplication.

82
00:04:17,029 --> 00:04:19,880
So we do exactly the same
thing, except that now we

83
00:04:19,880 --> 00:04:24,730
think about the inner
product of a and b.

84
00:04:24,730 --> 00:04:29,530
That's just the integral, where
we take the complex conjugate

85
00:04:29,530 --> 00:04:34,180
of one of the signals only
because by defining it

86
00:04:34,180 --> 00:04:38,480
with a complex conjugate there,
we set up the inner product

87
00:04:38,480 --> 00:04:41,320
so that the answer
is zero unless we

88
00:04:41,320 --> 00:04:45,930
take the inner product of two
things in the same direction.

89
00:04:45,930 --> 00:04:47,360
OK?

90
00:04:47,360 --> 00:04:50,300
By putting the minus sign
there, if the two reference

91
00:04:50,300 --> 00:04:53,090
directions, that
is to say the one

92
00:04:53,090 --> 00:04:59,180
characterized by k and m, the
ones characterized by k and m,

93
00:04:59,180 --> 00:05:00,800
the inner product
will be zero if we

94
00:05:00,800 --> 00:05:05,180
take the complex conjugate as
long as k is not equal to m.

95
00:05:05,180 --> 00:05:08,050
k equals m is the only
nonzero component.
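
The orthogonality statement can be written out as a worked equation. This is a sketch under assumptions not spelled out at this point in the lecture: the reference signals are taken to be the harmonically related exponentials $\phi_k(t) = e^{j 2\pi k t/T}$ (the symbol $\phi_k$ is introduced here only for illustration), and the inner product carries the $1/T$ normalization that the professor writes on the board a few minutes later.

```latex
\[
\langle a, b \rangle \;=\; \frac{1}{T}\int_{T} a^{*}(t)\, b(t)\, dt
\]
\[
\langle \phi_k, \phi_m \rangle
  \;=\; \frac{1}{T}\int_{T} e^{-j 2\pi k t/T}\, e^{\,j 2\pi m t/T}\, dt
  \;=\; \frac{1}{T}\int_{T} e^{\,j 2\pi (m-k) t/T}\, dt
  \;=\;
  \begin{cases}
    1, & k = m \\
    0, & k \neq m
  \end{cases}
\]
```

The conjugate supplies the minus sign in the first exponent. When $k = m$ the exponents cancel and the integrand is 1; otherwise the integrand completes a whole number of cycles over $T$ and averages to zero.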

96
00:05:08,050 --> 00:05:11,270
OK, is that all clear?

97
00:05:11,270 --> 00:05:14,420
So to make sure that it's
clear, here's a question.

98
00:05:17,507 --> 00:05:19,340
How many of the following
pairs of functions

99
00:05:19,340 --> 00:05:23,015
are orthogonal in T equals 3?

100
00:05:25,412 --> 00:05:26,870
Part of the goal
of the exercise is

101
00:05:26,870 --> 00:05:30,020
to figure out what the little
caveat in T equals three means.

102
00:05:30,020 --> 00:05:32,810
So look at your neighbor,
say hello, figure out

103
00:05:32,810 --> 00:05:35,636
a number between 0 and 4.

104
00:05:35,636 --> 00:05:38,624
[SIDE CONVERSATIONS]

105
00:07:48,340 --> 00:07:50,780
OK, so how many signals,
how many of the pairs

106
00:07:50,780 --> 00:07:54,470
are orthogonal to each other?

107
00:07:54,470 --> 00:07:56,960
Raise your hand with some
number between 0 and 4,

108
00:07:56,960 --> 00:07:59,990
unless you're completely
bizarre and raise five, I mean.

109
00:07:59,990 --> 00:08:01,200
OK, come on, come on.

110
00:08:01,200 --> 00:08:02,760
Higher, so I can see them.

111
00:08:02,760 --> 00:08:04,080
Remember, if you're wrong,

112
00:08:04,080 --> 00:08:08,710
it's your partner's fault,
it's not your fault.

113
00:08:08,710 --> 00:08:11,530
OK, not quite.

114
00:08:11,530 --> 00:08:15,230
A lot of bad partners, no, no.

115
00:08:15,230 --> 00:08:16,460
Let's do the first one.

116
00:08:16,460 --> 00:08:21,290
Is the cos of 2 pi t
orthogonal to the sine

117
00:08:21,290 --> 00:08:25,910
of 2 pi t over the interval
capital T equals 3?

118
00:08:29,100 --> 00:08:31,770
Yes?

119
00:08:31,770 --> 00:08:34,400
No.

120
00:08:34,400 --> 00:08:37,049
I haven't a clue.

121
00:08:37,049 --> 00:08:38,695
I don't care.

122
00:08:38,695 --> 00:08:39,320
No, no, no, no.

123
00:08:39,320 --> 00:08:42,039
You all care, no.

124
00:08:42,039 --> 00:08:42,919
Are they orthogonal?

125
00:08:42,919 --> 00:08:43,460
So what do you--

126
00:08:43,460 --> 00:08:45,043
how do I formally
ask the question are

127
00:08:45,043 --> 00:08:46,015
they orthogonal?

128
00:08:49,580 --> 00:08:52,820
OK, so it's either the last
slide or the next slide.

129
00:08:52,820 --> 00:08:54,059
So go back to the last slide.

130
00:08:54,059 --> 00:08:55,600
What's it mean if
they're orthogonal?

131
00:08:59,560 --> 00:09:00,191
Yeah?

132
00:09:00,191 --> 00:09:05,010
AUDIENCE: [INAUDIBLE]

133
00:09:05,010 --> 00:09:07,110
PROFESSOR: So how do I
take the dot product?

134
00:09:07,110 --> 00:09:08,490
What do I do?

135
00:09:08,490 --> 00:09:11,152
AUDIENCE: [INAUDIBLE] conjugate.

136
00:09:11,152 --> 00:09:12,110
PROFESSOR: Conjugate one.

137
00:09:12,110 --> 00:09:12,693
So I want to--

138
00:09:12,693 --> 00:09:18,300
I'm thinking about 1 over T, the
integral over T, a star of t,

139
00:09:18,300 --> 00:09:20,260
b of t, dt.

140
00:09:20,260 --> 00:09:21,990
Right?

141
00:09:21,990 --> 00:09:24,480
So the T comes in
here, right, I'm

142
00:09:24,480 --> 00:09:26,640
integrating over a period T.

143
00:09:26,640 --> 00:09:31,180
So I take the two functions
and I multiply them together.

144
00:09:31,180 --> 00:09:35,370
So I have this function,
and I have that function.

145
00:09:35,370 --> 00:09:38,010
I multiply them together.

146
00:09:38,010 --> 00:09:41,700
If you multiply two sinusoids
of the same frequency

147
00:09:41,700 --> 00:09:45,470
but different phase,
what do you get?

148
00:09:45,470 --> 00:09:47,206
Another sinusoid, right?

149
00:09:47,206 --> 00:09:49,580
So you all know all these
complicated trig relationships,

150
00:09:49,580 --> 00:09:50,079
right?

151
00:09:50,079 --> 00:09:50,920
Here's one of them.

152
00:09:50,920 --> 00:09:53,230
If you multiply cos of 2
pi t times the sine of 2 pi t

153
00:09:53,230 --> 00:09:57,940
you get half the sine
of double the frequency.

154
00:09:57,940 --> 00:09:58,480
OK?

155
00:09:58,480 --> 00:09:59,813
You don't need to memorize that.

156
00:09:59,813 --> 00:10:03,130
You just look at this picture,
you look at that picture.

157
00:10:03,130 --> 00:10:07,490
This one over the
interval 3 has 3 periods.

158
00:10:07,490 --> 00:10:08,620
Right?

159
00:10:08,620 --> 00:10:12,640
There are 3 periods
of that waveform

160
00:10:12,640 --> 00:10:15,340
over the period of
capital T. There's

161
00:10:15,340 --> 00:10:18,920
an integer number 3, same here.

162
00:10:18,920 --> 00:10:21,190
Here-- how many periods?

163
00:10:21,190 --> 00:10:22,630
Twice that.

164
00:10:22,630 --> 00:10:24,960
But it's exactly six.

165
00:10:24,960 --> 00:10:28,840
So you get a pure sinusoid.

166
00:10:28,840 --> 00:10:30,490
You get an integer
number of periods.

167
00:10:30,490 --> 00:10:33,670
You integrate over an integer
number of periods, you get 0.

168
00:10:33,670 --> 00:10:35,440
They're orthogonal.

169
00:10:35,440 --> 00:10:37,414
Had I chosen the
period differently,

170
00:10:37,414 --> 00:10:38,830
they may not have
been orthogonal.

171
00:10:38,830 --> 00:10:40,611
It depends on the period.

172
00:10:40,611 --> 00:10:41,110
OK?

173
00:10:41,110 --> 00:10:42,790
So the inner product
depends on the period,

174
00:10:42,790 --> 00:10:44,415
because the inner
product has something

175
00:10:44,415 --> 00:10:45,520
to do with an integral or a sum.

176
00:10:45,520 --> 00:10:50,550
And so the range over which
you sum or integrate matters.

177
00:10:50,550 --> 00:10:54,570
How about cos 2 pi t and cos 4 pi t?

178
00:10:54,570 --> 00:10:55,320
Orthogonal?

179
00:10:58,680 --> 00:10:59,640
Yeah?

180
00:10:59,640 --> 00:11:01,080
AUDIENCE: Yes.

181
00:11:01,080 --> 00:11:03,558
PROFESSOR: And the reason is?

182
00:11:03,558 --> 00:11:07,446
AUDIENCE: So think if you
wrapped [INAUDIBLE] together,

183
00:11:07,446 --> 00:11:10,119
then there's a lot
of symmetry that

184
00:11:10,119 --> 00:11:14,260
goes on [INAUDIBLE]
is going to be 0.

185
00:11:14,260 --> 00:11:18,260
PROFESSOR: So now we've got
two different frequencies.

186
00:11:18,260 --> 00:11:20,651
But we still get these
funny cosine relationships

187
00:11:20,651 --> 00:11:22,400
that have to do with
sums and differences.

188
00:11:22,400 --> 00:11:23,810
And the sums and
differences both

189
00:11:23,810 --> 00:11:27,840
happen to be periodic over
3, over the interval capital

190
00:11:27,840 --> 00:11:29,690
T equals 3, right?

191
00:11:29,690 --> 00:11:33,390
So we still get the property
that the average value here,

192
00:11:33,390 --> 00:11:36,470
which is what the integral was
pulling out, the average is 0.

193
00:11:36,470 --> 00:11:38,500
So they're also orthogonal.

194
00:11:38,500 --> 00:11:41,390
How about cos 2 pi t and sine pi t?

195
00:11:43,980 --> 00:11:46,735
OK, I've asked two questions,
and they were both yes.

196
00:11:46,735 --> 00:11:48,340
So I'm getting
bored at this point,

197
00:11:48,340 --> 00:11:52,336
so by the theory of questions
in lecture, the answer is?

198
00:11:52,336 --> 00:11:54,779
[LAUGHTER]

199
00:11:54,779 --> 00:11:55,570
Now, wait a minute.

200
00:11:55,570 --> 00:11:57,100
I'm not that boring.

201
00:11:57,100 --> 00:11:59,730
Well, maybe.

202
00:11:59,730 --> 00:12:04,200
So is this periodic over
a capital T equals 3?

203
00:12:07,610 --> 00:12:08,676
Ah, excuse me.

204
00:12:08,676 --> 00:12:10,300
I didn't say the
right question, sorry.

205
00:12:10,300 --> 00:12:16,540
Does this function have an integer
number of periods in the time

206
00:12:16,540 --> 00:12:19,319
interval capital T equals 3?

207
00:12:19,319 --> 00:12:21,360
What's the period-- what's
the fundamental period

208
00:12:21,360 --> 00:12:22,800
of this waveform?

209
00:12:22,800 --> 00:12:23,580
AUDIENCE: 1.

210
00:12:23,580 --> 00:12:24,710
PROFESSOR: 1.

211
00:12:24,710 --> 00:12:29,037
So it has 3 periods over
the interval capital T equals 3.

212
00:12:29,037 --> 00:12:29,870
What about this one?

213
00:12:33,502 --> 00:12:34,410
AUDIENCE: 2.

214
00:12:34,410 --> 00:12:35,610
PROFESSOR: Period is 2.

215
00:12:35,610 --> 00:12:39,290
How many periods are there in
the time interval capital T

216
00:12:39,290 --> 00:12:40,297
equals 3?

217
00:12:40,297 --> 00:12:43,159
AUDIENCE: [INAUDIBLE]

218
00:12:43,159 --> 00:12:44,430
PROFESSOR: A period is 2.

219
00:12:44,430 --> 00:12:46,290
How many periods are there--

220
00:12:46,290 --> 00:12:48,290
1 and 1/2, not an integer.

221
00:12:48,290 --> 00:12:49,760
Bad news, right?

222
00:12:49,760 --> 00:12:54,080
So integer number three,
not integer number.

223
00:12:54,080 --> 00:12:57,170
If you were to integrate
this over the period T

224
00:12:57,170 --> 00:13:00,440
equals 3, if I
didn't multiply them

225
00:13:00,440 --> 00:13:03,170
if I just did that, if I just
thought about that integral,

226
00:13:03,170 --> 00:13:05,600
I wouldn't get 0, right?

227
00:13:05,600 --> 00:13:08,600
There's more positives
than there are negatives.

228
00:13:08,600 --> 00:13:11,720
And when I multiply them, the
same sort of thing happens.

229
00:13:11,720 --> 00:13:14,340
I get two big peaks down
and only one big peak up.

230
00:13:14,340 --> 00:13:18,980
It's because the resulting
waveform no longer

231
00:13:18,980 --> 00:13:22,570
has an integer number of periods
in the interval capital T

232
00:13:22,570 --> 00:13:23,270
equals 3.

233
00:13:23,270 --> 00:13:23,770
OK?

234
00:13:27,510 --> 00:13:30,900
Last one-- cos 2 pi t, e to the--

235
00:13:30,900 --> 00:13:33,735
whoops.

236
00:13:33,735 --> 00:13:37,410
Is that what I actually said?

237
00:13:37,410 --> 00:13:39,600
Good, I forgot the j.

238
00:13:39,600 --> 00:13:42,240
Because without the j,
they would obviously not

239
00:13:42,240 --> 00:13:43,290
be orthogonal.

240
00:13:43,290 --> 00:13:44,550
Obviously, right?

241
00:13:44,550 --> 00:13:45,450
OK.

242
00:13:45,450 --> 00:13:48,400
I didn't mean to ask
something quite that obvious.

243
00:13:48,400 --> 00:13:54,720
So what about cos 2 pi
t and e to the j 2 pi t?

244
00:13:57,350 --> 00:14:00,870
Orthogonal.

245
00:14:00,870 --> 00:14:09,731
Not? I'm as clueless as I was
on part A? No, no, no, no,

246
00:14:09,731 --> 00:14:10,230
you're not.

247
00:14:10,230 --> 00:14:10,855
No, you're not.

248
00:14:10,855 --> 00:14:12,230
No, you're not.

249
00:14:12,230 --> 00:14:15,790
So how do you think about that?

250
00:14:15,790 --> 00:14:17,370
You can use Euler's expression.

251
00:14:17,370 --> 00:14:18,990
And if there had
been a j there, this

252
00:14:18,990 --> 00:14:21,350
would have been a
correct expression.

253
00:14:21,350 --> 00:14:22,050
OK?

254
00:14:22,050 --> 00:14:23,580
It's not quite a
correct expression

255
00:14:23,580 --> 00:14:25,477
because I forgot
to put the j there.

256
00:14:25,477 --> 00:14:27,060
But had there been
a j there, it would

257
00:14:27,060 --> 00:14:30,600
have been cos 2 pi t
plus j sine 2 pi t.

258
00:14:30,600 --> 00:14:33,330
And the awkward thing is
that the cos and the cos

259
00:14:33,330 --> 00:14:35,820
are obviously not
orthogonal with each other.

260
00:14:35,820 --> 00:14:40,650
A signal is not orthogonal
with itself, OK?

261
00:14:40,650 --> 00:14:45,150
So because part of this
signal is that signal,

262
00:14:45,150 --> 00:14:49,650
those two signals
are not orthogonal.
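
The four answers can be checked numerically. The following is a minimal sketch, not the lecture's own material: it assumes the inner product is (1/T) times the integral of conj(a(t)) * b(t) over [0, T], approximates that integral with a midpoint Riemann sum, and uses hypothetical names (`inner_product`, `cos2`, `results`, and so on) chosen only for this illustration.

```python
import cmath
import math

def inner_product(a, b, T, n=30000):
    """Approximate (1/T) * integral over [0, T] of conj(a(t)) * b(t) dt
    with a midpoint Riemann sum of n equal steps."""
    dt = T / n
    total = 0j
    for i in range(n):
        t = (i + 0.5) * dt
        total += complex(a(t)).conjugate() * b(t)
    return total * dt / T

def cos2(t): return math.cos(2 * math.pi * t)
def sin2(t): return math.sin(2 * math.pi * t)
def cos4(t): return math.cos(4 * math.pi * t)
def sin1(t): return math.sin(math.pi * t)
def exp2(t): return cmath.exp(2j * math.pi * t)

T = 3.0
results = {
    "cos2pi_sin2pi": inner_product(cos2, sin2, T),  # orthogonal, near 0
    "cos2pi_cos4pi": inner_product(cos2, cos4, T),  # orthogonal, near 0
    "cos2pi_sinpi":  inner_product(cos2, sin1, T),  # -2/(9*pi), about -0.0707
    "cos2pi_exp2pi": inner_product(cos2, exp2, T),  # 0.5, the shared cosine part
}

# Same first pair over a window that is not a whole number of periods
# of the product: the result is 0.2/pi, about 0.0637, so not orthogonal.
other_window = inner_product(cos2, sin2, 1.25)

for name, value in results.items():
    print(f"{name}: {value.real:+.4f} {value.imag:+.4f}j")
print(f"cos2pi_sin2pi over T=1.25: {other_window.real:+.4f}")
```

Only the first two pairs come out at zero for T equals 3. The third is negative, matching the two big peaks down and one big peak up, and the last is 0.5 because the complex exponential contains the cosine itself.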

263
00:14:49,650 --> 00:14:50,580
OK?

264
00:14:50,580 --> 00:14:52,940
Yes?

265
00:14:52,940 --> 00:14:54,620
OK, so that's kind
of-- so that's

266
00:14:54,620 --> 00:14:56,180
the idea of orthogonality.

267
00:14:56,180 --> 00:14:59,810
It's a very good way to
think about decompositions.

268
00:15:02,182 --> 00:15:04,640
And even though we only spent
about half an hour last time,

269
00:15:04,640 --> 00:15:06,380
and only about 15
minutes this time,

270
00:15:06,380 --> 00:15:09,660
that is the whole theory
of Fourier series.

271
00:15:09,660 --> 00:15:11,736
That doesn't mean we
can't ask hard questions.

272
00:15:11,736 --> 00:15:13,110
There were a couple
of questions.

273
00:15:13,110 --> 00:15:14,200
Yes, you were first.

274
00:15:14,200 --> 00:15:16,540
AUDIENCE: Is there a way to
think about orthogonality

275
00:15:16,540 --> 00:15:20,260
using the Fourier [INAUDIBLE].

276
00:15:20,260 --> 00:15:22,610
PROFESSOR: Well, the
Fourier coefficients

277
00:15:22,610 --> 00:15:25,947
are the result of orthogonality.

278
00:15:25,947 --> 00:15:27,530
I don't think you
can tell-- if I just

279
00:15:27,530 --> 00:15:29,060
told you a bunch of
Fourier coefficients,

280
00:15:29,060 --> 00:15:30,920
I don't know if you
can tell me something

281
00:15:30,920 --> 00:15:34,070
about the orthogonality of
the underlying signals or not.

282
00:15:34,070 --> 00:15:36,975
AUDIENCE: What if [INAUDIBLE].

283
00:15:36,975 --> 00:15:37,850
PROFESSOR: Excuse me?

284
00:15:37,850 --> 00:15:39,433
AUDIENCE: [INAUDIBLE]
[? the period ?]

285
00:15:39,433 --> 00:15:41,570
and the Fourier [INAUDIBLE].

286
00:15:44,370 --> 00:15:46,529
PROFESSOR: Let's see,
so I'm not completely

287
00:15:46,529 --> 00:15:47,820
sure I know what you're asking.

288
00:15:47,820 --> 00:15:51,410
Certainly if you tell me that
the Fourier's coefficients are

289
00:15:51,410 --> 00:15:55,030
blah, blah, blah,
3, 2, 7, and 16.

290
00:15:55,030 --> 00:15:58,050
And if you tell me that you're
working with a simple Fourier

291
00:15:58,050 --> 00:16:03,594
series periodic in 3, then
you've told me everything.

292
00:16:03,594 --> 00:16:05,260
And so there's a way
for me to backtrack

293
00:16:05,260 --> 00:16:06,760
that it was orthogonal.

294
00:16:06,760 --> 00:16:09,610
I am not sure if I'm connecting
with you, so if I'm not,

295
00:16:09,610 --> 00:16:11,320
ask me after lecture
to make sure that--

296
00:16:11,320 --> 00:16:12,234
AUDIENCE: [INAUDIBLE]

297
00:16:12,234 --> 00:16:12,900
PROFESSOR: Sure.

298
00:16:12,900 --> 00:16:14,399
AUDIENCE: I think
he's saying if you

299
00:16:14,399 --> 00:16:17,337
have two signals [INAUDIBLE]
coefficients [INAUDIBLE] two

300
00:16:17,337 --> 00:16:21,281
signals, can I tell if those
two signals are orthogonal

301
00:16:21,281 --> 00:16:27,220
[INAUDIBLE] the coefficients
are orthogonal [INAUDIBLE].

302
00:16:27,220 --> 00:16:31,150
PROFESSOR: If they have
components in common,

303
00:16:31,150 --> 00:16:34,060
they couldn't possibly
be orthogonal.

304
00:16:34,060 --> 00:16:38,750
So I would answer
yes to that question.

305
00:16:38,750 --> 00:16:41,560
So if that's what you were-- so
I think that's probably right.

306
00:16:41,560 --> 00:16:43,140
Does that sound right?

307
00:16:43,140 --> 00:16:44,430
Yeah, OK.

308
00:16:44,430 --> 00:16:45,630
Yes?

309
00:16:45,630 --> 00:16:48,130
AUDIENCE: I'm awfully confused
by the complex conjugate,

310
00:16:48,130 --> 00:16:49,130
the [INAUDIBLE].

311
00:16:49,130 --> 00:16:50,762
PROFESSOR: Yes, yes, yes.

312
00:16:50,762 --> 00:16:52,928
AUDIENCE: So does that mean
we're taking the complex

313
00:16:52,928 --> 00:16:55,835
conjugate [? of a ?] and
we're [? applying it ?] to b?

314
00:16:55,835 --> 00:16:57,710
PROFESSOR: We're taking
the complex conjugate

315
00:16:57,710 --> 00:16:59,570
of the entire function.

316
00:16:59,570 --> 00:17:03,200
At every point in time, we take
the complex conjugate of it.

317
00:17:03,200 --> 00:17:06,230
And it's especially
useful to think about

318
00:17:06,230 --> 00:17:08,099
if you're doing
something of the form--

319
00:17:08,099 --> 00:17:17,480
if a of t were e to the
j 2 pi mt and if b of t

320
00:17:17,480 --> 00:17:22,940
were e to the j 2 pi lt.

321
00:17:22,940 --> 00:17:25,190
The only thing
we're trying to do--

322
00:17:25,190 --> 00:17:27,030
but this comes up
quite frequently--

323
00:17:27,030 --> 00:17:28,490
the only thing
we're trying to do

324
00:17:28,490 --> 00:17:32,420
is when you conjugate
one of these,

325
00:17:32,420 --> 00:17:35,850
you rig it so that when
you add the exponents,

326
00:17:35,850 --> 00:17:39,920
the result goes to 0 by
putting the minus up there.

327
00:17:39,920 --> 00:17:40,653
That's all.

328
00:17:40,653 --> 00:17:42,028
AUDIENCE: It
doesn't seem like we

329
00:17:42,028 --> 00:17:44,944
had to do any of that for the
example we just worked on.

330
00:17:44,944 --> 00:17:47,860
It seems like there were just
like [? signals ?] [INAUDIBLE].

331
00:17:47,860 --> 00:17:49,290
PROFESSOR: Oh, interesting.

332
00:17:52,299 --> 00:17:53,340
That's a very good point.

333
00:17:53,340 --> 00:17:54,590
That's interesting.

334
00:17:54,590 --> 00:17:57,530
So I didn't intend to
throw you a ringer.

335
00:17:57,530 --> 00:18:00,590
These were signals, all
of these except that one,

336
00:18:00,590 --> 00:18:04,160
are real functions of time.

337
00:18:04,160 --> 00:18:06,980
That's why the complex
conjugate didn't come up.

338
00:18:06,980 --> 00:18:07,790
So I apologize.

339
00:18:07,790 --> 00:18:10,970
I wasn't trying to
make it seem tricky.

340
00:18:10,970 --> 00:18:14,480
OK, so it's because
this function of time

341
00:18:14,480 --> 00:18:19,280
is everywhere real that we
didn't need to rehearse this.

342
00:18:19,280 --> 00:18:20,920
We did have to do
it in that one.

343
00:18:23,300 --> 00:18:23,800
OK?

344
00:18:26,800 --> 00:18:28,940
OK, so the point is that
we've already covered,

345
00:18:28,940 --> 00:18:31,450
even though we've only done a
little bit of work in lecture,

346
00:18:31,450 --> 00:18:34,150
we've already covered
all of the theory.

347
00:18:34,150 --> 00:18:36,310
What remains though is
to do some practice.

348
00:18:36,310 --> 00:18:39,760
And also what remains is to
understand how this is useful.

349
00:18:39,760 --> 00:18:43,180
So it's not just music.

350
00:18:43,180 --> 00:18:47,800
The example that I want to
talk about today is speech.

351
00:18:47,800 --> 00:18:50,350
The same sort of thing that we
could do with music last time,

352
00:18:50,350 --> 00:18:51,610
we can do with speech.

353
00:18:51,610 --> 00:18:53,192
And here are some utterances.

354
00:18:53,192 --> 00:18:53,858
[AUDIO PLAYBACK]

355
00:18:53,858 --> 00:19:00,797
- Bat, bait, bet, beet, bit,
bite, bought, boat, but, boot.

356
00:19:00,797 --> 00:19:01,380
[END PLAYBACK]

357
00:19:01,380 --> 00:19:02,755
PROFESSOR: All
right, it was just

358
00:19:02,755 --> 00:19:05,650
intended to be a bunch of
sounds that we can analyze

359
00:19:05,650 --> 00:19:08,890
with Fourier analysis to
get some insight into how

360
00:19:08,890 --> 00:19:14,050
to think about, in particular,
speech recognition and speech

361
00:19:14,050 --> 00:19:15,970
synthesis.

362
00:19:15,970 --> 00:19:18,580
So we can take those
utterances, and all

363
00:19:18,580 --> 00:19:21,340
I did was write a little Python
program to do the decomposition

364
00:19:21,340 --> 00:19:24,490
that I showed on
the previous slides,

365
00:19:24,490 --> 00:19:27,850
so that I could break
these time waveforms.

366
00:19:27,850 --> 00:19:31,670
Here I'm illustrating one, two,
three, four, five, six periods.

367
00:19:31,670 --> 00:19:37,090
So I took one
period of that sound

368
00:19:37,090 --> 00:19:41,680
and ran it through that
kind of an integral

369
00:19:41,680 --> 00:19:43,910
to break it into Fourier
components, which

370
00:19:43,910 --> 00:19:45,400
are shown here.
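
The professor's Python program is not shown, so the following is only a guess at its general shape, under stated assumptions: one period of the waveform is available as N equally spaced samples, and the analysis integral is approximated by a Riemann sum. The function name `fourier_coefficients` and the synthetic test waveform are hypothetical stand-ins, not recorded speech.

```python
import cmath
import math

def fourier_coefficients(samples, k_max):
    """Approximate a_k = (1/T) * integral over one period of
    x(t) * exp(-j*2*pi*k*t/T) dt from N equally spaced samples of a
    single period. With t = i*T/N the period T cancels out of the sum."""
    n = len(samples)
    coeffs = {}
    for k in range(-k_max, k_max + 1):
        total = 0j
        for i, x in enumerate(samples):
            total += x * cmath.exp(-2j * math.pi * k * i / n)
        coeffs[k] = total / n
    return coeffs

# Synthetic stand-in for one period of a sound: a fundamental of
# amplitude 1.0 plus a third harmonic of amplitude 0.5.
N = 64
period = [
    1.0 * math.cos(2 * math.pi * i / N)
    + 0.5 * math.sin(2 * math.pi * 3 * i / N)
    for i in range(N)
]

coeffs = fourier_coefficients(period, k_max=4)
magnitudes = {k: abs(c) for k, c in coeffs.items()}
for k in sorted(magnitudes):
    print(f"k = {k:+d}: |a_k| = {magnitudes[k]:.3f}")
```

For this test waveform the magnitudes are 0.5 at k = +1 and k = -1, 0.25 at k = +3 and k = -3, and zero elsewhere. A plot of such magnitudes against k is the kind of frequency signature discussed here.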

371
00:19:45,400 --> 00:19:47,500
And what I want
you to see is just

372
00:19:47,500 --> 00:19:50,770
like you could have
recognized a pattern here,

373
00:19:50,770 --> 00:19:55,060
and you might try to
recognize which vowel is which

374
00:19:55,060 --> 00:19:58,780
by the signature in time.

375
00:19:58,780 --> 00:20:01,570
An alternative, and
far more useful way

376
00:20:01,570 --> 00:20:04,150
of thinking about it,
is to try to recognize

377
00:20:04,150 --> 00:20:07,480
the pattern in frequency.

378
00:20:07,480 --> 00:20:10,030
So there are characteristic
differences in the sounds,

379
00:20:10,030 --> 00:20:12,701
and we'll look at the
basis for why there are.

380
00:20:12,701 --> 00:20:14,200
There are characteristic
differences

381
00:20:14,200 --> 00:20:18,520
in the sound that can help
us to identify automatically,

382
00:20:18,520 --> 00:20:22,610
by a machine, what
was being said.

383
00:20:22,610 --> 00:20:24,730
And so what we
want to do is learn

384
00:20:24,730 --> 00:20:29,410
to think about a pattern
that characterizes ah,

385
00:20:29,410 --> 00:20:34,151
ee, oo in the frequency domain,
as opposed to the time domain.

386
00:20:34,151 --> 00:20:34,817
[AUDIO PLAYBACK]

387
00:20:34,817 --> 00:20:36,397
- Bat, beet, boot.

388
00:20:36,397 --> 00:20:36,980
[END PLAYBACK]

389
00:20:36,980 --> 00:20:39,480
PROFESSOR: So there's something
different about those sounds

390
00:20:39,480 --> 00:20:44,870
that manifests as a difference
in this Fourier signature.

391
00:20:44,870 --> 00:20:48,350
So that's one of the useful
applications of this.

392
00:20:48,350 --> 00:20:50,060
And we'd like to
understand that better.

393
00:20:50,060 --> 00:20:54,440
There's a really good physical
reason why that happens.

394
00:20:54,440 --> 00:20:57,860
And it has to do with the
way we produce speech.

395
00:20:57,860 --> 00:21:03,080
So you can think about speech as
being generated by some source.

396
00:21:03,080 --> 00:21:05,180
Ultimately, the
source of my speech

397
00:21:05,180 --> 00:21:10,070
is somewhere down here,
which always amuses me

398
00:21:10,070 --> 00:21:14,270
when I see the cut off heads
talking, like at Halloween

399
00:21:14,270 --> 00:21:16,310
and like on some cartoon shows.

400
00:21:16,310 --> 00:21:17,980
Because you can't
do that, right?

401
00:21:17,980 --> 00:21:20,510
Because the source has to
do with down here someplace,

402
00:21:20,510 --> 00:21:21,080
right?

403
00:21:21,080 --> 00:21:23,060
My lungs push in.

404
00:21:23,060 --> 00:21:24,800
That pushes air
through something

405
00:21:24,800 --> 00:21:27,860
and starts making noise somehow.

406
00:21:27,860 --> 00:21:33,230
I'm going to focus today on
things that we call voiced--

407
00:21:33,230 --> 00:21:35,560
in a voiced sound.

408
00:21:35,560 --> 00:21:39,800
Ah, in a voiced sound,
it's caused by vibrations

409
00:21:39,800 --> 00:21:40,715
of the vocal cords.

410
00:21:40,715 --> 00:21:44,910
So if you were to stick a
camera down someone's throat,

411
00:21:44,910 --> 00:21:47,060
this is the sort of
thing that you would see.

412
00:21:47,060 --> 00:21:52,100
It's an enormously
complex structure

413
00:21:52,100 --> 00:21:57,020
whose mechanics are extremely
difficult to understand.

414
00:21:57,020 --> 00:22:03,050
Because what happens is when
you want to make a high sound,

415
00:22:03,050 --> 00:22:05,210
you tense the structure.

416
00:22:05,210 --> 00:22:07,970
You pull on some muscles
that pull the cords.

417
00:22:07,970 --> 00:22:13,170
The cords are normally
rattling pretty fast.

418
00:22:13,170 --> 00:22:16,170
And what you do is,
you pull on a muscle

419
00:22:16,170 --> 00:22:20,507
that tenses them to
make it go higher.

420
00:22:20,507 --> 00:22:22,590
But your intuition should
say, now wait a minute--

421
00:22:22,590 --> 00:22:25,290
you're making it longer
to make it higher?

422
00:22:25,290 --> 00:22:28,140
And your intuition
would be right.

423
00:22:28,140 --> 00:22:33,670
Normally, long organ pipes are
higher or lower in frequency?

424
00:22:33,670 --> 00:22:34,640
AUDIENCE: Lower.

425
00:22:34,640 --> 00:22:37,300
PROFESSOR: Lower.

426
00:22:37,300 --> 00:22:40,720
So you have to do a lot
of mental calculations

427
00:22:40,720 --> 00:22:44,090
in order to move these
muscles correctly.

428
00:22:44,090 --> 00:22:46,570
So that the resulting
frequency of the vibration

429
00:22:46,570 --> 00:22:47,590
comes out right.

430
00:22:47,590 --> 00:22:49,780
It's not obvious,
because two things

431
00:22:49,780 --> 00:22:51,510
happen as you tense the muscle.

432
00:22:51,510 --> 00:22:55,750
The folds-- the vocal cords get
longer, which you would think

433
00:22:55,750 --> 00:22:58,930
would make the frequency
lower, but they

434
00:22:58,930 --> 00:23:03,480
get tighter, which, of course,
goes the other direction,

435
00:23:03,480 --> 00:23:03,980
right?

436
00:23:03,980 --> 00:23:05,310
So it's a very
complicated thing.

437
00:23:05,310 --> 00:23:06,851
And in fact, it's
something that goes

438
00:23:06,851 --> 00:23:08,270
bad with professional speakers.

439
00:23:08,270 --> 00:23:11,720
But even more,
professional singers

440
00:23:11,720 --> 00:23:15,230
often have a lot of trouble
with the enormous stress

441
00:23:15,230 --> 00:23:19,040
that happens on this structure
with repeated use and repeated

442
00:23:19,040 --> 00:23:20,860
overuse.

443
00:23:20,860 --> 00:23:23,870
Anyway, this takes
a real beating.

444
00:23:23,870 --> 00:23:27,764
But that's ultimately
the source of speech.

445
00:23:27,764 --> 00:23:29,930
But if that were all you
had, it wouldn't sound much

446
00:23:29,930 --> 00:23:31,790
like speech.

447
00:23:31,790 --> 00:23:35,960
A lot of the interesting stuff
comes from these cavities

448
00:23:35,960 --> 00:23:37,760
that you intentionally
manipulate

449
00:23:37,760 --> 00:23:41,000
as you're speaking to make
the different characteristic

450
00:23:41,000 --> 00:23:43,880
sounds.

451
00:23:43,880 --> 00:23:51,020
So the idea then is that you
have a source that contains

452
00:23:51,020 --> 00:23:53,030
information like frequency.

453
00:23:53,030 --> 00:23:57,320
What's the pitch
of the utterance?

454
00:23:57,320 --> 00:24:02,240
But you have this other thing,
which is acting like a filter.

455
00:24:02,240 --> 00:24:04,940
If you think about the
whole thing as a system,

456
00:24:04,940 --> 00:24:09,620
we have a block which
represents a filter, which

457
00:24:09,620 --> 00:24:12,039
is the thing that has
a frequency response.

458
00:24:12,039 --> 00:24:13,580
The frequency response
depends on how

459
00:24:13,580 --> 00:24:16,460
I've put my tongue in my mouth
and how I've opened my lips

460
00:24:16,460 --> 00:24:17,356
and stuff like that.

461
00:24:17,356 --> 00:24:18,480
We'll see that in a minute.

462
00:24:18,480 --> 00:24:21,410
But it also depends on
how the vocal folds--

463
00:24:21,410 --> 00:24:25,710
it has an input, which
is the vocal folds.

464
00:24:25,710 --> 00:24:29,330
So the idea then is the same
kind of a source filter idea

465
00:24:29,330 --> 00:24:34,320
that we motivated last time by
way of the RC filter example.

466
00:24:34,320 --> 00:24:37,070
If you put a resistor
and a capacitor

467
00:24:37,070 --> 00:24:40,130
together with a source,
a convenient way

468
00:24:40,130 --> 00:24:42,830
to think about that is
as a low pass filter.

469
00:24:42,830 --> 00:24:45,170
We think about it having
a frequency response.

470
00:24:45,170 --> 00:24:50,330
So the system, just the RC
part, has a frequency response

471
00:24:50,330 --> 00:24:53,480
which we can characterize
by a Bode diagram.

472
00:24:53,480 --> 00:24:54,860
So we can think about--

473
00:24:54,860 --> 00:24:56,510
we did this last time--

474
00:24:56,510 --> 00:24:59,270
so we can think about the
low frequencies go through

475
00:24:59,270 --> 00:25:00,410
without attenuation.

476
00:25:00,410 --> 00:25:04,190
The gain is 1, and
the phase is 0.

477
00:25:04,190 --> 00:25:07,820
So basically low frequencies
go through the filter

478
00:25:07,820 --> 00:25:10,310
without any change.

479
00:25:10,310 --> 00:25:11,684
High frequencies are attenuated.

480
00:25:11,684 --> 00:25:13,100
The higher the
frequency, the more

481
00:25:13,100 --> 00:25:20,030
the attenuation, and phase
shifted by lagging pi over 2.

482
00:25:20,030 --> 00:25:23,090
So that's a way of thinking
about the RC circuit

483
00:25:23,090 --> 00:25:24,980
as a low pass filter.
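
A minimal sketch of the low pass behavior just described, assuming the usual first-order form H(jw) = 1 / (1 + jwRC). The component values (1 kilohm, 1 microfarad) and the name `rc_response` are hypothetical, chosen only for illustration. It prints the gain and phase well below, at, and well above the corner frequency.

```python
import cmath
import math

def rc_response(omega, r=1.0e3, c=1.0e-6):
    """Frequency response H(jw) = 1 / (1 + j*w*R*C) of a first-order RC low-pass."""
    return 1.0 / (1.0 + 1j * omega * r * c)

# Corner frequency 1/(RC) for the assumed component values, in rad/s.
corner = 1.0 / (1.0e3 * 1.0e-6)

for omega in (0.01 * corner, corner, 100.0 * corner):
    h = rc_response(omega)
    print(f"w = {omega:9.1f} rad/s  gain = {abs(h):.4f}  phase = {cmath.phase(h):+.4f} rad")
```

Well below the corner the gain stays near 1 and the phase near 0. Well above it the gain falls and the phase approaches a lag of pi over 2.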

484
00:25:24,980 --> 00:25:26,510
And it gives us
insight in the kinds

485
00:25:26,510 --> 00:25:28,790
of signals that go through
and don't go through.

486
00:25:28,790 --> 00:25:33,710
So that, if we think about
a signal like a square wave

487
00:25:33,710 --> 00:25:37,380
having a Fourier
series decomposition,

488
00:25:37,380 --> 00:25:41,550
it only has odd components and
the odd components fall with k.

489
00:25:41,550 --> 00:25:44,690
The magnitude of the
component is inverse with k.

490
00:25:44,690 --> 00:25:49,640
So we get components that,
if I plot on a log scale,

491
00:25:49,640 --> 00:25:53,625
the reciprocal relationship of
the weight of the components,

492
00:25:53,625 --> 00:25:56,000
the magnitude of the components,
makes it a straight line

493
00:25:56,000 --> 00:25:58,756
with a slope of minus 1.
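
One way to check the 1 over k claim numerically, as a sketch under stated assumptions: the square wave is taken to be +1 on the first half period and -1 on the second, the period is normalized to 1, and the coefficient is a_k = (1/T) times the integral of x(t) e^{-j 2 pi k t / T} over one period. The function name `square_wave_coefficient` is hypothetical. With that amplitude the odd coefficients have magnitude 2 / (pi k) and the even ones vanish.

```python
import cmath
import math

def square_wave_coefficient(k, samples=4096):
    """Approximate the k-th Fourier series coefficient of a +/-1 square wave."""
    total = 0.0 + 0.0j
    for n in range(samples):
        t = (n + 0.5) / samples            # midpoint rule, period normalized to 1
        x = 1.0 if t < 0.5 else -1.0
        total += x * cmath.exp(-2j * math.pi * k * t)
    return total / samples

for k in range(1, 8):
    expected = 2.0 / (math.pi * k) if k % 2 == 1 else 0.0
    print(f"k = {k}  |a_k| = {abs(square_wave_coefficient(k)):.4f}  expected = {expected:.4f}")

# Slope of log|a_k| against log k between the first and seventh harmonics.
slope = (math.log(abs(square_wave_coefficient(7)))
         - math.log(abs(square_wave_coefficient(1)))) / math.log(7)
print(f"log-log slope = {slope:.3f}")
```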

494
00:25:58,756 --> 00:26:02,450
And now we can think about
putting this signal into the RC

495
00:26:02,450 --> 00:26:07,040
filter and thinking about what
the output should look like.

496
00:26:07,040 --> 00:26:09,080
If the frequency
of the square wave,

497
00:26:09,080 --> 00:26:12,440
if the fundamental
frequency, 2 pi

498
00:26:12,440 --> 00:26:17,250
over the period, if
2 pi over capital T,

499
00:26:17,250 --> 00:26:21,650
if 2 pi over capital T is some
frequency that's low compared

500
00:26:21,650 --> 00:26:25,370
to the corner frequency
of the low pass filter,

501
00:26:25,370 --> 00:26:28,440
basically the output
of the filter, which

502
00:26:28,440 --> 00:26:32,390
is shown in green, overlaps the
input, which is shown in red.

503
00:26:32,390 --> 00:26:35,780
You can't tell the difference
because all the components

504
00:26:35,780 --> 00:26:39,750
have the same magnitude
and phase as the input.

505
00:26:39,750 --> 00:26:42,420
But if you change the
frequency of the square wave

506
00:26:42,420 --> 00:26:45,630
so that the
fundamental is higher,

507
00:26:45,630 --> 00:26:48,870
some of the higher
frequencies are attenuated

508
00:26:48,870 --> 00:26:51,700
and phase shifted.

509
00:26:51,700 --> 00:26:54,280
The shape of the
waveform, shown in green,

510
00:26:54,280 --> 00:26:56,267
starts to deviate.

511
00:26:56,267 --> 00:26:57,850
If you go to still
higher frequencies,

512
00:26:57,850 --> 00:26:59,080
the deviation's even greater.

513
00:26:59,080 --> 00:27:01,690
And if you go to high
enough frequencies,

514
00:27:01,690 --> 00:27:04,810
they're all in the region
where the magnitude is

515
00:27:04,810 --> 00:27:09,070
being attenuated by
whatever frequency.

516
00:27:09,070 --> 00:27:12,730
So my dependence of 1 over
k becomes 1 over k squared,

517
00:27:12,730 --> 00:27:17,560
and it goes from being a
square wave to a triangle wave.
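
The change from 1 over k to 1 over k squared can be sketched by multiplying each square wave harmonic by the filter gain at that harmonic's frequency. This assumes a +/-1 square wave and a first-order low pass. The corner frequency of 1000 rad/s and the function names are hypothetical. A triangle wave has harmonics that fall as 1 over k squared, which is why a log-log slope near minus 2 matches the triangle shape.

```python
import math

def square_wave_magnitude(k):
    """|a_k| for a +/-1 square wave: 2/(pi k) for odd k, zero for even k."""
    return 2.0 / (math.pi * k) if k % 2 == 1 else 0.0

def rc_gain(omega, corner):
    """Gain of a first-order low-pass with the given corner frequency."""
    return 1.0 / math.sqrt(1.0 + (omega / corner) ** 2)

def output_slope(fundamental, corner, k_low=1, k_high=9):
    """Log-log slope of the output harmonic magnitudes between two odd harmonics."""
    low = square_wave_magnitude(k_low) * rc_gain(k_low * fundamental, corner)
    high = square_wave_magnitude(k_high) * rc_gain(k_high * fundamental, corner)
    return (math.log(high) - math.log(low)) / (math.log(k_high) - math.log(k_low))

corner = 1000.0  # rad/s, assumed for illustration
for fundamental in (1.0, 1000.0, 1.0e6):
    print(f"w0 = {fundamental:9.1f} rad/s  slope = {output_slope(fundamental, corner):.3f}")
```

With the fundamental far below the corner the slope stays near minus 1, so the output still looks like the square wave. With the fundamental far above the corner the slope approaches minus 2.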

518
00:27:17,560 --> 00:27:20,430
So that's a way of thinking
about the signal transformation

519
00:27:20,430 --> 00:27:21,450
in terms of a filter.

520
00:27:21,450 --> 00:27:22,590
We did that last time.

521
00:27:22,590 --> 00:27:25,880
What's going on with speech
is exactly the same thing.

522
00:27:25,880 --> 00:27:27,510
What we want to do
is think about--

523
00:27:27,510 --> 00:27:32,310
the glottis makes some kind of
a sound that goes into a filter.

524
00:27:32,310 --> 00:27:34,860
The filter is this
thing that is controlled

525
00:27:34,860 --> 00:27:36,830
by my tongue's
position and my jaw

526
00:27:36,830 --> 00:27:39,210
position and my lip position
and stuff like that.

527
00:27:39,210 --> 00:27:41,790
And what comes out is speech.

528
00:27:41,790 --> 00:27:45,000
To demonstrate
that, here's a film

529
00:27:45,000 --> 00:27:46,350
that was made by Ken Stevens.

530
00:27:46,350 --> 00:27:48,308
Ken Stevens was a professor
in this department.

531
00:27:48,308 --> 00:27:50,790
He just recently retired.

532
00:27:50,790 --> 00:27:53,370
This was done when he
was a graduate student.

533
00:27:53,370 --> 00:27:57,630
It's very hard to see because
the contrast is not great.

534
00:27:57,630 --> 00:27:59,310
But you have to take
into consideration

535
00:27:59,310 --> 00:28:01,050
this was made with x-rays.

536
00:28:01,050 --> 00:28:04,030
OK, we probably
wouldn't do this today.

537
00:28:04,030 --> 00:28:07,530
It was a relatively
large exposure

538
00:28:07,530 --> 00:28:10,620
to x-rays, which we sort
of frown on these days.

539
00:28:10,620 --> 00:28:14,340
Just so you're not too worried,
Ken Stevens, when he retired,

540
00:28:14,340 --> 00:28:17,140
had the longest teaching
career in our history.

541
00:28:17,140 --> 00:28:21,380
He was a lecturer who actively
lectured for 50 years.

542
00:28:21,380 --> 00:28:22,800
So he seemed to have done OK.

543
00:28:22,800 --> 00:28:23,910
He survived.

544
00:28:23,910 --> 00:28:26,280
So you don't need to worry
about what happened to him.

545
00:28:26,280 --> 00:28:29,340
But we probably
wouldn't repeat this.

546
00:28:29,340 --> 00:28:32,100
It's a little hard to see.

547
00:28:32,100 --> 00:28:35,400
The bone is easy, right,
because x-rays don't

548
00:28:35,400 --> 00:28:37,500
go through bones very well.

549
00:28:37,500 --> 00:28:40,560
What you can just
barely see is his lips.

550
00:28:40,560 --> 00:28:45,180
And it's important to
watch the lips, too.

551
00:28:45,180 --> 00:28:50,250
It's also important that
his chin is on a chin rest

552
00:28:50,250 --> 00:28:51,880
to simplify analysis.

553
00:28:51,880 --> 00:28:56,580
The idea of this was to get
quantitative measurements

554
00:28:56,580 --> 00:29:00,450
to fit the source filter idea.

555
00:29:00,450 --> 00:29:06,074
OK, so now I'm going to play
a film, a recording of him.

556
00:29:06,074 --> 00:29:10,058
[VIDEO PLAYBACK]

557
00:29:10,058 --> 00:29:11,552
- Test.

558
00:29:11,552 --> 00:29:13,046
Test.

559
00:29:13,046 --> 00:29:14,042
Test.

560
00:29:14,042 --> 00:29:17,030
[? The tongue. ?]
[? The tongue. ?]

561
00:29:17,030 --> 00:29:18,524
The [INAUDIBLE].

562
00:29:18,524 --> 00:29:22,508
The [INAUDIBLE].
[? The neck. ?] [INAUDIBLE].

563
00:29:22,508 --> 00:29:24,002
[INAUDIBLE]

564
00:29:24,002 --> 00:29:25,496
[INAUDIBLE]

565
00:29:25,496 --> 00:29:26,990
[INAUDIBLE]

566
00:29:26,990 --> 00:29:28,982
[INAUDIBLE]

567
00:29:28,982 --> 00:29:30,476
[INAUDIBLE]

568
00:29:30,476 --> 00:29:33,962
[? Fox. ?] [? Clock. ?]
The [INAUDIBLE].

569
00:29:33,962 --> 00:29:35,456
The [INAUDIBLE].

570
00:29:35,456 --> 00:29:36,950
[INAUDIBLE]

571
00:29:36,950 --> 00:29:41,432
[? Took. ?] [? Two. ?]
[INAUDIBLE].

572
00:29:41,432 --> 00:29:44,918
[? Tot. ?] [? Tech. ?]
[INAUDIBLE].

573
00:29:44,918 --> 00:29:46,910
[INAUDIBLE]

574
00:29:46,910 --> 00:29:48,404
[INAUDIBLE]

575
00:29:48,404 --> 00:29:49,898
[INAUDIBLE]

576
00:29:49,898 --> 00:29:51,392
[INAUDIBLE]

577
00:29:51,392 --> 00:29:53,384
[INAUDIBLE]

578
00:29:53,384 --> 00:29:58,025
Why did [INAUDIBLE] set the
[INAUDIBLE] on top of his desk?

579
00:29:58,025 --> 00:29:59,858
I have put [? blood under ?]
[? two clean ?]

580
00:29:59,858 --> 00:30:03,507
[? yellow shoes. ?]

581
00:30:03,507 --> 00:30:04,340
[END VIDEO PLAYBACK]

582
00:30:04,340 --> 00:30:05,834
[LAUGHTER]

583
00:30:05,834 --> 00:30:11,050
PROFESSOR: OK, so what
you were supposed to see

584
00:30:11,050 --> 00:30:14,680
is that the thing that
we associate with speech

585
00:30:14,680 --> 00:30:16,390
is only a small part of it.

586
00:30:16,390 --> 00:30:17,890
His lips were obviously moving.

587
00:30:17,890 --> 00:30:20,350
That's what we see.

588
00:30:20,350 --> 00:30:21,880
But if you were
paying attention,

589
00:30:21,880 --> 00:30:27,389
his tongue was going up and down
not a little bit, but a lot.

590
00:30:27,389 --> 00:30:29,680
So the gap between his tongue
and the roof of his mouth

591
00:30:29,680 --> 00:30:31,385
was going from 0
to about that far.

592
00:30:34,810 --> 00:30:40,360
The velum back here was opening
very broadly on occasion.

593
00:30:40,360 --> 00:30:42,580
So there was a
significant variation

594
00:30:42,580 --> 00:30:48,860
in the shape of the structure
through which the glottis waveform

595
00:30:48,860 --> 00:30:50,470
was passing.

596
00:30:50,470 --> 00:30:55,090
And that's the basis of the
filtering that gives rise

597
00:30:55,090 --> 00:30:57,740
to the different speech sounds.

598
00:30:57,740 --> 00:31:01,030
So to convince you
of that, here I

599
00:31:01,030 --> 00:31:04,936
have a carefully machined item.

600
00:31:04,936 --> 00:31:06,084
Let's see.

601
00:31:06,084 --> 00:31:07,000
I don't want this one.

602
00:31:07,000 --> 00:31:08,000
I want this one.

603
00:31:08,000 --> 00:31:10,930
So this is a Japanese oo.

604
00:31:10,930 --> 00:31:13,570
OK, now I don't know Japanese,
so I have to just sort of trust

605
00:31:13,570 --> 00:31:15,153
the guy who made
this that it actually

606
00:31:15,153 --> 00:31:17,110
sounds like a Japanese oo.

607
00:31:17,110 --> 00:31:20,380
The second one I'll show is
a Japanese ee, which actually

608
00:31:20,380 --> 00:31:22,220
sounds more like an ee to me.

609
00:31:22,220 --> 00:31:26,260
But anyway, this model was made
from measurements of the type

610
00:31:26,260 --> 00:31:28,300
that I just showed with Ken.

611
00:31:28,300 --> 00:31:31,420
So the idea was to estimate
the size of those cavities

612
00:31:31,420 --> 00:31:33,520
through which the
air was passing,

613
00:31:33,520 --> 00:31:37,020
and then make, by
machining in Plexiglas,

614
00:31:37,020 --> 00:31:40,520
a structure that has that shape.

615
00:31:40,520 --> 00:31:43,670
So this was an early test
of whether the source filter

616
00:31:43,670 --> 00:31:44,750
idea works.

617
00:31:44,750 --> 00:31:48,500
So if that is the explanation
for how speech is generated,

618
00:31:48,500 --> 00:31:51,021
then I ought to be able
to take a boring sound

619
00:31:51,021 --> 00:31:52,895
of the type that's
generated by the glottis--

620
00:31:52,895 --> 00:31:54,650
[BUZZING SOUND]

621
00:31:57,545 --> 00:32:00,500
And put it through this, and it
should sound more like a vowel.

622
00:32:00,500 --> 00:32:01,730
OK?

623
00:32:01,730 --> 00:32:02,290
Got it?

624
00:32:02,290 --> 00:32:04,200
Know what I'm talking about?

625
00:32:04,200 --> 00:32:08,952
So this is a Japanese oo.

626
00:32:08,952 --> 00:32:11,850
[BUZZING SOUND]

627
00:32:11,850 --> 00:32:14,605
[COMBINES BUZZING SOUND WITH
 'OO' SOUND]

628
00:32:14,605 --> 00:32:16,230
I don't know if
anybody knows Japanese.

629
00:32:16,230 --> 00:32:17,856
I don't know if that
sounds like an oo.

630
00:32:17,856 --> 00:32:19,563
Does anybody know
Japanese, and does that

631
00:32:19,563 --> 00:32:20,580
sound like an oo or not?

632
00:32:23,410 --> 00:32:24,696
OK, I'll pass.

633
00:32:24,696 --> 00:32:28,590
[BUZZING SOUND]

634
00:32:28,590 --> 00:32:30,115
This is an ee.

635
00:32:30,115 --> 00:32:33,390
OK, now notice that the
ee looks very different.

636
00:32:33,390 --> 00:32:34,770
Right?

637
00:32:34,770 --> 00:32:37,470
The question is whether
that's a big enough difference

638
00:32:37,470 --> 00:32:44,130
to make the difference
between an oo and an ee.

639
00:32:44,130 --> 00:32:47,270
I'm pressing the same button,
nothing up my sleeve, nothing--

640
00:32:47,270 --> 00:32:50,048
OK, so same button.

641
00:32:50,048 --> 00:32:53,520
[COMBINES BUZZING SOUND WITH
 'EE' SOUND]

642
00:33:00,640 --> 00:33:02,570
OK, so what you're
supposed to be convinced of

643
00:33:02,570 --> 00:33:07,780
is there is enough
information in the shape

644
00:33:07,780 --> 00:33:12,440
of the vocal
structures to account

645
00:33:12,440 --> 00:33:14,755
for the difference
in the sounds.

646
00:33:17,380 --> 00:33:23,650
Now of course, we don't really
care about the acoustics

647
00:33:23,650 --> 00:33:28,600
if we're trying to, for example,
synthesize or analyze speech.

648
00:33:28,600 --> 00:33:30,670
We don't particularly
care about that.

649
00:33:30,670 --> 00:33:32,650
We do like to know that there
is a theory that underlies it,

650
00:33:32,650 --> 00:33:33,149
right?

651
00:33:33,149 --> 00:33:36,070
And there's a very
sound physical basis

652
00:33:36,070 --> 00:33:39,610
for why we should think
about the source filter idea.

653
00:33:39,610 --> 00:33:43,690
When I say source
filter, source filter--

654
00:33:43,690 --> 00:33:47,350
so everybody calls it the
source filter model of speech.

655
00:33:47,350 --> 00:33:48,850
So is there any
good physical reason

656
00:33:48,850 --> 00:33:50,980
for why that should be true?

657
00:33:50,980 --> 00:33:56,740
Of course, what we care about
is the frequency response.

658
00:33:56,740 --> 00:33:58,920
So here what's shown
is measurements

659
00:33:58,920 --> 00:34:01,860
of frequency responses
taken from speakers.

660
00:34:01,860 --> 00:34:03,780
So now we don't do
the x-ray thing.

661
00:34:03,780 --> 00:34:09,179
All we do is record somebody
saying heed, had, hood,

662
00:34:09,179 --> 00:34:12,380
haw'd, who'd.

663
00:34:12,380 --> 00:34:16,070
And we look at men,
women, and children,

664
00:34:16,070 --> 00:34:22,100
and we characterize how their
frequency responses change

665
00:34:22,100 --> 00:34:24,210
when they make those
different sounds.

666
00:34:24,210 --> 00:34:30,889
So what's shown here is that
you get a relatively good fit

667
00:34:30,889 --> 00:34:36,409
by thinking about the frequency
response having three formants.

668
00:34:36,409 --> 00:34:39,770
The formants are the
peak frequencies.

669
00:34:39,770 --> 00:34:42,886
There's a theory,
which I won't go into,

670
00:34:42,886 --> 00:34:44,510
for how you take this
shape and turn it

671
00:34:44,510 --> 00:34:47,540
into a formant frequency.

672
00:34:47,540 --> 00:34:51,949
And given just the formant
frequencies, or given

673
00:34:51,949 --> 00:34:54,650
the frequency response
measured at uniform spacing

674
00:34:54,650 --> 00:34:57,260
across frequencies,
there is a theory

675
00:34:57,260 --> 00:35:02,470
for how you can generate the
smooth line, which is really--

676
00:35:02,470 --> 00:35:05,050
this is an 11th order
[? fit, ?] which

677
00:35:05,050 --> 00:35:09,550
means that there are
11 poles and no zeros.

678
00:35:09,550 --> 00:35:12,310
So what you do to
get this shape then

679
00:35:12,310 --> 00:35:14,970
is take the locations
and amplitudes

680
00:35:14,970 --> 00:35:20,020
of the formant frequencies
and do a fit using poles.
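
A rough sketch of what a fit using only poles can look like. This is not the 11th order fit from the slide: it uses one complex pole pair per formant, so three formants give six poles and no zeros. The formant values, the 100 Hz bandwidth, and the name `all_pole_gain` are all assumptions made for illustration.

```python
import math

def all_pole_gain(freq_hz, formants_hz, bandwidth_hz=100.0):
    """Magnitude of an all-pole response with one complex pole pair per formant."""
    s = 2j * math.pi * freq_hz
    gain = 1.0
    for formant in formants_hz:
        pole = complex(-math.pi * bandwidth_hz, 2.0 * math.pi * formant)
        # Each pair contributes |p|^2 / ((s - p)(s - p*)), which equals 1 at s = 0.
        gain *= abs(pole) ** 2 / abs((s - pole) * (s - pole.conjugate()))
    return gain

formants = (270.0, 2290.0, 3010.0)   # illustrative F1, F2, F3 in Hz
freqs = list(range(50, 4000, 10))
gains = [all_pole_gain(f, formants) for f in freqs]

# Local maxima of the response on the 10 Hz grid.
peaks = [freqs[i] for i in range(1, len(freqs) - 1)
         if gains[i - 1] < gains[i] > gains[i + 1]]
print(peaks)
```

The response has one peak close to each assumed formant frequency, which is the sense in which the formants are the peak frequencies.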

681
00:35:20,020 --> 00:35:24,270
And so here's a table showing
measured formant frequencies,

682
00:35:24,270 --> 00:35:29,440
F1, F2, and F3, for
whatever, six different

683
00:35:29,440 --> 00:35:34,720
sounds for three different
categories of speakers.

684
00:35:34,720 --> 00:35:35,980
OK?

685
00:35:35,980 --> 00:35:39,070
And that's kind of a complete
analysis then in terms

686
00:35:39,070 --> 00:35:41,560
of the source filter idea.

687
00:35:41,560 --> 00:35:45,610
So this figure
summarizes the idea.

688
00:35:45,610 --> 00:35:47,280
We think about source filters.

689
00:35:47,280 --> 00:35:49,450
So the source is the glottis.

690
00:35:49,450 --> 00:35:55,290
The filter is the formants
created by the throat.

691
00:35:55,290 --> 00:35:58,900
And speech is the thing that
comes out of the source filter.

692
00:35:58,900 --> 00:36:02,140
The source is some
periodic waveform

693
00:36:02,140 --> 00:36:07,330
caused by the banging
together of the vocal folds.

694
00:36:07,330 --> 00:36:15,430
The filter is the frequency
response of the throat.

695
00:36:15,430 --> 00:36:18,310
And the result, then, is just
passing this glottis waveform--

696
00:36:18,310 --> 00:36:21,160
so this is a measured--

697
00:36:21,160 --> 00:36:23,660
by sticking a microphone
in somebody's throat,

698
00:36:23,660 --> 00:36:26,740
this is a measurement of what
the glottis acoustics looks

699
00:36:26,740 --> 00:36:28,300
like.

700
00:36:28,300 --> 00:36:33,910
This is a Fourier decomposition
of that periodic waveform.

701
00:36:33,910 --> 00:36:37,360
Then this is the frequency
response of that thing.

702
00:36:37,360 --> 00:36:41,560
And this is the
Fourier coefficients

703
00:36:41,560 --> 00:36:44,400
of the output for
different sounds.

704
00:36:44,400 --> 00:36:45,510
So here is the frequency.

705
00:36:45,510 --> 00:36:50,230
So the same glottis signal
underlies an ee sound and an ah

706
00:36:50,230 --> 00:36:54,670
sound and generates
two different spectra.

707
00:36:54,670 --> 00:36:57,100
We call that combination
of magnitudes and angles--

708
00:36:57,100 --> 00:36:59,505
we call that the
Fourier spectrum.

709
00:36:59,505 --> 00:37:00,880
So you get two
different spectra,

710
00:37:00,880 --> 00:37:04,550
depending on the filter shape.

711
00:37:04,550 --> 00:37:07,140
OK?

712
00:37:07,140 --> 00:37:09,330
And that's the basis--

713
00:37:09,330 --> 00:37:11,490
this theory, this
source filter idea--

714
00:37:11,490 --> 00:37:13,320
is the basis of the
current technology

715
00:37:13,320 --> 00:37:16,690
for speech recognition
and speech production.

716
00:37:16,690 --> 00:37:18,300
So I actually cheated.

717
00:37:18,300 --> 00:37:19,860
Those sounds that
I played earlier--

718
00:37:19,860 --> 00:37:22,410
bit, bat, bought, beat,
all those things--

719
00:37:22,410 --> 00:37:25,090
those were actually
synthetic speech.

720
00:37:25,090 --> 00:37:27,510
OK, all I did is I ran
a speech synthesizer,

721
00:37:27,510 --> 00:37:30,180
and I said, synthesize bit.

722
00:37:30,180 --> 00:37:32,850
So that was really
a synthesized thing.

723
00:37:32,850 --> 00:37:34,920
That was not a real person.

724
00:37:34,920 --> 00:37:41,250
And so the synthesizer
used this theory

725
00:37:41,250 --> 00:37:43,710
in order to generate
this synthetic speech.

726
00:37:43,710 --> 00:37:47,650
We also use this theory in
order to recognize speech.

727
00:37:47,650 --> 00:37:49,650
And you'll do a homework
problem in Homework 10,

728
00:37:49,650 --> 00:37:53,700
I think it is, in which you'll
build the primitive front

729
00:37:53,700 --> 00:37:59,360
end of a speech recognizer
using this theory.

730
00:37:59,360 --> 00:38:01,940
And I'll give you a couple of
utterances of different vowels

731
00:38:01,940 --> 00:38:05,180
and you'll have to classify
which vowel is being said,

732
00:38:05,180 --> 00:38:08,000
according to some
automatic speech recognizer

733
00:38:08,000 --> 00:38:10,310
based on this theory.

734
00:38:10,310 --> 00:38:13,430
The theory is also just
fun, because a theory

735
00:38:13,430 --> 00:38:16,132
lets us figure out anomalies.

736
00:38:16,132 --> 00:38:17,840
So when somebody has
a speech impediment,

737
00:38:17,840 --> 00:38:20,131
for example, when I did, when
I was a little kid I did.

738
00:38:20,131 --> 00:38:21,560
And I was sent to speech school.

739
00:38:21,560 --> 00:38:23,310
Now they do a much
better job because they

740
00:38:23,310 --> 00:38:27,320
do analysis to figure
out what I'm doing wrong,

741
00:38:27,320 --> 00:38:29,120
using this sort of
source filter idea.

742
00:38:29,120 --> 00:38:31,250
We can also use the
source filter idea

743
00:38:31,250 --> 00:38:35,390
to understand paradoxes.

744
00:38:35,390 --> 00:38:38,990
So for example, I've told you
before I work on hearing aids.

745
00:38:38,990 --> 00:38:40,820
I tried to make
hearing aids hear.

746
00:38:40,820 --> 00:38:45,410
And so people with hearing
deficiencies like mine,

747
00:38:45,410 --> 00:38:47,480
where I have sort of
progressive age-related

748
00:38:47,480 --> 00:38:50,120
because I'm old, right,
that's what happens.

749
00:38:50,120 --> 00:38:52,940
I have age-related
hearing loss, which

750
00:38:52,940 --> 00:38:55,220
means that I'm losing
high frequencies.

751
00:38:55,220 --> 00:38:56,900
I'm less sensitive
to high frequencies.

752
00:38:56,900 --> 00:39:00,770
People like me, which is
the vast majority of people

753
00:39:00,770 --> 00:39:03,620
my age--

754
00:39:03,620 --> 00:39:07,580
it's easier to understand male
speech than female speech.

755
00:39:07,580 --> 00:39:08,080
Why?

756
00:39:10,990 --> 00:39:12,080
Higher frequencies.

757
00:39:12,080 --> 00:39:16,480
So those higher
frequencies shift

758
00:39:16,480 --> 00:39:18,066
some of the important
stuff that I

759
00:39:18,066 --> 00:39:19,690
should be listening
to into frequencies

760
00:39:19,690 --> 00:39:21,640
I don't hear anymore.

761
00:39:21,640 --> 00:39:22,630
Right?

762
00:39:22,630 --> 00:39:24,640
So that's a way of
using this theory

763
00:39:24,640 --> 00:39:27,640
to try to understand
what's wrong with me.

764
00:39:27,640 --> 00:39:30,310
But there's also things--
it's not just me.

765
00:39:30,310 --> 00:39:33,700
Normal people have trouble
distinguishing female speech,

766
00:39:33,700 --> 00:39:36,621
especially in taxing
environments, and one of those

767
00:39:36,621 --> 00:39:37,120
is singing.

768
00:39:40,170 --> 00:39:43,030
So if you consider
altos and sopranos,

769
00:39:43,030 --> 00:39:46,350
sopranos are like,
the worst, right?

770
00:39:46,350 --> 00:39:49,230
Because they are not only
female, but they're at the high

771
00:39:49,230 --> 00:39:51,490
end of the females.

772
00:39:51,490 --> 00:39:54,180
And there are those
who complain about not

773
00:39:54,180 --> 00:39:57,360
being able to understand
female singers.

774
00:39:57,360 --> 00:39:59,640
OK, so here's a
demo that will help

775
00:39:59,640 --> 00:40:04,120
us to understand whether that's
a valid kind of a criticism

776
00:40:04,120 --> 00:40:04,620
or not.

777
00:40:04,620 --> 00:40:10,110
So what I've got is
a professional singer

778
00:40:10,110 --> 00:40:13,080
singing, la, la, la, la--

779
00:40:13,080 --> 00:40:14,210
on a scale.

780
00:40:14,210 --> 00:40:16,430
So from low frequency
to high frequency,

781
00:40:16,430 --> 00:40:18,180
then a different sound,
a different sound,

782
00:40:18,180 --> 00:40:20,877
a different sound,
a different sound.

783
00:40:20,877 --> 00:40:22,460
So the first thing
that I want to do--

784
00:40:22,460 --> 00:40:24,376
I want you to listen to
those different sounds

785
00:40:24,376 --> 00:40:27,439
as she goes across the scale.

786
00:40:27,439 --> 00:40:29,480
Then I'm going to play
just the low ones and just

787
00:40:29,480 --> 00:40:31,990
the high ones, just the
low frequency ones and just

788
00:40:31,990 --> 00:40:32,990
the high frequency ones.

789
00:40:32,990 --> 00:40:36,080
So first, the different scales--

790
00:40:36,080 --> 00:40:39,834
la, la, lore, loo, lee, OK?

791
00:40:39,834 --> 00:40:40,500
[AUDIO PLAYBACK]

792
00:40:40,500 --> 00:40:47,500
- La, la, la, la, la, la, la,
la, la, la, la, la, la, la, la,

793
00:40:47,500 --> 00:40:48,500
la.

794
00:40:48,500 --> 00:40:50,500
La.

795
00:40:50,500 --> 00:40:55,000
Lore, lore, lore, lore, lore,
lore, lore, lore, lore, lore,

796
00:40:55,000 --> 00:40:58,500
lore, lore, lore,
lore, lore, lore, lore.

797
00:40:58,500 --> 00:41:00,000
Lore.

798
00:41:00,000 --> 00:41:05,750
Loo, loo, loo, loo, loo, loo,
loo, loo, loo, loo, loo, loo,

799
00:41:05,750 --> 00:41:07,500
loo, loo, loo, loo.

800
00:41:07,500 --> 00:41:09,500
Loo.

801
00:41:09,500 --> 00:41:15,250
Ler, ler, ler, ler, ler, ler,
ler, ler, ler, ler, ler, ler,

802
00:41:15,250 --> 00:41:17,000
ler, ler, ler, ler.

803
00:41:17,000 --> 00:41:19,000
Ler.

804
00:41:19,000 --> 00:41:24,750
Lee, lee, lee, lee, lee, lee,
lee, lee, lee, lee, lee, lee,

805
00:41:24,750 --> 00:41:26,500
lee, lee, lee, lee.

806
00:41:26,500 --> 00:41:28,500
Lee.

807
00:41:28,500 --> 00:41:30,000
[END PLAYBACK]

808
00:41:30,000 --> 00:41:32,770
PROFESSOR: OK, so
now what I've done

809
00:41:32,770 --> 00:41:37,420
is I've sliced out the lowest
frequency, the very first

810
00:41:37,420 --> 00:41:40,810
of the scale from
each of the sounds

811
00:41:40,810 --> 00:41:45,490
and pasted them together to
get the low frequency run.

812
00:41:45,490 --> 00:41:51,470
And then I took out the high
ones and pasted them together.

813
00:41:51,470 --> 00:41:52,110
OK?

814
00:41:52,110 --> 00:41:56,990
Exactly the same sounds, just
played in a different order.

815
00:41:56,990 --> 00:41:59,540
So first the low frequency ones.

816
00:41:59,540 --> 00:42:01,022
SINGER: La.

817
00:42:01,022 --> 00:42:02,998
Lore.

818
00:42:02,998 --> 00:42:04,974
Loo.

819
00:42:04,974 --> 00:42:06,950
Ler.

820
00:42:06,950 --> 00:42:09,914
Lee.

821
00:42:09,914 --> 00:42:13,026
And the high frequency ones.

822
00:42:13,026 --> 00:42:15,022
La.

823
00:42:15,022 --> 00:42:16,519
Lore.

824
00:42:16,519 --> 00:42:18,515
Loo.

825
00:42:18,515 --> 00:42:20,511
Ler.

826
00:42:20,511 --> 00:42:21,509
Lee.

827
00:42:21,509 --> 00:42:26,000
[LAUGHTER]

828
00:42:26,000 --> 00:42:32,570
PROFESSOR: It's not her fault.
She's doing everything right.

829
00:42:32,570 --> 00:42:34,030
And you can see that.

830
00:42:34,030 --> 00:42:40,010
Here is, again, a Python program
analyzing those same segments.

831
00:42:40,010 --> 00:42:46,040
So what's shown here is the ee,
the filter derived from the ee,

832
00:42:46,040 --> 00:42:50,360
by thinking about the lee,
lee, lee, lee, lee, lee--

833
00:42:50,360 --> 00:42:53,990
by looking at that
sequence, and averaging

834
00:42:53,990 --> 00:42:57,620
across the frequencies.

835
00:42:57,620 --> 00:43:00,890
So here's the filter.

836
00:43:00,890 --> 00:43:08,180
Here's the filtered glottis
spectrum for a low frequency

837
00:43:08,180 --> 00:43:10,880
and intermediate frequency
and high frequency.

838
00:43:10,880 --> 00:43:14,340
What's the difference between
the low, middle, and high?

839
00:43:17,330 --> 00:43:19,870
What's characteristically
different at low and high?

840
00:43:26,110 --> 00:43:31,880
AUDIENCE: [INAUDIBLE]
frequency, like high amplitude.

841
00:43:31,880 --> 00:43:37,230
PROFESSOR: So if you look at the
low frequency, the low pitch,

842
00:43:37,230 --> 00:43:42,560
there are more frequency
components in a given range.

843
00:43:42,560 --> 00:43:45,400
So if I say analyze the
frequencies between 0

844
00:43:45,400 --> 00:43:50,110
and 1,000 hertz, 1,000
cycles per second,

845
00:43:50,110 --> 00:43:55,210
there are more lines when
you have a low frequency.

846
00:43:55,210 --> 00:43:57,760
And so you get the
density of the lines

847
00:43:57,760 --> 00:44:00,340
is greater for the low
frequency utterance

848
00:44:00,340 --> 00:44:04,200
than it is for the high
frequency utterance.
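
The line density argument is just counting harmonics. A small sketch, assuming a periodic source whose components sit at integer multiples of the pitch. The three pitches (an octave apart) and the name `harmonics_in_band` are hypothetical.

```python
def harmonics_in_band(fundamental_hz, low_hz=0.0, high_hz=1000.0):
    """Return the harmonic frequencies k * f0 (k >= 1) that fall in (low_hz, high_hz]."""
    harmonics = []
    k = 1
    while k * fundamental_hz <= high_hz:
        if k * fundamental_hz > low_hz:
            harmonics.append(k * fundamental_hz)
        k += 1
    return harmonics

for pitch in (220.0, 440.0, 880.0):   # assumed pitches, an octave apart
    lines = harmonics_in_band(pitch)
    print(f"f0 = {pitch:5.0f} Hz: {len(lines)} lines up to 1000 Hz")
```

Each octave up halves the number of lines available to trace out the filter shape in the same band.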

849
00:44:04,200 --> 00:44:09,450
The low frequency utterance
is spaced close enough

850
00:44:09,450 --> 00:44:17,240
that you can clearly figure out
this pattern from that spacing,

851
00:44:17,240 --> 00:44:20,660
because there are
multiple lines per peak.

852
00:44:20,660 --> 00:44:23,430
The problem is that
the speech waveforms

853
00:44:23,430 --> 00:44:26,870
have very sharp resonances.

854
00:44:26,870 --> 00:44:29,510
The peaks are narrow.

855
00:44:29,510 --> 00:44:32,720
So that as you go to
a higher frequency,

856
00:44:32,720 --> 00:44:36,380
now it's very hard to see.

857
00:44:36,380 --> 00:44:39,140
So where there was two lines
characterizing this guy,

858
00:44:39,140 --> 00:44:41,860
now there's one.

859
00:44:41,860 --> 00:44:44,460
And at the highest frequency,
there's nothing there.

860
00:44:47,680 --> 00:44:51,340
Similarly with these
peaks, again, several lines

861
00:44:51,340 --> 00:44:52,450
representing each peak.

862
00:44:52,450 --> 00:44:56,950
One line representing-- nothing
representing this peak, nothing

863
00:44:56,950 --> 00:44:58,630
representing that peak.

864
00:44:58,630 --> 00:45:05,270
There is nothing about
ee in that signal.

865
00:45:05,270 --> 00:45:08,520
And if you do the
same analysis for ah,

866
00:45:08,520 --> 00:45:12,030
you get the same result.
There's nothing about ee,

867
00:45:12,030 --> 00:45:13,740
and there's nothing about ah.

868
00:45:13,740 --> 00:45:15,940
There's just nothing there.

869
00:45:15,940 --> 00:45:19,470
There's no way anybody is going
to tell those two sounds apart.

870
00:45:19,470 --> 00:45:24,840
So if the singer put her
voice, her vocal tract,

871
00:45:24,840 --> 00:45:29,750
in precisely the
right location, there

872
00:45:29,750 --> 00:45:32,330
would be no difference
between those sounds,

873
00:45:32,330 --> 00:45:35,330
OK, regardless of what
the director said.

874
00:45:35,330 --> 00:45:37,620
OK, so that's the problem.

875
00:45:37,620 --> 00:45:39,950
So that's a way of
using the Fourier

876
00:45:39,950 --> 00:45:44,570
analysis to gain some insight
into some anomalous situations.

877
00:45:44,570 --> 00:45:45,319
Yeah?

878
00:45:45,319 --> 00:45:46,777
AUDIENCE: Does this
have more to do

879
00:45:46,777 --> 00:45:51,070
with the rate at
which you're sampling?

880
00:45:51,070 --> 00:45:52,600
PROFESSOR: No.

881
00:45:52,600 --> 00:45:56,620
It has to do only with
the frequency content

882
00:45:56,620 --> 00:45:59,800
of the glottis waveform.

883
00:45:59,800 --> 00:46:01,450
You can think about
it as sampling.

884
00:46:01,450 --> 00:46:09,460
And that's a good insight,
because the Fourier series only

885
00:46:09,460 --> 00:46:13,990
has components at integer
multiples of a base frequency.

886
00:46:13,990 --> 00:46:18,680
So that means we're sampling
in frequency, not in time.

887
00:46:18,680 --> 00:46:23,860
So we have this potentially
continuous frequency response,

888
00:46:23,860 --> 00:46:26,230
which is characterizing this.

889
00:46:26,230 --> 00:46:27,250
That is continuous.

890
00:46:27,250 --> 00:46:29,680
I could excite this at any
frequency that I want to.

891
00:46:29,680 --> 00:46:32,920
But the glottis
waveform of the singer

892
00:46:32,920 --> 00:46:37,150
is only sampling that at
particular frequencies--

893
00:46:37,150 --> 00:46:45,010
C, C prime, C double prime, B,
B prime, B double prime, right?

894
00:46:45,010 --> 00:46:47,410
So there's only certain
frequencies at which

895
00:46:47,410 --> 00:46:51,625
the singer is sampling this.

896
00:46:51,625 --> 00:46:53,750
So there is a way of thinking
about it as sampling.

897
00:46:53,750 --> 00:46:55,904
But it's not sampling
due to my A-to-D converter

898
00:46:55,904 --> 00:46:56,820
or anything like that.

899
00:46:56,820 --> 00:46:59,450
It's sampling in frequency.
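
Sampling in frequency can be sketched by evaluating a continuous response only at multiples of the pitch. Everything specific here is an assumption for illustration: a single narrow resonance centered at 300 Hz with a 60 Hz bandwidth, a simplified bell-shaped gain curve, and the names `resonance_gain` and `sampled_envelope`.

```python
import math

def resonance_gain(freq_hz, center_hz=300.0, bandwidth_hz=60.0):
    """Gain of a single narrow resonance, normalized to 1 at its center."""
    detune = (freq_hz - center_hz) / (bandwidth_hz / 2.0)
    return 1.0 / math.sqrt(1.0 + detune ** 2)

def sampled_envelope(fundamental_hz, max_hz=4000.0):
    """Sample the continuous response only at the harmonics k * f0."""
    count = int(max_hz // fundamental_hz)
    return [(k * fundamental_hz, resonance_gain(k * fundamental_hz))
            for k in range(1, count + 1)]

for pitch in (100.0, 250.0, 800.0):
    best_freq, best_gain = max(sampled_envelope(pitch), key=lambda pair: pair[1])
    print(f"f0 = {pitch:5.0f} Hz: strongest sample at {best_freq:.0f} Hz, gain {best_gain:.3f}")
```

At a low pitch some harmonic lands on the peak. At a high pitch every harmonic misses it, so the output carries almost no trace of that resonance.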

900
00:46:59,450 --> 00:47:03,790
So the point is that this
kind of a source filter idea,

901
00:47:03,790 --> 00:47:05,830
and more generally,
the filter idea,

902
00:47:05,830 --> 00:47:08,770
is such a powerful
representation

903
00:47:08,770 --> 00:47:10,270
that next time we'll
think about how

904
00:47:10,270 --> 00:47:15,020
to do the same sort of thing
for non-periodic [? stimuli. ?]

905
00:47:15,020 --> 00:47:16,860
See you then.