Topics covered: Approximating delta y by f'(x) delta x; discussion of the difference between delta y and f'(x) delta x; introduction to the chain rule.
Instructor/speaker: Prof. Herbert Gross
Lecture 2: Approximations a...
Related Resources
This section contains documents that are inaccessible to screen reader software. A "#" symbol is used to denote such documents.
Part II Study Guide (PDF - 29MB)#
Supplementary Notes (PDF - 46MB)#
Blackboard Photos (PDF - 8MB)#
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation, or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.
PROFESSOR: Hi. Welcome once again to our Calculus Revisited lecture, where today we shall discuss the concept of infinitesimals, a rather elusive but very important concept. And because most textbooks illustrate this topic in terms of approximations, our topic today will be called approximations and infinitesimals.
Now, how shall we introduce our subject in terms of topics that you may be more familiar with? Perhaps the easiest way is to go back to an elementary algebra course, to distance, rate and time problems, when one talked about distance equaling rate times time. The question, of course, is what rate do you use if the rate is not constant? You see, the question of distance equals rate times time presupposes that you are dealing with a constant rate. Now what does this mean, and how is it directly connected with the development of our calculus course? This shall be the subject of our investigation today.
See, the idea is this. Let's consider the curve 'y' equals 'f of x', and let's suppose that the curve is smooth. That is, that it possesses a derivative, say, at the point 'x' equals 'x1'. Let's draw the tangent line to the curve at 'x' equals 'x1'.
Now the idea is this. In general what we investigate in a calculus course is the concept known as 'delta y'. 'Delta y' geometrically is what? It's how much 'y' has changed along the curve with respect to 'x'.
It turns out that there is a simpler thing that we could have computed. Notice, if we look at this particular diagram, that since the tangent line never changes its slope-- and by the way, when I say tangent line I mean at the point 'x1'-- that once we leave the point 'x1' the tangent line, in a way, no longer resembles the curve.
But the point that is important is that I can compute the change in 'y' to the tangent line here very easily. And the reason for this, you see, is rather apparent. Namely, the slope of the tangent line is, on the one hand, 'delta y-tan' divided by 'delta x'. 'Delta y-tan' divided by 'delta x'. On the other hand, by definition of slope, the slope of the line 'L' is also equal to what? It's 'dy dx'. It's 'dy dx' evaluated at the point, or the value, 'x' equals 'x1'. You see, in general the slope of the curve varies from point to point, so when we talk about the tangent line we must emphasize at what point on the curve we've drawn the tangent line.
At any rate, from this particular diagram it is not difficult to see that to compute the change in 'y' to the tangent line, that this is nothing more than what? 'dy dx' evaluated at 'x' equals 'x1' times 'delta x'.
And you see this is not an approximation. This is precisely the value of 'delta y-tan'. The approximation seems to be when we say let 'delta y-tan' represent 'delta y'. In other words, we get the intuitive feeling that as 'delta x' gets small, the difference between the true 'delta y' and 'delta y-tan' also gets small. Another way of saying this is what? Our intuitive feeling is that, in a neighborhood of the point of tangency, the tangent line serves as a good approximation to the curve itself.
Now let's see what this means in terms of a specific example. I've taken the liberty of computing the cube of 4.01 in advance. It turns out to be 64.481201. And it's sort of arbitrary, like cubing this. If this doesn't look messy enough for you we could have taken this to the sixth power and then we could have squared this. But that part is irrelevant. A simple check shows that, more or less, this will be a correct statement. And what I would like to do, you see, is simply illustrate what our earlier comments mean in terms of this specific example.
Let's suppose I want to find an approximation for the cube of 4.01 fairly rapidly. The idea is this. What I do know is one number that's very easy to cube, which is near 4.01, is 4 itself. In other words, I know that 4 cubed is 64. And by the way, I've deliberately drawn this slightly distorted according to scale so that we can see what's happening over here. What the extra 0.481201 in 64.481201 represents is the actual change in height from here to here along the curve. In other words, it would be the length of the segment joining the point 'P' to the point 'Q' here.
What I claim is that if I instead tried to find the length of 'PR', the change in 'y' not along the curve but along the line tangent to the curve at the point (4, 64), this is what I can find fairly rapidly. In other words, what I am going to do is to work this same idea here with a special case. You see, I'm going to take 'x1' to equal 4 and 'delta x' to be 0.01. Now you see the curve is 'y' equals 'x cubed'. From this I can compute 'dy dx' rather quickly. Now, I don't want 'dy dx' at any old point, I want to compute it when 'x' is 4 so I can find the slope of the line 'L'. And when 'x' is 4 this, of course, simply is what? 4 squared is 16, times 3 is 48.
So what do I have? I have that the slope is 48. I also have that 'delta x' is 0.01. So according to my recipe, 'delta y-tan' is what? It's 'dy dx' evaluated at 'x' equals 4, which is 48, times 'delta x', which is 0.01. And that's 0.48. See again, let's just juxtapose these two. All I have done now is computed this recipe in the particular example of trying to find the cube of 4.01. And you see, now notice that this 0.48 is exactly the length of the accented line here. It's the length of 'PR'. And what we do know is that the height from the x-axis to 'R' is now exactly what? Well, it's the 64 plus the 0.48. This then is our approximation. And notice that this compares with what? The precise answer, which is 64.481201.
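The arithmetic in this example can be checked with a short Python sketch. The names `cube` and `cube_slope` are labels chosen here for illustration, not anything from the lecture; the numbers are the ones used on the board.

```python
def cube(x):
    """The curve y = x**3 from the example."""
    return x ** 3


def cube_slope(x):
    """dy/dx for y = x**3, namely 3 * x**2."""
    return 3 * x ** 2


x1 = 4.0        # the nearby point that is easy to cube
delta_x = 0.01  # how far 4.01 lies from 4

delta_y_tan = cube_slope(x1) * delta_x  # rise along the tangent line: 48 * 0.01
approximation = cube(x1) + delta_y_tan  # height of R: 64 + 0.48
exact = cube(x1 + delta_x)              # height of the curve at x = 4.01
error = exact - approximation           # gap between the curve and the tangent line

print(f"delta y-tan   = {delta_y_tan:.6f}")
print(f"approximation = {approximation:.6f}")
print(f"exact cube    = {exact:.6f}")
print(f"error         = {error:.6f}")
```

The printed error is the part of 64.481201 that the tangent line misses, which is the quantity the rest of the lecture studies.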
In other words, notice what a small error we happen to have in this particular case. And this is the way the subject is usually brought up. It is not a very important thing from my point of view. In other words, I think it's rather easy to see that, first of all, this approximation is rather nebulous in the sense that it requires a knowledge of how fast the tangent line is separating from the curve. And this is a rather difficult topic in its own right. And secondly, this was a rather simple example. And we had the luxury here, you see, of being able to find the exact answer so we could compare our approximation with the exact answer. In many cases it is difficult or impossible to find the exact answer.
To emphasize this more abstractly and more generally, let's consider the following. Instead of trying to find the cube of 4.01, the generalization here is what? That we could take the curve 'y' equals 'x cubed'. The derivative of 'y' with respect to 'x' would then be '3 x squared'. Evaluated at an arbitrary point 'x' equals 'x1', we would get '3 x sub 1 squared'. In which case 'delta y-tan' would be '3 x1 squared' times 'delta x'.
Could we have computed the exact value of 'delta y' had we wished? And the answer, of course, is yes. Namely, what is the exact value of 'delta y'? Well, we want to compute this between 'x' equals 'x1' and 'x' equals 'x1 plus delta x'. Well, what is the value of 'y' when 'x' is 'x1 plus delta x'? It's ''x1 plus delta x' cubed'. Then we subtract off 'x1 cubed'. And if we expand this, watch what happens. By the binomial theorem we get an 'x1 cubed' term here, which cancels with the 'x1 cubed' term over here. Then we get what? A '3 x1 squared delta x', so using the binomial theorem here. And then what else do we get? We get plus '3 x1 delta x squared' plus 'delta x cubed'.
And you see, what I'd like to have us view over here is if we look at just this much of the answer, this part here is precisely what we saw to be 'delta y-tan' before. See, this is 'delta y-tan'. And what's left over is the difference, of course, between 'delta y-tan' and 'delta y'. After all, 'delta y' is just 'delta y-tan' plus this portion here. And now if you look at this particular portion over here, observe that, as we expected, what the size of this thing is depends both on where we're measuring 'x1', and also on how big the size of the interval is, namely 'delta x'.
And notice something that we're going to come back to here. Notice that the 'delta x' factor here appears at least to the second power. In other words, notice over here that as 'delta x' gets small, both of these terms get small very rapidly. Because, you see, when 'delta x' is close to 0 the square of a number close to 0 is even closer to 0.
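As a hedged numerical check of the expansion, the sketch below compares the true gap 'delta y' minus 'delta y-tan' against the two leftover terms, '3 x1 delta x squared' plus 'delta x cubed'. The function names are illustrative only, and the choice of 'x1' equals 4 simply reuses the earlier example.

```python
def delta_y(x1, delta_x):
    """Exact change in y = x**3 along the curve."""
    return (x1 + delta_x) ** 3 - x1 ** 3


def delta_y_tan(x1, delta_x):
    """Change in y along the tangent line at x1."""
    return 3 * x1 ** 2 * delta_x


def leftover(x1, delta_x):
    """The two terms that remain after delta y-tan is removed."""
    return 3 * x1 * delta_x ** 2 + delta_x ** 3


x1 = 4.0
for dx in (0.1, 0.01, 0.001):
    gap = delta_y(x1, dx) - delta_y_tan(x1, dx)
    print(f"delta x = {dx}: gap = {gap:.9f}, leftover terms = {leftover(x1, dx):.9f}")
```

Up to floating-point rounding the two columns agree, and each tenfold cut in 'delta x' cuts the gap roughly a hundredfold, because every leftover term carries at least 'delta x squared'.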
But you see the important thing for now is to notice that this is the error in approximating 'delta y' by 'delta y-tan'. And that this error depends both on 'x1' and 'delta x'. And the question is, can we state that in a little bit more of a mathematical language? The answer, of course, is that we can. In particular, what we're saying is let's take the difference between 'dy dx' at 'x' equals 'x1', namely the slope of the tangent line, and what the average change of 'y' with respect to 'x' is along the curve.
Now, this is very typical in mathematics. And it's a very nice trick to learn, and a very quick trick to learn. We know that these two things are not equal and so all we do is tack on a correction factor. We tack on 'k'. What is 'k'? 'k' is the difference between these two expressions. In other words, by definition I add on the difference of these two numbers and that makes this an equality. Notice, by the way, that 'k' is a variable. It depends on how big 'delta x' is. You see, notice that once 'x1' is chosen-- and this is an important observation. As messy as this thing looks, it's a fixed number for a particular value of 'x1'. Namely, we compute the derivative and evaluate it when 'x' equals 'x1'. This is a number. This, of course, clearly depends on the size of 'delta x'. 'Delta y' depends on 'delta x'.
At any rate, here's what we do. We recognize that 'dy dx' is the limit of 'delta y' over 'delta x' as 'delta x' approaches 0. So with this as a hint, we take the limit of both sides here as 'delta x' approaches 0.
Notice the structure again. In our last series of lectures we talked about limit theorems. We talked about the limit of a sum being the sum of the limits, and things of this type. And now, you see, we're going to use these results. What we do is we take the limit of both sides here as 'delta x' approaches 0. The point being what? That the limit of a sum is the sum of the limits. So when I take the limit of this entire side I can express that as the sum of the limits. It is the limit of this term as 'delta x' approaches 0 plus the limit of this term as 'delta x' approaches 0.
Now let's stop to think what these things mean. The limit of 'delta x' approaching 0 of 'delta y' divided by 'delta x' is precisely the definition of derivative. In other words, this term here is just the 'dy dx'. And keep in mind we've evaluated this at 'x' equals 'x1'. That's the point at which we're starting. Now the fact that this is a constant, and we've already learned that the limit of a constant as 'delta x' approaches 0 is that constant, this term here is also 'dy dx' evaluated at 'x' equals 'x1'. And of course this term here is just the limit of 'k' as 'delta x' approaches 0. And now you see if we look at this, since all we have to do is observe that this term appears on both sides of the equation, canceling, or subtracting equals from equals, we wind up with the result that the limit of 'k' as 'delta x' approaches 0 is 0.
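Since the board work is only described verbally here, the following LaTeX is a reconstruction of the steps as spoken, not a copy of the blackboard. The symbols follow the lecture: $k$ is the correction term and the derivative is evaluated at $x = x_1$.

```latex
\begin{align*}
\frac{\Delta y}{\Delta x}
  &= \left.\frac{dy}{dx}\right|_{x=x_1} + k
  && \text{(definition of } k\text{)} \\
\lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}
  &= \lim_{\Delta x \to 0} \left.\frac{dy}{dx}\right|_{x=x_1}
   + \lim_{\Delta x \to 0} k
  && \text{(limit of a sum)} \\
\left.\frac{dy}{dx}\right|_{x=x_1}
  &= \left.\frac{dy}{dx}\right|_{x=x_1}
   + \lim_{\Delta x \to 0} k
  && \text{(definition of derivative; limit of a constant)} \\
\lim_{\Delta x \to 0} k &= 0
  && \text{(subtract equals from equals)}
\end{align*}
```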
And we're going to talk more about that pictorially and analytically in a little while. But from this result, all I'm going to do now is go back to our first statement here. Remembering that 'delta x' is a nonzero number, I will simply multiply both sides of this equation by 'delta x'. And if I do that, what do I wind up with? I wind up with that 'delta y' is 'dy dx' evaluated at 'x' equals 'x1' times 'delta x' plus 'k delta x'.
Now, if I keep in mind here that if I take the slope at a particular point and multiply that by the change in 'x', that that's precisely what we define to be 'delta y-tan'. If I put this all together I get what? That 'delta y' is equal to ''delta y-tan' plus 'k delta x'', where the limit of 'k' as 'delta x' approaches 0 is 0.
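To see the correction factor 'k' in action, here is a minimal sketch that reuses the curve 'y' equals 'x cubed' at 'x1' equals 4. The function name `k` mirrors the lecture's symbol; the particular values of 'delta x' are arbitrary choices made for illustration.

```python
def k(x1, delta_x):
    """Average rate of change of y = x**3 minus the slope at x1."""
    average_rate = ((x1 + delta_x) ** 3 - x1 ** 3) / delta_x
    slope_at_x1 = 3 * x1 ** 2
    return average_rate - slope_at_x1


x1 = 4.0
for dx in (0.1, 0.01, 0.001, 0.0001):
    correction = k(x1, dx)
    # k * delta x is the true error, delta y minus delta y-tan
    print(f"delta x = {dx:<7} k = {correction:.6f}  k * delta x = {correction * dx:.10f}")
```

For this curve 'k' works out algebraically to '3 x1 delta x' plus 'delta x squared', so the printed values shrink in step with 'delta x', while 'k' times 'delta x' shrinks faster still.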
Now you see, when we started this program and started to talk at the beginning of our lecture about approximations, we did something that was quite crude. The crude part was simply this. What we had said was, why don't we compute 'delta y' by computing 'delta y-tan' instead. In other words, this is a very easy thing to compute. It's just a derivative multiplied by a change in 'x'. This is a very simple thing to compute. And we'll use that as an approximation for the thing that we really want, which is 'delta y'.
The question that we deliberately overlooked at this point was how big was the error when you do this? And what we discovered was something very interesting and, by the way, may be something which teaches us why the analysis is better than the picture even though the picture is easier to visualize.
You see, we knew intuitively that as you picked a smaller neighborhood at the point of tangency, the tangent line became a better and better approximation to the curve itself. This boxed-in result tells us much more than that. What this tells us is, look, if this term by itself became small as 'delta x' approached 0, this would still say the same thing. This says much more than that. You see, what this says is-- and watch, this is very, very profound-- as 'delta x' approaches 0, 'k' is also approaching 0. In other words, the term 'k' times 'delta x' seems to be approaching 0 much more rapidly than 'delta x' itself.
That, by the way, in as simple a way as I know how, defines what an infinitesimal is. See, 'k' times 'delta x' is called an infinitesimal because it goes to 0 faster than 'delta x' itself. In other words, anything that approaches 0 faster than 'delta x' itself approaches 0 is called an infinitesimal with respect to 'delta x'.
Now I'm not going to go into a long philosophical discussion about that. Rather, I'm going to let the actions speak louder than the words, and try to show why this is such a crucial idea. In other words, again let me emphasize that. What's crucial here is not so much that the error goes to 0, it's that the error goes to 0 much faster than 'delta x' goes to 0. See, what is the error? The error is 'k' times 'delta x'. That's the difference between these two things.
Again, if you want to see what this thing means pictorially, and I wish I knew a better way of doing this but I don't, notice that 'k' times 'delta x' was defined to be the difference between 'delta y' and 'delta y-tan'. In other words, since this length here is 'delta y-tan', and since this entire length would be called 'delta y', again the accented line, this length here, is 'k' times 'delta x'. And what does it mean to say that 'k' approaches 0 as 'delta x' approaches 0? What it means is this. It means that not only does this vertical difference get small as 'delta x' goes to 0, but more importantly, it means that this vertical distance gets small very rapidly compared to how 'delta x' gets small.
Now how can I prove that? Well I'm not even going to try to prove that. What I'm going to just try to do is to have you see from the picture what's happening here. See, here is a fairly large 'delta x'. And the difference between 'delta y' and 'delta y-tan' for that large 'delta x' is this length over here. Now suppose we take a smaller 'delta x'. In other words, let's take 'delta x' to be this length over here. Notice that the length of 'delta x' here is still quite significant. I think if you look at this you can see it's a fairly significant length. On the other hand, notice what the difference between 'delta y' and 'delta y-tan' is now. It's just this little tiny thing over here. In other words, notice that as 'delta x' gets small, ratio-wise as 'delta x' gets small, the vertical difference between the tangent line and the curve is getting small at a much faster rate.
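The picture can be mimicked numerically. The sketch below is only an illustration, again assuming the curve 'y' equals 'x cubed' at 'x1' equals 4 and an arbitrary starting 'delta x' of 0.4: it halves 'delta x' repeatedly and records by what factor the vertical gap shrinks each time.

```python
def gap(x1, delta_x):
    """delta y minus delta y-tan for y = x**3: the vertical gap at x1 + delta_x."""
    return (x1 + delta_x) ** 3 - x1 ** 3 - 3 * x1 ** 2 * delta_x


x1 = 4.0
delta_x = 0.4
previous = gap(x1, delta_x)
shrink_factors = []

for _ in range(5):
    delta_x /= 2                      # delta x shrinks by a factor of 2
    current = gap(x1, delta_x)
    shrink_factors.append(previous / current)
    print(f"delta x = {delta_x:.4f}  gap = {current:.8f}  gap shrank by {previous / current:.3f}")
    previous = current
```

Each halving of 'delta x' cuts the gap by a factor of about 4 rather than 2, which is the ratio-wise statement made above.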
Now of course the question is why is that so important? And the answer will always come back to our old friend of 0/0. In general, if you tell a person that the numerator of a fraction is small, he jumps to the conclusion that the size of the fraction must be small. But you see, the trouble is when the denominator is also small then we cannot conclude that the fraction itself is small. And therefore in trying to cancel out a term as being insignificant, when we're dealing with something like this, it is no longer enough to say but the numerator is going to 0.
Let me jump ahead to the topic that will be covered in our next lecture just to use this as an insight as to what we have to see and what's going on over here. Let me just show you what I mean by this. Let's go back to our recipe, which says 'delta y' is this thing over here where what? This is important to put in here, the limit of 'k' as 'delta x' approaches 0 is 0. Now all I'm going to do is something like this. Let's suppose we're dealing in a situation where 'y' and 'x' happen to be functions of, say, 't' as well, some third variable 't'. And what we really want to find is 'dy dt'. You see, what we'd be tempted to do is simply say what? Let's divide through by 'delta t'. We do this, we divide through by 'delta t'. Then we take the limit of this thing as 'delta t' approaches 0.
Now what happens? As 'delta t' approaches 0, this is a sum. The limit of a sum is the sum of the limits. Each of the terms in the sum is a product. And the limit of a product is the product of the limits. So working out the regular limit theorems, this is what? The limit, 'delta t' approaches 0, 'dy dx' evaluated at 'x' equals 'x1', times the limit of 'delta x' divided by 'delta t' as 'delta t' approaches 0, plus the limit of 'k' as 'delta t' approaches 0, times the limit of 'delta x' divided by 'delta t' as 'delta t' approaches 0.
Now, if we remember what this thing here means, this is just by definition 'dy dt'. For the sake of brevity I will leave out subscripts now, but we'll just talk about this. This is a constant, and the limit of a constant as 'delta t' approaches 0 is just that constant, which I'll call 'dy dx'. It's understood this is evaluated at 'x' equals 'x1'. By definition the limit of 'delta x' divided by 'delta t' as 'delta t' approaches 0, that's called 'dx dt'. Now we come to this term over here. And here's the key point. Many people say, well, as 'delta t' goes to 0 so does 'delta x'. That makes the numerator here 0 and that makes the whole fraction 0. But the point is that's not true. What is true here is that both the numerator and denominator are going to 0. This does not make this 0, but rather it makes it what? 'dx dt'. That's the definition of 'dx dt'. The important point is what? That as 'delta t' approaches 0 so does 'delta x'. And what property does 'k' have? 'k' has the property that the limit of 'k' as 'delta x' approaches 0 is 0 itself.
In other words, the topic that we'll be talking about in our next lecture concerns this recipe. And in the development of this recipe, the reason that the error term drops out is not because the numerator of this term was small, it was because the number multiplying this limit happened to be 0. Now, we will talk about that in more detail next time. But the message I want you to see for right now is how we get hung up on this 0/0 bit, and why we must always be careful in how we handle something like this.
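A small numerical sketch may help separate the two terms. The choice 'x' equals 't squared' is an assumption made purely for illustration (the lecture names no specific function of 't'), paired with the familiar 'y' equals 'x cubed' and the point 't1' equals 2, so that 'x1' equals 4.

```python
def x_of_t(t):
    """Hypothetical dependence of x on t, chosen only for this illustration."""
    return t ** 2


def y_of_x(x):
    """The lecture's running example, y = x**3."""
    return x ** 3


t1 = 2.0
x1 = x_of_t(t1)      # 4.0
dy_dx = 3 * x1 ** 2  # slope of y = x**3 at x1, a fixed number


def pieces(delta_t):
    """Split delta y / delta t into (dy/dx)(delta x / delta t) and k (delta x / delta t)."""
    delta_x = x_of_t(t1 + delta_t) - x1
    delta_y = y_of_x(x1 + delta_x) - y_of_x(x1)
    k = delta_y / delta_x - dy_dx
    main_term = dy_dx * delta_x / delta_t
    error_term = k * delta_x / delta_t
    return main_term, error_term, delta_y / delta_t


for dt in (1e-1, 1e-2, 1e-3, 1e-4):
    main_term, error_term, total = pieces(dt)
    print(f"delta t = {dt:<7} main = {main_term:.5f}  error = {error_term:.5f}  total = {total:.5f}")
```

In the printout 'delta x' over 'delta t' settles near 4 instead of vanishing, so the error term dies out only because its other factor, 'k', goes to 0. The total settles near 48 times 4, which is 192.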
Now, by the way, there is something rather interesting that does happen over here. If you look at this thing, you get the feeling that it's almost as if you could cancel a common factor from both the numerator and the denominator. We have to be very, very careful. And I would like to introduce the language which you will be reading in the text in this assignment, called 'differentials'. Now what that means is simply this. Notice that 'dy dx' is one symbol; it is not 'dy' divided by 'dx' because these things have not been defined separately. In other words, you can't just operate with symbols the way you might like to.
Here's an interesting little aside that you may find entertaining if it has no other value at all: the idea of being told that you can cancel common factors from both numerator and denominator. Given a fraction such as 16/64, the uninitiated may say, you know, here's a 6 and here's a 6. I'll cancel them out. Now this is not what cancellation really meant, even though you had the same number both upstairs and downstairs. This is not what we meant by cancellation. Yet, notice that you happen to, by accident, get the right answer in this particular case. You see, if you do cancel this, this does happen to be 1/4.
This lecture has been going on in kind of a dull way for you. Just in case you'd like some comic relief, there are a few other examples that work the same way. And you can amaze your friends at the next party with them. I mean 19/95 happens to be 1/5. 26/65 happens to be 2/5. And I think there's one more. 49/98 happens to be 4/8. But these, I believe, are the only four fractions that this works for.
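The claim that only four fractions behave this way can be checked by brute force. The sketch below is one possible reading of the rule, stated here as an assumption: both numbers have two digits, the fraction is less than 1, a shared nonzero digit is struck from each, and the one-digit fraction left over must equal the original. The function name is made up for this illustration.

```python
from fractions import Fraction


def cancels_by_accident(numerator, denominator):
    """True if striking a shared nonzero digit leaves an equal fraction."""
    n_digits = str(numerator)
    d_digits = str(denominator)
    for i in range(2):
        for j in range(2):
            shared = n_digits[i]
            if shared != d_digits[j] or shared == "0":
                continue  # no common digit here, or a trivial case such as 30/50
            n_left = int(n_digits[1 - i])
            d_left = int(d_digits[1 - j])
            if d_left == 0:
                continue  # the leftover fraction would be undefined
            if Fraction(n_left, d_left) == Fraction(numerator, denominator):
                return True
    return False


lucky = [
    (n, d)
    for n in range(10, 100)
    for d in range(n + 1, 100)
    if cancels_by_accident(n, d)
]
print(lucky)
```

Under these assumptions the search turns up 16/64, 19/95, 26/65 and 49/98 and nothing else.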
In other words, the mere fact that it looks like something is going to work is no guarantee that you can do these things without running into trouble. On the other hand, it's fair to assume that the men who invented differential calculus must've been clever enough not to have invented a symbolism that would have gotten us into trouble. That, by the way, is a big assumption to make, that they must have been clever enough. We will find, in places during our course, that this was not always the case. The point is this though. That when you write 'dy dx' in this way, if it were not going to be somehow identifiable with a quotient such as 'dy' divided by 'dx', the chances are we would never have invented this notation in the first place.
And so what I'd like to just point out briefly now is the following. Can we, in fact we should, how shall we-- let's put it this way-- how shall we define separate symbols dy and dx so that when we write down 'dy dx' it will make no difference whether you think of this as being the derivative or whether you think of it as being the quotient of two numbers? And by the way, you see the answer to this question will not be that difficult if we stop to think for just a moment over here. We've already answered this question except that we haven't concentrated on the fact that we solved the problem.
See, let's go back to this line 'L' again. What is the slope of the line 'L'? On the one hand we've looked at slope as the derivative 'dy dx'. On the other hand, just by looking at this little accentuated triangle here, the slope of the line 'L' is also given by 'delta y-tan' divided by 'delta x'.
And now one of the best ways to define 'dy' and 'dx' separately, it would seem to me, is what? Notice that this number divided by this number yields the symbol 'dy dx'. Therefore why not define this number to be 'dy' and define this number to be 'dx'? In other words the symbol 'dx' will just be a fancy name for 'delta x'. On the other hand, the symbol 'dy' will not be a fancy name for 'delta y'. In fact, let's emphasize that. It's not equal to 'delta y', it is equal to 'delta y-tan'. In other words, if I allow 'dy' to stand for 'delta y-tan' and 'dx' to stand for 'delta x', I can now treat 'dy dx' as if it were what? 'dy' divided by 'dx'.
Well, that's easy to show in terms of an example just mechanically. Remember, given 'y' equals 'x cubed' we've already found what? That 'dy dx' is '3 x squared'. This thing practically begs to be treated like a fraction. We would like to be able to say, hey, if this divided by this is this, why isn't this equal to this times this? In other words, why isn't 'dy' equal to '3 x squared' times 'dx'? And the answer is that since we've identified or defined 'dy' to be 'delta y-tan', and since we've defined 'dx' to be 'delta x', and since we already know that this recipe is correct, it means, in particular, that we can now write things like this without having to worry about whether it's proper or not.
Now in later lectures this is going to play a very important role. This is what is known as the language of differentials. And differentials are the backbone of both differential and integral calculus. But the thing that I want you to really get into our minds today is the basic overall recipe. And I've taken the liberty of boxing this off over here. I want you to practice with approximations. I want you to think carefully about how you can get quick approximations. I don't want you to come away with the feeling that the approximation is what was important.
What was important was what? That the change in 'y', the true 'delta y', is ''dy dx' times 'delta x'' plus 'k delta x', where the limit of 'k' as 'delta x' approaches 0 is 0. And by the way, again, most books write it this way. I prefer that we emphasize that the derivative is evaluated or taken at a particular value of 'x'. And finally what I'd like to point out is that again, and we've done this many, many times, whereas the language we're used to in terms of intuition is geometry, that these results make perfectly good sense without reference to any diagram at all. In other words, then, the same result stated in analytic language simply says this: if 'f' is a function of 'x' and is differentiable when 'x' equals 'x1', then 'f of 'x1 plus delta x'' minus 'f of x1'-- you see that's geometrically what corresponds to your 'delta y'. That's ''f prime of x1' times 'delta x'' plus 'k delta x' where the limit of 'k' as 'delta x' approaches 0 is 0.
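The analytic statement makes no reference to the cubing example, so it can be tried on any differentiable function. The sketch below uses the sine function at 'x1' equals 1 purely as an assumed illustration; the lecture itself names no such function.

```python
import math


def k(f, f_prime, x1, delta_x):
    """Correction term in f(x1 + dx) - f(x1) = f'(x1) * dx + k * dx."""
    delta_y = f(x1 + delta_x) - f(x1)
    return delta_y / delta_x - f_prime(x1)


x1 = 1.0
for dx in (0.1, 0.01, 0.001):
    correction = k(math.sin, math.cos, x1, dx)
    print(f"delta x = {dx:<6} k = {correction:.8f}  k * delta x = {correction * dx:.10f}")
```

As with the cube, the printed 'k' shrinks toward 0 along with 'delta x', so the error 'k delta x' shrinks faster than 'delta x' does.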
Now these two recipes summarize precisely what it is that we are interested in when we deal with the subject called infinitesimals. As I said before, and I can't emphasize this thing strongly enough, when it comes time to make approximations we will find better ways of getting approximations than by the method known as differentials and 'delta y-tan'. What is crucial is that you use the language of approximations enough so that you can see pictorially what's going on and then cement down the final recipe, which I've boxed in here. This will be the building block, as we shall see next time, in the lecture which develops the derivative of composite functions. But more about that next time. And until next time, goodbye.
Funding for the publication of this video was provided by the Gabriella and Paul Rosenbaum Foundation. Help OCW continue to provide free and open access to MIT courses by making a donation at ocw.mit.edu/donate.