1 00:00:00,080 --> 00:00:01,770 The following content is provided 2 00:00:01,770 --> 00:00:04,000 under a Creative Commons license. 3 00:00:04,000 --> 00:00:06,860 Your support will help MIT OpenCourseWare continue 4 00:00:06,860 --> 00:00:10,720 to offer high-quality educational resources for free. 5 00:00:10,720 --> 00:00:13,330 To make a donation or view additional materials 6 00:00:13,330 --> 00:00:17,207 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,207 --> 00:00:17,832 at ocw.mit.edu. 8 00:00:22,080 --> 00:00:23,710 PROFESSOR: Continuing in the theme 9 00:00:23,710 --> 00:00:27,190 of sorting in general, but in particular, binary search 10 00:00:27,190 --> 00:00:28,910 trees, which are a kind of way of doing 11 00:00:28,910 --> 00:00:31,210 dynamic sorting, if you will, where 12 00:00:31,210 --> 00:00:32,759 the elements are coming and going. 13 00:00:32,759 --> 00:00:36,020 And at all times, you want to know the sorted order 14 00:00:36,020 --> 00:00:41,170 of your elements by storing them in a nice binary search tree. 15 00:00:41,170 --> 00:00:46,360 Remember, in general, a binary search tree is a tree. 16 00:00:46,360 --> 00:00:50,040 It's binary, and it has the search property. 17 00:00:50,040 --> 00:00:51,300 Those three things. 18 00:00:51,300 --> 00:00:52,660 This is a rooted binary tree. 19 00:00:52,660 --> 00:00:53,815 It has a root. 20 00:00:53,815 --> 00:00:56,870 It's binary, so there's a left child and a right child. 21 00:00:56,870 --> 00:00:59,270 Some nodes lack a right or left child. 22 00:00:59,270 --> 00:01:02,040 Some nodes lack both. 23 00:01:02,040 --> 00:01:03,780 Every node has a key. 24 00:01:03,780 --> 00:01:05,110 This is the search part. 
25 00:01:05,110 --> 00:01:08,850 You store a key in every node, and you have this BST property, 26 00:01:08,850 --> 00:01:10,780 or also called the search property, 27 00:01:10,780 --> 00:01:14,280 that every node-- if you have a node that stores key x, 28 00:01:14,280 --> 00:01:17,599 everybody in the left subtree stores a key that's less than 29 00:01:17,599 --> 00:01:19,890 or equal to x, and everyone that's in the right subtree 30 00:01:19,890 --> 00:01:22,130 stores a key that's greater than or equal to x. 31 00:01:22,130 --> 00:01:24,370 So not just the left and right children, 32 00:01:24,370 --> 00:01:27,870 but every descendant way down there is smaller than x. 33 00:01:27,870 --> 00:01:30,370 Every descendant way down there is greater than x. 34 00:01:30,370 --> 00:01:32,370 So when you have a binary search tree like this, 35 00:01:32,370 --> 00:01:34,170 if you want to know the sorted order, 36 00:01:34,170 --> 00:01:37,469 you do what's called an in-order traversal. 37 00:01:37,469 --> 00:01:38,260 You look at a node. 38 00:01:38,260 --> 00:01:40,750 You recursively visit the left child. 39 00:01:40,750 --> 00:01:42,740 Then you print out the root. 40 00:01:42,740 --> 00:01:44,780 Then you recursively visit the right child. 41 00:01:44,780 --> 00:01:47,742 So in this case, we'd go left, left, print 11. 42 00:01:47,742 --> 00:01:49,130 Print 20. 43 00:01:49,130 --> 00:01:49,630 Go right. 44 00:01:49,630 --> 00:01:50,130 Go left. 45 00:01:50,130 --> 00:01:51,000 Print 26. 46 00:01:51,000 --> 00:01:51,980 Print 29. 47 00:01:51,980 --> 00:01:53,620 Go up. 48 00:01:53,620 --> 00:01:54,780 Print 41. 49 00:01:54,780 --> 00:01:55,500 Go right. 50 00:01:55,500 --> 00:01:56,640 Print 50. 51 00:01:56,640 --> 00:01:57,630 Print 65. 52 00:01:57,630 --> 00:01:59,450 Then check that's in sorted order. 53 00:01:59,450 --> 00:02:02,180 If you're not familiar with in-order traversal, 54 00:02:02,180 --> 00:02:03,240 look at the textbook. 
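For readers following along without the board, the in-order traversal just described can be sketched in Python. This is a minimal illustration, not code from the lecture: the `Node` class and `in_order` function are hypothetical names, and the tree shape is an assumption reconstructed from the spoken walk-through (in particular, 50 is assumed to be the left child of 65).

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    """Hypothetical BST node: a key plus optional left and right children."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def in_order(node: Optional[Node]) -> Iterator[int]:
    """Yield keys in sorted order: left subtree, then the node, then right subtree."""
    if node is None:
        return
    yield from in_order(node.left)
    yield node.key
    yield from in_order(node.right)


# Assumed shape of the board example, rebuilt from the spoken traversal.
root = Node(41,
            left=Node(20, left=Node(11), right=Node(29, left=Node(26))),
            right=Node(65, left=Node(50)))

print(list(in_order(root)))  # [11, 20, 26, 29, 41, 50, 65]
```

The printed order matches the sequence read out in the lecture, which is the check that the keys come out sorted.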
55 00:02:03,240 --> 00:02:05,350 It's a very simple operation. 56 00:02:05,350 --> 00:02:07,850 I'm not going to talk about it more here, 57 00:02:07,850 --> 00:02:11,750 except we're going to use it. 58 00:02:11,750 --> 00:02:14,030 All right, we'll get to the topic of today's lecture 59 00:02:14,030 --> 00:02:16,620 in a moment, which is balance. 60 00:02:16,620 --> 00:02:19,220 What we saw in last lecture and recitation 61 00:02:19,220 --> 00:02:21,220 is that these basic binary search 62 00:02:21,220 --> 00:02:23,509 trees, where when you insert a node you just walk down 63 00:02:23,509 --> 00:02:26,050 the tree to find where that item fits-- like if you're trying 64 00:02:26,050 --> 00:02:29,360 to insert 30, you go left here, go right here, go right here, 65 00:02:29,360 --> 00:02:30,460 and say, oh 30 fits here. 66 00:02:30,460 --> 00:02:31,440 Let's put 30 there. 67 00:02:31,440 --> 00:02:35,420 If you keep doing that, you can do insert. 68 00:02:35,420 --> 00:02:36,560 You can do delete. 69 00:02:36,560 --> 00:02:38,336 You can do these kinds of searches, 70 00:02:38,336 --> 00:02:40,750 which we saw, finding the next larger element 71 00:02:40,750 --> 00:02:43,352 or finding the next smaller element, also known 72 00:02:43,352 --> 00:02:44,560 as successor and predecessor. 73 00:02:44,560 --> 00:02:47,850 These are actually the typical names for those operations. 74 00:02:47,850 --> 00:02:50,520 You can solve them in order h time. 75 00:02:50,520 --> 00:02:53,290 Anyone remember what h was? 76 00:02:53,290 --> 00:02:54,150 The height. 77 00:02:54,150 --> 00:02:54,790 Yeah, good. 78 00:02:54,790 --> 00:02:57,460 The height of the tree. 79 00:02:57,460 --> 00:03:04,800 So h is the height of the BST. 80 00:03:04,800 --> 00:03:06,560 What is the height of the tree? 81 00:03:06,560 --> 00:03:08,200 AUDIENCE: [INAUDIBLE]. 82 00:03:08,200 --> 00:03:08,914 PROFESSOR: Sorry? 83 00:03:08,914 --> 00:03:09,830 AUDIENCE: [INAUDIBLE]. 
84 00:03:09,830 --> 00:03:10,538 PROFESSOR: Log n? 85 00:03:10,538 --> 00:03:13,210 Log n would be great, but not always. 86 00:03:13,210 --> 00:03:15,530 So this is the issue of being balanced. 87 00:03:22,050 --> 00:03:24,870 So in an ideal world, your tree's 88 00:03:24,870 --> 00:03:28,190 going to look something like this. 89 00:03:28,190 --> 00:03:32,230 I've drawn this picture probably the most in my academic career. 90 00:03:32,230 --> 00:03:35,180 This is a nice, perfectly balanced binary search tree. 91 00:03:35,180 --> 00:03:38,690 The height is log n. 92 00:03:38,690 --> 00:03:40,580 This would be the balanced case. 93 00:03:40,580 --> 00:03:41,680 I mean, roughly log n. 94 00:03:41,680 --> 00:03:46,380 Let's just put theta to be approximate. 95 00:03:46,380 --> 00:03:48,560 But as we saw at the end of last class, 96 00:03:48,560 --> 00:03:54,020 you can have a very unbalanced tree, which is just a path. 97 00:03:54,020 --> 00:03:57,657 And there the height is n. 98 00:03:57,657 --> 00:03:58,990 What's the definition of height? 99 00:03:58,990 --> 00:04:00,615 That's actually what I was looking for. 100 00:04:03,280 --> 00:04:05,587 Should be 6.042 material. 101 00:04:05,587 --> 00:04:06,581 Yeah? 102 00:04:06,581 --> 00:04:08,569 AUDIENCE: Is it the length of the longest path 103 00:04:08,569 --> 00:04:09,827 always going down? 104 00:04:09,827 --> 00:04:12,410 PROFESSOR: Yeah, length of the longest path always going down. 105 00:04:12,410 --> 00:04:15,610 So length of the longest path from the root to some leaf. 106 00:04:15,610 --> 00:04:16,910 That's right. 107 00:04:16,910 --> 00:04:18,060 OK, so this is-- 108 00:04:35,870 --> 00:04:37,520 I highlight this because we're going 109 00:04:37,520 --> 00:04:40,840 to be working a lot with height today. 110 00:04:40,840 --> 00:04:44,840 All that's happening here, all of the paths are length log n. 111 00:04:44,840 --> 00:04:46,829 Here, there is a path of length n. 
112 00:04:46,829 --> 00:04:49,120 Some of them are shorter, but in fact, the average path 113 00:04:49,120 --> 00:04:49,710 is n over 2. 114 00:04:49,710 --> 00:04:50,840 It's really bad. 115 00:04:50,840 --> 00:04:52,790 So this is very unbalanced. 116 00:04:56,980 --> 00:04:58,920 I'll put "very." 117 00:04:58,920 --> 00:05:01,590 It's not a very formal term, but that's 118 00:05:01,590 --> 00:05:04,170 like the worst case for BSTs. 119 00:05:04,170 --> 00:05:04,730 This is good. 120 00:05:04,730 --> 00:05:06,650 This does have a formal definition. 121 00:05:06,650 --> 00:05:15,410 We call a tree balanced if the height is order log n. 122 00:05:18,170 --> 00:05:19,250 So you're storing n keys. 123 00:05:19,250 --> 00:05:20,970 If your height is always order log n, 124 00:05:20,970 --> 00:05:23,130 we get a constant factor here. 125 00:05:23,130 --> 00:05:26,430 Here, it's basically exactly log n, 1 times log n. 126 00:05:26,430 --> 00:05:28,930 It's always going to be at least log n, 127 00:05:28,930 --> 00:05:31,426 because if you're storing n things in a binary tree, 128 00:05:31,426 --> 00:05:33,050 you need to have height at least log n. 129 00:05:33,050 --> 00:05:36,170 So in fact, it will be theta log n if your tree is balanced. 130 00:05:36,170 --> 00:05:38,540 And today's goal is to always maintain 131 00:05:38,540 --> 00:05:40,760 that your trees are balanced. 132 00:05:40,760 --> 00:05:42,690 And we're going to do that using the structure 133 00:05:42,690 --> 00:05:46,840 called AVL trees, which I'll define in a moment. 134 00:05:46,840 --> 00:05:49,630 They're the original way people found 135 00:05:49,630 --> 00:05:52,980 to keep trees balanced back in the '60s, 136 00:05:52,980 --> 00:05:54,720 but they're still kind of the simplest. 137 00:05:54,720 --> 00:05:56,678 There are lots of ways to keep a tree balanced, 138 00:05:56,678 --> 00:05:59,700 so I'll mention some other balanced trees later on. 
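To make the contrast between the path and the balanced picture concrete, here is a small Python sketch. It assumes a plain, non-rebalancing BST insert of the kind described earlier in the lecture; the names `Node`, `insert`, `height`, and `build` are hypothetical. Height is counted in edges, so the path on n nodes comes out as n - 1 (still linear in n), and the empty tree is given height -1 purely as a convenience assumption.

```python
from typing import Iterable, Optional


class Node:
    """Hypothetical BST node with no balancing information."""

    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Plain BST insert: walk down to where the key fits, with no rebalancing."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def height(node: Optional[Node]) -> int:
    """Length in edges of the longest downward path; -1 for an empty tree."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def build(keys: Iterable[int]) -> Optional[Node]:
    """Insert the keys one at a time, in the order given."""
    root = None
    for k in keys:
        root = insert(root, k)
    return root


# Sorted insertions degenerate into a path; a level-by-level order stays perfect.
path = build(range(1, 16))
balanced = build([8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15])

print(height(path), height(balanced))  # 14 3
```

The same 15 keys give height 14 or height 3 depending only on insertion order, which is why an unmodified BST cannot promise order log n operations.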
139 00:05:59,700 --> 00:06:02,910 In particular, your textbook covers two other ways to do it. 140 00:06:02,910 --> 00:06:05,720 It does not cover AVL trees, so pay attention. 141 00:06:12,974 --> 00:06:14,390 One more thing I wanted to define. 142 00:06:14,390 --> 00:06:20,230 We talked about the height of the tree, 143 00:06:20,230 --> 00:06:27,350 but I'd also like to talk about the height of a node in a tree. 144 00:06:29,780 --> 00:06:31,030 Can anyone define this for me? 145 00:06:33,940 --> 00:06:35,880 Yeah? 146 00:06:35,880 --> 00:06:38,522 AUDIENCE: It's the level that the node is at. 147 00:06:38,522 --> 00:06:40,230 PROFESSOR: The level that the node is at. 148 00:06:40,230 --> 00:06:41,254 That is roughly right. 149 00:06:41,254 --> 00:06:42,170 I mean, that is right. 150 00:06:42,170 --> 00:06:44,513 It's all about, what is the level of a node? 151 00:06:44,513 --> 00:06:48,424 AUDIENCE: Like how many levels of children it has. 152 00:06:48,424 --> 00:06:50,340 PROFESSOR: How many levels of children it has. 153 00:06:50,340 --> 00:06:51,830 That's basically right, yeah. 154 00:06:51,830 --> 00:06:54,110 AUDIENCE: The distance from it to the root. 155 00:06:54,110 --> 00:06:55,776 PROFESSOR: Distance from it to the root. 156 00:06:55,776 --> 00:06:56,930 That would be the depth. 157 00:06:56,930 --> 00:06:59,180 So depth is counting from above. 158 00:06:59,180 --> 00:06:59,960 Height is-- 159 00:06:59,960 --> 00:07:00,876 AUDIENCE: [INAUDIBLE]. 160 00:07:04,660 --> 00:07:08,928 PROFESSOR: Yes, longest path from that node to the leaf. 161 00:07:08,928 --> 00:07:11,856 Note that's why I wrote this definition actually, 162 00:07:11,856 --> 00:07:12,832 to give you a hint. 163 00:07:20,650 --> 00:07:27,000 Here I should probably say down to be precise. 164 00:07:27,000 --> 00:07:29,026 You're not allowed to go up in these paths. 165 00:07:35,650 --> 00:07:38,485 [INAUDIBLE] 166 00:07:38,485 --> 00:07:39,447 All right. 
167 00:07:39,447 --> 00:07:40,890 Sorry. 168 00:07:40,890 --> 00:07:42,450 I've got to learn how to throw. 169 00:07:42,450 --> 00:07:42,950 All right. 170 00:07:42,950 --> 00:07:46,495 So for example, over here I'm going to write heights in red. 171 00:07:49,340 --> 00:07:50,820 If you're taking notes it's OK. 172 00:07:50,820 --> 00:07:52,490 Don't worry. 173 00:07:52,490 --> 00:07:55,480 So length of the longest path from it down to a leaf. 174 00:07:55,480 --> 00:07:59,740 Well, this is a leaf, so its height is 0. 175 00:07:59,740 --> 00:08:01,894 OK. 176 00:08:01,894 --> 00:08:04,960 Yeah, I'll just leave it at that. 177 00:08:04,960 --> 00:08:09,000 It takes 0 steps to get from a leaf to a leaf. 178 00:08:09,000 --> 00:08:10,260 This guy's not a leaf. 179 00:08:10,260 --> 00:08:16,900 It has a child, but it has a path of length one to a leaf. 180 00:08:16,900 --> 00:08:19,580 So it's one. 181 00:08:19,580 --> 00:08:20,630 This guy has a choice. 182 00:08:20,630 --> 00:08:24,830 You could go left and you get a path of length 1, 183 00:08:24,830 --> 00:08:27,720 or you could go right and get a path of length 2. 184 00:08:27,720 --> 00:08:31,490 We take the max, so this guy has height 2. 185 00:08:31,490 --> 00:08:34,250 This node has height 1. 186 00:08:34,250 --> 00:08:38,620 This node has height 3. 187 00:08:38,620 --> 00:08:40,872 How do you compute the height of a node? 188 00:08:40,872 --> 00:08:42,479 Anyone? 189 00:08:42,479 --> 00:08:43,419 Yeah. 190 00:08:43,419 --> 00:08:45,544 AUDIENCE: Max of the height of the children plus 1. 191 00:08:45,544 --> 00:08:46,252 PROFESSOR: Right. 192 00:08:46,252 --> 00:08:48,330 You take the max of the height of the children. 193 00:08:48,330 --> 00:08:49,700 Here, 2 and 1. 194 00:08:49,700 --> 00:08:50,730 Max is 2. 195 00:08:50,730 --> 00:08:51,540 Add 1. 196 00:08:51,540 --> 00:08:52,770 You get 3. 197 00:08:52,770 --> 00:08:56,110 So it's going to always be-- this is just a formula. 
198 00:08:56,110 --> 00:09:06,640 The height of the left child maxed 199 00:09:06,640 --> 00:09:17,860 with the height of the right child plus 1. 200 00:09:17,860 --> 00:09:20,380 This is obviously useful for computing. 201 00:09:20,380 --> 00:09:24,090 And in particular, in lecture and recitation 202 00:09:24,090 --> 00:09:27,110 last time, we saw how to maintain 203 00:09:27,110 --> 00:09:32,760 the size of every tree using data structure augmentation. 204 00:09:32,760 --> 00:09:34,403 Data structure augmentation. 205 00:09:37,210 --> 00:09:40,900 And then we started with a regular vanilla binary search 206 00:09:40,900 --> 00:09:43,540 tree, and then we maintained-- every time 207 00:09:43,540 --> 00:09:46,160 we did an operation on the tree, we also 208 00:09:46,160 --> 00:09:49,060 updated the size of the subtree rooted 209 00:09:49,060 --> 00:09:52,260 at that node, the size field. 210 00:09:52,260 --> 00:09:54,720 Here, I want to store a height field, 211 00:09:54,720 --> 00:09:57,170 and because I have this nice local rule that tells me 212 00:09:57,170 --> 00:09:59,411 how to compute the height of a node using just 213 00:09:59,411 --> 00:10:01,910 local information-- the height of its left child, the height 214 00:10:01,910 --> 00:10:02,750 of its right child. 215 00:10:02,750 --> 00:10:06,287 Do a constant amount of work here. 216 00:10:06,287 --> 00:10:07,370 There's a general theorem. 217 00:10:07,370 --> 00:10:09,180 Whenever you have a nice local formula 218 00:10:09,180 --> 00:10:11,300 like this for updating your information in terms 219 00:10:11,300 --> 00:10:14,010 of your children, then you can maintain 220 00:10:14,010 --> 00:10:15,890 it using constant overhead. 221 00:10:15,890 --> 00:10:19,500 So we can store the height of every node for free. 222 00:10:19,500 --> 00:10:20,390 Why do I care? 223 00:10:20,390 --> 00:10:23,310 Because AVL trees are going to use the heights of the nodes. 
224 00:10:23,310 --> 00:10:25,260 Our goal is to keep the heights small. 225 00:10:25,260 --> 00:10:26,750 We don't want this. 226 00:10:26,750 --> 00:10:28,200 We want this. 227 00:10:28,200 --> 00:10:30,260 So a natural thing to do is store the heights. 228 00:10:30,260 --> 00:10:34,100 When they get too big, fix it. 229 00:10:34,100 --> 00:10:36,550 So that's what we're going to do. 230 00:10:52,140 --> 00:10:56,363 Maybe one more thing to mention over here for convenience. 231 00:11:01,260 --> 00:11:04,360 Leaves, for example, have children that are-- I mean, 232 00:11:04,360 --> 00:11:07,550 they have null pointers to their left and right children. 233 00:11:07,550 --> 00:11:10,250 You could draw them explicitly like this. 234 00:11:10,250 --> 00:11:12,920 Also some nodes just lack a single child. 235 00:11:12,920 --> 00:11:16,360 I'm going to define the heights of these things 236 00:11:16,360 --> 00:11:19,590 to be negative 1. 237 00:11:19,590 --> 00:11:22,270 This will be convenient later on. 238 00:11:22,270 --> 00:11:23,070 Why negative 1? 239 00:11:23,070 --> 00:11:24,874 Because then this formula works. 240 00:11:24,874 --> 00:11:26,040 You can just think about it. 241 00:11:26,040 --> 00:11:28,185 Like leaves, for example, have two children, 242 00:11:28,185 --> 00:11:29,060 which are negative 1. 243 00:11:29,060 --> 00:11:29,750 You take the max. 244 00:11:29,750 --> 00:11:30,250 You add 1. 245 00:11:30,250 --> 00:11:31,460 You get 0. 246 00:11:31,460 --> 00:11:33,505 So that just makes things work out. 247 00:11:33,505 --> 00:11:35,380 We don't normally draw these in the pictures, 248 00:11:35,380 --> 00:11:39,190 but it's convenient that I don't have to do special cases when 249 00:11:39,190 --> 00:11:42,096 the left child doesn't exist or the right child doesn't exist. 250 00:11:42,096 --> 00:11:43,470 You could either do special cases 251 00:11:43,470 --> 00:11:44,910 or you could make this definition. 252 00:11:44,910 --> 00:11:47,510 Up to you. 
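The local height rule and the negative 1 convention for missing children can be sketched as follows. This is an illustrative Python fragment rather than the course's own code: `Node`, `height`, and `update_height` are hypothetical names, and the example tree is the assumed board tree (root 41) whose heights were read out above.

```python
from typing import Optional


class Node:
    """Hypothetical BST node augmented with a stored height field."""

    def __init__(self, key: int,
                 left: Optional["Node"] = None,
                 right: Optional["Node"] = None) -> None:
        self.key = key
        self.left = left
        self.right = right
        self.height = 0
        update_height(self)


def height(node: Optional[Node]) -> int:
    """Stored height of a node; a missing child counts as height -1."""
    return -1 if node is None else node.height


def update_height(node: Node) -> None:
    """Constant-time local rule: max of the children's heights, plus 1."""
    node.height = 1 + max(height(node.left), height(node.right))


# Children are constructed before their parents, so each stored height is
# computed from heights that are already correct.
root = Node(41,
            Node(20, Node(11), Node(29, Node(26))),
            Node(65, Node(50)))

print(root.height, root.left.height, root.right.height)  # 3 2 1
```

Because a missing child reports -1, a leaf gets 1 + max(-1, -1) = 0 with no special case. After any structural change, calling `update_height` on each node along the path back up to the root would restore the field, which is the constant-overhead augmentation the lecture refers to.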
253 00:11:47,510 --> 00:11:49,780 OK. 254 00:11:49,780 --> 00:11:50,535 AVL trees. 255 00:11:54,960 --> 00:12:00,930 So the idea with an AVL tree is the following. 256 00:12:36,660 --> 00:12:39,211 We'd like to keep the height order log n. 257 00:12:39,211 --> 00:12:41,710 It's a little harder to think about keeping the height order 258 00:12:41,710 --> 00:12:46,110 log n than it is to think about keeping the tree balanced, 259 00:12:46,110 --> 00:12:49,095 meaning the left and right sides are more or less equal. 260 00:12:49,095 --> 00:12:50,970 In this case, we're going to think about them 261 00:12:50,970 --> 00:12:53,720 as being more or less equal in height. 262 00:12:53,720 --> 00:12:55,280 You could also think about them being 263 00:12:55,280 --> 00:12:57,600 more or less equal in subtree size. 264 00:12:57,600 --> 00:12:58,640 That would also work. 265 00:12:58,640 --> 00:13:01,020 It's a different balanced search tree. 266 00:13:01,020 --> 00:13:03,610 Height is kind of the easiest thing to work with. 267 00:13:03,610 --> 00:13:07,570 So if we have a node, it has a left subtree. 268 00:13:07,570 --> 00:13:09,670 It has a right subtree, which we traditionally 269 00:13:09,670 --> 00:13:11,310 draw as triangles. 270 00:13:11,310 --> 00:13:13,456 This subtree has a height. 271 00:13:13,456 --> 00:13:17,440 We'll call it h sub l for left. 272 00:13:17,440 --> 00:13:20,710 By the height of the subtree, I mean the height of its root. 273 00:13:20,710 --> 00:13:24,280 And the right subtree has some height, h sub r. 274 00:13:24,280 --> 00:13:26,860 I've drawn them as the same, but in general they 275 00:13:26,860 --> 00:13:28,260 might be different. 276 00:13:28,260 --> 00:13:31,250 And what we would like is that h sub l and h 277 00:13:31,250 --> 00:13:32,890 sub r are more or less the same. 278 00:13:32,890 --> 00:13:35,670 They differ by at most an additive 1. 
279 00:13:35,670 --> 00:13:41,240 So if I look at h sub l minus h sub r in absolute value, 280 00:13:41,240 --> 00:13:45,690 this is at most 1, for every node. 281 00:13:45,690 --> 00:13:47,780 So I have some node x. 282 00:13:47,780 --> 00:13:50,370 For every node x, I want the left and right subtrees 283 00:13:50,370 --> 00:13:52,410 to be almost balanced. 284 00:13:52,410 --> 00:13:54,840 Now, I could say differ by at most 0, 285 00:13:54,840 --> 00:13:58,030 that the left and right have exactly the same heights. 286 00:13:58,030 --> 00:14:00,290 That's difficult, because that really 287 00:14:00,290 --> 00:14:03,130 forces you to have exactly the perfect tree. 288 00:14:03,130 --> 00:14:07,900 And in fact, it's not even possible for odd n or even n 289 00:14:07,900 --> 00:14:09,280 or something. 290 00:14:09,280 --> 00:14:10,810 Because at the very end you're going 291 00:14:10,810 --> 00:14:14,750 to have one missing child, and then you're unbalanced there. 292 00:14:14,750 --> 00:14:18,070 So 0's just not possible to maintain, 293 00:14:18,070 --> 00:14:21,400 but 1 is almost as good, hopefully. 294 00:14:21,400 --> 00:14:23,950 We're going to prove that in a second. 295 00:14:23,950 --> 00:14:30,060 And it turns out to be easy to maintain in log n time. 296 00:14:30,060 --> 00:14:34,205 So let's prove some stuff. 297 00:14:38,870 --> 00:14:43,525 So first claim is that AVL trees are balanced. 298 00:14:52,880 --> 00:14:55,000 Balanced, remember, means that the height of them 299 00:14:55,000 --> 00:14:56,820 is always order log n. 300 00:14:56,820 --> 00:14:59,300 So we're just going to assume for now that we can somehow 301 00:14:59,300 --> 00:15:00,850 achieve this property. 302 00:15:00,850 --> 00:15:05,100 We want to prove that it implies that the height is 303 00:15:05,100 --> 00:15:06,730 at most some constant times log n. 
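The AVL condition just stated, that the left and right subtree heights differ by at most 1 at every node, can be checked with a short Python sketch. The names `Node`, `is_avl`, and `_check` are hypothetical, missing children are assumed to have height -1, and the function tests only the height condition, not the BST ordering of the keys.

```python
from typing import Optional, Tuple


class Node:
    """Hypothetical binary tree node."""

    def __init__(self, key: int,
                 left: Optional["Node"] = None,
                 right: Optional["Node"] = None) -> None:
        self.key = key
        self.left = left
        self.right = right


def _check(node: Optional[Node]) -> Tuple[bool, int]:
    """Return (height condition holds everywhere below, height) in one pass."""
    if node is None:
        return True, -1
    ok_l, h_l = _check(node.left)
    ok_r, h_r = _check(node.right)
    ok = ok_l and ok_r and abs(h_l - h_r) <= 1
    return ok, 1 + max(h_l, h_r)


def is_avl(node: Optional[Node]) -> bool:
    """True if |h_l - h_r| <= 1 holds at every node of the tree."""
    return _check(node)[0]


# The assumed board tree satisfies the condition; a 3-node path does not,
# because its root has subtree heights -1 and 1.
board = Node(41,
             Node(20, Node(11), Node(29, Node(26))),
             Node(65, Node(50)))
path = Node(1, None, Node(2, None, Node(3)))

print(is_avl(board), is_avl(path))  # True False
```

Note that the board tree is not perfectly balanced, yet it passes: allowing a difference of 1 rather than 0 is what makes the condition achievable for every n.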
304 00:15:06,730 --> 00:15:09,740 We know it's at least log n, but also 305 00:15:09,740 --> 00:15:12,460 like it to be not much bigger. 306 00:15:12,460 --> 00:15:16,040 So what do you think is the worst case? 307 00:15:16,040 --> 00:15:18,020 Say I have n nodes. 308 00:15:18,020 --> 00:15:21,370 How could I make the tree as high as possible? 309 00:15:21,370 --> 00:15:23,570 Or conversely, if I have a particular height, 310 00:15:23,570 --> 00:15:26,600 how could I make it have as few nodes as possible? 311 00:15:26,600 --> 00:15:29,790 That'd be like the sparsest, the least balanced 312 00:15:29,790 --> 00:15:30,870 situation for AVL trees. 313 00:15:34,488 --> 00:15:34,988 Yeah? 314 00:15:34,988 --> 00:15:37,154 AUDIENCE: You could have one node on the last level. 315 00:15:37,154 --> 00:15:39,740 PROFESSOR: One node on the last level, yeah, in particular. 316 00:15:39,740 --> 00:15:40,690 Little more. 317 00:15:40,690 --> 00:15:43,960 What do the other levels look like? 318 00:15:43,960 --> 00:15:46,680 That is correct, but I want to know the whole tree. 319 00:15:50,320 --> 00:15:53,490 It's hard to explain the tree, but you 320 00:15:53,490 --> 00:15:55,240 can explain the core property of the tree. 321 00:15:55,240 --> 00:15:55,703 Yeah? 322 00:15:55,703 --> 00:15:56,619 AUDIENCE: [INAUDIBLE]. 323 00:15:58,769 --> 00:16:00,310 PROFESSOR: For every node, let's make 324 00:16:00,310 --> 00:16:03,730 the right side have a height of one larger than the left side. 325 00:16:03,730 --> 00:16:06,270 I think that's worth a cushion. 326 00:16:06,270 --> 00:16:07,450 See if I can throw better. 327 00:16:10,402 --> 00:16:12,862 Good catch. 328 00:16:12,862 --> 00:16:14,830 Better than hitting your eye. 329 00:16:17,790 --> 00:16:22,539 So I'm going to not prove this formally, 330 00:16:22,539 --> 00:16:24,580 but I think if you stare at this long enough it's 331 00:16:24,580 --> 00:16:27,000 pretty obvious. 
332 00:16:27,000 --> 00:16:32,809 Worst case is when-- there are multiple worst cases, 333 00:16:32,809 --> 00:16:34,350 because right and left are symmetric. 334 00:16:34,350 --> 00:16:35,680 We don't really care. 335 00:16:35,680 --> 00:16:40,040 But let's say that the right subtree 336 00:16:40,040 --> 00:16:52,350 has height one more than the left for every node. 337 00:16:56,860 --> 00:17:00,780 OK, this is a little tricky to draw. 338 00:17:00,780 --> 00:17:03,087 Not even sure I want to try to draw it. 339 00:17:03,087 --> 00:17:04,670 But you basically draw it recursively. 340 00:17:04,670 --> 00:17:07,550 So, OK, somehow I've figured out this 341 00:17:07,550 --> 00:17:10,089 where the height difference here is 1. 342 00:17:10,089 --> 00:17:12,260 Then I take two copies of it. 343 00:17:12,260 --> 00:17:13,220 It's like a fractal. 344 00:17:13,220 --> 00:17:15,569 You should know all about fractals by now. 345 00:17:15,569 --> 00:17:17,849 Problem set two. 346 00:17:17,849 --> 00:17:20,548 And then you just-- well, that's not quite right. 347 00:17:20,548 --> 00:17:22,589 In fact, I need to somehow make this one a little 348 00:17:22,589 --> 00:17:28,700 bit taller and then glue these together. 349 00:17:28,700 --> 00:17:30,330 Little tricky. 350 00:17:30,330 --> 00:17:32,010 Let's not even try to draw the tree. 351 00:17:32,010 --> 00:17:33,880 Let's just imagine this is possible. 352 00:17:33,880 --> 00:17:36,740 It is possible. 353 00:17:36,740 --> 00:17:40,190 And instead, I'm going to use mathematics 354 00:17:40,190 --> 00:17:43,160 to understand how high that tree is. 355 00:17:43,160 --> 00:17:46,070 Or actually, it's a little easier 356 00:17:46,070 --> 00:17:48,200 to think about-- let me get this right. 357 00:17:48,200 --> 00:17:50,350 It's so easy that I have to look at my notes 358 00:17:50,350 --> 00:17:53,550 to remember what to write. 359 00:17:53,550 --> 00:17:56,460 Really, no problem. 
360 00:17:56,460 --> 00:18:01,760 All right, so I'm going to define n sub h 361 00:18:01,760 --> 00:18:08,580 is the minimum number of nodes that's 362 00:18:08,580 --> 00:18:15,690 possible in an AVL tree of height h. 363 00:18:20,380 --> 00:18:23,090 This is sort of the inverse of what we care about, 364 00:18:23,090 --> 00:18:25,440 but if we can solve the inverse, we can solve the thing. 365 00:18:25,440 --> 00:18:28,640 What we really care about is, for n nodes, how large 366 00:18:28,640 --> 00:18:29,390 can the height be? 367 00:18:29,390 --> 00:18:31,232 We want to prove that's order log n. 368 00:18:31,232 --> 00:18:33,690 But it'll be a lot easier to think about the reverse, which 369 00:18:33,690 --> 00:18:37,245 is, if I fix the height to be h, what's the fewest nodes 370 00:18:37,245 --> 00:18:38,590 that I can pack in? 371 00:18:38,590 --> 00:18:43,060 Because for the very unbalanced tree, I have a height of n, 372 00:18:43,060 --> 00:18:45,540 and I only need to put n nodes. 373 00:18:45,540 --> 00:18:48,300 That would be really bad. 374 00:18:48,300 --> 00:18:54,090 What I prefer is a situation like this, where with height h, 375 00:18:54,090 --> 00:18:56,360 I have to put in 2 to the h nodes. 376 00:18:56,360 --> 00:18:58,100 That would be perfect balance. 377 00:18:58,100 --> 00:18:59,840 Any constant to the h will do. 378 00:18:59,840 --> 00:19:02,234 So when you take the inverse, you get a log. 379 00:19:02,234 --> 00:19:03,650 OK, we'll get to that in a moment. 380 00:19:13,950 --> 00:19:15,960 How should we analyze n sub h? 381 00:19:20,163 --> 00:19:21,400 I hear something. 382 00:19:21,400 --> 00:19:21,900 Yeah? 383 00:19:21,900 --> 00:19:27,090 AUDIENCE: [INAUDIBLE] 2 to the h minus 1 [INAUDIBLE]. 384 00:19:35,890 --> 00:19:40,900 PROFESSOR: Maybe, but I don't think that will quite work out. 385 00:19:45,210 --> 00:19:47,239 Any-- yeah? 
386 00:19:47,239 --> 00:19:50,592 AUDIENCE: So you have only 1 node in the last level, 387 00:19:50,592 --> 00:19:55,861 so it would be 1/2 to the h plus 1. 388 00:19:55,861 --> 00:19:58,500 PROFESSOR: That turns out to be approximately correct, 389 00:19:58,500 --> 00:20:03,050 but I don't know where you got 1/2 to the h plus 1. 390 00:20:03,050 --> 00:20:04,710 It's not exactly correct. 391 00:20:04,710 --> 00:20:08,490 I'll tell you that, so that your analysis isn't right. 392 00:20:08,490 --> 00:20:09,612 It's a lot easier. 393 00:20:09,612 --> 00:20:11,320 You guys are worried about the last level 394 00:20:11,320 --> 00:20:13,816 and actually what the tree looks like, but in fact, all you need 395 00:20:13,816 --> 00:20:14,316 is this. 396 00:20:18,100 --> 00:20:20,024 All you need is love, yeah. 397 00:20:20,024 --> 00:20:20,940 AUDIENCE: [INAUDIBLE]. 398 00:20:20,940 --> 00:20:22,517 PROFESSOR: No, it's not a half. 399 00:20:22,517 --> 00:20:23,600 It's a different constant. 400 00:20:23,600 --> 00:20:23,960 Yeah? 401 00:20:23,960 --> 00:20:26,570 AUDIENCE: Start with base cases and write a recursive formula. 402 00:20:26,570 --> 00:20:28,270 PROFESSOR: Ah, recursive formula. 403 00:20:28,270 --> 00:20:29,169 Good. 404 00:20:29,169 --> 00:20:30,460 You said start with base cases. 405 00:20:30,460 --> 00:20:32,860 I always forget that part, so it's good that you remember. 406 00:20:32,860 --> 00:20:34,120 You should start with the base case, 407 00:20:34,120 --> 00:20:36,036 but I'm not going to worry about the base case 408 00:20:36,036 --> 00:20:37,210 because it won't matter. 409 00:20:37,210 --> 00:20:40,330 Because I know the base case is always going to be n order 1 410 00:20:40,330 --> 00:20:41,710 is order 1. 411 00:20:41,710 --> 00:20:43,550 So for algorithms, that's usually all 412 00:20:43,550 --> 00:20:45,690 you need for base case, but it's good that you think about it. 
413 00:20:45,690 --> 00:20:47,640 What I was looking for is recursive formula, 414 00:20:47,640 --> 00:20:49,740 aka, recurrence. 415 00:20:49,740 --> 00:20:52,840 So can someone tell me-- maybe even you-- could tell me 416 00:20:52,840 --> 00:20:56,690 a recurrence for n sub h, in terms of n sub smaller h? 417 00:20:59,660 --> 00:21:01,145 Yeah? 418 00:21:01,145 --> 00:21:02,630 AUDIENCE: 1 plus [INAUDIBLE]. 419 00:21:05,600 --> 00:21:09,480 PROFESSOR: 1 plus n sub h minus 1. 420 00:21:13,570 --> 00:21:14,320 Not quite. 421 00:21:14,320 --> 00:21:15,196 Yeah? 422 00:21:15,196 --> 00:21:19,170 AUDIENCE: N sub h minus 1 plus n sub h minus 2. 423 00:21:19,170 --> 00:21:22,365 PROFESSOR: N plus-- do you want the 1 plus? 424 00:21:22,365 --> 00:21:25,720 AUDIENCE: I don't think so. 425 00:21:25,720 --> 00:21:28,560 PROFESSOR: You do. 426 00:21:28,560 --> 00:21:30,040 It's a collaboration. 427 00:21:30,040 --> 00:21:31,470 To combine your two answers, this 428 00:21:31,470 --> 00:21:32,720 should be the correct formula. 429 00:21:32,720 --> 00:21:33,980 Let me double check. 430 00:21:33,980 --> 00:21:35,260 Yes, whew. 431 00:21:35,260 --> 00:21:35,760 Good. 432 00:21:35,760 --> 00:21:37,050 OK, why? 433 00:21:37,050 --> 00:21:41,900 Because the one thing we know is that our tree looks like this. 434 00:21:45,510 --> 00:21:47,130 The total height here is h. 435 00:21:47,130 --> 00:21:49,260 That's what we're trying to figure out. 436 00:21:49,260 --> 00:21:52,050 How many nodes are in this tree of height h? 437 00:21:52,050 --> 00:21:57,660 Well, the height is the max of the two directions. 438 00:21:57,660 --> 00:22:02,260 So that means that the larger has height h minus 1, 439 00:22:02,260 --> 00:22:04,860 because the longest path to a leaf 440 00:22:04,860 --> 00:22:06,750 is going to be down this way. 441 00:22:06,750 --> 00:22:08,110 What's the height of this? 442 00:22:08,110 --> 00:22:10,140 Well, it's one less than the height of this. 
443 00:22:10,140 --> 00:22:13,270 So it's going to be h minus 2. 444 00:22:13,270 --> 00:22:17,550 This is where the n sub h minus 1 plus n sub h minus 2 come in. 445 00:22:17,550 --> 00:22:19,400 But there's also this node. 446 00:22:19,400 --> 00:22:22,580 It doesn't actually make a big difference in this recurrence. 447 00:22:22,580 --> 00:22:23,940 This is the exponential part. 448 00:22:23,940 --> 00:22:26,370 This is like itty bitty thing. 449 00:22:26,370 --> 00:22:29,080 But it matters for the base case is pretty much 450 00:22:29,080 --> 00:22:30,850 where it matters. 451 00:22:30,850 --> 00:22:31,920 Back to your base case. 452 00:22:31,920 --> 00:22:34,500 There's one guy here, plus all the nodes on the left, 453 00:22:34,500 --> 00:22:36,410 plus all the nodes on the right. 454 00:22:36,410 --> 00:22:38,110 And for whatever reason, I put the left 455 00:22:38,110 --> 00:22:41,804 over here and the right over here. 456 00:22:41,804 --> 00:22:43,720 And of course, you could reverse this picture. 457 00:22:43,720 --> 00:22:44,380 It doesn't really matter. 458 00:22:44,380 --> 00:22:45,620 You get the same formula. 459 00:22:45,620 --> 00:22:47,200 That's the point. 460 00:22:47,200 --> 00:22:49,250 So this is the recurrence. 461 00:22:49,250 --> 00:22:51,370 Now we need to solve it. 462 00:22:51,370 --> 00:22:54,070 What we would like is for it to be exponential, 463 00:22:54,070 --> 00:22:57,830 because that means there's a lot of nodes in a height h AVL 464 00:22:57,830 --> 00:22:59,780 tree. 465 00:22:59,780 --> 00:23:02,460 So any suggestions on how we could figure out 466 00:23:02,460 --> 00:23:04,290 this recurrence? 467 00:23:04,290 --> 00:23:06,275 Does it look like anything you've seen before? 468 00:23:06,275 --> 00:23:07,460 AUDIENCE: Fibonacci. 469 00:23:07,460 --> 00:23:08,335 PROFESSOR: Fibonacci. 470 00:23:08,335 --> 00:23:09,930 It's almost Fibonacci. 
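The recurrence from the board, n sub h equals 1 plus n sub h minus 1 plus n sub h minus 2, is easy to tabulate. The Python sketch below is only an illustration: the function name `min_nodes` is hypothetical, and the base cases (0 nodes for the empty tree of height -1, 1 node for height 0) are an assumption, since the lecture deliberately skips them.

```python
def min_nodes(h: int) -> int:
    """Fewest nodes in an AVL tree of height h, via N_h = 1 + N_{h-1} + N_{h-2}.

    Assumed base cases: N_{-1} = 0 (empty tree) and N_0 = 1 (single node).
    """
    if h < 0:
        return 0
    prev, cur = 0, 1  # N_{-1}, N_0
    for _ in range(h):
        # root + taller subtree (height h-1) + shorter subtree (height h-2)
        prev, cur = cur, 1 + cur + prev
    return cur


print([min_nodes(h) for h in range(8)])  # [1, 2, 4, 7, 12, 20, 33, 54]
```

Even the sparsest AVL trees grow quickly with height, which is the exponential behavior the argument needs.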
471 00:23:09,930 --> 00:23:13,290 If I hid this plus 1, which you wanted to do, 472 00:23:13,290 --> 00:23:16,210 then it would be exactly Fibonacci. 473 00:23:16,210 --> 00:23:21,190 Well, that's actually good, because in particular, n sub h 474 00:23:21,190 --> 00:23:23,620 is bigger than Fibonacci. 475 00:23:23,620 --> 00:23:25,220 If you add one at every single level, 476 00:23:25,220 --> 00:23:27,050 then certainly you get something bigger 477 00:23:27,050 --> 00:23:28,560 than the base Fibonacci sequence. 478 00:23:31,479 --> 00:23:33,520 Now, hopefully you know Fibonacci is exponential. 479 00:23:39,570 --> 00:23:40,570 I have an exact formula. 480 00:23:45,100 --> 00:23:48,990 If you take the golden ratio to the power h, divide by root 5, 481 00:23:48,990 --> 00:23:50,600 and round to the nearest integer, 482 00:23:50,600 --> 00:23:52,320 you get exactly the Fibonacci number. 483 00:23:52,320 --> 00:23:53,307 Crazy stuff. 484 00:23:53,307 --> 00:23:54,890 We don't need to know why that's true. 485 00:23:54,890 --> 00:23:56,800 Just take it as fact. 486 00:23:56,800 --> 00:23:59,292 And conveniently phi is bigger than 1. 487 00:23:59,292 --> 00:24:00,750 You don't need to remember what phi 488 00:24:00,750 --> 00:24:02,240 is, except it is bigger than 1. 489 00:24:02,240 --> 00:24:04,900 And so this is an exponential bound. 490 00:24:04,900 --> 00:24:07,560 This is good news. 491 00:24:07,560 --> 00:24:13,160 So I'll tell you it's about 1.618. 492 00:24:13,160 --> 00:24:16,420 And so what we get is that-- if we invert this, 493 00:24:16,420 --> 00:24:21,410 this says n sub h is bigger than some phi to the h. 494 00:24:21,410 --> 00:24:23,334 This is our n, basically. 495 00:24:23,334 --> 00:24:24,750 What we really want to know is how 496 00:24:24,750 --> 00:24:29,420 h relates to n, which is just inverting this formula.
497 00:24:29,420 --> 00:24:32,090 So we have, on the other hand, the phi 498 00:24:32,090 --> 00:24:37,260 to the h divided by root 5 is less than n. 499 00:24:40,960 --> 00:24:45,720 So I take a log base phi on both sides. 500 00:24:45,720 --> 00:24:47,640 Seems like a good thing to do. 501 00:24:50,560 --> 00:24:52,200 This is actually quite annoying. 502 00:24:52,200 --> 00:24:55,210 I've got h minus a tiny little thing. 503 00:24:55,210 --> 00:25:01,920 It's less than log base phi of n. 504 00:25:01,920 --> 00:25:08,900 And I will tell you that is about 1.440 times log base 2 505 00:25:08,900 --> 00:25:11,700 of n, because after all, log base 2 is what computer 506 00:25:11,700 --> 00:25:13,800 scientists care about. 507 00:25:13,800 --> 00:25:16,690 So just to put it into perspective. 508 00:25:16,690 --> 00:25:19,170 We want it to be theta log base 2 of n. 509 00:25:19,170 --> 00:25:20,370 And here's the bound. 510 00:25:20,370 --> 00:25:23,532 The height is always less than 1.44 times log n. 511 00:25:23,532 --> 00:25:24,990 All we care about is some constant, 512 00:25:24,990 --> 00:25:27,800 but this is a pretty good constant. 513 00:25:27,800 --> 00:25:28,900 We'd like one. 514 00:25:28,900 --> 00:25:33,040 There are binary search trees that achieve 1, plus a very, 515 00:25:33,040 --> 00:25:39,360 very tiny thing, arbitrarily tiny, but this is pretty good. 516 00:25:39,360 --> 00:25:41,830 Now, if you don't know Fibonacci numbers, 517 00:25:41,830 --> 00:25:45,810 I pull a rabbit out of a hat and I've got this phi to the h. 518 00:25:45,810 --> 00:25:48,530 It's kind of magical. 519 00:25:48,530 --> 00:25:51,055 There's a much easier way to analyze this recurrence. 520 00:25:53,690 --> 00:25:57,810 I'll just tell you because it's good to know but not 521 00:25:57,810 --> 00:25:58,550 super critical. 522 00:26:09,880 --> 00:26:11,830 So we have this recurrence, n sub h.
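The Fibonacci argument above can be checked numerically. The following is a minimal sketch, not taken from the lecture code: the names `min_nodes` and `fib` are illustrative, and the base cases N_0 = 1 and N_1 = 2 (a single node has height 0) are an assumption consistent with the height convention used in this lecture.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, about 1.618


def min_nodes(h):
    """Minimum number of nodes in an AVL tree of height h.

    Assumed base cases: N_0 = 1 (a single node) and N_1 = 2.
    Recurrence from the lecture: N_h = 1 + N_{h-1} + N_{h-2}.
    """
    a, b = 1, 2  # N_0, N_1
    for _ in range(h):
        a, b = b, 1 + a + b
    return a


def fib(h):
    """h-th Fibonacci number with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(h):
        a, b = b, a + b
    return a


for h in range(1, 11):
    n = min_nodes(h)
    # log base phi of n is about 1.44 * log2(n); the height stays below it
    print(h, n, fib(h), round(math.log(n, PHI), 2))
```

With these base cases N_h works out to F_{h+3} - 1, so it is indeed larger than the plain Fibonacci number F_h at every height.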
523 00:26:15,960 --> 00:26:19,110 This is the computer scientist way to solve the recurrence. 524 00:26:19,110 --> 00:26:21,190 We don't care about the constants. 525 00:26:21,190 --> 00:26:22,940 This is the theoretical computer scientist 526 00:26:22,940 --> 00:26:24,450 way to solve this recurrence. 527 00:26:24,450 --> 00:26:25,700 We don't care about constants. 528 00:26:25,700 --> 00:26:28,020 And so we say, aw, this is hard. 529 00:26:28,020 --> 00:26:31,610 I've got n sub h minus 1 and n sub h minus 2. 530 00:26:31,610 --> 00:26:33,360 So asymmetric. 531 00:26:33,360 --> 00:26:35,900 Let's symmetrify. 532 00:26:35,900 --> 00:26:39,030 Could I make them both n sub h minus 1? 533 00:26:39,030 --> 00:26:41,440 Or could I make them both n sub h minus 2? 534 00:26:44,310 --> 00:26:45,920 Suggestions? 535 00:26:45,920 --> 00:26:47,600 AUDIENCE: [INAUDIBLE]. 536 00:26:47,600 --> 00:26:49,220 PROFESSOR: Minus 2 is the right way 537 00:26:49,220 --> 00:26:52,130 to go because I want to know n sub h is greater than something 538 00:26:52,130 --> 00:26:54,400 in order to get a less than down here. 539 00:26:54,400 --> 00:26:57,210 By the way, I used that log is monotonic here, 540 00:26:57,210 --> 00:26:58,770 but it is, so we're good. 541 00:26:58,770 --> 00:27:01,080 So this is going to be greater than 1 542 00:27:01,080 --> 00:27:07,880 plus 2 times n sub h minus 2. 543 00:27:07,880 --> 00:27:10,432 Because if I have a larger height 544 00:27:10,432 --> 00:27:11,640 I'm going to have more nodes. 545 00:27:11,640 --> 00:27:15,140 That's an easy proof by induction. 546 00:27:15,140 --> 00:27:17,120 So I can combine these into one term. 547 00:27:17,120 --> 00:27:17,910 It's simpler. 548 00:27:17,910 --> 00:27:21,040 I can get rid of this 1 because that only makes things bigger. 549 00:27:21,040 --> 00:27:22,960 So I just have this. 550 00:27:22,960 --> 00:27:24,800 OK, now I need a base case, but this 551 00:27:24,800 --> 00:27:27,800 looks like 2 to the something.
552 00:27:27,800 --> 00:27:29,190 What's the something? 553 00:27:29,190 --> 00:27:30,190 H over 2. 554 00:27:32,780 --> 00:27:35,330 So I'll just write theta to avoid the base case. 555 00:27:35,330 --> 00:27:37,180 2 to the h over 2. 556 00:27:37,180 --> 00:27:41,780 Every two steps of h, I get another factor of 2. 557 00:27:41,780 --> 00:27:43,970 So when you invert and do the log, 558 00:27:43,970 --> 00:27:50,130 this means that h is also less than 2 log base 2 of n. 559 00:27:50,130 --> 00:27:51,690 Log base 2 because of that. 560 00:27:51,690 --> 00:27:55,060 Factor 2 out here because of that factor 2 561 00:27:55,060 --> 00:27:57,000 when you take the log. 562 00:27:57,000 --> 00:27:59,730 And so the real answer is 1.44. 563 00:27:59,730 --> 00:28:03,190 This is the correct-- this is the worst case. 564 00:28:03,190 --> 00:28:05,780 But it's really easy to prove that it's, at most, 2 log n. 565 00:28:05,780 --> 00:28:07,540 So keep this in mind in case we ask 566 00:28:07,540 --> 00:28:10,840 you to analyze variants of AVL trees, 567 00:28:10,840 --> 00:28:12,900 like in problem set three. 568 00:28:12,900 --> 00:28:14,960 This is the easy way to do it and just get 569 00:28:14,960 --> 00:28:18,070 some constant times log n. 570 00:28:18,070 --> 00:28:19,600 Clear? 571 00:28:19,600 --> 00:28:23,590 All right, so that's AVL trees, why they're balanced. 572 00:28:23,590 --> 00:28:27,070 And so if we can achieve this property, 573 00:28:27,070 --> 00:28:30,090 that the left and right subtrees have about the same height, 574 00:28:30,090 --> 00:28:31,890 we'll be done. 575 00:28:31,890 --> 00:28:35,370 So how the heck do we maintain that property? 576 00:28:42,090 --> 00:28:43,810 Let's go over here. 577 00:28:58,980 --> 00:29:00,580 AVL trees are supposed to support 578 00:29:00,580 --> 00:29:03,270 a whole bunch of operations, but in particular, insert 579 00:29:03,270 --> 00:29:05,130 and delete.
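For reference, the symmetrized analysis of the height recurrence given a moment ago can be written out in one chain. This is a sketch of that argument only; the base cases N_0 = 1 and N_1 = 2 are an assumption (the lecture sidesteps them by writing theta).

```latex
\begin{align*}
N_h &= 1 + N_{h-1} + N_{h-2} \\
    &> 2\,N_{h-2}
       && \text{since } N_{h-1} \ge N_{h-2} \\
\Rightarrow\quad N_h &\ge 2^{h/2}
       && \text{unrolling two levels at a time, assuming } N_0 = 1,\ N_1 = 2 \\
\Rightarrow\quad h &\le 2 \log_2 N_h \le 2 \log_2 n
       && \text{for any AVL tree with } n \text{ nodes and height } h .
\end{align*}
```

The constant 2 is weaker than the 1.44 from the Fibonacci argument, but it is enough for a bound of the form constant times log n.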
580 00:29:05,130 --> 00:29:09,060 I'm just going to worry about insert today. 581 00:29:09,060 --> 00:29:11,410 Delete is almost identical. 582 00:29:11,410 --> 00:29:14,340 And it's in the code that corresponds to this lecture, 583 00:29:14,340 --> 00:29:16,420 so you can take a look at it. 584 00:29:16,420 --> 00:29:17,380 Very, very similar. 585 00:29:24,930 --> 00:29:25,930 Let's start with insert. 586 00:29:29,790 --> 00:29:33,190 Well, it's pretty straightforward. 587 00:29:33,190 --> 00:29:35,390 Our algorithm is as follows. 588 00:29:35,390 --> 00:29:42,780 We do the simple BST insertion, which we already saw, 589 00:29:42,780 --> 00:29:45,280 which is you walk down the tree to find where that key fits. 590 00:29:45,280 --> 00:29:46,950 You search for that key. 591 00:29:46,950 --> 00:29:49,830 And wherever it isn't, you insert a node there, 592 00:29:49,830 --> 00:29:52,010 insert a new leaf, and add it in. 593 00:29:52,010 --> 00:29:54,690 Now, this will not preserve the AVL property. 594 00:29:54,690 --> 00:29:57,000 So the second step is fix the AVL property. 595 00:30:03,460 --> 00:30:06,520 And there's a nice concise description of AVL insertion. 596 00:30:06,520 --> 00:30:10,700 Of course, how do you do step two is the interesting part. 597 00:30:10,700 --> 00:30:14,440 All right, maybe let's start with an example. 598 00:30:14,440 --> 00:30:15,380 That could be fun. 599 00:30:22,296 --> 00:30:24,695 Hey, look, here's an example. 600 00:30:24,695 --> 00:30:26,070 And to match the notes, I'm going 601 00:30:26,070 --> 00:30:33,230 to do insert 23 as a first example. 602 00:30:33,230 --> 00:30:36,720 OK, I'm also going to annotate this tree a little bit. 603 00:30:36,720 --> 00:30:39,160 So I said we store the heights, but what 604 00:30:39,160 --> 00:30:44,370 I care about is which height is larger, the left or the right. 
605 00:30:44,370 --> 00:30:46,090 In fact, you could just store that, 606 00:30:46,090 --> 00:30:47,830 just store whether it's plus 1, minus 1, 607 00:30:47,830 --> 00:30:50,270 or 0, the difference between left and right sides. 608 00:30:50,270 --> 00:30:52,840 So I'm going to draw that with a little icon, which 609 00:30:52,840 --> 00:30:56,740 is a left arrow, a descending left arrow if this 610 00:30:56,740 --> 00:30:59,748 is the bigger side. 611 00:30:59,748 --> 00:31:01,910 And this is a right arrow. 612 00:31:01,910 --> 00:31:02,730 This is even. 613 00:31:02,730 --> 00:31:04,330 Left and right are the same. 614 00:31:04,330 --> 00:31:07,420 Here, the left is heavier, or higher, I guess. 615 00:31:07,420 --> 00:31:08,240 Here it's even. 616 00:31:08,240 --> 00:31:11,720 Here it's left. 617 00:31:11,720 --> 00:31:14,580 This is AVL, because it's only one 618 00:31:14,580 --> 00:31:17,010 heavier wherever I have an arrow. 619 00:31:17,010 --> 00:31:19,220 OK, now I insert 23. 620 00:31:19,220 --> 00:31:24,030 23 belongs-- it's less than 41, greater than 20, less than 29, 621 00:31:24,030 --> 00:31:25,140 less than 26. 622 00:31:25,140 --> 00:31:27,800 So it belongs here. 623 00:31:27,800 --> 00:31:31,319 Here's 23, a brand-new node. 624 00:31:31,319 --> 00:31:32,610 OK, now all the heights change. 625 00:31:32,610 --> 00:31:36,680 And it's annoying to draw what the heights are, 626 00:31:36,680 --> 00:31:39,150 but I'll do it. 627 00:31:39,150 --> 00:31:41,320 This one changes to 1. 628 00:31:41,320 --> 00:31:43,370 This is 0. 629 00:31:43,370 --> 00:31:44,850 This changes to 2. 630 00:31:44,850 --> 00:31:46,450 This changes to 3. 631 00:31:46,450 --> 00:31:49,160 This changes to 4. 632 00:31:49,160 --> 00:31:51,890 Anyway, never mind what the heights are. 633 00:31:51,890 --> 00:31:53,960 What's bad is, well, this guy's even. 634 00:31:53,960 --> 00:31:55,590 This guy's left heavy. 635 00:31:55,590 --> 00:31:58,090 This guy's now doubly left heavy. 
636 00:31:58,090 --> 00:32:00,650 Bad news. 637 00:32:00,650 --> 00:32:02,410 OK, let's not worry about above that. 638 00:32:02,410 --> 00:32:03,510 Let's just start. 639 00:32:03,510 --> 00:32:05,270 The algorithm is going to walk up the tree 640 00:32:05,270 --> 00:32:08,220 and say, oh, when do I get something bad? 641 00:32:08,220 --> 00:32:11,650 So now I have 23, 26, 29 in a path. 642 00:32:11,650 --> 00:32:14,840 I'd like to fix it. 643 00:32:14,840 --> 00:32:18,023 Hmm, how to fix it? 644 00:32:18,023 --> 00:32:21,500 I don't think we know how to fix it, so I will tell you how. 645 00:32:26,824 --> 00:32:28,240 Actually, I wasn't here last week. 646 00:32:28,240 --> 00:32:29,480 So did we cover rotations? 647 00:32:29,480 --> 00:32:30,360 AUDIENCE: No. 648 00:32:30,360 --> 00:32:31,480 PROFESSOR: OK, good. 649 00:32:31,480 --> 00:32:32,820 Then you don't know. 650 00:32:32,820 --> 00:32:35,540 Let me tell you about rotations. 651 00:32:35,540 --> 00:32:36,310 Super cool. 652 00:32:47,280 --> 00:32:48,180 It's just a tool. 653 00:33:12,598 --> 00:33:14,398 That's x and y. 654 00:33:21,254 --> 00:33:22,420 I always get these mixed up. 655 00:33:22,420 --> 00:33:33,255 So this is called left rotate of x. 656 00:33:37,342 --> 00:33:39,800 OK, so here's the thing we can do with binary search trees. 657 00:33:39,800 --> 00:33:41,636 It's like the only thing you need to know. 658 00:33:41,636 --> 00:33:43,680 Because you've got search in binary search trees 659 00:33:43,680 --> 00:33:46,210 and you've got rotations. 660 00:33:46,210 --> 00:33:48,877 So when I have a tree like this, I've highlighted two nodes, 661 00:33:48,877 --> 00:33:50,960 and then there's the children hanging off of them. 662 00:33:50,960 --> 00:33:53,390 Some of these might be empty, but they're trees, 663 00:33:53,390 --> 00:33:56,510 so we draw them as triangles. 
664 00:33:56,510 --> 00:33:59,810 If I just do this, which is like changing 665 00:33:59,810 --> 00:34:02,980 which is higher, x or y, and whatever the parent of x was 666 00:34:02,980 --> 00:34:05,160 becomes the parent of y. 667 00:34:05,160 --> 00:34:06,650 And vice versa, in fact. 668 00:34:06,650 --> 00:34:10,340 The parent of y was x, and now the parent of x is y. 669 00:34:10,340 --> 00:34:14,030 OK, the parent of a is still x. 670 00:34:14,030 --> 00:34:15,270 The parent of b changes. 671 00:34:15,270 --> 00:34:16,120 It used to be y. 672 00:34:16,120 --> 00:34:17,510 Now it's x. 673 00:34:17,510 --> 00:34:19,230 The parent of c was y. 674 00:34:19,230 --> 00:34:20,370 It's still y. 675 00:34:20,370 --> 00:34:23,000 So in a constant number of pointer changes, 676 00:34:23,000 --> 00:34:24,663 I can go from this to this. 677 00:34:24,663 --> 00:34:25,870 This is constant time. 678 00:34:29,600 --> 00:34:32,380 And more importantly, it satisfies the BST order 679 00:34:32,380 --> 00:34:33,070 property. 680 00:34:33,070 --> 00:34:35,469 If you do an in-order traversal of this, 681 00:34:35,469 --> 00:34:39,095 you will get a, x, b, y, c. 682 00:34:42,437 --> 00:34:46,778 If I do an in-order traversal over here, I get a, x, b, y, c. 683 00:34:50,600 --> 00:34:52,110 So they're the same. 684 00:34:52,110 --> 00:34:54,139 So it still has BST ordering. 685 00:34:54,139 --> 00:34:55,330 You can check more formally. 686 00:34:55,330 --> 00:34:57,570 b has all the nodes between x and y. 687 00:34:57,570 --> 00:35:01,400 Still all the nodes between x and y, and so on. 688 00:35:01,400 --> 00:35:03,840 You can check it at home, but this works. 689 00:35:03,840 --> 00:35:08,256 We call it a left rotate because the root moves to the left. 690 00:35:08,256 --> 00:35:10,130 You can go straight back where you came from. 691 00:35:10,130 --> 00:35:12,310 This would be a right rotate of y. 692 00:35:17,440 --> 00:35:19,580 OK, it's a reversible operation. 
693 00:35:19,580 --> 00:35:21,190 It lets you manipulate the tree. 694 00:35:21,190 --> 00:35:22,970 So when we have this picture and we're 695 00:35:22,970 --> 00:35:26,450 really sad because this looks like a mess, what 696 00:35:26,450 --> 00:35:27,810 we'd like to do is fix it. 697 00:35:27,810 --> 00:35:31,050 This is a path of three nodes. 698 00:35:31,050 --> 00:35:34,445 We'd really prefer it to look like this. 699 00:35:34,445 --> 00:35:37,660 If we could make that transformation, we'd be happy. 700 00:35:37,660 --> 00:35:38,740 And we can. 701 00:35:38,740 --> 00:35:44,085 It is a right rotate of 29. 702 00:35:44,085 --> 00:35:45,460 So that's what we're going to do. 703 00:36:01,810 --> 00:36:03,700 So let me quickly copy. 704 00:36:14,690 --> 00:36:16,810 I want to rotate 29 to the right, which 705 00:36:16,810 --> 00:36:18,930 means 29 and 26-- this is x. 706 00:36:18,930 --> 00:36:20,030 This is y. 707 00:36:20,030 --> 00:36:24,250 I turn them, and so I get 26 here now, 708 00:36:24,250 --> 00:36:27,100 and 29 is the new right child. 709 00:36:27,100 --> 00:36:28,600 And then whatever was the left child 710 00:36:28,600 --> 00:36:31,570 of x becomes the left child of x in the picture. 711 00:36:31,570 --> 00:36:32,640 You can check it. 712 00:36:32,640 --> 00:36:34,790 So this used to be the triangle a. 713 00:36:34,790 --> 00:36:38,830 And in this case, it's just the node 23. 714 00:36:38,830 --> 00:36:40,385 And we are happy. 715 00:36:40,385 --> 00:36:42,470 Except I didn't draw the whole tree. 716 00:36:42,470 --> 00:36:46,250 Now we're happy because we have an AVL tree again. 717 00:36:46,250 --> 00:36:48,250 Good news. 718 00:36:48,250 --> 00:36:50,310 So just check. 719 00:36:50,310 --> 00:36:50,970 This is even. 720 00:36:50,970 --> 00:36:52,000 This is right heavy. 721 00:36:52,000 --> 00:36:52,970 This is even. 722 00:36:52,970 --> 00:36:56,360 This is left heavy still. 723 00:36:56,360 --> 00:37:01,530 This is left heavy, even, even, even. 
724 00:37:01,530 --> 00:37:07,325 OK, so now we have an AVL tree and our beauty is restored. 725 00:37:07,325 --> 00:37:08,640 I'll do one more example. 726 00:37:15,830 --> 00:37:16,755 Insert 55. 727 00:37:23,090 --> 00:37:24,690 We want to insert 55 here. 728 00:37:27,260 --> 00:37:29,850 And what changes is now this is even. 729 00:37:29,850 --> 00:37:32,990 This is right heavy. 730 00:37:32,990 --> 00:37:35,180 This is doubly left heavy. 731 00:37:35,180 --> 00:37:36,500 We're super sad. 732 00:37:36,500 --> 00:37:39,320 And then we don't look above that until later. 733 00:37:43,890 --> 00:37:48,240 This is more annoying, because you 734 00:37:48,240 --> 00:37:52,570 look at this thing, this little path. 735 00:37:52,570 --> 00:37:55,360 It's a zigzag path, if you will. 736 00:37:55,360 --> 00:37:58,370 If I do a right rotation where this is x and this 737 00:37:58,370 --> 00:38:03,260 is y, what I'll get is x, y, and then this is b. 738 00:38:03,260 --> 00:38:06,090 This is what's in between x and y. 739 00:38:06,090 --> 00:38:08,510 And so it'll go here. 740 00:38:08,510 --> 00:38:11,510 And now it's a zag zig path, which is no better. 741 00:38:11,510 --> 00:38:13,390 The height's the same. 742 00:38:13,390 --> 00:38:14,040 And we're sad. 743 00:38:17,910 --> 00:38:19,660 I told you, though, that somehow rotations 744 00:38:19,660 --> 00:38:20,760 are all we need to do. 745 00:38:24,086 --> 00:38:25,050 What can I do? 746 00:38:28,130 --> 00:38:31,140 How could I fix this little zigzag? 747 00:38:31,140 --> 00:38:33,110 Just need to think about those three nodes, 748 00:38:33,110 --> 00:38:35,900 but all I give you are rotations. 749 00:38:35,900 --> 00:38:38,400 AUDIENCE: Perhaps rotate 50. 750 00:38:38,400 --> 00:38:39,620 PROFESSOR: Maybe rotate 50. 751 00:38:39,620 --> 00:38:41,130 That seems like a good idea. 752 00:38:41,130 --> 00:38:41,808 Let's try it. 
753 00:38:44,440 --> 00:38:46,420 If you don't mind, I'm just going to write 41, 754 00:38:46,420 --> 00:38:48,790 and then there's all the stuff on the left. 755 00:38:48,790 --> 00:38:49,990 Now we rotate 50. 756 00:38:49,990 --> 00:38:53,190 So 65 remains where it is. 757 00:38:53,190 --> 00:38:55,810 And we rotate 50 to the left. 758 00:38:55,810 --> 00:38:57,050 So 50 and its child. 759 00:38:57,050 --> 00:38:57,550 This is x. 760 00:38:57,550 --> 00:38:59,320 This is y. 761 00:38:59,320 --> 00:39:04,735 And so I get 55 and I get 50. 762 00:39:07,800 --> 00:39:09,710 Now, this is bad from an AVL perspective. 763 00:39:09,710 --> 00:39:13,470 This is still doubly left heavy, this is left heavy, 764 00:39:13,470 --> 00:39:15,820 and this is even. 765 00:39:15,820 --> 00:39:18,690 But it looks like this case. 766 00:39:18,690 --> 00:39:22,460 And so now I can do a right rotation on 65, 767 00:39:22,460 --> 00:39:27,090 and I will get-- so let me order the diagrams here. 768 00:39:27,090 --> 00:39:31,480 I do a right rotate on 65, and I will get 41. 769 00:39:31,480 --> 00:39:34,260 And to the right I get 55. 770 00:39:34,260 --> 00:39:37,012 And to the right I get 65. 771 00:39:37,012 --> 00:39:38,805 To the left I get 50. 772 00:39:41,930 --> 00:39:44,600 And then I get the left subtree. 773 00:39:44,600 --> 00:39:48,400 And so now this is even, even, even. 774 00:39:48,400 --> 00:39:49,950 Wow. 775 00:39:49,950 --> 00:39:51,140 How high was left subtree? 776 00:39:51,140 --> 00:39:54,780 I think it's still left heavy. 777 00:39:54,780 --> 00:39:55,850 Cool. 778 00:39:55,850 --> 00:39:58,600 This is what some people call double rotation, 779 00:39:58,600 --> 00:40:01,370 but I like to call it two rotations. 780 00:40:01,370 --> 00:40:03,687 It's whatever you prefer. 781 00:40:03,687 --> 00:40:05,020 It's not really a new operation. 782 00:40:05,020 --> 00:40:06,790 It's just doing two rotations. 783 00:40:06,790 --> 00:40:08,010 So that's an example. 
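The zigzag fix from this example can be replayed on the three nodes involved. This is a sketch under the same assumptions as any simple pointer-based BST: the `Node` class and the rotation helpers are illustrative names, not the lecture's code, and only the subtree rooted at 65 is modeled.

```python
class Node:
    """Hypothetical BST node without parent pointers."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def left_rotate(x):
    """Rotate x down to the left; return the new subtree root."""
    y = x.right
    x.right = y.left
    y.left = x
    return y


def right_rotate(y):
    """Rotate y down to the right; return the new subtree root."""
    x = y.left
    y.left = x.right
    x.right = y
    return x


# Subtree at 65 after inserting 55: 65 has left child 50, 50 has right child 55.
n65 = Node(65, left=Node(50, right=Node(55)))

# First rotation: left rotate 50, turning the zigzag into a straight path.
n65.left = left_rotate(n65.left)

# Second rotation: right rotate 65, which balances the three nodes.
root = right_rotate(n65)
print(root.key, root.left.key, root.right.key)  # prints: 55 50 65
```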
784 00:40:08,010 --> 00:40:09,460 Let's do the general case. 785 00:40:09,460 --> 00:40:11,164 It's no harder. 786 00:40:11,164 --> 00:40:13,330 You might say, oh, gosh, why do you do two examples? 787 00:40:13,330 --> 00:40:14,750 Well, because they were different. 788 00:40:14,750 --> 00:40:16,541 And there are two cases in the algorithm. 789 00:40:16,541 --> 00:40:18,190 You need to know both of them. 790 00:40:18,190 --> 00:40:21,161 OK, so AVL insert. 791 00:40:21,161 --> 00:40:21,660 Here we go. 792 00:40:21,660 --> 00:40:22,700 Fix AVL property. 793 00:40:33,459 --> 00:40:42,500 I'm just going to call this from the changed node up. 794 00:40:42,500 --> 00:40:44,900 So the one thing that's missing from these examples 795 00:40:44,900 --> 00:40:48,160 is that you might have to do more than two rotations. 796 00:40:48,160 --> 00:40:51,090 What we did was look at the lowest violation of the AVL 797 00:40:51,090 --> 00:40:52,524 property and we fixed it. 798 00:40:52,524 --> 00:40:53,940 When we do that, there still may 799 00:40:53,940 --> 00:40:58,840 be violations higher up, because when you add a node, 800 00:40:58,840 --> 00:41:00,620 you change the height of this subtree, 801 00:41:00,620 --> 00:41:02,911 the height of this subtree, the height of this subtree, 802 00:41:02,911 --> 00:41:04,840 and the height of this subtree, potentially. 803 00:41:04,840 --> 00:41:07,170 What happened in these cases when I was done, 804 00:41:07,170 --> 00:41:08,667 what I did fixed one violation. 805 00:41:08,667 --> 00:41:09,500 They were all fixed. 806 00:41:09,500 --> 00:41:13,920 But in general, there might be several violations up the tree. 807 00:41:13,920 --> 00:41:16,510 So that's what we do. 808 00:41:20,440 --> 00:41:22,000 Yeah, I'll leave it at that. 809 00:41:25,510 --> 00:41:33,785 So suppose x is the lowest node that is not AVL.
810 00:41:40,879 --> 00:41:42,920 The way we find that node is we start at the node 811 00:41:42,920 --> 00:41:44,040 that we changed. 812 00:41:44,040 --> 00:41:45,800 We check if that's OK. 813 00:41:45,800 --> 00:41:50,900 We update the heights as we go up using our simple rule. 814 00:41:50,900 --> 00:41:55,120 And that's actually not our simple rule, but it's erased. 815 00:41:55,120 --> 00:41:57,625 We update the height based on the heights of its children. 816 00:41:57,625 --> 00:41:59,000 And you keep walking up until you 817 00:41:59,000 --> 00:42:04,010 see, oh, the left is twice, two times-- or not two times, 818 00:42:04,010 --> 00:42:07,870 but plus 2 larger than the right, or vice versa. 819 00:42:07,870 --> 00:42:10,230 Then you say, oh, that's bad. 820 00:42:10,230 --> 00:42:12,980 And so we fix it. 821 00:42:12,980 --> 00:42:14,060 Yeah, question. 822 00:42:14,060 --> 00:42:16,895 AUDIENCE: So here we continue to [INAUDIBLE]. 823 00:42:16,895 --> 00:42:17,520 PROFESSOR: Yes. 824 00:42:17,520 --> 00:42:18,436 AUDIENCE: [INAUDIBLE]. 825 00:42:23,102 --> 00:42:27,476 add n to the level [INAUDIBLE] than 1. 826 00:42:27,476 --> 00:42:30,890 So add [INAUDIBLE]. 827 00:42:30,890 --> 00:42:34,250 PROFESSOR: AVL property's not about levels. 828 00:42:34,250 --> 00:42:37,000 It's about left subtrees and right subtrees. 829 00:42:37,000 --> 00:42:39,720 So the trouble is that 65-- you have a left subtree, which 830 00:42:39,720 --> 00:42:44,250 has height 2-- or sorry, height 1, I guess-- 831 00:42:44,250 --> 00:42:47,380 because the longest path from here to a leaf is 1. 832 00:42:47,380 --> 00:42:49,630 The right subtree has height negative 1 833 00:42:49,630 --> 00:42:50,750 because it doesn't exist. 834 00:42:50,750 --> 00:42:51,810 So it's one versus negative 1. 835 00:42:51,810 --> 00:42:53,351 So that's why there's a double arrow. 836 00:42:53,351 --> 00:42:54,140 Yeah, good to ask. 837 00:42:54,140 --> 00:42:56,030 It's weird with the negative 1s.
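The negative 1 convention in this answer is easy to see in code. The sketch below is illustrative only: the `Node` class and the `balance` helper are hypothetical, the sign convention (right height minus left height) is an assumption, and heights are recomputed recursively here for brevity, whereas the lecture stores a height in each node.

```python
class Node:
    """Hypothetical BST node."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Longest downward path to a leaf; a missing subtree has height -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def balance(node):
    """Right height minus left height (assumed sign convention)."""
    return height(node.right) - height(node.left)


# Node 65 after inserting 55: left child 50 with right child 55, no right child.
n65 = Node(65, left=Node(50, right=Node(55)))
print(height(n65.left), height(n65.right), balance(n65))  # prints: 1 -1 -2
```

A difference of magnitude 2 is exactly the double arrow drawn on the board: 65 is doubly left heavy.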
838 00:42:56,030 --> 00:42:58,560 That's also why I wanted to define those negative 1s to be 839 00:42:58,560 --> 00:43:02,420 there, so the AVL property is easier to state. 840 00:43:02,420 --> 00:43:05,180 Other questions? 841 00:43:05,180 --> 00:43:07,570 All right. 842 00:43:07,570 --> 00:43:08,070 Good. 843 00:43:08,070 --> 00:43:10,810 I think I want a symmetry assumption here. 844 00:43:21,750 --> 00:43:24,050 I don't know why I wrote right of x. 845 00:43:24,050 --> 00:43:28,810 I guess in modern days we write x dot right. 846 00:43:28,810 --> 00:43:31,020 Same thing. 847 00:43:31,020 --> 00:43:34,090 OK, I'm going to assume that the right child is the heavier 848 00:43:34,090 --> 00:43:37,350 one like we did before. 849 00:43:37,350 --> 00:43:38,120 Could be the left. 850 00:43:38,120 --> 00:43:39,431 It's symmetric. 851 00:43:39,431 --> 00:43:40,180 It doesn't matter. 852 00:43:56,950 --> 00:43:58,700 So now there are two cases, like I said. 853 00:44:14,570 --> 00:44:16,240 I'm going to use this term right heavy 854 00:44:16,240 --> 00:44:17,490 because it's super convenient. 855 00:44:22,735 --> 00:44:24,110 OK, right heavy is what I've been 856 00:44:24,110 --> 00:44:26,040 drawing by a descending right arrow. 857 00:44:26,040 --> 00:44:29,280 Balanced is what I've been drawing by a horizontal line. 858 00:44:29,280 --> 00:44:32,440 OK, so we're just distinguishing between these two cases. 859 00:44:32,440 --> 00:44:36,000 This turns out to be the easy case. 860 00:44:36,000 --> 00:44:43,227 So we have x, y, a, b, c. 861 00:44:43,227 --> 00:44:44,810 Why are we looking at the right child? 862 00:44:44,810 --> 00:44:47,730 Because we assumed that the right one is higher, so that x 863 00:44:47,730 --> 00:44:49,380 was right heavy. 864 00:44:49,380 --> 00:44:52,190 So this subtree as I've drawn it is higher than the left one 865 00:44:52,190 --> 00:44:55,040 by 2, in fact. 866 00:44:55,040 --> 00:44:59,800 And what we do in this case is left rotate of x.
867 00:44:59,800 --> 00:45:07,162 And so we get x, y, a, b, c. 868 00:45:07,162 --> 00:45:09,370 I could have drawn this no matter what case we're in, 869 00:45:09,370 --> 00:45:12,150 so we need to check this actually works. 870 00:45:12,150 --> 00:45:13,350 That's the interesting part. 871 00:45:13,350 --> 00:45:15,170 And that's over here. 872 00:45:17,700 --> 00:45:21,450 OK, so I said x is right heavy, in fact doubly so. 873 00:45:21,450 --> 00:45:25,250 y is either right heavy or balanced. 874 00:45:25,250 --> 00:45:28,260 Let's start with right heavy. 875 00:45:28,260 --> 00:45:33,710 So when we do this rotation, what happens to the heights? 876 00:45:33,710 --> 00:45:39,431 Well, it's hard to tell. 877 00:45:39,431 --> 00:45:41,930 It's a lot easier to think about what the actual heights are 878 00:45:41,930 --> 00:45:43,440 than just these arrows. 879 00:45:43,440 --> 00:45:45,030 So let's suppose x has height k. 880 00:45:45,030 --> 00:45:46,070 That's pretty generic. 881 00:45:48,640 --> 00:45:50,230 And it's right heavy, so that means 882 00:45:50,230 --> 00:45:54,450 y has height k minus 1. 883 00:45:54,450 --> 00:45:58,620 And then this is right heavy, so this has height k minus 2. 884 00:45:58,620 --> 00:46:01,466 And this is something smaller than k minus 2. 885 00:46:01,466 --> 00:46:03,867 In fact, because this is AVL, we assume 886 00:46:03,867 --> 00:46:05,450 that x was the lowest that is not AVL. 887 00:46:05,450 --> 00:46:07,500 So y is AVL. 888 00:46:07,500 --> 00:46:10,920 And so this is going to be k minus 3, 889 00:46:10,920 --> 00:46:15,155 and this is going to be k minus 3 because these differ by 2. 890 00:46:15,155 --> 00:46:17,030 You can prove by a simple induction you never 891 00:46:17,030 --> 00:46:21,310 get more than 2 out of whack because we're just adding 1, 892 00:46:21,310 --> 00:46:22,470 off by 1. 893 00:46:22,470 --> 00:46:23,900 So we got off by 2. 894 00:46:23,900 --> 00:46:25,280 So this is the bad situation.
895 00:46:25,280 --> 00:46:27,710 Now we can just update the heights over here. 896 00:46:27,710 --> 00:46:32,770 So k minus 3 for a, k minus 3 for b, k minus 2 for c. 897 00:46:32,770 --> 00:46:35,420 Those don't change because we didn't touch those trees, 898 00:46:35,420 --> 00:46:38,050 and height is about going down, not up. 899 00:46:38,050 --> 00:46:43,190 And so this becomes k minus 2, and this becomes k minus 1. 900 00:46:43,190 --> 00:46:45,570 And so we changed the height of the root, 901 00:46:45,570 --> 00:46:47,460 but now you can see that life is good. 902 00:46:47,460 --> 00:46:50,280 This is now balanced between k minus 3 and k minus 3. 903 00:46:50,280 --> 00:46:53,810 This is now balanced between k minus 2 and k minus 2. 904 00:46:53,810 --> 00:46:56,290 And now the parent of y may be messed up, 905 00:46:56,290 --> 00:47:00,050 and that's why after this we go to the parent of y, 906 00:47:00,050 --> 00:47:02,300 see if it's messed up, but keep working our way up. 907 00:47:04,830 --> 00:47:05,520 But it worked. 908 00:47:08,220 --> 00:47:10,100 And in the interest of time, I will not 909 00:47:10,100 --> 00:47:12,790 check the case where y is balanced, 910 00:47:12,790 --> 00:47:14,400 but it works out, too. 911 00:47:14,400 --> 00:47:16,165 And see the notes. 912 00:47:18,670 --> 00:47:53,160 So the other case is where we do two rotations. 913 00:47:53,160 --> 00:47:58,670 And in general, so here x was doubly right heavy. 914 00:47:58,670 --> 00:48:03,010 And the else case is when the right child 915 00:48:03,010 --> 00:48:06,850 of x, which I'm going to call z here, is left heavy. 916 00:48:06,850 --> 00:48:09,590 That's the one remaining situation. 
917 00:48:09,590 --> 00:48:11,620 You do the same thing, and you check 918 00:48:11,620 --> 00:48:13,560 that right rotating and left rotating, which 919 00:48:13,560 --> 00:48:18,210 makes the nice picture, which is x, y, z, 920 00:48:18,210 --> 00:48:22,530 actually balances everything and you restore the AVL property. 921 00:48:22,530 --> 00:48:26,670 So again, check the notes on that. 922 00:48:26,670 --> 00:48:29,375 I have a couple minutes left, and instead I'd 923 00:48:29,375 --> 00:48:31,000 like to tell you a little bit about how 924 00:48:31,000 --> 00:48:32,680 this fits into big-picture land. 925 00:48:38,107 --> 00:48:39,440 Two things I want to talk about. 926 00:48:39,440 --> 00:48:43,180 One is you could use this, of course, 927 00:48:43,180 --> 00:48:48,670 to sort, which is, if you want to sort n numbers, 928 00:48:48,670 --> 00:48:54,900 you insert them and you do in-order traversal. 929 00:48:58,740 --> 00:49:01,060 How long does this take? 930 00:49:01,060 --> 00:49:05,370 In-order traversal takes linear time. 931 00:49:05,370 --> 00:49:08,370 That's the sense in which we're storing things in sorted order. 932 00:49:08,370 --> 00:49:12,900 Inserting n items-- well, each insert takes h time, 933 00:49:12,900 --> 00:49:15,140 but now we're guaranteed that h is order log n. 934 00:49:15,140 --> 00:49:20,460 So all the insertions take log n time each, n log n total. 935 00:49:20,460 --> 00:49:23,390 So this is yet another way to sort n items in n log n time, 936 00:49:23,390 --> 00:49:26,800 in some ways the most powerful way. 937 00:49:26,800 --> 00:49:29,400 We've seen heaps, and we've seen merge sort. 938 00:49:29,400 --> 00:49:31,080 They all sort. 939 00:49:31,080 --> 00:49:35,710 Heaps let you do two operations, insert and delete min, which 940 00:49:35,710 --> 00:49:39,750 a lot of times is all you care about, like in p set two. 
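Putting the pieces together, AVL insertion and the sort built on it can be sketched as follows. This is a minimal sketch under stated assumptions, not the code that accompanies the lecture: the names (`Node`, `rebalance`, `insert`, `avl_sort`) are illustrative, the insert is recursive so the fix-up runs on the way back up from the changed node instead of following parent pointers, and equal keys are sent to the right.

```python
class Node:
    """Hypothetical AVL node that stores its own height."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 0  # a new leaf has height 0


def height(node):
    """Stored height, with -1 for a missing subtree."""
    return node.height if node else -1


def update(node):
    """Recompute a node's height from the heights of its children."""
    node.height = 1 + max(height(node.left), height(node.right))


def left_rotate(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)  # x is now below y, so update it first
    update(y)
    return y


def right_rotate(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y)  # y is now below x, so update it first
    update(x)
    return x


def rebalance(x):
    """Restore the AVL property at x; return the new subtree root."""
    update(x)
    if height(x.right) >= height(x.left) + 2:  # x is doubly right heavy
        if height(x.right.left) > height(x.right.right):
            x.right = right_rotate(x.right)  # right child is left heavy
        return left_rotate(x)
    if height(x.left) >= height(x.right) + 2:  # symmetric case
        if height(x.left.right) > height(x.left.left):
            x.left = left_rotate(x.left)  # left child is right heavy
        return right_rotate(x)
    return x


def insert(node, key):
    """Simple BST insert, then fix the AVL property on the way back up."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return rebalance(node)


def in_order(node, out):
    """Append the keys of the subtree to out in sorted order."""
    if node:
        in_order(node.left, out)
        out.append(node.key)
        in_order(node.right, out)
    return out


def avl_sort(items):
    """Insert every item, then read the keys back with an in-order traversal."""
    root = None
    for item in items:
        root = insert(root, item)
    return in_order(root, [])


print(avl_sort([41, 20, 65, 11, 29, 50, 26, 23, 55]))
```

Each insert walks one root-to-leaf path and does constant work per node, so it costs O(log n) once the height is bounded, and the traversal is linear, which gives the n log n total discussed here.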
941 00:49:39,750 --> 00:49:42,920 But these guys, AVL trees, let you 942 00:49:42,920 --> 00:49:47,130 do insert, delete, and delete min. 943 00:49:47,130 --> 00:49:49,360 So they're the same in those senses, 944 00:49:49,360 --> 00:49:51,420 but we have the new operation, which 945 00:49:51,420 --> 00:49:56,570 is that we can do find next larger and next smaller, aka 946 00:49:56,570 --> 00:49:59,080 successor and predecessor. 947 00:49:59,080 --> 00:50:06,120 So you can think about what we call an abstract data type. 948 00:50:06,120 --> 00:50:08,870 These are the operations that you support, 949 00:50:08,870 --> 00:50:11,260 or that you're supposed to support. 950 00:50:11,260 --> 00:50:14,344 If you're into Java, you call this an interface. 951 00:50:14,344 --> 00:50:16,010 But this is an algorithmic specification 952 00:50:16,010 --> 00:50:18,630 of what your data structure is supposed to do. 953 00:50:18,630 --> 00:50:24,080 So we have operations like insert and delete. 954 00:50:24,080 --> 00:50:26,632 We have operations like find the min 955 00:50:26,632 --> 00:50:30,840 and things like successor and predecessor, 956 00:50:30,840 --> 00:50:34,310 or next larger, next smaller. 957 00:50:34,310 --> 00:50:38,000 You can take any subset of these and it's an abstract data type. 958 00:50:38,000 --> 00:50:41,492 Insert, delete, and min is called a priority queue. 959 00:50:41,492 --> 00:50:43,440 So if you just take these first two, 960 00:50:43,440 --> 00:50:46,500 it's called a priority queue. 961 00:50:46,500 --> 00:50:49,140 And there are many priority queues. 962 00:50:49,140 --> 00:50:52,310 This is a generic thing that you might want to do. 963 00:50:52,310 --> 00:50:55,070 And then the data structure on the other side 964 00:50:55,070 --> 00:50:57,020 is how you actually do it. 965 00:50:57,020 --> 00:51:00,040 This is the analog of the algorithm. 966 00:51:00,040 --> 00:51:01,370 OK, this is the specification. 
967 00:51:01,370 --> 00:51:02,580 You want a priority queue. 968 00:51:02,580 --> 00:51:04,700 One way to do it is a heap. 969 00:51:04,700 --> 00:51:08,220 Another way to do it is an AVL tree. 970 00:51:08,220 --> 00:51:09,720 You could do it with a sorted array. 971 00:51:09,720 --> 00:51:12,230 You could do lots of sub-optimal things, too, 972 00:51:12,230 --> 00:51:14,950 but in particular, heaps get these two operations. 973 00:51:14,950 --> 00:51:17,770 If you want all three, you basically 974 00:51:17,770 --> 00:51:20,010 need a balanced binary search tree. 975 00:51:23,530 --> 00:51:26,010 There are probably a dozen balanced binary search trees 976 00:51:26,010 --> 00:51:28,680 out there, at least a dozen balanced search trees, 977 00:51:28,680 --> 00:51:30,370 not all binary. 978 00:51:30,370 --> 00:51:31,760 They all achieve log n. 979 00:51:31,760 --> 00:51:32,990 So it doesn't really matter. 980 00:51:32,990 --> 00:51:35,630 There are various practical issues, constant factors, 981 00:51:35,630 --> 00:51:36,680 things like that. 982 00:51:36,680 --> 00:51:39,260 The main reason you prefer a heap is that it's in place. 983 00:51:39,260 --> 00:51:40,795 It doesn't use any extra space. 984 00:51:40,795 --> 00:51:42,670 Here, you've got pointers all over the place. 985 00:51:42,670 --> 00:51:46,080 You lose a constant factor in space. 986 00:51:46,080 --> 00:51:47,680 But from a theoretical standpoint, 987 00:51:47,680 --> 00:51:49,388 if you don't care about constant factors, 988 00:51:49,388 --> 00:51:53,760 AVL trees are really good because they get everything 989 00:51:53,760 --> 00:51:56,680 that we've seen so far in log n. 990 00:51:56,680 --> 00:51:59,490 And I'll stop there.
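To make the split between abstract data type and data structure concrete, here is a small sketch of two implementations sitting behind overlapping sets of operations. The class names `HeapPQ` and `SortedArraySet` are invented for this illustration. The first uses Python's standard `heapq` module and supports only the priority-queue operations. The second uses a sorted Python list with `bisect`, one of the options mentioned above; it adds successor and predecessor, but its inserts and deletes cost linear time because elements must shift, whereas a balanced binary search tree would do each in log n.

```python
import bisect
import heapq


class HeapPQ:
    """Priority queue on a binary heap: insert and delete-min only."""

    def __init__(self):
        self._heap = []

    def insert(self, key):
        heapq.heappush(self._heap, key)      # O(log n)

    def delete_min(self):
        return heapq.heappop(self._heap)     # O(log n)


class SortedArraySet:
    """Sorted-list version that also answers successor/predecessor.

    Lookups are O(log n) by binary search, but insert and delete are
    O(n) because of shifting. This is a stand-in for illustration,
    not a balanced search tree.
    """

    def __init__(self):
        self._keys = []

    def insert(self, key):
        bisect.insort(self._keys, key)

    def delete(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            raise KeyError(key)
        del self._keys[i]

    def find_min(self):
        return self._keys[0]

    def delete_min(self):
        return self._keys.pop(0)

    def successor(self, key):
        """Smallest stored key strictly greater than key, or None."""
        i = bisect.bisect_right(self._keys, key)
        return self._keys[i] if i < len(self._keys) else None

    def predecessor(self, key):
        """Largest stored key strictly smaller than key, or None."""
        i = bisect.bisect_left(self._keys, key)
        return self._keys[i - 1] if i > 0 else None


s = SortedArraySet()
for k in [41, 20, 65, 11, 29]:
    s.insert(k)
print(s.successor(29), s.predecessor(29))
```

Code written against only `insert` and `delete_min` can use either class unchanged; only code that needs successor or predecessor is forced onto the ordered structure.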