1 00:00:00,090 --> 00:00:01,770 The following content is provided 2 00:00:01,770 --> 00:00:04,000 under a Creative Commons license. 3 00:00:04,000 --> 00:00:06,860 Your support will help MIT OpenCourseWare continue 4 00:00:06,860 --> 00:00:10,720 to offer high quality educational resources for free. 5 00:00:10,720 --> 00:00:13,320 To make a donation or view additional materials 6 00:00:13,320 --> 00:00:17,207 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,207 --> 00:00:17,832 at ocw.mit.edu. 8 00:00:21,742 --> 00:00:23,450 PROFESSOR: Are people understanding AVLs? 9 00:00:27,600 --> 00:00:29,960 That's good, because if everyone raised their hands, 10 00:00:29,960 --> 00:00:31,690 we'd be done and out of here. 11 00:00:31,690 --> 00:00:34,270 So we're not covering new material in this recitation. 12 00:00:34,270 --> 00:00:37,320 We're talking about AVLs again, except this time 13 00:00:37,320 --> 00:00:39,210 you'll tell me how we'll do them, 14 00:00:39,210 --> 00:00:42,140 and we'll look at the code and see 15 00:00:42,140 --> 00:00:44,780 how the theory matches the code. 16 00:00:44,780 --> 00:00:47,150 And I need one of these. 17 00:00:47,150 --> 00:00:47,800 Thank you. 18 00:00:52,344 --> 00:00:53,260 So let's start simple. 19 00:00:53,260 --> 00:00:55,610 What's a BST? 20 00:00:55,610 --> 00:00:57,110 AUDIENCE: Binary search tree. 21 00:01:00,110 --> 00:01:02,930 PROFESSOR: OK, binary search tree. 22 00:01:02,930 --> 00:01:06,080 It's binary because every node has at most two children. 23 00:01:06,080 --> 00:01:07,686 Why is it a search tree? 24 00:01:07,686 --> 00:01:09,627 AUDIENCE: It's easy to search. 25 00:01:09,627 --> 00:01:10,210 PROFESSOR: OK. 26 00:01:10,210 --> 00:01:11,279 Why is it easy to search? 27 00:01:11,279 --> 00:01:12,945 AUDIENCE: Because you start [INAUDIBLE], 28 00:01:12,945 --> 00:01:15,195 is my number greater than this key or is it less than, 29 00:01:15,195 --> 00:01:16,410 and then you go from there. 30 00:01:16,410 --> 00:01:16,993 PROFESSOR: OK. 31 00:01:16,993 --> 00:01:20,600 So if I would have to state this as a formal rep invariant 32 00:01:20,600 --> 00:01:24,016 thing, what would I say it is so that I can do that operation? 33 00:01:24,016 --> 00:01:26,268 AUDIENCE: Node at right is greater than 34 00:01:26,268 --> 00:01:30,910 node at key, which is greater than node at left. 35 00:01:30,910 --> 00:01:32,140 PROFESSOR: OK, excellent. 36 00:01:32,140 --> 00:01:36,190 So it turns out I can use this argument recursively 37 00:01:36,190 --> 00:01:40,830 to say that for a given node, everything that 38 00:01:40,830 --> 00:01:44,740 is to the right of that node is greater, 39 00:01:44,740 --> 00:01:48,710 and everything that is to the left is smaller. 40 00:01:52,450 --> 00:01:54,480 And this allows us to do search quickly 41 00:01:54,480 --> 00:01:57,180 because if you're looking for a key, 42 00:01:57,180 --> 00:02:00,480 say you have numbers in your tree and you're looking for 5. 43 00:02:00,480 --> 00:02:03,770 If you arrive at a node whose value is 4, you go right. 44 00:02:03,770 --> 00:02:07,820 If you arrive at a node whose value is 7, you go left. 45 00:02:07,820 --> 00:02:11,420 OK So what do we know how to do on BSTs? 46 00:02:11,420 --> 00:02:14,325 What are the operations that we know? 47 00:02:14,325 --> 00:02:17,580 AUDIENCE: Minimum, maximum. 48 00:02:17,580 --> 00:02:18,965 PROFESSOR: Min, max. 49 00:02:22,100 --> 00:02:24,210 AUDIENCE: Left, right, parent. 50 00:02:24,210 --> 00:02:26,640 PROFESSOR: So those are for the node. 51 00:02:26,640 --> 00:02:30,300 I want the queries and the updates for the BST type. 52 00:02:30,300 --> 00:02:31,717 AUDIENCE: Insert and delete. 53 00:02:31,717 --> 00:02:32,300 PROFESSOR: OK. 54 00:02:35,030 --> 00:02:39,052 Insert, Delete. 55 00:02:39,052 --> 00:02:40,036 AUDIENCE: Next larger. 56 00:02:42,897 --> 00:02:43,480 PROFESSOR: OK. 57 00:02:52,340 --> 00:02:55,026 And then in here, the S in BST. 58 00:02:55,026 --> 00:02:57,626 Someone give me the S so we can move on. 59 00:02:57,626 --> 00:02:58,800 AUDIENCE: Find. 60 00:02:58,800 --> 00:03:03,254 PROFESSOR: Find, search, depending 61 00:03:03,254 --> 00:03:04,420 on what code you're reading. 62 00:03:07,070 --> 00:03:11,260 What's the running time for these guys? 63 00:03:11,260 --> 00:03:12,550 AUDIENCE: Order h. 64 00:03:12,550 --> 00:03:14,220 PROFESSOR: Excellent. 65 00:03:14,220 --> 00:03:16,350 So everything has the same running time. 66 00:03:16,350 --> 00:03:19,030 Nice and easy to remember. 67 00:03:19,030 --> 00:03:19,900 Order h. 68 00:03:19,900 --> 00:03:21,636 What's h? 69 00:03:21,636 --> 00:03:23,930 AUDIENCE: [INAUDIBLE]. 70 00:03:23,930 --> 00:03:25,190 PROFESSOR: OK. 71 00:03:25,190 --> 00:03:27,120 How do we define this height? 72 00:03:27,120 --> 00:03:31,269 What's the height for this tree? 73 00:03:31,269 --> 00:03:32,185 AUDIENCE: [INAUDIBLE]. 74 00:03:35,460 --> 00:03:36,720 PROFESSOR: Good. 75 00:03:36,720 --> 00:03:38,880 What's the height for this tree? 76 00:03:43,460 --> 00:03:48,883 What's the height for this tree? 77 00:03:48,883 --> 00:03:49,466 AUDIENCE: Two. 78 00:03:52,600 --> 00:03:55,020 PROFESSOR: So now for a more general case, 79 00:03:55,020 --> 00:04:00,320 where this is the height, and the height of my left subtree 80 00:04:00,320 --> 00:04:00,880 is hl. 81 00:04:00,880 --> 00:04:04,940 The height of my right subtree is hr. 82 00:04:04,940 --> 00:04:06,672 What's h? 83 00:04:06,672 --> 00:04:11,001 AUDIENCE: It's the maximum of hr times hl plus 1. 84 00:04:16,785 --> 00:04:17,660 PROFESSOR: Nice save. 85 00:04:17,660 --> 00:04:19,118 I heard a "plus 1" somewhere there. 86 00:04:22,290 --> 00:04:24,480 Very good. 87 00:04:24,480 --> 00:04:25,070 This is r. 88 00:04:28,870 --> 00:04:31,060 So if we look at the first part of the code, 89 00:04:31,060 --> 00:04:35,840 lines one through eight, lines seven and eight 90 00:04:35,840 --> 00:04:39,280 implement the definition that we talked about here. 91 00:04:39,280 --> 00:04:41,440 So in our Python implementation, each node 92 00:04:41,440 --> 00:04:44,550 knows the height of the tree that he's the root of. 93 00:04:47,640 --> 00:04:49,900 And since we're storing that, we need 94 00:04:49,900 --> 00:04:52,150 to update it every once in awhile when we make changes 95 00:04:52,150 --> 00:04:54,870 to the tree, like when we insert nodes. 96 00:04:54,870 --> 00:04:56,810 And the way we do that is update height, which 97 00:04:56,810 --> 00:04:59,970 uses the formula that we came up with here. 98 00:04:59,970 --> 00:05:04,120 Now, there is a hack on lines two, three, four, five. 99 00:05:04,120 --> 00:05:06,750 Can anyone tell me what the hack is? 100 00:05:20,989 --> 00:05:22,470 AUDIENCE: The negative 1. 101 00:05:22,470 --> 00:05:23,240 PROFESSOR: OK. 102 00:05:23,240 --> 00:05:24,140 How does that work? 103 00:05:24,140 --> 00:05:24,980 Why do I need it? 104 00:05:27,780 --> 00:05:30,170 AUDIENCE: That's so if you're at the root node, 105 00:05:30,170 --> 00:05:33,510 you can still calculate the height. 106 00:05:33,510 --> 00:05:37,820 PROFESSOR: Depends on how the root node looks like. 107 00:05:37,820 --> 00:05:40,039 AUDIENCE: If it has no children. 108 00:05:40,039 --> 00:05:41,080 PROFESSOR: That's a leaf. 109 00:05:41,080 --> 00:05:41,880 AUDIENCE: Yes. 110 00:05:41,880 --> 00:05:43,670 PROFESSOR: OK. 111 00:05:43,670 --> 00:05:47,390 So if I'm in this case or in this case, what's hl, 112 00:05:47,390 --> 00:05:49,140 what's hr? 113 00:05:49,140 --> 00:05:52,130 In this case, I have this node of height zero, 114 00:05:52,130 --> 00:05:55,400 so I can make a small mistake here and it'll save me, 115 00:05:55,400 --> 00:05:59,990 but here, I have no children, so hl and hr have 116 00:05:59,990 --> 00:06:04,560 to be set in such a way that this formula evaluates to 0. 117 00:06:04,560 --> 00:06:07,890 If I set them to minus 1, I'll have minus 1, minus 1. 118 00:06:07,890 --> 00:06:11,756 The maximum is minus 1 plus 1 equals 0. 119 00:06:11,756 --> 00:06:15,650 AUDIENCE: It's just to check the nodes for AVLs. 120 00:06:15,650 --> 00:06:18,090 PROFESSOR: We use that to update the height. 121 00:06:18,090 --> 00:06:19,950 For AVLs, we need to know the height 122 00:06:19,950 --> 00:06:21,650 of a node instantaneously. 123 00:06:21,650 --> 00:06:23,200 We can't afford to go down the tree 124 00:06:23,200 --> 00:06:25,590 and compute the height every time we need it, 125 00:06:25,590 --> 00:06:29,750 so every node gets to store its height. 126 00:06:29,750 --> 00:06:32,885 There's a small field in each node that has the height. 127 00:06:32,885 --> 00:06:35,010 So we need to update that every once in awhile when 128 00:06:35,010 --> 00:06:36,770 we do insertions and deletions. 129 00:06:36,770 --> 00:06:38,590 This is how we update it, and in order 130 00:06:38,590 --> 00:06:42,320 to update it for this case, where we're at a leaf, 131 00:06:42,320 --> 00:06:45,230 we have to say that the height of a non-existing tree 132 00:06:45,230 --> 00:06:47,516 is minus 1. 133 00:06:47,516 --> 00:06:49,810 Of course, in theory and in real life, 134 00:06:49,810 --> 00:06:52,540 there are no non-existing trees, so this is a clever hack 135 00:06:52,540 --> 00:06:56,470 to reduce code size. 136 00:06:56,470 --> 00:07:00,220 So we said binary search trees would look something like this. 137 00:07:16,408 --> 00:07:17,430 Is this guy an AVL? 138 00:07:24,050 --> 00:07:26,696 Does anyone think it's an a AVL? 139 00:07:26,696 --> 00:07:31,660 Does anyone think it's not an AVL? 140 00:07:31,660 --> 00:07:35,205 Can one of you tell me why? 141 00:07:35,205 --> 00:07:39,300 AUDIENCE: The tree with 5 as its root is not balanced. 142 00:07:43,862 --> 00:07:45,695 PROFESSOR: So this guy here is not balanced. 143 00:07:45,695 --> 00:07:48,670 Why is it not balanced? 144 00:07:48,670 --> 00:07:53,790 AUDIENCE: Because it has two on its right and zero on its left, 145 00:07:53,790 --> 00:07:55,290 so the difference is greater than 1. 146 00:08:00,214 --> 00:08:02,380 PROFESSOR: You're saying that there's something here 147 00:08:02,380 --> 00:08:08,100 that's two, and something here that's-- OK, so height. 148 00:08:08,100 --> 00:08:11,890 So then it's almost like that. 149 00:08:11,890 --> 00:08:17,770 It's 1 here and minus 1 here. 150 00:08:17,770 --> 00:08:19,470 So the reason I asked you to clarify 151 00:08:19,470 --> 00:08:21,300 is that first you said two and zero, 152 00:08:21,300 --> 00:08:23,940 and that's the node count, and AVL 153 00:08:23,940 --> 00:08:25,660 doesn't care about node count. 154 00:08:25,660 --> 00:08:27,650 AVL cares about height. 155 00:08:27,650 --> 00:08:35,949 So for example, if I have something like this, 156 00:08:35,949 --> 00:08:37,619 this is a happy AVL. 157 00:08:37,619 --> 00:08:39,809 Three nodes, here one node here. 158 00:08:39,809 --> 00:08:42,460 The difference in node count is greater than 1, 159 00:08:42,460 --> 00:08:44,200 but the difference in height is 1. 160 00:08:44,200 --> 00:08:47,360 Therefore, this is a good AVL. 161 00:08:47,360 --> 00:08:49,360 So what's a rep invariant for an AVL? 162 00:08:56,024 --> 00:08:58,973 AUDIENCE: The height of the left subtree for every node 163 00:08:58,973 --> 00:09:02,950 is within 1 of the height of the right subtree. 164 00:09:02,950 --> 00:09:03,825 PROFESSOR: Excellent. 165 00:09:06,420 --> 00:09:07,030 AVL. 166 00:09:07,030 --> 00:09:15,050 The core property is that for every node, 167 00:09:15,050 --> 00:09:20,850 the left height and the right height differ by at most 1. 168 00:09:20,850 --> 00:09:22,300 What else? 169 00:09:22,300 --> 00:09:24,560 If we want to be completely rigorous, 170 00:09:24,560 --> 00:09:26,140 what else do we have to say? 171 00:09:28,790 --> 00:09:29,530 It's a BST. 172 00:09:34,170 --> 00:09:36,145 So an AVL is a special kind of BST, 173 00:09:36,145 --> 00:09:38,470 and that's why when we write the AVL code, 174 00:09:38,470 --> 00:09:43,030 we inherit from the BST and we use some of its methods 175 00:09:43,030 --> 00:09:43,530 heavily. 176 00:09:47,660 --> 00:09:48,760 Why do we like AVLs? 177 00:09:48,760 --> 00:09:52,140 What's so good about them? 178 00:09:52,140 --> 00:09:56,140 AUDIENCE: It's faster because it ensures the minimal height 179 00:09:56,140 --> 00:10:00,640 of the entire tree because most of our complexities are o of h, 180 00:10:00,640 --> 00:10:02,670 so it would have to be smaller. 181 00:10:02,670 --> 00:10:05,110 PROFESSOR: So we care about this. 182 00:10:05,110 --> 00:10:07,660 In a regular binary tree, the worst case that you have 183 00:10:07,660 --> 00:10:12,560 is this. 184 00:10:12,560 --> 00:10:15,740 Ignoring this part, this is a worst case binary search tree 185 00:10:15,740 --> 00:10:19,880 where it's basically a list, so height is order n. 186 00:10:19,880 --> 00:10:24,510 In an AVL, you're saying it's better. 187 00:10:24,510 --> 00:10:29,480 So the reason why we care about AVLs is 188 00:10:29,480 --> 00:10:36,570 that height is order of log n. 189 00:10:36,570 --> 00:10:39,265 Now, did people understand from lecture why that's the case? 190 00:10:42,840 --> 00:10:47,660 Can anyone tell me why that's the case? 191 00:10:47,660 --> 00:10:49,588 AUDIENCE: Well, it's just like every level 192 00:10:49,588 --> 00:10:55,715 you go down, if you split off that many times, of course 193 00:10:55,715 --> 00:10:58,260 it's log n. 194 00:10:58,260 --> 00:11:03,690 If you have n nodes and they're filled up to the edge, 195 00:11:03,690 --> 00:11:06,140 there's going to be log n of them. 196 00:11:06,140 --> 00:11:09,230 AUDIENCE: It's close to a full binary tree, right? 197 00:11:09,230 --> 00:11:12,260 PROFESSOR: For some definition of "close." 198 00:11:12,260 --> 00:11:14,010 So here's what I use to remember, 199 00:11:14,010 --> 00:11:18,810 and I think I can persuade you that the height is log n using 200 00:11:18,810 --> 00:11:21,949 the argument that I'll show you here. 201 00:11:21,949 --> 00:11:23,490 Let's start building a tree this way. 202 00:11:23,490 --> 00:11:25,750 Let's say I have a fixed height, and I 203 00:11:25,750 --> 00:11:28,620 want to have as few nodes as possible. 204 00:11:28,620 --> 00:11:33,280 If I have a tree with a big height and very few nodes, 205 00:11:33,280 --> 00:11:38,740 h is going to be bad when you write it as a function of n. 206 00:11:38,740 --> 00:11:40,670 So those trees are unbalanced. 207 00:11:40,670 --> 00:11:43,000 Big height, small number of nodes. 208 00:11:43,000 --> 00:11:46,000 So say we're trying to build an AVL with the smallest 209 00:11:46,000 --> 00:11:48,720 number of nodes and a fixed height. 210 00:11:48,720 --> 00:11:49,860 What if the height is 0? 211 00:11:49,860 --> 00:11:52,660 What does that tree look like? 212 00:11:52,660 --> 00:11:56,930 It's not too complicated, right? 213 00:11:56,930 --> 00:11:58,610 This is an AVL of height 0. 214 00:11:58,610 --> 00:12:01,740 It's the only possible AVL of height 0. 215 00:12:01,740 --> 00:12:04,260 Now, what if we're trying to build an AVL of height 1 216 00:12:04,260 --> 00:12:05,855 that has as few nodes as possible? 217 00:12:10,550 --> 00:12:12,890 This is what it looks like, right? 218 00:12:12,890 --> 00:12:16,510 Height 0, height 1. 219 00:12:16,510 --> 00:12:18,080 I could add another node here, but I 220 00:12:18,080 --> 00:12:21,370 don't want to because I want as few nodes as possible. 221 00:12:21,370 --> 00:12:23,720 Now, what if I tried to build an AVL of height 2 222 00:12:23,720 --> 00:12:25,295 that has as few nodes as possible? 223 00:12:28,340 --> 00:12:29,010 Can I do this? 224 00:12:31,735 --> 00:12:34,353 AUDIENCE: So at the worst case, you have h minus 1 225 00:12:34,353 --> 00:12:36,160 and then h minus 2 there. 226 00:12:36,160 --> 00:12:38,890 PROFESSOR: OK. 227 00:12:38,890 --> 00:12:39,840 You're moving ahead. 228 00:12:39,840 --> 00:12:42,280 You're forcing me to move faster, but you're right. 229 00:12:42,280 --> 00:12:45,320 And the reason is this is unbalanced. 230 00:12:45,320 --> 00:12:48,590 The height at the left has to be the height of the right 231 00:12:48,590 --> 00:12:49,810 plus or minus 1. 232 00:12:49,810 --> 00:12:51,680 Can't be anything else. 233 00:12:51,680 --> 00:12:54,980 So at the very least, I have to build a tree of height 0 here, 234 00:12:54,980 --> 00:12:57,240 and I know that the best tree of height 235 00:12:57,240 --> 00:12:59,000 0 that I have is this guy. 236 00:13:03,770 --> 00:13:04,540 Cool. 237 00:13:04,540 --> 00:13:07,940 So for height 3, I would have to do this. 238 00:13:07,940 --> 00:13:11,168 And then you're saying, what would I use at the left? 239 00:13:11,168 --> 00:13:16,148 AUDIENCE: You'd use whatever height's on the right minus 2. 240 00:13:16,148 --> 00:13:18,444 The height on the right minus 1 per side. 241 00:13:18,444 --> 00:13:20,360 PROFESSOR: So on the right, I have a height 2. 242 00:13:20,360 --> 00:13:23,000 On the left, I have a height 1. 243 00:13:26,360 --> 00:13:29,940 If I want to go up to height 4, I do the same thing. 244 00:13:29,940 --> 00:13:33,470 So if I want to build an AVL tree 245 00:13:33,470 --> 00:13:36,890 with as few nodes as possible and height h, 246 00:13:36,890 --> 00:13:38,960 I start with the root, then at the right, 247 00:13:38,960 --> 00:13:43,350 I build an AVL tree of height h minus 1, and at the left, 248 00:13:43,350 --> 00:13:45,970 an AVL tree of height h minus 2. 249 00:13:45,970 --> 00:13:48,420 And if these have the minimum number of nodes, 250 00:13:48,420 --> 00:13:51,040 then it turns out that the whole thing 251 00:13:51,040 --> 00:13:52,487 has the minimum number of nodes. 252 00:13:52,487 --> 00:13:54,570 I don't want to build a tree where the heights are 253 00:13:54,570 --> 00:13:57,500 equal because that would mean more nodes here, 254 00:13:57,500 --> 00:13:59,040 so this is the best I can do. 255 00:13:59,040 --> 00:14:02,095 This is the best way I can build a tall tree 256 00:14:02,095 --> 00:14:03,345 with as few nodes as possible. 257 00:14:10,450 --> 00:14:13,150 Suppose I want to write the number of nodes 258 00:14:13,150 --> 00:14:14,270 as a function of height. 259 00:14:14,270 --> 00:14:16,970 You're telling me what it is. 260 00:14:16,970 --> 00:14:21,420 When I was here, you were giving me the answer for this. 261 00:14:21,420 --> 00:14:22,940 So suppose I have a height, and I 262 00:14:22,940 --> 00:14:25,090 want to know how many nodes I have in my tree that 263 00:14:25,090 --> 00:14:26,370 has a minimum number of nodes. 264 00:14:26,370 --> 00:14:28,657 What is it? 265 00:14:28,657 --> 00:14:39,635 AUDIENCE: It's N h minus 1, and then plus N h minus 2. 266 00:14:44,126 --> 00:14:45,649 There might be a constant. 267 00:14:45,649 --> 00:14:47,190 PROFESSOR: There might be a constant. 268 00:14:50,950 --> 00:14:54,840 N h minus 1 is this tree, N h minus 2 is this tree, 269 00:14:54,840 --> 00:14:56,399 so what's the constant? 270 00:14:56,399 --> 00:14:57,315 AUDIENCE: [INAUDIBLE]. 271 00:14:57,315 --> 00:14:57,940 PROFESSOR: Yep. 272 00:14:57,940 --> 00:15:01,195 This guy. 273 00:15:01,195 --> 00:15:04,355 AUDIENCE: Doesn't he have height 0, though? 274 00:15:04,355 --> 00:15:06,119 AUDIENCE: We're talking number of nodes. 275 00:15:06,119 --> 00:15:06,660 AUDIENCE: Oh. 276 00:15:06,660 --> 00:15:07,888 Number of nodes, yeah. 277 00:15:10,919 --> 00:15:12,710 PROFESSOR: I'm not going to solve this now. 278 00:15:12,710 --> 00:15:14,876 We learned how to solve recurrences a long time ago, 279 00:15:14,876 --> 00:15:16,879 so I'll pretend I still remember how to do that, 280 00:15:16,879 --> 00:15:18,670 and I will tell you that the solution looks 281 00:15:18,670 --> 00:15:20,480 something like this. 282 00:15:20,480 --> 00:15:25,080 N of h is roughly 5, which is the magic number that we talked 283 00:15:25,080 --> 00:15:27,760 about in lecture-- roughly, which 284 00:15:27,760 --> 00:15:34,880 means there might be some things there that I forgot-- to the h. 285 00:15:34,880 --> 00:15:38,280 What really matters is that it's an exponential in h. 286 00:15:38,280 --> 00:15:40,830 If you look at these guys, this is close to the Fibonacci 287 00:15:40,830 --> 00:15:44,720 number formula except there's a plus 1 here. 288 00:15:44,720 --> 00:15:48,350 So these guys are bigger than the Fibonacci numbers. 289 00:15:48,350 --> 00:15:50,670 This is definitely bigger than whatever 290 00:15:50,670 --> 00:15:52,045 the formula for Fibonacci numbers 291 00:15:52,045 --> 00:15:58,160 is, which is 5 to the h minus something over something. 292 00:15:58,160 --> 00:16:01,700 What really matters is the minimum number of nodes 293 00:16:01,700 --> 00:16:06,240 in an AVL of height h is an exponential as a function of h. 294 00:16:06,240 --> 00:16:10,110 If you invert that, you get that the maximum height 295 00:16:10,110 --> 00:16:20,160 for a tree with N nodes is log N. This is a way 296 00:16:20,160 --> 00:16:24,270 to construct an AVL that shows you that the height has 297 00:16:24,270 --> 00:16:27,400 to be logarithmic as long as we can keep this rep 298 00:16:27,400 --> 00:16:29,657 invariant true. 299 00:16:29,657 --> 00:16:31,094 AUDIENCE: I have a question. 300 00:16:31,094 --> 00:16:35,884 I didn't quite follow how you got h minus 2 and h minus 1 301 00:16:35,884 --> 00:16:37,330 there. 302 00:16:37,330 --> 00:16:39,020 PROFESSOR: Here, here, here? 303 00:16:39,020 --> 00:16:40,836 AUDIENCE: There. 304 00:16:40,836 --> 00:16:43,110 The tree down. 305 00:16:43,110 --> 00:16:43,930 This one? 306 00:16:43,930 --> 00:16:44,630 This one? 307 00:16:44,630 --> 00:16:45,780 AUDIENCE: This one, yeah. 308 00:16:45,780 --> 00:16:46,363 PROFESSOR: OK. 309 00:16:46,363 --> 00:16:49,030 So suppose I'm at height 4 here. 310 00:16:53,780 --> 00:16:56,790 What's the best way to construct an AVL that 311 00:16:56,790 --> 00:16:59,720 has as few nodes as possible but height 4? 312 00:16:59,720 --> 00:17:01,990 If this guy has to be at height 4, 313 00:17:01,990 --> 00:17:06,180 then it has to have a child at height 3, at least one child. 314 00:17:06,180 --> 00:17:08,880 Otherwise, it's not going to be height 4. 315 00:17:08,880 --> 00:17:12,609 Now, I need to build something here that has height 3. 316 00:17:12,609 --> 00:17:14,000 What should I build? 317 00:17:14,000 --> 00:17:17,430 The best AVL tree that I know that has height 3, right? 318 00:17:17,430 --> 00:17:20,940 I have to get to height 3 using as few nodes as possible, 319 00:17:20,940 --> 00:17:23,490 so I'm going to use this guy because it has as few nodes as 320 00:17:23,490 --> 00:17:26,230 possible and it has height 3. 321 00:17:26,230 --> 00:17:29,990 So this covers my right side. 322 00:17:29,990 --> 00:17:35,430 Now, for my left side, what am I going to use here? 323 00:17:35,430 --> 00:17:38,990 Another AVL tree that has as few nodes as possible, right? 324 00:17:38,990 --> 00:17:41,810 So I'm definitely keeping that. 325 00:17:41,810 --> 00:17:45,628 But what's going to be the height of that? 326 00:17:45,628 --> 00:17:47,753 AUDIENCE: So we want the difference to be less than 327 00:17:47,753 --> 00:17:48,384 or equal to 1? 328 00:17:48,384 --> 00:17:49,050 PROFESSOR: Yeah. 329 00:17:49,050 --> 00:17:50,820 Otherwise, it's not an AVL. 330 00:17:50,820 --> 00:17:55,330 AUDIENCE: So it would be of height 4. 331 00:17:55,330 --> 00:17:58,690 PROFESSOR: So I know for sure that this guy has height 4. 332 00:17:58,690 --> 00:18:00,446 The difference has to be plus, minus 1. 333 00:18:00,446 --> 00:18:01,570 AUDIENCE: So it would be 2. 334 00:18:01,570 --> 00:18:03,590 PROFESSOR: 2, 3, 4. 335 00:18:03,590 --> 00:18:06,362 I know for sure I don't want it to be 4, 336 00:18:06,362 --> 00:18:08,070 and now I have to choose between 2 and 3. 337 00:18:08,070 --> 00:18:10,354 A tree of height 2 will have fewer nodes 338 00:18:10,354 --> 00:18:11,770 than a tree of height 3, so that's 339 00:18:11,770 --> 00:18:14,140 why we're doing it this way. 340 00:18:14,140 --> 00:18:17,870 So to build a tree of height 4, build a tree of height 3, 341 00:18:17,870 --> 00:18:20,280 build a tree of height 2, connect them together. 342 00:18:20,280 --> 00:18:24,000 That's how we get to this, and then this. 343 00:18:24,000 --> 00:18:24,500 Cool. 344 00:18:27,220 --> 00:18:29,660 Do people remember how to do insertions and deletions 345 00:18:29,660 --> 00:18:30,920 in a regular binary tree? 346 00:18:35,080 --> 00:18:35,580 Yes? 347 00:18:40,420 --> 00:18:43,540 How do I insert 6.5 here in this one? 348 00:18:47,360 --> 00:18:49,910 AUDIENCE: You take 6.5, you're like, oh, it's greater than 4, 349 00:18:49,910 --> 00:18:51,410 then you move to 5. 350 00:18:51,410 --> 00:18:53,410 Then you're like, oh, it's greater than 5. 351 00:18:53,410 --> 00:18:53,920 Go to 6. 352 00:18:53,920 --> 00:18:58,230 Oh, it's less than 6, and then it goes to the left of 6. 353 00:18:58,230 --> 00:18:59,586 I mean-- sorry. 354 00:18:59,586 --> 00:19:02,460 I meant left of 7. 355 00:19:02,460 --> 00:19:03,110 PROFESSOR: OK. 356 00:19:03,110 --> 00:19:04,788 So it's bigger than-- 357 00:19:04,788 --> 00:19:06,500 AUDIENCE: Testing you all, guys. 358 00:19:06,500 --> 00:19:08,920 Oh yeah. 359 00:19:08,920 --> 00:19:12,168 6.5 is bigger than 6, so then it goes to the left of 7 360 00:19:12,168 --> 00:19:13,581 because it's less than 7. 361 00:19:18,000 --> 00:19:21,899 PROFESSOR: I thought I had my example wrong for a second. 362 00:19:21,899 --> 00:19:22,940 AUDIENCE: I'm just tired. 363 00:19:22,940 --> 00:19:27,060 PROFESSOR: Me too, so don't scare me. 364 00:19:27,060 --> 00:19:29,390 All right. 365 00:19:29,390 --> 00:19:32,740 So suppose we have heights stored in the nodes here, 366 00:19:32,740 --> 00:19:35,480 because we want to do that for AVLs. 367 00:19:35,480 --> 00:19:39,960 We'll figure out why in a bit. 368 00:19:39,960 --> 00:19:45,520 The height of this guy used to be 0, 1, 2, 3, right? 369 00:19:45,520 --> 00:19:50,649 0, 1, 2, 3. 370 00:19:50,649 --> 00:19:52,190 What happened when I added this node? 371 00:19:54,820 --> 00:19:57,924 AUDIENCE: You added 1 to everything [INAUDIBLE]. 372 00:19:57,924 --> 00:19:58,590 PROFESSOR: Yeah. 373 00:19:58,590 --> 00:20:01,340 So I went down on my insertion path 374 00:20:01,340 --> 00:20:04,960 to find out where to insert a node, and then I added it. 375 00:20:07,480 --> 00:20:09,080 I just chose my case conveniently, 376 00:20:09,080 --> 00:20:11,810 but in some cases, all the heights 377 00:20:11,810 --> 00:20:14,990 of the nodes on that path have changed. 378 00:20:14,990 --> 00:20:18,090 So in an AVL, after we insert, we 379 00:20:18,090 --> 00:20:19,720 have to make sure that the height 380 00:20:19,720 --> 00:20:23,000 of every node on the path is updated. 381 00:20:23,000 --> 00:20:24,600 Does this make sense? 382 00:20:24,600 --> 00:20:30,500 So the heights will be 1, 2, 3, 4. 383 00:20:33,540 --> 00:20:35,440 So the way we implemented the AVLs is 384 00:20:35,440 --> 00:20:38,690 that we do regular insertions and deletions, 385 00:20:38,690 --> 00:20:42,360 and then at the end, we say, well it used to be an AVL. 386 00:20:42,360 --> 00:20:43,860 Now we added or removed the node, 387 00:20:43,860 --> 00:20:46,360 so it might be a slightly unbalanced AVL, 388 00:20:46,360 --> 00:20:49,340 which means it's not an AVL. 389 00:20:49,340 --> 00:20:51,320 And we have the rebalance procedure. 390 00:20:51,320 --> 00:20:56,042 So if you look at on the second page of the code, 391 00:20:56,042 --> 00:20:59,620 insert and delete are really tiny, lines 20 to 22 392 00:20:59,620 --> 00:21:01,380 and 24 to 28. 393 00:21:01,380 --> 00:21:03,980 And they're really tiny because they call the old code 394 00:21:03,980 --> 00:21:06,500 and then they call Rebalance. 395 00:21:06,500 --> 00:21:11,200 So all the magic in an AVL is in Rebalance. 396 00:21:11,200 --> 00:21:13,650 The first thing that Rebalance does, if you see, 397 00:21:13,650 --> 00:21:16,410 it has a while loop there, and the first thing that it does 398 00:21:16,410 --> 00:21:20,150 in the while loop is it updates the height, and this is why. 399 00:21:20,150 --> 00:21:22,540 The height might have changed after the insert, 400 00:21:22,540 --> 00:21:25,210 so any decision based on the old height is bad. 401 00:21:28,391 --> 00:21:29,640 That's why we have that there. 402 00:21:32,900 --> 00:21:33,780 Make sense so far? 403 00:21:38,610 --> 00:21:42,440 So if you look at rebalance, don't try to understand it 404 00:21:42,440 --> 00:21:44,530 quite yet, but what it does is it 405 00:21:44,530 --> 00:21:47,307 calls Rotate Left and Rotate Right. 406 00:21:47,307 --> 00:21:48,640 Well, Left Rotate, Right Rotate. 407 00:21:51,190 --> 00:21:53,980 All the magic is in Rebalance, and the tools that it uses 408 00:21:53,980 --> 00:21:56,930 are Left Rotate and Right Rotate. 409 00:21:56,930 --> 00:21:59,800 Now, I'm going to show you what a rotation is supposed 410 00:21:59,800 --> 00:22:16,950 to do in-- and the children of these nodes are a, b, c, d. 411 00:22:16,950 --> 00:22:23,860 Also, this node is hanging off of something here, e. 412 00:22:23,860 --> 00:22:26,630 If I want to do a Right Rotate here, 413 00:22:26,630 --> 00:22:33,360 so if I want to rotate the tree like this, then after rotating, 414 00:22:33,360 --> 00:22:34,806 it's going to look like this. 415 00:22:51,860 --> 00:22:57,240 So notice that c got moved from B to A, but it got 416 00:22:57,240 --> 00:23:00,320 moved in such a way that the whole thing is still a BST. 417 00:23:00,320 --> 00:23:03,050 These guys show up in the same order as children, 418 00:23:03,050 --> 00:23:05,700 and these guys show up in the same order, 419 00:23:05,700 --> 00:23:08,404 so search will still work. 420 00:23:08,404 --> 00:23:10,570 This is how it's supposed to look like as a picture. 421 00:23:10,570 --> 00:23:13,830 Let's try to write the pseudocode for achieving this. 422 00:23:17,450 --> 00:23:21,160 AUDIENCE: First, you identify the parent node. 423 00:23:21,160 --> 00:23:23,935 Step one. 424 00:23:23,935 --> 00:23:26,320 PROFESSOR: Well, let's say we have it. 425 00:23:26,320 --> 00:23:29,790 AUDIENCE: I mean from B. You say, A is my parent. 426 00:23:29,790 --> 00:23:32,477 That's now going to become my right child. 427 00:23:32,477 --> 00:23:33,060 PROFESSOR: OK. 428 00:23:38,060 --> 00:23:40,740 So what do you want to change? 429 00:23:40,740 --> 00:23:44,400 By the way, this whole thing has to happen in constant time, 430 00:23:44,400 --> 00:23:47,150 so we're not allowed to go anywhere inside here, here, 431 00:23:47,150 --> 00:23:48,370 here, or here. 432 00:23:48,370 --> 00:23:49,870 We're allowed to change these links, 433 00:23:49,870 --> 00:23:51,510 but if you go inside and try to do 434 00:23:51,510 --> 00:23:53,500 more complicated restructuring, that's 435 00:23:53,500 --> 00:23:55,297 going to block the running time. 436 00:23:55,297 --> 00:23:56,880 We're only allowed to change the links 437 00:23:56,880 --> 00:23:58,570 that you see on the board. 438 00:23:58,570 --> 00:24:04,150 AUDIENCE: I know in the BST, not BST node but BST, the delete, 439 00:24:04,150 --> 00:24:08,050 if you're deleting the root, they make up the pseudoroot. 440 00:24:08,050 --> 00:24:10,440 PROFESSOR: Let's not worry about it. 441 00:24:10,440 --> 00:24:12,970 So Delete already did that magic for you, right? 442 00:24:12,970 --> 00:24:16,221 So the pseudoroot would be here, so this node has a root. 443 00:24:16,221 --> 00:24:16,720 We're happy. 444 00:24:16,720 --> 00:24:18,553 AUDIENCE: That's not what I'm talking about. 445 00:24:18,553 --> 00:24:21,720 I'm saying that you could do the same with B because you're 446 00:24:21,720 --> 00:24:25,100 going to have to break a link with lowercase c onto B 447 00:24:25,100 --> 00:24:28,520 in order to flip it. 448 00:24:28,520 --> 00:24:32,510 So I'm saying you could have a placer, some sort of place 449 00:24:32,510 --> 00:24:35,607 to put it so you don't lose it. 450 00:24:35,607 --> 00:24:36,190 PROFESSOR: OK. 451 00:24:36,190 --> 00:24:40,330 Let's see if we do that. 452 00:24:40,330 --> 00:24:44,650 AUDIENCE: The new right child of B will be A, 453 00:24:44,650 --> 00:24:50,840 and the new left child of A will be C, and the new parent of c 454 00:24:50,840 --> 00:24:51,750 is A. 455 00:24:51,750 --> 00:24:54,145 PROFESSOR: OK, let's move slower. 456 00:24:54,145 --> 00:24:55,660 I have to write them down. 457 00:24:55,660 --> 00:24:59,131 So the new right child of B is? 458 00:24:59,131 --> 00:25:03,690 AUDIENCE: Is A. Also, you probably 459 00:25:03,690 --> 00:25:05,601 should do temp variables to store them. 460 00:25:05,601 --> 00:25:07,100 AUDIENCE: I don't think you have to. 461 00:25:07,100 --> 00:25:12,432 You can just swap the one connection and swap the other. 462 00:25:15,064 --> 00:25:17,230 PROFESSOR: Let's do it without worrying about temps, 463 00:25:17,230 --> 00:25:19,150 and then we can figure out temps later. 464 00:25:19,150 --> 00:25:26,000 So B's right becomes A. So it's going to be like this. 465 00:25:30,674 --> 00:25:32,090 Let me erase this confusing error. 466 00:25:36,560 --> 00:25:38,785 AUDIENCE: And A's left child is c. 467 00:25:45,420 --> 00:25:52,250 PROFESSOR: A's left child is c. 468 00:25:52,250 --> 00:25:54,150 OK. 469 00:25:54,150 --> 00:25:56,790 AUDIENCE: C's parent is A. 470 00:25:56,790 --> 00:26:02,980 PROFESSOR: C's parent is A. OK, very good. 471 00:26:02,980 --> 00:26:05,104 So I changed the child, then I changed the parent 472 00:26:05,104 --> 00:26:06,270 so that they would match up. 473 00:26:10,160 --> 00:26:13,285 So if B's right is A, then what should I do with-- 474 00:26:13,285 --> 00:26:19,264 AUDIENCE: A's parent is B. 475 00:26:19,264 --> 00:26:20,680 PROFESSOR: Always do them in pairs 476 00:26:20,680 --> 00:26:23,790 so you don't lose track of them. 477 00:26:23,790 --> 00:26:24,516 And? 478 00:26:24,516 --> 00:26:29,592 AUDIENCE: B's parent is e and e's subchild is B. 479 00:26:29,592 --> 00:26:30,300 PROFESSOR: Sorry. 480 00:26:30,300 --> 00:26:32,606 B's parent is e, right? 481 00:26:32,606 --> 00:26:33,230 AUDIENCE: Yeah. 482 00:26:36,492 --> 00:26:40,170 PROFESSOR: So B's parent is e, and-- 483 00:26:40,170 --> 00:26:42,090 AUDIENCE: e's left. 484 00:26:42,090 --> 00:26:43,470 PROFESSOR: Child is-- 485 00:26:43,470 --> 00:26:44,390 AUDIENCE: B. 486 00:26:44,390 --> 00:26:46,796 PROFESSOR: Because I drew it like this, right? 487 00:26:46,796 --> 00:26:49,940 But if I drew it like this? 488 00:26:49,940 --> 00:26:52,150 AUDIENCE: e's-- 489 00:26:52,150 --> 00:26:54,330 PROFESSOR: It would have to be the right child, 490 00:26:54,330 --> 00:26:57,931 so we have to look at both cases. 491 00:26:57,931 --> 00:26:59,879 AUDIENCE: If B is greater than e, 492 00:26:59,879 --> 00:27:04,749 then it would be right child, and if B is less than e, 493 00:27:04,749 --> 00:27:05,977 it would be left child. 494 00:27:05,977 --> 00:27:06,560 PROFESSOR: OK. 495 00:27:06,560 --> 00:27:18,540 So if B is greater than e, then e's right is B. 496 00:27:18,540 --> 00:27:23,680 Otherwise, e's left is B, right? 497 00:27:23,680 --> 00:27:25,540 Now, suppose comparisons are expensive 498 00:27:25,540 --> 00:27:27,874 and they don't want to do a comparison to find this out. 499 00:27:27,874 --> 00:27:29,415 I want to play with the tree instead. 500 00:27:29,415 --> 00:27:29,990 What do I do? 501 00:27:37,372 --> 00:27:41,880 AUDIENCE: Can you see which one A was before that? 502 00:27:41,880 --> 00:27:43,670 PROFESSOR: Yep. 503 00:27:43,670 --> 00:27:45,885 AUDIENCE: If A used to be the right child, 504 00:27:45,885 --> 00:27:47,100 now B is the right child. 505 00:27:47,100 --> 00:27:47,766 PROFESSOR: Yeah. 506 00:27:47,766 --> 00:27:51,150 I haven't changed the child yet, so I can do that. 507 00:27:51,150 --> 00:27:54,355 AUDIENCE: That's still a comparison though, right? 508 00:27:54,355 --> 00:27:56,730 PROFESSOR: But now I'm doing a pointer comparison and not 509 00:27:56,730 --> 00:27:59,510 a key comparison. 510 00:27:59,510 --> 00:28:02,270 AUDIENCE: If e.r is A, then-- 511 00:28:02,270 --> 00:28:08,270 PROFESSOR: If e.right is A, then it becomes B. Otherwise. 512 00:28:11,060 --> 00:28:13,707 OK, this looks good. 513 00:28:13,707 --> 00:28:15,290 So there's the issue of temp variables 514 00:28:15,290 --> 00:28:16,969 and assigning these in the right order 515 00:28:16,969 --> 00:28:19,010 so you don't have too many temp variables and too 516 00:28:19,010 --> 00:28:22,310 many lines of code, and the Python code in the handout 517 00:28:22,310 --> 00:28:23,650 takes care of that. 518 00:28:23,650 --> 00:28:25,380 But this is the logic that you want. 519 00:28:25,380 --> 00:28:28,890 So if you have to write it from scratch, 520 00:28:28,890 --> 00:28:30,270 you don't have to memorize that. 521 00:28:30,270 --> 00:28:32,760 Remember that you want to get from here to here, 522 00:28:32,760 --> 00:28:37,210 and do exactly the thought process that we have here. 523 00:28:37,210 --> 00:28:42,060 What if I want to do a left rotation instead 524 00:28:42,060 --> 00:28:45,450 of a right rotation? 525 00:28:45,450 --> 00:28:49,136 AUDIENCE: You just have to change the r's to l's. 526 00:28:49,136 --> 00:28:50,010 PROFESSOR: Very good. 527 00:28:50,010 --> 00:28:51,090 Copy, paste. 528 00:28:51,090 --> 00:28:52,770 Swap l's and r's and we're done. 529 00:28:57,060 --> 00:28:58,120 Why do we need rotations? 530 00:29:02,902 --> 00:29:04,110 AUDIENCE: To rebalance stuff. 531 00:29:04,110 --> 00:29:05,200 PROFESSOR: To rebalance stuff. 532 00:29:05,200 --> 00:29:05,700 OK. 533 00:29:05,700 --> 00:29:08,069 Why do we rebalance stuff? 534 00:29:08,069 --> 00:29:10,001 AUDIENCE: Because you don't want your code 535 00:29:10,001 --> 00:29:12,635 to crash when you add nodes that are sequentially larger, 536 00:29:12,635 --> 00:29:16,042 and then you try to find something, and then it crashes. 537 00:29:16,042 --> 00:29:16,625 PROFESSOR: OK. 538 00:29:16,625 --> 00:29:17,742 Why would it crash? 539 00:29:17,742 --> 00:29:19,283 AUDIENCE: Because the recursion depth 540 00:29:19,283 --> 00:29:21,602 is exceeded because you're going down 541 00:29:21,602 --> 00:29:23,040 the line trying to find something, 542 00:29:23,040 --> 00:29:24,517 and you go down too far. 543 00:29:24,517 --> 00:29:25,100 PROFESSOR: OK. 544 00:29:25,100 --> 00:29:27,650 So pretending we don't have a recursion depth issue, 545 00:29:27,650 --> 00:29:30,670 then it's going to be slow. 546 00:29:30,670 --> 00:29:33,990 So you start from a nice AVL, and if you don't rebalance, 547 00:29:33,990 --> 00:29:37,430 you get to a BST that's slow, slow, slow, 548 00:29:37,430 --> 00:29:40,240 and then you'll fail our tests and still get a 0. 549 00:29:40,240 --> 00:29:40,740 Yes? 550 00:29:40,740 --> 00:29:47,560 AUDIENCE: But if you just had a carrot-like tree, 551 00:29:47,560 --> 00:29:50,470 or if you added, for instance, 4, and then you added in 5, 552 00:29:50,470 --> 00:29:52,960 and then you added in 3, and then you added in 6, 553 00:29:52,960 --> 00:29:57,300 and then you added in 2, you'd just get a carrot. 554 00:29:57,300 --> 00:30:02,930 So then I feel like AVL wouldn't cover that case. 555 00:30:02,930 --> 00:30:04,970 PROFESSOR: Let's do them in sequence. 556 00:30:09,500 --> 00:30:12,047 So what are we inserting? 557 00:30:12,047 --> 00:30:12,630 So you said 4. 558 00:30:12,630 --> 00:30:15,450 AUDIENCE: So you start with 4, and then you insert 5, 559 00:30:15,450 --> 00:30:18,760 then you insert 3. 560 00:30:18,760 --> 00:30:19,700 PROFESSOR: Let's see. 561 00:30:19,700 --> 00:30:21,402 4, 3. 562 00:30:21,402 --> 00:30:28,530 AUDIENCE: Then you insert 6, then you insert 2, 563 00:30:28,530 --> 00:30:34,150 then you insert 7, and then 1. 564 00:30:34,150 --> 00:30:35,650 AUDIENCE: You've got to rotate that. 565 00:30:35,650 --> 00:30:36,970 AUDIENCE: Well, the thing is-- 566 00:30:36,970 --> 00:30:38,515 PROFESSOR: Well, is this an AVL? 567 00:30:38,515 --> 00:30:40,032 PROFESSOR: Well, it's balanced. 568 00:30:40,032 --> 00:30:40,740 PROFESSOR: Is it? 569 00:30:40,740 --> 00:30:42,454 AUDIENCE: No, it's not. 570 00:30:42,454 --> 00:30:43,390 5 is unhappy. 571 00:30:46,200 --> 00:30:49,140 AUDIENCE: That height of the tree is only one 572 00:30:49,140 --> 00:30:51,270 greater than the other height. 573 00:30:51,270 --> 00:30:53,015 PROFESSOR: So the height here is 1. 574 00:30:53,015 --> 00:30:56,442 What's the height here? 575 00:30:56,442 --> 00:30:58,400 There's nothing here, so the height is minus 1. 576 00:31:02,200 --> 00:31:06,060 AUDIENCE: I mean, but the other side of the tree. 577 00:31:06,060 --> 00:31:08,820 PROFESSOR: So if you're looking at this guy, 578 00:31:08,820 --> 00:31:11,510 things look balanced, but in an AVL, 579 00:31:11,510 --> 00:31:14,572 this has to hold for every node. 580 00:31:14,572 --> 00:31:16,780 If there's one node where the heights are unbalanced, 581 00:31:16,780 --> 00:31:18,434 the whole thing is unbalanced. 582 00:31:18,434 --> 00:31:20,350 Otherwise, the construction that we did before 583 00:31:20,350 --> 00:31:22,070 wouldn't make sense. 584 00:31:22,070 --> 00:31:24,320 AUDIENCE: It's all [INAUDIBLE] within the [INAUDIBLE]. 585 00:31:24,320 --> 00:31:25,220 PROFESSOR: Yep. 586 00:31:25,220 --> 00:31:27,820 So now that we're going to look at rebalancing, which 587 00:31:27,820 --> 00:31:29,350 is the magic behind AVLs, we have 588 00:31:29,350 --> 00:31:32,920 to make sure that after we start with something that's 589 00:31:32,920 --> 00:31:35,329 slightly imbalanced, when we rotate things around, 590 00:31:35,329 --> 00:31:37,412 we have to make sure that everything gets balanced 591 00:31:37,412 --> 00:31:42,320 at the end and happy, so that's a good thing to keep in mind. 592 00:31:42,320 --> 00:31:42,820 Good. 593 00:31:42,820 --> 00:31:43,730 Any other questions? 594 00:31:47,880 --> 00:31:50,254 AUDIENCE: Is it obvious that you can 595 00:31:50,254 --> 00:31:53,040 put any number of nodes in an AVL tree? 596 00:31:53,040 --> 00:31:54,700 PROFESSOR: Point any number of nodes? 597 00:31:54,700 --> 00:31:56,360 AUDIENCE: Yeah. 598 00:31:56,360 --> 00:31:59,513 It's obvious that you can do one node, or two nodes or three 599 00:31:59,513 --> 00:32:05,020 nodes, but is it true that for any number of nodes, 600 00:32:05,020 --> 00:32:07,030 you can arrange into an AVL tree? 601 00:32:07,030 --> 00:32:08,644 Does that make sense? 602 00:32:08,644 --> 00:32:09,310 PROFESSOR: Yeah. 603 00:32:09,310 --> 00:32:13,030 So if you want to have any number of nodes, 604 00:32:13,030 --> 00:32:16,142 you call Insert, AVL Insert, and then you'll get an AVL. 605 00:32:16,142 --> 00:32:16,808 AUDIENCE: Right. 606 00:32:16,808 --> 00:32:19,100 But what I'm trying to say is, is it 607 00:32:19,100 --> 00:32:23,230 possible to construct an AVL tree out of 13 nodes? 608 00:32:23,230 --> 00:32:24,230 PROFESSOR: Sure. 609 00:32:24,230 --> 00:32:26,158 AUDIENCE: Or 17 nodes? 610 00:32:26,158 --> 00:32:28,616 AUDIENCE: Is there a limit for where that property will not 611 00:32:28,616 --> 00:32:29,115 fit? 612 00:32:29,115 --> 00:32:29,830 AUDIENCE: Right. 613 00:32:29,830 --> 00:32:31,587 Is there some set of nodes-- 614 00:32:31,587 --> 00:32:32,920 PROFESSOR: I like this question. 615 00:32:32,920 --> 00:32:34,160 I like this question. 616 00:32:34,160 --> 00:32:39,720 So what would be the perfect binary search tree? 617 00:32:42,684 --> 00:32:44,999 AUDIENCE: An element of log N height. 618 00:32:44,999 --> 00:32:46,540 PROFESSOR: An element of log N height 619 00:32:46,540 --> 00:32:54,200 and the complete tree, so something that looks like this. 620 00:33:00,640 --> 00:33:04,665 Where did we see this thing before? 621 00:33:04,665 --> 00:33:05,595 AUDIENCE: In the heap. 622 00:33:05,595 --> 00:33:06,470 PROFESSOR: All right. 623 00:33:06,470 --> 00:33:08,650 So a heap looks exactly like this, 624 00:33:08,650 --> 00:33:13,620 except the values inside don't fulfill the BST requirement. 625 00:33:13,620 --> 00:33:15,675 Otherwise, we'd know how to build perfect BSTs. 626 00:33:15,675 --> 00:33:17,780 It so happens that we don't. 627 00:33:17,780 --> 00:33:19,350 But is this an AVL? 628 00:33:25,895 --> 00:33:26,478 AUDIENCE: Yes. 629 00:33:31,835 --> 00:33:34,082 AUDIENCE: That example or in general? 630 00:33:34,082 --> 00:33:35,790 PROFESSOR: Let's start with that example. 631 00:33:35,790 --> 00:33:36,415 Is this an AVL? 632 00:33:41,290 --> 00:33:42,360 AUDIENCE: No. 633 00:33:42,360 --> 00:33:45,600 The node just to the left of the root-- 634 00:33:45,600 --> 00:33:47,250 PROFESSOR: This guy? 635 00:33:47,250 --> 00:33:49,130 AUDIENCE: No. 636 00:33:49,130 --> 00:33:52,032 That one's of height two, so it has height one. 637 00:33:52,032 --> 00:33:53,240 PROFESSOR: This is beautiful. 638 00:33:53,240 --> 00:33:54,890 This is as good as it could get. 639 00:33:54,890 --> 00:33:56,740 This is an optimal binary search tree. 640 00:33:56,740 --> 00:33:59,190 This is perfect. 641 00:33:59,190 --> 00:34:01,100 It better match the definition of AVL, 642 00:34:01,100 --> 00:34:04,270 because otherwise, it would mean AVL doesn't like perfect trees. 643 00:34:04,270 --> 00:34:06,830 So the good news is that any complete tree 644 00:34:06,830 --> 00:34:10,159 is going to be an AVL because everything 645 00:34:10,159 --> 00:34:14,989 is perfectly balanced or almost perfectly balanced. 646 00:34:14,989 --> 00:34:18,690 The only thing that's not complete is the last level. 647 00:34:18,690 --> 00:34:21,790 So all the paths from the roots to the last level 648 00:34:21,790 --> 00:34:25,810 are either height log N or height log N minus 1. 649 00:34:25,810 --> 00:34:28,810 So wherever you do the height comparison, 650 00:34:28,810 --> 00:34:31,219 you're going to get a difference of 1, 651 00:34:31,219 --> 00:34:35,739 and we can keep adding nodes to this. 652 00:34:35,739 --> 00:34:39,179 This is how I'd build a BST of as many nodes as we want. 653 00:34:39,179 --> 00:34:41,489 AUDIENCE: I have a question. 654 00:34:41,489 --> 00:34:46,600 For the fixed heights, why we try to minimize the nodes? 655 00:34:46,600 --> 00:34:51,079 Wouldn't a better way to build a more efficient tree 656 00:34:51,079 --> 00:34:55,120 is to try to minimize the height? 657 00:34:55,120 --> 00:35:00,205 So instead of adding nodes up, just fill in the tree? 658 00:35:00,205 --> 00:35:01,080 PROFESSOR: Very good. 659 00:35:01,080 --> 00:35:04,080 So you're thinking of how to build an efficient tree, how 660 00:35:04,080 --> 00:35:05,870 to build a good tree. 661 00:35:05,870 --> 00:35:08,760 Here, I'm trying to prove the property 662 00:35:08,760 --> 00:35:11,480 about the maximum height of a tree. 663 00:35:11,480 --> 00:35:13,520 So here I'm playing devil's advocate. 664 00:35:13,520 --> 00:35:15,980 I'm thinking, what is the worst case for AVLs? 665 00:35:15,980 --> 00:35:18,720 How do I make AVLs look as bad as possible? 666 00:35:18,720 --> 00:35:20,840 That's why I started this way. 667 00:35:20,840 --> 00:35:22,641 So usually, we're the good guy coding, 668 00:35:22,641 --> 00:35:24,140 but after you're done coding and you 669 00:35:24,140 --> 00:35:27,050 know how your algorithm runs, if you want to have peace of mind 670 00:35:27,050 --> 00:35:30,200 and go to sleep well and get full points afterwards, 671 00:35:30,200 --> 00:35:32,300 it sometimes helps to think as devil's advocate. 672 00:35:32,300 --> 00:35:34,200 How would I break this algorithm? 673 00:35:34,200 --> 00:35:36,410 What's the worst thing that I could do to it? 674 00:35:36,410 --> 00:35:38,680 And if you can find something that breaks it, 675 00:35:38,680 --> 00:35:41,560 well, you probably know how to fix it. 676 00:35:41,560 --> 00:35:43,890 If not, you can sleep well at night 677 00:35:43,890 --> 00:35:45,776 and know that your algorithm is sound. 678 00:35:49,350 --> 00:35:55,216 AUDIENCE: In that example on the board, where we said 679 00:35:55,216 --> 00:36:00,250 the height of the tree on the left side was h minus 1, 680 00:36:00,250 --> 00:36:03,220 and then the other side had the h minus 2. 681 00:36:05,990 --> 00:36:11,920 For that h minus 2 tree, every node in that 682 00:36:11,920 --> 00:36:16,107 has to be h minus 2 height, every leaf. 683 00:36:16,107 --> 00:36:17,190 Is that what we're saying? 684 00:36:17,190 --> 00:36:19,810 Otherwise, it falls apart because there would be one 685 00:36:19,810 --> 00:36:20,310 that's-- 686 00:36:20,310 --> 00:36:22,518 PROFESSOR: So we're saying that the root of this tree 687 00:36:22,518 --> 00:36:24,900 has to be h minus 2. 688 00:36:24,900 --> 00:36:27,970 And the way we do this is we copy this guy over, 689 00:36:27,970 --> 00:36:30,740 so this guy's lopsided, too. 690 00:36:30,740 --> 00:36:34,105 So it's not every path to the bottom, just one path. 691 00:36:34,105 --> 00:36:34,646 AUDIENCE: OK. 692 00:36:34,646 --> 00:36:37,001 So it could be an incomplete. 693 00:36:37,001 --> 00:36:37,960 PROFESSOR: Yep. 694 00:36:37,960 --> 00:36:43,020 In that case, is that balanced? 695 00:36:43,020 --> 00:36:48,310 PROFESSOR: So this was an AVL of height 2, right? 696 00:36:48,310 --> 00:36:51,530 If I stick it here, it's still going to be an AVL of height 2. 697 00:36:51,530 --> 00:36:53,432 AUDIENCE: Right. 698 00:36:53,432 --> 00:36:55,140 PROFESSOR: This thing will have height 2. 699 00:36:55,140 --> 00:36:58,518 This thing will have height 3. 700 00:36:58,518 --> 00:37:00,446 AUDIENCE: OK. 701 00:37:00,446 --> 00:37:02,743 So it's not everything on the same level, 702 00:37:02,743 --> 00:37:04,784 necessarily, that needs to be of the same height. 703 00:37:04,784 --> 00:37:06,230 It's only the children. 704 00:37:06,230 --> 00:37:07,210 PROFESSOR: Yep. 705 00:37:07,210 --> 00:37:10,340 So the two children, so two nodes on the same level, 706 00:37:10,340 --> 00:37:14,137 might have their heights differ by one, but not more than one. 707 00:37:14,137 --> 00:37:15,720 If they could differ by more than one, 708 00:37:15,720 --> 00:37:17,480 you could have a link list. 709 00:37:17,480 --> 00:37:20,740 Because they differ by one, then it's sort of sane. 710 00:37:20,740 --> 00:37:24,190 It's almost balanced. 711 00:37:24,190 --> 00:37:25,110 Good. 712 00:37:25,110 --> 00:37:26,130 I like the questions. 713 00:37:26,130 --> 00:37:27,820 It means you guys are thinking. 714 00:37:27,820 --> 00:37:28,903 I really like it. 715 00:37:28,903 --> 00:37:30,045 Yeah? 716 00:37:30,045 --> 00:37:31,550 AUDIENCE: If they differ by one. 717 00:37:31,550 --> 00:37:34,280 So you have an ideal tree up there, 718 00:37:34,280 --> 00:37:36,200 and then you have the worst case possible. 719 00:37:36,200 --> 00:37:39,080 That wouldn't affect performance at all, right, 720 00:37:39,080 --> 00:37:40,520 if you have the worst case? 721 00:37:40,520 --> 00:37:42,600 PROFESSOR: So what we know what performance is we 722 00:37:42,600 --> 00:37:45,090 have this guarantee. 723 00:37:45,090 --> 00:37:48,640 It's at worst a constant times log n. 724 00:37:48,640 --> 00:37:50,240 This is constant times log n. 725 00:37:50,240 --> 00:37:52,530 The constant happens to be 1. 726 00:37:52,530 --> 00:37:54,890 This is a constant times log n. 727 00:37:54,890 --> 00:37:58,515 The constant happens to be something bigger than 1. 728 00:37:58,515 --> 00:38:00,692 I think it's somewhere between 2 and 3. 729 00:38:00,692 --> 00:38:02,952 AUDIENCE: That just varies by the constant. 730 00:38:02,952 --> 00:38:03,856 PROFESSOR: Yeah. 731 00:38:03,856 --> 00:38:06,525 And since we only care about asymptotics in this class, 732 00:38:06,525 --> 00:38:08,020 we're happy. 733 00:38:08,020 --> 00:38:10,760 But we don't know how to build this. 734 00:38:10,760 --> 00:38:13,210 This is just something pretty that we draw on a board, 735 00:38:13,210 --> 00:38:15,940 but we don't have an algorithm that efficiency builds 736 00:38:15,940 --> 00:38:21,380 this out of a random series of insertions and deletions. 737 00:38:21,380 --> 00:38:23,740 I hope we don't. 738 00:38:23,740 --> 00:38:26,090 Otherwise, I look bad. 739 00:38:26,090 --> 00:38:27,360 What do I want to delete? 740 00:38:32,220 --> 00:38:34,164 AUDIENCE: Instead of a binary search 741 00:38:34,164 --> 00:38:35,870 tree as the base of the AVL, if you 742 00:38:35,870 --> 00:38:39,785 wanted to have four children per node, 743 00:38:39,785 --> 00:38:42,470 would this change that much except you'd 744 00:38:42,470 --> 00:38:46,606 just have twice as many variables? 745 00:38:46,606 --> 00:38:48,230 PROFESSOR: That's a good exam question. 746 00:38:53,510 --> 00:38:56,270 So it turns out that there's this tree called 747 00:38:56,270 --> 00:39:01,490 a B tree, which has 1,000 or more nodes. 748 00:39:01,490 --> 00:39:02,486 AUDIENCE: B tree? 749 00:39:02,486 --> 00:39:03,897 Doesn't that stand for binary? 750 00:39:03,897 --> 00:39:04,480 PROFESSOR: No. 751 00:39:04,480 --> 00:39:10,972 It's B. Just B. That's used for databases on disk. 752 00:39:10,972 --> 00:39:13,430 So there, you want to make the height as small as possible, 753 00:39:13,430 --> 00:39:15,763 because every time you get a node, you do a disk access, 754 00:39:15,763 --> 00:39:17,140 and that's expensive. 755 00:39:17,140 --> 00:39:18,970 But once you do a disk access, the nodes 756 00:39:18,970 --> 00:39:22,430 can be as big as you want. 757 00:39:22,430 --> 00:39:25,730 You can read a few bytes off the disk at roughly the same cost 758 00:39:25,730 --> 00:39:27,870 as you can read a megabyte, so you're better off 759 00:39:27,870 --> 00:39:28,859 reading a megabyte. 760 00:39:28,859 --> 00:39:31,150 I might be exaggerating a bit, but for a few kilobytes, 761 00:39:31,150 --> 00:39:32,090 that's true. 762 00:39:32,090 --> 00:39:34,560 So B trees have thousands of children, 763 00:39:34,560 --> 00:39:35,880 and they keep them sorted. 764 00:39:35,880 --> 00:39:39,630 It's sort of like that, but the fan out is huge. 765 00:39:39,630 --> 00:39:41,570 And it turns out everything is still log n 766 00:39:41,570 --> 00:39:44,256 as long as the number of children you have is constant. 767 00:39:44,256 --> 00:39:46,434 AUDIENCE: But for this rotation thing, though. 768 00:39:46,434 --> 00:39:47,350 PROFESSOR: Oh god, no. 769 00:39:47,350 --> 00:39:48,820 We don't have to rotate. 770 00:39:48,820 --> 00:39:51,640 It gets a lot more complicated. 771 00:39:51,640 --> 00:39:54,475 We haven't gotten to rebalancing yet, right? 772 00:39:54,475 --> 00:39:56,350 Wait until you see how rebalancing looks like 773 00:39:56,350 --> 00:39:57,920 with two children, and then imagine 774 00:39:57,920 --> 00:40:01,920 trying to figure out the cases for 1,000 children. 775 00:40:01,920 --> 00:40:03,560 That's not sane. 776 00:40:03,560 --> 00:40:05,450 In B trees, they use something else 777 00:40:05,450 --> 00:40:07,800 to make them balance the right way. 778 00:40:07,800 --> 00:40:11,710 And of course, they're harder than this. 779 00:40:11,710 --> 00:40:15,910 So let me try to get through rebalancing somewhat quickly. 780 00:40:25,610 --> 00:40:29,080 This is where I forget what I need to do. 781 00:40:29,080 --> 00:40:29,580 Rebalancing. 782 00:41:12,980 --> 00:41:27,450 So suppose we call Rebalance on this guy, 783 00:41:27,450 --> 00:41:30,930 and we know that the nodes here have heights k minus 1, 784 00:41:30,930 --> 00:41:33,600 this is k minus 1 or k, and this is k, 785 00:41:33,600 --> 00:41:36,730 and I want to call Rebalance here. 786 00:41:36,730 --> 00:41:39,040 Let's first figure out if it's an AVL tree 787 00:41:39,040 --> 00:41:41,030 or if there's something wrong with it. 788 00:41:41,030 --> 00:41:44,675 What's the height here? 789 00:41:44,675 --> 00:41:47,221 AUDIENCE: k plus 1. 790 00:41:47,221 --> 00:41:49,220 PROFESSOR: So no matter what the height is here, 791 00:41:49,220 --> 00:41:51,400 the height here has to be k plus 1. 792 00:41:51,400 --> 00:41:53,736 What's the height here? 793 00:41:53,736 --> 00:41:56,570 AUDIENCE: k plus 2. 794 00:41:56,570 --> 00:41:57,610 PROFESSOR: OK. 795 00:41:57,610 --> 00:41:59,780 What else can you tell me about this node? 796 00:42:05,160 --> 00:42:06,980 If I call Check RI on it, it's going 797 00:42:06,980 --> 00:42:10,440 to crash because the rep invariant for AVLs 798 00:42:10,440 --> 00:42:11,970 does not hold here. 799 00:42:11,970 --> 00:42:13,900 This child at the top of this tree 800 00:42:13,900 --> 00:42:15,430 will have height k minus 1. 801 00:42:15,430 --> 00:42:17,250 This will have height k plus 1. 802 00:42:17,250 --> 00:42:18,980 The difference is more than 1. 803 00:42:18,980 --> 00:42:21,650 Not good. 804 00:42:21,650 --> 00:42:22,544 So this is unhappy. 805 00:42:22,544 --> 00:42:23,710 How do we fix the situation? 806 00:42:26,380 --> 00:42:29,200 AUDIENCE: Rotate left around k plus 1. 807 00:42:29,200 --> 00:42:30,255 PROFESSOR: All right. 808 00:42:30,255 --> 00:42:33,520 So it better be a rotation, because we spent so much 809 00:42:33,520 --> 00:42:36,016 figuring out how to rotate. 810 00:42:36,016 --> 00:42:37,890 Let's see what happens if we rotate this way, 811 00:42:37,890 --> 00:42:39,370 and the way to keep track of this 812 00:42:39,370 --> 00:42:41,556 is I'm going to label my nodes. 813 00:42:45,010 --> 00:42:53,090 B, A, left child of A has height k minus 1. 814 00:42:53,090 --> 00:42:55,316 This guy is k minus 1. 815 00:43:00,760 --> 00:43:04,480 So what's the height of A now? 816 00:43:04,480 --> 00:43:06,432 AUDIENCE: k. 817 00:43:06,432 --> 00:43:07,896 Oh wait, that's not a node. 818 00:43:07,896 --> 00:43:10,840 Just k minus 1 or k. 819 00:43:10,840 --> 00:43:13,420 PROFESSOR: So this is a tree of height k minus 1. 820 00:43:13,420 --> 00:43:15,310 This is a height of tree k minus 1 or k. 821 00:43:15,310 --> 00:43:18,640 The height of this guy is? 822 00:43:18,640 --> 00:43:19,140 AUDIENCE: k. 823 00:43:22,486 --> 00:43:23,920 AUDIENCE: k plus 1. 824 00:43:23,920 --> 00:43:26,520 PROFESSOR: Or k plus 1. 825 00:43:26,520 --> 00:43:28,300 The height of this guy is definitely k. 826 00:43:28,300 --> 00:43:30,665 The height of this guys is k or k plus 1. 827 00:43:30,665 --> 00:43:31,290 Is that an AVL? 828 00:43:31,290 --> 00:43:32,510 Is it happy now? 829 00:43:37,100 --> 00:43:40,430 So this is the easy kind of rebalancing, one rotation 830 00:43:40,430 --> 00:43:41,418 and you're done. 831 00:43:41,418 --> 00:43:43,334 AUDIENCE: [INAUDIBLE] the difference between k 832 00:43:43,334 --> 00:43:44,749 minus 1 and k minus 1 over k? 833 00:43:44,749 --> 00:43:46,290 PROFESSOR: It's not k minus 1 over k. 834 00:43:46,290 --> 00:43:47,820 It's either k minus 1 or k. 835 00:43:50,610 --> 00:43:51,186 Bad notation. 836 00:43:54,050 --> 00:43:55,310 Thank you for the question. 837 00:43:55,310 --> 00:43:57,220 You saved everyone from confusion. 838 00:44:00,028 --> 00:44:02,370 No, there's no fractional heights. 839 00:44:02,370 --> 00:44:03,390 Anything else? 840 00:44:03,390 --> 00:44:03,890 Thank you. 841 00:44:03,890 --> 00:44:04,490 That was good. 842 00:44:09,461 --> 00:44:09,960 All right. 843 00:44:09,960 --> 00:44:11,030 So this is easy, right? 844 00:44:11,030 --> 00:44:12,270 This is the easy rotation. 845 00:44:12,270 --> 00:44:13,560 Let's do the hard one now. 846 00:44:37,670 --> 00:44:40,664 What's the height here? 847 00:44:40,664 --> 00:44:42,152 AUDIENCE: k plus 1. 848 00:44:47,112 --> 00:44:50,584 PROFESSOR: What's the height here? 849 00:44:50,584 --> 00:44:53,580 AUDIENCE: Plus 2. 850 00:44:53,580 --> 00:44:56,550 PROFESSOR: AVL, not AVL? 851 00:44:56,550 --> 00:44:58,310 AUDIENCE: No. 852 00:44:58,310 --> 00:45:00,810 PROFESSOR: k minus 1 on the left child, 853 00:45:00,810 --> 00:45:02,650 k plus 1 on the right child. 854 00:45:02,650 --> 00:45:04,350 The difference is more than one. 855 00:45:04,350 --> 00:45:06,120 Not an AVL. 856 00:45:06,120 --> 00:45:07,160 How do we fix this? 857 00:45:14,132 --> 00:45:17,620 AUDIENCE: Make a right rotation [INAUDIBLE]. 858 00:45:17,620 --> 00:45:18,560 PROFESSOR: Sorry? 859 00:45:18,560 --> 00:45:21,560 AUDIENCE: Make a right rotation on k first. 860 00:45:21,560 --> 00:45:23,378 PROFESSOR: So make a right rotation where? 861 00:45:23,378 --> 00:45:25,242 AUDIENCE: On k [INAUDIBLE] plus 1. 862 00:45:30,290 --> 00:45:32,665 PROFESSOR: Let me break this up so we 863 00:45:32,665 --> 00:45:34,870 could see how that would work out. 864 00:45:39,280 --> 00:45:42,720 So these two are k minus 1 and k minus 1, 865 00:45:42,720 --> 00:45:48,140 and these nodes are A, B, and C. So you're 866 00:45:48,140 --> 00:45:52,450 saying do a right rotation here, right? 867 00:45:52,450 --> 00:45:54,940 Let's see what that gets us to. 868 00:45:54,940 --> 00:45:58,360 So A is the same, and then here I'm 869 00:45:58,360 --> 00:46:04,140 going to have C instead of B, and B. The right child of B 870 00:46:04,140 --> 00:46:06,700 is the same as it was before. 871 00:46:06,700 --> 00:46:14,920 The left child of B is k minus 1. 872 00:46:14,920 --> 00:46:19,170 Then the left child of C is another k minus 1, 873 00:46:19,170 --> 00:46:21,510 and this guy is k minus 1. 874 00:46:21,510 --> 00:46:23,992 OK, what's the height at B now? 875 00:46:23,992 --> 00:46:25,920 AUDIENCE: k. 876 00:46:25,920 --> 00:46:27,320 PROFESSOR: OK. 877 00:46:27,320 --> 00:46:29,759 What's the height here? 878 00:46:29,759 --> 00:46:33,140 AUDIENCE: k plus 1. 879 00:46:33,140 --> 00:46:35,022 PROFESSOR: And the height here? 880 00:46:35,022 --> 00:46:37,760 AUDIENCE: k plus 2. 881 00:46:37,760 --> 00:46:40,110 PROFESSOR: Is this an AVL? 882 00:46:40,110 --> 00:46:43,936 So we're not done, but what's the good news about this? 883 00:46:43,936 --> 00:46:47,664 AUDIENCE: Now you can rotate it left around C. 884 00:46:47,664 --> 00:46:48,596 PROFESSOR: Yep. 885 00:46:48,596 --> 00:46:49,530 Exactly. 886 00:46:49,530 --> 00:46:55,980 That is this, and we know how to fix this in one step. 887 00:46:55,980 --> 00:46:57,630 You said exactly the right thing. 888 00:46:57,630 --> 00:46:59,910 This is exactly what we do. 889 00:46:59,910 --> 00:47:03,640 First, rotate here so that we get to that, 890 00:47:03,640 --> 00:47:06,390 and then we're in the happy case. 891 00:47:06,390 --> 00:47:08,990 So intuitively, the difference between the happy case 892 00:47:08,990 --> 00:47:13,330 and the harder case that's going to be happy eventually 893 00:47:13,330 --> 00:47:16,860 is where the heavy path is, so where 894 00:47:16,860 --> 00:47:19,370 you have the bigger height. 895 00:47:19,370 --> 00:47:24,500 In this case, k plus 2, k plus 1, k. 896 00:47:24,500 --> 00:47:29,230 So the heavy path is this thing here. 897 00:47:29,230 --> 00:47:30,160 It's a straight line. 898 00:47:30,160 --> 00:47:33,400 Because of that, we can rotate here and then 899 00:47:33,400 --> 00:47:36,290 redistribute the weights uniformly. 900 00:47:36,290 --> 00:47:39,410 In this case, the heavy path is this. 901 00:47:39,410 --> 00:47:44,920 It's a zigzag, so one rotation won't fix it. 902 00:47:44,920 --> 00:47:46,570 Instead, we do one rotation here so 903 00:47:46,570 --> 00:47:50,570 that we make the weights be distributed 904 00:47:50,570 --> 00:47:52,610 in a straight line, and then one rotation here, 905 00:47:52,610 --> 00:47:55,230 which makes everything look good. 906 00:47:55,230 --> 00:47:55,979 Yes? 907 00:47:55,979 --> 00:47:57,770 AUDIENCE: So that's just a general pattern. 908 00:47:57,770 --> 00:47:59,694 And you redistribute [INAUDIBLE]? 909 00:48:02,412 --> 00:48:04,870 PROFESSOR: So the first thing we do is say you're at a node 910 00:48:04,870 --> 00:48:05,964 and you need to rebalance. 911 00:48:05,964 --> 00:48:07,380 First thing, you check the heights 912 00:48:07,380 --> 00:48:09,980 and see if it's already balanced. 913 00:48:09,980 --> 00:48:13,120 If it's not balanced, suppose your heavier node 914 00:48:13,120 --> 00:48:14,700 is on the right. 915 00:48:14,700 --> 00:48:19,260 Then you go and see if the right child is heavier 916 00:48:19,260 --> 00:48:20,810 or the left child is heavier. 917 00:48:20,810 --> 00:48:24,660 So you see if the height distribution is a straight line 918 00:48:24,660 --> 00:48:26,680 or is exact, and then you decide whether you're 919 00:48:26,680 --> 00:48:28,429 going to do one rotation or two rotations. 920 00:48:31,970 --> 00:48:35,050 So this is all nice and good except for one thing. 921 00:48:35,050 --> 00:48:39,105 The heights keep changing here, and we're 922 00:48:39,105 --> 00:48:42,250 storing the heights in each node. 923 00:48:42,250 --> 00:48:44,150 So if we're doing all this rotating 924 00:48:44,150 --> 00:48:47,830 without updating the heights, then we're 925 00:48:47,830 --> 00:48:49,924 going to have something that looks like an AVL 926 00:48:49,924 --> 00:48:51,840 but doesn't keep track of the heights anymore, 927 00:48:51,840 --> 00:48:54,980 so eventually, we're going to make bad decisions. 928 00:48:54,980 --> 00:48:57,440 So we need to update the heights. 929 00:48:57,440 --> 00:49:00,254 Where do we update the heights? 930 00:49:00,254 --> 00:49:01,670 You can look at the code and cheat 931 00:49:01,670 --> 00:49:05,457 or you can try to think and tell me the answer, either way. 932 00:49:13,570 --> 00:49:17,630 So this only does rotates, right? 933 00:49:17,630 --> 00:49:20,649 If rotates would be able to update the heights, 934 00:49:20,649 --> 00:49:21,690 everything would be good. 935 00:49:21,690 --> 00:49:23,520 I wouldn't have to worry about this. 936 00:49:26,350 --> 00:49:29,520 Pretend I hadn't deleted the original case here, 937 00:49:29,520 --> 00:49:38,050 which I think was A, B, C, and then some children. 938 00:49:38,050 --> 00:49:39,720 What heights changed here? 939 00:49:45,950 --> 00:49:50,640 When I did this rotation, what heights changed? 940 00:49:50,640 --> 00:49:53,040 AUDIENCE: B's height changed. 941 00:49:53,040 --> 00:49:58,320 AUDIENCE: B, A, and C is greater than B. 942 00:49:58,320 --> 00:50:02,240 PROFESSOR: B and potentially A. If I'm not sure, 943 00:50:02,240 --> 00:50:04,300 then I'm going to try to update the height anyway 944 00:50:04,300 --> 00:50:06,630 to make sure that I have the right height. 945 00:50:06,630 --> 00:50:10,790 After I do this, I have to update the height on B 946 00:50:10,790 --> 00:50:15,225 and update the heights on A. Is this correct? 947 00:50:21,472 --> 00:50:22,960 AUDIENCE: And B. 948 00:50:22,960 --> 00:50:24,210 PROFESSOR: And all the way up. 949 00:50:24,210 --> 00:50:26,240 Let's assume that happens. 950 00:50:26,240 --> 00:50:28,242 Does this look right? 951 00:50:28,242 --> 00:50:31,220 AUDIENCE: Don't you also have to update for C? 952 00:50:31,220 --> 00:50:31,940 PROFESSOR: For c. 953 00:50:34,886 --> 00:50:37,840 Did I change anything under c? 954 00:50:37,840 --> 00:50:40,290 AUDIENCE: I think he's talking about capital C. 955 00:50:40,290 --> 00:50:42,980 PROFESSOR: Oh, capital C. Yeah, OK. 956 00:50:48,800 --> 00:50:50,185 So the children are the same. 957 00:50:50,185 --> 00:50:53,900 They were a and b before, they're a and b afterwards, 958 00:50:53,900 --> 00:50:55,586 so I don't have to worry about this guy. 959 00:50:55,586 --> 00:50:56,960 The heights here haven't changed, 960 00:50:56,960 --> 00:50:59,705 so the height here hasn't changed. 961 00:50:59,705 --> 00:51:01,080 But you're thinking of a problem, 962 00:51:01,080 --> 00:51:03,121 and that is good because there is a problem here. 963 00:51:06,251 --> 00:51:09,632 AUDIENCE: What if there's a subtree instead of small c? 964 00:51:09,632 --> 00:51:13,220 PROFESSOR: A subtree instead of small c? 965 00:51:13,220 --> 00:51:14,610 OK, so yeah, this is a subtree. 966 00:51:17,140 --> 00:51:19,960 After rotating, this subtree moves over here. 967 00:51:22,612 --> 00:51:23,236 AUDIENCE: Wait. 968 00:51:23,236 --> 00:51:25,580 You have to update A first, and then update B. 969 00:51:25,580 --> 00:51:27,300 PROFESSOR: OK. 970 00:51:27,300 --> 00:51:29,510 So the height starts from the bottom. 971 00:51:29,510 --> 00:51:31,260 The bottom height zero, the next one 972 00:51:31,260 --> 00:51:33,400 up, the next one up, the next one up. 973 00:51:33,400 --> 00:51:34,920 Keep that in mind. 974 00:51:34,920 --> 00:51:37,640 Nothing changed below this guy, nothing changed below this guy, 975 00:51:37,640 --> 00:51:42,090 nothing changed here, so I don't have to update these. 976 00:51:42,090 --> 00:51:44,540 But when I compute the height, update height 977 00:51:44,540 --> 00:51:47,490 at the beginning of the listing assumes 978 00:51:47,490 --> 00:51:49,450 that the height of the children is correct. 979 00:51:49,450 --> 00:51:51,116 So when I compute the height of my mode, 980 00:51:51,116 --> 00:51:53,000 the height of the children has to be updated. 981 00:51:53,000 --> 00:51:55,117 So if I call Update Height of B first, 982 00:51:55,117 --> 00:51:56,950 I already know that this might have changed, 983 00:51:56,950 --> 00:51:59,420 so whatever answer I get is bogus. 984 00:51:59,420 --> 00:52:03,575 So this doesn't work and I have to do this. 985 00:52:13,290 --> 00:52:14,480 This is rotation. 986 00:52:14,480 --> 00:52:16,380 That's rebalancing. 987 00:52:16,380 --> 00:52:18,240 One more trick to rebalancing. 988 00:52:18,240 --> 00:52:23,200 So we talked about rebalancing the subtree at one node. 989 00:52:23,200 --> 00:52:24,660 What's missing from this picture? 990 00:52:29,560 --> 00:52:32,010 AUDIENCE: So you check it on all levels? 991 00:52:32,010 --> 00:52:32,920 PROFESSOR: Yeah. 992 00:52:32,920 --> 00:52:34,560 So I made some changes here. 993 00:52:34,560 --> 00:52:38,830 My tree might look happy starting from here on, 994 00:52:38,830 --> 00:52:41,090 but if I did an insertion or a deletion, 995 00:52:41,090 --> 00:52:43,570 heights changed all the way up, so I 996 00:52:43,570 --> 00:52:46,635 have to go all the way up and do more rebalancings potentially. 997 00:52:50,460 --> 00:52:51,600 And that's it. 998 00:52:51,600 --> 00:52:52,787 These are AVLs. 999 00:52:52,787 --> 00:52:53,620 Any other questions? 1000 00:52:58,574 --> 00:52:59,490 Did you guys get them? 1001 00:52:59,490 --> 00:53:00,993 Do they make more sense now? 1002 00:53:00,993 --> 00:53:01,820 AUDIENCE: Yes. 1003 00:53:01,820 --> 00:53:03,030 PROFESSOR: OK. 1004 00:53:03,030 --> 00:53:04,300 Good. 1005 00:53:04,300 --> 00:53:08,134 You'll have to play with them on the next Pset. 1006 00:53:08,134 --> 00:53:10,004 AUDIENCE: By "play," what do you mean? 1007 00:53:10,004 --> 00:53:12,170 PROFESSOR: Well, you're not writing it from scratch. 1008 00:53:12,170 --> 00:53:13,914 You have to modify existing code. 1009 00:53:13,914 --> 00:53:17,614 AUDIENCE: What are we going to make it do? 1010 00:53:17,614 --> 00:53:20,030 PROFESSOR: So you'll have to keep track of something other 1011 00:53:20,030 --> 00:53:20,850 than heights. 1012 00:53:20,850 --> 00:53:22,670 You'll have to keep track of a new value. 1013 00:53:22,670 --> 00:53:24,503 AUDIENCE: Oh, like the minimum or something? 1014 00:53:24,503 --> 00:53:26,020 PROFESSOR: Or something, yeah.