1 00:00:00,070 --> 00:00:02,430 The following content is provided under a Creative 2 00:00:02,430 --> 00:00:03,810 Commons license. 3 00:00:03,810 --> 00:00:06,060 Your support will help MIT OpenCourseWare 4 00:00:06,060 --> 00:00:10,140 continue to offer high quality educational resources for free. 5 00:00:10,140 --> 00:00:12,700 To make a donation or to view additional materials 6 00:00:12,700 --> 00:00:16,600 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:16,600 --> 00:00:17,260 at ocw.mit.edu. 8 00:00:26,680 --> 00:00:28,150 PROFESSOR: All right, well I'd like 9 00:00:28,150 --> 00:00:31,470 to thank you for inviting me again 10 00:00:31,470 --> 00:00:33,750 to talk to the poker class. 11 00:00:33,750 --> 00:00:38,250 It's always great to come here, and we're 12 00:00:38,250 --> 00:00:40,970 going to be having a tournament in a couple weeks, 13 00:00:40,970 --> 00:00:43,890 so good luck for the people participating in that. 14 00:00:43,890 --> 00:00:47,210 Actually, I'm coming back in another two weeks 15 00:00:47,210 --> 00:00:50,610 because I think [INAUDIBLE] a Harvard MIT math tournament 16 00:00:50,610 --> 00:00:53,760 for high school kids. 17 00:00:53,760 --> 00:00:56,120 I really love visiting MIT. 18 00:00:56,120 --> 00:00:59,270 I just wish it were at some other time besides the winter. 19 00:01:03,084 --> 00:01:04,125 Then it would be perfect. 20 00:01:04,125 --> 00:01:05,850 All right, today I'm going to talk 21 00:01:05,850 --> 00:01:10,290 about the University of Alberta's Cepheus computer 22 00:01:10,290 --> 00:01:11,100 program. 23 00:01:11,100 --> 00:01:12,470 It supposedly solved poker. 24 00:01:12,470 --> 00:01:15,316 We're going to talk about what they actually did. 25 00:01:15,316 --> 00:01:16,270 [LAUGHTER] 26 00:01:16,270 --> 00:01:18,360 There seems to be a lot of buzz about this, 27 00:01:18,360 --> 00:01:24,070 so I thought this was a good thing to do. 
28 00:01:24,070 --> 00:01:29,130 So I have to tell you that Jared and I did not work directly 29 00:01:29,130 --> 00:01:31,940 with the University of Alberta people, 30 00:01:31,940 --> 00:01:34,100 but we are very familiar with their methods 31 00:01:34,100 --> 00:01:38,920 and have actually tried some of their coding techniques. 32 00:01:38,920 --> 00:01:42,360 So we're pretty familiar with the same research that's 33 00:01:42,360 --> 00:01:43,430 going on. 34 00:01:43,430 --> 00:01:47,540 So it's sort of an, I think, objective commentary. 35 00:01:47,540 --> 00:01:51,480 So by the way, as the lecture goes on, 36 00:01:51,480 --> 00:01:53,450 you can interrupt with questions. 37 00:01:53,450 --> 00:01:56,140 Just raise your hands if something is unclear 38 00:01:56,140 --> 00:02:00,215 because I've been told I have about 80 minutes. 39 00:02:00,215 --> 00:02:04,440 Probably spend 55 and then save the rest for questions. 40 00:02:04,440 --> 00:02:07,930 All right, so the outline of the talk-- first 41 00:02:07,930 --> 00:02:11,410 I'm going to talk about what Cepheus accomplished, 42 00:02:11,410 --> 00:02:14,260 what the University of Alberta people accomplished, 43 00:02:14,260 --> 00:02:21,390 and I'm going to bring that up by discussing game theory 44 00:02:21,390 --> 00:02:23,340 optimal strategies in poker. 45 00:02:23,340 --> 00:02:25,640 How many of you know what game [INAUDIBLE] is? 46 00:02:25,640 --> 00:02:28,750 I just want to know [INAUDIBLE] or what a [INAUDIBLE] is. 47 00:02:28,750 --> 00:02:31,100 Raise your hands. 48 00:02:31,100 --> 00:02:32,200 OK. 49 00:02:32,200 --> 00:02:34,270 So about 1/2, 2/3. 50 00:02:34,270 --> 00:02:34,980 Good. 51 00:02:34,980 --> 00:02:39,194 I'm going to do a quick introduction to what game 52 00:02:39,194 --> 00:02:40,110 theory [INAUDIBLE] is. 53 00:02:40,110 --> 00:02:43,700 We're going to talk about a simple poker game and solutions 54 00:02:43,700 --> 00:02:44,970 to it. 
55 00:02:44,970 --> 00:02:47,740 And then I'm going to go into their algorithm, which 56 00:02:47,740 --> 00:02:51,810 is written [INAUDIBLE]. 57 00:02:51,810 --> 00:02:55,750 They used the method of counterfactual regret minimization. 58 00:02:55,750 --> 00:02:58,370 Actually, the method they used to push 59 00:02:58,370 --> 00:03:00,080 through to the solution of the problem 60 00:03:00,080 --> 00:03:04,820 is called CFR+, which is basically 61 00:03:04,820 --> 00:03:09,460 the original algorithm with some shortcuts, which we'll discuss. 62 00:03:09,460 --> 00:03:13,640 After this, though, we're going to think about extensions 63 00:03:13,640 --> 00:03:17,160 of computer solutions to other games, including [INAUDIBLE] 64 00:03:17,160 --> 00:03:19,640 games and multiplayer games. 65 00:03:19,640 --> 00:03:23,490 A couple people have questions about [INAUDIBLE] no limit 66 00:03:23,490 --> 00:03:24,820 program. 67 00:03:24,820 --> 00:03:31,970 We'll talk about what their work entailed if questions 68 00:03:31,970 --> 00:03:35,630 lead in that direction. 69 00:03:35,630 --> 00:03:39,120 All right, let's talk about what Cepheus accomplished. 70 00:03:39,120 --> 00:03:42,280 It's a game theory optimal solution to heads up limit 71 00:03:42,280 --> 00:03:43,280 hold 'em. 72 00:03:43,280 --> 00:03:45,140 And so what does that mean? 73 00:03:45,140 --> 00:03:48,350 You guys all know what limit hold 'em is, right? 74 00:03:48,350 --> 00:03:50,150 Good. 75 00:03:50,150 --> 00:03:56,740 Basically, after [INAUDIBLE] few years, 76 00:03:56,740 --> 00:03:59,960 they've achieved an exploitability of less than 1/1000 77 00:03:59,960 --> 00:04:01,310 of a big blind. 78 00:04:01,310 --> 00:04:09,050 So the first thing is it's not a perfect optimal solution. 79 00:04:09,050 --> 00:04:11,960 You can still exploit it for about 1/1000 80 00:04:11,960 --> 00:04:16,579 of a big blind per hand. 81 00:04:16,579 --> 00:04:21,890 However, there are probably better games. 
82 00:04:21,890 --> 00:04:26,590 This is like 1/20-- this is 1/2000 of a big bet. 83 00:04:26,590 --> 00:04:30,790 You can actually play heads up for 50 years at normal speed 84 00:04:30,790 --> 00:04:36,260 and still have some probability of losing. 85 00:04:36,260 --> 00:04:42,400 The reason for that is the standard deviation of heads 86 00:04:42,400 --> 00:04:47,100 up limit hold 'em is about five big blinds. 87 00:04:47,100 --> 00:04:49,380 So you can just imagine how many hands 88 00:04:49,380 --> 00:04:52,440 you have to play [INAUDIBLE] the significance. 89 00:04:52,440 --> 00:04:56,850 About, oh, 25 million. 90 00:04:56,850 --> 00:04:59,460 So it's definitely a milestone. 91 00:04:59,460 --> 00:05:02,890 This is the first time a real poker game has been solved. 92 00:05:02,890 --> 00:05:06,030 In math of poker, we solved ace, king, queen, [INAUDIBLE] 93 00:05:06,030 --> 00:05:11,580 on paper, but [INAUDIBLE] a real poker game that's solved. 94 00:05:11,580 --> 00:05:14,390 However, given their previous work, 95 00:05:14,390 --> 00:05:16,120 it was just a matter of [INAUDIBLE]. 96 00:05:16,120 --> 00:05:17,670 I remember two or three years ago 97 00:05:17,670 --> 00:05:21,970 they passed the 1/100 of a big bet, which 98 00:05:21,970 --> 00:05:24,230 is sort of our measurement of significance. 99 00:05:24,230 --> 00:05:29,280 If you're playing and you're winning more than 1/100 100 00:05:29,280 --> 00:05:33,490 of a big bet per hand, you can [INAUDIBLE] it's 101 00:05:33,490 --> 00:05:34,440 a profitable game. 102 00:05:34,440 --> 00:05:37,230 Below that it becomes theoretical. 103 00:05:37,230 --> 00:05:40,080 So it's definitely a milestone. 104 00:05:40,080 --> 00:05:47,150 And basically I knew that, if they just maybe spent 105 00:05:47,150 --> 00:05:50,100 more CPU power, they would get the solution. 106 00:05:50,100 --> 00:05:56,000 After 900 CPU years, they finally got the solution. 107 00:05:56,000 --> 00:05:59,460 So I don't know. 
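The "about 25 million hands" figure can be reproduced with a back-of-envelope calculation. The sketch below assumes that "significance" means the accumulated edge equals one standard deviation of the total result; that threshold is an assumption, not something stated in the lecture, and the function name and the `z` parameter are hypothetical.

```python
def hands_for_significance(sd_per_hand, edge_per_hand, z=1.0):
    """Number of hands n at which the expected win equals z standard deviations.

    Over n independent hands the expected win is n * edge and the standard
    deviation of the total is sd * sqrt(n). Setting n * edge = z * sd * sqrt(n)
    gives n = (z * sd / edge) ** 2.
    """
    return (z * sd_per_hand / edge_per_hand) ** 2


# Figures quoted in the lecture: standard deviation of about 5 big blinds
# per hand, exploitability of about 1/1000 of a big blind per hand.
hands = hands_for_significance(sd_per_hand=5.0, edge_per_hand=0.001)
print(round(hands))
```

Demanding two standard deviations instead (`z=2.0`) would quadruple the number of hands, which is why such a small edge is effectively invisible at the table.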
108 00:05:59,460 --> 00:06:05,410 If I had that much CPU power, I'd solve a few problems, too. 109 00:06:05,410 --> 00:06:08,550 But it's still a milestone. 110 00:06:08,550 --> 00:06:10,670 It's great. 111 00:06:10,670 --> 00:06:13,350 So what effect does this have on other games? 112 00:06:13,350 --> 00:06:16,480 Does this mean poker is going to go the way of chess 113 00:06:16,480 --> 00:06:19,620 where computers are just much better than we are? 114 00:06:19,620 --> 00:06:21,560 I don't think we're there yet, and we'll 115 00:06:21,560 --> 00:06:25,000 talk about that later. 116 00:06:25,000 --> 00:06:28,230 So let's talk about Nash equilibrium. 117 00:06:28,230 --> 00:06:32,210 So John F. Nash won the Nobel Prize in 1994 118 00:06:32,210 --> 00:06:34,460 "for pioneering analysis of equilibrium 119 00:06:34,460 --> 00:06:36,870 in the theory of non-cooperative games." 120 00:06:36,870 --> 00:06:39,860 And he extended the work of John Von Neumann and Oskar 121 00:06:39,860 --> 00:06:42,470 Morgenstern, who actually first considered 122 00:06:42,470 --> 00:06:44,680 these two player zero sum games. 123 00:06:44,680 --> 00:06:47,530 So Nash equilibrium is just a set of strategies 124 00:06:47,530 --> 00:06:52,510 such that no player can actually improve their strategy 125 00:06:52,510 --> 00:06:55,980 and make more [INAUDIBLE]. 126 00:06:55,980 --> 00:06:57,420 [INAUDIBLE] whatever. 127 00:06:57,420 --> 00:07:00,020 In two player zero sum games, 128 00:07:00,020 --> 00:07:03,955 we refer to Nash equilibria as also game theory optimal. 129 00:07:03,955 --> 00:07:06,800 The reason is because Nash equilibria are also 130 00:07:06,800 --> 00:07:08,840 the minimax solution. 131 00:07:08,840 --> 00:07:11,520 It's the best you can do given that he can 132 00:07:11,520 --> 00:07:14,850 see what you do and respond. 
133 00:07:14,850 --> 00:07:16,754 Simplest case of Nash equilibria is, 134 00:07:16,754 --> 00:07:18,420 if you're playing rock, paper, scissors, 135 00:07:18,420 --> 00:07:21,580 what's the Nash equilibrium? 136 00:07:21,580 --> 00:07:22,420 1/3 each. 137 00:07:22,420 --> 00:07:27,580 So that's not that exciting in this case, because both players 138 00:07:27,580 --> 00:07:29,340 kind of just get 0. 139 00:07:29,340 --> 00:07:31,930 You can't make more than 0, you can't make less than 0. 140 00:07:31,930 --> 00:07:34,960 So it doesn't seem to be that exciting a solution, 141 00:07:34,960 --> 00:07:37,520 but in poker it's kind of exciting 142 00:07:37,520 --> 00:07:41,570 because there are kind of dominated mistakes people make, 143 00:07:41,570 --> 00:07:47,340 or mistakes that actually lose money to the optimal solution. 144 00:07:47,340 --> 00:07:50,950 So the reason 1/3, 1/3, 1/3 is the Nash equilibrium 145 00:07:50,950 --> 00:07:55,530 is because nobody can do anything to improve their lot. 146 00:07:55,530 --> 00:07:57,360 It may not be the best thing to play. 147 00:07:57,360 --> 00:08:01,670 If a guy is playing 1/2 scissors and 1/2 rock, 148 00:08:01,670 --> 00:08:05,020 what should you play? 149 00:08:05,020 --> 00:08:06,870 100% rock. 150 00:08:06,870 --> 00:08:09,650 Yeah, sort of like the Aerosmith strategy. 151 00:08:09,650 --> 00:08:12,270 [LAUGHTER] 152 00:08:12,270 --> 00:08:12,850 Right. 153 00:08:12,850 --> 00:08:16,610 So there are much better ways to play if your opponents deviate 154 00:08:16,610 --> 00:08:18,910 from Nash equilibrium. 155 00:08:18,910 --> 00:08:22,640 So actually game theory optimal is not necessarily the best way 156 00:08:22,640 --> 00:08:25,800 to play, even heads up. 157 00:08:25,800 --> 00:08:29,490 It's a way to play to kind of guarantee you never lose. 158 00:08:29,490 --> 00:08:32,919 So that's sort of the accomplishment. 159 00:08:32,919 --> 00:08:36,919 That's why we like to find these things. 
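The rock, paper, scissors claims above can be checked directly. This is a minimal sketch, assuming the usual payoffs of +1 for a win, 0 for a tie and -1 for a loss; the names `PAYOFF`, `expected_values` and `best_response` are illustrative and not from the lecture.

```python
OPTIONS = ["rock", "paper", "scissors"]

# PAYOFF[mine][theirs] is my payoff: +1 win, 0 tie, -1 loss.
PAYOFF = {
    "rock":     {"rock": 0, "paper": -1, "scissors": 1},
    "paper":    {"rock": 1, "paper": 0, "scissors": -1},
    "scissors": {"rock": -1, "paper": 1, "scissors": 0},
}


def expected_values(opponent_mix):
    """Expected payoff of each pure option against a mixed opponent."""
    return {
        mine: sum(PAYOFF[mine][theirs] * p for theirs, p in opponent_mix.items())
        for mine in OPTIONS
    }


def best_response(opponent_mix):
    """Pure option with the highest expected payoff."""
    evs = expected_values(opponent_mix)
    return max(OPTIONS, key=lambda option: evs[option])


# The opponent from the lecture: half scissors, half rock.
lopsided = {"rock": 0.5, "paper": 0.0, "scissors": 0.5}
# The equilibrium mix: 1/3 each.
uniform = {option: 1 / 3 for option in OPTIONS}

print(expected_values(lopsided), best_response(lopsided))
print(expected_values(uniform))
```

Against the lopsided opponent, rock has the highest expected payoff, while against the 1/3 each mix every option is worth 0, so no deviation helps.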
160 00:08:36,919 --> 00:08:39,539 I know I could just play this, and I'm not 161 00:08:39,539 --> 00:08:42,770 taking total advantage of my opponent's mistakes, 162 00:08:42,770 --> 00:08:44,940 but at least I'm playing in a way where he can't 163 00:08:44,940 --> 00:08:47,410 take advantage of me at all. 164 00:08:47,410 --> 00:08:50,120 Let's do a simple example. 165 00:08:50,120 --> 00:08:55,100 So this is an example that I shared 166 00:08:55,100 --> 00:08:57,060 with the class a couple years ago. 167 00:08:57,060 --> 00:08:59,210 So there are two players, Rose and Colin, 168 00:08:59,210 --> 00:09:00,850 and the reason the players are called 169 00:09:00,850 --> 00:09:04,185 Rose and Colin is because this refers to [INAUDIBLE] games. 170 00:09:06,880 --> 00:09:10,090 One player chooses a row, the other player chooses a column. 171 00:09:10,090 --> 00:09:13,050 That's their payoff. 172 00:09:13,050 --> 00:09:16,740 And for a three player game, we introduce Larry, 173 00:09:16,740 --> 00:09:19,530 because there are layers. 174 00:09:19,530 --> 00:09:22,510 So the two players are Rose and Colin. 175 00:09:22,510 --> 00:09:25,840 So each player antes $50 for $100 in a pot. 176 00:09:25,840 --> 00:09:27,700 Rose looks at a card from a full deck. 177 00:09:27,700 --> 00:09:30,660 She will win the pot at showdown if the card is a spade. 178 00:09:30,660 --> 00:09:32,380 Otherwise she will lose. 179 00:09:32,380 --> 00:09:35,950 So Rose can decide to bet $100 or check 180 00:09:35,950 --> 00:09:37,660 after she looks at her card. 181 00:09:37,660 --> 00:09:38,974 So there's $100 in the pot. 182 00:09:38,974 --> 00:09:39,890 She looks at her card. 183 00:09:39,890 --> 00:09:43,340 She decides whether to bet $100 or to check. 184 00:09:43,340 --> 00:09:47,180 If Rose bets, Colin may decide to call $100 or fold. 185 00:09:47,180 --> 00:09:49,420 If Colin folds, Rose wins. 186 00:09:49,420 --> 00:09:51,220 Well, you guys know how poker works. 
187 00:09:51,220 --> 00:09:52,790 If Colin calls, there's a showdown, 188 00:09:52,790 --> 00:09:54,420 and if her card is actually a spade, 189 00:09:54,420 --> 00:09:57,710 she wins the whole pot. 190 00:09:57,710 --> 00:09:59,200 Otherwise, Colin wins the pot. 191 00:09:59,200 --> 00:10:01,992 So what are the optimal strategies for Rose and Colin? 192 00:10:01,992 --> 00:10:03,200 Does anybody know the answer? 193 00:10:06,690 --> 00:10:11,590 Well, let's do one [INAUDIBLE] part of it. 194 00:10:11,590 --> 00:10:14,840 How often do you think Colin should call? 195 00:10:14,840 --> 00:10:17,310 Colin wants to call [INAUDIBLE] enough to make 196 00:10:17,310 --> 00:10:21,000 Rose's bluffs unprofitable. 197 00:10:21,000 --> 00:10:25,220 If Rose gets a spade, what is she going to do? 198 00:10:25,220 --> 00:10:25,820 Bet. 199 00:10:25,820 --> 00:10:29,140 She has nothing to lose by betting, 200 00:10:29,140 --> 00:10:32,060 unless she's being very, very tricky, 201 00:10:32,060 --> 00:10:36,450 but it is correct to bet. 202 00:10:36,450 --> 00:10:39,460 So let's see. 203 00:10:39,460 --> 00:10:41,690 If Rose doesn't pick up a spade and bluffs, 204 00:10:41,690 --> 00:10:44,540 how often does that have to succeed for it 205 00:10:44,540 --> 00:10:46,914 to be profitable? 206 00:10:46,914 --> 00:10:49,030 There's $100 in the pot. 207 00:10:49,030 --> 00:10:50,100 She looks. 208 00:10:50,100 --> 00:10:53,030 If it's not a spade, she has to bet $100, 209 00:10:53,030 --> 00:10:57,890 and how much is she risking? 210 00:10:57,890 --> 00:10:59,140 How much is she going to win? 211 00:11:03,630 --> 00:11:06,290 It's actually $100 and another $100, right? 212 00:11:06,290 --> 00:11:09,010 Because there's $100 in the pot. 213 00:11:09,010 --> 00:11:12,330 Sure, she anted something and made the pot, 214 00:11:12,330 --> 00:11:14,680 but she's spending $100. 215 00:11:14,680 --> 00:11:18,080 And if Colin calls, she's going to lose the $100. 
216 00:11:18,080 --> 00:11:21,280 If Colin folds, she's going to win the $100 in the pot, 217 00:11:21,280 --> 00:11:22,780 or she could have just given up. 218 00:11:22,780 --> 00:11:24,600 So it's 1 to 1. 219 00:11:24,600 --> 00:11:28,710 So Rose should call half the time-- 220 00:11:28,710 --> 00:11:31,130 I mean Colin should call half the time. 221 00:11:31,130 --> 00:11:35,230 Rose should bet to bluff in a 2 to 1 222 00:11:35,230 --> 00:11:39,680 ratio, because that's the odds Colin gets to call. 223 00:11:39,680 --> 00:11:42,110 So Rose should always bet a spade. 224 00:11:42,110 --> 00:11:45,520 If Colin calls 100% of the time, Rose will just never bluff. 225 00:11:45,520 --> 00:11:48,150 If Colin never calls, Rose would just bet every time. 226 00:11:48,150 --> 00:11:51,640 So there is kind of no equilibrium there. 227 00:11:51,640 --> 00:11:53,740 If Colin calls half the time, Rose 228 00:11:53,740 --> 00:11:55,100 will be indifferent to bluffing. 229 00:11:55,100 --> 00:11:58,770 She'll be negative $50 either way without a spade, 230 00:11:58,770 --> 00:12:00,570 and then $100 with a spade. 231 00:12:00,570 --> 00:12:02,680 Now, this is the strategy for Colin, 232 00:12:02,680 --> 00:12:04,720 and the correct strategy for Rose 233 00:12:04,720 --> 00:12:11,070 is this ratio of bluff to spade, which is 1 to 2. 234 00:12:11,070 --> 00:12:15,110 So Rose should basically bet half of her hearts. 235 00:12:15,110 --> 00:12:17,530 She can bet the high hearts, and I 236 00:12:17,530 --> 00:12:21,500 guess with the eight of hearts she can decide whether-- is 237 00:12:21,500 --> 00:12:23,720 it the eight or the seven? 238 00:12:23,720 --> 00:12:26,870 No, it's the-- yeah, it's the eight. 239 00:12:26,870 --> 00:12:28,960 [INAUDIBLE] with the eight of hearts 240 00:12:28,960 --> 00:12:34,120 she can decide whether to bet or not like half the time. 
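The indifference argument for the Rose and Colin game can be verified with exact arithmetic. The sketch below is an illustration under stated assumptions: results are measured as Rose's net win or loss including her $50 ante, a 52-card deck gives a spade with probability 1/4, and the function name `rose_ev` and its parameters are hypothetical. Bluffing with half of the 13 hearts means 6.5 bluffs out of 39 non-spades, so the bluffing frequency without a spade is 1/6.

```python
from fractions import Fraction

ANTE = 50   # each player's ante, so the pot starts at $100
BET = 100   # size of Rose's bet and of Colin's call


def rose_ev(bluff_freq, call_freq, spade_prob=Fraction(1, 4)):
    """Rose's expected net result in dollars.

    bluff_freq: probability Rose bets when she does NOT hold a spade.
    call_freq:  probability Colin calls when Rose bets.
    Rose always bets when she holds a spade.
    """
    # With a spade: she wins ante + bet when called, just the ante on a fold.
    ev_spade = call_freq * (ANTE + BET) + (1 - call_freq) * ANTE
    # Bluffing: she loses ante + bet when called, wins the ante on a fold.
    ev_bluff = call_freq * -(ANTE + BET) + (1 - call_freq) * ANTE
    # Checking without a spade: she loses her ante at showdown.
    ev_check = -ANTE
    ev_no_spade = bluff_freq * ev_bluff + (1 - bluff_freq) * ev_check
    return spade_prob * ev_spade + (1 - spade_prob) * ev_no_spade


half = Fraction(1, 2)
sixth = Fraction(1, 6)

print(rose_ev(sixth, half))                      # value of the game to Rose
print(rose_ev(0, half), rose_ev(1, half))        # Rose indifferent to bluffing
print(rose_ev(sixth, 0), rose_ev(sixth, 1))      # Colin indifferent to calling
```

All five numbers come out the same, a loss of $12.50 for Rose: when Colin calls half the time Rose gains nothing by changing her bluffing frequency, and when Rose bluffs in the 1 to 2 ratio Colin gains nothing by changing his calling frequency.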
241 00:12:34,120 --> 00:12:36,870 So these are Nash equilibrium and game theory optimal 242 00:12:36,870 --> 00:12:43,420 strategies, and basically the value of the game is negative-- 243 00:12:43,420 --> 00:12:47,210 is worth $12.50 to Colin. 244 00:12:47,210 --> 00:12:50,990 Any questions about this? 245 00:12:50,990 --> 00:12:54,950 All right, so these are the strategies 246 00:12:54,950 --> 00:13:02,970 that the algorithm tries to find. 247 00:13:02,970 --> 00:13:07,400 Let's go on to the algorithm now. 248 00:13:07,400 --> 00:13:10,420 Well, let's talk about what game theory optimal is first. 249 00:13:10,420 --> 00:13:15,280 By the way, there will be about five or so transparencies 250 00:13:15,280 --> 00:13:17,150 [INAUDIBLE] of math equations. 251 00:13:17,150 --> 00:13:20,990 So just suffer through these. 252 00:13:20,990 --> 00:13:26,710 Those of you who understand are going to enjoy the later part, 253 00:13:26,710 --> 00:13:29,710 but let's just talk formally about what 254 00:13:29,710 --> 00:13:31,460 game theory optimal means. 255 00:13:31,460 --> 00:13:34,980 So there's this game function, u. 256 00:13:34,980 --> 00:13:38,860 It takes two strategies, an x strategy and a y strategy, 257 00:13:38,860 --> 00:13:41,500 and it gives [INAUDIBLE]. 258 00:13:41,500 --> 00:13:43,130 If this was rock, paper, scissors, 259 00:13:43,130 --> 00:13:46,890 you would have u of rock versus scissors 260 00:13:46,890 --> 00:13:50,160 to be 1, so on and so forth. 261 00:13:50,160 --> 00:13:53,000 It's positive for x and negative-- x is trying 262 00:13:53,000 --> 00:14:01,130 to-- x gets u, and y loses u. 263 00:14:01,130 --> 00:14:03,190 That's the idea. 264 00:14:03,190 --> 00:14:07,280 So one of the things is we can take convex linear combinations 265 00:14:07,280 --> 00:14:08,610 of strategies. 
266 00:14:08,610 --> 00:14:13,460 That is, if sigma x1 through sigma xk are strategies 267 00:14:13,460 --> 00:14:18,900 and we have some coefficients that are all non-negative 268 00:14:18,900 --> 00:14:21,960 and that sum to 1, we can make a new strategy 269 00:14:21,960 --> 00:14:24,630 as a linear combination of these strategies. 270 00:14:24,630 --> 00:14:32,310 And also u being bilinear means that the value of the game 271 00:14:32,310 --> 00:14:36,180 here is just the linear combination 272 00:14:36,180 --> 00:14:38,690 that [INAUDIBLE] sigma x. 273 00:14:38,690 --> 00:14:43,230 And it would be the same also for sigma y. 274 00:14:43,230 --> 00:14:45,860 This just means, suppose you have two strategies 275 00:14:45,860 --> 00:14:52,530 and you play 1/3 sigma x1 and 2/3 sigma x2, 276 00:14:52,530 --> 00:14:56,225 your payoff is going to be 1/3 of the payoff of sigma x1 277 00:14:56,225 --> 00:14:57,725 and 2/3 of the payoff of sigma x2. 278 00:15:00,310 --> 00:15:04,080 Hopefully that's pretty clear. 279 00:15:04,080 --> 00:15:05,940 Now we define a pair of strategies 280 00:15:05,940 --> 00:15:10,600 to be an epsilon-equilibrium if the best x can do against y, 281 00:15:10,600 --> 00:15:12,010 minus 282 00:15:12,010 --> 00:15:17,000 the best y can do against x, is at most epsilon. 283 00:15:17,000 --> 00:15:22,550 And if epsilon equals 0, these are in Nash equilibrium. 284 00:15:22,550 --> 00:15:26,880 So after 900 CPU years, what they found 285 00:15:26,880 --> 00:15:29,320 were two strategies-- sigma x star, 286 00:15:29,320 --> 00:15:38,860 sigma y star-- that were within 1/1000 287 00:15:38,860 --> 00:15:41,790 of a big blind of equilibrium. 288 00:15:41,790 --> 00:15:47,950 And that's basically [INAUDIBLE] accomplished. 289 00:15:47,950 --> 00:15:50,860 So I'm going to actually go through the nitty gritty of how 290 00:15:50,860 --> 00:15:56,280 they did this in case you would like to write your own poker solver 291 00:15:56,280 --> 00:15:58,630 someday. 
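The slides with the equations are not part of the transcript, so the following is a reconstruction from the spoken description rather than the original notation. The symbols $a_k$, $\sigma_x^{k}$ and $\varepsilon$ are assumed names, and the epsilon-equilibrium condition is written as the gap between the best x can do against $\sigma_y^{*}$ and the best y can do against $\sigma_x^{*}$.

```latex
% Convex combination of strategies: non-negative coefficients summing to 1
\sigma_x = \sum_{k} a_k \, \sigma_x^{k}, \qquad a_k \ge 0, \quad \sum_{k} a_k = 1

% Bilinearity of the payoff u (shown in x; the same holds in y)
u\Big(\sum_{k} a_k \, \sigma_x^{k}, \; \sigma_y\Big)
  = \sum_{k} a_k \, u\big(\sigma_x^{k}, \sigma_y\big)

% The two-strategy example: 1/3 of the first strategy, 2/3 of the second
u\big(\tfrac{1}{3}\sigma_x^{1} + \tfrac{2}{3}\sigma_x^{2}, \; \sigma_y\big)
  = \tfrac{1}{3}\, u\big(\sigma_x^{1}, \sigma_y\big)
  + \tfrac{2}{3}\, u\big(\sigma_x^{2}, \sigma_y\big)

% (\sigma_x^{*}, \sigma_y^{*}) is an epsilon-equilibrium if
\max_{\sigma_x} u\big(\sigma_x, \sigma_y^{*}\big)
  - \min_{\sigma_y} u\big(\sigma_x^{*}, \sigma_y\big) \le \varepsilon
```

With $\varepsilon = 0$ neither player can gain by deviating, which is the Nash equilibrium condition for a two player zero sum game.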
292 00:15:58,630 --> 00:16:03,670 So the big idea that they borrowed 293 00:16:03,670 --> 00:16:05,810 was this idea of regret minimization, 294 00:16:05,810 --> 00:16:07,590 which is actually pretty cool. 295 00:16:07,590 --> 00:16:11,750 Suppose that at each time step t the player has 296 00:16:11,750 --> 00:16:13,920 a few pure strategies. 297 00:16:13,920 --> 00:16:17,510 We're assuming the player has a handful of strategies. 298 00:16:17,510 --> 00:16:21,780 In poker, obviously, there's trillions of strategies, 299 00:16:21,780 --> 00:16:24,240 but-- two to the trillions of strategies. 300 00:16:24,240 --> 00:16:26,920 But say he has two strategies. 301 00:16:26,920 --> 00:16:28,524 He can play one or two. 302 00:16:28,524 --> 00:16:30,690 Suppose it's odds, or evens, or something like that. 303 00:16:30,690 --> 00:16:32,523 Or he has three strategies like [INAUDIBLE]. 304 00:16:35,380 --> 00:16:39,640 So basically he chooses some sort 305 00:16:39,640 --> 00:16:45,180 of mixture of strategies at the beginning, 306 00:16:45,180 --> 00:16:49,730 and we're only dealing with one player at this time. 307 00:16:49,730 --> 00:16:52,630 We're assuming the other guy-- we're 308 00:16:52,630 --> 00:16:55,300 assuming he's playing against some adversary that's 309 00:16:55,300 --> 00:16:55,830 all knowing. 310 00:16:58,530 --> 00:17:01,300 That's the original setup of regret minimization. 311 00:17:01,300 --> 00:17:03,040 We'll talk about how this applies 312 00:17:03,040 --> 00:17:06,630 to game theory in general. 313 00:17:06,630 --> 00:17:11,510 Now with each time t we're given values ut of sigma k. 314 00:17:11,510 --> 00:17:15,030 So basically after he determines this, 315 00:17:15,030 --> 00:17:19,960 the adversary decides what the value of u sub t is, 316 00:17:19,960 --> 00:17:22,339 and basically his payoff is just [INAUDIBLE] 317 00:17:22,339 --> 00:17:25,274 a linear combination of the things he picked. 
318 00:17:25,274 --> 00:17:32,090 But the idea is that the adversary can be adversarial. 319 00:17:32,090 --> 00:17:38,360 He can decide to make the [INAUDIBLE] strategy score well 320 00:17:38,360 --> 00:17:40,870 some of the time, and the [INAUDIBLE] strategy 321 00:17:40,870 --> 00:17:42,780 score badly some of the time. 322 00:17:42,780 --> 00:17:48,190 So basically now the idea is to calculate a regret. 323 00:17:48,190 --> 00:17:50,070 By the way, this is not the notation 324 00:17:50,070 --> 00:17:55,130 that's used in the three or four papers they wrote on this, 325 00:17:55,130 --> 00:18:01,780 because I think they did great work-- it's really 326 00:18:01,780 --> 00:18:05,450 written as a math paper. 327 00:18:05,450 --> 00:18:11,080 It looks like a particle physics paper, which is-- actually 328 00:18:11,080 --> 00:18:15,450 for particle physics you need all the complex notation 329 00:18:15,450 --> 00:18:17,920 because they're trying to describe something [INAUDIBLE] 330 00:18:17,920 --> 00:18:20,910 difficult. I think computer science papers usually 331 00:18:20,910 --> 00:18:23,480 don't need this. 332 00:18:23,480 --> 00:18:26,340 So I'll explain this, and then you guys can 333 00:18:26,340 --> 00:18:29,380 reread their paper. 334 00:18:29,380 --> 00:18:33,630 I think that [INAUDIBLE] give you a quicker way 335 00:18:33,630 --> 00:18:34,880 to understand their paper. 336 00:18:34,880 --> 00:18:38,230 So there's this thing called regret of the k-th option 337 00:18:38,230 --> 00:18:42,980 at time t, which is just the sum of the difference of playing 338 00:18:42,980 --> 00:18:46,110 k versus playing whatever you played. 339 00:18:46,110 --> 00:18:49,230 So basically you can have positive regret 340 00:18:49,230 --> 00:18:51,740 or negative regrets. 
341 00:18:51,740 --> 00:18:56,490 Negative regrets means that what you played-- what you decided 342 00:18:56,490 --> 00:19:00,260 to play up to time t was better than just playing k 343 00:19:00,260 --> 00:19:02,950 at each time step. 344 00:19:02,950 --> 00:19:06,180 So we're only concerned-- we're mostly 345 00:19:06,180 --> 00:19:08,030 concerned with the positive regret, which 346 00:19:08,030 --> 00:19:09,620 means, instead of playing, you should 347 00:19:09,620 --> 00:19:14,870 have made-- you could have made more money by playing option k. 348 00:19:14,870 --> 00:19:19,470 So what's the significance of this? 349 00:19:19,470 --> 00:19:22,520 So the idea is we want the average regret, which 350 00:19:22,520 --> 00:19:26,860 is this element divided by t. 351 00:19:26,860 --> 00:19:32,220 So basically you want the average regret, average amount 352 00:19:32,220 --> 00:19:35,950 that you're kind of missing out on to be less than epsilon sub 353 00:19:35,950 --> 00:19:39,080 t, where epsilon sub t is the [INAUDIBLE] converging to 0. 354 00:19:39,080 --> 00:19:46,690 If you have this, you have some regret [INAUDIBLE]. 355 00:19:46,690 --> 00:19:51,650 So the cool thing about this is you can do regret matching. 356 00:19:51,650 --> 00:19:55,770 You can let these weights-- first of all, 357 00:19:55,770 --> 00:19:59,400 you just look at the positive, the things 358 00:19:59,400 --> 00:20:03,960 with positive regret, and weight the options. 359 00:20:03,960 --> 00:20:05,470 At each [INAUDIBLE], we basically 360 00:20:05,470 --> 00:20:09,530 weight the options that have positive regrets accordingly. 361 00:20:09,530 --> 00:20:12,330 And if you're so lucky that nothing has positive regret, 362 00:20:12,330 --> 00:20:15,600 you just randomly pick a strategy. 363 00:20:15,600 --> 00:20:18,630 Let's do an example, because I think this 364 00:20:18,630 --> 00:20:21,510 is kind of unclear what it is. 
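Since the lecturer notes that his notation differs from the papers, the formulas below are a reconstruction of what is described verbally, not a copy of either. The weights $w_k^{t}$, the cumulative regret $R_k^{T}$ and the bound $\varepsilon_T$ are assumed symbol names.

```latex
% Mixed strategy played at time t, with weights w_k^t >= 0 summing to 1,
% and its payoff once the adversary reveals u_t
\sigma_t = \sum_{k} w_k^{t} \, \sigma^{k}, \qquad
u_t(\sigma_t) = \sum_{k} w_k^{t} \, u_t\big(\sigma^{k}\big)

% Regret of option k after T steps: playing k every time
% versus what was actually played
R_k^{T} = \sum_{t=1}^{T} \Big( u_t\big(\sigma^{k}\big) - u_t(\sigma_t) \Big)

% Goal: average regret bounded by a sequence that converges to 0
\max_{k} \frac{R_k^{T}}{T} \le \varepsilon_T, \qquad \varepsilon_T \to 0

% Regret matching: weight each option by its positive regret
w_k^{T+1} = \frac{\max\big(R_k^{T}, 0\big)}{\sum_{j} \max\big(R_j^{T}, 0\big)}
\quad \text{whenever the denominator is positive}
```

When no option has positive regret the denominator is zero and any strategy may be chosen, which is the "randomly pick a strategy" case.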
365 00:20:21,510 --> 00:20:24,580 So let's just say we have two strategies. 366 00:20:24,580 --> 00:20:27,280 The player can pick one, or the player 367 00:20:27,280 --> 00:20:29,560 can pick two at each time, or the player can 368 00:20:29,560 --> 00:20:32,880 pick some mixture of one and two. 369 00:20:32,880 --> 00:20:37,010 After a player does that, the adversary comes out and says, 370 00:20:37,010 --> 00:20:39,310 well, one of them is worth 0 and one of them 371 00:20:39,310 --> 00:20:41,030 is worth 1. 372 00:20:41,030 --> 00:20:44,380 So let's just see how this works. 373 00:20:44,380 --> 00:20:47,020 So suppose at the first time step 374 00:20:47,020 --> 00:20:50,624 we picked sigma 2 because we don't have any regrets yet. 375 00:20:50,624 --> 00:20:52,790 We're just randomly picking a strategy-- [INAUDIBLE] 376 00:20:52,790 --> 00:20:54,220 sorry, sigma 1. 377 00:20:54,220 --> 00:20:56,790 We'll just randomly pick sigma 1. 378 00:20:56,790 --> 00:21:03,790 So the adversary now gives us the value of sigma 1 is 0 379 00:21:03,790 --> 00:21:05,240 and sigma 2 is 1. 380 00:21:05,240 --> 00:21:09,290 And you go, oh, well that means that the regret 381 00:21:09,290 --> 00:21:13,010 of the first option is 0 and the regret of the second option 382 00:21:13,010 --> 00:21:13,510 is 1. 383 00:21:13,510 --> 00:21:17,930 The reason this first option is 0 is because we already played 384 00:21:17,930 --> 00:21:20,980 sigma 1, so you can't have any regrets, 385 00:21:20,980 --> 00:21:23,580 either positive or negative, for playing sigma 1, 386 00:21:23,580 --> 00:21:27,810 because your option was playing sigma 1, 387 00:21:27,810 --> 00:21:31,440 but you have some regret of not playing sigma 2. 388 00:21:31,440 --> 00:21:34,340 Sigma 2 was kind of the winner here. 389 00:21:34,340 --> 00:21:37,750 If the two were reversed, we 390 00:21:37,750 --> 00:21:41,890 would have r1 equals 0 and r2 equals negative 1. 
391 00:21:41,890 --> 00:21:44,550 And then we'd become happy because all our regrets 392 00:21:44,550 --> 00:21:46,790 would be non-positive. 393 00:21:46,790 --> 00:21:50,840 So at t equals 2, because we have 394 00:21:50,840 --> 00:21:53,890 zero regret here and regret 1 here, 395 00:21:53,890 --> 00:21:58,890 we actually pick the strategy to be all sigma 2. 396 00:21:58,890 --> 00:22:03,350 Now the adversary says, OK, well the value of sigma 1 is 1, 397 00:22:03,350 --> 00:22:07,014 and the value of sigma 2 is 0 for the second time step. 398 00:22:07,014 --> 00:22:07,680 So what happens? 399 00:22:07,680 --> 00:22:11,240 Well, the same thing happens as before. 400 00:22:11,240 --> 00:22:15,164 Now we have regret of 1 on the first option, 401 00:22:15,164 --> 00:22:17,330 and then regret of 1 on the second option. 402 00:22:17,330 --> 00:22:19,862 So what do we do next? 403 00:22:19,862 --> 00:22:20,820 The regret [INAUDIBLE]. 404 00:22:25,090 --> 00:22:29,620 Well, flip a coin or just pick an even linear combination 405 00:22:29,620 --> 00:22:33,880 of the two strategies, half of one and half the other. 406 00:22:33,880 --> 00:22:34,925 That's what we can do. 407 00:22:34,925 --> 00:22:36,000 [INAUDIBLE] the same. 408 00:22:36,000 --> 00:22:39,620 So now the adversary says sigma 1 is 0 409 00:22:39,620 --> 00:22:45,990 and sigma 2 is 1, which means that the regret of 1 410 00:22:45,990 --> 00:22:48,740 actually goes to 0.5, and the regret of 2 411 00:22:48,740 --> 00:22:50,450 actually goes to 1.5. 412 00:22:50,450 --> 00:22:51,820 [INAUDIBLE] 413 00:22:51,820 --> 00:22:53,510 1 goes down a 1/2. 414 00:22:53,510 --> 00:22:56,750 So now with these regrets our weighting 415 00:22:56,750 --> 00:22:59,740 is kind of the ratio of the two. 416 00:22:59,740 --> 00:23:04,470 It's 1/4 sigma 1 and 3/4 sigma 2. 417 00:23:04,470 --> 00:23:07,560 So now the adversary goes, OK, well sigma 1 is 0. 418 00:23:07,560 --> 00:23:10,070 Sigma 2 is 1. 
419 00:23:10,070 --> 00:23:13,460 So this regret actually goes [INAUDIBLE] down by 3/4, 420 00:23:13,460 --> 00:23:15,360 and this goes up by a 1/4. 421 00:23:15,360 --> 00:23:17,280 And since this is negative, now we 422 00:23:17,280 --> 00:23:19,905 pick the strategy to be sigma 2. 423 00:23:19,905 --> 00:23:21,570 [INAUDIBLE] and so forth. 424 00:23:21,570 --> 00:23:23,600 Now the adversary [INAUDIBLE] for us and say, 425 00:23:23,600 --> 00:23:25,360 oh, it's really sigma 1. 426 00:23:25,360 --> 00:23:29,080 Then the regret of sigma 1 would go up to 0.75, 427 00:23:29,080 --> 00:23:30,500 and so on and so forth. 428 00:23:30,500 --> 00:23:36,090 So it seems that the adversary can make the job tough on us. 429 00:23:36,090 --> 00:23:40,390 Well actually, there is a theorem 430 00:23:40,390 --> 00:23:43,800 that says, for our example, [INAUDIBLE]. 431 00:23:43,800 --> 00:23:47,010 The square of the first regret if it's positive 432 00:23:47,010 --> 00:23:49,830 plus the square of the second regret if it's positive 433 00:23:49,830 --> 00:23:54,130 is always going to be less than or equal to t. 434 00:23:54,130 --> 00:23:57,980 And that's because, if [INAUDIBLE] these 435 00:23:57,980 --> 00:24:02,370 are both positive, it goes, for example, 436 00:24:02,370 --> 00:24:08,110 you are really going r1 plus or minus whatever amount of r2 437 00:24:08,110 --> 00:24:09,340 you're doing. 438 00:24:09,340 --> 00:24:12,820 And r2 of t now minus plus whatever amount of r1 439 00:24:12,820 --> 00:24:14,030 you're doing. 440 00:24:14,030 --> 00:24:18,330 The things that [INAUDIBLE] this you can see the cross terms 441 00:24:18,330 --> 00:24:19,270 cancel each other out. 442 00:24:19,270 --> 00:24:24,250 This becomes 2 r1 r2 divided by r1 plus r2. 443 00:24:24,250 --> 00:24:27,560 So you're left with this squared plus this squared 444 00:24:27,560 --> 00:24:29,800 plus this squared plus this squared. 
445 00:24:29,800 --> 00:24:31,880 And this squared plus this squared 446 00:24:31,880 --> 00:24:33,820 is going to be less than 1, so we 447 00:24:33,820 --> 00:24:37,990 have this here, which means that the quadratic sum only 448 00:24:37,990 --> 00:24:39,860 [INAUDIBLE] by 1. 449 00:24:39,860 --> 00:24:41,120 We have this bound. 450 00:24:41,120 --> 00:24:43,150 Why is this bound so great? 451 00:24:43,150 --> 00:24:46,380 Well if the squares of the regrets are less than t, 452 00:24:46,380 --> 00:24:50,550 that means the average regret is going to be [INAUDIBLE] 1 453 00:24:50,550 --> 00:24:51,820 over root t. 454 00:24:51,820 --> 00:24:54,880 In fact, it's kind of left as a homework problem. 455 00:24:54,880 --> 00:24:58,440 In a general case, R k t over t is less than n 456 00:24:58,440 --> 00:25:01,340 minus 1 delta over root t, where delta 457 00:25:01,340 --> 00:25:04,870 is the maximum deviation of the options 458 00:25:04,870 --> 00:25:08,100 and n is just the number of options. 459 00:25:08,100 --> 00:25:09,251 Yeah? 460 00:25:09,251 --> 00:25:13,075 AUDIENCE: I'm curious, is [INAUDIBLE] in terms 461 00:25:13,075 --> 00:25:17,488 of what is the strategy sigma. 462 00:25:17,488 --> 00:25:18,784 Number of like a payoff? 463 00:25:18,784 --> 00:25:19,700 PROFESSOR: No, no, no. 464 00:25:19,700 --> 00:25:23,670 A strategy sigma, in terms of poker strategy, 465 00:25:23,670 --> 00:25:26,910 is sort of a description of what you would do. 466 00:25:29,630 --> 00:25:34,620 Suppose you get ace, six off suit pre-flop. 467 00:25:34,620 --> 00:25:36,860 A strategy would be a descriptor of what you would 468 00:25:36,860 --> 00:25:39,160 do at each point of the hand. 469 00:25:39,160 --> 00:25:43,820 So there's some significance in effect 470 00:25:43,820 --> 00:25:50,550 that this regret, average regret, goes to 0.
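The two-option example and the bound stated above can be replayed in a few lines. This is a minimal sketch, not the lecturer's code: the function names are made up for illustration, the starting regrets (0, 1) are the ones quoted at t equals 2, and the adversary's choices are the ones read out in the lecture.

```python
def regret_matching_strategy(regrets):
    """Weight each option in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0.0:
        # No positive regret anywhere: fall back to an even mix (an assumption).
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def update_regrets(regrets, strategy, utilities):
    """Add (utility of each option) minus (utility of the mix actually played)."""
    expected = sum(p * u for p, u in zip(strategy, utilities))
    return [r + (u - expected) for r, u in zip(regrets, utilities)]


def replay(start_regrets, adversary_utilities):
    regrets = list(start_regrets)
    history = []
    for utilities in adversary_utilities:
        strategy = regret_matching_strategy(regrets)
        regrets = update_regrets(regrets, strategy, utilities)
        history.append((strategy, regrets))
    return history


# Regrets (0, 1) going into t = 2, then the adversary's (sigma 1, sigma 2) values.
history = replay([0.0, 1.0], [(1, 0), (0, 1), (0, 1), (1, 0)])

for step, (strategy, regrets) in enumerate(history, start=2):
    squares = sum(max(r, 0.0) ** 2 for r in regrets)
    print(step, strategy, regrets, squares)
```

Each printed row shows the mix played at that time step, the regrets afterwards, and the sum of squared positive regrets, which stays at or below the time step as the theorem says.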
471 00:25:50,550 --> 00:25:57,000 Well, the significance in terms of game theory optimal is 472 00:25:57,000 --> 00:25:59,760 suppose the pure strategies are-- 473 00:25:59,760 --> 00:26:03,274 suppose you have a bunch of pure strategies for x and a bunch 474 00:26:03,274 --> 00:26:04,820 of pure strategies for y. 475 00:26:04,820 --> 00:26:08,480 If we regret match, but instead of doing an adversary, 476 00:26:08,480 --> 00:26:13,780 we just say t utility for x is just 477 00:26:13,780 --> 00:26:20,340 the utility for x playing against the sigma ty, 478 00:26:20,340 --> 00:26:24,500 and the utility for y is just negative utility-- 479 00:26:24,500 --> 00:26:28,670 the game utility for y playing against sigma xt. 480 00:26:28,670 --> 00:26:31,060 This is kind of a mutual regret matching. 481 00:26:31,060 --> 00:26:33,220 You do regret matching for x and y 482 00:26:33,220 --> 00:26:35,610 in each step, which means you just modify 483 00:26:35,610 --> 00:26:39,890 x-- you compute the regrets at each step. 484 00:26:39,890 --> 00:26:42,570 Then you modify x [INAUDIBLE] y strategy 485 00:26:42,570 --> 00:26:45,550 by this type of regret matching. 486 00:26:45,550 --> 00:26:51,090 And basically the strategies that you 487 00:26:51,090 --> 00:26:53,050 choose, the average strategy, which 488 00:26:53,050 --> 00:26:58,330 is the sum of the strategies you have had all along 489 00:26:58,330 --> 00:26:59,660 divided by t. 490 00:26:59,660 --> 00:27:05,070 1/t-- all the strategies you've done in these t steps. 491 00:27:05,070 --> 00:27:08,760 And basically what happens is now, 492 00:27:08,760 --> 00:27:11,700 if you try [INAUDIBLE] to exploit a [INAUDIBLE] strategy, 493 00:27:11,700 --> 00:27:16,270 again, this is the best x can do against y minus the best 494 00:27:16,270 --> 00:27:18,720 y does against x.
495 00:27:18,720 --> 00:27:22,160 You compute this, and you add the sum 496 00:27:22,160 --> 00:27:27,080 of what actually happened with x of t and y sub t, 497 00:27:27,080 --> 00:27:29,680 and so on and so forth. 498 00:27:29,680 --> 00:27:39,400 You notice that this is the regret of k-- of x picking 499 00:27:39,400 --> 00:27:41,185 strategy k all the time. 500 00:27:41,185 --> 00:27:45,140 It's just y picking strategy j all the time. 501 00:27:45,140 --> 00:27:47,770 So that's less than 2 epsilon over t 502 00:27:47,770 --> 00:27:52,220 because regrets over t converge, so it's within [INAUDIBLE] 503 00:27:52,220 --> 00:27:53,280 game theory optimal. 504 00:27:53,280 --> 00:27:58,720 Basically what this all means is basically suppose 505 00:27:58,720 --> 00:28:01,980 you choose your strategy, some mixture of stuff. 506 00:28:01,980 --> 00:28:06,100 Your opponent tries to figure out how best he 507 00:28:06,100 --> 00:28:07,870 can exploit this strategy. 508 00:28:07,870 --> 00:28:09,990 By the way, this is often called nemesis. 509 00:28:09,990 --> 00:28:12,570 I really like that name. 510 00:28:12,570 --> 00:28:17,160 Opponent figures out his nemesis strategy against you. 511 00:28:17,160 --> 00:28:22,721 Then, well, you get to see-- so his nemesis strategies-- 512 00:28:22,721 --> 00:28:24,220 unless you're playing the exact game 513 00:28:24,220 --> 00:28:26,553 theory optimal strategies-- is always going to be better 514 00:28:26,553 --> 00:28:28,100 than the game value. 515 00:28:28,100 --> 00:28:31,620 He looks at what you've done and finds the best response. 516 00:28:31,620 --> 00:28:35,850 And you do the same to him, and the difference of those two 517 00:28:35,850 --> 00:28:38,330 games kind of exploitable. 518 00:28:38,330 --> 00:28:44,270 Obviously, this means basically, if your opponent 519 00:28:44,270 --> 00:28:50,110 sees what you're doing, this is the best he can do against you. 
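The mutual regret matching described above, with the average strategy and the "best x can do against y minus the best y does against x" number, can be sketched on a tiny zero-sum matrix game. Everything here is an illustrative assumption rather than the Alberta code: the payoff matrix is a made-up rock-paper-scissors variant where rock beating scissors pays double, and the function names are hypothetical.

```python
# Row player's payoff; the column player receives the negative (zero-sum).
PAYOFF = [
    [0, -1, 2],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-2, 1, 0],   # scissors
]


def strategy_from(regrets):
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def self_play(payoff, steps):
    n, m = len(payoff), len(payoff[0])
    regrets_x, regrets_y = [0.0] * n, [0.0] * m
    sum_x, sum_y = [0.0] * n, [0.0] * m
    for _ in range(steps):
        x, y = strategy_from(regrets_x), strategy_from(regrets_y)
        # Utility of each pure option against the opponent's current mix.
        ux = [sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n)]
        uy = [-sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m)]
        vx = sum(x[i] * ux[i] for i in range(n))
        vy = sum(y[j] * uy[j] for j in range(m))
        regrets_x = [r + u - vx for r, u in zip(regrets_x, ux)]
        regrets_y = [r + u - vy for r, u in zip(regrets_y, uy)]
        sum_x = [s + p for s, p in zip(sum_x, x)]
        sum_y = [s + p for s, p in zip(sum_y, y)]
    return [s / steps for s in sum_x], [s / steps for s in sum_y]


def exploitability(payoff, avg_x, avg_y):
    n, m = len(payoff), len(payoff[0])
    # Best x can do against y's average, minus what x gets when y best-responds.
    best_x = max(sum(payoff[i][j] * avg_y[j] for j in range(m)) for i in range(n))
    best_y = min(sum(payoff[i][j] * avg_x[i] for i in range(n)) for j in range(m))
    return best_x - best_y


avg_x, avg_y = self_play(PAYOFF, 10000)
print(avg_x, avg_y, exploitability(PAYOFF, avg_x, avg_y))
```

The printed exploitability shrinks as the number of steps grows, which is the sense in which the average strategies end up within epsilon of game theory optimal.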
520 00:28:50,110 --> 00:28:55,300 This number is the one that's less than 1/1000 521 00:28:55,300 --> 00:28:56,490 of a big blind. 522 00:28:56,490 --> 00:29:00,000 So counterfactual regret is kind of cool 523 00:29:00,000 --> 00:29:03,500 because-- it's a good thing I've drawn this tree. 524 00:29:03,500 --> 00:29:09,120 At each of your decision points, now you can regret match. 525 00:29:09,120 --> 00:29:12,907 So first of all, you don't need to be fed back 526 00:29:12,907 --> 00:29:14,240 the correct utility [INAUDIBLE]. 527 00:29:17,482 --> 00:29:21,560 Here in the example we gave, we had a u0 and u1. 528 00:29:21,560 --> 00:29:26,920 You'll just be fed back some unbiased stochastic number that 529 00:29:26,920 --> 00:29:29,470 averages to the value of the game. 530 00:29:29,470 --> 00:29:33,080 For example, if you're doing a regret chain on poker, 531 00:29:33,080 --> 00:29:37,727 it's hard to tell if I come up with this strategy that 532 00:29:37,727 --> 00:29:39,750 has a bunch of terabytes, and you come up 533 00:29:39,750 --> 00:29:42,780 with a strategy that's also a bunch of terabytes-- 534 00:29:42,780 --> 00:29:45,700 what's the value of playing against y [INAUDIBLE]? 535 00:29:45,700 --> 00:29:47,707 But we can just get a sample. 536 00:29:47,707 --> 00:29:48,540 We can get a sample. 537 00:29:51,980 --> 00:29:54,170 Well you can just run it once. 538 00:29:54,170 --> 00:29:55,490 Right, that's the idea. 539 00:29:55,490 --> 00:30:00,960 You get a sample by just saying, OK, just play one hand, 540 00:30:00,960 --> 00:30:02,328 and see the result of that hand. 541 00:30:04,850 --> 00:30:08,360 And you could use either random chance or whatever 542 00:30:08,360 --> 00:30:12,490 every time you decide to do whatever branches of your tree 543 00:30:12,490 --> 00:30:13,690 if you do a mix tree.
544 00:30:13,690 --> 00:30:18,860 So the cool thing already is, with counterfactual regret, 545 00:30:18,860 --> 00:30:20,790 you can quickly converge to the solution, 546 00:30:20,790 --> 00:30:27,945 because a lot of strategies, like fictitious play-- 547 00:30:27,945 --> 00:30:29,310 it's the best response. 548 00:30:29,310 --> 00:30:32,090 The best response is hard to calculate sometimes, 549 00:30:32,090 --> 00:30:38,750 but each simulation can just be one iteration through it. 550 00:30:38,750 --> 00:30:41,140 And this is counterfactual regret 551 00:30:41,140 --> 00:30:42,800 because [INAUDIBLE] is given assuming 552 00:30:42,800 --> 00:30:46,390 that the player does everything to play to that node. 553 00:30:46,390 --> 00:30:52,280 So the weighting here is nature just has its probabilities. 554 00:30:52,280 --> 00:30:56,730 If your opponent plays according to his strategy, 555 00:30:56,730 --> 00:30:58,570 but when you play you always kind 556 00:30:58,570 --> 00:31:01,370 of play towards that node, so your weight is actually 557 00:31:01,370 --> 00:31:03,910 1 for each of these options you pick. 558 00:31:06,860 --> 00:31:10,820 The cool thing is that once you have the structure set up 559 00:31:10,820 --> 00:31:15,460 where you're just doing one or a few iterations 560 00:31:15,460 --> 00:31:20,590 throughout the hand, it's actually pretty easy to set up 561 00:31:20,590 --> 00:31:23,620 different weighting schemes. 562 00:31:23,620 --> 00:31:28,490 For example, if you have two options and the ace of hearts 563 00:31:28,490 --> 00:31:31,790 comes on the turn, or the deuce of clubs comes on the turn, 564 00:31:31,790 --> 00:31:34,410 and you don't really have to worry about the ace of hearts 565 00:31:34,410 --> 00:31:36,420 coming on a turn. 566 00:31:36,420 --> 00:31:37,950 That tree is fine. 567 00:31:37,950 --> 00:31:41,960 That part of the tree has very little positive regrets.
568 00:31:41,960 --> 00:31:46,040 You can say, OK, we'll just-- different game where 569 00:31:46,040 --> 00:31:49,520 the ace of hearts comes about [INAUDIBLE] 570 00:31:49,520 --> 00:31:51,110 at a time the deuce of clubs comes, 571 00:31:51,110 --> 00:31:53,770 but we're going to weight the results by 10. 572 00:31:53,770 --> 00:31:56,010 You still get the same answer. 573 00:31:56,010 --> 00:31:59,150 It's just that you get a much coarser kind of [INAUDIBLE] 574 00:31:59,150 --> 00:32:01,860 every time the ace of hearts comes, but already kind of know 575 00:32:01,860 --> 00:32:02,750 what to do with that. 576 00:32:02,750 --> 00:32:04,870 You can work on the deuce of clubs. 577 00:32:04,870 --> 00:32:06,900 So there are a lot of different weighting schemes. 578 00:32:10,206 --> 00:32:12,830 This means that the hands can be kind of sampled intrinsically. 579 00:32:15,350 --> 00:32:20,760 So the final algorithm they had was counterfactual regret plus. 580 00:32:20,760 --> 00:32:25,100 So instead of having accumulated negative regrets, 581 00:32:25,100 --> 00:32:29,900 basically a lot of these option regrets can be really negative. 582 00:32:29,900 --> 00:32:33,040 Folding aces pre-flop quickly turns 583 00:32:33,040 --> 00:32:34,855 to really negative regret. 584 00:32:37,420 --> 00:32:40,730 You lose your small blind, and hopefully 585 00:32:40,730 --> 00:32:45,120 if you play it in limit hold 'em, you could win more 586 00:32:45,120 --> 00:32:46,590 than the small blind. 587 00:32:46,590 --> 00:32:50,550 So you accumulate a lot of [INAUDIBLE] so set options 588 00:32:50,550 --> 00:32:52,850 falls off the map. 589 00:32:52,850 --> 00:32:55,440 Their innovation in counterfactual regret plus 590 00:32:55,440 --> 00:32:59,020 is to, instead of putting a big negative number 591 00:32:59,020 --> 00:33:02,040 to a lot of these things, they just floor them at 0.
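The effect of flooring regrets at 0 can be seen in a toy comparison. This is only a sketch of the one idea mentioned here (the floor), not the full algorithm the Alberta group ran; the function names and the sizes of the losses and gains are invented for illustration.

```python
def accumulate_plain(regret, delta):
    """Ordinary regret accumulation: negative totals are kept."""
    return regret + delta


def accumulate_floored(regret, delta):
    """Floored accumulation: the running total never drops below 0."""
    return max(regret + delta, 0.0)


def steps_until_positive(update, bad_steps, loss=-1.0, gain=1.0):
    """An option looks bad for a while, then becomes the right response.

    Returns how many good steps it takes before its regret is positive again,
    which is when regret matching starts playing it.
    """
    regret = 0.0
    for _ in range(bad_steps):
        regret = update(regret, loss)
    steps = 0
    while regret <= 0.0:
        regret = update(regret, gain)
        steps += 1
    return steps


print(steps_until_positive(accumulate_plain, bad_steps=5))    # 6
print(steps_until_positive(accumulate_floored, bad_steps=5))  # 1
```

With plain accumulation the option has to pay back its whole negative total before it is played again; with the floor it comes back as soon as it does well once.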
592 00:33:02,040 --> 00:33:04,110 And the reason they floor at 0 is 593 00:33:04,110 --> 00:33:09,050 because you know this is a simultaneous evolution 594 00:33:09,050 --> 00:33:15,630 of strategies where even strategies at the beginning 595 00:33:15,630 --> 00:33:17,550 just might not be great strategies, 596 00:33:17,550 --> 00:33:21,840 and you want to-- if regret of something is 0, 597 00:33:21,840 --> 00:33:26,620 you can route get regret faster if it's the right thing 598 00:33:26,620 --> 00:33:29,690 to do to respond to your opponent's strategy. 599 00:33:29,690 --> 00:33:32,700 All of these things-- suppose you 600 00:33:32,700 --> 00:33:35,260 start with a random initial guess for your opponent's 601 00:33:35,260 --> 00:33:36,420 strategy. 602 00:33:36,420 --> 00:33:39,540 Then you actually have a pretty reasonable strategy, 603 00:33:39,540 --> 00:33:43,210 which is bet and raise every time with every hand. 604 00:33:43,210 --> 00:33:47,270 If your opponent has a random strategy, he might just fold. 605 00:33:47,270 --> 00:33:49,660 So later in the streets, it's probably [INAUDIBLE] 606 00:33:49,660 --> 00:33:52,360 just bet and raise every time with every hand. 607 00:33:52,360 --> 00:33:53,907 He raises you back. 608 00:33:53,907 --> 00:33:55,240 It's not like he knows anything. 609 00:33:55,240 --> 00:33:56,427 It's a random strategy. 610 00:33:56,427 --> 00:33:58,010 Just raise him back and hope he folds. 611 00:33:58,010 --> 00:34:00,790 If he doesn't fold and call, you bet again an x3, 612 00:34:00,790 --> 00:34:02,890 because now the pot is bigger. 613 00:34:02,890 --> 00:34:06,060 So he has a 1/3 chance of folding. 614 00:34:06,060 --> 00:34:07,670 You should bet. 615 00:34:07,670 --> 00:34:11,719 So that evolves quick. 616 00:34:11,719 --> 00:34:16,230 If you start off with a random tree with no information, 617 00:34:16,230 --> 00:34:21,580 that starts off as the dominant strategy.
618 00:34:21,580 --> 00:34:25,429 And then you have to walk that back as your opponent's 619 00:34:25,429 --> 00:34:27,080 strategy evolves also. 620 00:34:27,080 --> 00:34:30,000 By the way, they're actually keeping 621 00:34:30,000 --> 00:34:33,630 two trees-- one for the small blind strategy, 622 00:34:33,630 --> 00:34:35,159 and one for the big blind strategy. 623 00:34:35,159 --> 00:34:38,350 And this is everything with respect to the small blind. 624 00:34:38,350 --> 00:34:43,090 The small blind isn't-- so let's just go into the next slide 625 00:34:43,090 --> 00:34:44,112 probably. 626 00:34:44,112 --> 00:34:45,070 [INAUDIBLE] have to be. 627 00:34:49,040 --> 00:34:54,139 So let's try to figure out how big the strategy space in limit 628 00:34:54,139 --> 00:34:56,770 hold 'em has to be. 629 00:34:56,770 --> 00:34:59,180 So let's concentrate on river nodes 630 00:34:59,180 --> 00:35:03,470 because that's most of the nodes. 631 00:35:03,470 --> 00:35:08,380 It's a tree so we just have to calculate the leaves. 632 00:35:08,380 --> 00:35:13,050 So first of all, assuming a four bet cap-- 633 00:35:13,050 --> 00:35:15,860 the reason we assume a four bet cap-- well, I don't know why, 634 00:35:15,860 --> 00:35:21,870 but it seems that that's-- so this is one approximation, 635 00:35:21,870 --> 00:35:26,280 the four bet cap, but this is kind of normal in these types 636 00:35:26,280 --> 00:35:27,325 of research papers. 637 00:35:29,950 --> 00:35:33,370 If we have a four bet cap, there are nine possible actions that 638 00:35:33,370 --> 00:35:34,710 get you to the next street. 639 00:35:34,710 --> 00:35:38,910 There are some actions that [INAUDIBLE] like player one 640 00:35:38,910 --> 00:35:43,230 bets and player two folds, but if you don't get to the street, 641 00:35:43,230 --> 00:35:47,580 you don't get to the river, and that's 642 00:35:47,580 --> 00:35:50,996 a pretty small percentage of the nodes. 643 00:35:50,996 --> 00:35:52,620 So why are there nine possible actions?
644 00:35:55,680 --> 00:35:56,720 Let's count them. 645 00:35:56,720 --> 00:35:59,950 One of the actions that gets to the next street is check check. 646 00:35:59,950 --> 00:36:01,630 So that's one. 647 00:36:01,630 --> 00:36:02,529 What are the eight? 648 00:36:02,529 --> 00:36:03,570 What are the other eight? 649 00:36:06,824 --> 00:36:07,740 AUDIENCE: [INAUDIBLE]. 650 00:36:10,660 --> 00:36:12,350 PROFESSOR: Right, check raise. 651 00:36:12,350 --> 00:36:15,010 Let's try systemic [INAUDIBLE] count them. 652 00:36:15,010 --> 00:36:20,590 So I claim that there are two ways-- one bet in the pot. 653 00:36:20,590 --> 00:36:22,643 Player one can bet, and player two can call, 654 00:36:22,643 --> 00:36:24,476 or player one can check, player two can bet, 655 00:36:24,476 --> 00:36:26,420 and player one can call. 656 00:36:26,420 --> 00:36:29,970 In fact, there are two ways to put k bets in the pot 657 00:36:29,970 --> 00:36:33,630 and k is greater than 0. 658 00:36:33,630 --> 00:36:37,030 If you want to put three bets in a pot, what are the two ways? 659 00:36:39,736 --> 00:36:41,090 AUDIENCE: [INAUDIBLE]. 660 00:36:41,090 --> 00:36:42,790 PROFESSOR: Right. 661 00:36:42,790 --> 00:36:43,690 Yeah, right. 662 00:36:43,690 --> 00:36:46,860 Bet, raise, re-raise, call, and check, bet, raise, re-raise, 663 00:36:46,860 --> 00:36:47,640 call. 664 00:36:47,640 --> 00:36:54,140 So if the cap is k bets, there's always 2k plus 1 ways 665 00:36:54,140 --> 00:36:55,120 to [INAUDIBLE] three. 666 00:36:55,120 --> 00:36:57,880 So there are nine possible actions in each betting 667 00:36:57,880 --> 00:36:59,430 round before the river. 668 00:36:59,430 --> 00:37:01,790 So there are three betting rounds-- pre-flop, flop, 669 00:37:01,790 --> 00:37:03,070 and turn. 
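The count of 2k plus 1 betting sequences per round can be checked by listing them. This is a small sketch under the lecture's simplification (two players, a cap of k bets, only sequences that reach the next street, so folds are left out); the function name is made up for illustration.

```python
def sequences_to_next_street(cap):
    """List every betting sequence in one round that ends without a fold."""
    results = [["check", "check"]]  # zero bets go in

    def after_bet(history, bets):
        # The player facing a bet can call (round ends, next street)...
        results.append(history + ["call"])
        # ...or raise, as long as the cap has not been reached.
        if bets < cap:
            after_bet(history + ["raise"], bets + 1)

    after_bet(["bet"], 1)            # player one opens
    after_bet(["check", "bet"], 1)   # player one checks, player two opens
    return results


for sequence in sequences_to_next_street(4):
    print(" ".join(sequence))

print(len(sequences_to_next_street(4)))        # 9 sequences per round
print(len(sequences_to_next_street(4)) ** 3)   # 729 ways through pre-flop, flop, turn
```

For each number of bets from 1 up to the cap there are exactly two sequences, one opened by each player, plus check check, which gives 2k plus 1.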
670 00:37:03,070 --> 00:37:07,990 So let's use some symmetries because I 671 00:37:07,990 --> 00:37:10,490 don't think the optimal strategy has you playing something 672 00:37:10,490 --> 00:37:14,500 differently with ace, six of diamonds, ace, six of hearts. 673 00:37:14,500 --> 00:37:15,990 [INAUDIBLE] very easy to prove. 674 00:37:15,990 --> 00:37:19,530 The optimal strategy doesn't have that. 675 00:37:19,530 --> 00:37:24,830 So using symmetries on a flop-- so how many distinct flops are 676 00:37:24,830 --> 00:37:26,760 there? 677 00:37:26,760 --> 00:37:31,130 Well, I like to think about it as where 678 00:37:31,130 --> 00:37:32,890 the suits have symmetries. 679 00:37:32,890 --> 00:37:35,800 I like to think about it as, well, there 680 00:37:35,800 --> 00:37:38,490 could be three suits in a flop, two suits in a flop, 681 00:37:38,490 --> 00:37:40,070 or one suit. 682 00:37:40,070 --> 00:37:44,980 So if there's one suit on a flop, there's 13 choose 3 683 00:37:44,980 --> 00:37:46,320 combinations. 684 00:37:46,320 --> 00:37:48,470 That's pretty straightforward. 685 00:37:48,470 --> 00:37:55,180 If there are two suits on a flop, what's the combinations? 686 00:37:55,180 --> 00:37:57,820 There are 13 possibilities for one of the suits, 687 00:37:57,820 --> 00:38:01,920 and there are 13 choose 2 for the other suit. 688 00:38:01,920 --> 00:38:03,970 It's based on heart or something like that. 689 00:38:03,970 --> 00:38:05,390 The suits are symmetric. 690 00:38:05,390 --> 00:38:09,020 So there are 1014 things [INAUDIBLE]. 691 00:38:09,020 --> 00:38:10,890 This is [INAUDIBLE] the things. 692 00:38:10,890 --> 00:38:15,090 And if it's three suited, you just choose three ranks, 693 00:38:15,090 --> 00:38:16,400 but it's not 13 choose 3. 694 00:38:16,400 --> 00:38:18,890 It's 15 choose 3 because why? 695 00:38:22,800 --> 00:38:24,310 I guess the ranks can be equal.
696 00:38:28,030 --> 00:38:31,210 So it would be 13 choose 3 if the ranks had to be unique, 697 00:38:31,210 --> 00:38:32,975 but you could have three aces. 698 00:38:32,975 --> 00:38:35,540 So this is actually 15 choose 3. 699 00:38:35,540 --> 00:38:41,410 So there's 455 three suited flops, [INAUDIBLE] flops. 700 00:38:41,410 --> 00:38:43,230 That's kind of the big explosion in limit 701 00:38:43,230 --> 00:38:46,120 hold 'em, pre-flop to flop. 702 00:38:46,120 --> 00:38:48,190 So there are nine possible actions 703 00:38:48,190 --> 00:38:49,300 in each betting round. 704 00:38:49,300 --> 00:38:51,670 So let's count the number of turns and rivers. 705 00:38:51,670 --> 00:38:54,520 There's 49 turns and 48 rivers. 706 00:38:54,520 --> 00:38:58,820 So counting that, you have a billion possible action 707 00:38:58,820 --> 00:39:01,360 sequences to the river. 708 00:39:01,360 --> 00:39:05,125 The [INAUDIBLE] things in each street, all the flops, 709 00:39:05,125 --> 00:39:08,040 then the turns and rivers. 710 00:39:08,040 --> 00:39:14,110 But each river, there could be up to 2,162 hand types. 711 00:39:14,110 --> 00:39:17,020 47 times 46. 712 00:39:17,020 --> 00:39:21,450 Making about 6.5 trillion hand river types. 713 00:39:21,450 --> 00:39:24,440 Each node should be visited about 1,000 times. 714 00:39:24,440 --> 00:39:26,220 It's a big computational problem, 715 00:39:26,220 --> 00:39:28,100 but it's still tractable, especially 716 00:39:28,100 --> 00:39:30,070 if you have 900 years of CPU. 717 00:39:34,810 --> 00:39:37,030 And they also used many shortcuts. 718 00:39:37,030 --> 00:39:39,180 They use all the symmetries I talked about, 719 00:39:39,180 --> 00:39:42,530 and they also have a few shortcuts. 720 00:39:42,530 --> 00:39:46,340 And you can see these trees are big. 721 00:39:46,340 --> 00:39:50,200 Terabytes of memory to actually store your strategy. 722 00:39:50,200 --> 00:39:54,780 So you can't really get that on a node yet.
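The counting above can be reproduced with a few binomial coefficients. This is a back-of-the-envelope sketch, with two labeled assumptions: the three-suit flop count is taken as 15 choose 3 (ranks may repeat), which matches the 455 quoted, and the number of turn cards is taken as 49 (52 minus the three flop cards), which together with 48 rivers and 47 times 46 hands reproduces the 6.5 trillion figure.

```python
from math import comb

# Distinct flops up to suit symmetry, split by how many suits appear.
one_suit = comb(13, 3)            # three different ranks in a single suit
two_suits = 13 * comb(13, 2)      # one rank in one suit, two ranks in the other
three_suits = comb(15, 3)         # three ranks with repeats allowed (multisets)
flops = one_suit + two_suits + three_suits

actions_per_round = 9             # 2k + 1 with a four bet cap
betting_paths = actions_per_round ** 3   # pre-flop, flop, turn

turns, rivers = 49, 48            # assumption: 49 turn cards, then 48 river cards
sequences_to_river = betting_paths * flops * turns * rivers

hands_per_river = 47 * 46         # the lecture's count of hand types on a river
river_hand_nodes = sequences_to_river * hands_per_river

print(one_suit, two_suits, three_suits, flops)   # 286 1014 455 1755
print(sequences_to_river)                        # about 3 billion
print(river_hand_nodes)                          # about 6.5 trillion
```

The product of betting paths, flops, turns, and rivers comes out near 3 billion, the same order of magnitude as the "billion" mentioned, and multiplying by 47 times 46 gives roughly 6.5 trillion.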
723 00:39:54,780 --> 00:39:55,580 I don't know. 724 00:39:55,580 --> 00:39:56,910 Can you fit that on a node now? 725 00:39:56,910 --> 00:39:58,460 Does anybody know? 726 00:39:58,460 --> 00:40:03,340 I don't know of a CPU that has [INAUDIBLE] bytes of RAM yet. 727 00:40:08,220 --> 00:40:11,690 What they did was they broke the problem up into about 100 728 00:40:11,690 --> 00:40:14,170 [INAUDIBLE] different sub-games, and they just 729 00:40:14,170 --> 00:40:15,960 worked on those sub-games. 730 00:40:15,960 --> 00:40:18,380 In fact, I guess if you're clever about it, 731 00:40:18,380 --> 00:40:23,590 you can use cache memory when you get down to the river. 732 00:40:23,590 --> 00:40:27,970 Things are pretty close, and you know that using cache memory 733 00:40:27,970 --> 00:40:30,700 is faster than using [INAUDIBLE] memory. 734 00:40:30,700 --> 00:40:32,980 You can take advantage of these things. 735 00:40:32,980 --> 00:40:36,460 A lot of these updates through these regrets 736 00:40:36,460 --> 00:40:40,180 are just simple addition, and you can just 737 00:40:40,180 --> 00:40:44,750 optimize the heck out of this, and I'm sure they did it. 738 00:40:44,750 --> 00:40:49,530 Let's just try to solve some other games. 739 00:40:49,530 --> 00:40:54,140 I have two games that seem accessible. 740 00:40:54,140 --> 00:40:57,500 Suppose we do Omaha eight. 741 00:40:57,500 --> 00:41:02,460 Well, this is exactly the same structure as limit hold 'em. 742 00:41:02,460 --> 00:41:06,750 You just change the hole cards. 743 00:41:06,750 --> 00:41:13,370 So instead of having 47 choose 2 different river hands, 744 00:41:13,370 --> 00:41:15,090 you have 47 choose 4. 745 00:41:15,090 --> 00:41:22,010 That's like a multiple of 82.5 x to the original tree, 746 00:41:22,010 --> 00:41:24,410 so that's not that bad. 747 00:41:24,410 --> 00:41:29,420 900 CPU years-- this is just 75,000 CPU years.
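The Omaha estimate can be checked the same way. This sketch assumes the 82.5 multiple comes from dividing 47 choose 4 by the 47 times 46 hold 'em hand count used earlier in the lecture, and that the cost scales linearly from the 900 CPU years quoted for hold 'em; both are readings of the lecture, not published figures.

```python
from math import comb

holdem_hands_per_river = 47 * 46      # count used for hold 'em earlier in the lecture
omaha_hands_per_river = comb(47, 4)   # four hole cards from the 47 unseen cards

multiple = omaha_hands_per_river / holdem_hands_per_river
holdem_cpu_years = 900                # figure quoted for the hold 'em computation
omaha_cpu_years = holdem_cpu_years * multiple

print(omaha_hands_per_river)   # 178365
print(multiple)                # 82.5
print(omaha_cpu_years)         # 74250.0, roughly the 75,000 quoted
```

Measured against 47 choose 2 = 1,081 unordered hold 'em hands instead, the multiple would be 165, so the estimate depends on which hold 'em count is taken as the baseline.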
748 00:41:29,420 --> 00:41:34,050 If it were a matter of national security 749 00:41:34,050 --> 00:41:35,680 to get the exact solution to Omaha, 750 00:41:35,680 --> 00:41:39,150 the military could just do it in a few months. 751 00:41:39,150 --> 00:41:42,110 There's also [INAUDIBLE] you can do, by the way. 752 00:41:42,110 --> 00:41:46,270 Basically, what they did is-- before they did this, 753 00:41:46,270 --> 00:41:49,030 was that they solved the sub-game. 754 00:41:49,030 --> 00:41:52,830 In that, basically if you bucket hands together 755 00:41:52,830 --> 00:41:55,970 and you say you have to play these hands the same way, 756 00:41:55,970 --> 00:42:04,340 that's basically a sub-strategy. 757 00:42:04,340 --> 00:42:07,160 You can consider a subspace of your strategy 758 00:42:07,160 --> 00:42:10,090 x prime of x and y prime of y, and you just 759 00:42:10,090 --> 00:42:14,530 solve the x prime y prime game, meaning you bucket hands 760 00:42:14,530 --> 00:42:18,150 together, probably on the river because that's when bucketing 761 00:42:18,150 --> 00:42:20,820 kind of becomes more necessary. 762 00:42:20,820 --> 00:42:24,310 And you solve that game, and you go, well, 763 00:42:24,310 --> 00:42:28,250 how optimal is x prime in the whole game? 764 00:42:28,250 --> 00:42:34,180 And if you're good at bucketing, it may be pretty close. 765 00:42:34,180 --> 00:42:36,400 If you're bad at bucketing, like you 766 00:42:36,400 --> 00:42:41,870 put the aces in the same bucket as seven, five suited, 767 00:42:41,870 --> 00:42:44,240 you probably won't get a great answer. 768 00:42:44,240 --> 00:42:51,010 So you need to intelligently design your buckets. 769 00:42:51,010 --> 00:42:55,530 You can't-- well, I guess there are also evolutionary things 770 00:42:55,530 --> 00:42:59,100 you can do to try to design buckets and see what things are 771 00:42:59,100 --> 00:43:01,780 close to each other.
772 00:43:01,780 --> 00:43:03,882 People who have familiarity with this 773 00:43:03,882 --> 00:43:07,290 know that this is kind of hit or miss. 774 00:43:07,290 --> 00:43:13,440 Another game that you can maybe solve is razz. 775 00:43:13,440 --> 00:43:15,840 It's definitely as simple as [INAUDIBLE] a stud. 776 00:43:15,840 --> 00:43:18,140 Why is razz simpler than all other games of stud? 777 00:43:18,140 --> 00:43:20,600 There are only 13 different cards. 778 00:43:20,600 --> 00:43:23,100 The deuce of spades is the same card as the deuce of hearts. 779 00:43:23,100 --> 00:43:25,360 You can't-- well, you could make flushes, 780 00:43:25,360 --> 00:43:29,040 but they're irrelevant. 781 00:43:29,040 --> 00:43:33,100 So unfortunately there are 13 to the 8th power 782 00:43:33,100 --> 00:43:36,030 possible ways the cards can come, because there 783 00:43:36,030 --> 00:43:38,570 are four up cards. 784 00:43:38,570 --> 00:43:41,080 That's sort of the problem. 785 00:43:41,080 --> 00:43:42,830 Kind of the community information 786 00:43:42,830 --> 00:43:47,200 you have is a bigger set, and your trees just 787 00:43:47,200 --> 00:43:52,810 get bigger because now you have one extra street. 788 00:43:52,810 --> 00:43:58,540 And you still have 415 choose 3 combinations of any three ranks 789 00:43:58,540 --> 00:43:59,800 as river hand types. 790 00:43:59,800 --> 00:44:03,740 So there are 2.4 quadrillion river hands. 791 00:44:03,740 --> 00:44:07,280 So that's a factor of 374 [INAUDIBLE], 792 00:44:07,280 --> 00:44:12,730 but we think some of these nodes are pretty null. 793 00:44:12,730 --> 00:44:16,620 How many of you actually play razz? 794 00:44:16,620 --> 00:44:17,310 A couple of you. 795 00:44:17,310 --> 00:44:18,285 OK, great. 796 00:44:18,285 --> 00:44:22,740 Good poker class that people study razz.
797 00:44:22,740 --> 00:44:25,850 If you have a queen up and a deuce completes it, 798 00:44:25,850 --> 00:44:28,080 you're not really going to get into a raising war 799 00:44:28,080 --> 00:44:34,310 and make it [INAUDIBLE] cap on third street. 800 00:44:34,310 --> 00:44:36,460 Some of the [INAUDIBLE] may be null. 801 00:44:36,460 --> 00:44:40,280 You can do some bucketing, perhaps. 802 00:44:40,280 --> 00:44:44,090 Razz is kind of more natural to bucketing 803 00:44:44,090 --> 00:44:50,690 because you can think about what hands to bucket together. 804 00:44:50,690 --> 00:44:54,250 Maybe the king, eight, six, deuce 805 00:44:54,250 --> 00:44:57,380 is very close to the king, eight, six, ace. 806 00:44:57,380 --> 00:45:02,680 And the two strategies-- and you can start in hands 807 00:45:02,680 --> 00:45:06,380 by rank order of cards or something like that. 808 00:45:06,380 --> 00:45:09,400 So this is 374. 809 00:45:09,400 --> 00:45:11,290 This is 82.5. 810 00:45:11,290 --> 00:45:17,230 Or you could apply for a grant and say 811 00:45:17,230 --> 00:45:22,190 we need x hours of CPU time. 812 00:45:22,190 --> 00:45:24,250 I don't know what the right strategy is, 813 00:45:24,250 --> 00:45:28,560 but these two problems are tractable. 814 00:45:28,560 --> 00:45:30,960 Let's talk about big bet games because there's 815 00:45:30,960 --> 00:45:42,530 been some sort of discussion, even last night, about Snowie. 816 00:45:42,530 --> 00:45:48,480 A few people have tried big bet games, and there are problems. 817 00:45:48,480 --> 00:45:53,150 First of all, there's a continuum of bet sizes 818 00:45:53,150 --> 00:45:53,820 you can make. 819 00:45:53,820 --> 00:45:57,210 The Snowie solution just assumes three bet sizes. 820 00:45:57,210 --> 00:46:00,470 I can bet half the pot, I can bet the pot, or I can jam, 821 00:46:00,470 --> 00:46:01,530 I think.
822 00:46:01,530 --> 00:46:03,980 Maybe there's-- I can bet two times the pot, 823 00:46:03,980 --> 00:46:07,680 but the problem with that is that I think that's a little 824 00:46:07,680 --> 00:46:08,550 bit too coarse. 825 00:46:08,550 --> 00:46:10,860 The question is, if you solved that game, 826 00:46:10,860 --> 00:46:15,370 how close is that solution to the real game? 827 00:46:15,370 --> 00:46:18,612 And that's kind of an interesting question, 828 00:46:18,612 --> 00:46:20,445 but you don't even have a complete strategy. 829 00:46:23,360 --> 00:46:26,200 What if some guy bets a quarter of the pot, 830 00:46:26,200 --> 00:46:30,930 or 1.5 times the pot, something that's not on your list? 831 00:46:30,930 --> 00:46:35,970 You have to exploit-- and then it gets kind of weird, 832 00:46:35,970 --> 00:46:39,500 because my response to a pot size bet 833 00:46:39,500 --> 00:46:41,145 is to raise the pot again. 834 00:46:41,145 --> 00:46:44,890 All right, what if he makes a 1.1 times the pot? 835 00:46:44,890 --> 00:46:51,560 Is it right to raise the pot-- just raise the pot 1.1 times 836 00:46:51,560 --> 00:46:55,630 or raise the pot 0.9 times so you get back to the same stack 837 00:46:55,630 --> 00:46:59,675 sizes so you can do the same thing in the future. 838 00:46:59,675 --> 00:47:00,925 These are difficult questions. 839 00:47:05,690 --> 00:47:08,360 Even if some bets [INAUDIBLE] are non-optimal, 840 00:47:08,360 --> 00:47:11,900 our full strategy needs responses to the bets. 841 00:47:11,900 --> 00:47:14,510 So simple approximations may work. 842 00:47:14,510 --> 00:47:19,770 I kind of feel this is kind of a tough problem, though. 843 00:47:19,770 --> 00:47:24,730 And you could just-- just playing a game 844 00:47:24,730 --> 00:47:28,820 where you can just make rigid pot bet sizes, 845 00:47:28,820 --> 00:47:31,990 then you might get something actually interesting.
846 00:47:31,990 --> 00:47:35,120 But one of the things with regret matching, 847 00:47:35,120 --> 00:47:38,560 if you actually have a lot of bet sizes, suppose you say, 848 00:47:38,560 --> 00:47:40,580 OK, I'm just going to kill this problem, 849 00:47:40,580 --> 00:47:47,020 and I'm going to do 0.01 times the pot, 0.02 times the pot, 850 00:47:47,020 --> 00:47:49,230 0.03 times the pot, and so on and so forth. 851 00:47:49,230 --> 00:47:51,990 The problem is now you have a lot of options which are really 852 00:47:51,990 --> 00:47:55,500 close in equity together, so this regret minimization 853 00:47:55,500 --> 00:47:57,330 is going to take a while. 854 00:47:57,330 --> 00:48:02,100 It's going to have to sort out really close events. 855 00:48:02,100 --> 00:48:04,500 And then it's going to have to balance your value 856 00:48:04,500 --> 00:48:06,610 bets with your bluffs and things like that. 857 00:48:06,610 --> 00:48:10,590 So even just trying to kill it by putting a lot of bet types 858 00:48:10,590 --> 00:48:14,870 may not solve the problem for you. 859 00:48:19,630 --> 00:48:22,490 So two player, three player games 860 00:48:22,490 --> 00:48:26,010 are actually kind of interesting. 861 00:48:26,010 --> 00:48:29,130 There's work by the group on using counterfactual 862 00:48:29,130 --> 00:48:33,190 regret to create competitive multi-player agents. 863 00:48:33,190 --> 00:48:42,320 And this is a paper done in 2011 or so. 864 00:48:42,320 --> 00:48:44,360 And the programs actually got first and second 865 00:48:44,360 --> 00:48:47,760 in the annual three player limit event-- 866 00:48:47,760 --> 00:48:50,930 the first problem is that there's no guarantee of epsilon 867 00:48:50,930 --> 00:48:53,030 convergence. 868 00:48:53,030 --> 00:48:56,140 You're not necessarily within epsilon of Nash equilibrium. 869 00:48:56,140 --> 00:49:01,450 Second problem is that, do you just 870 00:49:01,450 --> 00:49:03,160 want to play in Nash equilibrium?
871 00:49:03,160 --> 00:49:06,340 There could be multiple Nash equilibria in multi-way games, 872 00:49:06,340 --> 00:49:12,840 especially in these proportional payout tournaments, satellites 873 00:49:12,840 --> 00:49:16,110 where, say, two people get a seat. 874 00:49:16,110 --> 00:49:18,450 There are really nonlinear effects going on, 875 00:49:18,450 --> 00:49:20,940 and it could [INAUDIBLE] which collusive equilibria are 876 00:49:20,940 --> 00:49:21,490 you playing? 877 00:49:25,331 --> 00:49:28,510 In our book, Jared and I point out 878 00:49:28,510 --> 00:49:31,910 a game called the rock maniac game 879 00:49:31,910 --> 00:49:33,820 where it's a real poker game where 880 00:49:33,820 --> 00:49:39,170 players can use a simple strategy and ensure you lose. 881 00:49:39,170 --> 00:49:41,920 A simple non-poker version, 882 00:49:41,920 --> 00:49:45,750 like a game where you play evens or odds with three players, 883 00:49:45,750 --> 00:49:49,200 but the odd man out wins. 884 00:49:49,200 --> 00:49:53,220 So suppose you and I are colluding 885 00:49:53,220 --> 00:49:54,810 against the third chump. 886 00:49:54,810 --> 00:49:56,940 What would we do? 887 00:49:56,940 --> 00:49:57,900 AUDIENCE: [INAUDIBLE]. 888 00:49:57,900 --> 00:50:00,990 PROFESSOR: Right, I would play one, and you'd play two. 889 00:50:00,990 --> 00:50:04,620 And the third guy could never win. 890 00:50:04,620 --> 00:50:10,360 There are situations which can come up in poker like that, 891 00:50:10,360 --> 00:50:18,990 but I think if there's no collusion 892 00:50:18,990 --> 00:50:22,660 and it's not a tournament, playing Nash equilibria usually 893 00:50:22,660 --> 00:50:24,190 turns out OK. 894 00:50:24,190 --> 00:50:28,850 I think that's sort of the argument they were making 895 00:50:28,850 --> 00:50:30,680 in creating these strategies. 896 00:50:30,680 --> 00:50:32,570 All right, here are the references.
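The three-player odd-man-out example above is small enough to check by listing every case. This is a minimal sketch, assuming each player picks 1 or 2 and the player whose pick differs from the other two wins; the function names are hypothetical and chosen only for this illustration.

```python
def odd_man_out_winner(choices):
    """Return the index of the player whose choice differs from the
    other two, or None when all three choices match."""
    for i, choice in enumerate(choices):
        others = [c for j, c in enumerate(choices) if j != i]
        if others[0] == others[1] and choice != others[0]:
            return i
    return None


def winners_against_colluders():
    """Players 0 and 1 collude by always playing 1 and 2.
    Return the winner for each possible choice by player 2."""
    return [odd_man_out_winner((1, 2, third)) for third in (1, 2)]


if __name__ == "__main__":
    # Player 2 always matches one colluder, so the other colluder is odd man out.
    print(winners_against_colluders())  # [1, 0]
```

Whatever the third player picks, one of the colluders wins, which is the sense in which a simple strategy by the other two players can ensure that you lose.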
897 00:50:35,870 --> 00:50:41,065 This took about [INAUDIBLE] the time I estimated, so questions? 898 00:50:43,620 --> 00:50:47,660 OK, let's just-- you had your hand up first. 899 00:50:47,660 --> 00:50:49,975 AUDIENCE: Well, the original strategy 900 00:50:49,975 --> 00:50:52,360 finds that the Nash equilibria, if you're 901 00:50:52,360 --> 00:50:56,116 playing against someone who's trying to beat [INAUDIBLE] 902 00:50:56,116 --> 00:51:01,845 strategy-- does it work if one of the strategies 903 00:51:01,845 --> 00:51:03,825 is probabilistic? 904 00:51:03,825 --> 00:51:05,674 Let's say two strategy trees-- 905 00:51:05,674 --> 00:51:06,840 PROFESSOR: Yeah, yeah, yeah. 906 00:51:06,840 --> 00:51:07,803 It does work with-- 907 00:51:07,803 --> 00:51:09,344 AUDIENCE: Choose [INAUDIBLE], but you 908 00:51:09,344 --> 00:51:11,506 don't always know which one I'll choose. 909 00:51:11,506 --> 00:51:14,610 PROFESSOR: Yeah, it works because you're 910 00:51:14,610 --> 00:51:18,720 going to play-- all of these strategies 911 00:51:18,720 --> 00:51:23,050 assume that they could be mixed strategies. 912 00:51:23,050 --> 00:51:26,600 If you're not allowed to play 1/3 rock, 1/3 paper, and 1/3 913 00:51:26,600 --> 00:51:28,280 scissors, then you're going to have 914 00:51:28,280 --> 00:51:32,170 to play a really bad strategy, and there are definitely 915 00:51:32,170 --> 00:51:35,780 times in which mixing is going to be necessary. 916 00:51:35,780 --> 00:51:36,540 So, yeah. 917 00:51:36,540 --> 00:51:38,815 All of these strategies have mixing. 918 00:51:41,660 --> 00:51:42,160 Yeah? 919 00:51:42,160 --> 00:51:43,618 AUDIENCE: What effects do you think 920 00:51:43,618 --> 00:51:48,502 [INAUDIBLE] going to have on limit hold 'em games? 921 00:51:48,502 --> 00:51:49,460 PROFESSOR: I don't know.
922 00:51:49,460 --> 00:51:55,660 I think pretty much before the solution came out 923 00:51:55,660 --> 00:51:59,440 the big online players kind of knew that a lot of people 924 00:51:59,440 --> 00:52:04,040 were playing near optimal, and I think the game is kind of dead. 925 00:52:04,040 --> 00:52:06,254 What do you think, Mike? 926 00:52:06,254 --> 00:52:08,210 AUDIENCE: [INAUDIBLE]. 927 00:52:08,210 --> 00:52:10,910 PROFESSOR: Right. 928 00:52:10,910 --> 00:52:13,010 Too bad Matt doesn't come here. 929 00:52:13,010 --> 00:52:13,980 AUDIENCE: [INAUDIBLE] are already basically doing this 930 00:52:13,980 --> 00:52:14,600 anyway. 931 00:52:14,600 --> 00:52:15,660 PROFESSOR: Well, no. 932 00:52:15,660 --> 00:52:18,940 I mean, even if you have the strategy, you have to learn it. 933 00:52:25,880 --> 00:52:28,880 The problem is that, if you go to a casino and you play 934 00:52:28,880 --> 00:52:33,110 somebody who's a good limit hold 'em player, 935 00:52:33,110 --> 00:52:36,730 he's-- because these types of strategies have been out 936 00:52:36,730 --> 00:52:42,650 for a while, they already played much closer to optimal than 937 00:52:42,650 --> 00:52:44,280 they did before. 938 00:52:44,280 --> 00:52:49,500 So I think this would have absolutely no effect on heads 939 00:52:49,500 --> 00:52:51,670 up limit hold 'em. 940 00:52:51,670 --> 00:52:54,379 It's already kind of no one-- yes? 941 00:52:54,379 --> 00:52:56,670 AUDIENCE: So can you talk more about different ways you 942 00:52:56,670 --> 00:52:58,086 can do approximations. [INAUDIBLE] 943 00:52:58,086 --> 00:53:01,830 mentioning earlier bucketing all of the different hands 944 00:53:01,830 --> 00:53:05,090 [INAUDIBLE] the ranks or what are 945 00:53:05,090 --> 00:53:06,870 some other things we can do? 946 00:53:06,870 --> 00:53:09,710 PROFESSOR: It's an endless [INAUDIBLE] 947 00:53:09,710 --> 00:53:12,770 be clever in bucketing. 948 00:53:12,770 --> 00:53:14,570 So bucket hand types together. 
949 00:53:17,360 --> 00:53:20,370 One kind of clever thing you can do 950 00:53:20,370 --> 00:53:23,430 is try to cut out the river entirely 951 00:53:23,430 --> 00:53:28,020 by just estimating your equity on the river. 952 00:53:28,020 --> 00:53:30,270 Of course, that's not going to be your showdown equity 953 00:53:30,270 --> 00:53:35,750 because you may be forced to face a bet. 954 00:53:35,750 --> 00:53:42,890 So you try some sort of implied value of your hand. 955 00:53:46,200 --> 00:53:46,700 Let's see. 956 00:53:46,700 --> 00:53:48,710 What other bucketing things. 957 00:53:52,480 --> 00:53:57,340 I mean, in some games there's sort of a natural way 958 00:53:57,340 --> 00:54:00,000 of bucketing hand types. 959 00:54:00,000 --> 00:54:06,570 Like on the river in Omaha, you could just 960 00:54:06,570 --> 00:54:10,520 try to bucket the cards that actually play and ignore 961 00:54:10,520 --> 00:54:12,030 the other cards. 962 00:54:12,030 --> 00:54:14,880 The thing is that, when you do things like that, [INAUDIBLE] 963 00:54:14,880 --> 00:54:18,526 losing something we call card removal. 964 00:54:18,526 --> 00:54:21,610 Card removal and blocking players 965 00:54:21,610 --> 00:54:23,810 from having the nuts and things like that 966 00:54:23,810 --> 00:54:26,810 are pretty important-- do turn out 967 00:54:26,810 --> 00:54:30,450 to be a pretty important part of the game theory 968 00:54:30,450 --> 00:54:33,000 optimal solution when you're getting down 969 00:54:33,000 --> 00:54:36,830 to the milli big blind kind of level. 970 00:54:36,830 --> 00:54:42,300 And if you don't think about card removal at all, 971 00:54:42,300 --> 00:54:46,600 then you actually have a strategy that 972 00:54:46,600 --> 00:54:48,300 can be exploited pretty easily. 973 00:54:48,300 --> 00:54:51,770 Actually, I talked about this yesterday.
974 00:54:51,770 --> 00:54:59,630 The thing is typically when the pot is p 975 00:54:59,630 --> 00:55:02,280 and you're facing a bet, you want 976 00:55:02,280 --> 00:55:04,590 to make them indifferent to bluffing. 977 00:55:04,590 --> 00:55:13,670 He's betting 1 to win p, so you want to call about p over p 978 00:55:13,670 --> 00:55:15,070 plus 1 of the time. 979 00:55:15,070 --> 00:55:17,340 If you don't call this much, he's 980 00:55:17,340 --> 00:55:20,110 going to bluff and take it. 981 00:55:20,110 --> 00:55:22,890 So that's sort of the thing. 982 00:55:22,890 --> 00:55:27,520 We're saying the bet is 1 and the pot is p. 983 00:55:27,520 --> 00:55:30,980 So if the pot is 10, and he bets 1, 984 00:55:30,980 --> 00:55:33,900 and he takes it more than 1/11 of the time, 985 00:55:33,900 --> 00:55:38,990 he's going to just-- [INAUDIBLE] bluff everything. 986 00:55:38,990 --> 00:55:42,750 The real problem becomes that, if you 987 00:55:42,750 --> 00:55:45,740 don't think about card removal at all, 988 00:55:45,740 --> 00:55:49,540 he can start bluffing hands in which he knows 989 00:55:49,540 --> 00:55:55,960 it's more likely you have a mediocre hand or something that 990 00:55:55,960 --> 00:55:57,970 excludes a strong hand. 991 00:55:57,970 --> 00:56:01,920 One real example is in PLO when there 992 00:56:01,920 --> 00:56:07,107 is a flush on the board, what's a good bluff? 993 00:56:07,107 --> 00:56:08,690 AUDIENCE: You have ace of [INAUDIBLE]. 994 00:56:08,690 --> 00:56:10,642 PROFESSOR: Right, you have the ace in the suit. 995 00:56:10,642 --> 00:56:11,850 You don't have anything else. 996 00:56:11,850 --> 00:56:13,980 That's a great bluff, because you're blocking him 997 00:56:13,980 --> 00:56:17,960 from having a great hand, and you're 998 00:56:17,960 --> 00:56:22,260 blocking all of his nut hands and a lot 999 00:56:22,260 --> 00:56:23,900 of his really good hands.
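The indifference arithmetic from a moment ago (bet 1 into a pot of p, call p over p plus 1 of the time) can be checked with exact fractions. This is a minimal sketch that assumes the bluffer has no showdown equity and ignores card removal entirely; the function names are hypothetical and not taken from the lecture.

```python
from fractions import Fraction


def min_call_frequency(pot, bet=1):
    """Calling frequency that makes a pure bluff break even: pot / (pot + bet)."""
    return Fraction(pot, pot + bet)


def bluff_ev(pot, bet, call_frequency):
    """EV of a bluff with zero showdown equity.

    The bluffer wins the pot when the opponent folds and loses the bet
    when the opponent calls.
    """
    fold_frequency = 1 - call_frequency
    return fold_frequency * pot - call_frequency * bet


if __name__ == "__main__":
    pot, bet = 10, 1
    call = min_call_frequency(pot, bet)
    print(call)                                     # 10/11
    print(bluff_ev(pot, bet, call))                 # 0, the bluffer is indifferent
    print(bluff_ev(pot, bet, Fraction(4, 5)) > 0)   # True, calling too rarely makes bluffs profitable
```

With the pot at 10 and the bet at 1, folding more than 1/11 of the time gives the bluff a positive expectation, which matches the numbers in the lecture.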
1000 00:56:23,900 --> 00:56:28,350 And he's much more likely to fold, 1001 00:56:28,350 --> 00:56:33,490 because if you bet the pot, a lot of his hands 1002 00:56:33,490 --> 00:56:38,050 he's [INAUDIBLE] himself with [INAUDIBLE] with the nut flush. 1003 00:56:38,050 --> 00:56:39,570 Oh, I have a natural call. 1004 00:56:39,570 --> 00:56:40,270 Are you all in? 1005 00:56:40,270 --> 00:56:40,936 I have the nuts? 1006 00:56:40,936 --> 00:56:42,850 OK, I call. 1007 00:56:42,850 --> 00:56:47,277 So that's why card removal is important. 1008 00:56:50,690 --> 00:56:51,956 Yeah? 1009 00:56:51,956 --> 00:56:53,928 AUDIENCE: So is my understanding correct 1010 00:56:53,928 --> 00:56:56,886 that optimal [INAUDIBLE]? 1011 00:56:59,796 --> 00:57:00,420 PROFESSOR: Yes. 1012 00:57:00,420 --> 00:57:04,758 AUDIENCE: And has there been any study of optimal [INAUDIBLE]? 1013 00:57:08,720 --> 00:57:10,345 PROFESSOR: Sort of like utility theory. 1014 00:57:13,700 --> 00:57:20,760 In poker in general, it's kind of weird. 1015 00:57:20,760 --> 00:57:23,750 People think a lot about that [INAUDIBLE] 1016 00:57:23,750 --> 00:57:26,070 what tournament they should enter, 1017 00:57:26,070 --> 00:57:28,050 what games they should play. 1018 00:57:28,050 --> 00:57:31,810 But there hasn't been a study really 1019 00:57:31,810 --> 00:57:35,160 optimizing your own personal utility within the games. 1020 00:57:35,160 --> 00:57:38,250 The assumption is kind of like, well, I'm 1021 00:57:38,250 --> 00:57:41,075 going to use all this cool utility theory [INAUDIBLE] 1022 00:57:41,075 --> 00:57:42,575 to figure out what game I'm playing. 1023 00:57:42,575 --> 00:57:44,074 As long as I'm playing the game, I'm 1024 00:57:44,074 --> 00:57:47,440 just going to try to win the most money. 1025 00:57:47,440 --> 00:57:49,870 That's sort of been the attitude, 1026 00:57:49,870 --> 00:57:54,440 and I think that's actually correct for most [INAUDIBLE].
1027 00:57:54,440 --> 00:57:57,060 In limit hold 'em, [INAUDIBLE] you 1028 00:57:57,060 --> 00:58:00,460 need bankrolls of hundreds of bets. 1029 00:58:00,460 --> 00:58:02,740 You're not going to try to optimize and try 1030 00:58:02,740 --> 00:58:09,269 to win some fraction of a bet with your utility function 1031 00:58:09,269 --> 00:58:10,310 by lowering the variance. 1032 00:58:15,520 --> 00:58:20,910 That is an interesting question, because maybe-- I 1033 00:58:20,910 --> 00:58:25,460 feel that, if there is some utility consideration-- 1034 00:58:25,460 --> 00:58:28,020 like maybe in a tournament you feel your chips are 1035 00:58:28,020 --> 00:58:32,150 non-linear-- maybe you are going to quit 1036 00:58:32,150 --> 00:58:35,850 playing your marginal hands because of utility 1037 00:58:35,850 --> 00:58:38,240 considerations. 1038 00:58:38,240 --> 00:58:41,200 AUDIENCE: [INAUDIBLE] like the final table of major events. 1039 00:58:41,200 --> 00:58:44,215 They'll go beyond ICM to say maybe I won't coin 1040 00:58:44,215 --> 00:58:48,215 flip for a $10 edge [INAUDIBLE] step up. 1041 00:58:51,590 --> 00:58:54,190 PROFESSOR: I mean, if you use ICM, 1042 00:58:54,190 --> 00:58:59,300 those utilities are already kind of calculated, but yeah. 1043 00:58:59,300 --> 00:59:02,060 For example, final table of the main event, 1044 00:59:02,060 --> 00:59:06,510 I'm not only using ICM, but I'm thinking, well, 1045 00:59:06,510 --> 00:59:13,200 $3 million-- $4 million compared to $2 million 1046 00:59:13,200 --> 00:59:15,420 is a much smaller step to me than $2 million 1047 00:59:15,420 --> 00:59:20,080 is compared to 0 in my own personal utility. 1048 00:59:23,720 --> 00:59:27,660 Like $0.5 million compared to $2 million versus $2 million 1049 00:59:27,660 --> 00:59:29,400 compared to $3.5 million. 1050 00:59:29,400 --> 00:59:34,760 So I need to optimize utility. 1051 00:59:34,760 --> 00:59:36,620 I mean, yeah.
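The point about personal utility can be made concrete with any concave utility function. The sketch below uses a square root purely as an illustrative assumption (the lecture does not name a specific utility function), and the helper names are hypothetical.

```python
import math


def utility(dollars):
    """Illustrative concave utility of money. The square root is an
    assumption for this sketch, not something stated in the lecture."""
    return math.sqrt(dollars)


def expected_utility(outcomes):
    """Expected utility of a list of (probability, dollars) pairs."""
    return sum(prob * utility(dollars) for prob, dollars in outcomes)


if __name__ == "__main__":
    step_low = utility(2_000_000) - utility(0)
    step_high = utility(4_000_000) - utility(2_000_000)
    # Going from 0 to $2 million is a bigger step than from $2 million to $4 million.
    print(step_low > step_high)

    # A coin flip between 0 and $4 million has the same expected dollars as
    # a sure $2 million, but lower expected utility under this assumption.
    flip = expected_utility([(0.5, 0), (0.5, 4_000_000)])
    print(flip < utility(2_000_000))
```

Under a utility like this, a player may decline a coin flip that is slightly favorable in dollars, which is the kind of adjustment beyond ICM that the question is asking about.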
1052 00:59:36,620 --> 00:59:40,376 I think that's kind of worthy of study. 1053 00:59:40,376 --> 00:59:41,268 Yeah? 1054 00:59:41,268 --> 00:59:45,230 AUDIENCE: What is it about the analytics of poker 1055 00:59:45,230 --> 00:59:47,700 that makes it so popular with trading firms? 1056 00:59:47,700 --> 00:59:49,541 And how does it-- 1057 00:59:49,541 --> 00:59:50,290 PROFESSOR: Oh, OK. 1058 00:59:50,290 --> 00:59:51,996 That's a great question. 1059 00:59:51,996 --> 00:59:54,336 AUDIENCE: How do you use it professionally, 1060 00:59:54,336 --> 00:59:55,280 all of this stuff? 1061 00:59:55,280 --> 00:59:58,140 PROFESSOR: Well, I mean, I think poker is just 1062 00:59:58,140 --> 01:00:03,840 kind of-- if you think what one game-- if you could teach 1063 01:00:03,840 --> 01:00:08,500 traders one game, what one game would represent 1064 01:00:08,500 --> 01:00:09,870 what traders have to know? 1065 01:00:09,870 --> 01:00:13,070 Well poker-- there are a lot of actors. 1066 01:00:13,070 --> 01:00:15,580 There's incomplete information. 1067 01:00:15,580 --> 01:00:17,900 That's one big thing. 1068 01:00:17,900 --> 01:00:21,715 And you do have to do a lot of thinking of what your counter 1069 01:00:21,715 --> 01:00:23,610 party is doing. 1070 01:00:23,610 --> 01:00:25,480 If he wants to trade against you, 1071 01:00:25,480 --> 01:00:30,100 he puts a bid or offer-- some of that 1072 01:00:30,100 --> 01:00:36,050 is why there's this [INAUDIBLE]. 1073 01:00:36,050 --> 01:00:37,935 Are you trying to get out of risk? 1074 01:00:37,935 --> 01:00:40,780 [INAUDIBLE] big position he's trying to get out of, 1075 01:00:40,780 --> 01:00:44,750 or do you have to be worried about these orders 1076 01:00:44,750 --> 01:00:45,920 and things like that? 1077 01:00:45,920 --> 01:00:50,280 And also poker gives you sort of the skills 1078 01:00:50,280 --> 01:00:55,539 to trade that-- suppose you know something is worth $10. 
1079 01:00:55,539 --> 01:00:57,830 [INAUDIBLE] you're going to make around it [INAUDIBLE]. 1080 01:00:57,830 --> 01:01:01,740 Knowing nothing, you might make-- bid [INAUDIBLE] 1081 01:01:01,740 --> 01:01:03,810 offer at 10/10, which means you're 1082 01:01:03,810 --> 01:01:07,150 willing to buy the [INAUDIBLE] or sell it at 10/10, 1083 01:01:07,150 --> 01:01:10,170 but you know something about the counter party. 1084 01:01:10,170 --> 01:01:12,810 You may know the counter party can 1085 01:01:12,810 --> 01:01:20,320 be a better buyer than seller or that buying is the risky part 1086 01:01:20,320 --> 01:01:22,830 [INAUDIBLE] is the risky part. 1087 01:01:22,830 --> 01:01:25,590 That kind of has a quant. 1088 01:01:25,590 --> 01:01:28,510 Also, as a quant, doing poker analytics 1089 01:01:28,510 --> 01:01:33,910 is very similar to the analysis we do in trading. 1090 01:01:33,910 --> 01:01:38,350 A lot of this analysis-- how these strategies work, do 1091 01:01:38,350 --> 01:01:43,540 these strategies really return what we think they return 1092 01:01:43,540 --> 01:01:47,320 are similar to discussions we have in our trading strategy. 1093 01:01:47,320 --> 01:01:50,400 I'm glad I'm able to talk to you about this, because if you're 1094 01:01:50,400 --> 01:01:53,390 interested in doing poker strategies, 1095 01:01:53,390 --> 01:01:57,090 you'll probably be interested in doing trading strategies, too. 1096 01:01:57,090 --> 01:01:58,132 Any more questions? 1097 01:02:01,430 --> 01:02:02,310 Yes? 1098 01:02:02,310 --> 01:02:07,705 AUDIENCE: What about doing the deviation from [INAUDIBLE] 1099 01:02:07,705 --> 01:02:12,930 the [INAUDIBLE] detecting deviation or let's say somebody 1100 01:02:12,930 --> 01:02:15,420 goes from playing optimally [INAUDIBLE] 1101 01:02:15,420 --> 01:02:17,320 not playing optimal [INAUDIBLE]. 
1102 01:02:21,470 --> 01:02:24,140 PROFESSOR: Yeah, I mean that's a very interesting thing, 1103 01:02:24,140 --> 01:02:30,270 and that's actually hard to determine because that 1104 01:02:30,270 --> 01:02:32,070 feels a little bit harder than this 1105 01:02:32,070 --> 01:02:33,960 because this is [INAUDIBLE]. 1106 01:02:33,960 --> 01:02:36,340 It's like I'm trying to figure out the optimal strategy, 1107 01:02:36,340 --> 01:02:40,040 and I just play this, and whatever money 1108 01:02:40,040 --> 01:02:42,550 comes to me comes to me. 1109 01:02:42,550 --> 01:02:43,600 You open your arms. 1110 01:02:43,600 --> 01:02:46,500 The money comes to you. 1111 01:02:46,500 --> 01:02:49,970 The other thing is, oh, well he's playing badly, 1112 01:02:49,970 --> 01:02:52,380 so I'm going to go there and take his money. 1113 01:02:52,380 --> 01:02:55,740 But then if I deviate from optimal, 1114 01:02:55,740 --> 01:03:00,220 I'm also opening myself up to being exploited. 1115 01:03:00,220 --> 01:03:02,480 So that's kind of hard. 1116 01:03:02,480 --> 01:03:05,880 That's much more of a dynamic problem. 1117 01:03:05,880 --> 01:03:07,260 When does he go on tilt? 1118 01:03:07,260 --> 01:03:09,330 How long was he on tilt? 1119 01:03:09,330 --> 01:03:15,780 What evidence do we have that he's on tilt? 1120 01:03:15,780 --> 01:03:21,450 I know that [INAUDIBLE], the guys at CMU, 1121 01:03:21,450 --> 01:03:26,490 were looking into some sort of zero loss way 1122 01:03:26,490 --> 01:03:29,630 to exploit your opponents, because you just figure out 1123 01:03:29,630 --> 01:03:32,160 when your opponents are playing badly, 1124 01:03:32,160 --> 01:03:35,800 how much they've given up in playing sub-optimally, 1125 01:03:35,800 --> 01:03:37,670 and then you go to a [INAUDIBLE]. 1126 01:03:37,670 --> 01:03:41,830 But you only open yourself up to, say, 1127 01:03:41,830 --> 01:03:43,750 half the money he's given up, or something 1128 01:03:43,750 --> 01:03:46,120 like that, playing badly.
1129 01:03:46,120 --> 01:03:53,080 And the metric is-- so there's some sort of gaming algorithm 1130 01:03:53,080 --> 01:03:57,002 you can do to do that, but yeah that's definitely 1131 01:03:57,002 --> 01:03:57,960 another field of study. 1132 01:03:57,960 --> 01:04:02,450 There are a lot of interesting fields that can come out of poker 1133 01:04:02,450 --> 01:04:02,950 [INAUDIBLE]. 1134 01:04:06,171 --> 01:04:06,670 All right. 1135 01:04:06,670 --> 01:04:07,570 I guess that's it. 1136 01:04:07,570 --> 01:04:09,120 [APPLAUSE]