1 00:00:09,530 --> 00:00:13,420 After Watson has completed the initial two steps of question 2 00:00:13,420 --> 00:00:16,350 analysis and hypothesis generation, 3 00:00:16,350 --> 00:00:18,840 it's time to move on to step three, 4 00:00:18,840 --> 00:00:22,040 where each of the hypotheses is scored. 5 00:00:22,040 --> 00:00:24,920 In this step, Watson computes confidence levels 6 00:00:24,920 --> 00:00:28,320 for each possible answer or hypothesis. 7 00:00:28,320 --> 00:00:30,810 This is necessary to accurately estimate 8 00:00:30,810 --> 00:00:34,530 the probability of a proposed answer being correct. 9 00:00:34,530 --> 00:00:37,610 Watson will only buzz in to answer a question 10 00:00:37,610 --> 00:00:40,240 if a confidence level for one of the hypotheses 11 00:00:40,240 --> 00:00:42,250 is above a threshold. 12 00:00:42,250 --> 00:00:44,610 To compute these confidence levels, 13 00:00:44,610 --> 00:00:48,980 Watson combines a large number of different methods. 14 00:00:48,980 --> 00:00:52,510 First, Watson starts with lightweight scoring algorithms 15 00:00:52,510 --> 00:00:55,660 to prune down the large set of hypotheses. 16 00:00:55,660 --> 00:00:59,630 Recall that in step two, about 200 different hypotheses 17 00:00:59,630 --> 00:01:01,420 were generated. 18 00:01:01,420 --> 00:01:04,200 An example of a lightweight scoring algorithm 19 00:01:04,200 --> 00:01:07,640 is computing the likelihood that a candidate answer is actually 20 00:01:07,640 --> 00:01:10,030 an instance of the LAT. 21 00:01:10,030 --> 00:01:14,570 For the Mozart symphony question where the LAT is "this planet," 22 00:01:14,570 --> 00:01:18,140 a candidate answer like "Earth" would have a very high score, 23 00:01:18,140 --> 00:01:20,600 but a candidate answer like "the moon" 24 00:01:20,600 --> 00:01:22,960 would have a lower score. 25 00:01:22,960 --> 00:01:25,460 If the likelihood is not very high, 26 00:01:25,460 --> 00:01:28,470 Watson throws away the hypothesis.
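
The lightweight pruning step just described can be sketched roughly as follows. This is only an illustration, not Watson's actual implementation: the likelihood values, the extra candidate "Salzburg", the function name `prune`, and the 0.5 cutoff are all assumptions made up for the example.

```python
# Hypothetical likelihoods that each candidate is an instance of the
# LAT "this planet". All values are invented for illustration.
lat_likelihood = {
    "Earth": 0.98,
    "Jupiter": 0.97,
    "Mercury": 0.95,
    "the moon": 0.30,
    "Salzburg": 0.02,
}


def prune(candidates, threshold=0.5):
    """Keep only candidates whose LAT likelihood reaches the threshold.

    The threshold value is an assumption; the lecture only says that
    hypotheses with a likelihood that is "not very high" are discarded.
    """
    return [name for name, score in candidates.items() if score >= threshold]


survivors = prune(lat_likelihood)
print(survivors)
```

With these made-up numbers, the planets survive and "the moon" is dropped, mirroring the lecture's example of a candidate with a lower LAT score.
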
27 00:01:28,470 --> 00:01:31,490 The candidate answers that pass this step proceed 28 00:01:31,490 --> 00:01:34,550 to the next stage of the scoring algorithms. 29 00:01:34,550 --> 00:01:37,750 Watson lets about 100 candidate answers 30 00:01:37,750 --> 00:01:39,259 pass on to the next stage. 31 00:01:41,970 --> 00:01:46,150 Then Watson goes into more advanced scoring analytics. 32 00:01:46,150 --> 00:01:48,810 Watson needs to gather supporting evidence 33 00:01:48,810 --> 00:01:51,090 for each candidate answer. 34 00:01:51,090 --> 00:01:53,320 One way of doing this is through a method 35 00:01:53,320 --> 00:01:55,770 called passage search, where passages 36 00:01:55,770 --> 00:01:59,130 are retrieved that contain the hypothesis text. 37 00:01:59,130 --> 00:02:01,610 To simulate this, let's see what happens 38 00:02:01,610 --> 00:02:05,160 when we search for two of our hypotheses on Google. 39 00:02:05,160 --> 00:02:07,850 Our first hypothesis is "Mozart's last 40 00:02:07,850 --> 00:02:10,199 and perhaps most powerful symphony shares 41 00:02:10,199 --> 00:02:12,010 its name with Mercury." 42 00:02:12,010 --> 00:02:14,860 And our second hypothesis is "Mozart's last 43 00:02:14,860 --> 00:02:17,310 and perhaps most powerful symphony shares 44 00:02:17,310 --> 00:02:20,550 its name with Jupiter." 45 00:02:20,550 --> 00:02:24,870 On Google, if we search for Mozart, symphony, and Mercury, 46 00:02:24,870 --> 00:02:27,270 we get about 900,000 results. 47 00:02:27,270 --> 00:02:29,350 And we get some good results. 48 00:02:29,350 --> 00:02:32,329 They definitely mention the three words we searched for, 49 00:02:32,329 --> 00:02:35,440 but Mercury is only next to symphony once. 50 00:02:35,440 --> 00:02:40,700 And there's no mention about this being his last symphony. 51 00:02:40,700 --> 00:02:44,570 Now, if we search for Mozart, symphony, and Jupiter, 52 00:02:44,570 --> 00:02:47,290 we get about 1.5 million results.
53 00:02:47,290 --> 00:02:49,630 And they look much more promising. 54 00:02:49,630 --> 00:02:53,390 We see the phrase "last symphony" a couple times 55 00:02:53,390 --> 00:02:56,510 and "Jupiter symphony" more than once. 56 00:02:56,510 --> 00:02:59,250 Therefore, the hypothesis with Jupiter 57 00:02:59,250 --> 00:03:02,120 seems to be more supported than the hypothesis with Mercury. 58 00:03:04,790 --> 00:03:07,690 Now, the scoring analytics determine the degree 59 00:03:07,690 --> 00:03:11,740 of certainty that the evidence supports the candidate answers. 60 00:03:11,740 --> 00:03:15,170 More than 50 different scoring components are used. 61 00:03:15,170 --> 00:03:19,410 One example is analyzing temporal relationships. 62 00:03:19,410 --> 00:03:23,340 Consider the Jeopardy question-- "In 1594, he 63 00:03:23,340 --> 00:03:27,100 took a job as a tax collector in Andalusia." 64 00:03:27,100 --> 00:03:31,380 Two candidate answers are Thoreau and Cervantes. 65 00:03:31,380 --> 00:03:33,870 However, this algorithm would determine 66 00:03:33,870 --> 00:03:37,360 that Thoreau was not born until 1817. 67 00:03:37,360 --> 00:03:41,790 So it would give a higher score to Cervantes. 68 00:03:41,790 --> 00:03:44,800 Once all of the scoring algorithms are run, 69 00:03:44,800 --> 00:03:47,710 Watson needs to select the single best supported 70 00:03:47,710 --> 00:03:49,360 hypothesis. 71 00:03:49,360 --> 00:03:51,990 Before this can be done, similar answers 72 00:03:51,990 --> 00:03:55,410 need to be merged, since multiple candidate answers may 73 00:03:55,410 --> 00:03:57,170 be equivalent. 74 00:03:57,170 --> 00:03:59,690 As an example, the candidate answers 75 00:03:59,690 --> 00:04:04,800 "Abraham Lincoln" and "Honest Abe" refer to the same person. 76 00:04:04,800 --> 00:04:07,240 So the scores for these two candidate answers 77 00:04:07,240 --> 00:04:08,980 need to be combined. 
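
The merging of equivalent answers can be sketched as below. The lecture does not say how Watson recognizes equivalent answers or how it combines their scores, so the alias table `ALIASES`, the function `merge_scores`, the extra candidate, the score values, and the choice to add scores together are all assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical alias table mapping alternative names to one canonical answer.
ALIASES = {"Honest Abe": "Abraham Lincoln"}


def merge_scores(scored_answers):
    """Combine the scores of candidate answers that refer to the same entity.

    Summing is an assumed combination rule; the lecture only says the
    scores "need to be combined".
    """
    merged = defaultdict(float)
    for answer, score in scored_answers:
        canonical = ALIASES.get(answer, answer)
        merged[canonical] += score
    return dict(merged)


# Invented scores for illustration.
scored = [
    ("Abraham Lincoln", 0.40),
    ("Honest Abe", 0.25),
    ("Ulysses S. Grant", 0.35),
]
merged = merge_scores(scored)
print(merged)
```

After merging, "Honest Abe" no longer appears as a separate candidate, so it cannot compete against "Abraham Lincoln" in the ranking.
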
78 00:04:08,980 --> 00:04:11,410 Watson should not be viewing similar answers 79 00:04:11,410 --> 00:04:13,820 as competing choices. 80 00:04:13,820 --> 00:04:16,829 Now, Watson is ready to rank the hypotheses 81 00:04:16,829 --> 00:04:20,130 and estimate an overall confidence for each. 82 00:04:20,130 --> 00:04:24,740 To do this, predictive analytics are used. 83 00:04:24,740 --> 00:04:27,770 To compute an overall confidence level for each candidate 84 00:04:27,770 --> 00:04:31,470 answer, Watson uses logistic regression. 85 00:04:31,470 --> 00:04:35,409 The training data is a set of historical Jeopardy questions 86 00:04:35,409 --> 00:04:37,960 and all of the candidate answers. 87 00:04:37,960 --> 00:04:39,700 Each of the scoring algorithms is 88 00:04:39,700 --> 00:04:42,770 used as an independent variable. 89 00:04:42,770 --> 00:04:46,110 Then, logistic regression is used to predict whether or not 90 00:04:46,110 --> 00:04:50,390 a candidate answer is correct using the scores. 91 00:04:50,390 --> 00:04:52,800 This gives each score a weight and computes 92 00:04:52,800 --> 00:04:55,620 an overall probability or confidence 93 00:04:55,620 --> 00:04:58,380 that a candidate answer is correct. 94 00:04:58,380 --> 00:05:01,330 If the highest confidence level for one of the candidate 95 00:05:01,330 --> 00:05:04,140 answers for a question is high enough, 96 00:05:04,140 --> 00:05:06,240 Watson buzzes in to answer the question. 97 00:05:08,780 --> 00:05:11,520 In total, the Watson system is composed 98 00:05:11,520 --> 00:05:14,090 of eight refrigerator-sized cabinets 99 00:05:14,090 --> 00:05:18,020 and has high-speed local storage for all information. 100 00:05:18,020 --> 00:05:22,280 It originally took over two hours to answer one question. 101 00:05:22,280 --> 00:05:26,280 And the team had to reduce this to two to six seconds.
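
The logistic regression step described earlier in this video can be sketched as follows. This is a minimal illustration of how a logistic model turns several scores into one confidence, not Watson's real model: the three score names, the weights, the intercept, the candidate scores, and the 0.5 buzz threshold are all invented. In the approach the lecture describes, the weights would be learned from historical Jeopardy questions rather than written by hand.

```python
import math

# Hypothetical weights, one per scoring algorithm (independent variable).
# A trained model would estimate these from historical questions.
WEIGHTS = {"lat_match": 2.0, "passage_support": 3.0, "temporal_match": 1.5}
INTERCEPT = -4.5       # assumed value
BUZZ_THRESHOLD = 0.5   # assumed value; the lecture gives no number


def confidence(scores):
    """Logistic function applied to a weighted sum of the component scores."""
    z = INTERCEPT + sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# Invented component scores for two candidate answers.
candidates = {
    "Jupiter": {"lat_match": 0.97, "passage_support": 0.9, "temporal_match": 1.0},
    "Mercury": {"lat_match": 0.95, "passage_support": 0.2, "temporal_match": 1.0},
}

confidences = {name: confidence(s) for name, s in candidates.items()}
best = max(confidences, key=confidences.get)
should_buzz = confidences[best] >= BUZZ_THRESHOLD
print(best, round(confidences[best], 3), should_buzz)
```

With these made-up numbers, the stronger passage support for "Jupiter" lifts its confidence above the threshold while "Mercury" stays below it, so the sketch would buzz in with "Jupiter".
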
102 00:05:26,280 --> 00:05:29,890 In the next video, we'll see how Watson progressed in the six 103 00:05:29,890 --> 00:05:32,890 years between starting and playing on Jeopardy, 104 00:05:32,890 --> 00:05:35,240 what happened during the game, and what 105 00:05:35,240 --> 00:05:38,120 the Watson team is working on now.