When Watson receives a question, the first step is question analysis. One of the things Watson tries to figure out in this step is what the question is looking for. This is defined as finding the Lexical Answer Type, or LAT, of the question. The LAT is the word or phrase in the question that specifies the type of answer. You should be able to replace the LAT with the answer and still get a complete, sensible sentence.

For example, for the question "Mozart's last and perhaps most powerful symphony shares its name with this planet," the LAT is "this planet." If we replace it with the answer "Jupiter," the sentence makes sense: "Mozart's last and perhaps most powerful symphony shares its name with Jupiter."

For the question "Smaller than only Greenland, it's the world's second largest island," the LAT is "it." If we replace the LAT with the answer "New Guinea," the sentence makes sense: "Smaller than only Greenland, New Guinea is the world's second largest island." Unfortunately, the LAT is not "island," even though that word would be more descriptive, because the sentence does not make sense with "New Guinea" in place of "island."
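The substitution test above can be sketched in a few lines of Python. This is a toy illustration, not Watson's code: `substitute_lat` is a hypothetical helper, and it assumes the LAT has already been identified as a literal phrase in the question. It only performs the mechanical replacement; deciding whether the result "makes sense" is the hard part that Watson has to solve.

```python
import re


def substitute_lat(question: str, lat: str, answer: str) -> str:
    """Replace the first whole-word occurrence of `lat` in `question` with `answer`.

    Hypothetical helper for illustration; assumes the LAT is a literal phrase
    that appears in the question text.
    """
    pattern = re.compile(r"\b" + re.escape(lat) + r"\b", re.IGNORECASE)
    # A function replacement inserts `answer` literally (no backslash escapes).
    new_question, count = pattern.subn(lambda _match: answer, question, count=1)
    if count == 0:
        raise ValueError(f"LAT {lat!r} not found in question")
    return new_question


mozart = "Mozart's last and perhaps most powerful symphony shares its name with this planet."
island = "Smaller than only Greenland, it's the world's second largest island."

print(substitute_lat(mozart, "this planet", "Jupiter"))
# prints: Mozart's last and perhaps most powerful symphony shares its name with Jupiter.
print(substitute_lat(island, "it", "New Guinea"))
# prints: Smaller than only Greenland, New Guinea's the world's second largest island.
```

In the second example the contraction "it's" becomes "New Guinea's," that is, "New Guinea is," which is why the LAT there is "it" rather than "island."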
We can see in these two examples that sometimes the LAT is very specific, like "this planet," and sometimes it's very vague, like "it." If we know the LAT, we know what to look for. However, an analysis of 20,000 questions found 2,500 distinct LATs, and 12% of the questions did not have an explicit LAT at all; they referred to the answer with words like "it." Furthermore, even the 200 most frequent explicit LATs cover less than 50% of the questions. So to enhance the question analysis step, Watson also performs relation detection, to find relationships among words, and decomposition, to split the question into different clues.

The second step in Watson is hypothesis generation. The goal of this step is to use the question analysis from step one to produce candidate answers by searching Watson's databases. In this step, several hundred candidate answers are generated. For the question "Mozart's last and perhaps most powerful symphony shares its name with this planet," candidate answers could be Mercury, Earth, and Jupiter. These are generated using various search techniques.
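To make candidate generation concrete, here is a minimal sketch that uses one simple search technique, keyword overlap against a tiny corpus, and returns the titles of matching documents as candidate answers. Everything here is an assumption for illustration: `CORPUS`, `STOPWORDS`, `keywords`, `generate_candidates`, and the default of 300 candidates are hypothetical and are not how Watson's search was actually implemented. The point it shows is that this stage keeps any document with even a weak match, favoring recall over precision.

```python
import re
from collections import Counter

# Hypothetical toy corpus mapping a document title to its text.
# Watson's real sources were vastly larger; this only illustrates the idea.
CORPUS = {
    "Jupiter": "Jupiter is the largest planet. Mozart's last symphony is nicknamed Jupiter.",
    "Mercury": "Mercury is the planet closest to the Sun.",
    "Earth": "Earth is the third planet from the Sun.",
    "Greenland": "Greenland is the world's largest island.",
}

STOPWORDS = {"a", "and", "is", "its", "of", "the", "this", "to", "with"}


def keywords(text: str) -> set[str]:
    """Lowercase word tokens, minus a few common function words."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def generate_candidates(question: str, corpus: dict[str, str], max_candidates: int = 300) -> list[str]:
    """Return titles of documents sharing at least one keyword with the question.

    Any overlap is enough to keep a candidate: this stage favors recall,
    so weak matches are passed on for later scoring instead of being dropped.
    """
    question_terms = keywords(question)
    overlap = Counter()
    for title, text in corpus.items():
        shared = len(question_terms & keywords(text))
        if shared > 0:
            overlap[title] = shared
    # Order by overlap so stronger matches come first; ties keep corpus order.
    return [title for title, _ in overlap.most_common(max_candidates)]


question = "Mozart's last and perhaps most powerful symphony shares its name with this planet."
print(generate_candidates(question, CORPUS))
# prints: ['Jupiter', 'Mercury', 'Earth']
```

Mercury and Earth survive on the single shared word "planet," which is exactly the kind of weak candidate this step is meant to let through. The ordering here is only a convenience; choosing among the candidates is left to the scoring and ranking steps.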
Each candidate answer, plugged back into the question in place of the LAT, is then considered a hypothesis. For the question about Mozart's symphony, hypothesis one would be the question with "Mercury" in place of "this planet," hypothesis two would have "Jupiter" in place of "this planet," and hypothesis three would have "Earth" in place of "this planet."

If the correct answer is not generated at this stage, Watson has no hope of getting the question right. Therefore, this step errs on the side of generating many hypotheses and leaves it up to the later steps to find the correct answer. In the next video, we'll discuss how steps three and four score and rank the hypotheses.
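As a sketch of how hypotheses could be formed from candidates, the code below pairs each candidate with the question rewritten around it, and checks whether the correct answer is among them at all. The names `Hypothesis`, `build_hypotheses`, and `answer_is_reachable` are hypothetical, chosen for this illustration; Watson's internal representation of a hypothesis was richer than a single string.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    candidate: str  # the candidate answer
    statement: str  # the question with the LAT replaced by the candidate


def build_hypotheses(question: str, lat: str, candidates: list[str]) -> list[Hypothesis]:
    """Plug each candidate into the question in place of the LAT."""
    pattern = re.compile(r"\b" + re.escape(lat) + r"\b", re.IGNORECASE)
    hypotheses = []
    for candidate in candidates:
        # A function replacement inserts the candidate text literally.
        statement, count = pattern.subn(lambda _match: candidate, question, count=1)
        if count == 0:
            raise ValueError(f"LAT {lat!r} not found in question")
        hypotheses.append(Hypothesis(candidate, statement))
    return hypotheses


def answer_is_reachable(hypotheses: list[Hypothesis], correct_answer: str) -> bool:
    """Later scoring can only pick the right answer if some hypothesis contains it."""
    return any(h.candidate == correct_answer for h in hypotheses)


question = "Mozart's last and perhaps most powerful symphony shares its name with this planet"
hypotheses = build_hypotheses(question, "this planet", ["Mercury", "Jupiter", "Earth"])
for i, h in enumerate(hypotheses, start=1):
    print(f"Hypothesis {i}: {h.statement}")
```

`answer_is_reachable` captures why this step generates so many hypotheses: if "Jupiter" were missing from the candidate list, no amount of later scoring could recover it.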