1
00:00:09,530 --> 00:00:13,420
After Watson has completed the
initial two steps of question

2
00:00:13,420 --> 00:00:16,350
analysis and
hypothesis generation,

3
00:00:16,350 --> 00:00:18,840
it's time to move
on to step three,

4
00:00:18,840 --> 00:00:22,040
where each of the
hypotheses is scored.

5
00:00:22,040 --> 00:00:24,920
In this step, Watson
computes confidence levels

6
00:00:24,920 --> 00:00:28,320
for each possible
answer or hypothesis.

7
00:00:28,320 --> 00:00:30,810
This is necessary to
accurately estimate

8
00:00:30,810 --> 00:00:34,530
the probability of a proposed
answer being correct.

9
00:00:34,530 --> 00:00:37,610
Watson will only buzz
in to answer a question

10
00:00:37,610 --> 00:00:40,240
if a confidence level
for one of the hypotheses

11
00:00:40,240 --> 00:00:42,250
is above a threshold.
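
The buzz decision is a comparison against a cutoff. Here is a minimal Python sketch that assumes the confidences are already available as a dictionary mapping each answer to a probability. The function name `should_buzz`, the 0.5 threshold, and the example numbers are hypothetical and are not Watson's actual values.

```python
from typing import Dict, Optional


def should_buzz(confidences: Dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Return the most confident answer if it clears the threshold, else None.

    Hypothetical helper: the 0.5 default is an illustrative assumption.
    """
    if not confidences:
        return None  # no hypotheses survived, so stay silent
    best = max(confidences, key=confidences.get)
    return best if confidences[best] > threshold else None


# Illustrative (made-up) confidence levels for the Mozart symphony question.
print(should_buzz({"Jupiter": 0.82, "Mercury": 0.11}))  # Jupiter
print(should_buzz({"Jupiter": 0.30, "Mercury": 0.25}))  # None
```

Raising the threshold means fewer wrong answers, but it also leaves more questions unanswered.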

12
00:00:42,250 --> 00:00:44,610
To compute these
confidence levels,

13
00:00:44,610 --> 00:00:48,980
Watson combines a large
number of different methods.

14
00:00:48,980 --> 00:00:52,510
First, Watson starts with
lightweight scoring algorithms

15
00:00:52,510 --> 00:00:55,660
to prune down the large
set of hypotheses.

16
00:00:55,660 --> 00:00:59,630
Recall that in step two,
about 200 different hypotheses

17
00:00:59,630 --> 00:01:01,420
were generated.

18
00:01:01,420 --> 00:01:04,200
An example of a lightweight
scoring algorithm

19
00:01:04,200 --> 00:01:07,640
is computing the likelihood that
a candidate answer is actually

20
00:01:07,640 --> 00:01:10,030
an instance of the LAT.

21
00:01:10,030 --> 00:01:14,570
For the Mozart symphony question
where the LAT is "this planet,"

22
00:01:14,570 --> 00:01:18,140
a candidate answer like "Earth"
would have a very high score,

23
00:01:18,140 --> 00:01:20,600
but a candidate answer
like "the moon"

24
00:01:20,600 --> 00:01:22,960
would have a lower score.

25
00:01:22,960 --> 00:01:25,460
If the likelihood
is not very high,

26
00:01:25,460 --> 00:01:28,470
Watson throws away
the hypothesis.

27
00:01:28,470 --> 00:01:31,490
The candidate answers
that pass this step proceed

28
00:01:31,490 --> 00:01:34,550
to the next stage of
the scoring algorithms.

29
00:01:34,550 --> 00:01:37,750
Watson lets about
100 candidate answers

30
00:01:37,750 --> 00:01:39,259
pass on to the next stage.
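
The pruning step can be sketched as "score every candidate against the LAT, drop the unlikely ones, and keep the best survivors." The Python below is a minimal illustration under stated assumptions. The function `prune_candidates`, the `min_score` cutoff of 0.2, and the lookup table `PLANET_LIKELIHOOD` are all hypothetical. Only the idea of cutting roughly 200 hypotheses down to roughly 100 comes from the lecture.

```python
from typing import Callable, List, Tuple


def prune_candidates(
    candidates: List[str],
    lat_score: Callable[[str], float],
    min_score: float = 0.2,   # hypothetical cutoff for "likelihood not very high"
    keep: int = 100,          # the lecture says about 100 candidates move on
) -> List[Tuple[str, float]]:
    """Keep at most `keep` candidates whose LAT-match score is at least `min_score`."""
    scored = [(c, lat_score(c)) for c in candidates]
    survivors = [(c, s) for c, s in scored if s >= min_score]
    survivors.sort(key=lambda pair: pair[1], reverse=True)  # best first
    return survivors[:keep]


# Toy scorer for the LAT "this planet" (made-up likelihoods, not real Watson output).
PLANET_LIKELIHOOD = {"Earth": 0.95, "Jupiter": 0.97, "Mercury": 0.96, "the moon": 0.10}


def planet_lat_score(candidate: str) -> float:
    return PLANET_LIKELIHOOD.get(candidate, 0.0)


print(prune_candidates(["Earth", "the moon", "Jupiter", "Mercury"], planet_lat_score))
# [('Jupiter', 0.97), ('Mercury', 0.96), ('Earth', 0.95)]
```

Because the lightweight scorer only needs a cheap lookup per candidate, it can discard obviously wrong types such as "the moon" before the expensive evidence search runs.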

31
00:01:41,970 --> 00:01:46,150
Then Watson goes into more
advanced scoring analytics.

32
00:01:46,150 --> 00:01:48,810
Watson needs to gather
supporting evidence

33
00:01:48,810 --> 00:01:51,090
for each candidate answer.

34
00:01:51,090 --> 00:01:53,320
One way of doing this
is through a method

35
00:01:53,320 --> 00:01:55,770
called passage
search, which retrieves passages

36
00:01:55,770 --> 00:01:59,130
that contain
the hypothesis text.

37
00:01:59,130 --> 00:02:01,610
To simulate this,
let's see what happens

38
00:02:01,610 --> 00:02:05,160
when we search for two of
our hypotheses on Google.

39
00:02:05,160 --> 00:02:07,850
Our first hypothesis
is "Mozart's last

40
00:02:07,850 --> 00:02:10,199
and perhaps most
powerful symphony shares

41
00:02:10,199 --> 00:02:12,010
its name with Mercury."

42
00:02:12,010 --> 00:02:14,860
And our second hypothesis
is "Mozart's last

43
00:02:14,860 --> 00:02:17,310
and perhaps most
powerful symphony shares

44
00:02:17,310 --> 00:02:20,550
its name with Jupiter."

45
00:02:20,550 --> 00:02:24,870
On Google, if we search for
Mozart, symphony, and Mercury,

46
00:02:24,870 --> 00:02:27,270
we get about 900,000 results.

47
00:02:27,270 --> 00:02:29,350
And we get some good results.

48
00:02:29,350 --> 00:02:32,329
They definitely mention the
three words we searched for,

49
00:02:32,329 --> 00:02:35,440
but Mercury is only
next to symphony once.

50
00:02:35,440 --> 00:02:40,700
And there's no mention of
this being his last symphony.

51
00:02:40,700 --> 00:02:44,570
Now, if we search for Mozart,
symphony, and Jupiter,

52
00:02:44,570 --> 00:02:47,290
we get about 1.5
million results.

53
00:02:47,290 --> 00:02:49,630
And they look much
more promising.

54
00:02:49,630 --> 00:02:53,390
We see the phrase "last
symphony" a couple times

55
00:02:53,390 --> 00:02:56,510
and "Jupiter symphony"
more than once.

56
00:02:56,510 --> 00:02:59,250
Therefore, the
hypothesis with Jupiter

57
00:02:59,250 --> 00:03:02,120
seems to be better supported than
the hypothesis with Mercury.
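
The Google experiment can be imitated with a tiny passage scorer. The sketch first retrieves passages that mention the candidate, then measures how many of the question's key terms those passages also contain. It is a simplified stand-in and not Watson's actual passage scorers. The function `passage_support`, the term list, and the four-sentence corpus are assumptions made for illustration.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lower-case word tokens; "Mozart's" yields "mozart" and "s"."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def passage_support(question_terms: List[str], candidate: str, corpus: List[str]) -> float:
    """Retrieve passages that mention the candidate, then return the average
    fraction of question terms those passages also contain (0.0 if none)."""
    wanted = tokens(candidate)
    terms = [t.lower() for t in question_terms]
    hits = [p for p in corpus if wanted <= tokens(p)]  # passages naming the candidate
    if not hits:
        return 0.0
    coverage = [sum(t in tokens(p) for t in terms) / len(terms) for p in hits]
    return sum(coverage) / len(coverage)


# Tiny stand-in corpus (hypothetical; a real system searches a large document store).
CORPUS = [
    "Mozart's Symphony No. 41 is known as the Jupiter Symphony.",
    "The Jupiter Symphony was Mozart's last symphony.",
    "Mercury is the planet closest to the Sun.",
    "Haydn's Symphony No. 43 is nicknamed Mercury.",
]
TERMS = ["Mozart", "last", "symphony"]

for planet in ("Jupiter", "Mercury"):
    print(planet, round(passage_support(TERMS, planet, CORPUS), 3))
# Jupiter 0.833
# Mercury 0.167
```

The Jupiter passages mention both "Mozart" and "last symphony," so that hypothesis scores higher. This mirrors the conclusion drawn from the search results.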

58
00:03:04,790 --> 00:03:07,690
Now, the scoring analytics
determine the degree

59
00:03:07,690 --> 00:03:11,740
of certainty that the evidence
supports the candidate answers.

60
00:03:11,740 --> 00:03:15,170
More than 50 different
scoring components are used.

61
00:03:15,170 --> 00:03:19,410
One example is analyzing
temporal relationships.

62
00:03:19,410 --> 00:03:23,340
Consider the Jeopardy
question: "In 1594, he

63
00:03:23,340 --> 00:03:27,100
took a job as a tax
collector in Andalusia."

64
00:03:27,100 --> 00:03:31,380
Two candidate answers are
Thoreau and Cervantes.

65
00:03:31,380 --> 00:03:33,870
However, this algorithm
would determine

66
00:03:33,870 --> 00:03:37,360
that Thoreau was
not born until 1817.

67
00:03:37,360 --> 00:03:41,790
So it would give a higher
score to Cervantes.
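
A temporal scorer checks whether the date in the question fits the candidate's lifetime. The sketch below is a minimal illustration that assumes a small lookup table of birth and death years. The table `LIFESPANS` and the function `temporal_score` are hypothetical, and a real system would extract these dates from its knowledge sources. The years themselves are the historical ones: Cervantes lived 1547 to 1616 and Thoreau lived 1817 to 1862.

```python
from typing import Dict, Optional, Tuple

# Hypothetical knowledge table of (birth year, death year).
LIFESPANS: Dict[str, Tuple[int, int]] = {
    "Cervantes": (1547, 1616),
    "Thoreau": (1817, 1862),
}


def temporal_score(candidate: str, event_year: int) -> Optional[float]:
    """1.0 if the candidate was alive in event_year, 0.0 if not,
    None when there is no date information to judge by."""
    span = LIFESPANS.get(candidate)
    if span is None:
        return None
    born, died = span
    return 1.0 if born <= event_year <= died else 0.0


for name in ("Thoreau", "Cervantes"):
    print(name, temporal_score(name, 1594))
# Thoreau 0.0
# Cervantes 1.0
```

Returning `None` for unknown people keeps "no evidence" separate from "evidence against." How such gaps are filled before the final model is a design choice that this sketch leaves open.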

68
00:03:41,790 --> 00:03:44,800
Once all of the scoring
algorithms are run,

69
00:03:44,800 --> 00:03:47,710
Watson needs to select
the single best supported

70
00:03:47,710 --> 00:03:49,360
hypothesis.

71
00:03:49,360 --> 00:03:51,990
Before this can be
done, similar answers

72
00:03:51,990 --> 00:03:55,410
need to be merged, since
multiple candidate answers may

73
00:03:55,410 --> 00:03:57,170
be equivalent.

74
00:03:57,170 --> 00:03:59,690
As an example, the
candidate answers

75
00:03:59,690 --> 00:04:04,800
"Abraham Lincoln" and "Honest
Abe" refer to the same person.

76
00:04:04,800 --> 00:04:07,240
So the scores for these
two candidate answers

77
00:04:07,240 --> 00:04:08,980
need to be combined.

78
00:04:08,980 --> 00:04:11,410
Watson should not be
viewing similar answers

79
00:04:11,410 --> 00:04:13,820
as competing choices.
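
Answer merging can be sketched as mapping each variant to a canonical name and then combining the scores that collide. This Python sketch assumes a hand-written alias table and combines scores by keeping the maximum. Both choices are assumptions for illustration, because the lecture only says the scores "need to be combined." The names `ALIASES`, `canonical`, and `merge_equivalent` are hypothetical.

```python
from typing import Dict

# Hypothetical alias table; a real system would draw on much richer resources.
ALIASES: Dict[str, str] = {
    "abraham lincoln": "Abraham Lincoln",
    "abe lincoln": "Abraham Lincoln",
    "honest abe": "Abraham Lincoln",
}


def canonical(answer: str) -> str:
    """Map a surface form to its canonical name (unchanged if unknown)."""
    return ALIASES.get(answer.strip().lower(), answer.strip())


def merge_equivalent(scores: Dict[str, float]) -> Dict[str, float]:
    """Merge equivalent answers, keeping the highest score for each (assumed rule)."""
    merged: Dict[str, float] = {}
    for answer, score in scores.items():
        key = canonical(answer)
        merged[key] = max(merged[key], score) if key in merged else score
    return merged


print(merge_equivalent({"Abraham Lincoln": 0.6, "Honest Abe": 0.7, "Ulysses S. Grant": 0.2}))
# {'Abraham Lincoln': 0.7, 'Ulysses S. Grant': 0.2}
```

After merging, "Honest Abe" no longer splits support away from "Abraham Lincoln," so the two stop competing with each other.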

80
00:04:13,820 --> 00:04:16,829
Now, Watson is ready
to rank the hypotheses

81
00:04:16,829 --> 00:04:20,130
and estimate an overall
confidence for each.

82
00:04:20,130 --> 00:04:24,740
To do this, predictive
analytics are used.

83
00:04:24,740 --> 00:04:27,770
To compute an overall confidence
level for each candidate

84
00:04:27,770 --> 00:04:31,470
answer, Watson uses
logistic regression.

85
00:04:31,470 --> 00:04:35,409
The training data is a set of
historical Jeopardy questions

86
00:04:35,409 --> 00:04:37,960
and all of the
candidate answers.

87
00:04:37,960 --> 00:04:39,700
The score from each
scoring algorithm is

88
00:04:39,700 --> 00:04:42,770
used as an independent variable.

89
00:04:42,770 --> 00:04:46,110
Then, logistic regression is
used to predict whether or not

90
00:04:46,110 --> 00:04:50,390
a candidate answer is
correct using the scores.

91
00:04:50,390 --> 00:04:52,800
This gives each score
a weight and computes

92
00:04:52,800 --> 00:04:55,620
an overall probability
or confidence

93
00:04:55,620 --> 00:04:58,380
that a candidate
answer is correct.
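
Logistic regression learns one weight per scorer from past candidates labeled correct or incorrect. It then squashes the weighted sum through the logistic function to produce a probability. The sketch below is a minimal pure-Python version trained by batch gradient descent. The three features, the seven training rows, and the learning settings are made up for illustration and are not Watson's data or training procedure.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(X: List[Sequence[float]], y: List[int],
          lr: float = 1.0, epochs: int = 5000) -> Tuple[List[float], float]:
    """Fit one weight per scorer plus a bias by gradient descent on the log loss."""
    n, m = len(X[0]), len(X)
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def confidence(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability that a candidate with score vector x is correct."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical training rows: [LAT score, passage support, temporal score]
# for past candidate answers, labelled 1 if the answer was correct.
X = [[0.9, 0.8, 1.0], [0.95, 0.7, 1.0], [0.8, 0.9, 1.0],
     [0.2, 0.1, 0.0], [0.9, 0.2, 0.0], [0.3, 0.6, 1.0], [0.1, 0.3, 1.0]]
y = [1, 1, 1, 0, 0, 0, 0]

w, b = train(X, y)
print(confidence(w, b, [0.9, 0.85, 1.0]))  # high: resembles past correct answers
print(confidence(w, b, [0.2, 0.1, 0.0]))   # low: resembles past wrong answers
```

Each learned weight says how much its scorer matters. Because the output is a probability, it can be compared directly against the buzz-in threshold.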

94
00:04:58,380 --> 00:05:01,330
If the highest confidence
level for one of the candidate

95
00:05:01,330 --> 00:05:04,140
answers for a question
is high enough,

96
00:05:04,140 --> 00:05:06,240
Watson buzzes in to
answer the question.

97
00:05:08,780 --> 00:05:11,520
In total, the Watson
system is composed

98
00:05:11,520 --> 00:05:14,090
of eight
refrigerator-sized cabinets

99
00:05:14,090 --> 00:05:18,020
and has high-speed local
storage for all information.

100
00:05:18,020 --> 00:05:22,280
It originally took over two
hours to answer one question.

101
00:05:22,280 --> 00:05:26,280
And the team had to reduce
this to two to six seconds.

102
00:05:26,280 --> 00:05:29,890
In the next video, we'll see
how Watson progressed in the six

103
00:05:29,890 --> 00:05:32,890
years between starting
and playing on Jeopardy,

104
00:05:32,890 --> 00:05:35,240
what happened during
the game, and what

105
00:05:35,240 --> 00:05:38,120
the Watson team
is working on now.