We will now discuss the results of the classification tree model.

First, we observe that the overall accuracy of the method, meaning the percentage of observations it predicts correctly, is 80%, compared to 75% for the baseline. But notice that this is achieved in an interesting way. For bucket one patients, the two models are equivalent. This makes sense, because the baseline model is built on the idea that healthy people stay healthy, and that idea is indeed largely valid in the data.

For buckets two through five, however, the accuracy increases substantially: from 31% to 60% (it doubles), from 21% to 53% (more than doubles), and from 19% to 39% (doubles again). For bucket five there is an improvement from 23% to 30%, not as big as before, but an improvement nonetheless.

Notice also the improvement in the penalty error, from 0.56 to 0.52 overall: a small improvement in bucket one, but a significant one as we go up the buckets. For bucket five, for example, the penalty error decreases from 1.88 to 1.01, a substantial improvement.

So we observe that there is a substantial improvement over the baseline, especially as we go down the buckets.
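The penalty error discussed above can be sketched in a few lines. This is a minimal illustration, not the model from the case: the helper functions, the example buckets, and the exact penalty weights are all hypothetical, assuming only that, as in the lecture, underestimating a patient's cost bucket is penalized more heavily than overestimating it.

```python
# Illustrative sketch of the average penalty error used to compare models.
# Buckets run from 1 (lowest-cost patients) to 5 (highest-cost patients).
# The weights are hypothetical: underprediction counts double, since
# failing to flag a high-cost patient is the more serious mistake.

def penalty(actual, predicted):
    """Asymmetric penalty for predicting `predicted` when the truth is `actual`."""
    if predicted < actual:              # underprediction: heavier penalty
        return 2 * (actual - predicted)
    return predicted - actual           # overprediction (0 if exactly right)

def average_penalty(actuals, predictions):
    """Average penalty over all patients, the 'penalty error' of a model."""
    total = sum(penalty(a, p) for a, p in zip(actuals, predictions))
    return total / len(actuals)

# Hypothetical example: five patients and one model's bucket predictions.
actuals = [1, 2, 3, 5, 4]
predictions = [1, 2, 2, 3, 5]
print(average_penalty(actuals, predictions))  # (0 + 0 + 2 + 4 + 1) / 5 = 1.4
```

Comparing this average penalty between the baseline and the classification tree is what the 0.56 versus 0.52 figures above summarize.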
In some cases it doubles the accuracy over the baseline. And while we have seen a smaller accuracy improvement in bucket five, there is a much lower penalty in the predictions for bucket five.

So what is the edge that analytics provided to D2Hawkeye? First and foremost, there was a substantial improvement in the company's ability to identify patients who need more attention. Another advantage was that the model was interpretable by physicians, so the physicians were able to improve the model by identifying new variables and refining existing ones, which led to further improvements. Finally, and quite importantly, the analytics gave the company an edge over the competition, which was using last-century methods. The use of machine learning methods, in this case classification trees, provided an edge that also helped D2Hawkeye when it was sold to Verisk Analytics in 2009.