We will now discuss the results of the classification tree model.

First, we observe that the overall accuracy of the method, meaning the percentage of observations it predicts correctly, is 80%, compared to 75% for the baseline. But notice that this is achieved in an interesting way. For bucket one patients, the two models are equivalent. This makes sense, because the baseline model is built on the idea that healthy people stay healthy, and that idea is indeed largely valid in the data.

For buckets two through five, however, the accuracy increases substantially: from 31% to 60% (it doubles), from 21% to 53% (more than doubles), and from 19% to 39% (doubles again). For bucket five there is an improvement from 23% to 30%, not as big as before, but an improvement nonetheless.

Notice also the improvement in the penalty error, from 0.56 to 0.52 overall: a small improvement in bucket one, but a significant one as we go up the buckets. For bucket five, for example, the penalty error decreases from 1.88 to 1.01, a substantial improvement.

So we observe that there is a substantial improvement over the baseline, especially as we go down the buckets.
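The penalty error discussed above can be sketched in a few lines. This is a minimal illustration, not the model from the case: the helper functions, the example buckets, and the exact penalty weights are all hypothetical, assuming only that, as in the lecture, underestimating a patient's cost bucket is penalized more heavily than overestimating it.

```python
# Illustrative sketch of the average penalty error used to compare models.
# Buckets run from 1 (lowest-cost patients) to 5 (highest-cost patients).
# The weights are hypothetical: underprediction counts double, since
# failing to flag a high-cost patient is the more serious mistake.

def penalty(actual, predicted):
    """Asymmetric penalty for predicting `predicted` when the truth is `actual`."""
    if predicted < actual:              # underprediction: heavier penalty
        return 2 * (actual - predicted)
    return predicted - actual           # overprediction (0 if exactly right)

def average_penalty(actuals, predictions):
    """Average penalty over all patients, the 'penalty error' of a model."""
    total = sum(penalty(a, p) for a, p in zip(actuals, predictions))
    return total / len(actuals)

# Hypothetical example: five patients and one model's bucket predictions.
actuals = [1, 2, 3, 5, 4]
predictions = [1, 2, 2, 3, 5]
print(average_penalty(actuals, predictions))  # (0 + 0 + 2 + 4 + 1) / 5 = 1.4
```

Comparing this average penalty between the baseline and the classification tree is what the 0.56 versus 0.52 figures above summarize.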
In some cases it doubles the accuracy over the baseline. And while we have seen a smaller accuracy improvement in bucket five, there is a much lower penalty in the predictions for bucket five.

So what is the edge that analytics provided to D2Hawkeye? First and foremost, there was a substantial improvement in the company's ability to identify patients who need more attention. Another advantage was that the model was interpretable by physicians, so the physicians were able to improve the model by identifying new variables and refining existing ones, which led to further improvements. Finally, and quite importantly, the analytics gave the company an edge over the competition, which was using last-century methods. The use of machine learning methods, in this case classification trees, provided an edge that also helped D2Hawkeye when it was sold to Verisk Analytics in 2009.