1
00:00:04,500 --> 00:00:08,600
In this video, we'll discuss how we can create visualizations

2
00:00:08,600 --> 00:00:12,490
that are used in predictive policing models.

3
00:00:12,490 --> 00:00:15,430
In almost every application, before we even

4
00:00:15,430 --> 00:00:17,980
consider a predictive model, we should

5
00:00:17,980 --> 00:00:21,560
try to understand the historical data.

6
00:00:21,560 --> 00:00:25,340
Many cities in the United States and around the world

7
00:00:25,340 --> 00:00:30,260
provide logs of reported crimes, usually the time, location,

8
00:00:30,260 --> 00:00:32,640
and nature of the event.

9
00:00:32,640 --> 00:00:36,350
In this lecture, we'll use data from the city of Chicago,

10
00:00:36,350 --> 00:00:38,810
in the United States, about motor vehicle thefts.

11
00:00:41,400 --> 00:00:43,820
Given this data on crimes, suppose

12
00:00:43,820 --> 00:00:45,950
we wanted to communicate crime patterns

13
00:00:45,950 --> 00:00:48,560
over the course of an average week.

14
00:00:48,560 --> 00:00:52,690
We could display daily crime averages using a line graph,

15
00:00:52,690 --> 00:00:57,030
like the one shown here, but this doesn't seem too useful.

16
00:00:57,030 --> 00:01:00,960
We can see that crime tends to be higher on Saturday, but when

17
00:01:00,960 --> 00:01:04,400
on Saturday, and where?

18
00:01:04,400 --> 00:01:08,830
We could replace our x-axis with the hour of the day

19
00:01:08,830 --> 00:01:11,780
and have a different line for every day of the week

20
00:01:11,780 --> 00:01:15,760
to understand when crime occurs in more detail.

21
00:01:15,760 --> 00:01:19,150
But this would be a jumbled mess with seven lines, and probably

22
00:01:19,150 --> 00:01:21,150
very hard to read.

23
00:01:21,150 --> 00:01:24,660
We could instead use no visualization at all,

24
00:01:24,660 --> 00:01:27,810
and instead present information in a table,

25
00:01:27,810 --> 00:01:30,039
like the one shown here.

26
00:01:30,039 --> 00:01:33,650
For each hour and day, we have the total number

27
00:01:33,650 --> 00:01:36,240
of crimes that occurred.

28
00:01:36,240 --> 00:01:39,030
This is a valid representation of the data,

29
00:01:39,030 --> 00:01:43,460
but large tables of numbers can be hard to read and understand.

30
00:01:43,460 --> 00:01:46,590
So how can we make the table more interesting and usable?

31
00:01:49,210 --> 00:01:51,700
A great way to visualize information

32
00:01:51,700 --> 00:01:55,550
in a two-dimensional table is with a heat map.

33
00:01:55,550 --> 00:02:00,050
Heat maps visualize data using three attributes.

34
00:02:00,050 --> 00:02:04,030
Two of the attributes are on the x and y-axes, typically

35
00:02:04,030 --> 00:02:07,990
displayed horizontally and vertically.

36
00:02:07,990 --> 00:02:12,400
The third attribute is represented by shades of color.

37
00:02:12,400 --> 00:02:16,930
In this example, lower values in the third attribute

38
00:02:16,930 --> 00:02:19,780
correspond to colors closer to blue,

39
00:02:19,780 --> 00:02:22,280
and higher values in the third attribute

40
00:02:22,280 --> 00:02:25,780
correspond to colors closer to red.

41
00:02:25,780 --> 00:02:29,920
For example, the x-axis could be hours of the day,

42
00:02:29,920 --> 00:02:32,910
the y-axis could be days of the week,

43
00:02:32,910 --> 00:02:36,020
and the colors could correspond to the amount of crime.

44
00:02:38,630 --> 00:02:42,000
In a heat map, we can pick different color schemes

45
00:02:42,000 --> 00:02:46,150
based on the type of data to convey different messages.

46
00:02:46,150 --> 00:02:49,560
In crime, a yellow to red color scheme

47
00:02:49,560 --> 00:02:52,410
might be appropriate because it can highlight

48
00:02:52,410 --> 00:02:55,230
some of the more dangerous areas in red.

49
00:02:55,230 --> 00:02:59,760
Your eye is naturally drawn to the red areas of the plot.

50
00:02:59,760 --> 00:03:03,380
In other applications, both high and low values

51
00:03:03,380 --> 00:03:07,050
are meaningful, so having a more varied color scheme

52
00:03:07,050 --> 00:03:09,180
might be useful.

53
00:03:09,180 --> 00:03:11,340
And in other applications, you might only

54
00:03:11,340 --> 00:03:13,970
want to see cells with high values,

55
00:03:13,970 --> 00:03:17,550
so you could use a gray scale to make the cells with low values

56
00:03:17,550 --> 00:03:19,380
white.

57
00:03:19,380 --> 00:03:24,360
The x and y-axes in a heat map don't need to be continuous.

58
00:03:24,360 --> 00:03:27,680
In our example, we have a categorical or factor variable

59
00:03:27,680 --> 00:03:29,960
-- the day of the week.

60
00:03:29,960 --> 00:03:32,840
And we can even combine a heat map

61
00:03:32,840 --> 00:03:35,079
with a geographical map, which we'll

62
00:03:35,079 --> 00:03:37,660
discuss later in this lecture.

63
00:03:37,660 --> 00:03:40,030
This type of heat map is frequently used

64
00:03:40,030 --> 00:03:46,100
in predictive policing to show crime hot spots in a city.

65
00:03:46,100 --> 00:03:49,400
In this lecture, we'll use Chicago motor vehicle theft

66
00:03:49,400 --> 00:03:53,120
data to explore patterns of crime, both over days

67
00:03:53,120 --> 00:03:57,130
of the week, and over hours of the day.

68
00:03:57,130 --> 00:04:00,880
We're interested in analyzing the total number of car thefts

69
00:04:00,880 --> 00:04:03,470
that occur in any particular hour

70
00:04:03,470 --> 00:04:07,420
of a day of the week over our whole data set.

71
00:04:07,420 --> 00:04:10,280
In the next video, we'll switch to R,

72
00:04:10,280 --> 00:04:12,390
and start by creating a basic line

73
00:04:12,390 --> 00:04:16,890
plot counting the total crime by the day of the week.
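
The day-by-hour table and the yellow to red coloring described in this video can be sketched in a few lines. This is only a rough illustration, written in Python rather than the R that the next video uses. The timestamps are made up for the example and are not the Chicago data. The names `SAMPLE_THEFTS`, `count_by_day_and_hour`, and `yellow_to_red` are hypothetical, and the linear color ramp is just one simple way to turn a count into a shade.

```python
from collections import Counter
from datetime import datetime

# Row labels for the table, in the order used by datetime.weekday() (Monday = 0).
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Hypothetical timestamps, invented for illustration (not the Chicago data).
SAMPLE_THEFTS = [
    "2012-12-01 23:15",  # a Saturday
    "2012-12-01 23:40",  # a Saturday
    "2012-12-01 02:05",  # a Saturday
    "2012-12-03 08:30",  # a Monday
    "2012-12-05 23:55",  # a Wednesday
]


def count_by_day_and_hour(timestamps):
    """Return a 7 x 24 table where table[d][h] is the number of events
    on weekday d (Monday = 0) during hour h (0 to 23)."""
    counts = Counter()
    for ts in timestamps:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        counts[(t.weekday(), t.hour)] += 1
    return [[counts[(d, h)] for h in range(24)] for d in range(7)]


def yellow_to_red(count, max_count):
    """Map a count to an (R, G, B) triple on a linear yellow to red ramp:
    a count of 0 gives yellow (255, 255, 0), max_count gives red (255, 0, 0)."""
    frac = count / max_count if max_count else 0.0
    return (255, round(255 * (1 - frac)), 0)


table = count_by_day_and_hour(SAMPLE_THEFTS)
max_count = max(max(row) for row in table)

# Print every non-empty cell with the color a heat map would give it.
for d, row in enumerate(table):
    for h, n in enumerate(row):
        if n:
            print(DAYS[d], f"{h:02d}:00", n, yellow_to_red(n, max_count))
```

Each row of `table` is a day of the week and each column is an hour of the day, so the table already has the shape of the heat map: two attributes on the axes, and the count as the third attribute that drives the color.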