1
00:00:04,580 --> 00:00:07,280
In this video, we'll add the hour of the day

2
00:00:07,280 --> 00:00:09,640
to our line plot, and then create

3
00:00:09,640 --> 00:00:13,310
an alternative visualization using a heat map.

4
00:00:13,310 --> 00:00:17,870
We can do this by creating a line for each day of the week

5
00:00:17,870 --> 00:00:21,570
and making the x-axis the hour of the day.

6
00:00:21,570 --> 00:00:23,860
We first need to create a counts table

7
00:00:23,860 --> 00:00:26,070
for the weekday and hour.

8
00:00:26,070 --> 00:00:30,500
So we'll use the table function and give as the first variable,

9
00:00:30,500 --> 00:00:33,170
the Weekday variable in our data frame,

10
00:00:33,170 --> 00:00:37,650
and as the second variable, the Hour variable.

11
00:00:37,650 --> 00:00:41,640
This table gives, for each day of the week and each hour,

12
00:00:41,640 --> 00:00:45,080
the total number of motor vehicle thefts that occurred.

13
00:00:45,080 --> 00:00:48,650
For example, on Friday at 4 AM, there

14
00:00:48,650 --> 00:00:53,300
were 473 motor vehicle thefts, whereas on Saturday

15
00:00:53,300 --> 00:00:58,860
at midnight, there were 2,050 motor vehicle thefts.

16
00:00:58,860 --> 00:01:01,260
Let's save this table to a data frame

17
00:01:01,260 --> 00:01:03,960
so that we can use it in our visualizations.

18
00:01:03,960 --> 00:01:11,490
We'll call it DayHourCounts and use the as.data.frame function,

19
00:01:11,490 --> 00:01:16,140
run on our table, where the first variable is the Weekday

20
00:01:16,140 --> 00:01:18,430
and the second variable is the Hour.

21
00:01:21,289 --> 00:01:23,300
Let's take a look at the structure of the data

22
00:01:23,300 --> 00:01:24,220
frame we just created.

23
00:01:28,220 --> 00:01:32,360
We can see that we have 168 observations-- one

24
00:01:32,360 --> 00:01:35,259
for each day of the week and hour pair,

25
00:01:35,259 --> 00:01:37,490
and three different variables.

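As a rough sketch of the steps just described, the counts table and its data-frame form might be built as follows. The data frame name `mvt` and its contents are assumptions made up for illustration (the transcript only says "our data frame" with Weekday and Hour variables), so the counts here will not match the numbers quoted in the video.

```r
# Hypothetical stand-in for the theft data: `mvt` is an assumed name, and the
# rows are synthetic. Every (weekday, hour) pair appears at least once.
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
pairs <- expand.grid(Weekday = days, Hour = 0:23, stringsAsFactors = FALSE)
reps  <- 1 + (seq_len(nrow(pairs)) %% 5)          # made-up repeat counts
mvt   <- pairs[rep(seq_len(nrow(pairs)), times = reps), ]

# Cross-tabulate: one row per weekday, one column per hour.
counts <- table(mvt$Weekday, mvt$Hour)

# Flatten the 7 x 24 table into one row per (weekday, hour) pair.
# The columns are named Var1, Var2 and Freq automatically.
DayHourCounts <- as.data.frame(counts)
str(DayHourCounts)
```

Because the table's two dimensions were given without names, `as.data.frame` falls back to the generic column names `Var1` and `Var2`, which is why those names appear in the structure output.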
26
00:01:37,490 --> 00:01:40,930
The first variable, Var1, gives the day of the week.

27
00:01:40,930 --> 00:01:45,229
The second variable, Var2, gives the hour of the day.

28
00:01:45,229 --> 00:01:48,620
And the third variable, Freq for frequency,

29
00:01:48,620 --> 00:01:51,100
gives the total crime count.

30
00:01:51,100 --> 00:01:54,440
Let's convert the second variable, Var2,

31
00:01:54,440 --> 00:01:57,009
to actual numbers and call it Hour,

32
00:01:57,009 --> 00:01:58,759
since this is the hour of the day,

33
00:01:58,759 --> 00:02:01,430
and it makes sense that it's numerical.

34
00:02:01,430 --> 00:02:07,520
So we'll add a new variable to our data frame called Hour =

35
00:02:07,520 --> 00:02:09,400
as.numeric(as.character(DayHourCounts$Var2)).

36
00:02:21,910 --> 00:02:27,920
This is how we convert a factor variable to a numeric variable.

37
00:02:27,920 --> 00:02:30,410
Now we're ready to create our plot.

38
00:02:30,410 --> 00:02:33,840
We just need to change the group to Var1,

39
00:02:33,840 --> 00:02:35,620
which is the day of the week.

40
00:02:35,620 --> 00:02:38,510
So we'll use the ggplot function where

41
00:02:38,510 --> 00:02:43,860
our data frame is DayHourCounts, and then in our aesthetic,

42
00:02:43,860 --> 00:02:47,420
we want the x-axis to be Hour this time,

43
00:02:47,420 --> 00:02:53,620
the y-axis to be Freq, and then in the geom_line option,

44
00:02:53,620 --> 00:02:56,100
like we used in the previous video,

45
00:02:56,100 --> 00:03:01,740
we want the aesthetic to have the group equal to Var1,

46
00:03:01,740 --> 00:03:03,790
which is the day of the week.

47
00:03:03,790 --> 00:03:04,970
Go ahead and hit Enter.

48
00:03:04,970 --> 00:03:09,570
You should see a new plot show up in the graphics window.

49
00:03:09,570 --> 00:03:13,860
It has seven lines, one for each day of the week.

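The conversion and the grouped line plot described above could look something like this sketch. The counts are synthetic placeholders, not the real theft data, and it assumes the ggplot2 package is installed. It also shows why the `as.character` step matters: calling `as.numeric` directly on a factor returns the internal level codes rather than the hour values.

```r
library(ggplot2)

# Synthetic counts in the same shape as DayHourCounts (Var1, Var2, Freq).
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
DayHourCounts <- as.data.frame(table(rep(days, times = 24),
                                     rep(0:23, each = 7)))
DayHourCounts$Freq <- 100 + (seq_len(nrow(DayHourCounts)) * 37) %% 400  # made up

# Var2 is a factor. Converting it directly yields level codes 1..24 ...
codes <- as.numeric(DayHourCounts$Var2)

# ... while going through character first recovers the hours 0..23.
DayHourCounts$Hour <- as.numeric(as.character(DayHourCounts$Var2))

# One line per day of the week, with the hour on the x-axis.
p <- ggplot(DayHourCounts, aes(x = Hour, y = Freq)) +
  geom_line(aes(group = Var1))
print(p)
```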
50
00:03:13,860 --> 00:03:15,670
While this is interesting, we can't

51
00:03:15,670 --> 00:03:18,280
tell which line is which day, so let's change

52
00:03:18,280 --> 00:03:20,570
the colors of the lines to correspond

53
00:03:20,570 --> 00:03:22,410
to the days of the week.

54
00:03:22,410 --> 00:03:26,320
To do that, just scroll up in your R console,

55
00:03:26,320 --> 00:03:32,630
and after group = Var1, add color = Var1.

56
00:03:32,630 --> 00:03:34,370
This will make the colors of the lines

57
00:03:34,370 --> 00:03:37,050
correspond to the day of the week.

58
00:03:37,050 --> 00:03:40,040
After that parenthesis, go ahead and type comma,

59
00:03:40,040 --> 00:03:41,820
and then size = 2.

60
00:03:41,820 --> 00:03:43,410
We'll make our lines a little thicker.

61
00:03:46,000 --> 00:03:49,579
Now in our plot, each line is colored corresponding

62
00:03:49,579 --> 00:03:51,600
to the day of the week.

63
00:03:51,600 --> 00:03:54,290
This helps us see that on Saturday and Sunday,

64
00:03:54,290 --> 00:03:57,790
for example, the green and the teal lines,

65
00:03:57,790 --> 00:04:01,570
there are fewer motor vehicle thefts in the morning.

66
00:04:01,570 --> 00:04:04,630
While we can get some information from this plot,

67
00:04:04,630 --> 00:04:06,870
it's still quite hard to interpret.

68
00:04:06,870 --> 00:04:09,020
Seven lines is a lot.

69
00:04:09,020 --> 00:04:14,170
Let's instead visualize the same information with a heat map.

70
00:04:14,170 --> 00:04:16,519
To make a heat map, we'll use our data

71
00:04:16,519 --> 00:04:19,230
in our data frame DayHourCounts.

72
00:04:19,230 --> 00:04:23,050
First, though, we need to fix the order of the days

73
00:04:23,050 --> 00:04:26,140
so that they'll show up in chronological order

74
00:04:26,140 --> 00:04:28,240
instead of in alphabetical order.

75
00:04:28,240 --> 00:04:31,620
We'll do the same thing we did in the previous video.

76
00:04:31,620 --> 00:04:37,110
So for DayHourCounts$Var1, which is the day of the week,

77
00:04:37,110 --> 00:04:41,090
we're going to use the factor function where the first

78
00:04:41,090 --> 00:04:46,850
argument is our variable, DayHourCounts$Var1,

79
00:04:46,850 --> 00:04:51,210
the second argument is ordered = TRUE,

80
00:04:51,210 --> 00:04:54,000
and the third argument is the order we want the days

81
00:04:54,000 --> 00:04:55,760
of the week to show up in.

82
00:04:55,760 --> 00:05:00,660
So we'll set levels, equals, and then c,

83
00:05:00,660 --> 00:05:02,800
and then list your days of the week.

84
00:05:02,800 --> 00:05:06,440
Let's put the weekdays first and the weekends at the end.

85
00:05:06,440 --> 00:05:11,290
So we'll start with Monday, and then Tuesday, then

86
00:05:11,290 --> 00:05:22,600
Wednesday, then Thursday, Friday, Saturday and Sunday.

87
00:05:26,450 --> 00:05:28,490
Now let's make our heat map.

88
00:05:28,490 --> 00:05:32,280
We'll use the ggplot function like we always do,

89
00:05:32,280 --> 00:05:36,570
and give our data frame name, DayHourCounts.

90
00:05:36,570 --> 00:05:39,980
Then in our aesthetic, we want the x-axis

91
00:05:39,980 --> 00:05:43,409
to be the hour of the day, and the y-axis

92
00:05:43,409 --> 00:05:46,860
to be the day of the week, which is Var1.

93
00:05:46,860 --> 00:05:48,680
Then we're going to add geom_tile.

94
00:05:51,230 --> 00:05:54,210
This is the function we use to make a heat map.

95
00:05:54,210 --> 00:05:57,040
And then in the aesthetic for our tiles,

96
00:05:57,040 --> 00:06:00,930
we want the fill to be equal to Freq.

97
00:06:00,930 --> 00:06:05,000
This will define the colors of the rectangles in our heat map

98
00:06:05,000 --> 00:06:08,850
to correspond to the total crime.

99
00:06:08,850 --> 00:06:12,530
You should see a heat map pop up in your graphics window.

100
00:06:12,530 --> 00:06:14,500
So how do we read this?

101
00:06:14,500 --> 00:06:17,440
For each hour and each day of the week,

102
00:06:17,440 --> 00:06:20,150
we have a rectangle in our heat map.

103
00:06:20,150 --> 00:06:23,450
The color of that rectangle indicates the frequency,

104
00:06:23,450 --> 00:06:26,420
or the number of crimes that occur in that hour

105
00:06:26,420 --> 00:06:28,070
and on that day.

106
00:06:28,070 --> 00:06:31,120
Our legend tells us that lighter colors

107
00:06:31,120 --> 00:06:33,680
correspond to more crime.

108
00:06:33,680 --> 00:06:36,250
So we can see that a lot of crime

109
00:06:36,250 --> 00:06:41,720
happens around midnight, particularly on the weekends.

110
00:06:41,720 --> 00:06:45,090
We can change the label on the legend,

111
00:06:45,090 --> 00:06:49,510
and get rid of the y label to make our plot a little nicer.

112
00:06:49,510 --> 00:06:52,930
We can do this by just scrolling up to our previous command

113
00:06:52,930 --> 00:06:56,659
in our R console and then adding scale_fill_gradient.

114
00:07:02,180 --> 00:07:04,960
This defines properties of the legend,

115
00:07:04,960 --> 00:07:11,400
and we want name = "Total MV Thefts",

116
00:07:11,400 --> 00:07:12,930
for total motor vehicle thefts.

117
00:07:15,930 --> 00:07:17,820
Then let's add theme(axis.title.y =

118
00:07:17,820 --> 00:07:18,530
element_blank()).

119
00:07:28,200 --> 00:07:29,660
This is what you can do if you want

120
00:07:29,660 --> 00:07:33,100
to get rid of one of the axis labels.

121
00:07:33,100 --> 00:07:35,090
Go ahead and hit Enter.

122
00:07:35,090 --> 00:07:37,130
And now on our heat map, the legend

123
00:07:37,130 --> 00:07:43,010
is titled "Total MV Thefts" and the y-axis label is gone.

124
00:07:43,010 --> 00:07:45,880
We can also change the color scheme.

125
00:07:45,880 --> 00:07:49,360
We can do this by scrolling up in our R console,

126
00:07:49,360 --> 00:07:52,490
and going to that scale_fill_gradient function,

127
00:07:52,490 --> 00:07:55,610
the one that defines properties of our legend,

128
00:07:55,610 --> 00:08:00,890
and after name = "Total MV Thefts",

129
00:08:00,890 --> 00:08:08,230
adding low = "white", high = "red".

130
00:08:08,230 --> 00:08:10,510
We'll make lower values correspond

131
00:08:10,510 --> 00:08:13,020
to white colors and higher values

132
00:08:13,020 --> 00:08:15,060
correspond to red colors.

133
00:08:15,060 --> 00:08:18,350
If you hit enter, a new plot should show up

134
00:08:18,350 --> 00:08:20,610
with different colors.

135
00:08:20,610 --> 00:08:23,340
This is a common color scheme in policing.

136
00:08:23,340 --> 00:08:27,950
It shows the hot spots, or the places with more crime, in red.

137
00:08:27,950 --> 00:08:31,480
So now the most crime is shown by the red spots

138
00:08:31,480 --> 00:08:35,280
and the least crime is shown by the lighter areas.

139
00:08:35,280 --> 00:08:38,570
It looks like Friday night is a pretty common time

140
00:08:38,570 --> 00:08:40,220
for motor vehicle thefts.

141
00:08:40,220 --> 00:08:43,200
We see something that we didn't really see in the heat map

142
00:08:43,200 --> 00:08:44,750
before.

143
00:08:44,750 --> 00:08:48,010
It's often useful to change the color scheme depending

144
00:08:48,010 --> 00:08:50,610
on whether you want high values or low values

145
00:08:50,610 --> 00:08:55,340
to pop out, and the feeling you want the plot to portray.

146
00:08:55,340 --> 00:08:59,710
In this video, we've seen how to create some new types of plots.

147
00:08:59,710 --> 00:09:02,940
In the next video, we'll see how to add data

148
00:09:02,940 --> 00:09:05,470
to geographical maps.
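Putting the heat map steps from this video together, a minimal sketch might look like the following. The counts are synthetic placeholders rather than the real theft data, so the hot spots will not match the video, and it assumes the ggplot2 package is installed.

```r
library(ggplot2)

days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")

# Synthetic stand-in for DayHourCounts. factor() on a character vector
# sorts its levels alphabetically, as in the video.
DayHourCounts <- data.frame(
  Var1 = factor(rep(days, times = 24)),
  Hour = rep(0:23, each = 7)
)
DayHourCounts$Freq <- 100 + (seq_len(nrow(DayHourCounts)) * 37) %% 400  # made up

# Put the days in chronological order, weekdays first and weekend last.
DayHourCounts$Var1 <- factor(DayHourCounts$Var1, ordered = TRUE, levels = days)

# One tile per (hour, day) pair, filled according to the count.
heat <- ggplot(DayHourCounts, aes(x = Hour, y = Var1)) +
  geom_tile(aes(fill = Freq)) +
  scale_fill_gradient(name = "Total MV Thefts", low = "white", high = "red") +
  theme(axis.title.y = element_blank())
print(heat)
```

Swapping the `low` and `high` colors, or choosing other color names, is an easy way to experiment with which values stand out.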