1 00:00:04,490 --> 00:00:09,030 In this video, we'll plot crime on a map of Chicago. 2 00:00:09,030 --> 00:00:13,790 First, we need to install and load two new packages, the maps 3 00:00:13,790 --> 00:00:16,700 package and the ggmap package. 4 00:00:16,700 --> 00:00:20,300 So start by installing the package maps. 5 00:00:20,300 --> 00:00:21,680 So type install.packages("maps"). 6 00:00:26,810 --> 00:00:29,150 When the CRAN mirror window pops up, 7 00:00:29,150 --> 00:00:31,160 go ahead and pick a location near you. 8 00:00:39,920 --> 00:00:42,260 When the package is done installing and you're back 9 00:00:42,260 --> 00:00:45,130 at the blinking cursor, also type install.packages("ggmap"). 10 00:00:53,530 --> 00:00:55,890 When that package is also done installing, 11 00:00:55,890 --> 00:00:59,150 load both packages using the library command. 12 00:00:59,150 --> 00:01:02,500 So type library(maps), and then library(ggmap). 13 00:01:05,680 --> 00:01:09,150 Now, let's load a map of Chicago into R. 14 00:01:09,150 --> 00:01:11,960 We can easily do this by using the get_map function. 15 00:01:11,960 --> 00:01:14,300 So we'll call it chicago = get_map(location = "chicago", 16 00:01:14,300 --> 00:01:14,800 zoom = 11). 17 00:01:30,230 --> 00:01:33,900 Let's take a look at the map by using the ggmap function. 18 00:01:37,570 --> 00:01:39,720 Now, in your R graphics window, you 19 00:01:39,720 --> 00:01:44,350 should see a geographical map of the city of Chicago. 20 00:01:44,350 --> 00:01:47,250 Now let's plot the first 100 motor vehicle 21 00:01:47,250 --> 00:01:50,130 thefts in our data set on this map. 22 00:01:50,130 --> 00:01:53,610 To do this, we start by typing ggmap(chicago). 23 00:01:56,570 --> 00:01:59,030 This is instead of using ggplot like we've 24 00:01:59,030 --> 00:02:01,650 been using in the previous videos. 25 00:02:01,650 --> 00:02:06,780 Then we want to add geom_point, and here, we'll 26 00:02:06,780 --> 00:02:10,880 define our data set to be equal to motor vehicle thefts, where 27 00:02:10,880 --> 00:02:17,130 we'll take the first through 100th observations, 28 00:02:17,130 --> 00:02:20,700 and in our aesthetic, we'll define our x-axis 29 00:02:20,700 --> 00:02:24,270 to be the longitude of the points and our y-axis 30 00:02:24,270 --> 00:02:25,700 to be the latitude of the points. 31 00:02:29,740 --> 00:02:31,560 Now, in your R graphics window, you 32 00:02:31,560 --> 00:02:35,530 should see the map of Chicago with black points marking where 33 00:02:35,530 --> 00:02:39,150 the first 100 motor vehicle thefts were. 34 00:02:39,150 --> 00:02:43,140 If we plotted all 190,000 motor vehicle thefts, 35 00:02:43,140 --> 00:02:45,130 we would just see a big black box, 36 00:02:45,130 --> 00:02:47,070 which wouldn't be helpful at all. 37 00:02:47,070 --> 00:02:49,170 We're more interested in whether or not 38 00:02:49,170 --> 00:02:51,860 an area has a high amount of crime, 39 00:02:51,860 --> 00:02:54,210 so let's round our latitude and longitude 40 00:02:54,210 --> 00:02:58,579 to two digits of accuracy and create a crime counts data 41 00:02:58,579 --> 00:03:00,460 frame for each area. 42 00:03:00,460 --> 00:03:09,510 We'll call it LatLonCounts, and use the as.data.frame function 43 00:03:09,510 --> 00:03:14,010 run on the table that compares the latitude and longitude 44 00:03:14,010 --> 00:03:16,850 rounded to two digits of accuracy. 45 00:03:16,850 --> 00:03:19,270 So our first argument to table is round(mvt$Longitude, 2). 46 00:03:27,880 --> 00:03:30,110 And our second argument is round(mvt$Latitude, 2). 47 00:03:41,470 --> 00:03:45,970 This gives us the total crimes at every point on a grid. 48 00:03:45,970 --> 00:03:49,010 Let's take a look at our data frame using the str function. 49 00:03:53,500 --> 00:03:58,829 We have 1,638 observations and three variables. 50 00:03:58,829 --> 00:04:02,280 The first two variables, Var1 and Var2, 51 00:04:02,280 --> 00:04:04,910 are the latitude and longitude coordinates, 52 00:04:04,910 --> 00:04:08,150 and the third variable is the number of motor vehicle thefts 53 00:04:08,150 --> 00:04:11,010 that occur in that area. 54 00:04:11,010 --> 00:04:15,300 Let's convert our longitude and latitude variables to numbers 55 00:04:15,300 --> 00:04:17,550 and call them Lat and Long. 56 00:04:17,550 --> 00:04:22,930 So first, we'll define the variable in our LatLonCounts 57 00:04:22,930 --> 00:04:30,090 data frame, called Long, and set that equal to as.numeric, run 58 00:04:30,090 --> 00:04:33,050 on as.character. 59 00:04:33,050 --> 00:04:36,040 Remember, this is how we convert a factor variable 60 00:04:36,040 --> 00:04:38,250 to a numerical variable. 61 00:04:38,250 --> 00:04:40,490 And we'll give the variable, LatLonCounts$Var1. 62 00:04:45,390 --> 00:04:47,800 Now let's just repeat this for latitude. 63 00:04:47,800 --> 00:04:50,590 So LatLonCounts$Lat = as.numeric(as.character(LatLonCounts$Var2)). 64 00:05:09,310 --> 00:05:11,860 Now, let's plot these points on our map, 65 00:05:11,860 --> 00:05:14,690 making the size and color of the points 66 00:05:14,690 --> 00:05:17,980 depend on the total number of motor vehicle thefts. 67 00:05:17,980 --> 00:05:26,530 So first, again we type ggmap(chicago) + 68 00:05:26,530 --> 00:05:44,510 geom_point(LatLonCounts, aes(x = Long, y = Lat, color = Freq, 69 00:05:44,510 --> 00:05:46,040 size = Freq)). 70 00:05:49,360 --> 00:05:52,340 Now, in our R graphics window, our plot 71 00:05:52,340 --> 00:05:56,810 should have a point for every area defined by our latitude 72 00:05:56,810 --> 00:05:59,770 and longitude areas, and the points 73 00:05:59,770 --> 00:06:02,400 have a size and color corresponding 74 00:06:02,400 --> 00:06:05,060 to the number of crimes in that area. 75 00:06:05,060 --> 00:06:08,620 So we can see that the lighter and larger points correspond 76 00:06:08,620 --> 00:06:10,920 to more motor vehicle thefts. 77 00:06:10,920 --> 00:06:15,590 This helps us see where in Chicago more crimes occur. 78 00:06:15,590 --> 00:06:17,780 If we want to change the color scheme, 79 00:06:17,780 --> 00:06:21,400 we can do that too by just hitting the up arrow in our R 80 00:06:21,400 --> 00:06:23,990 console and then adding scale_color_gradient(low="yellow", 81 00:06:23,990 --> 00:06:24,490 high="red"). 82 00:06:39,040 --> 00:06:42,560 If you hit Enter, you should see the same plot as before, 83 00:06:42,560 --> 00:06:47,600 but this time, the areas with more crime are closer to red 84 00:06:47,600 --> 00:06:52,230 and the areas with less crime are closer to yellow. 85 00:06:52,230 --> 00:06:56,280 We can also use geom_tile to make something that looks more 86 00:06:56,280 --> 00:06:58,580 like a traditional heat map. 87 00:06:58,580 --> 00:07:03,400 To do this, we type ggmap(chicago), just 88 00:07:03,400 --> 00:07:09,100 like before, but now we're going to use geom_tile, where 89 00:07:09,100 --> 00:07:11,120 our data frame again is LatLonCounts. 90 00:07:13,790 --> 00:07:18,040 And in our aesthetic, we have that the x-axis is Long, 91 00:07:18,040 --> 00:07:22,180 the y-axis is Lat, and then we have alpha=Freq. 92 00:07:24,880 --> 00:07:29,350 This will define how to scale the colors on the heat map 93 00:07:29,350 --> 00:07:31,710 according to the crime counts. 94 00:07:31,710 --> 00:07:34,620 Then close the parentheses and type a comma, 95 00:07:34,620 --> 00:07:40,210 and then type fill="red", defining our color scheme. 96 00:07:40,210 --> 00:07:43,790 Close the parentheses and hit Enter. 97 00:07:43,790 --> 00:07:45,650 This map takes a minute to load. 98 00:07:45,650 --> 00:07:47,810 While we're waiting, let's discuss 99 00:07:47,810 --> 00:07:49,890 what we've done in this video. 100 00:07:49,890 --> 00:07:54,610 We've created a geographical heat map, which in our case 101 00:07:54,610 --> 00:07:57,400 shows a visualization of the data, 102 00:07:57,400 --> 00:08:01,620 but it could also show the predictions of a model. 103 00:08:01,620 --> 00:08:04,540 Now that our heat map is loaded, let's take a look. 104 00:08:04,540 --> 00:08:08,420 In each area of Chicago, now that area 105 00:08:08,420 --> 00:08:12,140 is colored in red by the amount of crime there. 106 00:08:12,140 --> 00:08:14,490 This looks more like a map that people 107 00:08:14,490 --> 00:08:17,310 use for predictive policing. 108 00:08:17,310 --> 00:08:21,050 In the next video, we'll use data from the FBI 109 00:08:21,050 --> 00:08:25,230 to make a heat map on a map of the United States.