1 00:00:04,500 --> 00:00:07,750 So far, all of our output values and variables 2 00:00:07,750 --> 00:00:09,990 have been single numbers. 3 00:00:09,990 --> 00:00:13,790 You can also create more advanced data structures in R 4 00:00:13,790 --> 00:00:16,860 like vectors and data frames. 5 00:00:16,860 --> 00:00:20,790 A vector is a series of numbers or characters 6 00:00:20,790 --> 00:00:23,390 stored as the same object. 7 00:00:23,390 --> 00:00:27,540 You can create a vector in R using the c function, which 8 00:00:27,540 --> 00:00:29,990 stands for combine. 9 00:00:29,990 --> 00:00:46,090 In your R console, type c(2,3,5,8,13) and hit Enter. 10 00:00:46,090 --> 00:00:49,960 This creates a vector of five numbers all stored 11 00:00:49,960 --> 00:00:52,610 as the same object. 12 00:00:52,610 --> 00:00:55,040 To learn more about vectors in R, 13 00:00:55,040 --> 00:00:59,000 let's enter some data into R about the life expectancies 14 00:00:59,000 --> 00:01:01,010 in different countries. 15 00:01:01,010 --> 00:01:04,420 We'll first create a vector of country names called 16 00:01:04,420 --> 00:01:07,740 Country using the c function. 17 00:01:07,740 --> 00:01:10,720 We'll put each country name in quotes 18 00:01:10,720 --> 00:01:14,680 since we are typing characters not numbers. 19 00:01:14,680 --> 00:01:19,970 So in your R console, type Country 20 00:01:19,970 --> 00:01:26,240 equals and then c for the combine function, parentheses 21 00:01:26,240 --> 00:01:44,880 and then "Brazil", "China", "India", "Switzerland", 22 00:01:44,880 --> 00:01:47,560 and then "USA". 23 00:01:47,560 --> 00:01:51,009 Close the parentheses and hit Enter. 24 00:01:51,009 --> 00:01:53,890 Now, let's create a second vector. 25 00:01:53,890 --> 00:01:55,890 This time with the life expectancies 26 00:01:55,890 --> 00:01:59,390 of these five countries in the same order 27 00:01:59,390 --> 00:02:01,800 that we entered the country names. 
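For readers following along without the video, the two c calls dictated so far might look like the sketch below. It assumes a standard R console. The video assigns with `=`; `<-` is the more common convention in R code and behaves the same way here.

```r
# A numeric vector built with c(), the combine function
c(2, 3, 5, 8, 13)

# A character vector of country names; each name goes in quotes
Country = c("Brazil", "China", "India", "Switzerland", "USA")

# Typing the name alone prints the stored values
Country
```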
28 00:02:01,800 --> 00:02:08,610 We'll call this one LifeExpectancy = 29 00:02:08,610 --> 00:02:26,040 c(74,76,65,83,79) and hit Enter. 30 00:02:26,040 --> 00:02:33,160 Now if you take a look at both Country and LifeExpectancy, 31 00:02:33,160 --> 00:02:36,960 you can see that Country has five character values 32 00:02:36,960 --> 00:02:41,970 and LifeExpectancy has five numerical values. 33 00:02:41,970 --> 00:02:44,020 A word of warning-- you shouldn't 34 00:02:44,020 --> 00:02:48,420 try to combine characters and numbers in the same vector. 35 00:02:48,420 --> 00:02:52,060 R only allows one data type in each vector 36 00:02:52,060 --> 00:02:55,670 so all of the numbers will be converted to characters. 37 00:02:55,670 --> 00:02:57,520 This is bad because it won't allow 38 00:02:57,520 --> 00:03:01,120 us to do any numerical calculations with the numbers, 39 00:03:01,120 --> 00:03:03,780 like compute the mean. 40 00:03:03,780 --> 00:03:06,750 If you want to display an element of a vector, 41 00:03:06,750 --> 00:03:08,380 use square brackets. 42 00:03:08,380 --> 00:03:15,160 For example, we could type Country[1] to get the first 43 00:03:15,160 --> 00:03:18,480 element of the Country vector, Brazil. 44 00:03:18,480 --> 00:03:25,160 Or we could type LifeExpectancy[3] to get 45 00:03:25,160 --> 00:03:29,180 the third element of LifeExpectancy, 65, 46 00:03:29,180 --> 00:03:32,800 corresponding to India. 47 00:03:32,800 --> 00:03:36,100 Another nice function to create vectors in R 48 00:03:36,100 --> 00:03:41,110 is the seq function which creates a sequence of numbers. 49 00:03:41,110 --> 00:03:50,290 Try typing seq(0,100,2). 50 00:03:50,290 --> 00:03:53,520 Close the parentheses and hit Enter.
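The warning about mixing types and the square-bracket lookups described above can be checked with a short sketch like this one. The vector name `Mixed` is a hypothetical name introduced only for illustration and does not appear in the video.

```r
Country = c("Brazil", "China", "India", "Switzerland", "USA")
LifeExpectancy = c(74, 76, 65, 83, 79)

# Square brackets pick out a single element; R counts from 1
Country[1]          # "Brazil"
LifeExpectancy[3]   # 65, the value for India

# Mixing characters and numbers converts everything to character
Mixed = c("Brazil", 74)   # hypothetical name, for illustration only
class(Mixed)              # "character"

# Numerical calculations still work on the all-numeric vector
mean(LifeExpectancy)      # 75.4
```

Calling mean on a character vector such as `Mixed` would return NA with a warning, which is why the two kinds of values are kept in separate vectors.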
51 00:03:53,520 --> 00:03:57,980 This gives a sequence of numbers from 0 to 100, 52 00:03:57,980 --> 00:04:02,510 starting at zero, the first argument to the seq function, 53 00:04:02,510 --> 00:04:07,340 ending at 100, the second argument to the seq function, 54 00:04:07,340 --> 00:04:10,080 and the numbers are in increments of two, 55 00:04:10,080 --> 00:04:13,340 the third argument to the seq function. 56 00:04:13,340 --> 00:04:15,750 This can be useful if you want to create 57 00:04:15,750 --> 00:04:20,200 a unique identifier for observations. 58 00:04:20,200 --> 00:04:22,950 Now, let's combine our vectors into what 59 00:04:22,950 --> 00:04:27,550 we call a data frame in R. This will be an important data 60 00:04:27,550 --> 00:04:29,940 structure for us because all of the data 61 00:04:29,940 --> 00:04:32,280 files we'll work with in this course 62 00:04:32,280 --> 00:04:34,810 will be loaded as data frames. 63 00:04:34,810 --> 00:04:39,500 Additionally, many algorithms in R require all of the data 64 00:04:39,500 --> 00:04:43,540 to be in a single object like a data frame. 65 00:04:43,540 --> 00:04:48,280 We'll call our data frame CountryData 66 00:04:48,280 --> 00:04:52,580 and then use the data.frame function 67 00:04:52,580 --> 00:04:56,909 to combine Country and LifeExpectancy. 68 00:04:56,909 --> 00:05:01,710 So after typing CountryData = data.frame, 69 00:05:01,710 --> 00:05:06,520 in parentheses type Country comma LifeExpectancy. 70 00:05:09,800 --> 00:05:12,760 Then close the parentheses and hit Enter. 71 00:05:12,760 --> 00:05:16,740 Let's take a look at CountryData by typing CountryData 72 00:05:16,740 --> 00:05:18,470 and hitting Enter. 73 00:05:18,470 --> 00:05:21,370 It has two columns, our variables, 74 00:05:21,370 --> 00:05:24,630 and five rows, our observations. 75 00:05:24,630 --> 00:05:27,070 It's similar to how you might organize data 76 00:05:27,070 --> 00:05:31,140 in spreadsheet software like Excel.
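The seq call and the data.frame step just described can be written out as the sketch below. The name `Evens` is a hypothetical name used only to hold the sequence so it can be inspected; the video types seq(0,100,2) directly.

```r
# seq(from, to, by): start at 0, end at 100, in increments of 2
Evens = seq(0, 100, 2)   # hypothetical name, for illustration only
length(Evens)            # 51 values: 0, 2, 4, ..., 100

Country = c("Brazil", "China", "India", "Switzerland", "USA")
LifeExpectancy = c(74, 76, 65, 83, 79)

# Combine the two vectors into a data frame; each vector becomes a column
CountryData = data.frame(Country, LifeExpectancy)
CountryData

# Five rows (observations) and two columns (variables)
nrow(CountryData)
ncol(CountryData)
```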
77 00:05:31,140 --> 00:05:34,380 Let's say we now want to add another variable to our data 78 00:05:34,380 --> 00:05:39,110 frame-- the population in thousands of each country. 79 00:05:39,110 --> 00:05:41,659 We can do this by using a dollar sign 80 00:05:41,659 --> 00:05:44,870 to link the new data into the data frame. 81 00:05:44,870 --> 00:05:49,409 So we'll type the name of our data frame, CountryData, 82 00:05:49,409 --> 00:05:52,260 and then a dollar sign followed by the name 83 00:05:52,260 --> 00:05:55,590 of the new variable we want to add, Population. 84 00:05:58,390 --> 00:06:02,700 Then we'll set this equal to c, and in parentheses 85 00:06:02,700 --> 00:06:06,210 the population in thousands of each country. 86 00:06:06,210 --> 00:06:29,650 So 199000, 1390000, 1240000, 7997, and 318000. 87 00:06:29,650 --> 00:06:32,640 Close the parentheses and hit Enter. 88 00:06:32,640 --> 00:06:36,210 Now if you take a look at CountryData 89 00:06:36,210 --> 00:06:40,690 you should see that we have a third column, Population. 90 00:06:40,690 --> 00:06:43,590 We'll use this dollar sign notation a lot 91 00:06:43,590 --> 00:06:46,860 and we'll talk about it more later in this lecture. 92 00:06:46,860 --> 00:06:49,320 Note that we had to give the population values 93 00:06:49,320 --> 00:06:51,090 in the correct order. 94 00:06:51,090 --> 00:06:56,030 R will just combine the vectors in the order they're typed. 95 00:06:56,030 --> 00:07:00,760 Now, suppose we want to add two new observations for Australia 96 00:07:00,760 --> 00:07:02,290 and Greece. 97 00:07:02,290 --> 00:07:06,240 We first need to create new Country, LifeExpectancy 98 00:07:06,240 --> 00:07:08,470 and Population vectors. 99 00:07:08,470 --> 00:07:11,850 So we will now set Country equal to the names 100 00:07:11,850 --> 00:07:21,050 of the new countries, Australia and Greece. 
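The dollar sign step for adding the Population column might look like the following sketch. It rebuilds CountryData first so that it runs on its own, and it uses the population values, in thousands, read out above.

```r
Country = c("Brazil", "China", "India", "Switzerland", "USA")
LifeExpectancy = c(74, 76, 65, 83, 79)
CountryData = data.frame(Country, LifeExpectancy)

# Add a new column with $; values must be in the same order as the rows
CountryData$Population = c(199000, 1390000, 1240000, 7997, 318000)

# The data frame now has a third column
CountryData
names(CountryData)
```

R matches the new values to rows purely by position, so a value typed out of order would be attached to the wrong country without any error.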
101 00:07:25,790 --> 00:07:31,770 We'll set LifeExpectancy equal to the new life expectancies, 102 00:07:31,770 --> 00:07:39,640 82 and 81, and we'll call Population 103 00:07:39,640 --> 00:07:44,950 a vector of the populations of these countries, 23050 104 00:07:44,950 --> 00:07:48,840 and 11125. 105 00:07:48,840 --> 00:07:51,770 Then we can create a new data frame. 106 00:07:51,770 --> 00:07:56,110 We'll call this one NewCountryData and we'll set it 107 00:07:56,110 --> 00:07:58,490 equal to data.frame(Country, LifeExpectancy, Population). 108 00:08:15,170 --> 00:08:17,530 Note that we combined three vectors here 109 00:08:17,530 --> 00:08:20,100 with the data.frame function. 110 00:08:20,100 --> 00:08:25,770 If we take a look at NewCountryData, 111 00:08:25,770 --> 00:08:30,320 we can see that it's very similar to CountryData. 112 00:08:30,320 --> 00:08:34,570 Lastly, let's combine CountryData and NewCountryData 113 00:08:34,570 --> 00:08:37,750 with the rbind function which combines 114 00:08:37,750 --> 00:08:40,700 data frames by stacking the rows. 115 00:08:40,700 --> 00:08:46,580 We'll call this new data frame AllCountryData 116 00:08:46,580 --> 00:08:50,720 and we'll use the rbind function to combine 117 00:08:50,720 --> 00:08:54,050 CountryData and NewCountryData. 118 00:09:00,320 --> 00:09:05,360 If we take a look at AllCountryData 119 00:09:05,360 --> 00:09:09,570 we can see that it has the three variables from before 120 00:09:09,570 --> 00:09:12,950 and all seven observations. 121 00:09:12,950 --> 00:09:15,220 The functions we've used in this video 122 00:09:15,220 --> 00:09:17,640 have allowed us to create data structures 123 00:09:17,640 --> 00:09:21,950 and modify data structures in R. But most of the time 124 00:09:21,950 --> 00:09:24,970 you'll be reading in data from an external file, which 125 00:09:24,970 --> 00:09:27,550 we'll do in the next video.
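Putting the last steps together, the new observations and the rbind call could be sketched as follows. The block rebuilds CountryData from the values given earlier so that it runs on its own. Note that Country, LifeExpectancy and Population are overwritten with the new values, as in the video.

```r
# Rebuild the original five-country data frame
Country = c("Brazil", "China", "India", "Switzerland", "USA")
LifeExpectancy = c(74, 76, 65, 83, 79)
CountryData = data.frame(Country, LifeExpectancy)
CountryData$Population = c(199000, 1390000, 1240000, 7997, 318000)

# Overwrite the vectors with the two new observations
Country = c("Australia", "Greece")
LifeExpectancy = c(82, 81)
Population = c(23050, 11125)
NewCountryData = data.frame(Country, LifeExpectancy, Population)

# Stack the rows; both data frames have the same three column names
AllCountryData = rbind(CountryData, NewCountryData)
AllCountryData
nrow(AllCountryData)   # 7
```

rbind matches columns by name, so this works because both data frames use the names Country, LifeExpectancy and Population.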