1 00:00:09,490 --> 00:00:13,140 In this video, we're going to look at scales. 2 00:00:13,140 --> 00:00:15,260 This first plot shows the average height 3 00:00:15,260 --> 00:00:18,980 of a 21-year-old male in centimeters. 4 00:00:18,980 --> 00:00:25,300 The x-axis is time, starting in 1871, and ending in 1975. 5 00:00:25,300 --> 00:00:27,270 Each person represents the height, 6 00:00:27,270 --> 00:00:30,320 at a different point in time, and the points are evenly 7 00:00:30,320 --> 00:00:34,530 spaced in time, so the x-axis is OK. 8 00:00:34,530 --> 00:00:39,820 The y-axis ranges from just under 160 to 180 centimeters, 9 00:00:39,820 --> 00:00:44,030 which isn't inherently bad, but does overstate the change. 10 00:00:44,030 --> 00:00:46,210 The real problem is the bars. 11 00:00:46,210 --> 00:00:47,590 If it were accurate, we would only 12 00:00:47,590 --> 00:00:49,940 really see the heads of the men, but instead we 13 00:00:49,940 --> 00:00:52,770 see their whole bodies, making it seem as if people have not 14 00:00:52,770 --> 00:00:55,320 only doubled in height, but they've also doubled in width. 15 00:00:59,440 --> 00:01:02,590 This next plot also has issues with scale. 16 00:01:02,590 --> 00:01:05,850 The total range of the plot is 8% to 10%, 17 00:01:05,850 --> 00:01:11,360 although all the numbers fall in the range of 8.6% to 9.2%. 18 00:01:11,360 --> 00:01:14,490 If we plotted the y-axis on a 0% to 10% scale, 19 00:01:14,490 --> 00:01:16,520 the conclusion would be that nothing is really 20 00:01:16,520 --> 00:01:18,520 changing at all. 21 00:01:18,520 --> 00:01:21,190 The last point in the chart is at the wrong height, 22 00:01:21,190 --> 00:01:23,810 and the size of the markers makes the relative locations 23 00:01:23,810 --> 00:01:25,860 hard to distinguish. 
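[Editor's sketch] The effect of the axis range described above can be put into numbers. The following is a minimal Python sketch, not part of the video; the function name `fraction_of_axis` is made up for illustration, and the only inputs are the percentages quoted in the narration.

```python
def fraction_of_axis(data_min, data_max, axis_min, axis_max):
    """Share of the axis height that the data range occupies."""
    return (data_max - data_min) / (axis_max - axis_min)

# Figures quoted in the narration: the data spans 8.6% to 9.2%.
truncated = fraction_of_axis(8.6, 9.2, 8.0, 10.0)   # axis as drawn: 8% to 10%
zero_based = fraction_of_axis(8.6, 9.2, 0.0, 10.0)  # alternative: 0% to 10%

print(f"8-10 axis: data fills {truncated:.0%} of the height")
print(f"0-10 axis: data fills {zero_based:.0%} of the height")
print(f"visual exaggeration: {truncated / zero_based:.1f}x")
```

On the 8% to 10% axis the data occupies about 30% of the plot height; on a 0% to 10% axis it would occupy about 6%, which is why the zero-based version would look almost flat.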
24 00:01:25,860 --> 00:01:31,770 Also notice that the gap between the 9.0% and 8.9% markers 25 00:01:31,770 --> 00:01:36,820 on the far left side, and the gap between the 8.9% and 8.8% markers, 26 00:01:36,820 --> 00:01:38,789 are different sizes. 27 00:01:38,789 --> 00:01:41,550 This plot shows the relative breakdown of teachers 28 00:01:41,550 --> 00:01:44,430 by race in a certain teaching program. 29 00:01:44,430 --> 00:01:47,560 The Caucasian bar is truncated, which is a risky choice, 30 00:01:47,560 --> 00:01:50,759 but could be appropriate in some situations. 31 00:01:50,759 --> 00:01:53,620 A much bigger problem is that the scale of each blue bar 32 00:01:53,620 --> 00:01:55,500 is entirely different. 33 00:01:55,500 --> 00:01:58,450 For example, the Native American bar is about a third 34 00:01:58,450 --> 00:02:00,770 of the length of the African American bar, 35 00:02:00,770 --> 00:02:03,680 but there are more than 10 times as many African Americans 36 00:02:03,680 --> 00:02:06,170 in this program as Native Americans. 37 00:02:06,170 --> 00:02:09,860 In fact, visually, this plot is completely meaningless. 38 00:02:09,860 --> 00:02:11,960 The only useful thing about it is the numbers. 39 00:02:11,960 --> 00:02:14,750 But even there, there is a bit of confusion, 40 00:02:14,750 --> 00:02:18,020 as Native Americans are given to one decimal place, 41 00:02:18,020 --> 00:02:19,470 but the others are rounded, 42 00:02:19,470 --> 00:02:21,820 which, when combined with the confusing scales, 43 00:02:21,820 --> 00:02:23,800 casts doubt on the correctness of the numbers. 44 00:02:27,950 --> 00:02:31,300 Here is a before and after of the same data. 45 00:02:31,300 --> 00:02:33,930 On the left, we see the US military expense 46 00:02:33,930 --> 00:02:37,530 on the right axis, and troop count on the left axis. 47 00:02:37,530 --> 00:02:40,890 Both the line and bar plots are individually OK, 48 00:02:40,890 --> 00:02:43,280 but the combination is misleading. 
49 00:02:43,280 --> 00:02:46,390 Because you have mixed two units, dollars and people, 50 00:02:46,390 --> 00:02:48,060 there is a false impression of some sort 51 00:02:48,060 --> 00:02:52,850 of crossover point in 1995 that does not exist. 52 00:02:52,850 --> 00:02:56,650 On the right is the same data presented in a different way. 53 00:02:56,650 --> 00:02:58,430 We now have troops on the x-axis, 54 00:02:58,430 --> 00:03:00,710 and dollars on the y-axis. 55 00:03:00,710 --> 00:03:02,740 The line moves through time now, allowing 56 00:03:02,740 --> 00:03:05,990 us to see when moments of change occurred, such as decreases 57 00:03:05,990 --> 00:03:09,510 in troop count through the 90s, at the end of the Cold War, 58 00:03:09,510 --> 00:03:11,690 the increase in spending of the 2000s, 59 00:03:11,690 --> 00:03:14,850 and the recent decreases in military spending. 60 00:03:14,850 --> 00:03:17,500 The final visualization I want to show you today 61 00:03:17,500 --> 00:03:20,280 is all about the different types of household. 62 00:03:20,280 --> 00:03:22,280 The US Census Bureau periodically 63 00:03:22,280 --> 00:03:25,900 determines how many households consist, for example, 64 00:03:25,900 --> 00:03:28,560 of married couples with and without children, 65 00:03:28,560 --> 00:03:32,320 people living alone, and so on. 66 00:03:32,320 --> 00:03:35,730 First of all, I'm not saying this is a bad visualization. 67 00:03:35,730 --> 00:03:38,280 In fact, if we are interested in the relative share 68 00:03:38,280 --> 00:03:41,370 of each type of household in a particular year, 69 00:03:41,370 --> 00:03:43,620 it's actually pretty good. 70 00:03:43,620 --> 00:03:45,630 However, if what we're interested in 71 00:03:45,630 --> 00:03:47,940 is the rates of change across the years, 72 00:03:47,940 --> 00:03:50,010 this is next to useless. 73 00:03:50,010 --> 00:03:53,790 The key problem is that the x-axis is completely off. 
74 00:03:53,790 --> 00:03:56,690 The gap between the first two columns is 10 years, 75 00:03:56,690 --> 00:03:58,590 but the gap between the last two columns 76 00:03:58,590 --> 00:04:01,090 is only 2 years, meaning that the rates are 77 00:04:01,090 --> 00:04:03,150 hard to read from this. 78 00:04:03,150 --> 00:04:05,510 If we're not interested in the rates of change, 79 00:04:05,510 --> 00:04:09,920 but just want to compare two years at a time, it's not bad, 80 00:04:09,920 --> 00:04:12,070 but it's not easy either. 81 00:04:12,070 --> 00:04:15,170 Try comparing 1970 married without children 82 00:04:15,170 --> 00:04:18,100 to 2010 married without children, 83 00:04:18,100 --> 00:04:20,250 without looking at the numbers. 84 00:04:20,250 --> 00:04:23,100 Can you tell if it has grown or shrunk? 85 00:04:23,100 --> 00:04:25,490 Finally, and more generally, this chart 86 00:04:25,490 --> 00:04:27,500 shows relative numbers. 87 00:04:27,500 --> 00:04:29,380 If we look at absolute numbers, we 88 00:04:29,380 --> 00:04:32,530 might find the total number of couples married with children 89 00:04:32,530 --> 00:04:35,560 is actually constant, but the number of other households 90 00:04:35,560 --> 00:04:38,350 has increased. 91 00:04:38,350 --> 00:04:40,140 We are now going to switch to R 92 00:04:40,140 --> 00:04:43,350 to try plotting this data as a line chart.
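[Editor's sketch] The point about unevenly spaced columns can be illustrated with a little arithmetic. This is a minimal Python sketch rather than the R session the video moves on to; the helper `per_year_rates`, the list of years, and the share values are all hypothetical and are not the Census figures. Only the 10-year and 2-year gaps come from the narration.

```python
def per_year_rates(years, values):
    """Change per year between consecutive observations."""
    rates = []
    for i in range(1, len(years)):
        gap = years[i] - years[i - 1]  # number of years between columns
        rates.append((values[i] - values[i - 1]) / gap)
    return rates

# Hypothetical years and shares in percent (NOT the real Census data).
# The first gap is 10 years and the last gap is 2 years, as in the chart.
years = [1970, 1980, 2010, 2012]
shares = [40.0, 36.0, 21.0, 20.0]

# Change from one column to the next, which is what equally spaced
# columns invite the eye to compare.
per_column = [b - a for a, b in zip(shares, shares[1:])]
print("change per column:", per_column)
print("change per year:  ", per_year_rates(years, shares))
```

With these made-up values the column-to-column drops differ widely (4, 15, and 1 points), while the per-year rates are nearly identical (0.4, 0.5, and 0.5 points per year). Placing the columns at their true positions on a time axis, as a line chart does, makes the per-year rate visible as the slope.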