In this lecture, we'll discuss the idea of using visualization to better understand data and to provide insights on the problem we're addressing.

Why visualization? People often say that a picture is worth a thousand words. In the same spirit, John Tukey, a major statistician at Princeton, wrote that "the picture-examining eye is the best finder we have of the wholly unanticipated." Visualizing data allows us to discern relationships, structures, distributions, outliers, patterns, behaviors, dependencies, and outcomes. Visualization is also useful for initial data exploration, for interpreting models, and for communicating results effectively.

Let us give some examples of different modes of visualization that illustrate these points. The figure shows the miles per gallon of a car as a function of the car's weight. The figure clearly illustrates that as the weight of the car increases, the miles per gallon decrease.

The next figure shows the same graph, but now the colors of the points signify the number of cylinders in the car: red for four, green for six, and blue for eight.
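The lecture does not show code for these figures. As a hedged sketch, assuming Python and a handful of hypothetical (weight, mpg, cylinders) values rather than the lecture's actual dataset, the snippet below shows how a third variable can be encoded as color (using the lecture's red/green/blue mapping) and how the downward trend in the scatter plot can be checked numerically with a Pearson correlation coefficient. A value close to -1 indicates a strong negative linear relationship.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data, NOT taken from the lecture's dataset:
# (weight in 1000 lb, miles per gallon, number of cylinders)
CARS = [
    (1.8, 33.0, 4),
    (2.2, 28.0, 4),
    (2.6, 22.0, 6),
    (3.0, 20.0, 6),
    (3.5, 17.0, 8),
    (4.0, 15.0, 8),
    (5.0, 11.0, 8),
]

# Color encoding described in the lecture: 4 -> red, 6 -> green, 8 -> blue.
CYLINDER_COLORS = {4: "red", 6: "green", 8: "blue"}


def group_by_color(cars):
    """Group (weight, mpg) points by the color assigned to their cylinder count."""
    groups = defaultdict(list)
    for weight, mpg, cyl in cars:
        groups[CYLINDER_COLORS[cyl]].append((weight, mpg))
    return dict(groups)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Each color group would be drawn as its own series in a plotting library.
    for color, points in group_by_color(CARS).items():
        print(color, points)
    weights = [w for w, _, _ in CARS]
    mpgs = [m for _, m, _ in CARS]
    print(f"r = {pearson(weights, mpgs):.3f}")
```

The grouping step is independent of any particular plotting library; each group simply becomes one colored series of points.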
On the same data, we now plot a regression line that captures the intuition that as the weight of the car increases, the miles per gallon decrease.

In this plot, we visualize burglaries in the city of Houston by combining the data with geographical location on a map.

This plot uses a heat map to illustrate the usage of rented bicycles from the Hubway company. The horizontal axis is the hour of the day, and the vertical axis is the day of the week, starting on Sunday. The heat map shows that usage increases during the morning and evening rush hours on weekdays.

The next plot helps us visualize histograms of different categories using the Hubway data.

This plot shows US unemployment by state. Lighter colors correspond to lower unemployment rates, and darker colors correspond to higher unemployment rates.

The plan this week is to create all of these visualizations. We'll see how visualizations can be used to better understand data, communicate information more effectively, and show the results of analytical models. In the next video, we'll discuss the World Health Organization and how it uses visualizations effectively.
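The Hubway heat map described in this lecture starts from a table of counts. A minimal sketch, assuming Python, hypothetical ride start times (not real Hubway records), and a helper name `usage_grid` chosen here for illustration: each ride increments one cell of a 7 x 24 grid whose rows are days of the week (Sunday first, as in the lecture's plot) and whose columns are hours of the day. Coloring each cell by its count then yields the heat map.

```python
from datetime import datetime

# Row labels: the lecture's heat map starts the week on Sunday.
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def usage_grid(start_times):
    """Hypothetical helper: count rides in a 7 x 24 grid.

    Rows are days of the week (Sunday first); columns are hours 0-23.
    """
    grid = [[0] * 24 for _ in DAYS]
    for t in start_times:
        # datetime.weekday() returns Monday=0 ... Sunday=6; shift so Sunday=0.
        row = (t.weekday() + 1) % 7
        grid[row][t.hour] += 1
    return grid


if __name__ == "__main__":
    # Hypothetical ride start times, not real Hubway records.
    rides = [
        datetime(2013, 6, 3, 8, 15),   # Monday, 8 a.m. hour
        datetime(2013, 6, 3, 8, 40),   # Monday, 8 a.m. hour
        datetime(2013, 6, 3, 17, 5),   # Monday, 5 p.m. hour
        datetime(2013, 6, 2, 14, 30),  # Sunday, 2 p.m. hour
    ]
    for day, counts in zip(DAYS, usage_grid(rides)):
        print(day, counts)
```

With real data, weekday rows would be expected to show high counts in the morning and evening rush-hour columns, which is the pattern the heat map makes visible at a glance.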