Data Visualization
Transcript: Visualization tools go beyond the standard charts and graphs used in Excel spreadsheets, displaying data in more sophisticated ways such as dials and gauges, geographic maps, time-series charts, spark lines, heat maps, tree maps and detailed bar, pie and fever charts. Patterns, trends and correlations that might go undetected in text-based data can be exposed and recognized easier with data visualization software. Visualized data is frequently displayed in business intelligence (BI) dashboards and performance scorecards that provide users with high-level views of corporate information,metrics and key performance indicators (KPIs). The images may include interactive capabilities, enabling users to manipulate them or drill into the data for querying and analysis. Indicators designed to alert users when data has been updated or predefined conditions occur, can also be included. This would work great, but there’s a secondary problem: since UNC is a good 40 minute drive away, our grad assistants tend to have very rigid schedules, which are fixed well in advance — so we can’t just alter our grad assistants’ schedules on short notice to have them cover a class. Meanwhile, instruction scheduling is very haphazard, due to wide variation in how course slots are configured in the weekly calendar, so it can be hard to predict when instruction requests are likely to be scheduled. What we need is a technique to maximize the likelihood that a grad student’s standing schedule will overlap with the timing of instruction requests that we do get — before the requests come in. Luckily, we had some accrued data on our instructional activity from previous semesters. This seemed like the obvious starting point: look at when we taught previously and see what days and times of day were most popular. The data consisted of about 80 instruction sessions given over the course of the prior two semesters; data included date, day of week, session start time, and a few other tidbits. The data was basically scraped by hand from the instruction records we maintain for annual reports; my colleague Anne Burke did the dirty work of collecting and cleaning the data, as well as the initial analysis. The chart above gets the fundamentals right — since we’re designing weekly schedules for our grad assistants, it’s clear that the relevant dimensions are days of week and times of day. However, there are basically two problems with the stacked bar chart approach: (1) The resolution of the stacked bars — morning, afternoon and evening — is too coarse. We need to get more granular if we’re really going to see the times that are popular for instruction; (2) The stacked bar chart slices just don’t fit our mental model of a week. If we’re going to solve a calendaring problem, doesn’t it make a lot of sense to create a visualization that looks like a calendar? When I thought about analyzing the data in these terms, the concept of a heatmap immediately came to mind. A heatmap is a tool commonly used to look for areas of density in spatial data. It’s often used for mapping click or eye-tracking data on websites, to develop an understanding of the areas of interest on the website. A heatmap’s density modeling works like this: each data point is mapped in two dimensions and displayed graphically as a circular “blob” with a small halo effect; in closely-packed data, the blobs overlap. Areas of overlap are drawn with more intense color, and the intensity effect is cumulative, so the regions with the most intense color correspond to the areas of highest density of points. In this graphic, days of the week are represented by the horizontal rows of blobs, with Monday as the first row and Friday as the last. The leftmost extent of each row corresponds to approximately 8am, while the rightmost extent is about 7:30pm. The key in the upper left indicates (more or less) the number of overlapping data points in a given location. A bit of labeling helps to clarify things: Heatmap of instruction data, labeled with days of week and approximate indications of time of day. Right away, we get a good sense of the shape of the instruction week. This presentation reinforces the findings of the earlier chart: that Monday, Tuesday, and Thursday are busiest, and that Friday afternoon is basically dead. But we do see a few other interesting tidbits, which are visible to us specifically through the use of the heatmap: Monday, Tuesday and Thursday aren’t just busy, they’re consistently well-trafficked throughout the day. Friday is really quite slow throughout. There are a few interesting hotspots scattered here and there, notably first thing in the morning on Tuesday. Wednesday is quite sparse overall, except for two or three prominent afternoon/evening times. There is a block of late afternoon-early evening time-slots that are consistently busy in the first half of the week. Using this information, we can take a much more informed approach to scheduling our graduate students, and