I have been terrible about posting work up on the blog this semester. I’ve been digging into thesis work, and spending less time on everything else. But there is lots of work to share! So, here’s a quick exercise that I did for my CS class in Data Visualization.
The assignment was to find a graphic and do a redesign to change some aspect that relates to principles we’ve covered in class. I chose a design that I actually like quite a lot, but that could maybe be done differently, depending on your purpose.
The original visualization can be found here. It’s from the US Census Bureau, and shows the distribution of population density for different metro areas. Metro regions are ranked from largest to smallest population along the y axis, and the population density is shown at increasing distances from City Hall (used as a proxy for the urban center) from left to right. The graphic is intended to show differences in population distribution within cities, and to allow comparison between cities based on the color trends shown in the heatmap.
First, the good:
The graphic is an extremely compact way of showing a lot of data, and the authors chose a nice color map to represent the quantitative values. It decreases in saturation pretty evenly across the range, and the colors shown are aesthetically pleasing as well. Using colors that span the yellow to blue range and that contain a significant gray value makes this an extremely colorblind-friendly choice (it’s hard to notice any difference when you turn on a colorblind simulator with this graphic; the color selection is that good).
Again, the primary advantage of this particular visualization is that it is compact. There’s a lot of data shown in a way that is clear and simple to read, once you are oriented to the information shown. The “miles from city center” unit is a little odd, and it wasn’t immediately clear to me what the data showed, and whether it was for one city or many. A graphic to orient the user to the visual organization might be helpful, though I found the labels to be very clear once I actually took the time to read them.
Thoughts for further exploration:
I wanted to play with the idea of a concentric mapping, since the data itself represents concentric rings. I also thought that it would be faster and easier to read a series of “bull’s eye” patterns than to interpret the single lines of information as shown. If these patterns were placed on a map, it might also help to show regional variations and other trends.
I also wanted to include the relative population of the cities in some way, to allow the user to actually see the differences used to sort the y axis. Hovering over a single square of the original visualization highlights the entire row for that city, and plots its name using an interactive label in the animated version. It also lists the population density for the particular square you’ve hovered over, but never gives information about the total population of the city shown. It would be nice to have some sense of the population variation within the dataset.
This was just my own personal bias, but I kept expecting the original graphic to be a 2D matrix array rather than a heatmap that reads from left to right. If it were to be kept in its original form, adding a little more space between cities (rows) might take advantage of proximity to help make the data for individual cities stand out better as a group.
This last point brings me to the use of a white grid over (mostly) dark data, which is my primary complaint about the original graphic. I like the fact that the grid is there; it’s very helpful for distinguishing rows and columns in the data, and for separating colors from one block to the next. Unfortunately, the huge difference in contrast between the grid and the squares on the left of the visualization creates a bit of a center-surround problem for me, where my eye keeps switching back and forth between focusing on the data and focusing on the grid. Making the grid lines finer and reducing the contrast between them and the data might help with that. To be fair, thin lines are hard to use on a screen, since vertical/horizontal lines that are a single pixel wide often get cut off or shifted to suit the resolution of the screen. For that reason, making the grid lines thinner might have other undesired effects. The width of the grid lines might also be calculated from the screen width to make the visualization responsive; in that case, the grid lines may not be optimized for every view. I’ve had that problem on more than one occasion with d3 rangeRoundBands, and it can be very hard to fix for every case. In this case, it looks like the authors have written their own JavaScript code to generate the grid out of div elements (?!), so I’m not sure how hard it would have been to narrow the spacing a little bit.
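To make the pixel-rounding issue concrete, here is a minimal sketch in plain JavaScript. It is not d3's rangeRoundBands implementation and not the Census Bureau's code; the helper name `roundedBands`, the 1000-pixel width, and the 1-pixel grid line are all assumptions chosen for illustration. It splits a row into whole-pixel cells separated by whole-pixel grid lines and reports how many pixels are left over.

```javascript
// Hypothetical helper: divide `totalWidth` pixels into `count` cells separated
// by `gap`-pixel grid lines, keeping every edge on a whole pixel.
function roundedBands(totalWidth, count, gap) {
  // Largest whole-pixel cell width that still fits alongside the grid lines.
  const cell = Math.floor((totalWidth - gap * (count - 1)) / count);
  const used = cell * count + gap * (count - 1);
  const leftover = totalWidth - used;
  // Push half of the unused pixels to the left edge so the grid is roughly centered.
  const offset = Math.floor(leftover / 2);
  const starts = [];
  for (let i = 0; i < count; i++) {
    starts.push(offset + i * (cell + gap));
  }
  return { cell, gap, offset, leftover, starts };
}

// Assumed example: 90 columns (one per mile) in a 1000-pixel-wide container.
const layout = roundedBands(1000, 90, 1);
console.log(layout.cell, layout.leftover);
```

With these assumed numbers each cell is 10 pixels wide and 11 pixels go unused, which is the kind of leftover that shows up as uneven margins or shifted lines when the container is resized.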
I might also consider reducing the number of categories in the color map, as it is hard to visually distinguish so many similar colors, especially when going back and forth between the figure and the legend. On the other hand, I find this color range to be reasonably legible, and feel that it does a good job of conveying an overall/qualitative sense of the data. Especially with less prominent grid lines, it would be possible to see subtle variations in the colors shown, and those might lend interest and contrast to areas that would otherwise be undifferentiated. The interactive labels also do a very good job of helping the user to collect specific values from the data (they also include a link to the actual data table, for even better legibility), so I’m not sure that distinguishability of colors is really presenting a specific problem. Still, it’s a factor that I would consider in a redesign, depending on the importance of maintaining these distinct categories, and the overall purpose of the design.
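One way to reduce the number of color categories is to merge adjacent bins. The sketch below is plain JavaScript and only illustrates the idea: the threshold values are invented for the example and are not the bins used in the Census graphic, and `coarsen` and `binIndex` are hypothetical helper names. The binning follows the same idea as a threshold scale, where a value's bin is the number of breakpoints at or below it.

```javascript
// Hypothetical breakpoints (people per square mile); NOT the Census graphic's bins.
const fineThresholds = [500, 1000, 2000, 3000, 5000, 7500, 10000, 15000];

// Keep every `step`-th breakpoint, which merges each run of adjacent bins into one.
function coarsen(thresholds, step) {
  return thresholds.filter((_, i) => (i + 1) % step === 0);
}

// Bin index for a value: the count of breakpoints at or below it.
function binIndex(value, thresholds) {
  let i = 0;
  while (i < thresholds.length && value >= thresholds[i]) {
    i++;
  }
  return i;
}

// Eight breakpoints (nine color bins) become four breakpoints (five bins).
const coarseThresholds = coarsen(fineThresholds, 2);
console.log(coarseThresholds.length + 1, binIndex(4000, coarseThresholds));
```

The trade-off is the one described above: fewer bins are easier to match against a legend, but subtle variation within a merged bin disappears.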
The redesign:
I chose to go with the concentric mapping scheme for my redesign, even though it is not as space efficient as the original. I decided to keep both their original color bins and the total number of cities shown in their visualization, because I wanted to keep the integrity of the original piece intact as much as possible.
In order to fit all 90 of the mile-wide rings in the data from the original version, I had to make my ring width very, very small (it’s set to 2 pixels wide, and my canvas is still 6500 pixels across). I got rid of the gray borders between color bands, both for space considerations and because I thought that it would be easier to see the actual data without them.
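The arithmetic behind those sizes can be sketched in plain JavaScript. The ring count, ring width, and canvas width come from the paragraph above; the 40-pixel gutter, the square cells, and the helper names `ringRadii` and `gridPosition` are assumptions for illustration, not the layout code used for the actual redesign.

```javascript
const RING_COUNT = 90;     // mile-wide rings, from the post
const RING_WIDTH = 2;      // pixels per ring, from the post
const CANVAS_WIDTH = 6500; // pixels, from the post
const GUTTER = 40;         // hypothetical spacing between bull's-eyes

// Inner and outer radius (in pixels) of the ring for each mile from the center.
function ringRadii(count, width) {
  const rings = [];
  for (let mile = 0; mile < count; mile++) {
    rings.push({ mile, inner: mile * width, outer: (mile + 1) * width });
  }
  return rings;
}

// Center of the bull's-eye at position `index` in a left-to-right, top-to-bottom grid.
function gridPosition(index, perRow, cellSize) {
  const col = index % perRow;
  const row = Math.floor(index / perRow);
  return { cx: col * cellSize + cellSize / 2, cy: row * cellSize + cellSize / 2 };
}

const rings = ringRadii(RING_COUNT, RING_WIDTH);
const diameter = 2 * rings[rings.length - 1].outer;  // full width of one bull's-eye
const cellSize = diameter + GUTTER;                  // assumed square cell
const perRow = Math.floor(CANVAS_WIDTH / cellSize);  // bull's-eyes per row
console.log(diameter, perRow, gridPosition(17, perRow, cellSize));
```

At 2 pixels per ring, one bull's-eye is 360 pixels across, so even a 6500-pixel canvas holds only about 16 per row under the assumed gutter. When drawing rings as filled circles, painting from the outermost ring inward lets each smaller circle cover the center of the one before it.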
I arranged the cities in order by population, as in the original graphic. I also included the city names and population values as labels under each city circle. I had hoped to encode the population graphically as well, but I couldn’t scale the city bubbles, because the data itself has a fixed-length relationship that needs to be maintained. Also, the equal-area plotting helps to reinforce the underlying grid, and avoids the problem of too much whitespace. I considered trying to find other ways of showing the population graphically, but decided that it would clutter things up too much.
I used d3 to draw the actual graphic and apply the color scales, then moved into Illustrator to do touch-up, as I did for my soil poster last year. That method works really well; it keeps the automation and easy editing of d3 for the early phase exploration, but then I can move into Illustrator for the final tweaks.
Critique of the redesign:
I really love the aesthetics of the redesign, but it is not as data-dense as the original. It is more interesting to look at, and I think it’s easier to see differences in the population density patterns in the circular layout, but the graphic takes up a lot more space. To get all of the icons and the text in one place, you’d really have to move up to the poster level, which may or may not be an option. I think I would still recommend reducing the number of color options, since this really functions more as a qualitative view for pattern finding than as a quantitative tool.
Depending on the intended purpose of the graphic, I would also consider reducing the number of cities shown. It’s interesting to have them all side by side to compare, but I’m not sure that you gain much more information by including the smaller cities.
If I were to make this interactive, I think I’d include a popup bar chart or other quantitative visualization to show a cross section of the graphs when the user hovers over them with the mouse. I’d also consider enlarging the highlighted pattern, to make it easier to see. It would be necessary to reduce the number of cities shown, to fit better on the screen. I’d want to allow users to filter the results based on specific metrics, and possibly also search for a specific city in the list. It would be interesting to use an animated transition from the general overview to a gallery showing cities of similar types (similarity perhaps being defined in different ways, based on user choice). I also think that integrating a map into the view might be helpful; perhaps in a linked sidebar, with a popup dot that corresponds to the city’s location when a user hovers over a circle, and reciprocal highlighting of related circles when the user hovers over a particular city/state on the map.