This post is part of a larger series focused on exploring the fundamental principles of data visualization. Eventually, the collection may grow into something larger and more coherent. For now, each post simply picks up and plays with one idea related to how we represent data visually. Other posts in this series can be found using the Form to Data tag.
We create an encoding by applying a visual variable to an information channel. A successful encoding makes a good match between the information channel and the particular strengths of the visual variable used to represent it. It is important to check that the visual variable is flexible enough to accommodate all of the data values that the information channel contains: the marks created with an encoding must be distinct enough that a person can read their data values.
Suitability
The first step in encoding an information channel is deciding which kind of visual variable best represents the kind of data that we want to show. An encoding that works with the natural strengths of a visual variable will be more effective than one that contradicts our expectations and forces us to work against our brain’s perceptual system.
Range and steps
To specify an encoding, we need to decide what range of values it can use, and the number of steps that it can represent.
Each visual variable describes a property of an object that can be varied to give different results. For instance, a shape variable can take the following values: circle, square, triangle, and star. The image below shows a range of values for three different visual variables: shape, (color) value, and hue.
For some variables, we can further restrict the properties that we are willing to change. For shape, this might mean that we use only 5-sided objects. For value, we might choose only shades of gray or blue. The image below shows what happens to the variables above if we add restrictions to their ranges. We can also change the number of steps allowed in our encoding, requiring 10 shapes instead of 5.
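To make range and steps concrete, here is a minimal sketch that builds a hue encoding from a range and a step count. It assumes hue is measured in degrees on the HSV color wheel (0 for red, roughly 270 for purple); the function names `hue_steps` and `hue_to_hex` are made up for this illustration and do not come from any charting library.

```python
import colorsys


def hue_steps(start_deg, end_deg, n_steps):
    """Return n_steps evenly spaced hues (in degrees) from start_deg to end_deg."""
    if n_steps < 2:
        return [float(start_deg)]
    step = (end_deg - start_deg) / (n_steps - 1)
    return [start_deg + i * step for i in range(n_steps)]


def hue_to_hex(hue_deg, saturation=1.0, value=1.0):
    """Convert an HSV hue in degrees to a hex RGB string."""
    r, g, b = colorsys.hsv_to_rgb((hue_deg % 360) / 360.0, saturation, value)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


# Assumed example ranges: the full sweep from red to purple, and a restricted
# sweep that only covers the cyan-to-blue part of the wheel.
full_range = hue_steps(0, 270, 5)      # wide range, 5 steps
restricted = hue_steps(180, 240, 5)    # narrow range, same 5 steps
more_steps = hue_steps(0, 270, 10)     # wide range, 10 steps

print(full_range)   # steps are 67.5 degrees apart
print(restricted)   # steps are 15 degrees apart
print(more_steps)   # steps are 30 degrees apart
```

Both choices are visible in the numbers: narrowing the range or raising the step count leaves the same kind of encoding, but with less room between neighboring values.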
Distinguishability
When we restrict the range of an encoding, the difference between steps gets smaller and smaller, which makes the resulting marks less distinct. The same thing happens if we try to add too many steps to a particular range (e.g. 50 hues in a rainbow between red and purple). The user will need to be able to tell those values apart in order to read the chart, and the encoding should make that as easy as possible. Finding the right balance between the range of values and the number of steps in an encoding is critical for making a visualization clear.
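The trade-off between range and steps is simple arithmetic, sketched roughly below. The 30-degree minimum gap is a hypothetical threshold chosen only for illustration, not a perceptual constant, and hue angle is not a perceptually uniform scale, so treat the numbers as a rough guide at best. The function names are also invented for this example.

```python
# Hypothetical threshold: the smallest hue gap (in degrees) we are willing
# to ask a reader to tell apart. This is an assumption, not a measured value.
MIN_HUE_GAP_DEG = 30.0


def step_distance(range_start, range_end, n_steps):
    """Distance between neighboring steps when n_steps are spread evenly."""
    if n_steps < 2:
        raise ValueError("need at least two steps to measure a distance")
    return abs(range_end - range_start) / (n_steps - 1)


def max_distinct_steps(range_start, range_end, min_gap):
    """Largest number of evenly spaced steps that keeps every gap >= min_gap."""
    return int(abs(range_end - range_start) // min_gap) + 1


# 50 hues in a rainbow between red (0) and purple (assumed to be 270 degrees).
gap = step_distance(0.0, 270.0, 50)
print(round(gap, 2))                                   # about 5.51 degrees
print(gap >= MIN_HUE_GAP_DEG)                          # False: too crowded

# How many steps fit under the assumed threshold?
print(max_distinct_steps(0.0, 270.0, MIN_HUE_GAP_DEG))     # full range
print(max_distinct_steps(180.0, 240.0, MIN_HUE_GAP_DEG))   # restricted range
```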
Different channel types have different needs. Identity channels rely on an encoding to help the user tell things apart, and so maintaining distinguishability between items is critical. Shape and hue work well for identity channels, provided that the distance between steps is large enough. Quantity channels work better with a continuous range of values, and so pair well with visual variables that transition smoothly from one step to the next. (Color) value, size, and orientation are good examples of variables that we naturally interpret as quantities.
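One way to see the difference between the two channel types is in how the mapping itself is built. The sketch below is an illustration under simple assumptions: an identity channel is a lookup from category to a discrete mark, and a quantity channel is a linear interpolation from a data domain to a visual range. The function names and sample data are hypothetical.

```python
def identity_encoding(categories, marks):
    """Give each distinct category its own mark; refuse if marks run out."""
    unique = list(dict.fromkeys(categories))  # de-duplicate, keep first-seen order
    if len(unique) > len(marks):
        raise ValueError("not enough distinct marks for every category")
    return dict(zip(unique, marks))


def quantity_encoding(value, domain, output_range):
    """Map a number from the data domain onto a continuous visual range."""
    d0, d1 = domain
    r0, r1 = output_range
    if d0 == d1:
        raise ValueError("domain must span more than one value")
    t = (value - d0) / (d1 - d0)
    t = min(1.0, max(0.0, t))  # clamp values that fall outside the domain
    return r0 + t * (r1 - r0)


# Identity: three species (made-up labels), each assigned a separate shape.
shapes = identity_encoding(
    ["a", "b", "a", "c"], ["circle", "square", "triangle", "star"]
)
print(shapes)

# Quantity: a value of 50 on a 0-100 scale becomes a mark size between 2 and 20.
print(quantity_encoding(50, (0, 100), (2.0, 20.0)))
```

The identity mapping fails loudly when there are more categories than marks, which mirrors the point above: an identity channel has a hard limit on steps, while a quantity channel can take any value in its range.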
Adding too many steps makes an identity encoding weaker by reducing the distinguishability between items. With enough steps, we can even force it to turn into a value encoding instead. If I talk about the square or the red circle in the first image, you’d know exactly which item I mean. That becomes much harder in the second image, where you’re likely to perceive all of the objects in the first row as being of the same kind (stars), but with a different value for a secondary property (fatness). The item’s shape is still the thing that’s changing, but the different shapes are so similar to one another that we see their similarities (five points) more than we see their differences, and so the thinness or fatness of the star becomes a property that describes its value, instead of its identifying characteristic. The same thing happens with the last row: we no longer have names that easily distinguish between the different colors, and so we are more likely to see them as “more blue” or “more green” – effectively switching out identity for value.