This post is part of a larger series focused on exploring the fundamental principles of data visualization. Eventually, the collection may grow into something larger and more coherent. For now, each post simply picks up and plays with one idea related to how we represent data visually. Other posts in this series can be found using the Form to Data tag.
Sometimes, a single mark can be used to encode multiple pieces of information. The main purpose of a chart is to show relationships between data points in some visual way. To support more sophisticated analysis, we often need to show that data points belong to multiple categories at one time, so that we can understand relationships between different groups.
Each mark can represent one or many channels of information. Each channel contains information about a different aspect of the data. To show information visually, each channel must also be encoded using one of the visual variables (position, size, shape, hue, value, texture, orientation), so that each channel is assigned to a different property of the mark.
If a single mark encodes multiple channels, then we need encodings that are compatible enough to be combined within a single mark. Position is a very flexible property, and is compatible with many other visual variables, such as color, shape, or size.
Existence, Identity, and Value
Drawing a mark on a page tells us that it exists. An identity channel is information that tells us what the object is. Value channels give us a sense of relative value: they tell us about how many, how much, or how important the object is.
If there is more than one mark on the page, the identity channel helps us distinguish between them. In the image below, all of the dots have the same circular shape. We know that they are not all the same dot, because there are multiple marks on the page (existence), but we automatically assign shape to the identity channel and assume that all of the marks are the same kind because their shapes are the same.
The next image has different identities for each mark, so that the identity channel lines up with the dot’s existence. Now, we understand that identity is tied to shape, and we assume that each mark represents a different kind of thing. A good way to recognize an identity encoding is that there is usually a specific word to describe it – a star, a square, etc.
These examples also confirm that existence is different than identity. There are 5 dots in the first image (existence), even though they all look the same (identity). The second image has 5 marks (existence) that all have different shapes (identity). In the image below, there are 5 marks that are grouped by their shape (shared identities) into two kinds.
Value channels give us a different kind of information about an object. Instead of answering the question “what kind,” value channels tell us “how much.” Value is often encoded using an object’s size or color value. The image below shows a set of objects whose identity is encoded by shape (circles), and whose value is encoded with size.
Existence: an object is
Identity: what kind it is
Value: how much/many/important it is
Forcing Channels to Change Type
In some cases, we can force one kind of channel to act like another. We usually do this as a way of distinguishing between points that otherwise look the same, or to infer information that isn’t readily available.
In the image below, the dot’s position doesn’t mean anything – it doesn’t encode information that is meaningful for this visualization.
Let’s say we wanted to talk about a specific dot in this image. There is no way to tell the individual dots apart, because the identity channel has only a single value (circle), and that doesn’t help us to distinguish between them. In that case, we can force position to act like an identity channel and talk about the dot on the bottom right of the image. The dot is the mark itself (existence), the identity channel is useless for the distinction we want to make (the marks are all circles), and so the position becomes an identity channel that helps us talk about it.
In an image where each mark is a different shape, we don’t need to use position, because the identity channel distinguishes for us based on shape. Now, instead of using position to describe which object we mean, we can talk directly about the star, the triangle, or the circle.
This only works for as long as there is one mark with each identity; once there are duplicates, we are back to the initial problem, and we need some way to distinguish between objects in the same group. In the image below, we could talk about the top circle, or the right hand star.
In the image above, we can talk about the biggest circle or the smallest circle, but it is difficult to talk about the second-from-the-middle circle. The image below has only two sizes of circle, so it is very clear that “the big one” is an identity as well as a value.
Primary channels and modifier channels
Each mark has a primary channel (usually identity) that tells us the most important thing about it, and it may have one or more modifier channels to give us additional information. The way we talk about these dots helps to distinguish between primary channels and modifier channels.
Looking back at two examples from the last post, we can start to understand why some combinations of visual variables work better than others. In this image, shape and color are both used as identity channels. The two encodings agree, so we see a group of purple circles and a group of green stars.
By paying close attention to how we’re talking about the marks in this image, we can also begin to understand that there is an automatic hierarchy between channels, based on the kind of visual variables that we used to represent them. The circular purples and the star-shaped greens doesn’t make nearly as much sense as the purple circles and the green stars. This implies that we rely less on color than we do on shape to determine identity: color is a weaker encoding for the identity channel. It is more natural to use shape as the primary encoding, and color to reinforce the primary identity channel.
We can see this in action by applying color as the primary identity channel and using shape as a modifier channel that encodes a different identity relationship.
Here, we can talk about the stars and the circles and we can talk about the green marks and the purple marks, but the two sets of identities are always in conflict with each other. In this image, we have two groups of two identities (circle/star, purple/green) that never completely line up. It’s not clear which identity is primary and which is the modifier, and so the image becomes more difficult to read.
It is common for a data visualization to have several channels encoded in a single mark. Ensuring that the channel hierarchy matches the way that we naturally interpret the different visual variables helps to keep a chart clean, clear, and easy to read. The primary channel should reflect the most important identity for the mark, followed by less important value and identity channels as needed. The next post will talk about the properties of different visual variables, and how we can use their different strengths to reinforce chart hierarchy and reduce competition between different channels.