Design Theory: Thoughts on entropy

As a scientist, I am used to thinking of entropy as disorder. It is the thing that takes a clear signal and turns it into noise. When entropy is introduced into a system, information decays.

This week’s reading for my Design Theory class turned that idea on its head. We read Shannon and Weaver’s The Mathematical Theory of Communication. Instead of viewing entropy as the destroyer of order and the origin of noise, Shannon argues that entropy is information. In fact, he argues that entropy is the only quantity that can fully (and mathematically) represent information.

Coming from my perspective, this felt like a complete backflip. But, after multiple readings and thinking about it for a week, it actually makes a lot of sense.

For Shannon, entropy is a measure of the number of options available to someone who wants to send information to someone else. Each option carries a probability that it will be selected out of the many options available. If the number of options is small, the probability of selecting each one of them increases. The highest possible probability for any particular choice exists when it is the only choice to be had – in this case, the probability is 1, or 100%. If there is only one option and the user must make a choice, then there is a 100% chance that she will choose that option.

Of course, that’s what’s known as a tautology, and it’s not particularly interesting. In fact, according to Shannon, it tells us precisely nothing. It contains no information. To put it mathematically, he defines this situation to have an entropy of 0 – perfect order.
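Shannon’s definition is easy to make concrete. Here is a minimal sketch in plain Python (the probability values are my own, chosen purely for illustration) showing that a single forced choice carries zero entropy, while equally likely choices carry the maximum:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over the options."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# One option only: the choice is forced, so the "message" tells us nothing.
print(entropy([1.0]))            # 0.0 bits -- perfect order

# Four equally likely options: maximum uncertainty for four choices.
print(entropy([0.25] * 4))       # 2.0 bits

# Four options, but one is nearly certain: far less information per choice.
print(entropy([0.97, 0.01, 0.01, 0.01]))  # about 0.24 bits
```

The skewed fourth case is the interesting one: the moment one outcome becomes nearly certain, each choice carries far fewer bits, which is exactly the redundancy idea that comes next.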

For Shannon, the interesting things happen when you allow disorder into the system. Now, instead of having only one option, the user has many and must choose among them. When making one choice constrains the probability of the next, that’s called redundancy.

For instance, say the user makes a series of choices about which letters to send in her message. Once she chooses q, the probability that the next letter will be u is close to 1, provided that she is writing in English. This suggests that the u could be omitted without sacrificing the readability of the message; therefore, it is redundant. In fact, it turns out that the redundancy of the English language is about 50%, which makes it easier to catch errors without sacrificing meaning. And so, redundancy makes a communication system more robust against the effects of noise. Random fluctuations or distortions introduced by the technology (what Shannon calls engineering noise) might change the signal received by one or two letters, but redundancy ensures that the original message can still be understood.

Because it constrains the choice of the second letter (u), redundancy decreases the entropy of the system. My car navigation system does exactly this every time I enter an address – once I start typing, it greys out all the letters that it doesn’t expect me to use (because that combination is not stored in its database). This makes it faster to select the next letter, and reduces the chance of hitting the wrong key while typing.

This interface constrains the entropy of the system by reducing my options for information that I can enter to words that are found in the car’s database. This is useful, because it keeps me from being able to enter complete gibberish, or the name of a place that the car thinks doesn’t exist. It’s cleverly using the concept of redundancy to “pre-filter” my choices and reduce the introduction of “engineering noise” (i.e., me mistyping the letters). Rather than waiting until I’ve entered the wrong message and applying the rules of redundancy to figure out what I really meant, it simply removes the “errors” from my list of options and doesn’t let me make “a mistake.”
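A rough sketch of how such an interface might work (the database entries and the function are hypothetical, invented for illustration):

```python
# A toy version of the car's greyed-out keyboard: given what I've typed so
# far, only letters that continue some entry in the database stay "lit."
# These place names are made up for illustration.
DATABASE = ["WOODMANS SEAFOOD", "WOODSTOCK", "WORCESTER", "WALTHAM"]

def allowed_next_letters(typed):
    """Return the letters the interface would leave selectable."""
    matches = [name for name in DATABASE if name.startswith(typed)]
    return {name[len(typed)] for name in matches if len(name) > len(typed)}

print(sorted(allowed_next_letters("W")))     # ['A', 'O'] -- the rest greyed out
print(sorted(allowed_next_letters("WOOD")))  # ['M', 'S']
```

Each keystroke shrinks the set of legal continuations, which is precisely a reduction in entropy: the interface refuses to let the message wander outside the database’s low-entropy world.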

This is all well and good, as long as all of the possible choices can be translated and indexed quickly. It’s useful for the user to know early on that a place doesn’t exist in the database by that name. However, it can also be frustrating when you’re sure (correctly, or otherwise) that you have the name right and the car refuses to allow you to type it in. In that situation, Google’s bigger relational database is far more effective than my car’s letter-by-letter comparison.

Google lets you type in anything at all (allowing higher system entropy), and then back-calculates what it thinks you must have meant. If you mistype a letter, it “knows” enough to replace it with the correct one, using redundancy. Or maybe you’re off by more than a letter. Maybe I type in “Woodstock Seafood” and Google helpfully suggests that I might have meant “Woodman’s Seafood” instead. Because Google keeps the entropy of the system high for longer, it can tease out information from the entire message, rather than just the amount I’ve entered at the point where I make a mistake.
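Google’s actual correction machinery is far more sophisticated, but the flavor of “accept a noisy query, then back-calculate the closest stored entry” can be sketched with Python’s standard difflib module (the candidate place names are made up for illustration):

```python
import difflib

# Hypothetical database of known places.
KNOWN_PLACES = ["Woodman's Seafood", "Woodstock", "Worcester Diner"]

query = "Woodstock Seafood"  # what I typed -- plausible, but wrong

# Back-calculate the closest stored entry by string similarity.
suggestion = difflib.get_close_matches(query, KNOWN_PLACES, n=1, cutoff=0.4)
print(suggestion)  # ["Woodman's Seafood"]
```

Because the matcher sees the whole query at once, the shared “Seafood” outweighs the shared “Woodstock” prefix – the extra entropy the system tolerated up front is what lets it recover the intended message.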

And so, Shannon argues, even though entropy increases disorder, it actually provides us with the means of extracting information from the system. The car prevents me from introducing engineering noise into the system while I type, but because it reduces entropy, it also loses information about the rest of my request along the way.

In part, the car does this because it has no way of filtering out “semantic noise,” or differences in meaning based on the language used. The technical transmission of my request could be perfect – I could type exactly what I mean to type – but the car would still have no way of finding “Woodstock Seafood,” because what I am calling the restaurant doesn’t match what the car is calling it. Google’s more advanced search algorithm is equipped to filter both the engineering and semantic noise from the system, and so does a better job of extracting meaning from the interaction.

So why do scientists think of entropy mostly as noise, if it’s really information? Mathematical models work well for simple, ordered systems, and such ordered systems constitute most of the problems studied by scientists, for reasons that I’ll get to in a moment. Physicists are famous for the injunction to “assume that a cow is a sphere.” To approximate a complex physical system, sometimes you need to add several layers of abstraction. Very often, you have to blur the details before you can actually see the bigger patterns.

When you are searching for a tiny, tiny signal amidst lots of other signals (what we’ll call “background noise”), it’s easy to think of everything outside of that signal as distraction, not information. Things you can’t make sense of aren’t “information” in a problem-solving context. In order to see the tiny signal clearly, you need to remove all the other information around it.

It’s like having a conversation in a crowded room: you need to find the one message that you want to hear and separate it from the rest. Here, having a lot of entropy isn’t helpful. If you don’t know the person or the topic and context of the conversation, it’s going to be a lot harder to hear what they’re whispering to you. Fi__ ____ty doesn’t mean much if the person could be saying anything. But if you know from context that they’re talking about time, your brain can easily combine redundancy and semantics to arrive at the conclusion that they likely said “five thirty.”

Shannon is primarily concerned with the technical accuracy of the transmission. Everything else, he relegates to semantic noise, and assumes that figuring out meaning is the responsibility of the receiver.

This works with an intelligent human being talking on the phone to another intelligent human. But what about the researcher, trying to suss out the nature of a barely-detectable particle, which has no desire at all to “send” a message of any kind?

Shannon would argue that the problem is the same – if the researcher could identify a clear signal from the particle and reproduce it accurately, then he would be able to understand its nature.

And this, I think, is where the different perspectives on entropy-as-information and entropy-as-noise come in. In science, we are seldom looking at information from a single particle: the signal is simply too small to be measured above the background noise. Instead, we rely on what I’ll call a “chorus” or system of particles, all chanting the same thing at the same time, in the middle of the crowd. This makes it easier to “hear” them (measure their signal) over the crowd.

It’s very complicated to make sense of how light scatters off a single atom, but if you line a bunch of atoms up, constructive and destructive interference work together to simplify the light waves emitted, and you get discrete diffraction spots that can easily be measured.

As long as those atoms are organized and “chanting” together, you can learn about their organization by studying those diffraction spots, and that tells you something about the particles themselves.

Because of the technical difficulty of measuring complex systems, scientists focus first on organized (low-entropy) systems and build their model from those. The models are then developed to include the disorder of the system to the extent that it’s necessary to mimic observed behavior. The complexity of the model increases very quickly once disorder is introduced, and in any but the simplest systems you run out of computing power.

And so, for the scientist using an instrument to detect signal from particles, entropy in the system of particles is (or can be) a bad thing. Low entropy systems where the particles “chant together” produce a stronger, simpler signal, which is easier to distinguish from the background noise, and easier to model and reproduce. The more disorder that you introduce into the system (differences in key, timing, words chanted), the harder it is to figure out what’s going on, and the muddier the signal becomes. For these kinds of studies, keeping entropy low is critical.

But of course, the signal produced by a group necessarily contains less information than a private conversation with each individual would. Reducing the entropy does make it simpler to build a model and to measure certain properties, but it also sacrifices some of the richness of the system. In some situations, it tells you almost nothing about the individual and much more about the organization it belongs to. Sometimes, those are the same thing. Often, they’re not.

A crystal is a perfect example of a low-entropy system of study (or at least I think so, because that’s what I used to do). You grow a crystal by convincing a majority of the atoms/molecules to line up in a particular way, effectively creating a low-entropy system. This is a great little chorus of particles to study with x-ray diffraction, which tells us all about the bonds and the structures involved in the molecules (“chanters”). The whole field of materials science is based on measuring materials structures and using that to understand their properties. But there are limits.

For years, solving the crystal structures of proteins has been something of a holy grail. It wasn’t long ago that solving a single crystal structure for a single protein was enough to get you a PhD – it was that difficult to do. That’s partly because proteins are particularly bad at “chanting together.” They don’t tend to form a rigid, ordered chorus, but instead act more like a jazz group – each molecule takes on its own slightly different configuration, riffing off the basic “rules” of the group.

This additional entropy makes protein structures profoundly difficult to study, and finding methods to cope with or reduce this entropy is a very difficult task. It’s no wonder that protein crystallographers shudder at the idea of entropy containing information – it’s exactly because of entropy that it is hard for them to collect their data!

But the disorder in a protein crystal also contains information. There is good evidence that the conformations adopted by a protein in a biological system usually bear only passing resemblance to their structures in a crystal. It would be crazy to imagine that every person who sings in a chorus walks around talking in 4/4 time and the key of C all day – they sing when they’re in the chorus, and they talk the rest of the time.

Proteins do the same thing. They organize one way in a crystal, and do completely different things elsewhere. You might learn something about a particular person’s voice or the aesthetic sensibilities of the group by listening to the chorus, but you don’t expect the music to tell you everything about them. For a more nuanced understanding, you have to look at people outside of that group; either alone, or in other structures.

So, I think Shannon’s right. Information is entropy, and reducing complexity reduces the information that your system can hold. To the extent that scientists see entropy as the enemy of information, it is because we are limited by our mathematical and technological tools to the study of low-entropy systems (though this becomes less and less true by the minute). Low-entropy, ordered systems are the ones with signals strong enough to stand out from the background noise. The same logic applies outside science: you don’t build history by compiling a biography of every living person at a particular point in time (though perhaps you should…that’s a story for another day). Instead, you identify patterns and trends, and introduce layers of abstraction such as “groups,” “countries,” and “allies.”

These layers of abstraction remove individuals, but help us to understand how the system functions as a whole. In discarding the individuals, though, we also discard the majority of the information. As information designers, it’s our job to help people move between these different layers of abstraction, adjusting the level of entropy to suit different purposes at different times. (As I write this sentence, I am struck by the fact that these thoughts must have been influenced by this wonderful article on layers of abstraction that I read for my Studio class last night.)

From that perspective, information design is a lot like Google’s search algorithm, which extends Shannon’s question of technical accuracy into semantics and beyond. It’s essentially a filtering problem – we need a set of advanced filters that allows us to adjust the level of entropy in a dataset according to the specific needs and desires of the end user, at this particular moment in time.

Reducing the entropy too early causes loss of information that more advanced filters might catch. Google can figure out from the semantic context that I mean “Woodman’s” rather than “Woodstock.” My car reduces entropy too soon, and gives up on the first pass. In focusing too heavily on ensuring Shannon’s technical transmission accuracy (no typos!), it misses the lower layers in his system and discards too much information to figure out what I really mean.

And so, the question for information design is this: how do you keep the entropy and the full, glorious complexity of the system, while still maintaining our ability to separate the signal that we care about from all the background “noise”? (Which is really just all the other signals that we don’t care about right now, but might want later.)

I don’t know the answer to that, but adjustable entropy at the viewing level seems to be key. You have to simplify the protein structure (by reducing entropy) in order to know what you’re looking for when you start to study the protein behavior in biological systems. Reduction of entropy is a natural and necessary step in progressing toward a working model of a complex system. Without simplifying, you can’t identify the behaviors to study in the more complicated system. But it is only a step: as with choruses and proteins, measuring how an individual functions in a rigid system tells you very little about its behavior in “the real world.”

It is only by transitioning between different values of entropy for a system that we can understand both the complexity and the broader trends. Figuring out how to do that seems to me to be the future of information design, and of science.