This post is part of an exploratory series for a new project that will visualize data related to plants, botany, evolutionary history, and the coming of spring. Use the PlantVis tag to follow along.
As I was reading through Plant Families: A Guide for Gardeners and Botanists by Ross Bayton and Simon Maughan, I felt a need to tie all of the small facts and pieces back to a bigger picture, to be able to put them in context. There are so many fascinating things to know, but it’s hard for me to remember them if I don’t have a way of understanding how one thing relates to another. Disconnected facts are just clutter, but an organizing principle turns them into understanding. One of my favorite things about the book is its very structured approach: you tend to get the same basic data for each information type, and that gave me a place to start playing around with the data.
My first step was to explore the relative size of plant families, as I wrote about in a previous post. Next, I started trying to understand the evolutionary sequence, and to ground that in some sense of scale. I’m not so good at imagining what 500 million years looks like, or how that number relates to anything else that I should know about. So, I started collecting all of the data points for when different plant families split off, and putting those in a spreadsheet. That helped with the relative dates (Magnolias came first; citrus is one of the most recent families to differentiate, at 15 million years ago). I still didn’t have a sense of what those numbers meant, so I also collected some dates from Wikipedia to identify evolutionary milestones for the earth.
Excel is not good at this kind of chart (I was sort of hacking a scatterplot to get something on the page quickly), so none of my points had labels. That made it important to have two different point types, so that I could at least tell what kind of information I was looking at. Because so many of the points were compressed into a small part of the axis, it was also helpful to assign them different y values. The y axis doesn’t mean anything, but it helped me see which data points were my vascular plant family data, and which were related to geologic and other contextual events. And again: the point of this graph is not to be good. It’s to be quick and dirty, to give me some way of mapping my new knowledge onto something that already exists.
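The two-lane trick is easy to reproduce outside of Excel. Here is a minimal sketch, assuming each point is an object with a label, a date in millions of years ago, and a kind; the field names, the `LANE_Y` lookup, and the `toScatterRows` helper are all hypothetical, not the spreadsheet's actual layout.

```javascript
// Hypothetical row shape: { label, mya, kind }, where kind is "plant" or "milestone".
// The y values carry no meaning; they only separate the two series visually.
const LANE_Y = { plant: 1, milestone: 2 };

// Turn raw points into scatterplot rows, one lane per kind of data.
function toScatterRows(points) {
  return points.map((p) => ({
    x: p.mya,
    y: LANE_Y[p.kind],
    label: p.label,
    kind: p.kind,
  }));
}

const rows = toScatterRows([
  { label: "Citrus family", mya: 15, kind: "plant" },     // date mentioned above
  { label: "Earth forms", mya: 4540, kind: "milestone" }, // approximate, commonly cited age
]);
console.log(rows);
```

Keeping the lane assignment in one lookup object means the two series can be collapsed back onto a single axis later by changing two numbers.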
It helps to see that all of the plant divergence information clusters in relatively recent history. The far right blue dot is the earth forming, the second is the oceans forming, and the third is archaea, which were the first lifeforms. It was pretty amazing to me to see that mapped out. I hadn’t realized how quickly life began once the rock cooled enough for water to build up on the planet’s surface. The next dot (4th from the right) is the first photosynthesis, in single-celled organisms living in the oceans. The next data point shows where geologists first detect significant evidence of atmospheric oxygen in the rock data, a reminder of how profoundly photosynthesis has changed our planet.
Zooming in on more recent history, it’s interesting to see how the red data points cluster. There were a few early evolutionary branches, but things really picked up for vascular plants around 100 million years ago. I’m not sure why this is true, but it poses some interesting questions about what else was going on around that time.
The Excel plot was useful, but I really wanted something a little bit more interactive for exploring this data. To me, that was worth the extra effort of migrating to d3. I started out with basically the same chart, but this time I did put all points on the same axis (can always change it back later).
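The core of a chart like this is a scale that maps a date onto a horizontal pixel position. In d3 that would typically be `d3.scaleLinear().domain([...]).range([...])`; the sketch below is a plain JavaScript stand-in for the same idea so that it runs without the library. The `linearScale` helper, the 920px width, and the 0 to 4600 million year domain are assumptions for illustration.

```javascript
// Builds a function that maps a value in `domain` linearly onto `range`.
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Assumed chart: 920px wide, covering 0 to 4600 million years ago,
// with the present on the left and the oldest events on the right.
const xScale = linearScale([0, 4600], [0, 920]);

console.log(xScale(15));   // a recent divergence lands near the left edge
console.log(xScale(4600)); // the oldest end of the domain lands at the right edge
```

With a domain this wide, everything from the last few hundred million years is squeezed into a thin strip of pixels, which is why zooming matters so much for this data.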
Next, I decided to make them different colors, to help distinguish between the data sources and types.
Then I decided to keep the geologic milestones as context lines, and represent the plant data with green dots. I also added in hover and expand/zoom behaviors, so that I could start investigating relationships within the data. Tooltip styling and positioning really should be easy but are always a pain, so I just ignored the fact that my text is an inch or so away from my mouse. Good enough for now.
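For whenever the tooltip does get fixed, one common approach is to place it a small fixed offset from the cursor and flip it to the other side when it would run off the chart. This is only a sketch of that idea, not the code behind the chart; `tooltipPosition`, its parameters, and the 12px default offset are hypothetical, and the mouse coordinates are assumed to be relative to the chart container.

```javascript
// Returns a top-left position for the tooltip, given the mouse position
// relative to the chart container. All sizes are in pixels.
function tooltipPosition(mouse, tooltipSize, containerSize, offset = 12) {
  let left = mouse.x + offset;
  let top = mouse.y + offset;

  // Flip to the other side of the cursor if the tooltip would overflow.
  if (left + tooltipSize.width > containerSize.width) {
    left = mouse.x - offset - tooltipSize.width;
  }
  if (top + tooltipSize.height > containerSize.height) {
    top = mouse.y - offset - tooltipSize.height;
  }

  // Never place the tooltip outside the container's top-left corner.
  return { left: Math.max(0, left), top: Math.max(0, top) };
}

const pos = tooltipPosition(
  { x: 100, y: 50 },
  { width: 120, height: 40 },
  { width: 920, height: 400 }
);
console.log(pos); // { left: 112, top: 62 }
```

The usual cause of a tooltip drifting away from the mouse is mixing page coordinates with container coordinates, so it helps to decide on one coordinate system before calling a helper like this.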
Once I started getting into this level of detail, I could start to see relationships between data points, and that always leads to new questions. I didn’t get too deep into those (trying not to get too distracted here), but I was curious about the size of the different families since I’d been working on that data, so I pulled that into the y axis, too. Daisies are the biggest family in terms of number of species, and also relatively recent in the broad scheme of things.
I’m not sure that there is any real correlation to be pulled out here, but this chart is giving me another way to identify context and relationships for the data, to see if there is anything interesting while I explore. It’s also a good way to check for problems in the data: that bunch of points at x = 0 is just families for which there is no data. It’s much easier to see that here than in my spreadsheet of dates.
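The same check can be made explicit before plotting, so the undated families don't pile up at x = 0. A minimal sketch, assuming each row has a family name and a divergence date in millions of years ago; the `splitByMissingDate` helper, the field names, and the placeholder rows are made up for illustration.

```javascript
// Hypothetical row shape: { family, mya }, where mya is the divergence date
// in millions of years ago. A value of 0, null, or NaN is treated as "no data".
function splitByMissingDate(rows) {
  const isMissing = (r) => !Number.isFinite(r.mya) || r.mya <= 0;
  return {
    missing: rows.filter(isMissing),
    dated: rows.filter((r) => !isMissing(r)),
  };
}

const { missing, dated } = splitByMissingDate([
  { family: "Citrus family", mya: 15 },          // date mentioned earlier in the post
  { family: "Placeholder family A", mya: 0 },    // made-up row standing in for a blank cell
  { family: "Placeholder family B", mya: null }, // made-up row with no value at all
]);
console.log(missing.map((r) => r.family));
```

Plotting only `dated` and listing `missing` separately keeps the gaps visible without letting them masquerade as families that diverged today.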
We’re certainly not done with this one yet, but this was enough to give me a sense of what’s in the data to work with, and to satisfy my curiosity about how all these different time points fit in. This needs a lot more work to be a useful data browser, but as a learning tool and contextual aid it gave me what I needed for now. I do think it’s interesting to compare the family size and time scale data, and it’s helping me to learn about botany, but both topics are a bit outside of my initial interest in mapping out the timeline for the coming of spring. They’re sort of different axes in a similar variable space, and I’m not sure yet how (or if) they’ll fit in. It’s worth taking a quick look to see what’s in there, but my guess is that these will turn into a second project that’s fairly separate from the growing season data. This was enough to satisfy my goal of understanding the book data context and to get a sense of what’s available here, so I decided to take some notes and leave it here for now.