This post is part of an exploratory series for a new project that will visualize data related to plants, botany, evolutionary history, and the coming of spring. Use the PlantVis tag to follow along.
The spring project is about learning as much as it is about data vis, so alongside the basic project outlining and initial data explorations I've described in the last couple of posts, I've been reading several books about plants. Plant Families: A Guide for Gardeners and Botanists by Ross Bayton and Simon Maughan has been very helpful for understanding how plants are classified and grouped, and I'm starting to learn a little more about their evolutionary history as well. The book is beautifully illustrated and takes you on a systematic walk through the plant kingdom, with plenty of fascinating factoids along the way.
As I read, I started to feel like I needed a better sense of the scale of the different plant families and the relationships between them, to give all of the details some context. The book provided a good collection of data that I've manually compiled into a spreadsheet, but its coverage was not complete, and I didn't really want to build out the hierarchical relationships by hand. Fortunately, someone else has already done a thorough job of that, and a dataset is available for download from World Flora Online. It was too big to work with in Excel, so I pulled it into R to do some high-level analysis and get a sense of what's in there. For now, I focused only on the accepted names in the dataset, rather than including all of the entries that are currently open for debate (it turns out that botanists are still debating a lot!).
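The filtering step, keeping only accepted names and then counting how many species each family holds, is simple enough to sketch. The post did this in R; the version below is a minimal Python sketch under some loudly stated assumptions. The column names (`taxonRank`, `family`, `genus`, `taxonomicStatus`) are modeled on Darwin Core terms and are assumptions, so check them against the header of the file you actually download. The rows are toy data with illustrative statuses, not real WFO records.

```python
import csv
import io
from collections import Counter

# ASSUMPTION: column names modeled on Darwin Core terms; verify against the real file header.
HEADER = ["taxonID", "scientificName", "taxonRank", "family", "genus", "taxonomicStatus"]

# Toy rows for illustration only; statuses and placeholder names are NOT taken from WFO.
TOY_ROWS = [
    ["t1", "Aster amellus", "species", "Asteraceae", "Aster", "Accepted"],
    ["t2", "Aster alpinus", "species", "Asteraceae", "Aster", "Accepted"],
    ["t3", "Bellis perennis", "species", "Asteraceae", "Bellis", "Accepted"],
    ["t4", "placeholder synonym", "species", "Asteraceae", "Aster", "Synonym"],
    ["t5", "Aster", "genus", "Asteraceae", "Aster", "Accepted"],
    ["t6", "Ophrys apifera", "species", "Orchidaceae", "Ophrys", "Accepted"],
    ["t7", "placeholder unchecked", "species", "Orchidaceae", "Ophrys", "Unchecked"],
    ["t8", "Lupinus polyphyllus", "species", "Fabaceae", "Lupinus", "Accepted"],
]
toy_tsv = "\n".join("\t".join(row) for row in [HEADER] + TOY_ROWS) + "\n"


def accepted_species_per_family(lines):
    """Count species-rank rows whose status is 'accepted', grouped by family.

    `lines` is any file-like object or iterable of tab-separated lines.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    counts = Counter()
    for row in reader:
        status = row["taxonomicStatus"].strip().lower()
        rank = row["taxonRank"].strip().lower()
        # Skip synonyms, unchecked names, and higher ranks such as genus rows.
        if status == "accepted" and rank == "species":
            counts[row["family"]] += 1
    return counts


counts = accepted_species_per_family(io.StringIO(toy_tsv))
for family, n in counts.most_common():
    print(family, n)
```

For the real download, the same function should work on `open(path, encoding="utf-8", newline="")`, assuming the file is UTF-8 and tab-separated; streaming row by row also sidesteps the size problem that ruled out Excel.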
I'm not very familiar with R, and we're still in the quick-and-dirty exploratory phase, so I was really just looking for a couple of default visualizations to give me a sense of what's in there: how the data is distributed, and how the major groups differ. An out-of-the-box bubble map shows that there are some very large families and many smaller ones. Asters are the largest family (#55), with orchids, beans, and the madder family not far behind.
That's about all the bubble map had to offer without modifications, but the treemap gave a slightly more ordered (and labeled) view, which makes it easier to see how many families there are and how their sizes compare.
The data includes a hierarchical grouping, so I was also able to add a second layer and look at the distribution of genera within each family.
I'm not really sure yet how (or if) I will use the species distribution data in the first part of the project, but it's helpful to have a sense of what's in there and what it might be able to do. Knowing that the data is already structured and readable makes it a lot more attractive, especially because I can reshape it in R and export it for use in D3 or other software as I go.
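As an example of the kind of reshaping that could feed D3, here is a minimal Python sketch (the post used R) that nests family and genus into the `{name, children}` shape that `d3.hierarchy` reads by default, with a species count stored as `value` on each genus leaf. It assumes you already have one `(family, genus)` pair per accepted species, for instance from a filtering step like the one above; the function name `to_d3_hierarchy`, the root label, and the toy pairs are hypothetical.

```python
import json
from collections import defaultdict


def to_d3_hierarchy(pairs, root_name="Plants"):
    """Nest (family, genus) pairs into {name, children} dicts.

    Each genus leaf carries a 'value' equal to the number of pairs
    (i.e. accepted species) seen for that genus. Families and genera
    are sorted alphabetically so the output is stable between runs.
    """
    nested = defaultdict(lambda: defaultdict(int))
    for family, genus in pairs:
        nested[family][genus] += 1
    return {
        "name": root_name,
        "children": [
            {
                "name": family,
                "children": [
                    {"name": genus, "value": n}
                    for genus, n in sorted(genera.items())
                ],
            }
            for family, genera in sorted(nested.items())
        ],
    }


# Toy input for illustration only: one (family, genus) pair per accepted species.
toy_pairs = [
    ("Asteraceae", "Aster"),
    ("Asteraceae", "Aster"),
    ("Asteraceae", "Bellis"),
    ("Orchidaceae", "Ophrys"),
    ("Fabaceae", "Lupinus"),
]
tree = to_d3_hierarchy(toy_pairs)
print(json.dumps(tree, indent=2))
```

On the D3 side, the usual pattern is `d3.hierarchy(tree).sum(d => d.value)` followed by a `d3.treemap()` layout, which would give the same two-layer family-and-genus view as the R treemap.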
Right now, I'm thinking that this might turn into two separate projects: one cluster of information around the first topic, related to bloom times and the coming of spring, and another (much larger and more complex) project related to plant taxonomies, families, and relationships between groups. This often happens in the exploratory phase; I spin out enough side-project ideas to keep me going for a decade or more, and then the trick is deciding how deep to go and which ones to pursue. Right now, I'm just skimming the surface, trying to get a sense of what's out there and what's harder or easier to use. The families project wouldn't be possible if I had to assemble the information myself, but with such good data readily to hand, it is tempting to see what it might have to teach me and how it might inform the other parts of the project. I've been meaning to learn a bit of botany to support my interest in botanical drawing anyway, and in this phase of a project I'm always looking for places where different interests converge.
For now, I think that’s all I need from this dataset, but I’m definitely putting a bookmark on this one as an interesting area to come back and explore.