This post is part of an exploratory series for a new project that will visualize data related to plants, botany, evolutionary history, and the coming of spring. Use the PlantVis tag to follow along.
One of the first datasets I found when I started this project was from the National Phenology Network. Phenology is the study of the timing of plant lifecycle events – when a particular plant blooms, when leaves change color in the fall, etc. The NPN has citizen-science data for lots of plant events, and you can filter it by geography as well.
I downloaded a dataset for Massachusetts, did some initial exploration, took a few averages using pivot tables in Excel, and then put together a couple of quick charts just to get a sense of what’s in there. Each plant type has multiple data points (bloom time, first leaves, fall color, etc.), so I started out with a single line in the chart for each plant type. American Beech is #1 and Yellow Birch is #46, for instance.
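For anyone who would rather script the averaging step than build the pivot table by hand, here is a minimal sketch of the same idea: group the observations by plant and phenophase, then take the mean day of year for each pair. The column names (`Common_Name`, `Phenophase_Description`, `First_Yes_DOY`) are assumptions about what an NPN-style export might contain, so pass whatever headers your actual download uses. The sample rows are made-up placeholders, not real observations.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def load_rows(f, plant_col="Common_Name", phase_col="Phenophase_Description",
              doy_col="First_Yes_DOY"):
    """Read (plant, phase, day_of_year) tuples from a CSV file object.

    The default column names are assumptions; override them to match
    the headers in your own export.
    """
    rows = []
    for rec in csv.DictReader(f):
        value = rec[doy_col].strip()
        if not value:  # skip observations with no recorded day of year
            continue
        rows.append((rec[plant_col], rec[phase_col], int(value)))
    return rows


def pivot_mean_doy(rows):
    """Average day of year for each (plant, phase) pair.

    Roughly equivalent to an Excel pivot table with plants as rows,
    phases as columns, and "Average of day of year" as the value field.
    """
    days = defaultdict(list)
    for plant, phase, doy in rows:
        days[(plant, phase)].append(doy)
    table = defaultdict(dict)
    for (plant, phase), values in days.items():
        table[plant][phase] = mean(values)
    return dict(table)


# Made-up placeholder rows, not real NPN observations.
SAMPLE_CSV = """Common_Name,Phenophase_Description,First_Yes_DOY
Plant A,Open flowers,130
Plant A,Open flowers,140
Plant A,Breaking leaf buds,110
Plant B,Breaking leaf buds,100
Plant B,Colored leaves,
"""

table = pivot_mean_doy(load_rows(io.StringIO(SAMPLE_CSV)))
for plant, phases in sorted(table.items()):
    print(plant, phases)
```

Each plant then becomes one row of the chart, with one point per phenophase at its average day of year.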
We’re definitely in the quick-and-dirty phase of the project, so this is more about exploring the dataset and getting a sense of whether it’s worth pursuing than about making a good graphic. Still, a few small tweaks make it a lot easier to see what’s in this cluster of points. Just getting rid of the symbols removes a lot of the clutter and makes the data points easier to scan.
Aligning the color encoding with the kinds of colors you might expect to see for each phase (bright green for breaking leaves, darker green for full leafing, pink for open flowers, orange for colored leaves in the fall) also helps orient the eye for faster scanning, and makes it easier to separate out the different distributions. I can already start to see spring vs. fall here, and I notice right away that fruits actually ripen in the winter for some plants. Some plants leaf first; others bloom first, and there’s quite a distribution of fruiting times as well.
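The color choices above amount to a small lookup from phenophase to color, with a neutral fallback for phases that don’t have an intuitive color (fruiting, for example). A minimal sketch, assuming hypothetical phase labels; the hex values are illustrative picks, not anything from the NPN data.

```python
# Hypothetical phenophase labels mapped to "expected" colors.
# Hex values are illustrative choices, not taken from the dataset.
PHASE_COLORS = {
    "Breaking leaf buds": "#7CB342",  # bright green
    "Leaves": "#2E7D32",              # darker green (full leafing)
    "Open flowers": "#F48FB1",        # pink
    "Colored leaves": "#FB8C00",      # orange (fall color)
}
FALLBACK_COLOR = "#9E9E9E"  # neutral gray for any phase without a mapping


def color_for(phase):
    """Return the display color for a phenophase, or gray if unmapped."""
    return PHASE_COLORS.get(phase, FALLBACK_COLOR)


# One color per series, in the order the series will be drawn.
series = ["Breaking leaf buds", "Open flowers", "Ripe fruits"]
colors = [color_for(p) for p in series]
print(colors)
```

Keeping the mapping in one place means every chart in the series uses the same colors for the same phases, which matters once there are several sketches to compare.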
There are definitely some interesting things in here to tease out later, but the dataset itself is actually quite limited in its coverage. The plants listed are mostly trees, plus a few wildflowers and a couple of garden plants, which isn’t really enough to give the broad picture of spring that I’m looking to create. The smaller plants tended not to have as much detailed phenophase information, and in some cases there were so few observations that it was hard to be sure of the quality of the data. I think this is a very interesting dataset, and there is probably more to be pulled out of it, but it will require more careful cleaning and analysis to decide what’s usable. Because of the level of effort required to move beyond the sketching/exploration phase, I decided to put this dataset aside until I have a clearer idea of what’s out there and what I’m looking for in the final charts.