ResearchMethods: Defining narrative and scope

I think it’s time to start narrowing down the narrative flow of my Research Methods project, which should help reduce the amount of reading and data processing it requires. At this point, the dominant points seem to be:

  • Soil is a complex system, and critically important for both food security and climate change.
  • Soil has multiple components, whose proportions vary according to soil type.
  • The presence of different components affects the soil’s stability (resistance to erosion), moisture retention (which reduces the need for irrigation and mitigates flooding), and fertility/nutrient availability (the ability to support plant life, which is important for agriculture).
  • Soil mineral content is largely determined by geography. Organic content is largely determined by land use practices and climate (esp. availability of water).
  • Organic inputs to the soil vary according to ecosystem, but come primarily from plants that live in the soil and, in agricultural lands, from manure supplementation.
  • Plants provide organic material to the soil and use nutrients from the soil, which are metabolized/released by soil microorganisms.
  • Soil biodiversity is not well characterized, but is thought to be even greater than above-ground biodiversity. There is a complex interrelationship between plants and soil bacteria, which may have important effects on plant health and output. Much as humans depend on their microbiome, plants have strong symbiotic relationships with microorganisms that may be essential for their health. There is evidence for a chemical communication system between plants and their immediate environment, likely used to regulate the populations of beneficial soil bacteria. Some believe that the next step in increasing agricultural productivity will be manipulating this relationship between plants and soil bacteria. The populations of different classes of bacteria have been found to vary depending on the specific plant, and also on the location sampled along the plant root.
  • As the primary engine of soil decomposition, soil microorganisms also play a dominant role in controlling soil greenhouse gas emissions, which are an important aspect of climate change.
  • Soil organic carbon (SOC) is an important sink for atmospheric carbon. Heavily tilled lands tend to have lower SOC levels, and therefore a reduced capacity to mitigate global warming and climate change. Heavily used land also tends to be less fertile and more prone to erosion.
  • Population growth requires a constantly growing food supply. Over the past several decades, this demand has been met by increased agricultural productivity from modern agricultural techniques. Industrial agriculture has increased the output per area of agricultural land, but it has also had adverse effects (fertilizer runoff that causes dead zones, and a general loss of soil productivity). This enhanced productivity appears to have begun to level off, meaning that we may be reaching the upper limit for the amount of food that can be grown on a single acre of land. This matters because it means that the cultivated land area will have to increase to meet the demands of a growing population.
  • There is also good evidence that modern fertilizers increase emissions of nitrous oxide (N2O), another greenhouse gas. High-till agriculture also speeds up the release of soil carbon as CO2 (decreasing SOC, as stated above).
  • Other agricultural emissions include methane, most of which is released by livestock and widespread fertilization of land using manure.
  • Urban and high-traffic areas seal soil beneath them, either through paving, building, or compaction. This makes the land less able to absorb water, leading to an increased risk of flooding.
  • Forests and non-cultivated lands are the primary terrestrial carbon sinks, and have the most robust soil ecosystems. Deforestation driven by urban and agricultural expansion is a threat to forests worldwide, but especially in developing nations.
  • Looking at trends in deforestation is important – developing nations are simply deforesting later than the rest of the world. (Compare Brazil and the US.)

That’s not particularly narrow, but it does help me to see the scope of the data that I should be looking to find. I probably don’t need a detailed breakdown of soil type by country, and possibly not even by continent. It’s an important part of the story, but soil type isn’t really the story itself. Change in agricultural practices is actually a much larger piece of the story I want to tell, and so likely deserves a larger amount of space (and data) devoted to it.

I’ve also started thinking of this as a two-part project: one part for my Research Methods class, and the other for myself. The class requires that we create a poster to discuss the impact of our particular “object” of interest. That medium gives me a very specific set of criteria, and a limited amount of space to address the issue. For a poster, I really need to narrow this story down to a point, and just focus on getting the core message across, as I did with the monarch butterfly poster last semester.

The second part of the project is more ambitious. The poster format doesn’t really allow me to take advantage of the full dataset (including comparisons between countries), and it lends itself to a didactic rather than an exploratory approach. I am more interested in exploratory graphics, though, and so I’ve been thinking about how I could expand the poster content to make a layered, interactive graphic using some of the programming we’ve been learning for the web.

The interactive web format does lend itself to comparisons between countries, structured using multiple interactive frames that each tell a piece of the story. Rather than trying to show all of the points listed above in one view to be read “at a glance,” an exploratory interactive graphic would frame each one as a separate piece. The user can then explore each one using the variables provided and make their own connections about what’s going on in the data.

My thinking about how to frame this project was initially shaped by Fathom’s “What the World Eats” graphic for National Geographic. Their side-by-side country comparisons are really interesting to me, and I’d like to incorporate something similar into my final design. I think that’s more appropriate for an interactive graphic than a poster, though, so that section may have to wait for the later version.

Daniel Muller from the New England Journal of Medicine gave a talk at Northeastern last week, and I was really inspired by his work as well. His graphics include a scrolling navigation bar, which I thought was a very effective way of helping a user navigate through a narrative independently. I was also impressed by his use of tooltips and interactivity to increase the depth of the data available, and the multi-pane layout of his graphics.

I asked Daniel how he approached the process of creating such complicated visualizations, and he said that he focused first on which variables were most important for the user to be able to control, and then built the specific graphics from there. That answer is very appealing to the scientist side of me, and is really helping me to frame the question of what data I want to collect.

Deciding which data is worth curating, and at what level of detail, is the most difficult part of this project. The World Bank data is very clean and uniformly formatted, so it’s been a great starting place to play with, but it doesn’t contain everything that I need. Several of the soil type databases I’ve looked at have had incomplete or non-aggregated information, and I’m not sure that it’s worth the effort to compile them, given the story that I want to tell. This morning I was looking at the website for the statistics division of the Food and Agriculture Organization of the United Nations (FAO), and I think that their data is much more likely to have the level of detail and kinds of information that I want.

For the poster, I’ll need to distill all of the country/region and trend data down into just a handful (maybe 5?) of simple graphics. If I’m working toward a more detailed, interactive visualization, then the full time range and individual country data will likely be useful.

Then, it becomes a question of how to process and store all of that data in an accessible way. For the Research Methods class, we’ve been using R to process a .csv and export visualizations, but loading a huge .csv can be slow in web applications, and I’d need multiple different versions to support each graphic. If I’m interested in a truly complex interactive graphic, I’m thinking that building a database might be the way to go. Of course, I have no idea how to do that, but it’s just one more thing to figure out…
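A minimal sketch of the database idea, assuming Python’s built-in `sqlite3` module (a swapped-in choice, not something from the class, which has been using R) and a hypothetical long-format CSV with `country`, `year`, `indicator`, and `value` columns. The function names, the indicator name, the country names, and all of the values are placeholders, not real data. The rows are loaded and indexed once; after that, each interactive graphic asks only for the slice it needs, and the same table can produce the simple averaged summaries a poster would use.

```python
import csv
import io
import sqlite3

# Placeholder data in a hypothetical long format (one row per country,
# year, and indicator). Names and values are illustrative only, not
# real measurements.
SAMPLE_CSV = """country,year,indicator,value
Country A,1990,forest_area_pct,50.0
Country A,2000,forest_area_pct,45.0
Country B,1990,forest_area_pct,30.0
Country B,2000,forest_area_pct,31.0
"""


def build_database(csv_text: str) -> sqlite3.Connection:
    """Load long-format CSV text into an in-memory SQLite table and index it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations "
        "(country TEXT, year INTEGER, indicator TEXT, value REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    # Convert types explicitly so years and values are stored as numbers.
    rows = (
        (r["country"], int(r["year"]), r["indicator"], float(r["value"]))
        for r in reader
    )
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?)", rows)
    # An index on the filter columns keeps per-graphic queries fast.
    conn.execute(
        "CREATE INDEX idx_obs ON observations (indicator, country, year)"
    )
    conn.commit()
    return conn


def series(conn: sqlite3.Connection, indicator: str, country: str):
    """Return (year, value) pairs for one indicator and country, by year."""
    cur = conn.execute(
        "SELECT year, value FROM observations "
        "WHERE indicator = ? AND country = ? ORDER BY year",
        (indicator, country),
    )
    return cur.fetchall()


def yearly_mean(conn: sqlite3.Connection, indicator: str):
    """Unweighted average of an indicator across all countries, per year."""
    cur = conn.execute(
        "SELECT year, AVG(value) FROM observations "
        "WHERE indicator = ? GROUP BY year ORDER BY year",
        (indicator,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = build_database(SAMPLE_CSV)
    print(series(db, "forest_area_pct", "Country A"))
    print(yearly_mean(db, "forest_area_pct"))
```

Here `":memory:"` keeps everything in RAM for the demonstration; connecting to a file path instead would keep the database between sessions, so the slow CSV load would only need to happen once. `yearly_mean` is a plain unweighted average across countries, which may or may not be the right summary for a given indicator.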