Creating a Map for Organic Dairy

Using Python to make a multivariable map

4/6/2023, 3 min read

I've been interviewing for a few different positions and have been thinking of a way to showcase the skills I have gained over the years. One of the skills I have been working on is data visualization. The position I am interested in is related to the organic dairy industry, so I decided to have a look at what kind of visual data exists for the space.

There wasn't much. I was able to find a visualization published by the USDA in 2011, but it wasn't quite how I would choose to represent the data (figure below).

The USDA visualization had some good elements, but left a lot to be desired.

What I liked

  • It was visually appealing: the unusual combination of a visualization wheel and a map was eye-catching.

  • The colors were well chosen and very distinct

What I didn't like

  • It took significant time and effort to understand the visualization

  • It was hard to make relative comparisons between regions and farm indicators

I decided to create my own updated map based on a recently compiled USDA report (2022), the Certified Organic Survey 2021. This is the latest dataset available from the USDA, and the figures I used came from the section of the report detailing the production of organic milk throughout the country. The process of creating the map was relatively straightforward. I coded in Python, with help from ChatGPT for the parts of the visualization I did not know how to do and for debugging.

Step 1 - Digitizing the data

I took the data from the PDF report and created a CSV file from it.

Step 2 - Cleaning the data

I rectified the missing data and added the states that were missing from the report, giving them zeroed data values. The zeroed rows are important for the mapping.
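As a rough illustration of the "added missing states" part, here is a minimal pandas sketch. The column names, the state list, and every number in it are placeholders I made up for the example, not values from the USDA report, and a left merge is just one way to do this.

```python
import pandas as pd

# Hypothetical rows standing in for the digitized CSV.
# The numbers are placeholders, not figures from the USDA report.
milk = pd.DataFrame(
    {
        "state": ["California", "Wisconsin", "Vermont"],
        "farms": [100, 400, 150],
        "production": [900.0, 500.0, 120.0],
        "sales": [300.0, 200.0, 40.0],
    }
)

# Every state that should appear on the map (shortened for illustration).
map_states = ["California", "Nevada", "Vermont", "Wisconsin"]

# A left merge keeps one row per map state; states absent from the
# report come back as NaN and are then set to zero.
cleaned = (
    pd.DataFrame({"state": map_states})
    .merge(milk, on="state", how="left")
    .fillna(0)
)

print(cleaned)
```

With this in place, every state drawn on the map has a matching row in the data, even if all of its values are zero.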

Step 3 - Shapefile & Boundaries

Since I did this without ArcGIS, I had to download a shapefile and figure out how to use it within Python. It was relatively straightforward once I figured out where I could download it from (Census Data - https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html). An additional consideration was what to include. Since Alaska and Hawaii had no data points, I excluded them from the map.
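A sketch of how this step can look with geopandas is below. It assumes the downloaded file has a "STUSPS" column holding two-letter postal codes, which is worth checking against the columns of whichever file you download. The file path in the usage comment is hypothetical, and the helper names are my own.

```python
# Assumption: "STUSPS" is the postal-code column in the downloaded
# boundary file. Inspect states.columns to confirm before relying on it.
EXCLUDED = {"AK", "HI"}


def load_states(path):
    """Read a state boundary shapefile into a GeoDataFrame."""
    import geopandas as gpd  # imported here so the filter below works without it

    return gpd.read_file(path)


def drop_excluded(states, excluded=EXCLUDED, code_column="STUSPS"):
    """Return only the rows whose postal code is not in `excluded`."""
    keep = ~states[code_column].isin(excluded)
    return states[keep].reset_index(drop=True)


# Usage (hypothetical path):
# states = drop_excluded(load_states("path/to/state_boundaries.shp"))
```

The boundary file may also contain territories such as Puerto Rico. If so, their codes can be added to the excluded set in the same way.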

Step 4 - Choose the Visualization Parameters

The dataset provided information on sales, production volume, and the number of farms, which I chose as visualization parameters. To accommodate the varying units and scales, I normalized the data for display purposes.
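To make the normalization concrete, here is one simple way to do it: divide each parameter by its own maximum so that all three land on a 0 to 1 scale. This scheme is an assumption for the sake of the example, and the numbers are placeholders rather than figures from the report.

```python
import pandas as pd

# Placeholder values, not figures from the USDA report.
data = pd.DataFrame(
    {
        "state": ["A", "B", "C"],
        "sales": [300.0, 200.0, 0.0],
        "production": [900.0, 450.0, 0.0],
        "farms": [100.0, 400.0, 0.0],
    }
)

params = ["sales", "production", "farms"]

# Divide every column by its own maximum. The division aligns on
# column names, so each parameter is scaled independently.
normalized = data.copy()
normalized[params] = data[params] / data[params].max()

print(normalized)
```

After this step the state with the largest value for a parameter gets 1.0, and the zero-filled states stay at 0, so bars for different parameters can share one height scale.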

Step 5 - Plot the Visualizations on the Map

Drawing from my previous experience overlaying bar plots on map data in the FAO map visualizer, I employed a similar technique to visualize all three data parameters for each state. Using the cleaned, normalized data, I created bar plots from the centroids of each state. I assigned separate colors to the different data parameters and added a legend with relevant information, as well as an informative title.
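The bars-at-centroids idea can be sketched with matplotlib alone. The helper name, colors, bar sizes, coordinates, and values below are all placeholders of my own. In a real run the state outlines would be drawn first with geopandas, and the x and y positions would come from the centroid of each state's geometry.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# One color per parameter (arbitrary choices for this sketch).
COLORS = {"sales": "tab:blue", "production": "tab:orange", "farms": "tab:green"}


def draw_state_bars(ax, x, y, values, bar_width=0.4, max_height=2.0):
    """Draw one bar per parameter, side by side, rising from (x, y).

    `values` maps a parameter name to a normalized value in [0, 1].
    Widths and heights are in map units.
    """
    start = x - bar_width * len(values) / 2  # center the group on the point
    for i, (name, value) in enumerate(values.items()):
        ax.bar(
            start + i * bar_width,   # left edge of this bar
            value * max_height,      # bar height
            width=bar_width,
            bottom=y,                # bars start at the centroid's y
            align="edge",
            color=COLORS[name],
        )


fig, ax = plt.subplots(figsize=(10, 6))

# Placeholder centroids and normalized values for two made-up states.
centroids = {
    "State A": (-120.0, 37.0, {"sales": 1.0, "production": 1.0, "farms": 0.25}),
    "State B": (-90.0, 44.5, {"sales": 0.67, "production": 0.5, "farms": 1.0}),
}
for x, y, values in centroids.values():
    draw_state_bars(ax, x, y, values)

# One legend entry per parameter rather than one per bar.
handles = [Patch(color=color, label=name) for name, color in COLORS.items()]
ax.legend(handles=handles)
ax.set_title("Placeholder title")
```

Calling plt.show() or fig.savefig(...) afterwards displays or saves the figure. The bar width and maximum height are in map units, so they need tuning to the coordinate system of the shapefile.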

Overall, I am pleased with the final visualization. The design is intuitive, and the color palette is visually appealing. The presentation of the data makes it easier to draw conclusions about production numbers and the types of organic milk producers in different regions. For instance, larger organic cattle operations are more prevalent in the West, while smaller and more numerous operations dominate the Midwest and East.

As an additional note, the current structure of this visualization could easily be adapted to create a more interactive experience. For example, a graphical user interface could be developed, allowing users to hover over an interactive map and click on individual states to access further information. This functionality would be particularly useful for an interactive website.

Language used: Python

Libraries used: pandas, geopandas, and matplotlib