Where Did All the Bees Go? (Part 1)

A Deep Dive into Bee Colony Loss with CanvasXpress and Plotly in R

Today we’ll be looking at creating some data visualizations in both Plotly and the Javascript-based visualization library CanvasXpress. Both are robust libraries with lots of similar features, so it’ll be fun to see what it’s like to try and recreate certain plots in both.

First, let’s load the appropriate libraries. We’ll need the Tidyverse for some data wrangling.

library(tidyverse)
library(plotly)
library(canvasXpress)

Our dataset today was obtained from the USDA’s 2022 release of the annual Honey Bee Colonies report (https://usda.library.cornell.edu/concern/publications/rn301137d?locale=en). We can grab some of the cleaned data from the TidyTuesday Github Repository.

# get data from tidytuesday's github 
colony_data <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-11/colony.csv')

stressor_data <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-11/stressor.csv')

Scatterplots

Let’s start off with a basic visualization of which states have the most bee colonies and how many they’ve lost in total since 2015. We’ll be using scatterplots to visualize this data to get a sense of how the different libraries handle basic visualization.

# prepare data 
colony_data_grouped <- colony_data %>%
    mutate(state = as.factor(state)) %>%
    select(c("state", "colony_max", "colony_lost", "colony_n")) %>%
    group_by(state) %>%
    na.omit() %>%
    summarize_all(sum)

# take a look at the data
head(colony_data_grouped)

## # A tibble: 6 × 4
##   state       colony_max colony_lost colony_n
##   <fct>            <dbl>       <dbl>    <dbl>
## 1 Alabama         214000       32710   197500
## 2 Arizona         734500      132300   706500
## 3 Arkansas        561000       84300   539500
## 4 California    29830000     3456000 23510000
## 5 Colorado        632500       94080   489500
## 6 Connecticut      89400        6450    87900

All Plotly plots first begin with initializing a Plotly object using plot_ly(). After this is done you then build upon the Plotly object with your plot elements. We have added in an additional fit line so we can better visualize the trend. In Plotly we do this by creating a linear model and then plot it using add_lines(). We then add our aesthetic configurations at the end with layout().

# build initial Plotly plot
fig <- plot_ly(colony_data_grouped,
               x = ~colony_lost,
               y = ~colony_n,
               type = 'scatter',
               mode = 'markers')

# add linear regression model  
fit <- lm(colony_n ~ colony_lost, data = colony_data_grouped)

fig <- fig %>% add_lines(x = ~colony_lost, y = fitted(fit))

# add layout arguments and aesthetics
fig %>% layout(yaxis = list(title = "Number of Bee Colonies"),
               xaxis = list(title = "Number of Colonies Lost"),
               title = list(text = "Total Number of Bee Colonies by State Since 2015"),
               showlegend = FALSE)

Now, let’s try recreating something similar in CanvasXpress.

CanvasXpress was built initially with biological data in mind, so it tends to work better with wide data formats. An example of this is having samples as columns, and variables as rows. CanvasXpress uses rownames to match up the data, so reshaping the data to emphasize row and column names are important. If you would like to learn more about wide datasets, you can check out this article.

# group data 
colony_data_grouped <- colony_data %>%
    mutate(state = as.factor(state)) %>%
    select(c("state", "colony_max", "colony_lost", "colony_n")) %>%
    group_by(state) %>%
    na.omit() %>%
    summarize_all(sum)

# reshaping the data so that states are rownames
cx_colony_data_grouped <- colony_data_grouped %>%
    select(c(colony_n, colony_lost, state)) %>%
    column_to_rownames("state")

# take a look at the data
head(cx_colony_data_grouped)

##             colony_n colony_lost
## Alabama       197500       32710
## Arizona       706500      132300
## Arkansas      539500       84300
## California  23510000     3456000
## Colorado      489500       94080
## Connecticut    87900        6450

Unlike Plotly, CanvasXpress plots can all be configured using the canvasXpress() function to build our plot object. CanvasXpress can take yAxis and xAxis arguments, but I personally found it easier to model the data and then feed it in. CanvasXpress has a very useful function to add a regression line to the plot without the need for a pre-existing model. This is because of the extensive data modelling we did earlier, which makes it easier for CanvasXpress to make assumptions about our data.

canvasXpress(data = cx_colony_data_grouped,
             graphtype = "Scatter2D",
             afterRender = list(list("addRegressionLine")),
             title =  "Total Number of Bee Colonies by State Since 2015",
             yAxisTitle = list("Number of Bee Colonies"),
             xAxisTitle = list("Number of Colonies Lost"))

We can see that although CanvasXpress requires a bit more mindfulness with regards to data wrangling, we don’t have to go through the layering approach that Plotly typically requires.

Stacked Bar Chart

Let’s try to create a stacked bar chart. For this one, let’s take a look at the stressors involved in bee colony losses and their percent contributions. For the Plotly visualization we needed to do some wrangling of the data to used summarised values.

# prepare the data 
stressor_data_filtered <- stressor_data %>%
    mutate(stressor = as.factor(stressor_data$stressor)) %>%
    group_by(stressor, state, year) %>%
    summarise(pct_avg = mean(stress_pct, na.rm = TRUE)) %>% # quarterly average
    group_by(stressor, year) %>%
    summarise(pct_avg = mean(pct_avg, na.rm = TRUE)) %>% # average across states
    pivot_wider(names_from = "stressor", values_from = "pct_avg") %>%
    rename('Diseases' = 'Disesases')

# take a look at the data
head(stressor_data_filtered)

## # A tibble: 6 × 7
##    year Diseases Other `Other pests/parasites` Pesticides Unknown `Varroa mites`
##   <dbl>    <dbl> <dbl>                   <dbl>      <dbl>   <dbl>          <dbl>
## 1  2015     3.67  6.32                   11.2        7.76    4.18           27.2
## 2  2016     4.15  5.78                   10.4        8.06    4.37           29.3
## 3  2017     5.87  6.61                   11.2        8.06    4.37           33.0
## 4  2018     4.06  7.80                   14.4        8.36    4.66           34.9
## 5  2019     4.32  6.45                   12.3        8.32    4.05           31.7
## 6  2020     3.52  5.28                    9.83       5.73    4.47           28.9

Next, in we will build our plotting object and add the traces for the individual factors manually. The layering here can be useful for controlling the order of the stacks. I personally found it a little cumbersome to repeat these traces, but I’m sure you could use an lapply() or for loop to make this easier on yourself.

fig <- plot_ly(stressor_data_filtered,
               x = ~year,
               y = ~`Varroa mites`,
               type = "bar",
               name = "Varroa mites")

# add stacked bars through traces
fig <- fig %>% add_trace(y = ~`Other pests/parasites`,
                         name = "Other pests/parasites")
fig <- fig %>% add_trace(y = ~Pesticides,
                         name = "Pesticides")
fig <- fig %>% add_trace(y = ~`Diseases`,
                         name = "Diseases")
fig <- fig %>% add_trace(y = ~Other,
                         name = "Other")
fig <- fig %>% add_trace(y = ~Unknown,
                         name = "Unknown")
fig %>% layout(yaxis = list(title = "Average Stress PCT %"),
               barmode = 'stack')

Now, for the CanvasXpress version! In addition to the data wrangling that was included for the Plotly visualization, you can see that we have also had to pivot the data to be in wide format.

# prepare the data 
stressor_data_filtered <- stressor_data %>%
    mutate(stressor = as.factor(stressor_data$stressor)) %>%
    group_by(stressor, state, year) %>%
    summarise(pct_avg = mean(stress_pct, na.rm = TRUE)) %>% # quarterly average
    group_by(stressor, year) %>%
    summarise(pct_avg = mean(pct_avg, na.rm = TRUE)) %>% # average across states
    pivot_wider(names_from = "stressor", values_from = "pct_avg") %>%
    rename('Diseases' = 'Disesases')

# setting rownames to stressors and columns to year
cx_stressor_data_filtered <- stressor_data_filtered %>%
    pivot_longer(Diseases:`Varroa mites`) %>% 
    pivot_wider(names_from = year) %>% 
    column_to_rownames('name')

# take a look at the data
head(cx_stressor_data_filtered)

##                            2015      2016      2017      2018      2019
## Diseases               3.667572  4.152482  5.868794  4.064007  4.323404
## Other                  6.317199  5.781206  6.614184  7.795745  6.450709
## Other pests/parasites 11.219149 10.362766 11.248582 14.437589 12.277305
## Pesticides             7.760993  8.062943  8.056522  8.364539  8.323333
## Unknown                4.183156  4.372518  4.373404  4.660638  4.045390
## Varroa mites          27.172872 29.330142 33.002128 34.948759 31.731206
##                            2020      2021
## Diseases               3.519858  3.058333
## Other                  5.275709  5.774468
## Other pests/parasites  9.834574 10.315116
## Pesticides             5.725189  5.440000
## Unknown                4.474291  3.936905
## Varroa mites          28.882447 27.784043

Looking at the data, we can see the format looks much cleaner and simplified compared to what we were able to give Plotly. As a result our actual plotting code below is fairly straightforward.

canvasXpress(data = cx_stressor_data_filtered, 
             graphType = "Stacked",
             graphOrientation = "vertical",
             title = "Stressor Contributors to Colony Loss",
             xAxisTitle = "Average Stress PCT %")

Our stacked bar charts here are pretty comparable and look solid. The approach towards the same type of chart here was very different depending on which package you were using.

Where Did All the Bees Go? (Part 1)

Lisa Cao

2022-09-07

A Deep Dive into Bee Colony Loss with CanvasXpress and Plotly in R

Scatterplots

Stacked Bar Chart

Let’s take a look at the breakdown between both visualization libraries.

Plotly R

CanvasXpress