Pro Bono: IEEE Chicago Section Statistician - Part 4
This post is the fourth in a series on my pro bono efforts outside of the workplace as Statistician for the IEEE Chicago Section, a new position that I filled in September 2012 after the membership development team realized that ongoing data analysis and reporting were needed around declining enrollment.
As discussed in the last post, the bulk of my initial time spent on this work was fixing existing graphs that I had inherited, moving from Microsoft Excel to the R language, maintaining existing data sets, creating new data sets, and creating formal presentations that I started distributing on a monthly basis.
After creating re-runnable R scripts in early 2013 for the monthly graphs that the team came to expect, I turned my attention toward creating new visualizations to better understand the data, and I began to spend more time communicating my analyses to the nonanalytical.
My first post discussed some of my initial thoughts surrounding my early experiences on the team, and presented one of the new visualizations that I created, a heat map that breaks down the top technical interest areas of members, which the Membership Development Chair began using in his marketing efforts.
In my second post, I walked through some of the issues in the existing graphs that I had inherited, and showed how I fixed them in Microsoft Excel and ported them to the R language with a consistent look and feel, as well as how I began including them in IEEE Chicago Section presentations.
My third post discussed the goal of the Membership Development Chair to "close the membership gap" that is currently increasing each year, and how these declines in membership are largely the result of members going into arrears each February when annual membership payments become due.
Rather than closing the membership gap during the course of the year (i.e., during months other than February), I argued that the goal should really be to minimize the number of members who go into arrears due to missed payments, because on an annual basis a Pareto chart would show that membership deletions due to arrears status are by far the most frequent.
During the annual membership development summit meeting held in November 2013, I had the floor for the bulk of the discussion, and I emphasized that we should really be comparing to previous years, which is why I added more visualizations to the mix.
Our membership numbers get hit hard one month of the year, and then we spend the rest of the year trying to make up for these losses. The membership development team largely lacks control over the one month (corporate IEEE should be able to help out here), but it can help change course throughout the rest of the year and compare how the data looks relative to other years during the course of the year.
In addition, I reiterated that if we focus solely on total membership numbers, we are going to miss everything else that is going on with membership. Getting down to the individual members that are in these aggregates at some point in the future should provide quite a bit of additional insight.
Initially, my goal was to put together a proposal to corporate IEEE to provide at least one additional data item: historical member status. We have access to historical member grade in the database, but not historical member status. This data item seemed to be the missing link that the database was not providing, but it has admittedly been difficult to determine whether or to what degree this is the case, because corporate IEEE has not yet provided data models.
Following my communication of this goal to the team, I realized that the data issues inherent in the database are likely more pervasive, because a number of other data items do not appear to be effective-dated. It is an issue if I am using Oracle Business Intelligence Enterprise Edition (OBIEE) 11g to create a "new analysis", and the tool permits me to report effective-dated versus non-effective-dated data items.
What may be needed is an overhaul of the way the data is stored, and tasks such as this are typically not trivial. During a call to corporate IEEE last year, it was communicated to me that the database has historically had no "advanced users" such as myself, and so many of the topics that I had introduced in the conversation were simply off their radar.
None of the presentations I have created are affected by such a data issue because they all explore data at the aggregate level, and not at the individual member level. Since the database does not yet seem to provide the ability to effectively consider historical member data in relation to aggregate data, I am still exploring the use of other visualizations that help provide a more holistic view of the aggregate data.
After working with simple line charts that plot member counts across targeted member grades over time, I moved to diagrams that use February as the baseline month for the year and plot the percent difference in member counts from this baseline, since this is the month that typically introduces the vast majority of arrears status occurrences.
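As a rough illustration of the baseline calculation just described, here is a minimal sketch in Python. My own scripts were written in R, so this is not the code behind my presentations; the function name `percent_diff_from_baseline` and the member counts are made up purely for illustration.

```python
# Hypothetical monthly member counts for one membership year, February first.
# These numbers are invented for illustration; they are not IEEE data.
counts = {"Feb": 4000, "Mar": 4060, "Apr": 4100, "May": 4080}


def percent_diff_from_baseline(counts, baseline="Feb"):
    """Return each month's percent difference from the baseline month."""
    base = counts[baseline]
    return {month: 100.0 * (n - base) / base for month, n in counts.items()}


for month, pct in percent_diff_from_baseline(counts).items():
    # The baseline month itself always comes out as +0.00%.
    print(f"{month}: {pct:+.2f}%")
```

Plotting these percentages for each year on the same axes is one way to compare how the recovery after February looks from one year to the next.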
The follow-up box plots and beeswarms were intended to show monthly member count variances over the years, how these break down by member additions and deletions, and any data patterns that might exist. However, none of these visualizations show the relative net difference in member counts across all years and months, so I added a variation of the scatter plot, the balloon plot (also known as a bubble chart), to my last two presentations, using the ggplot2 package for the R language.
Each data point represents a calendar month since January 2008. The position of each data point shows the relative relationship between member additions and deletions for each month, and the size of each data point shows the net difference between the two, in absolute terms (with green data points as net gains and red data points as net losses). In my first plot for January 2014 above, a quick glance communicates where the month falls relative to all other months.
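To make the encoding concrete, here is a minimal sketch in Python of how one month's additions and deletions could be turned into the attributes of a single data point. It is not my ggplot2 code; the `Balloon` type, the `to_balloon` function, and the sample numbers are hypothetical, and coloring a net difference of exactly zero red is an arbitrary choice on my part.

```python
from typing import NamedTuple


class Balloon(NamedTuple):
    label: str
    x: int      # deletions, plotted as a negative number
    y: int      # additions, plotted as a positive number
    size: int   # absolute net difference between additions and deletions
    color: str  # "green" for a net gain, "red" otherwise


def to_balloon(label, additions, deletions):
    """Map one month's counts to position, size, and color."""
    net = additions - deletions
    color = "green" if net > 0 else "red"  # assumption: zero net is shown red
    return Balloon(label, -deletions, additions, abs(net), color)


# Hypothetical months; the counts are invented for illustration.
for label, additions, deletions in [("month A", 120, 90), ("month B", 60, 400)]:
    print(to_balloon(label, additions, deletions))
```

Because size encodes only the magnitude of the net difference, color is what distinguishes a large gain from a large loss.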
The second balloon plot, for the month of February 2014, shows that it is a typical month relative to all past February months. Providing labels for every data point to indicate month would introduce significant clutter, but suffice it to say that all of the data points in red in the lower left-hand corner are for the month of February across the indicated years.
Quite coincidentally, the month of March 2014 was the best month across the years covered by the database in terms of net difference. The larger the green data point the better, and the closer to the upper right-hand corner, the better. While it is difficult to assess the size of the data point for this month relative to some other data points just below it, I know that the net difference for this month is larger because I revisited the source data to verify.
The Membership Development Chair requested that I walk him through the plots in my latest monthly presentation to the team (currently 21 slides), partly in preparation for the presentation he has asked me to give to the executive committee next month, and I realized that this visualization in particular might benefit from some annotation for the nonanalytical.
For example, one question that was posed is why I think March 2014 was the best month when there are data points above it in the plot. So I explained that although a few months have a greater number of (positive) additions along the y-axis, these months also have a greater number of (negative) deletions along the x-axis, and the quantified net difference favors March 2014.
Aside from drilling down to individual members, which will be the focus of my proposal to corporate IEEE, the only other aggregate-level visualizations that I am currently considering are breakdowns of monthly additions and deletions into categories.
Member addition counts can be broken down into the following categories: arrears paid, correction addition, elections, grade transfer to, moved into, and reinstatements. And member deletion counts can be broken down as follows: arrears, correction deletion, deceased, grade transfer from, member to affiliate, moved out of, and resigned.
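The Pareto ordering I mentioned earlier in this post could be applied to these categories. The sketch below, in Python rather than the R I use for my presentations, sorts category counts in descending order and attaches a cumulative percentage. The `pareto_table` function is hypothetical, and the counts are invented for illustration; only the category names come from the list above.

```python
def pareto_table(counts):
    """Sort categories by count, descending, and attach cumulative percent."""
    total = sum(counts.values())
    if total == 0:
        return []  # nothing to rank
    rows, running = [], 0
    for category, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        rows.append((category, n, round(100.0 * running / total, 1)))
    return rows


# Hypothetical deletion counts for one year; these are not IEEE data.
deletions = {
    "arrears": 700,
    "correction deletion": 5,
    "deceased": 10,
    "grade transfer from": 90,
    "member to affiliate": 5,
    "moved out of": 150,
    "resigned": 40,
}

for category, n, cumulative in pareto_table(deletions):
    print(f"{category:<22}{n:>6}{cumulative:>8}%")
```

The same function would work unchanged for the addition categories, which keeps the two breakdowns directly comparable.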
My next entries in this series of blog posts will likely discuss my presentation to the executive committee (which some team members are now suggesting I give on a regular basis because of the value that they see in it), the additional visualizations just mentioned, and potentially some high-level whiteboarding as to how data needs to be stored in Oracle in order for me to use it effectively.