This document is intended to help you start a data for social good (D4G) group in your area. It reflects what I have learned in helping get such a group up and running in Ottawa and Montreal. Of course, you should adapt it to your group’s needs and abilities.
I have been doing data science forever. Seriously. In 1986 I reverse-engineered the file formats of a simplex algorithm optimizer in order to make an interface between it and 20/20 (a Lotus 1-2-3 clone), so that the company I worked for could more easily optimize its distribution network.
When I first heard about DataKind I thought, ‘What a great idea! There must be something similar in Ottawa.’ And there was, sort of. A meetup site for ‘Data For Good Ottawa’ had been created by Joy & Victor of ‘Data For Good Toronto’. In fact they had created sites for many of the major cities in Canada, and it seemed they had Toronto up and running. I watched the Ottawa site for a while to see when meetups were being scheduled, but none were. So I pinged the members to see if anybody wanted to get together and chat about this stuff. A few brave souls did, and that was the beginning.
After a while we realized that we had two major problems: no space and no projects. If we had a good place to meet, then even without a project we could get together and do presentations about data science to each other while searching for a project. And certainly working on a project would attract more members.
Joy was hugely helpful in recruiting Tina to be a co-organizer with me. And Tina knows people. Including Greg, a biggie at one of the local MBA schools. Greg was willing to let us use some space at the university for our meetups, so we were halfway there.
Next we needed a project. A buddy organized a few of us one day at lunch to go to the grocery store to buy food for the local food bank. On the way back I realized that the food bank might be a good choice for a first project: the value of a food bank is obvious, and they probably had data about their donations and deliveries. After a few failed attempts on my part, Tina managed to find the right lever to pull (she’s magic), and we arranged a meeting. We discussed their data, the kinds of questions they had, bounced around ideas of what we might be able to do for them, and reiterated that this would cost nothing….free…no dollars attached…volunteers only.
Fast forward a few years and we have completed a few projects for very worthy causes in Ottawa, we have a few more at various stages of getting ready to start, and we have some repeat customers waiting patiently for our services. We have gone from five people at the first meeting in spring of 2014 to 429 meetup members, of whom about 50 are ‘active’ at any time.
So my main message is the Nike slogan, ‘just do it’. This document is an attempt to make it a bit easier for you to do so, but remember this is only my experience. Adapt as you see fit. I make no claims as to optimality, nor even rationality….just one guy’s observations.
How to Start a D4G
To start a D4G you need several things, some easier to come by than others. You need people, a suitable place to meet, and a project.
People are relatively easy to get (but harder to keep). Advertising your meetup site and talking about it on Facebook, Twitter, and in real life all spread the word. Going to other related meetups and presenting there also helps recruit new members.
A suitable space is one that is centrally located so that everyone is equally unhappy about the location, and that has all the needed technology (projector, whiteboards, chairs & tables). In Ottawa some pubs have back rooms that they are happy to let meetups use on off-nights, but they tend to be noisy and not have tables big enough to work on. Conference rooms and classrooms are good, but harder to come by. Beg your members to get access.
You will also have to decide how you will run the group. There are two major modes, not necessarily exclusive. One mode runs a few weekend-long datathons each year. The other mode is longer-term projects where the monthly meetups are for status updates, sharing results and various presentations, but the majority of the work is done asynchronously by the members at a time suitable to them.
To some degree that decision guides the next major decisions, about ‘business policies’: project management, confidentiality, project selection guidelines, scheduling, collaborations with other groups, etc. In general, every decision should be evaluated in terms of ‘does it help answer the data questions of our clients?’ If not, then don’t do it. This is data for good, not project management for good, or database design for good.
Projects are key. Until you get a project going, presenting to each other on data science topics of interest can keep the enthusiasm going, but the whole purpose of the group is to help the community. Projects can come from anywhere. Talk to neighbors, friends, co-workers. Spread the word about what you can do and buy many people a cup of coffee while you explain. Let a thousand blossoms bloom.
We have found that thinking of the client’s business with reference to a framework of innies, outies, and externalities helps guide conversations, leading to potential project questions. Innies are all the things that flow into the client organization, and outies are all the things that leave the client organization. Externalities are (obviously) the external factors that can affect the client’s business, and it is our job to understand how all these interact. In the simple case of a food bank, innies may include donations of food and money, requests for food, volunteer hours, etc. Outies could include trucks of food and advertising dollars. Externalities might include stories in the local newspaper that raise awareness about hunger in the city, or an influx of immigrants.
Your first project should be one that you are certain you can deliver on. Set expectations such that in the worst case you could do something reasonable yourself. You will likely end up with much more than that, but better safe than sorry. Try to find a project that has an obvious value (so that potential members and clients see the benefit of the work), and no controversy associated (so that nobody gets turned off). You may want to check out charity rating organizations, etc., to ensure the organization you are thinking of working with is as reputable as it seems. Google is your friend.
Under-promise and over-deliver. Set minimum, base, and stretch goals for the project, but do not tell the client about the stretch goal in case you cannot deliver on it.
At the end, get feedback from the client, and a testimonial that you can use to help solicit new clients.
How to Run a D4G
We like to run it loose. We all have day jobs, and Jira-style ticket tracking will turn people away. It should be fun for the members to contribute in any way they wish.
It is the nature of a volunteer effort that people come and go. Some people are in town for only a semester. Some people get busy with new jobs or kids. Others just don’t find the group to their liking. So there is a need for constant recruitment.
Luckily, data science is a hot topic these days (with too much hype), and so many people are trying to shift their careers towards the field. They see this group as a way to learn some skills while filling out their resume/portfolio, or to try out the work before they commit to it as a career. That’s good. But it also means they may have limited data science skills. In Ottawa a little goes a long way, and maybe in your city too: the organizations we typically work with are busy with their day-to-day business and are not very data-sophisticated. That means that simple maps and bar charts can be very useful to them.
Students of local universities are looking to find a competitive advantage over their classmates in the job market, and D4G can help in that role. You just have to get to them. Recent statistics grads are good people to help make those contacts.
How to get new members up to speed
You may find that many people attend a meetup or two but don’t really want to do the work and stay involved. It seems that many people hear great things about careers in data science and realize that D4G is a way to get started. But then they show up, find out that it is a lot of work and that we are not training them, and leave because they are unwilling or unable to participate.
On the other hand, a smaller number of people make the effort, and we certainly want to make their onboarding as painless as possible. We try to do that in several ways (but we are always willing to find ways to make it easier). We typically have an FAQ on our site describing how we work. We also have occasional ‘boot camps’ where we go over how we work. These also form a chunk of the presentations we give to other groups when trying to recruit members.
For each project we try to have ‘starter code’ that shows a beginner (or not-so beginner) how to load the data, do some plots, and address some simple questions.
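As a rough illustration of what such starter code might look like, here is a minimal Python sketch using only the standard library. The sample rows, the column names (date, category, kg), and the function names are all made up for illustration; a real extract will look different, and a real project would likely use a proper plotting library rather than a text chart.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for a client's extract. The column names
# (date, category, kg) are assumptions for illustration only.
SAMPLE_CSV = """date,category,kg
2015-01-05,canned,120.5
2015-01-12,produce,80.0
2015-02-02,canned,95.0
2015-02-09,dairy,40.25
"""


def load_rows(handle):
    """Read CSV rows into dicts, converting the kg column to float."""
    rows = []
    for row in csv.DictReader(handle):
        row["kg"] = float(row["kg"])
        rows.append(row)
    return rows


def total_by(rows, key):
    """Sum kg for each distinct value of the given column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["kg"]
    return dict(totals)


def text_bar_chart(totals, width=40):
    """Render totals as a crude text bar chart, largest value first."""
    if not totals:
        return ""
    biggest = max(totals.values())
    lines = []
    for label, value in sorted(totals.items(), key=lambda kv: -kv[1]):
        bar = "#" * round(width * value / biggest) if biggest > 0 else ""
        lines.append(f"{label:<10} {bar} {value:.1f}")
    return "\n".join(lines)


if __name__ == "__main__":
    rows = load_rows(io.StringIO(SAMPLE_CSV))
    print(text_bar_chart(total_by(rows, "category")))
```

To point this at a real file, replace the io.StringIO(SAMPLE_CSV) handle with an opened CSV file and adjust the column names to match the extract.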
We do some ongoing training, mostly in the form of demo-ing things we’ve learned to each other in the context of the projects we are working on. We also have occasional speakers from the organizations we work with, members from other meetup groups, etc.
As we work on long-term projects (as opposed to weekend-long datathons), we divide our projects into several phases. Sometimes the phases overlap and/or repeat, but the general flow is as follows:
- Start: Conversations with contacts etc. Initial contact with the client.
- Exploration: Discussions with the client about their data, business issues that can be addressed by the data, etc. Output is a SoW (Statement of Work).
- Data extraction: Led by the data ambassador in conjunction with the client’s DBA. Data is exported, anonymized, validated, cleaned, and morphed into a form suitable for use by the team. Starter code is also created in this phase.
- Analysis I: Heads down, everybody creating research artifacts guided by the client’s questions.
- Checkpoint: A meeting with the client to ensure that we are addressing their issues, presentation of preliminary results, direction setting for the remainder of the project.
- Analysis II: Heads down, everybody creating research artifacts guided by the client’s questions and the results of the checkpoint meeting.
- Wrap-up: A doc is prepared with the results, which is presented to the client.
- Follow up: A week or two after the presentation meeting the client is contacted and asked if they have any need for clarification, etc. This may result in a little more effort, or the potential of a new project. The purpose is not to extend the current project, but to conclude it.
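The anonymization step in the data extraction phase can be sketched roughly as follows. This is only one possible approach, not a description of our actual tooling: it drops direct identifiers and replaces a linking identifier with a keyed hash (strictly speaking pseudonymization, which is weaker than full anonymization, since other columns may still allow re-identification). The column names and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical column names; a real extract will differ.
DROP_COLUMNS = {"name", "phone"}    # direct identifiers, removed outright
PSEUDONYM_COLUMNS = {"client_id"}   # identifiers replaced by a keyed hash


def pseudonymize(value, secret_key):
    """Return a stable keyed hash so records can still be linked together."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymize_row(row, secret_key):
    """Drop direct identifiers and pseudonymize linking identifiers."""
    clean = {}
    for column, value in row.items():
        if column in DROP_COLUMNS:
            continue
        if column in PSEUDONYM_COLUMNS:
            value = pseudonymize(value, secret_key)
        clean[column] = value
    return clean


def leaked_columns(rows):
    """Return the dropped column names that still appear in any row."""
    return {column for row in rows for column in row if column in DROP_COLUMNS}


if __name__ == "__main__":
    # The key should stay with the client or the data ambassador,
    # never with the shared data set.
    key = b"example-key-kept-by-the-client"
    raw = {"client_id": "A17", "name": "Jane Doe", "phone": "555-0100", "kg": "12.5"}
    safe = anonymize_row(raw, key)
    print(safe)
    print("leaked:", leaked_columns([safe]))
```

The same input and key always give the same pseudonym, so the team can still count repeat visits per client without seeing who the client is.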
There are several roles within a project that must be filled. One person can fill many roles, and in the extreme case they can all be the same person. But it is probably simpler if they are different people. And I hate the titles of the roles, but whatever.
- Data ambassador: The prime contact to the org’s main data person. They are responsible for handling all queries about the data, so as to not burden the org (as a side benefit, they may already know the answer). They do the extract. They sanitize and reformat the data for general consumption. They anonymize. They explain trade-offs in levels of anonymization to the org, risk vs reward, etc.
- Quant: Ensures that any results that are to be presented to the org are quantitatively (statistically) correct, and not erroneous or statistical anomalies.
- Story Teller: Charged with ensuring that the results of the analysis are turned into a coherent story that can be understood by the org/non-stats people.
- Project Manager: Does the initial contact with the org, develops the SoW, presents. Ensures the project gets completed by being the bad guy when necessary.
Note: All these roles may call upon other members for additional manpower, skills, etc., as needed.
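As one example of the kind of check the Quant role might run before a result goes to the org, here is a minimal sketch of a two-sided permutation test on the difference in means between two groups. It is just one simple option among many, the function name and defaults are my own choices for illustration, and it assumes the observations are exchangeable between groups if there is no real difference.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Repeatedly shuffles the pooled values and counts how often a random
    split gives a difference at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Made-up numbers purely to show the call.
    before = [10, 11, 12, 13, 14, 15]
    after = [1, 2, 3, 4, 5, 6]
    print(permutation_p_value(before, after))
```

A small p-value suggests the difference is unlikely to be a fluke of which records landed in which group; a large one is a hint that the ‘finding’ may be a statistical anomaly and should not be presented as a result.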