Analyzing the Analyzers is a recently published report by Harlan Harris, Sean Patrick Murphy and Marck Vaisman, documenting the results of a 2012 survey of ‘several hundred’ data scientists.
The report is free and only 25 pages of text, plus an appendix; you should read it.
The authors' central contention is that there is no single set of skills that organizations should look for in a data scientist. Instead, there are four distinct skill groupings that you will find in the 'data science' world:
- Data Businesspeople: managers primarily focused on their organization and the bottom line
- Data Creatives: hackers who feel comfortable with the entire data pipeline, from extraction to presentation
- Data Developers: back-end and infrastructure engineers primarily working on data extraction, storage, and scale issues
- Data Researchers: academics, usually with a strong background in statistics
(If you are interested in the skill sets considered and how you would be categorized, you can check out the authors' quick survey.)
The report goes on to cover career-path issues and how skills are distributed among data scientists. Again, if you're interested in these things, I recommend you spend the 30 to 40 minutes it takes to read the report.
Unfortunately, it seems the authors didn't make any data available for us to play with or to check their work (come on guys, know your audience!), but I certainly agree with their main point: 'Data Scientist', as a title, isn't particularly useful.
@revodavid the main problem with DS title is that EVERYBODY wants to use it. it’s so diluted that i put “engineer” on my biz card instead.
— Adam Laiacano (@adamlaiacano) May 16, 2013
— Ryan Rosario (@DataJunkie) May 16, 2013
Kaggle now has 100K data scientists, but what’s a data scientist? http://t.co/tRGpd4Q97Z
— GigaOM (@gigaom) July 11, 2013
So are these four groups the right way to think about data science? One thing that jumped out at me is that they seem to line up with the stages of the data pipeline as I've experienced it:
- Development is needed for Extraction and Storage of the data
- Research is needed for finding patterns in the data
- Analysis and Presentation of the data are needed to drive decision-making
- Coordination and Productization are needed to actually extract value from all this work
Though I think about the roles differently, these four sections of the data-to-product pipeline seem to match the authors' categories. And from the beginning, it has been clear that 'data-science-whatever-that-is' requires a bunch of different talents: see DJ Patil's thoughts about building a data science team. Since DJ Patil allegedly coined the term 'data scientist' in the first place, this multi-role view seems appropriate to me, though I doubt we'll see people moving en masse away from the 'Data Science' moniker.
For the record, I felt the need when I started this blog to stress that I’m not a scientist.