Subscribe here to receive the Data Science Roundup every Sunday morning.
Tools, Trends, and Salaries
2015 Data Science Salary Survey | O’Reilly
Researchers John King and Roger Magoulas released the third edition of the O’Reilly Data Science Salary Survey. The report found that, for the third year in a row, SQL, R, Excel, and Python were the tools most used by data professionals. That said, Spark and Scala saw the largest increases in usage, and their users tended to earn more. The research also found that salaries were highest in the software industry. Although the earnings gap between men and women narrowed, female data professionals still earn $8,026 less than their male counterparts when all else is equal.
If You Can Write a For-Loop, You Can Do Statistics
Statistics for Hackers | Speaker Deck
Jake VanderPlas, the author of the Python Data Science Handbook, recently presented at StitchFix. His talk was not recorded, but his slide presentation covers how you can “hack statistics” if you have basic programming skills. In other words, “if you can write a for-loop you can do statistical analysis.”
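To make the for-loop idea concrete, here is a minimal sketch of a simulation-based significance test. The scenario (22 heads in 30 coin tosses), the function name, and the parameter values are illustrative assumptions chosen for this roundup, not a reproduction of the slides.

```python
import random


def simulate_p_value(n_tosses=30, observed_heads=22, n_trials=10_000, seed=0):
    """Estimate P(heads >= observed_heads) for a fair coin by simulation.

    All names and default values here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    extreme = 0
    for _ in range(n_trials):
        # Simulate one experiment: toss a fair coin n_tosses times.
        heads = 0
        for _ in range(n_tosses):
            if rng.random() < 0.5:
                heads += 1
        # Count experiments at least as extreme as the observed one.
        if heads >= observed_heads:
            extreme += 1
    return extreme / n_trials


p = simulate_p_value()
print(f"Estimated p-value: {p:.4f}")
```

The estimate should land near the exact binomial tail probability of about 0.008, so under these assumptions a fair coin would rarely produce 22 or more heads. No formula was needed, only two nested loops and a counter.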
Everyone is Your Cousin
What’s The Point Podcast | FiveThirtyEight
Esquire editor at large A.J. Jacobs talks about his ongoing attempt to build a family tree of the entire human race, and about how data is changing genealogy through “the plummeting cost of DNA sequencing” and the rise of massive, online “Wikipedia-style” family trees. You can visit Geni.com to dig into the collaborative research of millions of genealogists, which has so far connected 95,392,521 profiles.
The Augmented Data Scientist
Learning to learn, or the advent of augmented data scientists | Medium
Simon Benhamou of MFG Labs takes a closer look at the concept of automatic machine learning (autoML), and wonders if it’s possible to “find patterns in the way data scientists work” in order to automate portions of the typical data scientist workflow.
Predictive Policing
Police Program Aims to Pinpoint Those Most Likely to Commit Crimes | New York Times
John Eligon and Timothy Williams of the New York Times report on a strategy being implemented in Kansas City that “combines elements of traditional policing” with “data, including information about friendships, social media activity and drug use, to identify ‘hot people’ and aid the authorities in forecasting crime.” The article explains how “predictive policing” is part of a larger trend in both the public and private sectors toward using predictive analytics and data mining to analyze behavior.
Here’s Why Your Doctor Won’t “Be Right With You”
The Googol Reasons Your Doctor Never Sees You On Time – How Data Science Has the Answers | Forbes
Sanjeev Agrawal spent the summer working with a startup focused on healthcare operations analytics, and discovered that “optimal patient slotting” is a far more complex challenge than it appears on the surface. The problem goes beyond patient dissatisfaction and extends to “chronically inefficient ‘patient flow’ through the system,” which has a “serious negative impact on the hospital’s economic bottom line and social responsibility.” Agrawal believes that the “system is broken because hospitals are using a calculator, standard EHR templates, and a whiteboard to solve a math problem that needs a cluster of servers and data scientists to crunch.”
Each week we surface, summarize, and share the most interesting stories and biggest news from the world of data science. Have articles or podcasts that you think we should be covering in our Data Science Roundup? Send them to editor@rjmetrics.com.