Subscribe here to receive the Data Science Roundup every Sunday morning.

Understanding Algorithms

What to look for in a data scientist | O’Reilly

Data scientist Jerry Overton makes the case that the best way to judge the skill of a data scientist is by looking at his or her “track record of shepherding ideas through funnels of evidence and arriving at insights that are useful in the real world.” Overton believes that “we spend way too much time celebrating the details of machine learning algorithms. A machine learning algorithm is to a data scientist what a compound microscope is to a biologist. The microscope is a source of evidence. The biologist should understand that evidence and how it was produced, but we should expect from our biologists, contributions well beyond custom grinding lenses or calculating refraction indices.”

RJMetrics Data Science Roundup: @JerryAOverton on the best way to judge the skill of a data scientist

Solving the Data Scientist Shortage

Black Boxes & Unicorns | FirstMark’s Data Driven NYC

In his two-part presentation at Data Driven NYC, Jeremy Achin, CEO of DataRobot, refers to complex machine learning models as “black boxes” and data scientists as “unicorns.” Achin discusses ways to assess and interpret predictive models, and starting at 14:23, he presents his view of how the data scientist shortage will be solved.

RJMetrics Data Science Roundup: @DataRobot’s Jeremy Achin on how the data scientist shortage will be solved

Using Data to Break Up Bottlenecks

How RJMetrics Used Their New Product to Analyze its Own Onboarding Funnel | Mode

In a guest post on the Mode blog, our own Senior Business Intelligence Analyst, David Wallace, shares how he built a single funnel analysis chart to monitor user acquisition flow around the launch of our new product, Pipeline. Wallace walks through how he used Pipeline to consolidate relevant acquisition and onboarding data into a single Redshift instance, and then constructed a funnel analysis report using Mode to visualize and monitor the onboarding flow. In the end, Wallace shares how “having this insight visualized in an easily-digestible way allowed us to quickly take action on a problem we otherwise would have been oblivious to.”
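For readers new to funnel analysis, the core calculation can be sketched in a few lines of Python. This is only an illustration of the general technique, not Wallace's actual report, which was built in Mode on top of Redshift. The step names and user counts below are hypothetical.

```python
def funnel_report(step_counts):
    """Compute conversion rates for an ordered funnel.

    step_counts: ordered list of (step_name, users_reaching_step) tuples.
    Returns one dict per step with the share of users retained from the
    previous step and from the top of the funnel.
    """
    if not step_counts:
        return []
    top = step_counts[0][1]
    prev = top
    report = []
    for name, count in step_counts:
        report.append({
            "step": name,
            "users": count,
            "pct_of_previous": count / prev if prev else 0.0,
            "pct_of_top": count / top if top else 0.0,
        })
        prev = count
    return report


# Hypothetical onboarding steps and counts, for illustration only.
steps = [
    ("visited_signup", 1000),
    ("created_account", 400),
    ("connected_data_source", 100),
]

for row in funnel_report(steps):
    print(f'{row["step"]:<24}{row["users"]:>6}'
          f'{row["pct_of_previous"]:>8.0%}{row["pct_of_top"]:>8.0%}')
```

The step with the lowest share of users retained from the previous step is the likeliest bottleneck, which is the kind of problem a funnel chart makes easy to spot.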

RJMetrics Data Science Roundup: @davidjwallace on building funnel analysis to monitor user acquisition flow

Data Storage and DNA

Data Storage on DNA Can Keep It Safe for Centuries | New York Times

John Markoff reports on a group of researchers who discovered how DNA molecules can be used as the basis for an archival storage system. The research is at the forefront of the convergence of computer technology and biology. “The raw storage capacity of DNA is staggering compared with even the most advanced electronic or magnetic storage systems. It is theoretically possible to store an exabyte of information, if it were coded into DNA, in the volume of a grain of sand. An exabyte is roughly equivalent to 200 million DVDs.”
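The DVD comparison is easy to sanity-check. The sketch below assumes a decimal exabyte (10^18 bytes) and a single-layer DVD capacity of 4.7 GB. Both figures are our assumptions, since the article does not say which it used.

```python
EXABYTE_BYTES = 10**18      # assumption: decimal exabyte
DVD_BYTES = 4.7 * 10**9     # assumption: single-layer DVD, 4.7 GB


def dvds_per_exabyte(dvd_bytes=DVD_BYTES):
    """Return how many DVDs of the given capacity hold one exabyte."""
    return EXABYTE_BYTES / dvd_bytes


print(f"{dvds_per_exabyte() / 1e6:.0f} million DVDs")  # prints "213 million DVDs"
```

Under those assumptions the result is about 213 million, in line with the article's "roughly 200 million."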

RJMetrics Data Science Roundup: @markoff on how the raw storage capacity of DNA is impacting data storage

Data Ethics

Coming Soon: Ethics Training for Data Scientists | Fortune

Barb Darrow calls attention to a prediction by Kate Crawford, a Microsoft researcher and MIT visiting professor, that the key data science technology breakthrough in 2016 will be that “every data science program will have a data ethics curriculum, giving greater understanding to the human implications of large-scale data collection and experimentation (and ideally producing greater fairness and protection from forms of data discrimination).” Crawford’s prediction is part of a post that collects predictions for 2016 from 16 leading thinkers within Microsoft’s Technology and Research organization.

RJMetrics Data Science Roundup: @gigabarb on @katecrawford’s view of coming ethics training for data scientists

The Rise of the Notebooks

Jupyter, Zeppelin, Beaker: The Rise of the Notebooks | Open Data Science News

Alex Perrier compares the key attributes of the most popular computing notebooks. “Standard software development practices for web, Saas, and industrial environments tend to focus on maintainability, code quality, robustness, and performance. Scientific programming in data science is more concerned with exploration, experimentation, making demos, collaborating, and sharing results. It is this very need for experiments, explorations, and collaborations that is addressed by notebooks for scientific computing. Notebooks are collaborative web-based environments for data exploration and visualization — the perfect toolbox for data science.”

RJMetrics Data Science Roundup: @alexip on the rise of the notebooks via @odsc

The Field Guide to Data Science

The Second Edition Field Guide to Data Science | Booz Allen

Booz Allen Hamilton’s second edition of the Field Guide to Data Science covers “what data science is, why it matters to organizations, as well as how to create data science teams.” The 126-page PDF also includes five new case studies on how a range of organizations have applied data science to gain insights, an updated guide to analytic selection, Kirk Borne’s feature on The Future of Data Science, and a series of “Life in the Trenches” reflections from a variety of practicing data scientists.

RJMetrics Data Science Roundup: @BoozAllen releases their 2nd edition of the #datascience field guide @KirkDBorne

Virtual Reality Data Viz

Exploring Virtual Reality Data Visualization with Gear VR | Civis Analytics

Michael Heilman explores virtual reality data visualization with the Gear VR headset from Samsung and Oculus. The post walks through how to visualize recent research that the Civis Analytics team has done on the data science social network (shared via an animated GIF). Heilman points out that although this type of visualization is not particularly useful yet, there does seem to be potential to convey certain types of visual information more creatively, and possibly more efficiently.

RJMetrics Data Science Roundup: @heilman13 explores virtual reality #dataviz with Gear VR via @CivisAnalytics

Each week we surface, summarize, and share the most interesting stories and biggest news from the world of data science. Have articles or podcasts that you think we should be covering in our Data Science Roundup? Send them to
