Subscribe here to receive the Data Science Roundup every Sunday morning.
What to look for in a data scientist | O’Reilly
Data scientist Jerry Overton makes the case that the best way to judge the skill of a data scientist is to look at his or her “track record of shepherding ideas through funnels of evidence and arriving at insights that are useful in the real world.” Overton believes that “we spend way too much time celebrating the details of machine learning algorithms. A machine learning algorithm is to a data scientist what a compound microscope is to a biologist. The microscope is a source of evidence. The biologist should understand that evidence and how it was produced, but we should expect from our biologists, contributions well beyond custom grinding lenses or calculating refraction indices.”
Solving the Data Scientist Shortage
Black Boxes & Unicorns | FirstMark’s Data Driven NYC
In his two-part presentation at Data Driven NYC, Jeremy Achin, CEO of DataRobot, refers to complex machine learning models as “black boxes” and to data scientists as “unicorns.” Achin discusses ways to assess and interpret predictive models, and, starting at 14:23, presents his view of how the data scientist shortage will be solved.
Using Data to Break Up Bottlenecks
In a guest post on the Mode blog, our own Senior Business Intelligence Analyst, David Wallace, shares how he built a single funnel analysis chart to monitor user acquisition flow around the launch of our new product, Pipeline. Wallace walks through how he used Pipeline to consolidate relevant acquisition and onboarding data into a single Redshift instance, and then constructed a funnel analysis report using Mode to visualize and monitor the onboarding flow. In the end, Wallace shares how “having this insight visualized in an easily-digestible way allowed us to quickly take action on a problem we otherwise would have been oblivious to.”
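A funnel analysis like the one Wallace describes boils down to counting users at each ordered stage and dividing each count by the one before it (step conversion) and by the top of the funnel (overall conversion). The sketch below illustrates only that arithmetic; the stage names and counts are hypothetical, not taken from Wallace's post, and his actual report was built in Mode on data in Redshift rather than in Python.

```python
# Minimal funnel-conversion sketch. Stage names and counts are hypothetical,
# for illustration only; they are not figures from the Mode blog post.
EXAMPLE_STAGES = [
    ("visited_signup_page", 1000),
    ("created_account", 400),
    ("connected_data_source", 200),
    ("ran_first_sync", 150),
]


def funnel_conversion(stages):
    """Given an ordered list of (stage_name, user_count) pairs, return a list of
    (stage_name, step_rate, overall_rate) for every stage after the first.

    step_rate    = users at this stage / users at the previous stage
    overall_rate = users at this stage / users at the top of the funnel
    """
    if not stages:
        return []
    top_count = stages[0][1]
    results = []
    for (_, prev_count), (name, count) in zip(stages, stages[1:]):
        # Guard against empty stages to avoid division by zero.
        step_rate = count / prev_count if prev_count else 0.0
        overall_rate = count / top_count if top_count else 0.0
        results.append((name, step_rate, overall_rate))
    return results


if __name__ == "__main__":
    for name, step, overall in funnel_conversion(EXAMPLE_STAGES):
        print(f"{name:<22} step: {step:6.1%}  overall: {overall:6.1%}")
```

The biggest drop in step conversion marks the bottleneck worth investigating first, which is the kind of signal Wallace says his chart surfaced.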
Data Storage and DNA
Data Storage on DNA Can Keep It Safe for Centuries | New York Times
John Markoff reports on a group of researchers who discovered how DNA molecules can be used as the basis for an archival storage system. The research is at the forefront of the convergence of computer technology and biology. “The raw storage capacity of DNA is staggering compared with even the most advanced electronic or magnetic storage systems. It is theoretically possible to store an exabyte of information, if it were coded into DNA, in the volume of a grain of sand. An exabyte is roughly equivalent to 200 million DVDs.”
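The “200 million DVDs” comparison is easy to sanity-check. The sketch below assumes decimal (SI) units and a 4.7 GB single-layer DVD; both assumptions are ours, not the article's.

```python
# Back-of-the-envelope check of "an exabyte is roughly 200 million DVDs".
# Assumptions (not from the article): decimal SI units, single-layer DVD.
EXABYTE_BYTES = 10**18          # 1 EB = 10^18 bytes (SI)
DVD_BYTES = 4.7 * 10**9         # single-layer DVD, about 4.7 GB

dvds_per_exabyte = EXABYTE_BYTES / DVD_BYTES
implied_dvd_bytes = EXABYTE_BYTES / 200_000_000  # capacity implied by the quote

print(f"DVDs per exabyte: {dvds_per_exabyte:,.0f}")            # about 213 million
print(f"Implied DVD size: {implied_dvd_bytes / 1e9:.1f} GB")    # 5.0 GB
```

At about 213 million single-layer discs, the article's round figure of 200 million (which implies roughly 5 GB per DVD) holds up as an order-of-magnitude comparison.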
Barb Darrow calls attention to Microsoft researcher and MIT visiting professor Kate Crawford’s prediction that the key data science technology breakthrough in 2016 will be that “every data science program will have a data ethics curriculum, giving greater understanding to the human implications of large-scale data collection and experimentation (and ideally producing greater fairness and protection from forms of data discrimination).” Crawford’s prediction is part of a post that collects predictions for 2016 from 16 leading thinkers within Microsoft’s Technology and Research organization.
The Rise of the Notebooks
Jupyter, Zeppelin, Beaker: The Rise of the Notebooks | Open Data Science News
Alex Perrier compares the key attributes of the most popular computing notebooks. “Standard software development practices for web, Saas, and industrial environments tend to focus on maintainability, code quality, robustness, and performance. Scientific programming in data science is more concerned with exploration, experimentation, making demos, collaborating, and sharing results. It is this very need for experiments, explorations, and collaborations that is addressed by notebooks for scientific computing. Notebooks are collaborative web-based environments for data exploration and visualization — the perfect toolbox for data science.”
The Field Guide to Data Science
The Second Edition Field Guide to Data Science | Booz Allen
Booz Allen Hamilton’s second edition of the Field Guide to Data Science covers “what data science is, why it matters to organizations, as well as how to create data science teams.” The 126-page PDF also includes five new case studies on how a range of organizations have applied data science to gain insights, an updated guide to analytic selection, Kirk Borne’s feature on The Future of Data Science, and a series of “Life in the Trenches” reflections from a variety of practicing data scientists.
Virtual Reality Data Viz
Exploring Virtual Reality Data Visualization with Gear VR | Civis Analytics
Michael Heilman explores virtual reality data visualization with the Gear VR headset from Samsung and Oculus. The post walks through how to visualize recent research that the Civis Analytics team has done on the data science social network (shared via an animated GIF). Heilman points out that although this type of visualization is not particularly useful, it does seem to have potential to convey certain types of visual information more creatively, and possibly more efficiently.
Each week we surface, summarize, and share the most interesting stories and biggest news from the world of data science. Have articles or podcasts that you think we should be covering in our Data Science Roundup? Send them to firstname.lastname@example.org.