Subscribe here to receive the Data Science Roundup every Sunday morning.

Big Data in NYC

Strata + Hadoop World 2015 | O’Reilly

If you weren’t lucky enough to attend last week’s Strata + Hadoop World conference in New York City, you can watch recordings of keynotes and interviews from the “defining event of the big data movement” here. Recorded presentations include Jeff Jonas on context computing, Maria Konnikova in praise of boredom, Joseph Sirosh on what 50 million users in 7 days can teach us about big data, Jake Porway’s inspirational talk on what it takes to apply data science for social good, and lots more.

The Human Upgrade

Thought Process: Building an Artificial Brain | Washington Post

In part three of an in-depth series on the visionaries who built some of the biggest tech firms and are now focused on transforming the human body, Ariana Eunjung Cha tells the story of Microsoft co-founder Paul Allen’s quest to dissect the brain and code a new one from scratch. “Although today’s computers are great at storing knowledge, retrieving it and finding patterns, they are often still stumped by a simple question: ‘Why?’… The most exciting – and disconcerting – developments in the field may be in predictive analytics, which aims to make an informed guess about the future.”

Saving Lives with Python

How the Bill & Melinda Gates Foundation Saves Lives with Python | Galvanize

Bo Moore explains how the Gates Foundation uses data science to decide where to direct its funding for global health initiatives. The Global Burden of Disease project comes out of the University of Washington’s Institute for Health Metrics and Evaluation (IHME), and “tracks more than 300 diseases and 50 risk factors across 188 countries, modeling data from 1990 through present day and the future.” Kyle Foreman, IHME’s assistant director of scientific computing, says that a single run of their models “produces more than 25 TB of data, and some newer projects exceed 1 petabyte.” Foreman explains how Python lets the team scale its work and collaborate easily with researchers around the world as they develop models.

Mastering Tetris May Be the Most Important Thing You Ever Do

How Tetris explains the promise of the ultimate algorithm | Washington Post

Matt McFarland shares why Pedro Domingos believes that if “you can solve Tetris, you can solve thousands of the hardest and most important problems in science, technology, and management — all in one fell swoop. That’s because at heart they are all the same problem.” In his new book, The Master Algorithm, Domingos describes how “the five tribes of machine learning: symbolists, connectionists, evolutionaries, Bayesians and analogizers” each have the potential to invent the “one learning algorithm that can derive all knowledge from data.” Ultimately, though, Domingos believes that the breakthrough, “one of the greatest scientific achievements of all time,” is likely to come not from these schools of thought but from an outsider “who is 20-years-old and just has a new idea.”

Putting Dark Data to Good Use

Meet the computer scientist who just got $625K for his work that helps track down human traffickers | Business Insider

Christopher Re is a Stanford computer scientist who was named one of the MacArthur Foundation’s 2015 “genius grant” recipients for his work developing DeepDive, a “new type of data management system that enables one to tackle extraction, integration, and prediction problems in a single system, which allows users to rapidly construct sophisticated end-to-end pipelines, such as, dark data BI systems.” Re’s programs can “improve on their own with machine learning and can be integrated into existing database systems,” and they are available to everyone. Notably, DeepDive has been used to track down human traffickers on the dark web. To learn more about DeepDive, listen to the O’Reilly Data Show podcast with Hadoop co-founder Mike Cafarella, which we shared in Roundup #1.

The Data of Space

Podcast Episode 36 | Partially Derivative

The Partially Derivative team talks astrophysics, cosmology, machine vision, dwarf galaxies and dark matter with guests including Sudeep Das, currently a data scientist at OpenTable but known for his research as an astrophysicist at Princeton and Berkeley; Gurtina Besla, assistant professor of astronomy at the University of Arizona; Destry Saul, a former astrophysicist turned data scientist at Cisco; Andrzej Stewart, the Chief Engineering Officer of a year-long simulated mission to Mars; and Kirk Borne, currently Principal Data Scientist at Booz Allen Hamilton and formerly of NASA.

Each week we surface, summarize, and share the most interesting stories and biggest news from the world of data science. Have articles or podcasts that you think we should be covering in our Data Science Roundup? Send them to