Subscribe here to receive the Data Science Roundup every Sunday morning.
Big Data in NYC
Strata + Hadoop World 2015 | O’Reilly
If you weren’t lucky enough to attend last week’s Strata + Hadoop World conference in New York City, you can still watch recorded keynotes and interviews from the “defining event of the big data movement.” Recorded presentations include Jeff Jonas on context computing, Maria Konnikova in praise of boredom, Joseph Sirosh on what 50 million users in 7 days can teach us about big data, Jake Porway’s inspirational talk on what it takes to apply data science for social good, and lots more.
The Human Upgrade
Thought Process: Building an Artificial Brain | Washington Post
In part three of an in-depth series on the visionaries who built some of the biggest tech firms and are now focused on transforming the human body, Ariana Eunjung Cha tells the story of Microsoft co-founder Paul Allen’s quest to dissect the brain and code a new one from scratch. “Although today’s computers are great at storing knowledge, retrieving it and finding patterns, they are often still stumped by a simple question: ‘Why?’ … The most exciting — and disconcerting — developments in the field may be in predictive analytics, which aims to make an informed guess about the future.”
Saving Lives with Python
How the Bill & Melinda Gates Foundation Saves Lives with Python | Galvanize
Bo Moore explains how the Gates Foundation uses data science to prioritize its spending on global health initiatives. The Global Burden of Disease project comes out of the University of Washington’s Institute for Health Metrics and Evaluation (IHME) and “tracks more than 300 diseases and 50 risk factors across 188 countries, modeling data from 1990 through present day and the future.” Kyle Foreman, IHME’s assistant director of scientific computing, says that a single run of their data “produces more than 25 TB of data, and some newer projects exceed 1 petabyte.” Foreman explains how Python lets the team scale its work and collaborate easily with researchers around the world while developing models.
Mastering Tetris May Be the Most Important Thing You Ever Do
How Tetris explains the promise of the ultimate algorithm | Washington Post
Matt McFarland explains why Pedro Domingos believes that if “you can solve Tetris, you can solve thousands of the hardest and most important problems in science, technology, and management — all in one fell swoop. That’s because at heart they are all the same problem.” In his new book, The Master Algorithm, Domingos describes how “the five tribes of machine learning: symbolists, connectionists, evolutionaries, Bayesians and analogizers” could each lead to the “one learning algorithm that can derive all knowledge from data.” Ultimately, though, Domingos believes that the winner of this race, “one of the greatest scientific achievements of all time,” is likely to come not from any of these schools of thought but from someone “who is 20-years-old and just has a new idea.”
Putting Dark Data to Good Use
Meet the computer scientist who just got $625K for his work that helps track down human traffickers | Business Insider
Christopher Re is a Stanford computer scientist who was named one of the MacArthur Foundation’s 2015 ‘genius grant’ award winners for his work developing DeepDive, a “new type of data management system that enables one to tackle extraction, integration, and prediction problems in a single system, which allows users to rapidly construct sophisticated end-to-end pipelines, such as, dark data BI systems.” Re’s programs can “improve on their own with machine learning and can be integrated into existing database systems,” and they are freely available to everyone. Notably, DeepDive has been used to track down human traffickers on the dark web. To learn more about DeepDive, listen to the O’Reilly Data Show podcast with Hadoop co-founder Mike Cafarella, which we shared in Roundup #1.
The Data of Space
Podcast Episode 36 | Partially Derivative
The Partially Derivative team talks astrophysics, cosmology, machine vision, dwarf galaxies and dark matter with guests including Sudeep Das, currently a data scientist at OpenTable but known for his research as an astrophysicist at Princeton and Berkeley; Gurtina Besla, assistant professor of Astronomy at the University of Arizona; Destry Saul, a former astrophysicist turned data scientist at Cisco; Andrzej Stewart, Chief Engineering Officer of a year-long simulated mission to Mars; and Kirk Borne, currently Principal Data Scientist at Booz Allen Hamilton and formerly of NASA.
Each week we surface, summarize, and share the most interesting stories and biggest news from the world of data science. Have articles or podcasts that you think we should be covering in our Data Science Roundup? Send them to editor@rjmetrics.com.