Reflecting on the Current “Data Science Moment”
50 Years of Data Science | Keynote speech at John W. Tukey 100th Birthday Celebration
On September 18th, Stanford professor David Donoho gave the keynote speech at the John W. Tukey 100th Birthday Celebration at Princeton University. A preprint of his speech has recently surfaced, in which Donoho “reviews some ingredients of the current ‘Data Science moment,’ including recent commentary about data science in the popular media, and about how/whether Data Science is really different from Statistics.” Donoho states: “Because all of science itself will soon become data that can be mined, the imminent revolution in Data Science is not about mere ‘scaling up’, but instead the emergence of scientific studies of data analysis science-wide.” In its entirety, the speech presents a “vision of data science based on the activities of people who are ‘learning from data’ ” and suggests that this “new field is a better academic enlargement of statistics and machine learning than today’s Data Science Initiatives.”
The Data Science Machine
Automating big-data analysis | MIT News
Max Kanter, a data scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), has developed an algorithm that outperforms humans when searching for patterns in data sets. Working with his thesis advisor, Kanter developed a new system that outperformed 615 of the 906 human teams participating in three data science competitions they entered. More importantly, the “Data Science Machine” took between two and 12 hours to produce entries that human teams were only capable of producing over the course of several months. Margo Seltzer, a professor of computer science at Harvard University, referred to their work as “one of those unbelievable projects where applying cutting-edge research to solve practical problems opens an entirely new way of looking at the problem.”
From Boondoggle to Breakthrough
The Latest Medical Breakthrough In Spinal Cord Injuries Was Made By A Computer Program | Fast Company
A team of neuroscientists and statisticians from the University of California San Francisco partnered with the software firm Ayasdi to perform “a meta-analysis of $60 million worth of basic research written off as useless 20 years ago.” The team discovered a “relationship between the long-term recovery of spinal cord injury victims and high blood pressure during their initial surgeries.” They accomplished this by reconstructing data from multiple studies conducted in the mid-1990s. Rather than rely only on the published results, they also asked for unpublished data and lab notes. Because of the complexity of spinal cord injuries, “efforts to isolate simple causal mechanisms have proven elusive,” which led the team to “test old, dark data again, this time using techniques designed for uncovering hidden relationships between large numbers of variables.” They did this with a technique called topological data analysis (TDA), which uses “concepts from geometric topology—the study of highly complex shapes—to find patterns hidden in large datasets.” In the end, the idea to sift through the old data in new ways meant the team “opened $60 million worth of value” for just over a million dollars.
Building the Future
Hilary Mason: Use data science and machine intelligence to build a better future | TechRepublic
Hilary Mason, founder of Fast Forward Labs, delivered the keynote at this year’s Grace Hopper Celebration of Women in Computing Conference. In her presentation, titled “Machine Intelligence: A Startup Adventure,” Mason discussed what it means that machines are now doing things we once thought belonged only to the creative domain of humans, and how this evolution affects the role of the computer scientist, the rise of the data scientist, and the way organizations will be created in the future. She ended the presentation with a slide that read: “You’re building the future. Please build the one you want to live in.” You can watch the full conference video here. (Mason’s presentation starts at 22:00.)
The Science Surveyor
A group of researchers is trying to help science journalists parse academic articles on deadline | Nieman Lab
Researchers at Columbia and Stanford are developing a tool called Science Surveyor to help journalists “find context and background information on tight deadlines.” The project is still in its experimental phase, but the high-level idea is that “the tool takes the text of an academic paper and searches academic databases for other studies using similar terms. The algorithm will surface relevant articles and show how scientific thinking has changed through its use of language.” The open source project has so far drawn interdisciplinary contributions from computer scientists, designers, and journalists, and lives on GitHub. The researchers are hopeful that a library or database might take an interest in the project in the future.
Meet the Man Who Built the Google of Math
How to Build a Search Engine for Mathematics | Nautilus
Neil Sloane has been called the “world’s most influential mathematician” for his work in founding the On-Line Encyclopedia of Integer Sequences (OEIS), one of the best-known mathematical databases on the web. Siobhan Roberts profiles Sloane and takes you inside “command central for the encyclopedia,” which has been his attic in Highland Park, New Jersey, since his retirement from AT&T Labs in 2012. “The resulting reach and range of the encyclopedia sends one down a cascading index encompassing the natural sciences, physical sciences, earth and space, logic and math, applied sciences and technology, social sciences, business and finance, and beyond.” Roberts captures how tirelessly Sloane works on the OEIS and how the “encyclopedia’s impact on scientific research broadly speaking can be measured by its citations in journals, which currently Sloane has tallied to more than 4,500, ranging through biology, botany, zoology, chemistry, thermodynamics, optics, quantum physics, astrophysics, geology, cybernetics, engineering, epidemiology, and anthropology. It is a numerical database of the human canon.”
Each week we surface, summarize, and share the most interesting stories and biggest news from the world of data science. Have articles or podcasts that you think we should be covering in our Data Science Roundup? Send them to editor@rjmetrics.com.