Be the Data

Virtual Reality Space Lets Students Experience Big Data | Campus Technology

Virginia Tech’s Cube is an “adaptable space for research and experimentation” that is currently hosting “an immersive big data environment.” Visitors don a head-mounted virtual reality display and move through a “room four stories high, 50 feet wide and 40 feet deep.” As they move, they are immersed in vast amounts of visual and aural data collected from an actual tornado that hit Moore, Oklahoma in 2013. Visitors can then become the data and explore concepts like clustering, while also “learning about the underlying math involved in data analysis.” Ben Knapp, director of the Institute for Creativity, Arts & Technology (ICAT), is excited that this new approach to instruction could help students learn how to “explore abstract data, visualize it and detect changes in patterns within the data.”

The Self-Identified Data Scientist

How Many Data Scientists Are There? | The Data Point

This post looks at how many data scientists are out there, and how that number has changed over time, based on our recently published 3,000-word report on The State of Data Science. We analyzed self-reported LinkedIn data that included 60,000 professional experiences, 27,000 degrees, and 254,000 skills. Out of 236 million LinkedIn profiles, we found 11,400 self-identified data scientists, 52% of whom earned that title within the past four years.

Ten Solutions to Your Small Data Problems

What to do with “small” data? | Medium

Ahmed El Deeb, an applied scientist at Microsoft, shares ten possible solutions for what to do when “a data set with very few data points turns up” and your team of data scientists with “big-data infrastructure tools and machine learning algorithms” struggles to adjust to the unique challenges of “small” data. The article points out that most problems of small data are related to high variance, and the ten suggested solutions “revolve around three main themes: constrained modeling, smoothing and quantification of uncertainty.” The number one solution? Hire a statistician. “Statisticians are the original data scientists. The field of statistics was developed when data was much harder to come by, and as such was very aware of small-sample problems.”
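
To make the “quantification of uncertainty” theme concrete, here is a minimal sketch of one common technique, a percentile bootstrap confidence interval for a mean. It is our own illustration and is not taken from the article; the function name `bootstrap_mean_ci`, its parameters, and the sample values are all hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=5000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for the mean of a small sample.

    Hypothetical helper for illustration only; the name, defaults, and
    the choice of the percentile method are assumptions, not the article's.
    """
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Cut off the lower and upper tails of the resampled means.
    alpha = 1.0 - confidence
    lo_idx = int((alpha / 2.0) * n_resamples)
    hi_idx = int((1.0 - alpha / 2.0) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Made-up example: eight observations of some metric.
observations = [12.1, 9.8, 11.4, 10.9, 13.2, 8.7, 10.1, 11.9]
low, high = bootstrap_mean_ci(observations)
print(f"mean = {statistics.fmean(observations):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

With so few points the interval comes out wide, which is the point: reporting the range alongside the estimate keeps a small sample from looking more certain than it is.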

Differential Privacy and the Promise of Big Data

Theoretical computer science provides answers to data privacy problem | NSF

The National Science Foundation (NSF) announced the development of new tools that allow researchers to safely share and study sensitive data. Salil Vadhan, professor of computer science at Harvard, is leading research into an approach known as “differential privacy” that allows for investigations of data without revealing confidential information about the participants. Vadhan hopes the techniques that come from this research will “enable more researchers to share, retain control of, and credit for their data contributions as part of the Dataverse Network, a project that guarantees the long-term preservation of critical datasets.” The project takes a “highly interdisciplinary approach which brings together deep expertise in computer science, social science, statistics, and law,” and has the potential to transform the ability of researchers to uncover insights that could “save lives, improve services, and inform our understanding of the world.”
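
For readers new to the idea, here is a minimal sketch of the textbook Laplace mechanism, one standard way to answer a counting query with differential privacy. It is our own illustration, not the tooling the NSF project describes; the function `private_count`, the `epsilon` value, and the sample ages are hypothetical.

```python
import random


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of records matching `predicate`.

    Illustrative Laplace mechanism: adding or removing one person changes
    a count by at most 1 (sensitivity 1), so the noise scale is 1 / epsilon.
    The function name and parameters are assumptions for this sketch.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    scale = 1.0 / epsilon
    # The difference of two i.i.d. exponential draws is Laplace-distributed
    # with mean 0 and the given scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Made-up example: how many participants are 40 or older?
ages = [23, 35, 41, 52, 29, 64, 38]
rng = random.Random(42)
answer = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
print(f"noisy count: {answer:.2f} (true count is 3)")
```

A smaller `epsilon` adds more noise and gives stronger privacy. Any single answer is blurred, but the noise averages out across many queries or larger datasets, which is why aggregate findings survive while individual records stay hidden.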

Head to Head Data Analysis

R vs Python: Head to Head Data Analysis | Dataquest

The team at Dataquest offers lessons for both R and Python, and believes that both languages “have a place in a data science toolkit.” Instead of adding another comparison article to the mix, they decided to “analyze a dataset side by side in Python and R, and show what code is needed in both languages to achieve the same result.” For each step of the analysis they provide the code for both languages, along with an explanation of the different approaches. The result is an interesting look at the strengths and weaknesses of each language.

The Human Algorithm

Automating Startup Data Collection at Mattermark | Mattermark

Sarah Catanzaro is the head of data at Mattermark, where she leads a team of three analysts and three machine learning engineers that “collect and clean massive amounts of company data.” Last week, at our Datapoint Live conference in San Francisco, Sarah spoke about how she leverages a lean data team to achieve scale through automation. The article includes links to Sarah’s deck and an additional video on how the machine learning engineers “tackle the software engineering challenges” the team faces as they attempt to “organize the world’s business information.”

Each week we surface, summarize, and share the most interesting stories and biggest news from the world of data science. Have articles or podcasts that you think we should be covering in our Data Science Roundup? Send them to editor@rjmetrics.com.