Subscribe here to receive the Data Science Roundup every Sunday morning.

Science Isn’t Broken. It’s Just Really Hard.

Science Isn’t Broken | FiveThirtyEight

In a widely discussed article, Christie Aschwanden, lead writer for science at FiveThirtyEight, details a crisis of confidence that is roiling the scientific community. Aschwanden believes the crisis is fueled by a broken incentive system that doesn't align with our need to rely “on science as a means for reaching the truth.”

The article sheds light on a concerning form of data manipulation known as “p-hacking”: running analysis after analysis until one produces a statistically significant p-value, then reporting only that one. Aschwanden goes deeper on the topic on the What’s the Point podcast. Rather than calling for an end to p-hacking, she recommends more of it, just paired with far greater transparency:

We shouldn’t think of p-hacking as this terrible thing that’s cheating. We should actually make it compulsory…tell us every single variable that you collected, every way that you analyzed it and don’t just give us the one thing that you got a good p-value for…seeing [variations] in context of the other analyses, it really gets us a lot closer to truth, and remember, that’s what we’re supposed to be going after.
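
To make concrete why reporting only “the one thing that you got a good p-value for” misleads, here is a minimal Python sketch. It assumes every hypothesis tested is truly null and that the analyses are independent, so each p-value is uniformly distributed between 0 and 1. The helper names `false_positive_rate` and `analytic_rate` are hypothetical, chosen for illustration. Real analyses of one dataset are usually correlated, which changes the numbers, so treat this as an illustration rather than a measurement.

```python
import random


def false_positive_rate(num_tests: int, alpha: float = 0.05,
                        trials: int = 20_000, seed: int = 0) -> float:
    """Estimate how often at least one of `num_tests` null analyses looks 'significant'.

    Assumption (hypothetical setup): every hypothesis is truly null and the
    tests are independent, so each p-value is drawn from Uniform(0, 1).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Keep only the best-looking analysis, as a p-hacker would.
        best_p = min(rng.random() for _ in range(num_tests))
        if best_p < alpha:
            hits += 1
    return hits / trials


def analytic_rate(num_tests: int, alpha: float = 0.05) -> float:
    # P(at least one p < alpha) = 1 - (1 - alpha)^k for k independent null tests.
    return 1 - (1 - alpha) ** num_tests


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(f"{k:>2} analyses: simulated {false_positive_rate(k):.3f}, "
              f"expected {analytic_rate(k):.3f}")
```

Under these assumptions, one analysis yields a false positive about 5% of the time, but picking the best of 20 raises that to 1 - 0.95^20, or roughly 64%. Seeing every analysis, as Aschwanden urges, lets readers judge whether a single good p-value stands out or is just the luckiest of many draws.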

Aschwanden’s investigation leads her to conclude that science is far from broken, and that variations in results occur precisely because the “scientific method is the most rigorous path to knowledge.” As long as practitioners continue to embrace this fact, science will garner the respect it deserves.

Which Type of Data Scientist Are You?

Doing Data Science at Twitter | Medium

Robert Chang reflects on his two years working as a data scientist at Twitter and explains why aspiring data scientists should keep in mind the distinction between two types of data scientists: Type A (for Analysis) and Type B (for Building).

Chang wrote the post out of concern that popular articles on how to become a data scientist (which he doesn’t dismiss, and admits to benefiting from) often put too much emphasis on “techniques, tools, and skill-sets,” while offering little insight into what data science looks like “in practice.”

The post describes recent changes in data science at Twitter, how those changes affected Chang’s role, and which skills he needed to adapt. It’s a must-read for aspiring data scientists.

Got Social Skills?

Research: Technology Is Only Making Social Skills More Important | HBR

Nicole Torres reports on a paper published by the National Bureau of Economic Research (NBER) which finds that social skills are growing in importance in the labor market. Why? In the paper, David Deming, associate professor at the Harvard Graduate School of Education, offers two reasons:

  1. Computers still can’t simulate social interaction.
  2. Flexibility in playing to the strengths of fellow team members and adapting to change is increasingly valuable.

Deming’s research can essentially be boiled down to these two charts:

[Two charts from Deming’s paper, as reproduced by HBR]

While the job market has always looked favorably on those with both math and people skills, the balance has never tipped so far in favor of this unique (and valuable) blend.

Meet the Man Who Revolutionized R

Hadley Wickham, the Man Who Revolutionized R | Priceonomics

Dan Kopf of Priceonomics shares the story of Hadley Wickham, the statistician from New Zealand who has become a “giant among data nerds” because of the programming packages he has created for the open-source statistical programming language R. His packages are used by tech giants like Google and Facebook, media outlets like the New York Times and FiveThirtyEight, and government agencies like the FDA and DEA.

Wickham explains how his drive to empower people who like to play with data, combined with the fact that he is “exquisitely sensitive to frustration,” has fueled his creation of packages like reshape and ggplot2, the latter of which has fundamentally changed the way many people think about data visualization. The chart below, which displays the initial release date and number of downloads for seventeen of Hadley’s packages, barely scratches the surface of the impact of what has come to be dubbed the “Hadleyverse.”

[Chart: initial release date and number of downloads for seventeen of Hadley Wickham’s R packages]

The Early Days of Hadoop

Mike Cafarella Episode | O’Reilly Data Show Podcast

On the O’Reilly Data Show Podcast, Ben Lorica talks with Mike Cafarella, currently an assistant professor of computer science at the University of Michigan, but perhaps more famous for co-founding Hadoop and Nutch with Doug Cutting. Cafarella shares insights into his pioneering work in open source search and distributed systems, as well as his new startup, ClearCutAnalytics, which grew out of a highly successful academic project on structured data extraction.

On the creation of the initial Hadoop Distributed File System (HDFS), Cafarella states:

We spent about a summer working on this distributed indexing mechanism…We finished it, and I felt pretty good about it, and then something like two hours later, we read the Google File System paper and realized, ‘Boy, actually that would be pretty handy; we could really use that.’ We threw out a chunk of it, implemented a very early version of the ideas in that paper, which we called HDFS.

Each week we surface, summarize, and share the most interesting stories and biggest news from the world of data science. Have articles or podcasts that you think we should be covering in our Data Science Roundup? Send them to editor@rjmetrics.com.
