Subscribe here to receive the Data Science Roundup every Sunday morning.

One Huge Lesson in Humility

Data Helped Me Lose 100 Pounds | New Republic

Six years ago, New Republic data columnist Paul Ford built a website and database to help him lose weight. And it worked: he went on to lose 100 pounds. But he has since gained it all back. “All of this has made me incredibly empathetic to the tools of the quantified self, the little devices that have sprouted around the world. . . But they’re not for me. They lack the capacity for narrative. A chart is not a story.” Ford’s account is a deeply personal meditation on how we measure and understand “the texture of [our] existence.”

RJMetrics Data Science Roundup: One huge lesson in humility. From @ftrain of @NewRepublic http://ow.ly/SqXhL

Can You Quantify Style?

Deep Style | MultiThreaded

Stitch Fix data scientist TJ Torres shares how “developing algorithms to quantify abstract concepts like style, fashion, and art may one day move us forward toward a more complex understanding of how we as people process and analyze abstract unstructured data.” It’s a fascinating read that covers artificial neural networks, deep learning, and the variational autoencoder, and explains how Stitch Fix is building an automated process to understand and quantify the style of its inventory and clients.

RJMetrics Data Science Roundup: Can you quantify style? From @Teejosaur of @stitchfix http://ow.ly/SqXhL


Here’s What Machine Learning Looks Like

A Visual Introduction to Machine Learning | R2D3

Stephanie Yee and Tony Chu created a step-by-step visual explanation of what actually happens as a machine “learns.” It’s part one of an interesting exercise to explain statistical concepts through interaction design.


RJMetrics Data Science Roundup: Visualizing Machine Learning. From @stephaniejyee and @tonyhschu of @r2d3us http://ow.ly/SqXhL

Computer Knows Best

Facebook ‘Likes’ Mean a Computer Knows You Better Than Your Mother | WSJ

Georgia Wells of the Wall Street Journal talks to Michal Kosinski, an assistant professor at the Stanford Graduate School of Business, whose recently published research with colleagues Wu Youyou and David Stillwell sheds light on the “major advantages computers have over humans” in making predictions from millions of pieces of information at once. The researchers developed a model that predicted a broad range of people’s preferences and behavior. Try it for yourself at applymagicsauce.com.

RJMetrics Data Science Roundup: Computer Knows Best. From @georgia_wells of @WSJ http://ow.ly/SqXhL

Getting Your Hands on a New Data Set

What I do when I get a new data set as told through tweets | Simply Statistics

Jeff Leek shares a glimpse into his step-by-step process for working with a new data set, inspired by a question Hilary Mason posed on Twitter. “At least for me I come to a new data set in one of three ways: (1) I made it myself, (2) a collaborator created a data set with a specific question in mind, or (3) a collaborator created a data set and just wants to explore it. In the first case and the second case I already know what the question is, although sometimes in case (2) I still spend a little more time making sure I understand the question before diving in.” Leek goes on to survey the variety of Twitter responses before sharing his own in-depth perspective.

RJMetrics Data Science Roundup: Getting your hands on a new data set. From @jtleek of @simplystats http://ow.ly/SqXhL

That’s a Lot of Code

Google Is 2 Billion Lines of Code — And It’s All in One Place | Wired

Wired’s Cade Metz reports that Google’s Rachel Potvin revealed that the software running all of Google’s Internet services comprises around 2 billion lines of code. That means “building Google is roughly the equivalent of building the Windows operating system 40 times over.” And here’s the kicker: all 2 billion lines sit in a single code repository. The system, called Piper, spans around 85 terabytes of data across 10 Google data centers, and the company’s 25,000 engineers make around 45,000 commits to it every day; in other words, they “modify 15 million lines of code across 250,000 files each week.”

RJMetrics Data Science Roundup: That’s a lot of code. From @CadeMetz of @WIRED http://ow.ly/SqXhL

Each week we surface, summarize, and share the most interesting stories and biggest news from the world of data science. Have articles or podcasts that you think we should be covering in our Data Science Roundup? Send them to editor@rjmetrics.com.

If you’re not signed up to receive the Data Science Roundup, subscribe here.
