Subscribe here to receive the Data Science Roundup every Sunday morning.

Here’s What Machine Learning Looks Like

A Visual Introduction to Machine Learning | R2D3

Stephanie Yee and Tony Chu created a step-by-step visual explanation of what actually happens as a machine “learns.” It’s part one of an interesting exercise in explaining statistical concepts through interaction design. Yee and Chu are working on a follow-up that visually explores the concept of overfitting and how it relates to a fundamental tradeoff in machine learning.

Tools, Trends, and Salaries

2015 Data Science Salary Survey | O’Reilly

Researchers John King and Roger Magoulas released the third edition of the O’Reilly Data Science Salary Survey. The report found that SQL, R, Excel, and Python were the tools most used by data professionals for the third year in a row. Spark and Scala, however, saw the largest increases in usage, and their users tended to earn more. The research also found that salaries were highest in the software industry. Although the earnings gap between men and women narrowed, female data professionals still earn $8,026 less than their male counterparts when all else is equal.

If You Can Write a For-Loop, You Can Do Statistics

Statistics for Hackers | Speaker Deck

Jake VanderPlas, the author of the Python Data Science Handbook, recently presented at Stitch Fix. His talk was not recorded, but his slides cover how you can “hack statistics” if you have basic programming skills. In other words, “if you can write a for-loop you can do statistical analysis.”
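
To give a flavor of the for-loop approach, here is a minimal sketch of a bootstrap confidence interval for a mean, written with nothing but loops and the standard library. It is not taken from the slides: the function name `bootstrap_mean_interval`, its parameters, and the sample values are all made up for illustration.

```python
import random


def bootstrap_mean_interval(data, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so repeated runs match
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Draw n values from the data with replacement and record their mean.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    # Read the interval endpoints off the sorted resampled means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up measurements, purely for illustration.
sample = [48, 24, 51, 12, 21, 41, 25, 23, 32, 61,
          19, 24, 29, 21, 23, 13, 32, 18, 42, 18]
low, high = bootstrap_mean_interval(sample)
print(f"sample mean: {sum(sample) / len(sample):.2f}")
print(f"95% bootstrap interval: ({low:.2f}, {high:.2f})")
```

The loop stands in for the formula you would otherwise look up: resample the data many times, compute the statistic each time, and see how much it varies.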

One Huge Lesson in Humility

Data Helped Me Lose 100 Pounds | New Republic

Six years ago, New Republic data columnist Paul Ford built a website and database to help him lose weight. And it worked. He went on to lose 100 pounds. But he’s since gained it back. “All of this has made me incredibly empathetic to the tools of the quantified self, the little devices that have sprouted around the world. . . . But they’re not for me. They lack the capacity for narrative. A chart is not a story.” Ford’s account is a deeply personal meditation on how we measure and understand “the texture of [our] existence.”

Can You Quantify Style?

Deep Style: Inferring the Unknown to Predict the Future of Fashion | MultiThreaded

Stitch Fix data scientist TJ Torres shares how “developing algorithms to quantify abstract concepts like style, fashion, and art may one day move us forward toward a more complex understanding of how we as people process and analyze abstract unstructured data.” It’s a fascinating read that covers artificial neural networks, deep learning, and the variational auto-encoder, and explains how Stitch Fix is creating an automated process to understand and quantify the style of its inventory and clients.

Which Type of Data Scientist Are You?

Doing Data Science at Twitter | Medium

Data scientist Robert Chang reflects on his two years at Twitter and explains why aspiring data scientists should keep in mind the distinction between two types of data scientist: Type A (for Analysis) and Type B (for Building). Chang was inspired to write the post out of a concern that popular articles on how to become a data scientist (which he does not diminish, and admits to benefiting from) often put too much emphasis on “techniques, tools, and skill-sets” while offering less insight into what data science looks like “in practice.” The post includes Chang’s description of recent changes in data science at Twitter, how those changes affected his role, and which skills he needed to adapt. It’s a must-read for aspiring data scientists.

Head to Head Data Analysis

R vs Python: Head to Head Data Analysis | Dataquest

The team at Dataquest offers lessons for both R and Python, and believes that both languages “have a place in a data science toolkit.” Instead of adding another comparative article to the mix, they decided to “analyze a dataset side by side in Python and R, and show what code is needed in both languages to achieve the same result.” For each step of the analysis they provide the code in both languages, along with an explanation of the different approaches. The result is an interesting look at the strengths and weaknesses of each language.

Each week we surface, summarize, and share the most interesting stories and biggest news from the world of data science. Have articles or podcasts that you think we should be covering in our Data Science Roundup? Send them to editor@rjmetrics.com.
