Right now, somewhere in the mountains of Northern Nevada, data scientist Will Kurt is busy pondering the study of probability. Whether it’s explaining Bayes’ Theorem with LEGO pieces, or walking you through different ways to write Monte Carlo simulations with R, Kurt is always thinking up interesting ways to illustrate “what makes probability theory both immensely important in how we understand the world as well as a delight to study in its own right” for his blog, Count Bayesie.
I first discovered Kurt’s blog after multiple data scientists mentioned that Count Bayesie was on their regular reading lists. Based on the blog topics covered, my assumption was that Kurt must have come to data science via a background heavy in math and statistics. Yet, I was surprised to learn that Kurt was an English major as an undergrad, and graduated without having taken a single formal math course.
I recently caught up with Kurt about his unique path to becoming a data scientist, getting served ice cream by Ray Tomlinson (the guy who put the @ sign in email), and the one bit of advice he’d give to a data scientist who’s just starting out.
The software spark
I’m willing to bet that Kurt may be the only English major to start a probability blog. Kurt studied English and Literature as an undergraduate at Rutgers University, graduating without taking any math courses or having any programming ability. He went on to pursue his Master’s in Library and Information Science and started his professional career working at the MIT libraries.
It was while at MIT that Kurt was first exposed to the world of software and programming, and as this exposure increased he began to realize, “If I don’t learn to code here, I’m really going to kick myself when I get older.” So, he signed up for an intersession course in Python in the hopes of learning programming. On the first day of the intro course they asked the room full of 250 people to raise their hand if they had programming experience and Kurt was literally the only one who didn’t raise his hand. He said, “I almost ran out of the room thinking I don’t belong here.” But, then he reminded himself that the hardest part of learning something new was always getting over that initial fear. So, he stuck it out and the course ended up sparking an intense interest in software and programming.
Eating ice cream with the ARPAnet vets
Around that time Kurt got a job as a research librarian at BBN Technologies, a company that played a key role in the creation of the internet. Here Kurt was exposed to the people who were responsible for building much of the hardware and software of ARPAnet, the government-funded network that was the precursor to the internet, built in the late ’60s and ’70s. This included a particularly memorable staff event where the legendary Ray Tomlinson served ice cream to the rest of the staff. Overall, it provided Kurt with a unique glimpse into the potential of software, computer science, and machine learning.
Kurt described himself at that point as a librarian “who barely knew how to code, and didn’t even know calculus,” but he could tell the stuff that he was starting to learn was the future – building out complicated models, machine learning, etc. This led him to start taking night classes at Boston University: Intro to Computer Science, Java, and also Discrete Math and Data Structures. He found that he really enjoyed the Discrete Math and Data Structures classes, and when he shared his level of enjoyment with people they encouraged him to go further down that path, telling him that discrete math is usually what filters people out, and that few people genuinely like it.
Out of the stacks and into software
A short time later, Kurt and his wife moved to Nevada where they were both faculty librarians at the University of Nevada, Reno. He began to take graduate classes in computer science, eventually earning his Master’s degree, and officially moving out of libraries and into software. It was during this time that the seeds of his blog, Count Bayesie, were first planted in Kurt’s mind. He still had very little formal math background at this point: “I really didn’t know calculus that well, and I didn’t know linear algebra, so I felt illiterate, and had to teach myself all of these things in order to function.” This led to a fascination with how we learn math, and how we learn to evolve our learning.
Kurt recognized that the entry level to math and software “is really boring, really difficult, and doesn’t have a lot of reward,” but he believes that:
“one of the most dangerous things is the ingrained sense that I don’t know math, and can’t. The irony is when you look at the upper level advanced stuff in math, the skills needed are related to creative skills. The danger then becomes that the barriers of the tech industry keep the wrong people out. Poking at what’s happening and trying to find something more interesting about it fits both the mathematical and the creative mindset.”
Ending up on the “data science train”
After working at a small software company for over a year, and continuing to work towards completing his Master’s, Kurt took a government position as a “data science fellow” with the Consumer Financial Protection Bureau (CFPB). He ended up using R a lot of the time, even though it wasn’t originally a part of the position. It was a job that had him working with a lot of economists who had “a ton of data,” but weren’t really doing much in the way of machine learning. At this point, Kurt discovered how the vocabulary changes around the way different groups think about statistics and probability. It occurred to him that “what statistics is, and what probability is, is a hugely broad and complicated thing, all these different fields and sub-philosophies on how people think about how we use data to explore the world.”
From there Kurt went on to become the Lead Data Scientist at KissMetrics where he was managing a small team and doing really interesting product-based stuff that became his favorite type of work: “blending skills with creativity and the product mindset to build things that are unique.”
Kurt admits that he wound up getting on the “data science train” at the right time, and he acknowledges that there is a trendiness surrounding the discipline: “there is so much content out there framed around, ‘So, you want to be a data scientist?’” He also recognizes how the trendiness invites skepticism and cynicism, “like the joke that a data scientist really is just a statistician that lives in San Francisco.” But he believes “people often miss that the real value around the heightened attention to data science is that it has injected the idea that you should have a small research team, even if you’re a small startup/company.” To Kurt, the perfect data scientist is “a one person research team that can hack enough to get things prototyped, they know enough math to solve problems other people can’t solve, and they can help engineers do things a little better.” So, even if a startup or small company is hiring a data scientist “just because,” this is a good thing, because “the real value is that a mini research team is being injected into a company early on, and you have someone on your team spending their time at an early stage thinking about how to solve difficult problems.”
Advice for the new data scientist
Kurt believes that the strongest foundation in data science is based in programming and software, because it’s important that a data scientist has enough programming ability to avoid being positioned as an outsider to the engineering team. He points out that this applies less at companies like Google, Netflix, or Amazon, “where they want to hire a PhD to sit in a room and work on a single algorithm all day.” Yet at a small startup, being able to communicate fluently with the engineers is invaluable. There is a big difference between the engineers saying, “okay, this guy can code and he knows a lot of math” and a data scientist going to the engineers and saying, “here’s some math, now try and do this with it.”
Kurt admits that no one starting out wants to hear, “learn to code really well for a few years and then learn math.” It’s not easy. But his ability to work closely with engineering teams, and at times even having engineers approach him to help solve math problems, became a key element of his own growth as a data scientist.
When reflecting on his own path to becoming a data scientist, Kurt says he would stress to the data scientist just starting out that “simpler things are almost always more right. Not only are they always easier to do, but they are almost always better because they make fewer assumptions.” Kurt admits that there are exceptions, but in general, “there’s a real desire to prove yourself by doing complicated things, and for most business problems, they are so complicated anyway that you need to rely on simple models to explain the data.”
The simple solutions may not be the most exciting, but Kurt has consistently seen the biggest return come from times when he approached a problem as if he didn’t know any math, and had to just pull up the data and try to make a story. When he was just starting out, he always wanted to “make a huge machine learning model and shove the data in it, and make these great predictions.” But he’s found that what it comes down to is that “prediction is not as important as understanding why.” The why is always harder, but a prediction can prove to be useless without understanding what it means. Real business value comes in when you can explain more than just “you’re going to lose customers”: why that will happen, and what you can do to change it.
To follow Will’s musings on math, probability, and data science, be sure to check out his blog here, and look out for updates by following him on Twitter here.