There is
no shortage of articles attempting to lay out a step-by-step process of how to
become a data scientist. “It’s easy! Are you a recent graduate? Do this… Are
you changing careers? Do that… And make sure you’re focusing on the top skills:
coding, statistics, machine learning, storytelling, databases, big data… Need
resources? Check out Andrew Ng’s Coursera ML course, …”. Although these are
important things to consider once you have made up your mind to pursue a career
in data science, I hope to answer the question that should come before
all of this. It’s the question that should be on every aspiring data
scientist’s mind: “should I become a data scientist?” This question addresses
the why before you try to answer the how. What is it about the
field that draws you in and will keep you in it and excited for
years to come?
In order
to answer this question, it’s important to understand how we got here and where
we are headed. Because by having a full picture of the data science landscape,
you can determine whether data science makes sense for you.
Where it all started…
Before
the convergence of computer science, data technology, visualization,
mathematics, and statistics into what we call data science today, these fields
existed in siloes — independently laying the groundwork for the tools and
products we are now able to develop, things like: Oculus, Google Home, Amazon
Alexa, self-driving cars, recommendation engines, etc.
The
foundational ideas have been around for decades... early scientists dating back
to the pre-1800s, coming from wide range of backgrounds, worked on developing
our first computers, calculus, probability theory, and algorithms like: CNNs,
reinforcement learning, least squares regression. With the explosion in data
and computational power, we are able to resurrect these decade old ideas and
apply them to real-world problems.
In 2009
and 2012, articles were published by McKinsey and the Harvard Business Review,
hyping up the role of the data scientist, showing how they were revolutionizing
the way businesses are operating and how they would be critical to future
business success. They not only saw the advantage of a data-driven approach,
but also the importance of utilizing predictive analytics into the future in
order to remain competitive and relevant. Around the same time in 2011, Andrew
Ng came out with a free online course on machine learning, and the curse of AI
FOMO (fear of missing out) kicked in.
Where we are now…
Companies
began the search for highly skilled individuals to help them collect, store,
visualize and make sense of all their data. “You want the title and the high
pay? You got it! Just please come and come quick.” With very little knowledge
of what they were looking for, job postings went up.
If you
searched ZipRecuiter today, you’d find over 190k open data science positions
currently open, each one looking for their own data unicorn. Thus, in an effort
to get talent in the door, the definition of what it meant to be a data
scientist soon widened with definitions varying from company to company and
person to person.
On the
other hand, candidates saw a great opportunity: a career with high pay, high
demand, and the promise of job security and glory. Everyone rushed to develop
all the right skills with one goal in mind: to hold the “sexist job of the 21st
century”.
We have
the demand and we have the supply, so what’s the problem? Well, the problem
isn’t a shortage of programs to support that demand and capitalize on the hype.
It feels like every day there are new courses being developed to satisfy the
cravings from aspiring data scientist to break into the field: master’s
programs, boot camps and online courses. It’s an arms race to make the right
courses with the promise of a Machine Learning job at the end of it. “No PhD?
No problem. Just three to six months and a small investment of ~10-15k and
you’ll be guaranteed a well-paid job upon graduation.” (wink)
These
programs are designed to be a one-stop-shop for everything data science: you
learn the programming, the visualization, the modeling-- it’s all there. What
you soon discover is that many (surely, not all) of the business problems being
faced can be solved using similar approaches, so if you’re looking to apply
some algorithm, chances are there’s a library that already exists to help you
do just that. Simple right?
Hold up…
If you’ve
been paying attention so far, you will have picked up on a few important things
so far:
- By getting ahead of themselves, companies are hiring data scientist before they have even started collecting the right data (i.e. they are suffering from the Cold Start Problem of AI), meaning you will need to be involved in every step of the data pipeline including data collection, storage, and visualization before you can get to the modeling.
- Rushing to get a job in data science (going through one of the above-mentioned methods) means you will be competing against hundreds of thousands of others in the same exact position. Expect that they will have similar projects to yours and similar experience. To get yourself noticed, you will need to find a way to differentiate yourself: showing your creativity and grittiness.
- Chances are you won’t be developing algorithms from scratch. Unless you have a lot of extra time on your hands, you’ll most likely on the existing and well-trusted libraries. Why compete against a group of PhDs that helped develop the library and risk putting something less than optimal into production unless you had to develop something specific to your use-case.
What’s to come…
I
believe, a few major things will change from here.
First,
the data scientist role will become well-defined in terms of the skills
needed to do the job and the expected impact on businesses. Second, automation
of AI models will come. Don’t believe me? It’s already happening with products
like Google AutoML and software packages like TPot. What seemed like the most
technically challenging and appealing part of the job will be automated. As
they say now: “No ML expertise? No problem!”
The
implications of the trajectory on which we are headed is that:
- No longer will data engineers or data analysts be mistakenly given the title or the pay of a “real” data scientist, which will likely dwindle down the number of data scientists in the market.
- Human insight and oversight will become increasingly more important. Not to say that technical expertise will become less important, but the importance of connecting the analytics to the business and knowing how software interacts with people will increase (i.e the human side of data science).
- Lastly, since the Machine Learning part of the job behaves most consistently, it will be the first part to be automated (and we see that happening already). This means, that most of the value that data scientists will bring will involve the unification of messy data, preparation for modeling, and evaluation.
Therefore,
when trying to answer that question for yourself “should you become a data
scientist?”, imagine yourself through the entire hype cycle. Could you see
yourself growing in fast growing and evolving field that will challenge you in
every way? Can you embrace the change and adapt yourself just as fast? If so,
then this is the career for you...welcome aboard.
Should you become a data scientist?
No comments:
Post a Comment