Tuesday, 25 December 2018

Should you become a data scientist?


There is no shortage of articles attempting to lay out a step-by-step process of how to become a data scientist. “It’s easy! Are you a recent graduate? Do this… Are you changing careers? Do that… And make sure you’re focusing on the top skills: coding, statistics, machine learning, storytelling, databases, big data… Need resources? Check out Andrew Ng’s Coursera ML course, …”. Although these are important things to consider once you have made up your mind to pursue a career in data science, I hope to answer the question that should come before all of this. It’s the question that should be on every aspiring data scientist’s mind: “should I become a data scientist?” This question addresses the why before you try to answer the how. What is it about the field that draws you in and will keep you in it and excited for years to come?
In order to answer this question, it’s important to understand how we got here and where we are headed. With a full picture of the data science landscape, you can determine whether data science makes sense for you.

Where it all started…

Before the convergence of computer science, data technology, visualization, mathematics, and statistics into what we call data science today, these fields existed in silos, independently laying the groundwork for the tools and products we are now able to develop: Oculus, Google Home, Amazon Alexa, self-driving cars, recommendation engines, and so on.
History of Data Science
The foundational ideas have been around for decades. Scientists from a wide range of backgrounds, some dating back to before the 1800s, worked on developing our first computers, calculus, probability theory, and algorithms like least squares regression, reinforcement learning, and CNNs. With the explosion in data and computational power, we are able to resurrect these decades-old ideas and apply them to real-world problems.
In 2009 and 2012, McKinsey and the Harvard Business Review published articles hyping up the role of the data scientist, showing how data scientists were revolutionizing the way businesses operate and how they would be critical to future business success. The authors saw not only the advantage of a data-driven approach, but also the importance of using predictive analytics to remain competitive and relevant. Around the same time, in 2011, Andrew Ng came out with a free online course on machine learning, and the curse of AI FOMO (fear of missing out) kicked in.

Where we are now…

Companies began the search for highly skilled individuals to help them collect, store, visualize and make sense of all their data. “You want the title and the high pay? You got it! Just please come and come quick.” With very little knowledge of what they were looking for, job postings went up.
Job Trends Data Scientist
If you searched ZipRecruiter today, you’d find over 190k open data science positions, each one looking for its own data unicorn. Thus, in an effort to get talent in the door, the definition of what it meant to be a data scientist soon widened, varying from company to company and person to person.
On the other hand, candidates saw a great opportunity: a career with high pay, high demand, and the promise of job security and glory. Everyone rushed to develop all the right skills with one goal in mind: to hold the “sexiest job of the 21st century”.
We have the demand and we have the supply, so what’s the problem? Well, the problem isn’t a shortage of programs to support that demand and capitalize on the hype. It feels like every day new courses are being developed to satisfy the cravings of aspiring data scientists to break into the field: master’s programs, boot camps and online courses. It’s an arms race to make the right courses with the promise of a machine learning job at the end of it. “No PhD? No problem. Just three to six months and a small investment of ~10-15k and you’ll be guaranteed a well-paid job upon graduation.” (wink)
These programs are designed to be a one-stop shop for everything data science: you learn the programming, the visualization, the modeling; it’s all there. What you soon discover is that many (surely not all) of the business problems being faced can be solved using similar approaches, so if you’re looking to apply some algorithm, chances are there’s a library that already exists to help you do just that. Simple, right?

Hold up…

If you’ve been paying attention so far, you will have picked up on a few important things:
  1. By getting ahead of themselves, companies are hiring data scientists before they have even started collecting the right data (i.e. they are suffering from the Cold Start Problem of AI), meaning you will need to be involved in every step of the data pipeline, including data collection, storage, and visualization, before you can get to the modeling.
  2. Rushing to get a job in data science (going through one of the above-mentioned methods) means you will be competing against hundreds of thousands of others in the exact same position. Expect that they will have similar projects to yours and similar experience. To get yourself noticed, you will need to find a way to differentiate yourself by showing your creativity and grit.
Data Science Job
  3. Chances are you won’t be developing algorithms from scratch. Unless you have a lot of extra time on your hands, you’ll most likely rely on existing, well-trusted libraries. Why compete against the group of PhDs who helped develop the library, and risk putting something less than optimal into production, unless you have to develop something specific to your use case?

Why You Shouldn’t be a Data Science Generalist




I work at a data science mentorship startup,
and I’ve found there’s a single piece of advice that I catch myself
giving over and over again to aspiring mentees. And it’s really not what
I would have expected it to be.


Rather than suggesting a new library or tool, or some resume hack, I find myself recommending that they first think about what kind of data scientist they want to be.


The reason this is crucial is that data science isn’t a single,
well-defined field, and companies don’t hire generic, jack-of-all-trades
“data scientists”, but rather individuals with very specialized skill
sets.


To see why, just imagine that you’re a company trying to hire a data
scientist. You almost certainly have a fairly well-defined problem in
mind that you need help with, and that problem is going to require some
fairly specific technical know-how and subject matter expertise. For
example, some companies apply simple models to large datasets, some
apply complex models to small ones, some need to train their models on
the fly, and some don’t use (conventional) models at all.


Each of these calls for a completely different skill set, so it’s
especially odd that the advice that aspiring data scientists receive
tends to be so generic: “learn how to use Python, build some
classification/regression/clustering projects, and start applying for
jobs.”


Those of us who work in the industry bear a lot of the blame for
this. We tend to lump an excessive number of things into the “data
science” bucket in casual conversations, blog posts and presentations.
Building a robust data pipeline for production? That’s a “data science
problem.” Inventing a new kind of neural network? That’s a “data science
problem.”


That’s not good, because it tends to cause aspiring data scientists
to lose focus on specific problem classes, and instead become jacks of
all trades — something that can make it harder to get noticed or break
through, in a market that’s already saturated with generalists.


But it’s hard to avoid becoming a generalist if you don’t know which
common problem classes you could specialize in to begin with. That’s
why I put together a list of the five problem classes that are often
lumped together under the “data science” heading:





1. Data engineer



Job description: You’ll be managing data pipelines for
companies that deal with large volumes of data. That means making sure
that your data is efficiently collected, retrieved from its source
when needed, cleaned and preprocessed.


Why it’s important: If you’ve only ever worked with
relatively small (<5 GB) datasets stored in .csv or .txt files, it
might be hard to understand why there would exist people whose full-time
job it is to build and maintain data pipelines. Here are a couple of
reasons: 1) A 50 GB dataset won’t fit in your computer’s RAM, so you
generally need other ways to feed it into your model, and 2) that much
data can take a ridiculous amount of time to process, and often has to
be stored redundantly. Managing that storage takes specialized technical
know-how.


Requirements: The technologies you’ll be working
with include Apache Spark, Hadoop and/or Hive, as well as Kafka. You’ll
most likely need to have a solid foundation in SQL.


The questions you’ll be dealing with sound like:


→ “How do I build a pipeline that can handle 10 000 requests per minute?”

→ “How can I clean this dataset without loading it all in RAM?”
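To make the second question concrete, here is a minimal sketch of one way to clean a dataset without holding it all in memory: stream it row by row using only Python’s standard library. The column names and the cleaning rule (trim whitespace, drop rows with an empty field) are made up for illustration; a real pipeline would more likely use Spark or a similar tool, as noted above.

```python
import csv
import io


def clean_rows(rows):
    """Yield cleaned rows one at a time, so memory use stays flat."""
    for row in rows:
        # DictReader stores surplus fields under the key None; skip such malformed rows.
        if None in row:
            continue
        # Trim stray whitespace; a missing field (None) becomes an empty string.
        cleaned = {key: (value or "").strip() for key, value in row.items()}
        # Hypothetical cleaning rule: keep only rows where every field is filled in.
        if all(cleaned.values()):
            yield cleaned


def clean_csv(src, dst):
    """Stream rows from src to dst and return the number of rows kept."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in clean_rows(reader):
        writer.writerow(row)
        kept += 1
    return kept


# Tiny in-memory demo; with real data, pass open file handles instead.
demo_src = io.StringIO("user_id,country\n 1 , CA \n2,\n3,US\n")
demo_dst = io.StringIO()
print(clean_csv(demo_src, demo_dst), "rows kept")
```

Because only one row is in memory at a time, the same function works on a file far larger than RAM, for example by passing `open("raw.csv", newline="")` and `open("clean.csv", "w", newline="")` (hypothetical file names) as `src` and `dst`.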





2. Data analyst



Job description: Your job will be to translate data
into actionable business insights. You’ll often be the go-between for
technical teams and business strategy, sales or marketing teams. Data
visualization is going to be a big part of your day-to-day.


Why it’s important: Highly technical people often
have a hard time understanding why data analysts are so important, but
they really are. Someone needs to convert a trained and tested model and
mounds of user data into a digestible format so that business
strategies can be designed around them. Data analysts help to make sure
that data science teams don’t waste their time solving problems that
don’t deliver business value.


Requirements: The technologies you’ll be working with include Python, SQL, Tableau and Excel. You’ll also need to be a good communicator.


The questions you’ll be dealing with sound like:


→ “What’s driving our user growth numbers?”

→ “How can we explain to management that the recent increase in user fees is turning people away?”





3. Data scientist



Job description: Your job will be to clean and explore
datasets, and make predictions that deliver business value. Your
day-to-day will involve training and optimizing models, and often
deploying them to production.


Why it’s important: When you have a pile of data
that’s too big for a human to parse, and too valuable to be ignored, you
need some way of pulling digestible insights from it. That’s the basic
job of a data scientist: to convert datasets into digestible
conclusions.


Requirements: The technologies you’ll be working
with include Python, scikit-learn, Pandas, SQL, and possibly Flask,
Spark and/or TensorFlow/PyTorch. Some data science positions are purely
technical, but the majority will require you to have some business
sense, so that you don’t end up solving problems that no one has.


The questions you’ll be dealing with sound like:


→ “How many different user types do we really have?”

→ “Can we build a model to predict which products will sell to which users?”





4. Machine learning engineer



Job description: Your job will be to build, optimize
and deploy machine learning models to production. You’ll generally be
treating machine learning models as APIs or components, which you’ll be
plugging into a full-stack app or hardware of some kind, but you may
also be called upon to design models yourself.


Requirements: The technologies you’ll be working
with include Python, JavaScript, scikit-learn, TensorFlow/PyTorch
(and/or enterprise deep learning frameworks), and SQL or MongoDB
(typically used for app DBs).


The questions you’ll be dealing with sound like:


→ “How do I integrate this Keras model into our JavaScript app?”

→ “How can I reduce the prediction time and prediction cost of our recommender system?”





5. Machine learning researcher



Job description: Your job will be to find new ways to
solve challenging problems in data science and deep learning. You won’t
be working with out-of-the-box solutions, but rather will be making your
own.


Requirements: The technologies you’ll be working
with include Python, TensorFlow/PyTorch (and/or enterprise deep learning
frameworks), and SQL.


The questions you’ll be dealing with sound like:


→ “How do I improve the accuracy of our model to something closer to the state of the art?”

→ “Would a custom optimizer help decrease training time?”




The five job descriptions I’ve laid out here definitely don’t stand
alone in all cases. At an early-stage startup, for instance, a data
scientist might have to be a data engineer and/or a data analyst, too.
But most jobs will fall more neatly into one of these categories than
the others — and the larger the company, the more these categories will
tend to apply.


Overall, the thing to remember is that in order to get hired, you’ll
usually be better off building a more focused skill set: don’t learn
TensorFlow if you want to become a data analyst, and don’t prioritize
learning PySpark if you want to become a machine learning researcher.


Think instead about the kind of value you want to help companies
build, and get good at delivering that value. That, more than anything
else, is the best way to get in the door.

Learning Machine Learning vs Learning Data Science


SWOT Analysis

A SWOT analysis is an incredibly simple, yet powerful tool to help you develop your business strategy, whether you’re building a startup or guiding an existing company.
SWOT stands for Strengths, Weaknesses, Opportunities, and Threats.
Strengths and weaknesses are internal to your company—things that you have some control over and can change. Examples include who is on your team, your patents and intellectual property, and your location.
Opportunities and threats are external—things that are going on outside your company, in the larger market. You can take advantage of opportunities and protect against threats, but you can’t change them. Examples include competitors, prices of raw materials, and customer shopping trends.
A SWOT analysis arranges your top strengths, weaknesses, opportunities, and threats into an organized list and is usually presented in a simple two-by-two grid. Go ahead and download our free template if you just want to dive right in and get started.
SWOT template image
Here’s what the layout of a SWOT analysis looks like.
When you take the time to do a SWOT analysis, you’ll be armed with a solid strategy for prioritizing the work that you need to do to grow your business.
You may think that you already know everything that you need to do to succeed, but a SWOT analysis will force you to look at your business in new ways and from new directions. You’ll look at your strengths and weaknesses, and how you can leverage those to take advantage of the opportunities and defend against the threats that exist in your market.

Who should do a SWOT analysis?

For a SWOT analysis to be effective, company founders and leaders need to be deeply involved. This isn’t a task that can be delegated to others.
But company leadership shouldn’t do the work on their own, either. For best results, you’ll want to gather a group of people who have different perspectives on the company. Select people who can represent different aspects of your company, from sales and customer service to marketing and product development. Everyone should have a seat at the table.
Innovative companies even look outside their own internal ranks when they perform a SWOT analysis and get input from customers to add their unique voice to the mix.
If you’re starting or running a business on your own, you can still do a SWOT analysis. Recruit additional points of view from friends who know a little about your business, your accountant, or even vendors and suppliers. The key is to have different points of view.
Existing businesses can use a SWOT analysis to assess their current situation and determine a strategy to move forward. But remember that things are constantly changing, and you’ll want to reassess your strategy, starting with a new SWOT analysis, every six to 12 months.
For startups, a SWOT analysis is part of the business planning process. It’ll help codify a strategy so that you start off on the right foot and know the direction that you plan on going.

How to do a SWOT analysis the right way

As I mentioned above, you want to gather a team of people together to work on a SWOT analysis. You don’t need an all-day retreat to get it done, though. One or two hours should be plenty.
Gather people from different parts of your company and make sure that you have representatives from every part. You’ll find that different groups within your company will have entirely different perspectives that will be critical to making your SWOT analysis successful.
Doing a SWOT analysis is similar to running a brainstorming meeting, and there are right and wrong ways to run one. I suggest giving everyone a pad of sticky notes and having everyone quietly generate ideas on their own to start things off. This prevents groupthink and ensures that all voices are heard.
After five to 10 minutes of private brainstorming, put all the sticky-notes up on the wall and group similar ideas together. Allow anyone to add additional notes at this point if someone else’s idea sparks a new thought.
Once all of the ideas are organized, it’s time to rank the ideas. I like using a voting system where everyone gets five or ten “votes” that they can distribute in any way they like. Sticky dots in different colors are useful for this portion of the exercise.
Based on the voting exercise, you should have a prioritized list of ideas. Of course, the list is now up for discussion and debate, and someone in the room should be able to make the final call on the priority. This is usually the CEO, but it could be delegated to someone else in charge of business strategy.
You’ll want to follow this process of generating ideas for each of the four quadrants of your SWOT analysis: Strengths, Weaknesses, Opportunities, and Threats.

Questions that can help inspire your analysis

Here are a few questions that you can ask your team when you’re building your SWOT analysis. These questions can help explain each section and spark creative thinking.

Strengths

Strengths are internal, positive attributes of your company. These are things that are within your control.
  • What business processes are successful?
  • What assets do you have in your team, such as knowledge, education, network, skills, and reputation?
  • What physical assets do you have, such as customers, equipment, technology, cash, and patents?
  • What competitive advantages do you have over your competition?

Weaknesses

Weaknesses are negative factors that detract from your strengths. These are things that you might need to improve on to be competitive.
  • Are there things that your business needs to be competitive?
  • What business processes need improvement?
  • Are there tangible assets that your company needs, such as money or equipment?
  • Are there gaps on your team?
  • Is your location ideal for your success?

Opportunities

Opportunities are external factors in your business environment that are likely to contribute to your success.
  • Is your market growing and are there trends that will encourage people to buy more of what you are selling?
  • Are there upcoming events that your company may be able to take advantage of to grow the business?
  • Are there upcoming changes to regulations that might impact your company positively?
  • If your business is up and running, do customers think highly of you?

Threats

Threats are external factors that you have no control over. You may want to consider putting in place contingency plans for dealing with them if they occur.
  • Do you have potential competitors who may enter your market?
  • Will suppliers always be able to supply the raw materials you need at the prices you need?
  • Could future developments in technology change how you do business?
  • Is consumer behavior changing in a way that could negatively impact your business?
  • Are there market trends that could become a threat?

Example of a SWOT analysis

To help you get a better sense of what a SWOT example actually looks like, we’re going to look at UPer Crust Pies, a specialty meat and fruit pie cafe in Michigan’s Upper Peninsula. They sell hot, ready-to-go pies and frozen take-home options, as well as an assortment of fresh salads and beverages.
The company is planning to open its first location in downtown Yubetchatown and is very focused on developing a business model that will make it easy to expand quickly and that opens up the possibility of franchising. Here’s what their SWOT analysis might look like:

SWOT analysis for UPer Crust Pies

SWOT analysis example

What to do next

With your SWOT analysis complete, you’re ready to convert it into real strategy. After all, the exercise is about producing a strategy that you can work on during the next few months.
The first step is to look at your strengths and figure out how you can use those strengths to take advantage of your opportunities. Then, look at how your strengths can combat the threats that are in the market. Use this analysis to produce a list of actions that you can take.
With your action list in hand, look at your company calendar and start placing goals (or milestones) on it. What do you want to accomplish in each calendar quarter (or month) moving forward?
You’ll also want to do this by analyzing how external opportunities might help you combat your own, internal weaknesses. Can you also minimize those weaknesses so you can avoid the threats that you identified?
Again, you’ll have an action list that you’ll want to prioritize and schedule.
Back to the UPer Crust Pies example: Based on their SWOT analysis, here are a few potential strategies for growth to help you think through how to translate your SWOT into actionable goals.

UPer Crust Pies: Potential strategies for growth

  1. Investigate investors. UPer Crust Pies might investigate its options for obtaining capital.
  2. Create a marketing plan. Because UPer Crust Pies wants to execute a specific marketing strategy—targeting working families by emphasizing that their dinner option is both healthy and convenient—the company should develop a marketing plan.
  3. Plan a grand opening. A key piece of that marketing plan will be the store’s grand opening, and the promotional strategies necessary to get UPer Crust Pies’ target market in the door.
With your goals and actions in hand, you’ll be a long way toward completing a strategic plan for your business. I like to use the Lean Planning methodology for strategic plans as well as regular business planning. The actions that you generate from your SWOT analysis will fit right into the milestones portion of your Lean Plan and will give you a concrete foundation that you can grow your business from.

Azure AzCopy Command in Action

Azure AzCopy Command in Action - Install-Module -Name Az -Scope CurrentUser -Repository PSGallery -Force # This simple PowerShell ...