Friday, 12 February 2021

How to Encrypt and Restore Your SQL Server Database Backups

We’ve had backup encryption out of the box since SQL Server 2014, yet I’ve rarely seen it used. In an age where we’re moving more and more things to the cloud, including those backup files, backup encryption is becoming increasingly necessary. Sure, we have transport encryption, and your cloud provider of choice most probably has an option for encryption at rest, but why leave any room for error? If you encrypt your backups on site before they leave, you remove any chance of an un-encrypted backup ending up stored somewhere.

One thing I have found is that the documentation around this is a little disjointed and scattered over several different topics. This post is going to demo a full end-to-end solution: encrypting a backup on your source server and restoring it on your destination server, along with some of the issues you may face on the way…

If you want to follow along you’ll need two different instances of SQL Server. I’m using SQL Server 2017, but the below should work on anything from 2014 onwards…

Source Server

On our source server, let’s create a new sample database with a couple of rows of data to test with…

     
CREATE DATABASE BackupEncryptionDemo
GO
CREATE TABLE BackupEncryptionDemo.dbo.Test(Id INT IDENTITY, Blah NVARCHAR(10))
INSERT INTO BackupEncryptionDemo.dbo.Test(Blah) VALUES('Testing')
INSERT INTO BackupEncryptionDemo.dbo.Test(Blah) VALUES('Testing2')

In order to encrypt a backup of this database we need either a certificate or an asymmetric key; I’m going to use a certificate for the sake of this demo. When you create a certificate, SQL Server encrypts it with a MASTER KEY before it gets stored, so we’ll first need to create one of those…

     
USE master
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '(MasterKeyEncryptionPassword123)'

This key is then used to encrypt our certificate for storage…

     
CREATE CERTIFICATE SuperSafeBackupCertificate 
WITH SUBJECT ='Backup Encryption Certificate For Database1 and Database2'

Armed with our SuperSafe certificate we can now back up a database with encryption…

     
BACKUP DATABASE BackupEncryptionDemo 
   TO DISK = 'C:\keys\DatabaseBackup.bak'
   WITH ENCRYPTION(
      ALGORITHM = AES_256, 
      SERVER CERTIFICATE = SuperSafeBackupCertificate
   )
Backup Warning

Notice the helpful warning reminding us that we’ve not backed up our certificate. I cannot stress how important this is! If we lose that certificate then we won’t be able to restore any of our backups. The T-SQL below will back up the certificate and a private key for its encryption; both of these files need to be put in a safe place where they will not be lost. The combination of these files and the password specified is all that’s needed to decrypt our backups, so they need to be kept safe and, in a real-world scenario, should not be kept in the same place as the database backups…

     
BACKUP CERTIFICATE SuperSafeBackupCertificate 
   TO FILE = 'C:\keys\SuperSafeBackupCertificate.cer'
   WITH PRIVATE KEY(
      FILE='C:\keys\SuperSafeBackupCertificate.ppk', 
      ENCRYPTION BY PASSWORD ='(PasswordToEncryptPrivateKey123)'
   )

If we then run another backup there will be no warnings…

     
BACKUP DATABASE BackupEncryptionDemo 
   TO DISK = 'C:\keys\DatabaseBackup2.bak'
   WITH ENCRYPTION(
      ALGORITHM = AES_256, 
      SERVER CERTIFICATE = SuperSafeBackupCertificate
   )

Now on to our first gotcha! If you run the above backup a second time you’ll get the following error…

Backup Error

Encrypted backups cannot be appended to an existing media set the way non-encrypted backups can, so you’ll need to write each one to a new media set by specifying a different filename.
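
One way to do that (just a sketch; the date-stamped file name is purely illustrative) is to build a fresh file name for each encrypted backup…

     
DECLARE @BackupFile NVARCHAR(260) = 
   'C:\keys\DatabaseBackup_' 
   + CONVERT(NVARCHAR(8), GETDATE(), 112)                           -- yyyymmdd
   + '_' + REPLACE(CONVERT(NVARCHAR(8), GETDATE(), 108), ':', '')   -- hhmmss
   + '.bak'

BACKUP DATABASE BackupEncryptionDemo 
   TO DISK = @BackupFile
   WITH ENCRYPTION(
      ALGORITHM = AES_256, 
      SERVER CERTIFICATE = SuperSafeBackupCertificate
   )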

Destination Server

Now we have our encrypted backup, let’s try to restore it on our second server…

     
RESTORE DATABASE BackupEncryptionDemo 
   FROM DISK = N'C:\Keys\DatabaseBackup.bak' 
   WITH 
      MOVE N'BackupEncryptionDemo' TO N'D:\Data\EncryptionDemo.mdf', 
      MOVE N'BackupEncryptionDemo_log' TO N'D:\Data\EncryptionDemo_log.ldf'
Restore Error

We can’t restore it because it was encrypted with a certificate that we don’t yet have on this server, and without that certificate the backup can’t be decrypted.

As before, we can’t store any certificates without a master key, so let’s get that created…

     
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '(DestinationMasterKeyEncryptionPassword1234)'

Now let’s see if we can restore the certificate backup we made earlier onto our new server…

     
CREATE CERTIFICATE SuperSafeBackupCertificate 
   FROM FILE = 'C:\Keys\SuperSafeBackupCertificate.cer'

At this point, depending on your service account and file permissions, there is a good chance you will see an error similar to this…

Permission Error

This is because the NTFS permissions SQL Server put on the certificate and private key backups don’t give access to the service account your destination server is running under. To fix this, open a Command Prompt window as Administrator and run the following command, replacing the username (MSSQLSERVER) with the account your destination server is running under and pointing it at the directory the backup keys are stored in…

     
icacls c:\Keys /grant MSSQLSERVER:(GR) /T

This will have granted our SQL Server service account read access to these files, so let’s try restoring that certificate again…

     
CREATE CERTIFICATE SuperSafeBackupCertificate 
   FROM FILE = 'C:\Keys\SuperSafeBackupCertificate.cer'

That time it should go through with no error, so we now have our certificate and master key all set up. Let’s try restoring that backup again…

     
RESTORE DATABASE BackupEncryptionDemo 
   FROM DISK = N'C:\Keys\DatabaseBackup.bak' 
   WITH 
      MOVE N'BackupEncryptionDemo' TO N'D:\Data\EncryptionDemo.mdf', 
      MOVE N'BackupEncryptionDemo_log' TO N'D:\Data\EncryptionDemo_log.ldf'
Corrupt Key

Still no luck: the restore failed because the key we restored is reported as corrupt. This is because when we restored the certificate we didn’t specify the private key file and the password needed to decrypt it. Let’s drop the certificate we restored and try again…

     
DROP CERTIFICATE SuperSafeBackupCertificate
GO

CREATE CERTIFICATE SuperSafeBackupCertificate 
   FROM FILE = 'C:\Keys\SuperSafeBackupCertificate.cer'
   WITH PRIVATE KEY(
      FILE ='C:\Keys\SuperSafeBackupCertificate.ppk', 
      DECRYPTION BY PASSWORD='test'
   )
Invalid Password

Oops, we specified the password as ‘test’, when the password we actually used when backing up the private key was ‘(PasswordToEncryptPrivateKey123)’. We’re getting close now…

     
CREATE CERTIFICATE SuperSafeBackupCertificate 
   FROM FILE = 'C:\Keys\SuperSafeBackupCertificate.cer'
   WITH PRIVATE KEY(
      FILE ='C:\Keys\SuperSafeBackupCertificate.ppk', 
      DECRYPTION BY PASSWORD='(PasswordToEncryptPrivateKey123)'
   )

We’ve now successfully restored our certificate, let’s try that database restore one last time!

     
RESTORE DATABASE BackupEncryptionDemo 
   FROM DISK = N'C:\Keys\DatabaseBackup.bak' 
   WITH 
      MOVE N'BackupEncryptionDemo' TO N'D:\Data\EncryptionDemo.mdf', 
      MOVE N'BackupEncryptionDemo_log' TO N'D:\Data\EncryptionDemo_log.ldf'
Successful Restore

Bingo!!!

As one final check, let’s query our only table…

     
SELECT * FROM BackupEncryptionDemo.dbo.Test
Rows Restored

Saturday, 28 December 2019

SSAS Processing Error: Unicode string issue during dimension processing

SSAS and the database engine use different comparison rules depending on the collation, character sets, and handling of blanks in the middle or at the end of a string. This becomes an issue during SSAS processing when key values used to establish attribute relationships must be an exact match. Sometimes, what passes as a ‘match’ in the database engine is seen by SSAS as a non-matching value, resulting in processing errors that can be a challenge to track down if the value happens to be a blank! This article describes the problem in more detail and provides various workarounds.

Actual error (with placeholder values)

Errors in the OLAP storage engine: The attribute key cannot be found when processing: Table: 'MyDimTable', Column: 'Column1', Value: 'alzイ'. The attribute is 'Column1'.
Errors in the OLAP storage engine: The attribute key was converted to an unknown member because the attribute key was not found. Attribute MyDimTableKey of Dimension: MyDimTable from Database: MyDB, Record: 3.
Notice the Unicode value, where the trailing blank comes from a Japanese character set. If you get the “attribute key cannot be found” error and the value contains a Unicode blank in the middle or at the end of the string, you are most likely seeing the effects of these different comparison rules.

Cause:

The problem arises when Analysis Services uses different comparison rules when processing attribute relationships.
By default, the relational database engine uses a width-insensitive collation, such that the following strings are interpreted as equivalent values:
  • string1+<double-byte-blank>+string2
  • string1+<single-byte-blank>+string2
Notice the first member has a double-byte space/blank and the second member has a single-byte space/blank, at the same position in the member name.
If these strings were used as keys to relate rows from different tables, the database engine would recognize these strings as the same value and create the relationship accordingly.
Now suppose that you are processing an Analysis Services database that uses these strings as KeyColumns in an attribute relationship. Unlike the database engine (set to width-insensitive collation), SSAS will interpret these as different strings, generating an error that a matching record cannot be found, and possibly registering one or more of the values as the unknown member.
The attribute key cannot be found because to SSAS, string1+<double-byte-blank>+string2 is not the same as string1+<single-byte-blank>+string2, and therefore fails to meet the criteria used to establish an attribute relationship.
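
As a quick illustration (a minimal T-SQL check, assuming the default width-insensitive collation described above), you can compare the two example strings yourself; if they come back equal, the database engine is treating the double-byte and single-byte blanks as the same value even though SSAS will not:

SELECT CASE
          WHEN N'string1' + NCHAR(12288) + N'string2'   -- double-byte (ideographic) blank, U+3000
             = N'string1' + NCHAR(32)    + N'string2'   -- single-byte blank, U+0020
          THEN 'same value to the database engine'
          ELSE 'different values to the database engine'
       END AS ComparisonResult;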

Resolution:

If this behavior is acceptable, then you should do nothing.
However, if you want SSAS to exhibit the same behavior as the relational database engine, you can use one of the following workarounds:
  • Set ProcessingGroup to ByTable (instead of the default, ByAttribute). This setting is specified in Dimension Designer, in SQL Server Data Tools, on the dimension definition.
  • Set the server configuration property Language/Collation to be width-sensitive, and set the dimension definition to be width-sensitive as well. You can set this in Management Studio, in Server Properties.
With either workaround, the two strings in our example (string1+<double-byte-blank>+string2 and string1+<single-byte-blank>+string2) would each be considered a viable match for the other (in terms of an attribute relationship), allowing processing to succeed.
Alternatively, you can address the issue in the relational database by changing double-byte spaces to single-byte spaces. See Server-Side Programming with Unicode. For information about the T-SQL REPLACE function, see http://technet.microsoft.com/en-us/library/ms186862.aspx
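
For example, a minimal sketch (reusing the illustrative table and column names from the error message above) that normalizes the double-byte blank to a single-byte blank in the relational source before processing:

-- Replace the double-byte (ideographic) blank, NCHAR(12288)/U+3000, with the
-- single-byte blank, NCHAR(32), in the key column used by the dimension.
UPDATE dbo.MyDimTable
SET Column1 = REPLACE(Column1, NCHAR(12288), NCHAR(32))
WHERE Column1 LIKE N'%' + NCHAR(12288) + N'%';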

Notes:

Width-insensitive is the default collation for the SQL Server relational engine, so if you are working with global data, you are more likely to run into this issue when processing an Analysis Services database.
DBCS is a double-byte character set. A blank character in DBCS is Unicode 12288 (hex 3000). SBCS is a single-byte character set. A blank character in SBCS is 32. Width-sensitivity on the collation will determine whether these are interpreted as the same or different values, for strings having a trailing blank.






Sunday, 10 November 2019

SENTIMENT ANALYSIS OF YAMMER POSTS - MICROSOFT FLOW & AZURE COGNITIVE SERVICES


As an Office 365 product manager and corporate evangelist, I’m responsible for engaging users and driving adoption of Microsoft Collaboration tools. Measuring the saturation and use of Office 365 is a key part of my role. Yes, I regularly review Office 365 usage metrics for high-level trending. But metrics alone don’t tell the story of user satisfaction and adoption. In order to build better training and adoption programs, I need to understand why my dedicated users love the tools and why others remain resistant.
Many companies rely on surveys to gather end-user feedback. While surveys are useful for gathering specific types of quantitative data, surveys are one-dimensional. You can’t dynamically ask follow-up questions to learn more about specific survey responses, and you can only capture a limited set of data points. Innovation games enable you to gather a much broader set of quantitative and qualitative user data, but require an investment of time to facilitate games and distill the results. For best results, I recommend a multidisciplinary approach that leverages Office 365 usage statistics, user survey responses, innovation games data, user testimonials, etc. to measure user satisfaction. 
With the release of artificial intelligence (AI) and machine-learning algorithms, we also have the ability to gather user sentiments automatically. If your organization uses Yammer to drive employee engagement and empower open dialogue, you have a wealth of user data that can be analyzed. With Azure Cognitive Services and Microsoft Flow, you can perform automated sentiment analysis of your Yammer group posts. Sentiment scores for each Yammer message can be stored in SharePoint and visualized for trending analysis via Power BI. You can even send push email notifications to your Office 365 administrators or Corporate Communications team when strong positive or negative messages are posted in Yammer.
Chris Bortlik, Principal Technical Architect for Microsoft, recently shared a blog post on Yammer sentiment analysis. I used Chris’ model, with a few modifications, to gather and report on Office 365 user sentiment.
The scenario:
My organization leverages a Microsoft Flow Yammer group to foster employee conversations and questions/answers about Flow. We want to monitor the Microsoft Flow Yammer group using sentiment analysis so we can:
  • Identify negative Flow Yammer posts that require follow-up
  • Identify positive Yammer posts that can serve as user testimonials or references
  • Define trends in our Microsoft Flow Yammer posts (e.g. daily/weekly/monthly positive and negative trends, overall positive or negative sentiment for Flow, etc.)
  • Validate the success of our Microsoft Flow education and adoption program (e.g. confirm we’re seeing growth in the volume of positive Flow Yammer posts over time)
The setup:
Follow the steps outlined below to set up automated Yammer sentiment analysis.
Step 1: Confirm you have a Cognitive Services Text Analytics Account. In order to set up this solution, you will need a Cognitive Services account key and a root site URL.
Step 2: Create a SharePoint list to store your Yammer sentiment analysis scores. Flow will create a new item in your list for each Yammer message it analyzes. Here’s a list of the custom columns I added to my list:
  • Score – Number column; stores the sentiment rating for each Yammer message
  • Message link – Hyperlink column; stores a link to the rated Yammer message
  • Posted by – Person/Group column; stores the name of the person that posted the Yammer message
  • Thread ID – Single line of text column; stores the Yammer thread ID for the message. Enables you to sort, filter, and group sentiment scores for a given Yammer thread (including original message and replies).
Azure Cognitive Services will provide a numeric sentiment score between 0 and 1 for each Yammer message it analyzes. The more negative a Yammer message is, the closer to 0 its score will be. More positive messages will receive a rating closer to 1.
Here’s a screen shot of my SharePoint list. Each list item represents a rated Yammer message:
Yammer sentiment flow-11.png
Step 3: Identify the Yammer group you want to perform sentiment analysis on. You can set up sentiment analysis for multiple Yammer groups, but each will require a separate flow process. I also recommend setting up a different SharePoint list to hold sentiment scores for each of your Yammer groups. (Having different SharePoint lists enables you to set up different trending reports on Yammer group sentiment.)
Step 4: Create your Microsoft Flow. I created my flow from scratch (not using a template). Here’s a quick breakdown of the flow conditions and actions:
  • When there is a new message in a group – Detects when a new Yammer message is posted in my Yammer group
  • Get user details – Pulls Yammer user profile details. (Enables us to capture the full name and email address for the person posting the Yammer message.)
  • Detect Sentiment – Calls the Azure Cognitive Services API so it can calculate a sentiment score for the Yammer message
  • Create item – Creates a SharePoint list item for the Yammer message being analyzed
  • If the comment is negative – Sends an email to my Office 365 admin team if the sentiment score for a Yammer message is ≤0.3.
  • If the comment is positive – Sends an email to my Office 365 admin team if the sentiment score for a Yammer message is ≥0.7.
yammer sentiment flow-09
yammer sentiment flow-10
Step 5: Create Power BI report(s) to visualize your Yammer sentiment scores. Published reports can be rendered in your SharePoint Online Communications or Team sites using the Power BI web part. For help in setting up sentiment analysis slicers, check out this DataChant blog post.
Here’s a sample dashboard that shows Yammer sentiment data for my Microsoft Flow Yammer group:
yammer sentiment flow-14
Step 6: Distill the results. Once you start calculating Yammer sentiment and have reports to visualize the data, you can analyze the results and follow up where needed. Here are a few ideas to get you started:
  • Break down negative Yammer posts (e.g. posts with a score ≤0.3) by user. Schedule follow-up meetings with Office 365 end-users that consistently post negative messages. The goal is to ask questions and understand the pain points the users are facing. Perhaps they have hardware or network issues that impact their productivity. Or maybe they’re having issues with Microsoft Flow and need a coach/mentor to spur their learning. Having one-on-one dialogues provides the opportunity for candid feedback and enables you to make a difference in the user’s productivity and technology experience.
  • Identify Office 365 enthusiasts. Break down Yammer posts by volume or by high sentiment average in order to find power users across your organization. Set up meetings with these enthusiasts to understand how they leverage Office 365. Incorporate them into your user group or internal community and support them in their growth. These enthusiasts can become your Office 365 evangelists!
  • Monitor the volume of Yammer posts in your group. Build a gauge that shows your number of Yammer messages month-to-date and identifies progress towards your monthly Yammer message goal. Keeping an eye on your total posts month-to-date and year-to-date will help you monitor use over time and highlight areas you may need to invest additional time and adoption efforts in.
yammer sentiment flow-15
  • Optimize your communications. If one of your Office 365 admin resources has consistent negative Yammer sentiment scores, have them evaluate the verbiage they’re using. Slight wording changes can change the tone of their messages, increasing Yammer sentiment scores and better engaging with end-users.
  • Take a health pulse. Build trending visuals that show average post volumes and sentiment averages by week or month. If you start seeing spikes in the volume of posts and/or significant changes in your sentiment averages, it’s time to dig deeper. Perhaps you’re seeing a spike in interest in PowerApps after a compelling user group presentation, or you have network bandwidth problems that are causing issues. Either way, Yammer sentiment analysis can be your early warning indicator that something has changed.



Tuesday, 25 December 2018

Should you become a data scientist?


There is no shortage of articles attempting to lay out a step-by-step process of how to become a data scientist. “It’s easy! Are you a recent graduate? Do this… Are you changing careers? Do that… And make sure you’re focusing on the top skills: coding, statistics, machine learning, storytelling, databases, big data… Need resources? Check out Andrew Ng’s Coursera ML course, …”. Although these are important things to consider once you have made up your mind to pursue a career in data science, I hope to answer the question that should come before all of this. It’s the question that should be on every aspiring data scientist’s mind: “should I become a data scientist?” This question addresses the why before you try to answer the how. What is it about the field that draws you in and will keep you in it and excited for years to come?
In order to answer this question, it’s important to understand how we got here and where we are headed. Because by having a full picture of the data science landscape, you can determine whether data science makes sense for you.

Where it all started…

Before the convergence of computer science, data technology, visualization, mathematics, and statistics into what we call data science today, these fields existed in siloes — independently laying the groundwork for the tools and products we are now able to develop, things like: Oculus, Google Home, Amazon Alexa, self-driving cars, recommendation engines, etc.
History of Data Science
The foundational ideas have been around for decades... early scientists dating back to the pre-1800s, coming from a wide range of backgrounds, worked on developing our first computers, calculus, probability theory, and algorithms like CNNs, reinforcement learning, and least squares regression. With the explosion in data and computational power, we are able to resurrect these decades-old ideas and apply them to real-world problems.
In 2009 and 2012, articles were published by McKinsey and the Harvard Business Review, hyping up the role of the data scientist, showing how they were revolutionizing the way businesses are operating and how they would be critical to future business success. They not only saw the advantage of a data-driven approach, but also the importance of utilizing predictive analytics into the future in order to remain competitive and relevant. Around the same time in 2011, Andrew Ng came out with a free online course on machine learning, and the curse of AI FOMO (fear of missing out) kicked in.

Where we are now…

Companies began the search for highly skilled individuals to help them collect, store, visualize and make sense of all their data. “You want the title and the high pay? You got it! Just please come and come quick.” With very little knowledge of what they were looking for, job postings went up.
Job Trends Data Scientist
If you searched ZipRecruiter today, you’d find over 190k data science positions currently open, each one looking for their own data unicorn. Thus, in an effort to get talent in the door, the definition of what it meant to be a data scientist soon widened, varying from company to company and person to person.
On the other hand, candidates saw a great opportunity: a career with high pay, high demand, and the promise of job security and glory. Everyone rushed to develop all the right skills with one goal in mind: to hold the “sexiest job of the 21st century”.
We have the demand and we have the supply, so what’s the problem? Well, the problem isn’t a shortage of programs to support that demand and capitalize on the hype. It feels like every day there are new courses being developed to satisfy aspiring data scientists’ cravings to break into the field: master’s programs, boot camps and online courses. It’s an arms race to make the right courses with the promise of a Machine Learning job at the end of it. “No PhD? No problem. Just three to six months and a small investment of ~10-15k and you’ll be guaranteed a well-paid job upon graduation.” (wink)
These programs are designed to be a one-stop-shop for everything data science: you learn the programming, the visualization, the modeling; it’s all there. What you soon discover is that many (surely, not all) of the business problems being faced can be solved using similar approaches, so if you’re looking to apply some algorithm, chances are there’s a library that already exists to help you do just that. Simple, right?

Hold up…

If you’ve been paying attention, you will have picked up on a few important things so far:
  1. By getting ahead of themselves, companies are hiring data scientists before they have even started collecting the right data (i.e. they are suffering from the Cold Start Problem of AI), meaning you will need to be involved in every step of the data pipeline including data collection, storage, and visualization before you can get to the modeling.
  2. Rushing to get a job in data science (going through one of the above-mentioned methods) means you will be competing against hundreds of thousands of others in the same exact position. Expect that they will have similar projects to yours and similar experience. To get yourself noticed, you will need to find a way to differentiate yourself: showing your creativity and grittiness.
Data Science Job
  3. Chances are you won’t be developing algorithms from scratch. Unless you have a lot of extra time on your hands, you’ll most likely rely on existing and well-trusted libraries. Why compete against a group of PhDs who helped develop the library and risk putting something less than optimal into production, unless you have to develop something specific to your use case?

Why You Shouldn’t be a Data Science Generalist




I work at a data science mentorship startup, and I’ve found there’s a single piece of advice that I catch myself giving over and over again to aspiring mentees. And it’s really not what I would have expected it to be.

Rather than suggesting a new library or tool, or some resume hack, I find myself recommending that they first think about what kind of data scientist they want to be.

The reason this is crucial is that data science isn’t a single, well-defined field, and companies don’t hire generic, jack-of-all-trades “data scientists”, but rather individuals with very specialized skill sets.

To see why, just imagine that you’re a company trying to hire a data scientist. You almost certainly have a fairly well-defined problem in mind that you need help with, and that problem is going to require some fairly specific technical know-how and subject matter expertise. For example, some companies apply simple models to large datasets, some apply complex models to small ones, some need to train their models on the fly, and some don’t use (conventional) models at all.

Each of these calls for a completely different skill set, so it’s especially odd that the advice that aspiring data scientists receive tends to be so generic: “learn how to use Python, build some classification/regression/clustering projects, and start applying for jobs.”

Those of us who work in the industry bear a lot of the blame for this. We tend to lump an excessive number of things into the “data science” bucket in casual conversations, blog posts and presentations. Building a robust data pipeline for production? That’s a “data science problem.” Inventing a new kind of neural network? That’s a “data science problem.”

That’s not good, because it tends to cause aspiring data scientists to lose focus on specific problem classes, and instead become jacks of all trades — something that can make it harder to get noticed or break through, in a market that’s already saturated with generalists.

But it’s hard to avoid becoming a generalist if you don’t know which common problem classes you could specialize in in the first place. That’s why I put together a list of the five problem classes that are often lumped together under the “data science” heading:

1. Data engineer

Job description: You’ll be managing data pipelines for companies that deal with large volumes of data. That means making sure that your data is being efficiently collected and retrieved from its source when needed, cleaned and preprocessed.

Why it’s important: If you’ve only ever worked with relatively small (<5 GB) datasets stored in .csv or .txt files, it might be hard to understand why there are people whose full-time job it is to build and maintain data pipelines. Here are a couple of reasons: 1) a 50 GB dataset won’t fit in your computer’s RAM, so you generally need other ways to feed it into your model, and 2) that much data can take a ridiculous amount of time to process, and often has to be stored redundantly. Managing that storage takes specialized technical know-how.

Requirements: The technologies you’ll be working with include Apache Spark, Hadoop and/or Hive, as well as Kafka. You’ll most likely need to have a solid foundation in SQL.

The questions you’ll be dealing with sound like:

→ “How do I build a pipeline that can handle 10,000 requests per minute?”

→ “How can I clean this dataset without loading it all in RAM?”

2. Data analyst

Job description: Your job will be to translate data into actionable business insights. You’ll often be the go-between for technical teams and business strategy, sales or marketing teams. Data visualization is going to be a big part of your day-to-day.

Why it’s important: Highly technical people often have a hard time understanding why data analysts are so important, but they really are. Someone needs to convert a trained and tested model and mounds of user data into a digestible format so that business strategies can be designed around them. Data analysts help to make sure that data science teams don’t waste their time solving problems that don’t deliver business value.

Requirements: The technologies you’ll be working with include Python, SQL, Tableau and Excel. You’ll also need to be a good communicator.

The questions you’ll be dealing with sound like:

→ “What’s driving our user growth numbers?”

→ “How can we explain to management that the recent increase in user fees is turning people away?”

3. Data scientist

Job description: Your job will be to clean and explore datasets, and make predictions that deliver business value. Your day-to-day will involve training and optimizing models, and often deploying them to production.

Why it’s important: When you have a pile of data that’s too big for a human to parse, and too valuable to be ignored, you need some way of pulling digestible insights from it. That’s the basic job of a data scientist: to convert datasets into digestible conclusions.

Requirements: The technologies you’ll be working with include Python, scikit-learn, Pandas, SQL, and possibly Flask, Spark and/or TensorFlow/PyTorch. Some data science positions are purely technical, but the majority will require you to have some business sense, so that you don’t end up solving problems that no one has.

The questions you’ll be dealing with sound like:

→ “How many different user types do we really have?”

→ “Can we build a model to predict which products will sell to which users?”

4. Machine learning engineer

Job description: Your job will be to build, optimize and deploy machine learning models to production. You’ll generally be treating machine learning models as APIs or components, which you’ll be plugging into a full-stack app or hardware of some kind, but you may also be called upon to design models yourself.

Requirements: The technologies you’ll be working with include Python, JavaScript, scikit-learn, TensorFlow/PyTorch (and/or enterprise deep learning frameworks), and SQL or MongoDB (typically used for app DBs).

The questions you’ll be dealing with sound like:

→ “How do I integrate this Keras model into our JavaScript app?”

→ “How can I reduce the prediction time and prediction cost of our recommender system?”

5. Machine learning researcher

Job description: Your job will be to find new ways to solve challenging problems in data science and deep learning. You won’t be working with out-of-the-box solutions, but rather will be making your own.

Requirements: The technologies you’ll be working with include Python, TensorFlow/PyTorch (and/or enterprise deep learning frameworks), and SQL.

The questions you’ll be dealing with sound like:

→ “How do I improve the accuracy of our model to something closer to the state of the art?”

→ “Would a custom optimizer help decrease training time?”

The five job descriptions I’ve laid out here definitely don’t stand alone in all cases. At an early-stage startup, for instance, a data scientist might have to be a data engineer and/or a data analyst, too. But most jobs will fall more neatly into one of these categories than the others — and the larger the company, the more these categories will tend to apply.

Overall, the thing to remember is that in order to get hired, you’ll usually be better off building a more focused skillset: don’t learn TensorFlow if you want to become a data analyst, and don’t prioritize learning PySpark if you want to become a machine learning researcher.

Think instead about the kind of value you want to help companies build, and get good at delivering that value. That, more than anything else, is the best way to get in the door.

Learning Machine Learning vs Learning Data Science

SWOT Analysis

A SWOT analysis is an incredibly simple, yet powerful tool to help you develop your business strategy, whether you’re building a startup or guiding an existing company.
SWOT stands for Strengths, Weaknesses, Opportunities, and Threats.
Strengths and weaknesses are internal to your company—things that you have some control over and can change. Examples include who is on your team, your patents and intellectual property, and your location.
Opportunities and threats are external—things that are going on outside your company, in the larger market. You can take advantage of opportunities and protect against threats, but you can’t change them. Examples include competitors, prices of raw materials, and customer shopping trends.
A SWOT analysis organizes your top strengths, weaknesses, opportunities, and threats into a structured list, usually presented in a simple two-by-two grid. Go ahead and download our free template if you just want to dive right in and get started.
SWOT template image
Here’s what the layout of a SWOT analysis looks like.
When you take the time to do a SWOT analysis, you’ll be armed with a solid strategy for prioritizing the work that you need to do to grow your business.
You may think that you already know everything that you need to do to succeed, but a SWOT analysis will force you to look at your business in new ways and from new directions. You’ll look at your strengths and weaknesses, and how you can leverage those to take advantage of the opportunities and threats that exist in your market.

Who should do a SWOT analysis?

For a SWOT analysis to be effective, company founders and leaders need to be deeply involved. This isn’t a task that can be delegated to others.
But, company leadership shouldn’t do the work on their own, either. For best results, you’ll want to gather a group of people who have different perspectives on the company. Select people who can represent different aspects of your company, from sales and customer service to marketing and product development. Everyone should have a seat at the table.
Innovative companies even look outside their own internal ranks when they perform a SWOT analysis and get input from customers to add their unique voice to the mix.
If you’re starting or running a business on your own, you can still do a SWOT analysis. Recruit additional points of view from friends who know a little about your business, your accountant, or even vendors and suppliers. The key is to have different points of view.
Existing businesses can use a SWOT analysis to assess their current situation and determine a strategy to move forward. But, remember that things are constantly changing and you’ll want to reassess your strategy, starting with a new SWOT analysis every six to 12 months.
For startups, a SWOT analysis is part of the business planning process. It’ll help codify a strategy so that you start off on the right foot and know the direction that you plan on going.

How to do a SWOT analysis the right way

As I mentioned above, you want to gather a team of people together to work on a SWOT analysis. You don’t need an all-day retreat to get it done, though. One or two hours should be more than plenty.
Gather people from different parts of your company and make sure that you have representatives from every part. You’ll find that different groups within your company will have entirely different perspectives that will be critical to making your SWOT analysis successful.
Doing a SWOT analysis is similar to running a brainstorming meeting, and there are right and wrong ways to do it. I suggest giving everyone a pad of sticky-notes and having everyone quietly generate ideas on their own to start things off. This prevents groupthink and ensures that all voices are heard.
After five to 10 minutes of private brainstorming, put all the sticky-notes up on the wall and group similar ideas together. Allow anyone to add additional notes at this point if someone else’s idea sparks a new thought.
Once all of the ideas are organized, it’s time to rank the ideas. I like using a voting system where everyone gets five or ten “votes” that they can distribute in any way they like. Sticky dots in different colors are useful for this portion of the exercise.
Based on the voting exercise, you should have a prioritized list of ideas. Of course, the list is now up for discussion and debate, and someone in the room should be able to make the final call on the priority. This is usually the CEO, but it could be delegated to someone else in charge of business strategy.
You’ll want to follow this process of generating ideas for each of the four quadrants of your SWOT analysis: Strengths, Weaknesses, Opportunities, and Threats.

Questions that can help inspire your analysis

Here are a few questions that you can ask your team when you’re building your SWOT analysis. These questions can help explain each section and spark creative thinking.

Strengths

Strengths are internal, positive attributes of your company. These are things that are within your control.
  • What business processes are successful?
  • What assets do you have in your team, such as knowledge, education, network, skills, and reputation?
  • What physical assets do you have, such as customers, equipment, technology, cash, and patents?
  • What competitive advantages do you have over your competition?

Weaknesses

Weaknesses are negative factors that detract from your strengths. These are things that you might need to improve on to be competitive.
  • Are there things that your business needs to be competitive?
  • What business processes need improvement?
  • Are there tangible assets that your company needs, such as money or equipment?
  • Are there gaps on your team?
  • Is your location ideal for your success?

Opportunities

Opportunities are external factors in your business environment that are likely to contribute to your success.
  • Is your market growing and are there trends that will encourage people to buy more of what you are selling?
  • Are there upcoming events that your company may be able to take advantage of to grow the business?
  • Are there upcoming changes to regulations that might impact your company positively?
  • If your business is up and running, do customers think highly of you?

Threats

Threats are external factors that you have no control over. You may want to consider putting contingency plans in place for dealing with them if they occur.
  • Do you have potential competitors who may enter your market?
  • Will suppliers always be able to supply the raw materials you need at the prices you need?
  • Could future developments in technology change how you do business?
  • Is consumer behavior changing in a way that could negatively impact your business?
  • Are there market trends that could become a threat?

Example of a SWOT analysis

To help you get a better sense of what a SWOT example actually looks like, we’re going to look at UPer Crust Pies, a specialty meat and fruit pie cafe in Michigan’s Upper Peninsula. They sell hot, ready-to-go pies and frozen take-home options, as well as an assortment of fresh salads and beverages.
The company is planning to open its first location in downtown Yubetchatown and is very focused on developing a business model that will make it easy to expand quickly and that opens up the possibility of franchising. Here’s what their SWOT analysis might look like:

SWOT analysis for UPer Crust Pies

SWOT analysis example

What to do next

With your SWOT analysis complete, you’re ready to convert it into real strategy. After all, the exercise is about producing a strategy that you can work on during the next few months.
The first step is to look at your strengths and figure out how you can use those strengths to take advantage of your opportunities. Then, look at how your strengths can combat the threats that are in the market. Use this analysis to produce a list of actions that you can take.
With your action list in hand, look at your company calendar and start placing goals (or milestones) on it. What do you want to accomplish in each calendar quarter (or month) moving forward?
You’ll also want to do this by analyzing how external opportunities might help you combat your own internal weaknesses. Can you also minimize those weaknesses so you can avoid the threats that you identified?
Again, you’ll have an action list that you’ll want to prioritize and schedule.
Back to the UPer Crust Pies example: based on their SWOT analysis, here are a few potential strategies for growth to help you think through how to translate your SWOT into actionable goals.

UPer Crust Pies: Potential strategies for growth

  1. Investigate investors. UPer Crust Pies might investigate its options for obtaining capital.
  2. Create a marketing plan. Because UPer Crust Pies wants to execute a specific marketing strategy—targeting working families by emphasizing that their dinner option is both healthy and convenient—the company should develop a marketing plan.
  3. Plan a grand opening. A key piece of that marketing plan will be the store’s grand opening, and the promotional strategies necessary to get UPer Crust Pies’ target market in the door.
With your goals and actions in hand, you’ll be a long way toward completing a strategic plan for your business. I like to use the Lean Planning methodology for strategic plans as well as regular business planning. The actions that you generate from your SWOT analysis will fit right into the milestones portion of your Lean Plan and will give you a concrete foundation that you can grow your business from.
