The top-5 data science skills that don't involve "sklearn"

There is so much emphasis on, and insecurity about, knowing the latest and greatest machine learning libraries and modeling techniques, yet what makes an exceptional data scientist has almost nothing to do with machine learning. Here are the top-5 skills you can hone to become an invaluable data scientist:

  1. Identify the right problem: Unfortunately, you can’t throw a neural network at every problem. Spend time finding the right problem to tackle as opposed to diving straight into modeling as soon as you get a dataset. Working on a value-adding problem is half the battle.

  2. Develop data empathy: Nine times out of ten, a data scientist inherits the data she is asked to analyze. Spend time exploring your data and talking to those who have used or collected it to develop data empathy. After this investigation you should be able to answer questions such as: How were the data collected? By whom? For what purpose? What do NaNs mean? Were the data imputed? What are the potential biases in the data? For more on data empathy see my 2014 paper.

  3. Correlation isn’t causation: The majority of AI/ML methods are correlation-based, but businesses are most often interested in causal relationships. The general rule is: you cannot draw causal conclusions from a predictive model alone.

  4. Cross the “so what” chasm: It’s wonderful that you got an AI algorithm to work and produce an output. However, if you want to be an indispensable data scientist, you need to translate your model’s output into an operationally useful recommendation that a stakeholder can act on.

  5. Help anyone you can, even if it’s not related to data science: You are more than a data scientist who analyzes data! If someone needs help at work, remember that you are smart and capable. Don’t brush the request aside just because it is outside your technical wheelhouse.
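
To make point 3 concrete, here is a minimal sketch in plain Python (no sklearn required) of how a hidden common cause can produce a strong correlation with no causal link behind it. The scenario and all variable names (temperature, ice_cream, pool_visits) are hypothetical and the data are simulated, so treat this as an illustration under those assumptions rather than a recipe for causal analysis.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the simulation is repeatable
n = 5000

# Hypothetical confounder: temperature drives BOTH quantities below.
temperature = [rng.gauss(0, 1) for _ in range(n)]
ice_cream = [t + rng.gauss(0, 0.5) for t in temperature]
pool_visits = [t + rng.gauss(0, 0.5) for t in temperature]

# Observational data: strongly correlated (about 0.8 in expectation by
# construction), even though neither variable appears in the other's formula.
observed = pearson(ice_cream, pool_visits)

# Simulated intervention: set ice cream sales at random, independent of
# temperature. If ice cream caused pool visits, the correlation would persist.
ice_cream_randomized = [rng.gauss(0, 1) for _ in range(n)]
after_intervention = pearson(ice_cream_randomized, pool_visits)

print(f"observed correlation:      {observed:.2f}")
print(f"after randomizing x:       {after_intervention:.2f}")
```

A predictive model trained on the observational data would happily use ice cream sales to predict pool visits, and it would predict well. It would still be wrong to tell a stakeholder that boosting ice cream sales will fill the pool, which is exactly the gap between prediction and causation.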