A breakdown of the areas of statistics you need to know for an entry-level data science role, and helpful resources
Let's be honest: Mathematics, especially statistics, can be very scary.
In a previous post, I discussed the mathematics required to become a good data scientist. In short, you need to know three main areas: linear algebra, calculus, and statistics.
Of the three, statistics is the most useful and the most important to understand fully. It is the backbone of many data science principles and is used every day. Machine learning also grew out of statistical learning theory.
I'd like to devote an entire post to a detailed roadmap of the statistics you need to know as a data scientist, along with resources for learning it.
Obviously, statistics is a huge field, and you won't be able to learn everything about it, especially since it is still an area of active research. However, if you have a solid working knowledge of the topics covered in this article, you will be in a very good position.
If you want a broader view of the field, this Wikipedia article provides a comprehensive outline of statistics.
Wikipedia defines a statistic as
“a statistic (singular) or sample statistic is any quantity computed from the values in a sample that is considered for a statistical purpose.”
In other words, statistics summarize information about a particular dataset, sample, or population. Therefore, the first thing an aspiring data scientist should know is the various summary statistics used to describe data.
Summary statistics typically measure four things: location, spread, shape, and dependence. Below is a list of the important ones to know.
- Mean, mode, median.
- Variance, standard deviation, coefficient of variation.
- Skewness and kurtosis.
- Percentiles, quartiles, and interquartile ranges.
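To make the list above concrete, here is a minimal sketch that computes each of these statistics using only Python's standard library. The function name `summarize` and the sample data are made up for illustration. The sketch assumes population formulas (dividing by n rather than n - 1), reports excess kurtosis (so a normal distribution scores 0), and uses the "inclusive" quartile method. Other libraries may default to different conventions, so expect small differences.

```python
import statistics


def summarize(data):
    """Return common summary statistics for a list of numbers.

    Assumptions: population (divide-by-n) formulas, excess kurtosis,
    and a non-zero mean and standard deviation.
    """
    n = len(data)
    mean = statistics.mean(data)
    variance = statistics.pvariance(data)  # population variance
    std = statistics.pstdev(data)          # population standard deviation

    # Third and fourth central moments, used for the shape measures.
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n

    # Quartiles: the three cut points that split the data into four parts.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")

    return {
        # Location
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        # Spread
        "variance": variance,
        "std": std,
        "cv": std / mean,  # coefficient of variation
        # Shape
        "skewness": m3 / std ** 3,
        "excess_kurtosis": m4 / std ** 4 - 3,
        # Percentile-based measures
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
stats = summarize(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

In practice you would usually reach for NumPy, SciPy, or pandas for this, but writing the formulas out once makes it clearer what each number actually measures.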