Statistical Analysis for Data Science

Statistics has proven to be one of the most important game changers for business in the 21st century, driving the boom of the new oil, that is, “Data”.

Through this blog, we aim to give the reader a clear understanding of how statistical analysis for data science can be carried out on an actual business use case.

Let’s start:

Data can be analysed to get valuable insights, but without analysis, data is just a bunch of numbers that make no sense.

According to Croxton and Cowden,

Statistics may be defined as the science of collection, presentation, analysis and interpretation of numerical data.

A few examples of where statistical analysis is applied include:

Route optimisation in the airline industry
ROI prediction for a company
Stock market share price prediction
Predictive maintenance in manufacturing

For any data set, statistical analysis for data science can be carried out in the six steps listed below. They form the skeleton of statistical analysis.

The steps are as follows:

Defining the business objective of the analysis
Collection of data
Data Visualization
Data Pre-Processing
Data Modelling
Interpretation of data

Step 1: Defining the objective of the analysis
The first step is to understand the business objective and the reason for the analysis.

The objective can also be an exercise aimed at reducing costs, improving efficiency and so on.

In this case, our objective is clear: to predict the quantity that will be sold in December 2020 using past data.

Step 2: Collection of data
This is the most important step in the analysis process, because here you have to gather the required data from various sources.

Step 3: Data Visualization
This step is crucial because it helps us understand the non-uniformities in a data set. Visualizing the data in this way helps us fill the gaps and speeds up the analysis. Tools such as Tableau and Power BI can be used for data visualization.
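
Even before turning to a dedicated tool, a quick text chart can expose gaps in the data. The sketch below is only an illustration: the monthly figures and the `text_bar_chart` helper are hypothetical and are not taken from the data set used in this blog.

```python
# Hypothetical monthly sales (packets of 1 kg); None marks a month with no record.
monthly_sales = {
    "Jun 2020": 230,
    "Jul 2020": 240,
    "Aug 2020": None,
    "Sep 2020": 245,
    "Oct 2020": 255,
    "Nov 2020": 250,
}


def text_bar_chart(sales, units_per_mark=10):
    """Return one line per month: a bar of '#' marks, or a flag for a gap."""
    lines = []
    for month, quantity in sales.items():
        if quantity is None:
            lines.append(f"{month} | (missing)")
        else:
            bar = "#" * (quantity // units_per_mark)
            lines.append(f"{month} | {bar} {quantity}")
    return lines


for line in text_bar_chart(monthly_sales):
    print(line)
```

The missing month stands out immediately, which is the kind of non-uniformity this step is meant to reveal.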

Step 4: Data Pre-Processing
I. Data preprocessing / data wrangling / data cleaning:

Data preprocessing is the process of gathering, selecting and transforming data for easier analysis. It is also referred to as data cleaning or munging. It is a crucial process, as it accounts for about 80% of the total time spent on the analysis.
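
As a minimal sketch of what such cleaning might look like, the example below trims whitespace, drops duplicate months and skips entries that cannot be read as whole numbers. The `clean_sales_records` function and the raw records are assumptions made for illustration; real data would need rules suited to its own problems.

```python
def clean_sales_records(raw_records):
    """Return {month: quantity} with unusable and duplicate rows removed."""
    cleaned = {}
    for month, quantity in raw_records:
        month = month.strip()
        if month in cleaned:
            continue  # a valid record for this month was already kept
        try:
            value = int(str(quantity).strip())
        except ValueError:
            continue  # empty or non-integer entry: skip it
        if value < 0:
            continue  # a negative quantity sold is treated as an error
        cleaned[month] = value
    return cleaned


# Hypothetical raw records with typical problems.
raw_records = [
    (" Jun 2020", "230"),   # stray whitespace in the month
    ("Jul 2020", "240 "),   # stray whitespace in the quantity
    ("Jul 2020", "240"),    # duplicate month
    ("Aug 2020", ""),       # missing quantity
    ("Sep 2020", "n/a"),    # non-numeric quantity
    ("Oct 2020", 255),      # already an integer
]

print(clean_sales_records(raw_records))
```

Here the rows for August and September are dropped rather than filled in. Whether to drop or impute missing values depends on the objective defined in Step 1.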

Step 5: Data Modelling
After data preprocessing, the data is ready for analysis. We must choose a statistical technique such as ANOVA, regression or another method, based on the variables in the data.

To forecast the sales for December 2020, we’ll use the moving average technique.

Note: There are many techniques, such as Moving Average, Exponential Smoothing and Advanced Smoothing, that can be used for forecasting sales. Here, based on the objective, the author’s preference is the moving average technique.
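
For comparison, simple exponential smoothing, one of the alternatives named in the note, can be sketched as follows. This is not the method used in this blog; the function name, the default `alpha` and the sample values are assumptions chosen for illustration.

```python
def exponential_smoothing_forecast(values, alpha=0.5):
    """One-step-ahead forecast from simple exponential smoothing."""
    if not values:
        raise ValueError("at least one observation is required")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    level = values[0]  # initialise the level with the first observation
    for observation in values[1:]:
        # New level = weighted mix of the latest observation and the old level.
        level = alpha * observation + (1 - alpha) * level
    return level


# Hypothetical values, only to show the call.
print(exponential_smoothing_forecast([10, 20, 30], alpha=0.5))
```

Unlike a moving average, which weights every month in the window equally, this approach gives more weight to recent observations, with `alpha` controlling how quickly older ones fade.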

Based on the data, the six-month moving average is 245. Here’s how we got the moving average: it is the mean of the sales figures for the six most recent months in the data.
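
The calculation can be sketched in a few lines. The monthly figures below are invented placeholders, chosen only so that the result matches the 245 reported above; they are not the data used in this blog, and `moving_average_forecast` is a hypothetical helper.

```python
def moving_average_forecast(values, window=6):
    """Forecast the next period as the mean of the last `window` values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(values) < window:
        raise ValueError("not enough observations for the chosen window")
    recent = values[-window:]  # the most recent `window` observations
    return sum(recent) / window


# Placeholder sales for June to November 2020 (packets of 1 kg), NOT real data.
past_sales = [230, 240, 250, 245, 255, 250]

forecast = moving_average_forecast(past_sales, window=6)
print(forecast)  # 245.0
```

A longer window smooths out month-to-month noise but reacts more slowly to a genuine change in demand, so the window length is itself a modelling choice.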

Step 6: Interpretation
We then come to the final step of our analysis, which is interpretation. Based on the modelling analysis, our interpretation is that in December 2020 we will sell 245 packets of 1 kg each. In this way, we can predict future sales using historical data.


The six steps in this blog should enhance your understanding of the various applications of statistical concepts in data science. Statistics can further be divided into categories such as Descriptive Statistics, Inferential Statistics and Predictive Statistics, depending on the data set and the objective we are dealing with. Check out these blogs now to understand in detail how each of these aspects of statistics can be used.

