Statistical Tools used in Marketing

Ever since I started working in analytics, I have seen my friends and other people use all kinds of statistical tools to analyse data. Isn’t this a good thing?

Yes, it is, provided that we know exactly how, when, and why a particular tool is used. Herein lies the gap. The issue became more prominent when some of my friends and I were asked to complete research on different topics, with each of us asked to use one statistical test or another to draw insights from the data.

That is where I saw a pattern in almost all of us: scrambling across the Internet and various tools to apply whatever method we came across, without really understanding why that test is used. Hence the need for this article.

This article will not explain the major statistical analysis methods in depth. Instead, it takes the shorter route of explaining the basic methods and outlining when they should be used. It is a pretty long read, so gear up for it.

To start with, all of us have come across the term ‘hypothesis’, right?

What is a hypothesis?

A hypothesis is, to put it simply, an assumption. It is an assumption that analysts make with the intention of testing it during the analysis. From now on, whenever I talk about a hypothesis, I’ll use the term ‘assumption’.

For ease of understanding, think of it this way: the purpose of research or analysis is to meet its objectives by working from some hypotheses (or assumptions). Since a hypothesis is only an assumption, you test it with an experiment on a certain set of data, and based on the result the assumption is either rejected or not rejected.

Now, is it always required to formulate a hypothesis? The answer is ‘no’. Hypothesis testing is the recommended way to test our hypotheses (assumptions), but a hypothesis is not always necessary. It can be absent in descriptive analyses and studies.

Generally, we define a hypothesis in our research in two ways:

  1. Null Hypothesis (H0): Simply says there is no difference or effect in the characteristic of the population being studied. For instance, H0: Application of bio-fertilizer does not increase plant growth.
  2. Alternate Hypothesis (H1 or Ha): States what you expect to show through your research. For instance, Ha: Application of bio-fertilizer increases plant growth.
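
To make the two hypotheses concrete, here is a minimal sketch of one possible test for the bio-fertilizer example. The plant heights are made up for illustration, and the exact permutation test is simply my choice because it needs nothing beyond plain Python; it is not the only way (or necessarily the usual way) to test this hypothesis.

```python
from itertools import combinations
from statistics import mean

# Made-up plant heights in cm; purely illustrative, not real measurements.
control = [10.1, 11.4, 9.8, 10.9, 10.5]   # no bio-fertilizer
treated = [12.2, 13.1, 11.9, 12.8, 12.5]  # with bio-fertilizer


def permutation_p_value(treated, control):
    """One-sided exact permutation test for mean(treated) > mean(control)."""
    pooled = list(treated) + list(control)
    indices = range(len(pooled))
    observed = mean(treated) - mean(control)
    extreme = 0
    total = 0
    # Under H0 the group labels are interchangeable, so try every relabelling.
    for chosen in combinations(indices, len(treated)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point ties.
        if mean(group_a) - mean(group_b) >= observed - 1e-12:
            extreme += 1
        total += 1
    return observed, extreme / total


observed_diff, p_value = permutation_p_value(treated, control)
alpha = 0.05  # conventional significance level, fixed before looking at the data
print(f"observed difference in means: {observed_diff:.2f} cm")
print(f"one-sided p-value: {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The p-value is the share of relabellings that produce a difference at least as large as the one observed. A small p-value counts as evidence against H0; it does not prove Ha.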

The next common question we generally come across is “What kind of data does our research have?”

To answer this question, have a look at the image below.

The different types of data

Now, we always focus on creating models while doing any sort of analysis. But do we know exactly what a model says and what it is about?

A model is something that adequately describes and explains your data. It acts as a generalized view of the data we have and summarizes it in a simplified manner. Remember the ‘best fit’ line on a graph? Think of a statistical model as the ‘best fit’ line of our data: it shows the relationship between the variables in the data.

We often say, “I have a dataset. I’ll create a model from it and show which variables are significant.” Here, the model we create explains the data we have. If the model is accurate enough, it can also explain other data of the same kind, which is what earns it the description “a generalized view of data”.
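
As a small illustration of the ‘best fit’ idea, the sketch below fits a straight line by least squares and then applies it to a value that was not in the data. The ad-spend and sales numbers are hypothetical, and a straight line is just the simplest model I could pick for the purpose.

```python
from statistics import mean

# Hypothetical data: monthly ad spend (in thousands) and units sold (in hundreds).
ad_spend = [1, 2, 3, 4, 5]
units_sold = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(ad_spend, units_sold)
# These two numbers are the whole model: a compact summary of the ten values above.
predicted = intercept + slope * 6  # apply the model to an unseen ad spend of 6
print(f"units_sold ~ {intercept:.2f} + {slope:.2f} * ad_spend")
print(f"prediction at ad_spend = 6: {predicted:.2f}")
```

How far such a prediction can be trusted depends on how well the line fits and on whether the new data really is “of the same kind” as the data the model was built from.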

From here, I move on to the main focus of the article: statistical models. I will list five commonly used statistical tools and give the three W’s for each of them: the What, the Why, and the When.


Regression

Consider a scenario where you are trying to understand a topic. We call it the ‘dependent variable’ (DV). To understand this topic completely, there are certain other things or factors that explain it, right? Let me call them ‘independent variables’ (IVs). We assume (remember the hypothesis?) that these independent variables have some impact on our topic.

Regression helps in finding out to what extent these IVs have an impact on the topic (DV), and hence in finding the relationship between these variables.

Why should we use it? To check the influence of each factor on our main topic. For example, suppose I want to find out which factors drive the price of a consumer product, and my team says it depends on consumption volume, demand, and costs of production. These become my independent variables (IVs), each of which has some effect on my price. To what extent they affect it is what regression estimates. It indicates how much each factor matters, and so helps us decide whether to invest our time and money in that factor.

When to use it? Whenever you have to find a relationship between two or more variables, or predict one variable with the help of the others, use regression. There are numerous types of regression, but we’ll cover them in later articles.
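
The pricing example can be sketched as a multiple regression with three IVs. Everything below is an assumption made for illustration: the numbers are invented and deliberately noise-free so the recovered coefficients are easy to check, and the two helper functions are my own plain-Python sketch of ordinary least squares. In practice you would normally use a statistics library for this.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_linear_regression(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    x = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical observations: (consumption volume, demand, production cost).
ivs = [
    (10, 100, 5), (10, 100, 10), (10, 200, 5), (10, 200, 10),
    (20, 100, 5), (20, 100, 10), (20, 200, 5), (20, 200, 10),
]
price = [13.5, 21.0, 18.5, 26.0, 12.5, 20.0, 17.5, 25.0]  # the DV

coefficients = fit_linear_regression(ivs, price)
for name, value in zip(["intercept", "volume", "demand", "cost"], coefficients):
    print(f"{name:>9}: {value:+.3f}")
```

Each coefficient is the estimated change in price for a one-unit change in that IV with the others held fixed. Because the IVs are measured in different units, the raw coefficients are not directly comparable as a ranking of importance.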


ANOVA

What is it? ANOVA stands for Analysis of Variance. It is useful when the response being compared across groups is continuous.

Consider two groups of people:

  1. Group 1: People who drink coffee from Starbucks.
  2. Group 2: People who drink coffee from Dunkin’ Donuts.

You have been given the task of checking whether there is any difference between these two groups, and ultimately of finding out which coffee is better: Starbucks or Dunkin’ Donuts. This is where an ANOVA test is used. You are testing the groups to see whether their average response (say, a taste rating) differs between Starbucks and Dunkin’. With only two groups a t-test answers the same question; ANOVA extends the comparison to three or more groups.

Why is it used? ANOVA is useful for knowing how different consumer groups (here, Group 1 and Group 2) respond. Are there any significant differences between the responses of the groups, or are they the same? If they are different, how different are they?

ANOVA comes in several types, the most common being one-way and two-way ANOVA. We’ll keep the detailed explanation for a later stage and only mention the types here. ANOVA works with means and variances: it compares the group means by analysing the variation between the groups relative to the variation within them. Since you define a hypothesis (an assumption) before choosing the test, ANOVA is the right choice when the alternate hypothesis says that at least one of the group means is different.

As an example,

H0: μ1 = μ2 = μ3

Ha: at least one of the means is different, for example μ1 = μ2 ≠ μ3, or μ1 = μ3 ≠ μ2.

When is it used? For example, when you have a customer group and two different product types, and you want to find out how the group responds to the different products, ANOVA is the preferred choice.
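
Here is a rough sketch of a one-way ANOVA for the three-mean hypothesis above. The taste ratings and brand names are hypothetical, and the F statistic is computed by hand so that the “between” and “within” parts are visible; a statistics library would also report the p-value.

```python
from statistics import mean

# Hypothetical taste ratings (1-10) from three separate groups of customers.
groups = {
    "brand_1": [7, 8, 6, 7, 7],
    "brand_2": [5, 6, 5, 4, 5],
    "brand_3": [8, 9, 8, 7, 8],
}


def one_way_anova(samples):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for s in samples for v in s]
    grand_mean = mean(all_values)
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-group variation: spread of observations around their own group mean.
    ss_within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_anova(list(groups.values()))
# Critical value of about 3.89 is read from a standard F table
# for (2, 12) degrees of freedom at the 5% level.
critical_f = 3.89
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
print("reject H0" if f_stat > critical_f else "fail to reject H0")
```

A large F means the group means differ by more than the spread within the groups would explain. It says that at least one mean differs, not which one; finding that out needs follow-up comparisons.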

Chi-Square Test

Hopefully you’ve learnt about categorical data from the picture above. Now we’ll look at a test that is well suited to categorical data: the Chi-Square test.

To keep this simple, I will say that it is used to test the relationship between two categorical variables. It tells us whether the two variables are related or independent of each other. Let me take an example. If we want to find a relation between gender (Male/Female) as the first categorical variable and aggressive driving (Yes/No) as the second, we use a Chi-Square test to check whether gender has “any” relation with aggressive driving.

In other words, it tests whether there is any association between the categorical variables. The Chi-Square test works on frequencies: it compares the observed distribution of frequencies across the categories with the distribution we would expect if there were no association. This is carried out through hypothesis testing.
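
The gender and aggressive-driving example might look like this in code. The survey counts are invented, and the sketch computes the plain Pearson statistic without any continuity correction, so a library routine that applies one by default may report a slightly different number.

```python
from math import erfc, sqrt

# Hypothetical survey counts: rows are gender, columns are aggressive driving.
#            Yes  No
observed = [[30, 70],   # Male
            [15, 85]]   # Female


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Expected count if the two variables were unrelated (H0).
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


chi2, dof = chi_square_statistic(observed)
# This closed form for the p-value holds only for 1 degree of freedom (a 2 x 2 table).
p_value = erfc(sqrt(chi2 / 2))
print(f"chi-square = {chi2:.3f}, dof = {dof}, p = {p_value:.4f}")
```

Here H0 is “gender and aggressive driving are independent”. With these made-up counts the p-value falls below 0.05, so H0 would be rejected at the 5% level.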

Factor Analysis

Consider a scenario where you feel there are 3 things (factors) that affect the pricing of a product (P): production cost (PC), consumption (C), and demand (D). Each of these 3 factors has a further 3 sub-factors (or variables).

For PC, you have PC1, PC2, PC3.
For C, you have C1, C2, C3.
For D, you have D1, D2, D3.

And you need to find out which of these factors is the most important, or has the most impact, on pricing the product (P).

Architecture of a factor analysis

Here comes the role of factor analysis. To explain this simply, it assesses the 3 variables under each of those factors and allocates a score/index to each factor. It then analyses those scores to determine which factor matters most. This is factor analysis explained in simple terms. Of course, there are terms and jargon associated with it, but we’ll keep that for a later article.

When to use it? Mainly when there are a lot of variables in your dataset and you have to reduce them to a smaller number of factors.
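
To give a feel for how several variables collapse into one factor, here is a very rough sketch. It is not a full factor analysis: it extracts only the first factor using the principal-component method (loading = square root of the largest eigenvalue times its eigenvector), with no rotation. The correlation matrix for PC1, PC2 and PC3 is hypothetical.

```python
from math import sqrt

# Hypothetical correlation matrix for the three variables PC1, PC2, PC3
# that are all assumed to reflect "production cost".
corr = [
    [1.0, 0.7, 0.6],
    [0.7, 1.0, 0.5],
    [0.6, 0.5, 1.0],
]


def dominant_eigenpair(matrix, iterations=200):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    vector = [1.0] * len(matrix)
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(a * v for a, v in zip(row, vector)) for row in matrix]
        eigenvalue = sqrt(sum(p * p for p in product))
        vector = [p / eigenvalue for p in product]
    return eigenvalue, vector


eigenvalue, vector = dominant_eigenpair(corr)
# Loadings: how strongly each variable is tied to the single extracted factor.
loadings = [sqrt(eigenvalue) * v for v in vector]
share = eigenvalue / len(corr)  # share of total variance the factor accounts for

for name, loading in zip(["PC1", "PC2", "PC3"], loadings):
    print(f"{name}: loading {loading:.3f}")
print(f"variance explained by the factor: {share:.1%}")
```

If one factor accounts for most of the variance and all three loadings are high, the three variables can reasonably be summarised by a single “production cost” score.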

Conjoint Analysis

Conjoint analysis is used for comparing different brands/products based on certain attributes and features. Which features of a product do people value? The aim is to find out what exactly is most important to them. It is also widely employed in product or service analysis, to improve them in the future.

You lay out the product features in front of the consumer, collect data on their choices, and analyse which feature they prefer or like the most. That is why a conjoint analysis should be used.

When to use it? Ideally when we have to know which product features the consumer prefers or values more.
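
A minimal sketch of a ratings-based conjoint analysis is shown below. The attributes, levels and ratings are all hypothetical. It relies on the fact that in a balanced full-factorial design the least-squares part-worth of a level is simply the mean rating at that level minus the overall mean; real studies usually have more attributes, many respondents, and a regression model.

```python
from statistics import mean

# Hypothetical ratings (1-10) one respondent gave to four product profiles.
# Every combination of brand and price appears once (a full-factorial design).
profiles = [
    ({"brand": "A", "price": "low"}, 9),
    ({"brand": "A", "price": "high"}, 6),
    ({"brand": "B", "price": "low"}, 7),
    ({"brand": "B", "price": "high"}, 4),
]


def part_worths(rated_profiles):
    """Part-worth of each level = mean rating at that level - overall mean."""
    grand_mean = mean([r for _, r in rated_profiles])
    worths = {}
    for attribute in rated_profiles[0][0]:
        levels = sorted({p[attribute] for p, _ in rated_profiles})
        worths[attribute] = {
            level: mean([r for p, r in rated_profiles if p[attribute] == level])
            - grand_mean
            for level in levels
        }
    return worths


def importance(worths):
    """Relative importance = an attribute's part-worth range / sum of ranges."""
    ranges = {a: max(w.values()) - min(w.values()) for a, w in worths.items()}
    total = sum(ranges.values())
    return {a: r / total for a, r in ranges.items()}


worths = part_worths(profiles)
shares = importance(worths)
for attribute, levels in worths.items():
    print(attribute, levels, f"importance: {shares[attribute]:.0%}")
```

With these made-up ratings, price comes out as more important than brand to this respondent, because moving from a high to a low price shifts the rating more than switching brand does.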

Now that I have listed out 5 commonly used statistical tools, let me end this article by telling you how these can be used in marketing.

They help in every aspect of marketing, but they are used most importantly in market research: creating market scenarios and market models, forecasting, and resource allocation. They help in thinking through which features a business can implement and how exactly these will impact its sales and revenues. Statistical tools help in comparing customer choices and preferences and in understanding the reasons behind them. Conjoint analysis, for instance, helps businesses gain insight into how customers value product attributes and features, and so into the reasoning behind a consumer buying a product.

The uses of these tools, and more, are endless in this ocean of analytics. With this, we have covered a very basic overview of some commonly used tools. We hope you now know what each of them means and when it should be used. We’ll be bringing in more detailed articles on each of these tools. Till then, continue learning and knowing more about the world of analytics.
