What Comes First After Data Collection in Psychological Research? Descriptive Statistics

Before psychologists run their fancy stats tests to find deep truths about human behaviour, they have to do something way less glamorous: sort the data, just like you sort your clothes before tossing them into the washing machine (or at least… you should). That’s exactly what Descriptive Statistics does. It doesn’t reveal deep truths or make big claims; it just helps you figure out what kind of data you’re dealing with. It’s the difference between ruining all your white shirts with one rogue red sock and knowing exactly what you’ve got before you hit ‘start.’ In research, just like in laundry, sorting first saves you from a total mess later.

If words like mean, median, or standard deviation make you panic, relax. Descriptive stats are just like sorting your laundry: you’re not solving mysteries yet, just figuring out what’s what so nothing bleeds, shrinks, or explodes later. It’s the first step to keeping your data (and your sanity) clean.

Let’s see how we can make sorting data as easy as doing laundry in a machine.

Descriptive Statistics = Sorting Your Data Laundry

“Just like you can sort laundry by colour, fabric, or how bad it smells, you can sort your data in different ways using descriptive statistics — like by its average (mean), middle value (median), or most frequent value (mode).”

Mean – “Add ’em all up, divide by the number of items!”

Example: 2, 3, 1, 2, 2 (5 items)

2 + 3 + 1 + 2 + 2 = 10, then 10 ÷ 5 = 2 (the average)

Only works with numbers. It gives you the overall average.

Median – “Line up your values in increasing or decreasing order and pick the one in the middle.”

Example: 2, 3, 1, 2, 2

1, 2, 2, 2, 3 (lined up)

The middle value is 2.

This shows the midpoint. The median is great when your data is a bit messy or has one number that’s way off from the rest (an outlier).

Mode – “The value that shows up the most.”

Example: 2, 3, 1, 2, 2

2 appears most often in the data, so the mode is 2.

Perfect for finding the most common thing, especially in non-numerical data.
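
If you’d like to double-check the three calculations above, here is a minimal sketch using Python’s built-in statistics module. The variable names are just ours for illustration.

```python
import statistics

# The example data used above
values = [2, 3, 1, 2, 2]

mean_value = statistics.mean(values)      # add them all up, divide by the count
median_value = statistics.median(values)  # middle value once the data is sorted
mode_value = statistics.mode(values)      # the value that shows up the most

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)
```

All three come out to 2 for this tiny dataset, which will not usually be the case with real data.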

“Sorting and washing clothes is tricky as it is. Sorting data? We’ll make it easy.”

“Let’s put this new information to use and see how much of it you’ve grasped.”

Let’s say you did laundry 5 times last month and tracked how many scoops of detergent you used each time:

Number of scoops on each of the 5 laundry days:

2 (Monday), 3 (Tuesday), 2 (Wednesday), 4 (Thursday), 3 (Friday)

You’re doing this because you’re trying to figure out how much detergent you’ll need for the rest of the semester. You’re not going home until the break, and there’s no way you’re dragging a giant detergent tub around for no reason.

So, you wish to find out – What’s your usual usage?

Let’s apply central tendency. Which of the three measures (mean, median, or mode) would you pick here?

Best Measure: MEAN – Why?

You’re working with numbers, not categories.
There are no extreme outliers (like suddenly using 8 scoops).
You’re trying to estimate future usage → planning requires the average.

📊 Mean = (2+3+2+4+3) ÷ 5 = 2.8 scoops per wash
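
Here is a small sketch that checks this number with Python’s statistics module and, for comparison, looks at the other two measures as well. The list name scoops is ours.

```python
import statistics

# Scoops used Monday to Friday, from the example above
scoops = [2, 3, 2, 4, 3]

mean_scoops = statistics.mean(scoops)      # (2+3+2+4+3) / 5
median_scoops = statistics.median(scoops)  # middle of 2, 2, 3, 3, 4
modes = statistics.multimode(scoops)       # every value tied for most frequent

print("Mean:", mean_scoops)
print("Median:", median_scoops)
print("Mode(s):", modes)
```

Notice that 2 and 3 both appear twice, so the mode does not single out one answer here. That is one more hint that the mean is the handier measure for planning.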

If you thought of any other measure of central tendency, revisit “When to use which”.

After Central Tendency comes Measures of Variability

(Range, Interquartile Range (IQR), Variance, Standard Deviation)

Range – The difference between the highest and lowest values

Interquartile Range (IQR) – The range of the middle 50% of the data. Because it leaves out the extremes, it is less affected by outliers

Variance – The average of squared differences from the mean

Standard Deviation – How much, on average, data values differ from the mean

“Don’t be scared — we’ll break it down and explore how it really works, together.”

Before we get into variance and standard deviation, let’s first understand range and interquartile range (IQR) — and more importantly, why we even need these measures of variability after we’ve already used central tendency.

Range

Range tells us how widely spread the data is.

Take t-shirt sizes as an example: if people buy sizes from XS to XXL, the data is spread across a wide range of sizes.

So, Range = XXL – XS → the full stretch from smallest to largest size sold.

Interquartile Range (IQR)

Now, IQR focuses on the middle 50% of the data — the “bulk” of the sizes people actually buy, without getting distracted by rare extremes like XS or XXL.

In our t-shirt example, even if the full range goes from XS to XXL,

Most people may buy S, M, or L.

That’s the IQR — where the middle group of buyers fall.

So, while range looks at the full size spread,
IQR zooms in on the most common zone — the “crowded section of the rack.”
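
Sizes like XS and XXL can’t literally be subtracted, so here is a rough sketch with made-up numbers instead. Both lists are hypothetical, and the helper names value_range and iqr are ours. Note that textbooks and software use slightly different quartile conventions, so IQR values can differ a little from tool to tool.

```python
import statistics

def value_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)

def iqr(values):
    """Spread of the middle 50%: third quartile minus first quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical data: the second list swaps one value for an extreme one
typical = [1, 2, 2, 3, 3, 3, 4, 4, 5]
with_extreme = [1, 2, 2, 3, 3, 3, 4, 4, 20]

print("Range:", value_range(typical), "vs", value_range(with_extreme))
print("IQR:  ", iqr(typical), "vs", iqr(with_extreme))
```

The range jumps as soon as one extreme value appears, while the IQR stays put because it only looks at the crowded middle.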

Now that we have range and IQR sorted, let’s dive into Variance and Standard Deviation.

Variance and its buddy Standard Deviation help you see:

How spread out the data is
Whether values are clustered around the mean or scattered
And whether there are any values sitting way off to the side — a.k.a. outliers
If you calculate the variance or standard deviation and it’s very high, that’s a clue your values are widely scattered, possibly because of extreme outliers or a skewed distribution.

If variance and standard deviation both measure how spread out the data is,
then why do we need both? Why not just stick to one?

Standard deviation and variance show the same thing — just in different forms:

Variance is in squared units, which makes it harder to understand.
(If your data is in meters, your variance will come out in square meters — not super intuitive!)
Standard Deviation is the square root of variance, so it’s in the same units as your original data.
(If your data is in meters, your SD will also be in meters, much easier to make sense of!)
So while variance tells you about spread, standard deviation tells you that same spread in a way your brain can actually picture.
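
To see the units point in action, here is a minimal sketch with hypothetical distances in meters. It treats the list as a whole population, which is an assumption on our part, and the variable names are ours.

```python
import math
import statistics

# Hypothetical distances, measured in meters
distances_m = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(distances_m)  # population variance, in square meters
sd = statistics.pstdev(distances_m)           # population standard deviation, in meters

print("Variance:", variance, "square meters")
print("Standard deviation:", sd, "meters")

# The standard deviation is just the square root of the variance
print("Square root of variance:", math.sqrt(variance))
```

A spread of 2 meters is easy to picture next to the original distances; 4 square meters is not.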

You may wonder why we need to measure variability when we already have an average. Why not just take the mean and jump straight to statistical tests like t-tests or ANOVA?

You tracked your detergent use — average comes out to 2.8 scoops a day. So, you pack exactly that for the semester.

But what if one week, you’re sweating more, playing football, and your roommate spills ketchup on your sheets?
Boom — now you’re using 4 scoops a day. That average? Not enough.

Now imagine you’re super consistent — 3 scoops every single day. Easy. The average works perfectly.

So what’s the difference?
Variance.

It tells you how much your daily use bounces around the average. Are you steady… or all over the place?

And in research? Same deal.
Low variance = people gave similar responses.
High variance = answers were everywhere.

That changes how much you can trust the results.
Average is nice — but variance tells you how real life actually behaves.
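
Here is a small sketch of that idea. Both weeks are made-up numbers chosen so that the average is the same; only the spread differs.

```python
import statistics

# Two hypothetical weeks of detergent use (scoops per day)
steady_week = [3, 3, 3, 3, 3]  # super consistent
bumpy_week = [1, 5, 2, 4, 3]   # all over the place

for name, week in [("steady", steady_week), ("bumpy", bumpy_week)]:
    mean = statistics.mean(week)
    sd = statistics.pstdev(week)  # spread around the mean
    print(f"{name}: mean = {mean}, standard deviation = {sd:.2f}")
```

Both weeks average 3 scoops, but only the steady week has a standard deviation of 0. For the bumpy week, packing exactly the average would leave you short on the heavy days.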

After Choosing a Measure of Central Tendency, What Variability Measure Do You Use?

➤ If you used Mean:

Use: Standard Deviation or Variance
Why? They both consider all values and show how far data spreads from the mean.

➤ If you used Median:

Use: Interquartile Range (IQR)
Why? Both focus on the middle part of the data and ignore outliers.

➤ If you used Mode:

Use: Frequencies or Category counts
Why? Mode works with categorical data, so numerical spread doesn’t apply.
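
For the mode case, a frequency count is really all you need. Here is a minimal sketch with a hypothetical list of t-shirt sizes sold; the data and variable names are ours.

```python
from collections import Counter

# Hypothetical t-shirt sizes sold in one day (categorical data)
sizes_sold = ["M", "S", "M", "L", "M", "XL", "S", "M", "L", "XS"]

counts = Counter(sizes_sold)                      # frequency of each category
mode_size, mode_count = counts.most_common(1)[0]  # the most frequent category

for size, count in counts.most_common():
    print(f"{size}: {count}")
print("Mode:", mode_size)
```

There is no mean or standard deviation of "M" and "XL", so the counts themselves are the description of the data.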

After identifying the central tendency (mean, median, mode) and measuring the variability (range, IQR, variance, standard deviation), the next step is to choose the right statistical test. This choice depends on both the type of data (numerical, categorical, or ranked) and how the data is distributed. If the data is numerical and normally distributed, you can apply parametric tests like the t-test or ANOVA. But if the data is not normally distributed, contains outliers, or is categorical or ordinal, then non-parametric tests such as the Mann–Whitney U test, Chi-Square test, or Kruskal–Wallis test are more appropriate. In short, your decision depends on understanding both what kind of data you have and how it behaves.
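
The decision described above can be sketched as a tiny helper. This is only a rough rule of thumb that mirrors the paragraph, not a complete guide to choosing a test, and the function name and labels are hypothetical.

```python
def suggest_test_family(data_type, normally_distributed=False, has_outliers=False):
    """Suggest a family of tests, following the rule of thumb in the text.

    data_type is one of 'numerical', 'ordinal', or 'categorical'
    (labels chosen for this sketch).
    """
    if data_type not in ("numerical", "ordinal", "categorical"):
        raise ValueError(f"unknown data type: {data_type}")

    # Numerical, normally distributed data without outliers -> parametric tests
    if data_type == "numerical" and normally_distributed and not has_outliers:
        return "parametric (e.g. t-test, ANOVA)"

    # Everything else -> non-parametric tests
    return "non-parametric (e.g. Mann-Whitney U, Chi-Square, Kruskal-Wallis)"


print(suggest_test_family("numerical", normally_distributed=True))
print(suggest_test_family("numerical", normally_distributed=True, has_outliers=True))
print(suggest_test_family("categorical"))
```

Which specific test you pick within each family still depends on your research design, such as how many groups you are comparing.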

You might be wondering — “Why all these math terms and not a single scary equation yet?”
That’s because our goal is to help you understand the concepts first — what these terms mean and why they matter — before jumping into formulas.

Especially for the CUET exam, you’re tested more on your understanding of statistical ideas, not on solving complex numerical problems.
So once the concept clicks, the equations (when they do show up) won’t feel intimidating at all.

Once you’re done with central tendency and measures of variability, next comes visual representation:

Visual Representation of Data — Like Drawing What Numbers Say!

So now that we’ve measured our data (with mean, median, mode, and variance), it’s time to see it. Because sometimes looking at numbers is boring — but pictures? Easy!

It’s like turning numbers into drawings so our brain says, “Oh! Now I get it!”

Visuals make data easy. A quick ranking (by exam relevance and conceptual usefulness):

Histogram for grouped data
Box Plot for spread
Bar Graph for comparisons
Line Graph for change over time
Pie Chart for percentage shares

Let’s see how all of this pans out.

Histogram

Say you asked 100 people how old they are. You don’t list every single age from 1 year to 100 years!
You group them like:

0–10 years
11–20 years … so on and so forth
And then draw bars for each group.

Useful for seeing how the data spreads out, for example that most people are between 20 and 30!

This makes patterns easier to see.

For example: Are most people young?

Is the data evenly spread?

Is there a spike in some age range?
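
The grouping step can be sketched in a few lines of Python. The ages below are made up, the helper name age_group is ours, and the "bars" are just rows of # characters so you can see the idea without any plotting library.

```python
from collections import Counter

def age_group(age):
    """Return the group label for an age: '0-10', '11-20', '21-30', ..."""
    index = max(age - 1, 0) // 10          # 0-10 -> 0, 11-20 -> 1, 21-30 -> 2, ...
    low = 0 if index == 0 else index * 10 + 1
    high = (index + 1) * 10
    return f"{low}-{high}"

# Hypothetical ages from a small survey
ages = [4, 9, 15, 18, 22, 23, 25, 27, 29, 34, 38, 41, 56]

group_counts = Counter(age_group(a) for a in ages)

# Print one text bar per group, youngest group first
for label in sorted(group_counts, key=lambda s: int(s.split("-")[0])):
    print(f"{label:>6} | {'#' * group_counts[label]}")
```

In this made-up sample the tallest bar is the 21-30 group, which is exactly the kind of spike a histogram makes obvious.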

Box Plot

It shows:

The smallest value
The biggest
The middle (median)
And where most data is squeezed in.

It’s like peeking inside a packed suitcase — you see how data is stuffed!

It tells you:

Where most data clusters (the box)
Where it starts and ends (min & max)
If there’s a big gap or everything’s tightly packed.
(It’s a quick snapshot of variability.)

It also reveals the median and catches outliers.
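
Here is a rough sketch of the numbers behind a box plot. The scores are hypothetical and the function names are ours. The outlier check uses the common "1.5 × IQR" convention, which is an assumption on our part since the text above doesn’t name a rule, and quartile values can differ slightly between tools.

```python
import statistics

def five_number_summary(values):
    """Smallest value, first quartile, median, third quartile, biggest value."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

def find_outliers(values, k=1.5):
    """Values more than k * IQR below Q1 or above Q3 (a common convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    box_width = q3 - q1
    low_fence = q1 - k * box_width
    high_fence = q3 + k * box_width
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical test scores with one suspiciously large value
scores = [2, 3, 3, 4, 4, 5, 5, 6, 15]

print("Min, Q1, median, Q3, max:", five_number_summary(scores))
print("Outliers:", find_outliers(scores))
```

The box runs from Q1 to Q3, the line inside it is the median, and the value 15 sits far enough from the box to be flagged.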

Bar Graph

Each bar is like a building — taller means more.
Used to compare things.

Compares categories or groups: people who prefer cats vs dogs, or how many mangoes, apples, and bananas you ate last week.

Line Graph

Imagine a dot for each number, and you join the dots — like connect-the-dots!
Used when we want to see how something changes over time.

Tracks progress and looks for trends: stock prices, the temperature this week, or your mood on Monday vs Friday.

Pie Chart

The circle is the whole pizza (100%).
Each slice = a piece of data.

If you spent your day like:

8 hrs sleeping
6 hrs studying
2 hrs chilling
8 hrs on everything else
You slice your pizza to show how much of the day each thing takes.

Best when showing parts of a whole: a budget (where your money goes) or how you spend your day (sleep, study, phone, eat).
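
Turning hours into pizza slices is just a percentage calculation. In this sketch the "everything else" entry is our assumption, added so the slices cover the full 24 hours.

```python
# Hours from the example; "everything else" is assumed so the day totals 24 hours
hours = {
    "sleeping": 8,
    "studying": 6,
    "chilling": 2,
    "everything else": 8,
}

total = sum(hours.values())

# Each slice is that activity's share of the whole, as a percentage
shares = {activity: 100 * h / total for activity, h in hours.items()}

for activity, share in shares.items():
    print(f"{activity}: {share:.1f}% of the day")
```

The slices always add up to 100%, which is why a pie chart only makes sense when your categories cover the whole.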

Key Formulas in Central Tendency & Variability

CENTRAL TENDENCY

Mean (Average)

Mean = Sum of all values ÷ Number of values

Median (Middle value)

Line up all the numbers in increasing or decreasing order
Odd number of values → the middle value after sorting
Even number of values → add the middle two values and divide by 2
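
The odd/even rule above can be sketched as a short function. The function name is ours; in practice Python’s built-in statistics.median does the same job.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if even."""
    if not values:
        raise ValueError("median needs at least one value")

    ordered = sorted(values)  # line the values up in increasing order
    n = len(ordered)
    middle = n // 2

    if n % 2 == 1:
        # Odd number of values: take the one in the middle
        return ordered[middle]

    # Even number of values: add the middle two and divide by 2
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([2, 3, 1, 2, 2]))  # odd count
print(median([1, 2, 3, 4]))     # even count
```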

Mode
→ The value that appears most frequently in the data
(No formula — just count!)

MEASURES OF VARIABILITY

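Range

$$\text{Range} = \text{highest value} - \text{lowest value}$$

Interquartile Range (IQR)

$$\text{IQR} = Q_3 - Q_1$$

(Q₁ is the first quartile and Q₃ is the third quartile, so the IQR covers the middle 50% of the data.)

Variance (σ² or s²)

For a population:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

For a sample:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Where:
x = each data point
μ or x̄ = mean
N = total number of data points (population)
n = sample size

Standard Deviation (σ or s)

For a population:

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

For a sample:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$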
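
To see the population and sample versions side by side, here is a minimal from-scratch sketch. It assumes the usual textbook definitions: sum the squared differences from the mean, then divide by N for a population or by n − 1 for a sample, and take the square root for the standard deviation. The function names are ours, and the data is the detergent example from earlier.

```python
import math

def variance(values, sample=False):
    """Average squared difference from the mean.

    Divides by N for a population, or by n - 1 when sample=True.
    """
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough data points")

    mean = sum(values) / n
    squared_diffs = sum((x - mean) ** 2 for x in values)
    return squared_diffs / (n - 1 if sample else n)

def standard_deviation(values, sample=False):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(values, sample=sample))


scoops = [2, 3, 2, 4, 3]

print("Population variance:", round(variance(scoops), 2))
print("Sample variance:    ", round(variance(scoops, sample=True), 2))
print("Population SD:      ", round(standard_deviation(scoops), 2))
print("Sample SD:          ", round(standard_deviation(scoops, sample=True), 2))
```

The sample versions come out slightly larger because dividing by n − 1 instead of N makes the result a bit bigger.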

Tip:

If you’re not solving numerical problems (as in CUET),
you just need to know:

What each formula measures
When to use it
How it connects to the type of data

Central Tendency & Variability – Concept-Based MCQs

1. Which measure of central tendency is most sensitive to outliers?
A. Mean
B. Median
C. Mode
D. Range

2. If a data set is skewed due to a few very high values, the best measure of central tendency is:
A. Mean
B. Mode
C. Median
D. Standard Deviation

3. Mode is most useful when dealing with:
A. Skewed numerical data
B. Categorical data
C. Small datasets
D. Continuous data

4. If the mean = median = mode, the distribution is likely:
A. Skewed
B. Bimodal
C. Uniform
D. Normal

5. Which measure tells you the most frequently occurring value in a dataset?
A. Mean
B. Median
C. Mode
D. Range

6. What does the range measure?
A. The average distance from the mean
B. The spread of the middle 50%
C. The difference between the highest and lowest values
D. The most frequent value

7. Standard deviation is preferred over range because:
A. It ignores outliers
B. It is based on all data values
C. It is easier to calculate
D. It’s used for categorical data

8. Which of the following is a measure of spread that works well with the median?
A. Variance
B. Standard Deviation
C. IQR (Interquartile Range)
D. Mean

9. If standard deviation is low, it means the data is:
A. Widely spread out
B. Skewed
C. Closely clustered around the mean
D. Has many modes

10. Two datasets have the same mean, but Dataset A has a much higher standard deviation than Dataset B. What can you conclude? (Hard)
A. Dataset A has more values close to the mean than Dataset B
B. Dataset B is more consistent than Dataset A
C. Both datasets are equally spread out
D. The mean of Dataset A is not reliable

If you could answer these, your foundation in stats is somewhere between moderate and strong.