Finding Districts Above Average Using JOIN + Subqueries

March 19, 2026 at 6:31 pm #6241
Rajeev Bagra
Keymaster
📌 Problem Context

Imagine you are working with an education dataset containing three tables:
- districts → names of school districts
- expenditures → per-pupil spending for each district
- staff_evaluations → performance metrics (e.g., exemplary ratings)
🎯 Your Goal

You want to answer:
Which districts have both:
- Above-average per-pupil expenditure
- Above-average staff evaluation scores?
Comparing each row against an overall average like this is a very common real-world analytics problem.

🧾 Initial Attempt (Common Mistake)

A beginner might write:
```
SELECT districts.name, expenditures.per_pupil_expenditure, staff_evaluations.exemplary
FROM districts
JOIN expenditures
ON districts.id = expenditures.district_id
JOIN districts.id = staff_evaluations.district_id
WHERE expenditures.per_pupil_expenditure > AVG(expenditure.per_pupil_expenditure) AND
staff_evaluations.exemplary > AVG(staff_evaluations.exemplary);
```
❌ What Went Wrong?

1️⃣ Incorrect JOIN syntax
```
JOIN districts.id = staff_evaluations.district_id
```
SQL requires the table name after JOIN, followed by the join condition in an ON clause:
```
JOIN staff_evaluations
ON districts.id = staff_evaluations.district_id
```
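A quick way to confirm this is a syntax problem, not a logic problem, is to run both forms. This is a minimal sketch using Python's built-in `sqlite3` module with a simplified, hypothetical schema and two made-up districts. SQLite refuses to compile the malformed JOIN, while the corrected one returns rows.

```python
import sqlite3

# Hypothetical, simplified schema with two made-up districts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE districts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE staff_evaluations (district_id INTEGER, exemplary INTEGER);
    INSERT INTO districts VALUES (1, 'District A'), (2, 'District B');
    INSERT INTO staff_evaluations VALUES (1, 90), (2, 88);
""")

broken = """
    SELECT districts.name
    FROM districts
    JOIN districts.id = staff_evaluations.district_id
"""
fixed = """
    SELECT districts.name, staff_evaluations.exemplary
    FROM districts
    JOIN staff_evaluations
    ON districts.id = staff_evaluations.district_id
"""

# The malformed JOIN fails as soon as SQLite compiles the statement.
try:
    conn.execute(broken)
    join_error = None
except sqlite3.OperationalError as exc:
    join_error = str(exc)

# The corrected JOIN runs; sort because the query has no ORDER BY.
fixed_rows = sorted(conn.execute(fixed).fetchall())

print(join_error)
print(fixed_rows)  # [('District A', 90), ('District B', 88)]
```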
2️⃣ Using AVG() directly in WHERE
```
WHERE value > AVG(column)
```
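This doesn’t work because:
- WHERE filters individual rows, and it runs before any aggregation happens
- AVG() computes a single value from many rows, so SQL does not allow it inside WHERE

Instead of returning wrong results, the database rejects the query with an error.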
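You can see the rejection directly. In this minimal sketch (Python's built-in `sqlite3` module, made-up toy spending values), the aggregate-in-WHERE query raises an error, while the same comparison against a subquery runs. The exact error wording depends on the SQLite version, so the code only captures it.

```python
import sqlite3

# Hypothetical toy data; the average spending here is 15000.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE expenditures (district_id INTEGER, per_pupil_expenditure INTEGER);
    INSERT INTO expenditures VALUES (1, 18000), (2, 17000), (3, 16000), (4, 9000);
""")

# Aggregate directly in WHERE: rejected when the statement is compiled.
try:
    conn.execute("""
        SELECT district_id FROM expenditures
        WHERE per_pupil_expenditure > AVG(per_pupil_expenditure)
    """)
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

# Same comparison against a subquery: the average becomes one fixed value.
above = sorted(conn.execute("""
    SELECT district_id FROM expenditures
    WHERE per_pupil_expenditure > (SELECT AVG(per_pupil_expenditure) FROM expenditures)
""").fetchall())

print(where_error)
print(above)  # [(1,), (2,), (3,)]
```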
3️⃣ Typo in table name
```
AVG(expenditure.per_pupil_expenditure)
```
Correct:
```
AVG(expenditures.per_pupil_expenditure)
```
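A one-letter typo like this is caught by name resolution, not by the parser. This minimal sketch (Python's built-in `sqlite3` module, hypothetical toy values) runs the misspelled and the correct qualifier side by side.

```python
import sqlite3

# Hypothetical toy data; the average spending here is 15000.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE expenditures (district_id INTEGER, per_pupil_expenditure INTEGER);
    INSERT INTO expenditures VALUES (1, 18000), (2, 17000), (3, 16000), (4, 9000);
""")

# Misspelled table qualifier: "expenditure" instead of "expenditures".
try:
    conn.execute("SELECT AVG(expenditure.per_pupil_expenditure) FROM expenditures")
    typo_error = None
except sqlite3.OperationalError as exc:
    typo_error = str(exc)  # e.g. a "no such column" error

# Correct qualifier: one row holding the average (AVG returns a float).
(avg_spend,) = conn.execute(
    "SELECT AVG(expenditures.per_pupil_expenditure) FROM expenditures"
).fetchone()

print(typo_error)
print(avg_spend)  # 15000.0
```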
🧠 Key Insight

You cannot compare a row directly with an aggregate unless you:

👉 First compute the aggregate separately
👉 Then compare using that result

This is where subqueries come in.
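To make the two steps concrete, here is a minimal sketch that performs them literally with Python's built-in `sqlite3` module: one query per average, then a filter that receives those numbers as bound `?` parameters. The simplified schema and the four districts are made-up toy data, not the real dataset.

```python
import sqlite3

# Hypothetical toy data; only the columns used in this post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE districts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE expenditures (district_id INTEGER, per_pupil_expenditure INTEGER);
    CREATE TABLE staff_evaluations (district_id INTEGER, exemplary INTEGER);
    INSERT INTO districts VALUES
        (1, 'District A'), (2, 'District B'), (3, 'District C'), (4, 'District D');
    INSERT INTO expenditures VALUES (1, 18000), (2, 17000), (3, 16000), (4, 9000);
    INSERT INTO staff_evaluations VALUES (1, 90), (2, 88), (3, 76), (4, 86);
""")

# Step 1: compute each aggregate separately; each query returns one value.
(avg_spend,) = conn.execute(
    "SELECT AVG(per_pupil_expenditure) FROM expenditures").fetchone()
(avg_exemplary,) = conn.execute(
    "SELECT AVG(exemplary) FROM staff_evaluations").fetchone()

# Step 2: compare every joined row against those fixed numbers.
rows = conn.execute("""
    SELECT districts.name, expenditures.per_pupil_expenditure, staff_evaluations.exemplary
    FROM districts
    JOIN expenditures ON districts.id = expenditures.district_id
    JOIN staff_evaluations ON districts.id = staff_evaluations.district_id
    WHERE expenditures.per_pupil_expenditure > ?
      AND staff_evaluations.exemplary > ?
    ORDER BY districts.name
""", (avg_spend, avg_exemplary)).fetchall()

print(avg_spend, avg_exemplary)  # 15000.0 85.0
print(rows)  # [('District A', 18000, 90), ('District B', 17000, 88)]
```

This works, but it takes three round trips to the database. The subquery approach in the correct solution folds both steps into a single statement.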

✅ Correct Solution
```
SELECT 
    districts.name,
    expenditures.per_pupil_expenditure,
    staff_evaluations.exemplary
FROM districts
JOIN expenditures
ON districts.id = expenditures.district_id
JOIN staff_evaluations
ON districts.id = staff_evaluations.district_id
WHERE expenditures.per_pupil_expenditure > (
    SELECT AVG(expenditures.per_pupil_expenditure)
    FROM expenditures
)
AND staff_evaluations.exemplary > (
    SELECT AVG(staff_evaluations.exemplary)
    FROM staff_evaluations
);
```
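If you want to try the query yourself, here is a minimal sketch that runs it unchanged against an in-memory SQLite database through Python's built-in `sqlite3` module. The schema keeps only the columns used in this post, and the four districts are made-up toy data (not the real dataset), chosen so that District C passes only the spending test and District D passes only the evaluation test.

```python
import sqlite3

# Hypothetical toy data; averages work out to 15000 and 85.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE districts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE expenditures (district_id INTEGER, per_pupil_expenditure INTEGER);
    CREATE TABLE staff_evaluations (district_id INTEGER, exemplary INTEGER);
    INSERT INTO districts VALUES
        (1, 'District A'), (2, 'District B'), (3, 'District C'), (4, 'District D');
    INSERT INTO expenditures VALUES (1, 18000), (2, 17000), (3, 16000), (4, 9000);
    INSERT INTO staff_evaluations VALUES (1, 90), (2, 88), (3, 76), (4, 86);
""")

# The corrected query, unchanged.
query = """
SELECT
    districts.name,
    expenditures.per_pupil_expenditure,
    staff_evaluations.exemplary
FROM districts
JOIN expenditures
ON districts.id = expenditures.district_id
JOIN staff_evaluations
ON districts.id = staff_evaluations.district_id
WHERE expenditures.per_pupil_expenditure > (
    SELECT AVG(expenditures.per_pupil_expenditure)
    FROM expenditures
)
AND staff_evaluations.exemplary > (
    SELECT AVG(staff_evaluations.exemplary)
    FROM staff_evaluations
);
"""

# The query has no ORDER BY, so sort in Python for a stable display order.
rows = sorted(conn.execute(query).fetchall())
for name, spending, exemplary in rows:
    print(name, spending, exemplary)
# District A 18000 90
# District B 17000 88
```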
🔍 Step-by-Step Explanation

1️⃣ Joining the Tables
```
FROM districts
JOIN expenditures
ON districts.id = expenditures.district_id
JOIN staff_evaluations
ON districts.id = staff_evaluations.district_id
```
This combines:
- District name
- Spending data
- Staff evaluation data
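Running only the join (no WHERE yet) shows what "combines" means in practice: one row per district that has a match in both tables. This minimal sketch uses Python's built-in `sqlite3` module with a hypothetical schema and made-up values.

```python
import sqlite3

# Hypothetical toy data; only the columns used in this post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE districts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE expenditures (district_id INTEGER, per_pupil_expenditure INTEGER);
    CREATE TABLE staff_evaluations (district_id INTEGER, exemplary INTEGER);
    INSERT INTO districts VALUES
        (1, 'District A'), (2, 'District B'), (3, 'District C'), (4, 'District D');
    INSERT INTO expenditures VALUES (1, 18000), (2, 17000), (3, 16000), (4, 9000);
    INSERT INTO staff_evaluations VALUES (1, 90), (2, 88), (3, 76), (4, 86);
""")

# Join only: no filtering yet, so every matched district is kept.
joined = sorted(conn.execute("""
    SELECT districts.name,
           expenditures.per_pupil_expenditure,
           staff_evaluations.exemplary
    FROM districts
    JOIN expenditures
    ON districts.id = expenditures.district_id
    JOIN staff_evaluations
    ON districts.id = staff_evaluations.district_id
""").fetchall())

for row in joined:
    print(row)
# ('District A', 18000, 90)
# ('District B', 17000, 88)
# ('District C', 16000, 76)
# ('District D', 9000, 86)
```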
2️⃣ Subquery for Average Spending
```
SELECT AVG(expenditures.per_pupil_expenditure)
FROM expenditures
```
👉 Returns a single number (e.g., 15000)

3️⃣ Subquery for Average Evaluation
```
SELECT AVG(staff_evaluations.exemplary)
FROM staff_evaluations
```
👉 Returns another number (e.g., 85)
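You can check that each subquery really yields exactly one value by selecting both side by side. This minimal sketch uses Python's built-in `sqlite3` module and made-up toy values (hypothetical, not the real dataset), picked so the averages match the example numbers above.

```python
import sqlite3

# Hypothetical toy data: averages work out to 15000 and 85.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE expenditures (district_id INTEGER, per_pupil_expenditure INTEGER);
    CREATE TABLE staff_evaluations (district_id INTEGER, exemplary INTEGER);
    INSERT INTO expenditures VALUES (1, 18000), (2, 17000), (3, 16000), (4, 9000);
    INSERT INTO staff_evaluations VALUES (1, 90), (2, 88), (3, 76), (4, 86);
""")

# Each parenthesized subquery collapses its whole table into one value.
averages = conn.execute("""
    SELECT
        (SELECT AVG(expenditures.per_pupil_expenditure) FROM expenditures),
        (SELECT AVG(staff_evaluations.exemplary) FROM staff_evaluations)
""").fetchall()

print(averages)  # [(15000.0, 85.0)]
```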

4️⃣ Filtering the Results
```
WHERE value > (average)
```
Now SQL compares:
- Each row’s value
- Against a single computed average
✔ This is valid and works correctly
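To watch the per-row comparison happen, you can move the two conditions into the SELECT list, where SQLite reports each one as 1 (true) or 0 (false). This is a minimal sketch using Python's built-in `sqlite3` module with made-up toy data; only rows where both flags are 1 would survive the WHERE clause of the correct solution.

```python
import sqlite3

# Hypothetical toy data; averages work out to 15000 and 85.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE districts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE expenditures (district_id INTEGER, per_pupil_expenditure INTEGER);
    CREATE TABLE staff_evaluations (district_id INTEGER, exemplary INTEGER);
    INSERT INTO districts VALUES
        (1, 'District A'), (2, 'District B'), (3, 'District C'), (4, 'District D');
    INSERT INTO expenditures VALUES (1, 18000), (2, 17000), (3, 16000), (4, 9000);
    INSERT INTO staff_evaluations VALUES (1, 90), (2, 88), (3, 76), (4, 86);
""")

# Each row's value is compared against the same single average.
flags = sorted(conn.execute("""
    SELECT
        districts.name,
        expenditures.per_pupil_expenditure > (
            SELECT AVG(per_pupil_expenditure) FROM expenditures) AS above_spending,
        staff_evaluations.exemplary > (
            SELECT AVG(exemplary) FROM staff_evaluations) AS above_evaluation
    FROM districts
    JOIN expenditures ON districts.id = expenditures.district_id
    JOIN staff_evaluations ON districts.id = staff_evaluations.district_id
""").fetchall())

for name, above_spending, above_evaluation in flags:
    print(name, above_spending, above_evaluation)
# District A 1 1
# District B 1 1
# District C 1 0
# District D 0 1
```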

📊 Example Output

| name | per_pupil_expenditure | exemplary |
|---|---|---|
| District A | 18000 | 90 |
| District B | 17000 | 88 |

These districts meet both conditions:

✔ Above-average spending
✔ Above-average performance

⚡ Important Rule

| ❌ Incorrect | ✅ Correct |
|---|---|
| `WHERE value > AVG(column)` | `WHERE value > (SELECT AVG(column) FROM table_name)` |

🧩 Mental Model

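Think of the query like this:
1. Compute each average once (the subqueries)
2. Join the tables
3. Filter the joined rows using those averages

The query planner decides the real execution order, but because these subqueries do not depend on the outer row, each one stands for a single fixed value that every row is compared against.
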
🚀 Why This Matters

This pattern is widely used in:
- Business analytics
- Education data analysis
- Economics research
- Data science workflows
- SQL interviews and exams (e.g., CS50 SQL)
✅ Final Takeaway

Whenever you need to compare a row with an overall statistic:
- Use subqueries with aggregate functions
- Ensure proper JOIN syntax
- Be precise with table and column names
Mastering this pattern allows you to answer powerful questions like:

“Which entities perform better than average?”

And that’s a key skill in data analysis.