- This topic is empty.
-
AuthorPosts
-
April 8, 2026 at 1:47 pm #6345
If you’ve ever wondered:
«🤔 “Why does “GROUP BY” work even though “SELECT” comes first?”»
This post will give you a crystal-clear understanding using the Moneyball dataset from Harvard University’s CS50 SQL.
🎯 The Problem We Solved
We wanted to compute:
«💰 Average player salary by year»
Using:
SELECT year, ROUND(AVG(salary), 2) AS “average salary”
FROM salaries
GROUP BY year
ORDER BY year DESC;
⚠️ The Confusing Part
At first glance, SQL looks like it runs top to bottom:
- SELECT
- FROM
- GROUP BY
But that’s not true.
🧠 Actual SQL Execution Order
SQL internally executes in this order:
- FROM → get the data
- WHERE → filter rows
- GROUP BY → create groups
- HAVING → filter groups
- SELECT → compute output
- ORDER BY → sort results
🔍 Let’s See It with Sample Data
📋 Raw Table (“salaries”)
player_id| year| salary
1| 2001| 500000
2| 2001| 1000000
3| 2001| 1500000
1| 2000| 400000
2| 2000| 800000
3| 2000| 1200000
🧩 Step-by-Step Execution
1️⃣ FROM
SQL loads all rows from “salaries”
2️⃣ GROUP BY year
Rows are grouped:
2001 →
- 500000
- 1000000
- 1500000
2000 →
- 400000
- 800000
- 1200000
3️⃣ SELECT + AVG()
Now SQL computes:
- 2001 → 1000000
- 2000 → 800000
4️⃣ ORDER BY
Final result sorted:
year| average salary
2001| 1000000.00
2000| 800000.00
⚡ Key Insight
«💡 “GROUP BY” happens BEFORE “SELECT” is calculated»
That’s why this works:
SELECT year, AVG(salary)
FROM salaries
GROUP BY year;
🚫 Common Mistake
Trying to use aggregation in “WHERE”:
SELECT year, AVG(salary)
FROM salaries
WHERE AVG(salary) > 500000; ❌👉 This fails because:
- “WHERE” runs before grouping
- At that point, “AVG()” doesn’t exist yet
✅ Correct Approach → HAVING
SELECT year, AVG(salary)
FROM salaries
GROUP BY year
HAVING AVG(salary) > 500000;👉 “HAVING” filters after aggregation
🧠 Mental Model (Remember This!)
Think of SQL like a pipeline:
«📥 FROM → 🔍 WHERE → 🧩 GROUP → 🧮 CALCULATE → 📤 SELECT → 📊 SORT»
⚾ Moneyball Insight
In the real-world strategy popularized by Moneyball:
- Teams didn’t just look at players
- They analyzed aggregated performance metrics
- They made decisions based on data patterns
👉 Exactly what you’re doing with “GROUP BY”
🚀 What You Learned Today
- ✔ SQL execution order ≠ written order
- ✔ “GROUP BY” runs before “SELECT”
- ✔ Aggregates require grouping
- ✔ Use “HAVING” instead of “WHERE” for aggregates
🏁 Final Takeaway
Whenever you write SQL, ask yourself:
«🔍 “At what stage does this operation happen?”»
That single question will prevent most SQL mistakes.
🔥 What to Try Next
- 💰 Average salary per team
- ⚾ Salary per home run
- 📊 Best performance under budget
-
AuthorPosts
- You must be logged in to reply to this topic.
