May 23, 2026 at 9:38 am #6630
A beginner working through Kaggle's Model Validation tutorial may naturally expect that:
val_predictions == val_y
because both appear to represent the same thing:
- house prices
- the dependent variable (y)
- the target values
So many learners wonder:
If both contain house prices, then why are there two separate variables?
The Actual Difference
Although both relate to the same target variable, they represent completely different things:
Variable | Meaning
val_y | Actual (correct) prices from the dataset
val_predictions | Prices guessed by the machine learning model
The Initial Kaggle Code
In the Kaggle tutorial, learners typically see code like this:
```python
# get predicted prices on validation data
val_predictions = iowa_model.predict(val_X)
# print the top few validation predictions
print(val_predictions[:5])
# print the top few actual prices from validation data
print(val_y.head())
```
This often creates confusion because both outputs look like house prices.
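The tutorial snippet depends on `iowa_model`, `val_X` and `val_y` defined earlier in the notebook. Below is a self-contained stand-in, assuming scikit-learn, pandas and NumPy are installed. The data is synthetic: the `LotArea`/`YearBuilt` values and the price rule are made up for illustration and are not the real Iowa dataset. It follows the same split, fit and predict steps so both printouts can be compared directly.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (hypothetical values)
rng = np.random.default_rng(0)
n_houses = 200
home_data = pd.DataFrame({
    "LotArea": rng.integers(2_000, 20_000, size=n_houses),
    "YearBuilt": rng.integers(1900, 2010, size=n_houses),
})
# Made-up price rule plus random noise, so prices are not perfectly predictable
home_data["SalePrice"] = (
    10 * home_data["LotArea"]
    + 500 * (home_data["YearBuilt"] - 1900)
    + rng.normal(0, 15_000, size=n_houses)
)

X = home_data[["LotArea", "YearBuilt"]]  # features
y = home_data["SalePrice"]               # target (real prices)

# Split into training and validation data, as in the tutorial
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(train_X, train_y)

# get predicted prices on validation data
val_predictions = iowa_model.predict(val_X)
# print the top few validation predictions (model guesses)
print(val_predictions[:5])
# print the top few actual prices from validation data (real answers)
print(val_y.head())
```

The two printouts cover the same five houses, but only `val_y` comes from the dataset; `val_predictions` is whatever the tree estimated from the features.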
What Actually Happens During Validation
Machine learning models are first trained using:
train_X, train_y
After training, the model is evaluated on validation data:
val_X, val_y
Here:
- val_X contains the house features
- val_y contains the real house prices
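To make the "Here:" point concrete: after the split, `val_y` holds the real prices for exactly the rows in `val_X`, matched by the DataFrame index. A minimal sketch, assuming scikit-learn's `train_test_split` (the function the tutorial uses) and a tiny table whose column names and values are hypothetical:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny hypothetical dataset: two features plus the target column
houses = pd.DataFrame({
    "LotArea": [8000, 9500, 11000, 9000, 14000, 12500, 10000, 10500],
    "Bedrooms": [3, 3, 4, 2, 4, 1, 3, 3],
    "SalePrice": [200000, 180000, 230000, 140000, 250000, 150000, 300000, 210000],
})

X = houses[["LotArea", "Bedrooms"]]  # features (inputs)
y = houses["SalePrice"]              # target (real prices)

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

# Every validation row in val_X has its real price in val_y under the same index label
print(val_X)
print(val_y)
print(val_X.index.equals(val_y.index))  # True: rows line up one-to-one
```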
Now the model is asked:
“Can you predict the prices for these houses?”
That prediction step happens here:
val_predictions = iowa_model.predict(val_X)
Important Beginner Insight
Think of:
- val_y as the answer sheet
- val_predictions as the student's answers
The entire purpose of validation is to compare them.
Example
Suppose the real prices are:
val_y
House | Actual Price
1 | 200000
2 | 150000
3 | 300000
Now suppose the model predicts:
val_predictions
House | Predicted Price
1 | 210000
2 | 140000
3 | 310000
Notice:
- they are close
- but not identical
That difference is the prediction error.
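Using the example numbers above, a short pandas sketch that puts each actual price next to its prediction and computes the per-house error. Defining the error as predicted minus actual is a convention chosen here; the opposite sign works just as well.

```python
import pandas as pd

# The example prices from above (three hypothetical houses)
comparison = pd.DataFrame({
    "actual": [200000, 150000, 300000],     # val_y: the answer sheet
    "predicted": [210000, 140000, 310000],  # val_predictions: the model's answers
}, index=[1, 2, 3])
comparison.index.name = "House"

# Prediction error for each house: predicted minus actual
comparison["error"] = comparison["predicted"] - comparison["actual"]
print(comparison)
```

Positive errors mean the model guessed too high, negative errors mean it guessed too low.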
Why Validation Exists
Without validation, we would never know:
- whether the model predicts accurately
- how close predictions are to reality
- whether the model generalizes well
Validation compares:
predicted values vs actual values
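One way to see why the "generalizes well" check needs separate validation data: a decision tree with no depth limit can reproduce its own training prices almost perfectly, yet still miss on houses it has never seen. A sketch under stated assumptions: synthetic data with a made-up price rule and noise, scikit-learn's `DecisionTreeRegressor` and `mean_absolute_error` (the error measure described in the next section), and the random seeds shown.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: one feature (living area) and noisy prices
rng = np.random.default_rng(42)
area = rng.uniform(500, 3500, size=300).reshape(-1, 1)
price = 100 * area.ravel() + rng.normal(0, 20_000, size=300)

train_X, val_X, train_y, val_y = train_test_split(area, price, random_state=0)

model = DecisionTreeRegressor(random_state=1)  # no depth limit
model.fit(train_X, train_y)

# Error on data the model was trained on vs. data it has never seen
train_mae = mean_absolute_error(train_y, model.predict(train_X))
val_mae = mean_absolute_error(val_y, model.predict(val_X))
print(f"training MAE:   {train_mae:,.0f}")
print(f"validation MAE: {val_mae:,.0f}")
```

Judging the model only on its training data would make it look nearly perfect; the validation score is the honest estimate.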
Machine Learning Is About Approximation
A beginner often unconsciously assumes:
“If the model predicts house prices, then the predictions should automatically equal the real prices.”
But machine learning is actually:
an attempt to estimate unknown outputs as closely as possible.
If predictions were always identical to actual values:
- there would be no prediction problem
- there would be no uncertainty
- machine learning would not even be necessary
How Kaggle Measures Error
One common metric used in the tutorial is MAE (Mean Absolute Error):
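$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|$$
where:
- $y_i$ = actual values (val_y)
- $\hat{y}_i$ = predicted values (val_predictions)
- $n$ = number of houses in the validation set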
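The formula translates almost word for word into NumPy. A minimal sketch using the example prices from earlier in the post; `mean_absolute_error_manual` is a hypothetical helper name chosen here, not part of any library.

```python
import numpy as np

def mean_absolute_error_manual(y_true, y_pred):
    """Hypothetical helper: MAE = (1/n) * sum(|y_i - y_hat_i|)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # absolute difference per house, then the average over all n houses
    return float(np.mean(np.abs(y_true - y_pred)))

# Example prices from earlier in the post
val_y_example = [200000, 150000, 300000]
val_predictions_example = [210000, 140000, 310000]

mae = mean_absolute_error_manual(val_y_example, val_predictions_example)
print(mae)  # 10000.0
```

In the tutorial itself, `sklearn.metrics.mean_absolute_error(val_y, val_predictions)` computes the same quantity; the hand-written version only makes the formula visible.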
Smaller MAE means:
- predictions are closer to reality
- the model performs better
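Because MAE boils all the errors down to one number, it can be used to rank models evaluated on the same validation houses. A sketch with the example prices and a second, made-up set of predictions (`model_b` is hypothetical):

```python
import numpy as np

actual = np.array([200000, 150000, 300000])

# Two hypothetical sets of predictions for the same three houses
model_a = np.array([210000, 140000, 310000])
model_b = np.array([205000, 148000, 302000])

mae_a = np.mean(np.abs(actual - model_a))
mae_b = np.mean(np.abs(actual - model_b))
print(mae_a, mae_b)  # 10000.0 3000.0
```

Here `model_b` has the smaller MAE, so its predictions are closer to reality on these houses.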
Another Important Observation
Notice the syntax difference:
```python
print(val_predictions[:5])
print(val_y.head())
```
Why not use .head() for both? Because the two variables have different data types:
Variable | Data Type
val_predictions | NumPy array (returned by model.predict())
val_y | Pandas Series
So:
- NumPy arrays commonly use slicing: val_predictions[:5]
- Pandas objects commonly use: val_y.head()
A NumPy array has no .head() method, so val_predictions.head() would raise an AttributeError.
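A quick way to confirm the types yourself, sketched with small stand-in values (the index labels and prices are made up): check `type(...)`, see what happens when `.head()` is called on the array, and wrap the array in a Series when pandas methods are wanted.

```python
import numpy as np
import pandas as pd

# Stand-ins with the same types as in the tutorial (values are hypothetical)
val_y = pd.Series([200000, 150000, 300000], index=[17, 4, 42], name="SalePrice")
val_predictions = np.array([210000.0, 140000.0, 310000.0])

print(type(val_y))            # a pandas Series
print(type(val_predictions))  # a NumPy ndarray

# ndarray has no .head() method, so this raises AttributeError
try:
    val_predictions.head()
except AttributeError as err:
    print("AttributeError:", err)

# Wrapping the predictions in a Series (reusing val_y's index) enables .head()
# and lines each prediction up with its actual price
pred_series = pd.Series(val_predictions, index=val_y.index, name="Predicted")
print(pred_series.head())
```

Reusing `val_y.index` matters: it keeps each prediction attached to the same house label as its real price.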
Core Takeaway
The important relationship is:
- val_predictions = model guesses
- val_y = real answers
Machine learning validation exists to compare those two things.
The closer they are:
- the better the model
- the smaller the prediction error
- the more reliable the model is likely to be on new houses
That comparison is one of the central ideas behind model validation in machine learning.
