Why Does the CS50 Degrees Project Use "movies": set()? Understanding Empty Data Structures in Python
May 22, 2026 at 1:50 am #6621
A beginner working through the CS50 Degrees project may quickly understand lines like these, because the values clearly come from the CSV file:
"name": row["name"], "birth": row["birth"]
However, this line often feels confusing at first:
"movies": set()
The reason is that, unlike name and birth, the movies field does not exist in people.csv at all.
Understanding why this empty set is created helps learners understand how programs prepare data structures for future relationships.
Q1. What is the original code?
people[row["id"]] = {
    "name": row["name"],
    "birth": row["birth"],
    "movies": set()
}
This code appears inside the load_data() function in the CS50 Degrees project.
Q2. What does the CSV file contain?
Suppose people.csv contains:
id,name,birth
102,Kevin Bacon,1958
The CSV file contains only these columns:
- id
- name
- birth
Notice that there is no movies column.
Q3. What does one CSV row become in Python?
When Python reads the CSV file using csv.DictReader(), the row becomes:
row = {"id": "102", "name": "Kevin Bacon", "birth": "1958"}
This is why these lines feel intuitive:
row["name"]
row["birth"]
They work because those keys already exist in the CSV row.
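To see this parsing step concretely, the sketch below feeds a one-row stand-in for people.csv to csv.DictReader through io.StringIO, so it runs without the real file. The sample text is an assumption chosen to match the example above. Note that every value comes back as a string, including the birth year.

```python
import csv
import io

# Hypothetical in-memory stand-in for people.csv (same row as above).
PEOPLE_CSV = "id,name,birth\n102,Kevin Bacon,1958\n"

reader = csv.DictReader(io.StringIO(PEOPLE_CSV))
row = next(reader)  # first data row; the header line supplies the keys

print(row["name"])         # Kevin Bacon
print(row["birth"])        # 1958
print(type(row["birth"]))  # <class 'str'>  (csv never converts types)
```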
Q4. Why does row["movies"] not exist?
The learner may expect to write something like:
row["movies"]
However, this key does not exist because the CSV file never contained a movies column, so trying to read it raises a KeyError.
The CSV row only includes:
- id
- name
- birth
So Python cannot retrieve movie information directly from this file.
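A quick way to confirm this, assuming the same hypothetical one-row stand-in for people.csv, is to test for the key, try to read it, and compare with dict.get(), which returns None instead of raising:

```python
import csv
import io

# Hypothetical in-memory stand-in for people.csv.
PEOPLE_CSV = "id,name,birth\n102,Kevin Bacon,1958\n"
row = next(csv.DictReader(io.StringIO(PEOPLE_CSV)))

print("movies" in row)  # False: the header has no movies column

try:
    row["movies"]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'movies'

# .get() returns a default (None here) instead of raising.
print(row.get("movies"))  # None
```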
Q5. Why does the program create "movies": set()?
The code anticipates that each actor will later be connected to multiple movies.
So even though the movie information is not available yet, the program prepares an empty container in advance.
This line:
"movies": set()
essentially means:
“Create an empty collection where this actor’s movie IDs will be stored later.”
Q6. What does the actor dictionary look like initially?
After processing the CSV row, Python creates:
people["102"] = {
    "name": "Kevin Bacon",
    "birth": "1958",
    "movies": set()
}
At this stage:
- the actor’s name is known
- the birth year is known
- the movie collection is still empty
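Putting the pieces together, a minimal sketch of just the people part of load_data() might look like this. It is not the project's full function; the in-memory CSV text is an assumption so the example runs on its own.

```python
import csv
import io

# Hypothetical in-memory stand-in for people.csv.
PEOPLE_CSV = "id,name,birth\n102,Kevin Bacon,1958\n"

people = {}
for row in csv.DictReader(io.StringIO(PEOPLE_CSV)):
    people[row["id"]] = {
        "name": row["name"],    # copied from the CSV row
        "birth": row["birth"],  # copied from the CSV row
        "movies": set(),        # not in the CSV: an empty set to fill later
    }

print(people["102"])
# {'name': 'Kevin Bacon', 'birth': '1958', 'movies': set()}
```

The explicit set() call matters: {} creates an empty dict, not an empty set, so writing "movies": {} would give each actor a dictionary that has no .add() method.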
Q7. Why create empty containers before data exists?
This is a very common programming pattern.
Programs often:
- create structures first
- fill them gradually later
Real-world analogy:
student = {"name": "Alice", "subjects": []}
Even if the subjects are not known yet, the program prepares an empty list in advance.
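The analogy can be run directly. In this sketch the subject names and the second student, Bob, are hypothetical; the second record shows what goes wrong when the container was never created:

```python
# Hypothetical student record, following the analogy above.
student = {"name": "Alice", "subjects": []}

# The empty list already exists, so it can be filled as data arrives.
student["subjects"].append("Math")
student["subjects"].append("Physics")
print(student["subjects"])  # ['Math', 'Physics']

# Without the pre-created container, the same call fails.
incomplete = {"name": "Bob"}
try:
    incomplete["subjects"].append("Math")
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'subjects'
```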
Q8. Where are movie IDs added later?
Later in the CS50 Degrees project, another CSV file called stars.csv connects actors to movies.
Then this code runs:
people[row["person_id"]]["movies"].add(row["movie_id"])
Suppose the current stars.csv row contains:
row["person_id"] = "102"
row["movie_id"] = "104257"
Then Python executes:
people["102"]["movies"].add("104257")
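Here is a sketch of both passes together, assuming a stars.csv with person_id and movie_id columns (the keys the line above reads). The sample rows are hypothetical and held in memory:

```python
import csv
import io

# Hypothetical in-memory stand-ins for the two CSV files.
PEOPLE_CSV = "id,name,birth\n102,Kevin Bacon,1958\n"
STARS_CSV = "person_id,movie_id\n102,104257\n"

# First pass: create each person with an empty movies set.
people = {}
for row in csv.DictReader(io.StringIO(PEOPLE_CSV)):
    people[row["id"]] = {"name": row["name"], "birth": row["birth"], "movies": set()}

# Second pass: each stars row links one person to one movie.
for row in csv.DictReader(io.StringIO(STARS_CSV)):
    # .add() works only because the empty set was created in the first pass.
    people[row["person_id"]]["movies"].add(row["movie_id"])

print(people["102"]["movies"])  # {'104257'}
```

If a stars row mentioned a person_id that never appeared in people.csv, people[...] would raise a KeyError, so a real loader has to decide how to handle such rows.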
Q9. What does the structure look like after movie IDs are added?
Initially:
"movies": set()
Later:
people["102"] = {
    "name": "Kevin Bacon",
    "birth": "1958",
    "movies": {"104257"}
}
Eventually an actor may have many movie IDs:
"movies": {"104257", "112233", "998877"}
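The sketch below grows one actor's set using the three sample IDs from the post. Because sets are unordered, it sorts them whenever a predictable display order is needed:

```python
actor = {"name": "Kevin Bacon", "birth": "1958", "movies": set()}

# Movie IDs arrive one stars.csv row at a time (sample IDs from the post).
for movie_id in ["104257", "112233", "998877"]:
    actor["movies"].add(movie_id)

print(len(actor["movies"]))         # 3
print("112233" in actor["movies"])  # True
# Set iteration order is not guaranteed, so sort for a stable display.
print(sorted(actor["movies"]))      # ['104257', '112233', '998877']
```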
Q10. Why use set() instead of a list?
A beginner may wonder why the code uses:
set()
instead of:
[]
The reason is that sets automatically prevent duplicates.
Example:
movies = set()
movies.add("104257")
movies.add("104257")
print(movies)
Output:
{'104257'}
Even though the same movie ID was added twice, the set stores only one copy.
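For contrast, this sketch feeds the same duplicated ID into both a list and a set, simulating a link that appears twice; only the set collapses it:

```python
movie_list = []
movie_set = set()

# Simulate the same actor-movie link showing up twice.
for movie_id in ["104257", "104257"]:
    movie_list.append(movie_id)
    movie_set.add(movie_id)

print(movie_list)  # ['104257', '104257']  (the list keeps the duplicate)
print(movie_set)   # {'104257'}  (the set stores one copy)
```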
Q11. Why are sets useful here?
Sets help:
- avoid duplicate movie entries
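- make membership checks such as "104257" in movies fast (average constant time, instead of scanning a whole list)
- record each actor-to-movie connection only once, which keeps the graph free of repeated edges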
This becomes important later when the program performs graph traversal using Breadth-First Search.
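To show where the sets pay off, here is a minimal breadth-first search sketch. It assumes a second dictionary, movies, mapping each movie ID to a "stars" set of person IDs; that structure, the function names neighbors_for_person and shortest_path, and the sample IDs 200 and 300 are illustrative assumptions, not code taken from the project:

```python
from collections import deque

# Hypothetical sample graph: people -> sets of movie IDs,
# movies -> sets of person IDs (the reverse direction).
people = {
    "102": {"movies": {"104257"}},
    "200": {"movies": {"104257", "112233"}},
    "300": {"movies": {"112233"}},
}
movies = {
    "104257": {"stars": {"102", "200"}},
    "112233": {"stars": {"200", "300"}},
}


def neighbors_for_person(person_id):
    """Return (movie_id, person_id) pairs for every co-star, including person_id itself."""
    pairs = set()
    for movie_id in people[person_id]["movies"]:
        for star_id in movies[movie_id]["stars"]:
            pairs.add((movie_id, star_id))
    return pairs


def shortest_path(source, target):
    """Breadth-first search; returns a list of (movie_id, person_id) steps, or None."""
    if source == target:
        return []
    frontier = deque([source])
    came_from = {source: None}  # person_id -> (previous person_id, movie_id)
    while frontier:
        person = frontier.popleft()
        for movie_id, star_id in neighbors_for_person(person):
            if star_id in came_from:  # fast membership check, already visited
                continue
            came_from[star_id] = (person, movie_id)
            if star_id == target:
                # Walk back from the target to the source to rebuild the path.
                path = []
                current = target
                while came_from[current] is not None:
                    previous, movie = came_from[current]
                    path.append((movie, current))
                    current = previous
                return list(reversed(path))
            frontier.append(star_id)
    return None


print(shortest_path("102", "300"))  # [('104257', '200'), ('112233', '300')]
```

Each step of the result is a (movie_id, person_id) pair: here person 102 reaches 300 through 200. Because BFS explores people in order of distance, the first time it reaches the target it has found a shortest path.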
Q12. Why does this line initially feel cryptic?
This line feels different because:
"name": row["name"]
copies existing CSV data directly.
But:
"movies": set()
does not copy CSV data.
Instead, it creates a brand new empty structure for future relationships.
That is a more advanced programming concept.
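The phrase "brand new" is literal: set() runs once per CSV row, so each actor gets an independent set. The sketch below, using hypothetical IDs, contrasts that with a single set shared by everyone, a common bug:

```python
person_ids = ["102", "200"]  # hypothetical IDs

# Correct: set() is evaluated once per person, giving each a fresh set.
people = {}
for pid in person_ids:
    people[pid] = {"movies": set()}
people["102"]["movies"].add("104257")
print(people["200"]["movies"])  # set()  (unaffected)

# Buggy: every person points at the same set object.
shared = set()
buggy = {}
for pid in person_ids:
    buggy[pid] = {"movies": shared}
buggy["102"]["movies"].add("104257")
print(buggy["200"]["movies"])  # {'104257'}  (the change leaked)
print(buggy["102"]["movies"] is buggy["200"]["movies"])  # True
```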
Q13. What is the biggest conceptual takeaway?
Programs often prepare empty containers before all related data becomes available.
This line:
"movies": set()
is essentially saying:
“This actor will eventually be connected to many movies, so prepare an empty collection now.”
This pattern appears frequently in:
- graph algorithms
- database systems
- web applications
- machine learning pipelines
- large software architectures
Understanding this idea helps learners better understand how programs model complex relationships using data structures.
