Skip to main content

Modules and Packages

Organizing and reusing code effectively

A module is one Python file. A package is a directory of modules that Python treats as a single unit. Together they let you split a 600-line script across files, share helpers between assignments, and install third-party code like requests or pandas without copying source. Every multi-file assignment at CS50P, DATA 100, and CSE 163 expects you to know this layout, the import rules around it, and the virtual-environment workflow that keeps versions clean.

Writing my_module.py and importing it in 3 steps

A module is created the moment you save a `.py` file. Put a function or class in `my_module.py`, place it next to the file that needs it, and import. Three import forms cover almost every case: `import my_module` (whole module under one name), `from my_module import greet` (one name in your namespace), and `from my_module import greet as g` (renamed).

The import system looks up names in **sys.path**, the list of directories Python searches. The current working directory is on `sys.path` by default, which is why two files in the same folder import each other. When Python imports a module, it runs the file top to bottom once, then caches the result in `sys.modules` so a second import is free.

Star imports like `from my_module import *` pull every public name into the current namespace. Avoid them in coursework. They make the source of any function ambiguous, hide name collisions, and trigger lint warnings on most graders.

Example

                      
                        # File: my_module.py
def greet(name):
    return f"Hello, {name}!"

PI = 3.14159


# File: main.py
import my_module
from my_module import PI

print(my_module.greet("Alice"))  # Hello, Alice!
print(PI)                        # 3.14159

# Renamed import for clarity in long names
import my_module as mm
print(mm.greet("Bob"))           # Hello, Bob!
                      
                    

Packages and the role of __init__.py

A package is a directory containing an `__init__.py` file. The presence of that file tells Python "treat this folder as importable". The file can be empty, can re-export selected names, or can run package-wide setup. Submodules inside the package are reached with dot notation: `my_package.module1`.

The classic layout below scales to assignments with 6 or 8 files. Put related helpers in one package, expose the public API through `__init__.py` using `from .module1 import useful_thing`, and import them as `from my_package import useful_thing`. The dot before `module1` is a relative import, which works only inside a package.

Since Python 3.3, **namespace packages** allow a directory without `__init__.py` to act as a package. They are useful for plugin systems where multiple installations contribute submodules to one logical name. For coursework, keep `__init__.py` present even if empty. Graders that pre-Python-3.3 scripts run against expect it.

Example

                      
                        # Layout
# my_package/
#     __init__.py
#     stats.py
#     plotting.py
#
# File: my_package/__init__.py
from .stats import mean, median

# File: my_package/stats.py
def mean(values):
    return sum(values) / len(values)

def median(values):
    s = sorted(values)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

# File: main.py
from my_package import mean, median
print(mean([1, 2, 3, 4]))    # 2.5
print(median([1, 2, 3, 4]))  # 2.5
                      
                    

Why if __name__ == "__main__" matters

Every module has a built-in attribute called **__name__**. When the file runs as a script (`python my_file.py`), `__name__` equals the string `"__main__"`. When the same file is imported by another script, `__name__` equals the module name (`"my_file"`). The guard `if __name__ == "__main__":` lets you put both library code and a test driver in one file.

Without the guard, importing the file runs its test code as a side effect, which prints output to stderr and slows down imports. With the guard, the heavy block runs only on direct execution. Use it in every script that defines functions you want importable later. Course graders that import student code as a module rely on this pattern.

Example

                      
                        # File: calc.py
def add(a, b):
    return a + b

def multiply(a, b):
    return a * b


def _demo():
    # only runs when executed directly
    print(add(2, 3))       # 5
    print(multiply(4, 5))  # 20


if __name__ == "__main__":
    _demo()

# python calc.py        -> runs _demo()
# import calc in REPL   -> defines add and multiply only
                      
                    

Virtual environments with python -m venv

A virtual environment is an isolated Python installation tied to one project. It prevents the requests version one assignment needs from breaking the one a different assignment expects. The standard library ships **venv**, so no extra install is required.

Create the environment with `python -m venv .venv`. Activate it with `source .venv/bin/activate` on macOS and Linux or `.venv\Scripts\activate` on Windows. The shell prompt gains a `(.venv)` prefix, and `which python` now points inside the project. Deactivate with `deactivate`.

Inside the activated environment, `pip install pandas` writes to the project's `.venv/` folder, never to the system Python. Capture the installed set with `pip freeze > requirements.txt`. A teammate or grader reproduces the environment with `pip install -r requirements.txt`. Most CS50P and DATA 100 graders accept either a `requirements.txt` or the modern `pyproject.toml` for dependency declaration.

Example

                      
                        # Terminal, not Python. Run from the project root.
#
# 1. Create the environment in a hidden folder
# python -m venv .venv
#
# 2. Activate it
# source .venv/bin/activate          (macOS / Linux)
# .venv\Scripts\activate              (Windows)
#
# 3. Install what the assignment needs
# pip install requests pandas
#
# 4. Freeze the exact versions
# pip freeze > requirements.txt
#
# 5. Anyone else reproduces it with
# pip install -r requirements.txt

print("See comments. venv lives at .venv/, activated per shell.")
                      
                    

Relative versus absolute imports inside a package

Absolute imports name the full path from the project root: `from my_package.stats import mean`. Relative imports use dots to mean "from this package" (`.`) or "from the parent package" (`..`). Relative imports work only inside a package, never in a top-level script.

The Python style guide (PEP 8) recommends absolute imports for clarity. Use relative imports inside a package when the absolute path is long or when the package may be renamed. Mixing both in one file is fine if you stay consistent within an import block.

One gotcha: running a submodule directly with `python my_package/stats.py` breaks relative imports because the file runs as a top-level script. Use `python -m my_package.stats` instead, which executes the module while keeping the package context. This is the same flag pattern used by `python -m venv` and `python -m pytest`.

Example

                      
                        # Inside package my_package/

# File: my_package/stats.py
def mean(values):
    return sum(values) / len(values)

# File: my_package/report.py
# Absolute import (preferred)
from my_package.stats import mean

# Relative import (also valid inside the package)
# from .stats import mean

def summary(values):
    return f"Mean: {mean(values)}"


# Run from the project root with:
# python -m my_package.report   (works, keeps package context)
# NOT: python my_package/report.py  (breaks relative imports)
                      
                    

Controlling exports with __all__ and pyproject.toml

The list **__all__** at the top of a module declares which names are exported when someone runs `from my_module import *`. It is also the hint a documentation tool uses to decide what counts as public API. Without `__all__`, the star import grabs every name that does not start with an underscore. With it, you get an exact whitelist.

For distributable packages, **pyproject.toml** is the modern standard. It replaces `setup.py` and combines metadata, dependencies, and build configuration in one TOML file. A minimal `pyproject.toml` declares the package name, version, and a build backend like setuptools or hatchling. Coursework rarely requires you to publish to PyPI, but understanding `pyproject.toml` matters for any open-source contribution and for capstone projects at CSE 163 and CS229.

Pin versions in dependency files. `pandas==2.2.1` reproduces exactly. `pandas>=2.0` accepts any newer release, which can break compatibility silently. Course graders that auto-test against Pandas 1.5 fail on 2.x because of the deprecation of `DataFrame.append`. Always check the assignment brief for the expected version.

Example

                      
                        # File: my_module.py

__all__ = ["mean", "median"]  # only these go in star imports

def mean(values):
    return sum(values) / len(values)

def median(values):
    s = sorted(values)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

def _internal_helper(x):
    # leading underscore: not in star imports anyway
    return x * 2


# In another file:
# from my_module import *
# mean and median are imported; _internal_helper is not.

print(mean([1, 2, 3]))    # 2.0
print(median([1, 2, 3]))  # 2
                      
                    

Common pitfalls

ModuleNotFoundError when running a script that imports a sibling file. The directory is not on sys.path because you ran python from somewhere else.

Run python from the project root, or add the project to PYTHONPATH. For packages, use python -m my_package.module instead of python my_package/module.py.

Circular import error: module A imports B at the top, B imports A at the top, and Python halts mid-load. Both files are half-defined when the second import fires.

Move one import inside the function that uses it (lazy import), or extract the shared code into a third module that both can import without cycles.

Installing packages globally with sudo pip install. Pollutes the system Python and breaks other projects on the same machine.

Create a venv per project with python -m venv .venv, activate it, then pip install. The system Python stays clean.

Committing .venv/ to git. Inflates the repo to hundreds of MB of binary wheels that vary by OS and Python version.

Add .venv/ to .gitignore and commit requirements.txt or pyproject.toml instead. The next contributor recreates the environment from the lockfile.

Top-level print statements that fire on every import because the script lacks an if __name__ == "__main__" guard.

Wrap script-only code under the guard. Imported modules then expose functions silently, and the test driver runs only on direct execution.

Relative import ImportError when you run a submodule directly. Python loses the package context because the file becomes the top-level script.

Run python -m my_package.submodule from the project root. The -m flag preserves the package, so relative imports resolve correctly.

When to use modules and packages

Split into modules once a single file passes 200 lines or holds 3 distinct responsibilities. Use a package the moment two modules want to share a private helper, and add a virtual environment for any assignment that installs even one third-party library like requests, pandas, or numpy.

Need Help?

Having trouble with this topic on an assignment? Our Python developers ship working code plus a walkthrough that helps you explain the code in class.