
Python Autograder Survival Guide

You wrote the code. You tested it on your laptop. It worked. You submitted to Gradescope. Six out of eight tests failed and the error messages are unhelpful. This is the most common Python homework problem of 2026, and most of the time it has nothing to do with you not understanding Python.

The autograder is testing edge cases your assignment brief never mentioned: empty inputs, unicode characters, weird output format rules, environment differences between your machine and the grading server. This guide walks through the 15 specific edge cases that catch the most student submissions, with a code example for each. Read it before you submit, run the pre-submission checklist at the end, and you avoid most of the failures that students bring to us every week.

Python 3.10+ · 15 edge cases · Pre-submission checklist

Why your code passes locally and fails the autograder

Four reasons cover almost every failure.

Hidden tests

The autograder runs test cases you do not see. The brief shows you one example input. The hidden tests check empty inputs, single-element inputs, unicode strings, inputs at the maximum size, and inputs designed to break common student approaches. Gradescope, GitHub Classroom, CodePost, AutoGrader, and HackerRank all work this way.

Output diff strictness

The autograder compares your output to the expected output byte by byte. A trailing newline, an extra space, different capitalization, or a comma where the expected output has none will fail the test even when the answer is correct.

Environment differences

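Say your laptop runs Python 3.11 on macOS. The autograder might run Python 3.9 on Ubuntu in a Docker container with a 768MB memory limit and a 10-minute timeout. Library versions differ. File names are case-sensitive on Linux. Standard input and output are redirected to pipes, so input() reads whatever the autograder feeds it instead of your keyboard, and everything written to standard output, input() prompts included, becomes part of the output being graded.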

Time and memory limits

Your laptop has 16GB of RAM and all the time you give it. The autograder times out at 10 minutes by default and kills any process that exceeds 768MB. An O(n^2) solution that finishes in 0.3 seconds on a 1,000-element test case takes about 50 minutes on a 100,000-element hidden test case (100 times the input means 10,000 times the work), and the autograder kills it at the 10-minute mark.
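You can get a rough read on this before you submit by timing your function at two input sizes. A minimal sketch, assuming your function takes a single list; count_duplicates_quadratic is a hypothetical stand-in for your own code. Doubling n roughly doubles the time of linear code and roughly quadruples the time of quadratic code.

```python
import random
import time


def growth_ratio(func, n, repeats=3):
    """Return how many times slower func is on 2n items than on n items.

    A ratio near 2 suggests linear time; a ratio near 4 suggests quadratic time.
    """
    def best_time(size):
        data = [random.random() for _ in range(size)]
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(data)
            best = min(best, time.perf_counter() - start)
        return best

    return best_time(2 * n) / best_time(n)


def count_duplicates_quadratic(values):
    # Hypothetical O(n^2) example: compares every pair of elements
    count = 0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                count += 1
    return count


if __name__ == "__main__":
    print(f"ratio: {growth_ratio(count_duplicates_quadratic, 500):.1f}")
```

Laptop timings are noisy, so treat the ratio as a hint rather than a measurement. If it is close to 4, estimate how long the largest input in the brief would take before you upload.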

Every one of the 15 edge cases below maps to one of these four reasons.


Group A. Input edge cases (1 to 4)

These are the inputs the autograder probably tests that your brief never mentioned.

1. Empty input

Your function works on a list of 5 numbers. The autograder also passes it an empty list. Your code crashes or returns something weird.

The broken version

def average(numbers):
    total = sum(numbers)
    return total / len(numbers)

Pass [] to this function and you get ZeroDivisionError: division by zero. The autograder shows a red X and the message is unhelpful.

The fix

def average(numbers):
    if not numbers:
        return 0
    total = sum(numbers)
    return total / len(numbers)

Depending on the rubric, raise a clear exception instead of returning 0. Both are defensible. Returning a silent 0 is fine for stats homework. Raising ValueError is better for production-style code.
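If the rubric calls for an exception, a minimal version of the same function looks like this. The error message text is an assumption; use whatever wording the brief specifies, if any.

```python
def average(numbers):
    """Return the mean of numbers, raising ValueError on empty input."""
    if not numbers:
        # A clear, descriptive error instead of a bare ZeroDivisionError
        raise ValueError("average() requires at least one number")
    return sum(numbers) / len(numbers)
```

With this version, average([2, 4]) returns 3.0 and average([]) raises ValueError with a message that tells the grader exactly what went wrong.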

Why students miss this. Empty input feels like a degenerate case nobody puts in a real test. Autograders almost always test it.

2. None values inside the input

The autograder passes a list like [3, None, 7, None, 12] to your function that expects numbers.

The broken version

def total_score(scores):
    return sum(scores)

TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

The fix

def total_score(scores):
    return sum(s for s in scores if s is not None)

If the rubric says None values are missing data and excluded from the sum, this is the fix. If they count as zero, replace with sum(s if s is not None else 0 for s in scores).

Why students miss this. The brief shows clean example data. Real datasets and autograder hidden tests have None values everywhere.

3. Single-element input

Your code uses a loop that compares each element to the next one. The autograder passes a list with one element.

The broken version

def is_sorted(numbers):
    for i in range(len(numbers)):
        if numbers[i] > numbers[i + 1]:
            return False
    return True

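Pass [5] and you get IndexError: list index out of range, because the last iteration reads numbers[i + 1], one position past the end. (Every non-empty sorted list hits the same error, since none of them trigger the early return False.) The off-by-one is obvious in hindsight. The autograder catches it the first time you submit.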

The fix

def is_sorted(numbers):
    for i in range(len(numbers) - 1):
        if numbers[i] > numbers[i + 1]:
            return False
    return True

Why students miss this. A one-element list looks too trivial to bother testing. The autograder treats it as a real test case.
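Another way to write the check avoids index arithmetic entirely by pairing each element with the next one using zip(). An empty or one-element list produces no pairs, so all() returns True without a special case. A sketch, with the edge cases from this guide as quick asserts:

```python
def is_sorted(numbers):
    """Return True if numbers is in non-decreasing order."""
    # zip(numbers, numbers[1:]) yields (n0, n1), (n1, n2), ...
    # An empty or one-element list yields no pairs, so all() returns True.
    return all(a <= b for a, b in zip(numbers, numbers[1:]))


# Edge cases worth checking before you submit
assert is_sorted([]) is True
assert is_sorted([5]) is True
assert is_sorted([1, 2, 2, 3]) is True
assert is_sorted([3, 1]) is False
```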

4. Unicode and special characters in strings

Your string manipulation works on plain ASCII text. The autograder passes "naïve résumé" or an emoji or a Chinese character.

The broken version

def first_letter(word):
    return word[0].upper()

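Looks fine on "hello". It breaks in ways that are easy to miss: an empty string raises IndexError, "ß".upper() returns the two-character "SS", and if "é" arrives decomposed as a plain "e" followed by a combining accent, word[0] is just the "e" without its accent.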

The fix

import unicodedata

def first_letter(word):
    normalized = unicodedata.normalize('NFC', word)
    if not normalized:
        return ''
    return normalized[0].upper()

For most coursework, use str.casefold() instead of .upper() for case-insensitive comparison, and normalize unicode with unicodedata.normalize('NFC', text) before manipulating strings.

Why students miss this. Course examples are usually English ASCII. Real-world test cases include international characters.
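A short demonstration of the behaviors described above, using only the standard library:

```python
import unicodedata

# "é" written two ways: one precomposed code point, or "e" plus a combining accent
composed = "\u00e9"      # é
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT

print(composed == decomposed)                                 # False: different code points
print(len(composed), len(decomposed))                         # 1 2
print(unicodedata.normalize("NFC", decomposed) == composed)   # True after normalizing

# Upper-casing can change the length of a string
print("ß".upper())                                            # SS

# casefold() handles case-insensitive comparison where lower() does not
print("Straße".lower() == "STRASSE".lower())                  # False
print("Straße".casefold() == "STRASSE".casefold())            # True
```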


Group B. Numerical edge cases (5 to 8)

These break code that does math.

5. Off-by-one indexing

The most famous bug in programming. Your loop runs one too many or one too few times.

The broken version

def last_n_items(items, n):
    result = []
    for i in range(len(items) - n, len(items) + 1):
        result.append(items[i])
    return result

len(items) + 1 is one past the end. The loop hits IndexError on the last iteration.

The fix

def last_n_items(items, n):
    if n >= len(items):
        return items[:]
    return items[len(items) - n:]

Use slicing where possible. Slicing handles the boundaries automatically.

Why students miss this. Python is zero-indexed. Math homework is one-indexed. The brief says "return the last 3 elements" and the student starts the slice at index 3 instead of len - 3.
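One slicing shortcut deserves a warning. items[-n:] looks like the natural way to take the last n items, but when n is 0 it becomes items[0:] and returns the whole list, because -0 is just 0. Writing the start as len(items) - n, as the fix does, avoids that:

```python
items = [10, 20, 30, 40, 50]

print(items[-3:])              # [30, 40, 50]
print(items[-0:])              # [10, 20, 30, 40, 50]  -0 is just 0
print(items[len(items) - 0:])  # []  what "the last 0 items" should be
print(items[-10:])             # [10, 20, 30, 40, 50]  slices clamp, no IndexError
```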

6. Integer vs float type mismatch

The autograder expects an integer. Your code returns a float.

The broken version

def count_items(data):
    return len(data) / 1

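len(data) / 1 returns a float in Python 3. The autograder expects 5 and your code returns 5.0. If the result is printed, the output reads 5.0 instead of 5 and the byte-by-byte comparison fails; a test that checks the type with assertIsInstance fails too. (A plain assertEqual(result, 5) would pass, because 5.0 == 5 is True in Python.)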

The fix

def count_items(data):
    return len(data)

If division is required, use integer division // instead of true division /.

Why students miss this. In Python 3, the / operator always returns a float, even when both operands are integers.
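Floor division has two behaviors worth knowing before you swap it in: it rounds toward negative infinity rather than toward zero, and it still returns a float if either operand is a float.

```python
print(10 / 2)       # 5.0  true division returns a float even for ints
print(10 // 2)      # 5    floor division keeps ints as ints
print(7 // 2)       # 3
print(-7 // 2)      # -4   floors toward negative infinity
print(int(-7 / 2))  # -3   int() truncates toward zero instead
print(7.0 // 2)     # 3.0  a float operand still gives a float
```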

7. Division by zero

Your function divides two numbers the autograder controls. The autograder passes zero.

The broken version

def percentage(part, whole):
    return (part / whole) * 100

whole = 0 crashes with ZeroDivisionError.

The fix

def percentage(part, whole):
    if whole == 0:
        return 0
    return (part / whole) * 100

Decide based on the rubric whether the function returns 0, returns None, or raises an exception when the denominator is zero. All three are defensible.

Why students miss this. Division by zero feels like an edge case briefs typically flag in advance. Autograders flag nothing in advance.

8. Floating point precision

Your function computes 0.1 + 0.2 and compares the result to 0.3. The comparison returns False.

The broken version

def equals_three_tenths(value):
    return value == 0.3

Pass 0.1 + 0.2 and you get False because 0.1 + 0.2 is actually 0.30000000000000004 in IEEE 754.

The fix

import math

def equals_three_tenths(value):
    return math.isclose(value, 0.3, rel_tol=1e-9)

Use math.isclose() for any float comparison. Never use == on floats unless you know the values were computed by exact integer arithmetic.

Why students miss this. Algebra says 0.1 + 0.2 = 0.3. Computer science says it does not. The autograder uses computer science.
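One caveat: math.isclose() uses only a relative tolerance by default, and a relative tolerance can never match a nonzero value against 0.0. When comparing against zero, pass abs_tol as well. The 1e-9 below is an assumed tolerance; choose one that fits the scale of your data.

```python
import math

print(math.isclose(0.1 + 0.2, 0.3))                      # True
print(math.isclose(0.1 + 0.2 - 0.3, 0.0))                # False: rel_tol alone cannot match 0.0
print(math.isclose(0.1 + 0.2 - 0.3, 0.0, abs_tol=1e-9))  # True
```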


Group C. Output format edge cases (9 to 11)

These are the ones that frustrate students most because the answer is right and the test still fails.

9. Trailing newline

The expected output ends with a newline. Your output does not. The autograder compares byte by byte and fails the test.

The broken version

print("Hello", end="")

You suppressed the newline because you did not want extra space. The autograder expects the newline.

The fix

print("Hello")

Use print() without overriding the end parameter. Unix convention is for text output to end with a newline, and expected-output files usually follow that convention.

Why students miss this. They see "Hello\n" in the expected output and think the \n is literal text.
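To see exactly what your code prints, capture standard output and look at its repr(), which shows the newline as \n. A minimal sketch using contextlib.redirect_stdout; show_greeting is a hypothetical stand-in for your own output code.

```python
import contextlib
import io


def show_greeting():
    # Hypothetical stand-in for the code whose output you want to inspect
    print("Hello")


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    show_greeting()

output = buffer.getvalue()
print(repr(output))           # 'Hello\n'
print(output.endswith("\n"))  # True
```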

10. Capitalization mismatch

Expected output is "Pass". Your output is "PASS" or "pass". The byte-by-byte diff fails.

The broken version

def grade(score):
    if score >= 60:
        return "PASS"
    return "FAIL"

The rubric says return "Pass" or "Fail". Your code returns "PASS" or "FAIL". The autograder fails the test even though the logic is correct.

The fix

def grade(score):
    if score >= 60:
        return "Pass"
    return "Fail"

Match the rubric exactly. Capitalize what the rubric capitalizes. Lowercase what the rubric lowercases.

Why students miss this. Their brain reads "Pass" and "PASS" as the same word. The autograder compares them as different strings.

11. Decimal precision in output

The expected output is 3.14. Your code produces 3.141592653589793. The autograder fails the test.

The broken version

import math

def circle_area(radius):
    return math.pi * radius ** 2

For radius=1, this returns 3.141592653589793. The rubric expected 3.14.

The fix

import math

def circle_area(radius):
    return round(math.pi * radius ** 2, 2)

Or format the output explicitly:

import math

def circle_area_formatted(radius):
    return f"{math.pi * radius ** 2:.2f}"

The rubric tells you how many decimal places. Match the rubric exactly.

Why students miss this. The rubric mentions decimal places once at the top of the assignment. The student reads past it.
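round() and a format string are not interchangeable. round() returns a float, and Python prints floats without trailing zeros, so round() can never produce "3.10". Rounding also happens on the stored binary value: 2.675 is stored as slightly less than 2.675, so it rounds down. A quick demonstration:

```python
value = 3.1

print(round(value, 2))   # 3.1   the trailing zero is lost
print(f"{value:.2f}")    # 3.10  the format string keeps two decimal places

print(round(2.675, 2))   # 2.67  2.675 is stored as 2.67499999...
print(f"{2.675:.2f}")    # 2.67  same stored value, same result
```

Use round() when the test compares numbers and a format string when it compares printed text.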


Group D. Environment edge cases (12 to 15)

These have nothing to do with your code logic. They have to do with the difference between your laptop and the autograder Docker container.

12. Wrong file name

The autograder expects assignment.py. You submitted assignment_v3_final.py. The autograder cannot find the file and typically reports a ModuleNotFoundError or a missing-file error.

The fix

Name the file exactly what the brief says. Case matters on Linux: Assignment.py and assignment.py are two different files on the autograder server, even though macOS and Windows treat them as the same file by default.

# Wrong, the autograder runs on Linux which is case-sensitive
Assignment.py
my_solution.py
hw1_final_v2_REAL.py

# Right, exactly what the brief asked for
assignment.py

Check the brief before you submit. Many briefs explicitly state the required file name. The instructor has set the autograder to import that exact name.

Why students miss this. They rename files during development. They forget to rename back before submitting.
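A related trap when checking the name locally: on macOS and Windows, which ignore case by default, os.path.exists("Assignment.py") returns True for a file actually named assignment.py. Comparing against the names os.listdir() returns is case-sensitive everywhere. A small sketch; has_exact_filename is a hypothetical helper, and the directory and required name are placeholders for your own.

```python
import os


def has_exact_filename(directory, required_name):
    """Return True only if a file with exactly this name, case included, exists."""
    # os.listdir returns names as stored on disk, so this membership test is
    # case-sensitive even on case-insensitive filesystems.
    return required_name in os.listdir(directory)


if __name__ == "__main__":
    print(has_exact_filename(".", "assignment.py"))
```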

13. EOFError when the autograder pipes empty input

Your code uses input() to read from the user. The autograder pipes empty input. Your input() call raises EOFError.

The broken version

def main():
    name = input("Enter your name: ")
    print(f"Hello, {name}")

main()

This works on your laptop because you type something when prompted. The autograder pipes nothing and input() raises EOFError: EOF when reading a line.

The fix

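def main():
    try:
        name = input()
    except EOFError:
        # The autograder piped no input: fall back to a default
        name = ""
    if name:
        print(f"Hello, {name}")
    else:
        print("Hello, World")

# Run only when executed as a script, not when the autograder imports the file
if __name__ == "__main__":
    main()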

Two principles. First, do not pass a prompt string to input(), as in input("Enter your name: "), unless the brief asks for one. The prompt is written to standard output, becomes part of your output, and breaks the byte-by-byte diff. Second, catch EOFError and fall back to a sensible default.

For assignments where the brief says the function takes one argument, restructure as a function the autograder can call directly:

def greet(name):
    if not name:
        return "Hello, World"
    return f"Hello, {name}"

Why students miss this. They develop the code interactively. The autograder runs it non-interactively.
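You can reproduce the autograder's piped, empty standard input locally by temporarily replacing sys.stdin with an in-memory stream. A sketch: run_with_stdin is a hypothetical helper, and the block repeats the fixed main() from above so it runs on its own.

```python
import contextlib
import io
import sys


def main():
    # Same logic as the fixed version above
    try:
        name = input()
    except EOFError:
        name = ""
    if name:
        print(f"Hello, {name}")
    else:
        print("Hello, World")


def run_with_stdin(func, stdin_text):
    """Call func with stdin_text as standard input and return what it printed."""
    old_stdin = sys.stdin
    sys.stdin = io.StringIO(stdin_text)  # input() reads lines from this stream
    try:
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            func()
        return buffer.getvalue()
    finally:
        sys.stdin = old_stdin  # always restore the real stdin


print(repr(run_with_stdin(main, "")))       # 'Hello, World\n'
print(repr(run_with_stdin(main, "Ada\n")))  # 'Hello, Ada\n'
```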

14. Recursion depth exceeded

Your recursive function works on small inputs. The autograder passes a large input and your code hits RecursionError: maximum recursion depth exceeded.

The broken version

def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)

Works for factorial(100). The autograder passes factorial(2000) and Python hits its default recursion limit (1,000).

The fix

def factorial(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

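If the assignment specifically requires recursion (some algorithms courses do), raise the limit at the top of your file with import sys and sys.setrecursionlimit(10000). Use this only when the rubric expects recursion, and only as high as the largest input needs: a limit far beyond what the stack can hold can crash the interpreter outright instead of raising a clean RecursionError.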

Why students miss this. They test on inputs of size 10 or 100. The hidden tests use inputs of size 1,000 or 10,000.
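If recursion is required, the setup looks roughly like this. The 10,000 figure is the example value from above, not a recommended constant; pick the smallest limit that covers the largest input in the brief.

```python
import math
import sys

# Raise the limit once, at import time, before any deep recursion happens
sys.setrecursionlimit(10_000)


def factorial(n):
    """Recursive factorial; uses roughly n stack frames."""
    if n <= 1:
        return 1
    return n * factorial(n - 1)


# 2,000 levels exceed the default limit of 1,000 but fit under 10,000
print(factorial(2000) == math.factorial(2000))  # True
```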

15. Memory limit exceeded

Your code builds a list of all intermediate results. The autograder passes a large input and your process gets killed with no error message, just "Killed" or a timeout.

The broken version

def all_pairs(numbers):
    pairs = []
    for i in range(len(numbers)):
        for j in range(i + 1, len(numbers)):
            pairs.append((numbers[i], numbers[j]))
    return pairs

For numbers of length 100,000, this builds about 5 billion pairs (100,000 × 99,999 / 2), far more than fits in 768MB, so the autograder kills the process.

The fix

def all_pairs(numbers):
    for i in range(len(numbers)):
        for j in range(i + 1, len(numbers)):
            yield (numbers[i], numbers[j])

A generator does not store the full list in memory. It yields one pair at a time. The caller iterates and processes each pair without holding all of them at once.

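If the assignment requires returning a list (the rubric says "return a list of tuples"), the hidden tests are presumably sized so that the list fits in memory. A generator also fixes memory only, not time: iterating over 5 billion pairs still runs far past a 10-minute timeout. If your code is being killed at the memory limit, your algorithm is probably doing more work than the problem requires. Look for an O(n) or O(n log n) approach instead of O(n^2).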

Why students miss this. Their laptop has 16GB of RAM and tolerates an inefficient algorithm. The autograder has 768MB and does not.
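As an illustration of that last point, suppose the (hypothetical) task were to count the pairs that add up to a target. The standard library's itertools.combinations(numbers, 2) yields the same pairs as the generator above, lazily, but still does O(n^2) work. Counting with a Counter of values seen so far answers the same question in one O(n) pass without building any pairs.

```python
from collections import Counter
from itertools import combinations


def count_pairs_quadratic(numbers, target):
    # combinations(numbers, 2) yields (numbers[i], numbers[j]) for i < j, lazily
    return sum(1 for a, b in combinations(numbers, 2) if a + b == target)


def count_pairs_linear(numbers, target):
    seen = Counter()
    count = 0
    for x in numbers:
        # Every earlier value equal to target - x forms a pair with x
        count += seen[target - x]
        seen[x] += 1
    return count


data = [1, 5, 7, -1, 5]
print(count_pairs_quadratic(data, 6), count_pairs_linear(data, 6))  # 3 3
```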


The pre-submission checklist

Run this before every submission. Five minutes spent here saves hours of resubmission cycles.

File and naming

  • The file name matches exactly what the brief requires, case included.
  • The function names match exactly what the brief requires, case included.
  • The file contains no print() statements outside the required output (debug prints are removed).
  • The file contains no extra input() prompt strings (use bare input() if input is required).

Function signatures

  • Each function takes exactly the parameters the brief specifies. No extra parameters added.
  • Each function returns the expected type (string vs int vs list vs None) per the rubric.
  • The number of decimal places in numeric output matches the rubric.

Edge case handling

  • Empty input does not crash the function.
  • A single-element input does not crash the function.
  • None values in the input are handled or filtered out.
  • Division operations are guarded against zero denominators.
  • Float comparisons use math.isclose() instead of ==.
  • Off-by-one boundaries on slices and loops are checked.

Output format

  • Output ends with a newline (use plain print(), not print(..., end="")).
  • Capitalization matches the rubric exactly.
  • No leading or trailing whitespace inside output strings unless the rubric specifies.
  • Number formatting (decimal places, scientific notation, currency) matches the rubric.

Performance

  • The solution handles the maximum input size from the brief without hitting timeout.
  • Memory usage stays under 768MB for the largest expected input.
  • Recursion depth stays under 1,000 levels (or sys.setrecursionlimit is set if recursion is required).

Submission

  • The submitted file imports cleanly, without waiting for input, when you run python -c "import your_module" locally (the file name without .py).
  • The submitted file runs to completion in under 10 minutes on the largest test case you can construct.
  • All test cases you wrote locally pass before you upload.
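Several of these checks can be automated by running your file the way many autograders do: as a separate process with piped standard input and a timeout. A minimal sketch; run_like_autograder is a hypothetical helper, and the file name, the stdin text, and the 60-second timeout are placeholders. Real autograders differ in the details.

```python
import subprocess
import sys


def run_like_autograder(script_path, stdin_text="", timeout_seconds=60):
    """Run script_path in a fresh interpreter with piped stdin; return (stdout, stderr)."""
    result = subprocess.run(
        [sys.executable, script_path],
        input=stdin_text,         # piped, non-interactive input
        capture_output=True,
        text=True,
        timeout=timeout_seconds,  # raises subprocess.TimeoutExpired if exceeded
    )
    return result.stdout, result.stderr


if __name__ == "__main__":
    out, err = run_like_autograder("assignment.py", stdin_text="")
    print("stdout:", repr(out))
    print("stderr:", repr(err))
    print("ends with newline:", out.endswith("\n"))
```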

When self-fixing is not enough

Some autograder failures are easy to identify and fix: off-by-one errors, missing newlines, wrong capitalization. The 15 edge cases above cover most of them.

Other failures resist easy diagnosis. The error message is generic ("test failed"), the test name is opaque ("test_q3_hidden_2"), and the gap between what the autograder expects and what your code produces is not obvious. After two or three failed submissions, every additional submission attempt costs more time than it saves.

That is the point where sending the code and the autograder output to a working Python developer is faster than continuing to guess. Our Python Code Rescue service covers exactly this scenario. Send the code that passes locally, the autograder output that fails, and the assignment brief. A human Python expert reads the failing test patterns, identifies which edge cases the code does not handle, and fixes the specific test cases that broke the submission.

Python assignment help FAQ

The questions students ask most, answered straight.

Why does my code work on my laptop but fail the autograder?
The four common reasons: hidden test cases that your laptop testing did not cover, output format strictness on the autograder that your laptop did not enforce, environment differences (Python version, library version, operating system), and resource limits (memory, timeout) that your laptop does not impose.
Why does the autograder fail my test when the output looks identical?
Look for invisible characters. A trailing newline, a trailing space, a tab versus four spaces, a non-breaking space copied from a PDF. Run repr() on your output and on the expected output and compare character by character. The difference is almost always whitespace.
My submission gets "Killed" with no error message. What does that mean?
Almost always, your process exceeded the memory limit and was killed. The default Gradescope memory limit is 768MB per submission. Switch to a generator, process the input in chunks, or use a more memory-efficient algorithm.
My code times out on the autograder. How do I fix it?
Three usual causes: an infinite loop or unbounded recursion, an inefficient algorithm (O(n^2) on large inputs), or a blocking call (an input() waiting for input that never arrives).
The autograder expects None but I am returning 0. Why does it matter?
None is a distinct value in Python, not a placeholder for zero. A typical test uses assertEqual(result, None) or assertIsNone(result), and both fail when the function returns 0 because 0 == None is False in Python.
Is it safe to keep print statements in my submission for debugging?
No. Remove them all. Print statements add extra output to stdout, and the autograder compares stdout byte by byte. Debug locally with breakpoint() and the Python debugger instead (and remove it before submitting, since a breakpoint would stall the autograder), or write debug output to standard error with print("debug", file=sys.stderr) after import sys. Autograders that compare stdout usually ignore stderr.
What if my code passes locally on the same input the autograder uses?
The input is probably not the same. The autograder may include trailing whitespace, different line endings (Windows CRLF vs Unix LF), or unicode characters that look identical to ASCII characters but encode differently. Save the autograder input to a file and compare byte by byte.
My function passes the visible tests but fails the hidden ones. How do I prepare for hidden tests?
Construct your own hidden tests. For every visible test the autograder shows, write three or four variations: an empty input, a single-element input, the maximum size input, an input with values at the boundaries.
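For the "looks identical" and "same input" questions above, a few lines of standard-library Python make invisible differences visible. A sketch using difflib; show_differences is a hypothetical helper that diffs the repr() of each line, so trailing spaces, \r\n line endings, and non-breaking spaces (shown as \xa0) all appear in the output.

```python
import difflib


def show_differences(expected, actual):
    """Print and return a line diff of repr()'d lines so whitespace becomes visible."""
    expected_lines = [repr(line) + "\n" for line in expected.splitlines(keepends=True)]
    actual_lines = [repr(line) + "\n" for line in actual.splitlines(keepends=True)]
    diff = "".join(
        difflib.unified_diff(expected_lines, actual_lines, fromfile="expected", tofile="actual")
    )
    print(diff if diff else "No differences")
    return diff


# Trailing space, Windows line ending, and a missing final newline
show_differences("Pass\nFail\n", "Pass \r\nFail")
```

To compare saved files, read them with open(path, newline="") so Python does not translate \r\n into \n before you get a chance to see it.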

Still stuck after the checklist?

Send the brief, the code, and the autograder output. A working Python developer reads the failing test patterns and fixes the specific cases that broke the submission.