Originality, not evasion · Academic integrity first

How to Write Original Python Code You Can Defend in Viva

This page explains how MOSS, JPlag, Codequiry, Gradescope, and Turnitin work, why honest code still flags 18% to 22% similar on standard imports, and how to write Python that passes both a similarity scan and a live in-class discussion. The goal is genuine originality plus oral defence, not evasion of detection on copied work. If your plan is to submit code you cannot explain line by line, close this tab now.

5 detection tools explained · 5-step originality workflow · 10 sample in-class questions
Section 1

How Python code plagiarism detection works

Five tools cover roughly 95% of university submissions. Each one targets a different layer: tokens, AST, web sources, pairwise diffs, and prose. Knowing the mechanism is the first step to writing code that survives them.

MOSS (Stanford)

Measure of Software Similarity · Used by 700+ universities for CS101, CS50P, CS106A, DATA 100, and CSE 163 submissions.

Tokenizes each source file, drops comments and whitespace, fingerprints k-gram sequences with a winnowing algorithm, and reports pairs of files that share long matched substrings. Output: a percentage similarity score plus a side-by-side diff view of matched regions.
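The fingerprinting step described above can be sketched in a few lines. This is a simplified illustration of k-gram hashing plus winnowing, not MOSS's actual implementation: the regex-based normalization, the choice of zlib.crc32 as the hash, the values of k and window, and every function name are assumptions made for the example.

```python
import re
import zlib


def normalize(source: str) -> str:
    """Drop '#' comments and all whitespace (a crude stand-in for real tokenizing).

    Assumption: no '#' characters appear inside string literals.
    """
    no_comments = re.sub(r"#.*", "", source)
    return re.sub(r"\s+", "", no_comments)


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every k-character substring of the normalized text."""
    return [zlib.crc32(text[i:i + k].encode()) for i in range(len(text) - k + 1)]


def winnow(hashes: list[int], window: int) -> set[int]:
    """Keep the minimum hash from each sliding window as a fingerprint."""
    if not hashes:
        return set()
    if len(hashes) <= window:
        return {min(hashes)}
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}


def fingerprints(source: str, k: int = 5, window: int = 4) -> set[int]:
    return winnow(kgram_hashes(normalize(source), k), window)


def similarity(a: str, b: str) -> float:
    """Share of fingerprints the two files have in common (0.0 to 1.0)."""
    fa, fb = fingerprints(a), fingerprints(b)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


file_a = "x = 1  # set x\ny = x + 2\n"
file_b = "x=1\ny   =   x+2  # add two\n"
print(similarity(file_a, file_b))
```

Because comments and spacing are removed before hashing, the two sample files produce the same fingerprints even though they look different on screen.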

JPlag

Karlsruhe Institute of Technology · European universities, AlgoLab competitions, and several US CS departments. Supports Python, Java, C++, Kotlin, Scala, and 15 other languages.

Builds an abstract syntax tree (AST) for each submission, runs Greedy String Tiling on the token stream, then clusters submissions by structural similarity. Catches renamed variables, reordered functions, and superficial reformatting that MOSS sometimes misses.
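To see why renaming variables does not help against a structural comparison, consider the rough sketch below. It uses Python's standard tokenize module rather than JPlag's own front end, and it does not implement Greedy String Tiling; it only shows that two files with different identifiers can reduce to the same token stream. The function name and the "ID", "NUM", and "STR" placeholders are assumptions for illustration.

```python
import io
import keyword
import tokenize


def structural_tokens(source: str) -> list[str]:
    """Reduce source to a token stream that ignores names, literals, and comments."""
    skipped = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}
    layout = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}
    stream = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skipped:
            continue
        if tok.type == tokenize.NAME:
            # Keywords carry structure; every other identifier collapses to "ID".
            stream.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            stream.append("NUM")
        elif tok.type == tokenize.STRING:
            stream.append("STR")
        elif tok.type in layout:
            stream.append(tokenize.tok_name[tok.type])
        else:
            stream.append(tok.string)  # operators and punctuation
    return stream


original = (
    "def total(values):\n"
    "    acc = 0\n"
    "    for v in values:\n"
    "        acc += v\n"
    "    return acc\n"
)
renamed = (
    "def add_up(numbers):  # renamed everything\n"
    "    running = 0\n"
    "    for item in numbers:\n"
    "        running += item\n"
    "    return running\n"
)
print(structural_tokens(original) == structural_tokens(renamed))
```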

Codequiry

Commercial plagiarism scanner · Adopted by bootcamps and some online masters programs. Integrates with Canvas and Moodle.

Combines token sequence matching with a web-search step that compares submissions against GitHub, GitLab, Stack Overflow, and Pastebin. The web check is what catches copy-paste from a public repo even when no two students share code.

Gradescope similarity tools

Built into the Gradescope platform · Standard at UC Berkeley, CMU, UPenn, and 200+ other universities. DATA 100 and 6.100L use it every semester.

Pairwise comparison of all submissions in the same assignment. Reports the top 5% of suspicious pairs to the instructor with a token-level diff. Distinct from the autograder itself, which only measures correctness.
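The pairwise idea can be sketched as follows. The source does not describe Gradescope's actual scoring, so the whitespace-split tokens, the Jaccard measure, and the function names here are assumptions that only illustrate the "compare every pair, then rank" pattern.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the overlap divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_pairs(submissions: dict[str, str]) -> list[tuple[float, str, str]]:
    """Score every pair of submissions and sort the most similar first."""
    token_sets = {name: set(src.split()) for name, src in submissions.items()}
    scored = [
        (jaccard(token_sets[x], token_sets[y]), x, y)
        for x, y in combinations(sorted(token_sets), 2)
    ]
    return sorted(scored, reverse=True)


# Hypothetical submissions, keyed by student name.
demo = {
    "alice": "total = 0 for s in scores : total += s",
    "bob": "total = 0 for s in scores : total += s",
    "carol": "return sum ( scores )",
}
for score, first, second in rank_pairs(demo):
    print(f"{first} vs {second}: {score:.2f}")
```

A class of n submissions produces n(n-1)/2 pairs, which is why a tool surfaces only the highest-scoring pairs to the instructor instead of the full list.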

Turnitin

Global text-similarity scanner · Standard for written reports, lab writeups, and any prose submission attached to a coding assignment.

Designed for natural-language prose. Compares the submission against a corpus of academic papers, websites, and prior student submissions. Useful on the report half of a project, not the code half.

Section 2

Why honest code still flags as similar

A 20% similarity score is the de facto tolerance for Python coursework, because the language and the curriculum produce shared patterns by design. Five categories of overlap appear on legitimate, original submissions every semester.

Standard library imports

Every numpy assignment starts with import numpy as np. Every pandas notebook starts with import pandas as pd. Hundreds of submissions share the first 4 lines.

Common single-letter variables

i, j, k for loop indices, x, y for coordinates, n for size, df for DataFrame. Convention is universal across textbooks and lectures.

Standard idioms

for i in range(n), if __name__ == "__main__":, with open(path) as f, and dict/list comprehensions look identical across solutions because Python has one idiomatic way to write each of them.

Required function signatures

When the rubric specifies def compute_grade(scores: list[int]) -> float, every submission has the same signature. The shared interface is mandatory, not plagiarism.

Test scaffolding from the starter file

Many courses distribute a starter file with constants, helper functions, and test stubs. The unchanged scaffolding flags as identical, but it is supplied code, not student code.
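Supplied code can be filtered out before any comparison; MOSS's submission script, for instance, accepts a base file for this purpose. The sketch below shows the idea at its simplest, by exact line matching. The function name and the sample starter file are hypothetical, and real tools match at the token level, not line by line.

```python
def student_only_lines(submission: str, starter: str) -> list[str]:
    """Return the submission's non-blank lines that do not appear in the starter file."""
    supplied = {line.strip() for line in starter.splitlines() if line.strip()}
    return [
        line.strip()
        for line in submission.splitlines()
        if line.strip() and line.strip() not in supplied
    ]


# Hypothetical starter file handed out with the assignment.
starter_file = "import numpy as np\nMAX_SCORE = 100\n"
submitted = (
    "import numpy as np\n"
    "MAX_SCORE = 100\n"
    "\n"
    "def scale(raw):\n"
    "    return np.asarray(raw) / MAX_SCORE\n"
)
print(student_only_lines(submitted, starter_file))
```

Only the two lines the student wrote remain, so the shared import and constant no longer count toward the overlap.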

The DoMyPythonHomework (DMPH) refund policy codifies the 20% tolerance in its "Source Code and the 20% Similarity Rule" section. Reports above 20% on Python source receive a full investigation; reports at or under 20% with no structurally unique match fall inside the accepted band for Python coursework.

Section 3

What actually flags as plagiarism

Five signals separate copied code from coincidence. Each one comes from a documented MOSS or JPlag instructor case study, and each one is invisible to the student who copied without thinking about structure.

Verbatim algorithm body

The structurally unique part of a Dijkstra implementation, copied character for character, including comment placement.

Identical variable names on non-conventional variables

Both submissions name a graph adjacency map weighted_adj_dict and a frontier set frontier_unvisited. Conventional names like adj or queue would be fine; these are not.

Identical comment text on identical lines

Same TODO note in the same dead branch. Same misspelling. Same author voice in the comments.

Independent functions in the same order

Five helper functions appear in the same order in both files, with no rubric reason for the ordering. The probability of that under independent authorship is small.

Identical whitespace and dead-code remnants

Both files include a commented-out print statement at line 87. Both have a trailing blank line after the last function. Both leave the same import unused.
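The comment signal is simple to check mechanically, which is part of why it is so damaging. The following sketch uses the standard tokenize module to pull the comment text out of two files and intersect it. The function names and sample files are hypothetical, and it is not taken from MOSS or JPlag, which the source says drop or ignore comments during token matching; an instructor would more likely spot this in the side-by-side diff.

```python
import io
import tokenize


def comment_texts(source: str) -> set[str]:
    """Collect the text of every comment, without the leading '#'."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return {
        tok.string.lstrip("#").strip()
        for tok in tokens
        if tok.type == tokenize.COMMENT
    }


def shared_comments(a: str, b: str) -> set[str]:
    """Comments that appear word for word in both files."""
    return comment_texts(a) & comment_texts(b)


first = "x = 1  # TODO: handel ties\n"
second = "y = 2\n# TODO: handel ties\n# my own note\n"
print(shared_comments(first, second))
```

A shared misspelling such as "handel" is far less likely under independent authorship than a shared import line.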

A concrete before and after

Two students implement the same CS50P assignment: write a function that returns the longest word in a sentence. The first solution looks copied; the second does not. The algorithm is identical. The signals are different.

Flags
# Verbatim variable names from the textbook example,
# same inline comment, same leftover TODO.
def longest_word(s):
    words = s.split()  # split on whitespace
    longest = ""
    for word in words:
        if len(word) > len(longest):
            longest = word
    # TODO: handle ties
    return longest
Original
def longest_word(sentence: str) -> str:
    tokens = sentence.split()
    if not tokens:
        return ""
    return max(tokens, key=len)

The second version has a type annotation, an explicit empty-input branch, and a built-in max(key=len) call. None of those are tricks. They are choices a student makes after thinking about the problem for 5 minutes.
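A quick way to confirm that the two versions really do compute the same thing is to run both on a handful of inputs, including the empty string and a tie. The _loop and _max suffixes and the sample sentences are added here only so that both functions can live in one file.

```python
def longest_word_loop(s):
    words = s.split()
    longest = ""
    for word in words:
        if len(word) > len(longest):  # strict '>' keeps the first word on a tie
            longest = word
    return longest


def longest_word_max(sentence: str) -> str:
    tokens = sentence.split()
    if not tokens:
        return ""
    return max(tokens, key=len)  # max() also returns the first maximal item


samples = ["", "   ", "a", "the quick brown fox", "tie here now", "cats dogs"]
for text in samples:
    assert longest_word_loop(text) == longest_word_max(text)
```

Both return "quick" for "the quick brown fox" because each keeps the first word of maximal length, so the difference between the two submissions is in authorship signals, not behaviour.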

Section 4

The 5-step originality workflow

Run every assignment through these 5 steps. The order matters. Step 2 produces the pseudocode that step 3 translates; skip step 2 and step 3 falls back to copying.

  1. Read the brief twice and write it in your own words

    Open the assignment PDF or LMS page and read it end to end without writing code. Then write a 6-line plain-English summary of what the program does, what the inputs are, what the outputs are, and which rubric criteria carry the most points. If you cannot summarize it, you cannot implement it.

  2. Sketch pseudocode in your own variable names

    Write the algorithm as numbered steps in English. Pick names that mean something to you. If the rubric specifies a signature like compute_grade(scores), keep that, but invent every other name yourself. Keep the pseudocode on a separate notebook page, and never copy-paste it from a reference.

  3. Write code from the pseudocode, not from a reference

    Close every Stack Overflow tab. Close ChatGPT. Close the textbook. Translate your own pseudocode to Python line by line. The translation step is where you internalize the algorithm; doing it with a reference open shortcuts the learning and produces code you cannot defend.

  4. Run tests and verify against the rubric

    Run the provided autograder. Compare actual output against expected output for the 5 example inputs in the brief. Add your own edge cases (empty input, single element, all duplicates, very large input). Every failing test surfaces a fix, every passing test confirms one rubric line.

  5. Explain every line out loud

    Walk through the file top to bottom and explain what each line does, why it is there, and what would change if you removed it. Record yourself on phone audio if helpful. Any line you cannot explain becomes a TODO: rewrite this until you can. The explanation step is the actual in-class rehearsal.
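The edge cases named in step 4 translate directly into assertions. The sketch below reuses the compute_grade signature from Section 2; the body (a plain mean that returns 0.0 for no scores) is a hypothetical stand-in, since a real rubric would define the expected behaviour.

```python
def compute_grade(scores: list[int]) -> float:
    """Hypothetical body: the mean score, or 0.0 when there are no scores."""
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def run_edge_cases() -> None:
    assert compute_grade([]) == 0.0                 # empty input
    assert compute_grade([85]) == 85.0              # single element
    assert compute_grade([70, 70, 70]) == 70.0      # all duplicates
    assert compute_grade([90] * 1_000_000) == 90.0  # very large input
    assert compute_grade([80, 90]) == 85.0          # ordinary case


run_edge_cases()
print("all edge cases pass")
```

Writing the expected value for each case before running it is also rehearsal for step 5: every assertion is a claim about the code that you have to be able to justify.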

Section 5

The in-class explanation test

Many courses now run a 10-minute class discussion after submission. The teaching assistant picks 5 lines at random and asks why you chose that approach. If you cannot explain it in plain English, the code is not yours yet. Five questions show up over and over.

Why a dict here instead of a list?

You should be able to name the operation that motivated the choice: a dict for average-case O(1) key lookup, a list for ordered iteration, a set for membership tests at scale. If "it just felt right" is the only answer, the data structure choice was not yours.
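A small timing experiment makes the answer concrete. The collection size and repeat count below are arbitrary choices for illustration, and the measured times will vary by machine, so no expected output is shown.

```python
from timeit import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
as_dict = {value: str(value) for value in as_list}
target = n - 1  # worst case for the list: the last element

# A list checks elements one by one; a set hashes the target and jumps to it.
list_time = timeit(lambda: target in as_list, number=100)
set_time = timeit(lambda: target in as_set, number=100)
print(f"list membership: {list_time:.5f}s  set membership: {set_time:.5f}s")

# A dict gives the same hashed lookup but also returns an associated value.
print(as_dict[target])
```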

Why this loop bound?

Why range(len(arr)) and not range(len(arr) - 1)? Why is the comparison strict (<) and not non-strict (<=)? Off-by-one questions reveal whether the student wrote the loop or copied it.
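The two bounds are each correct in a different situation, as this pair of hypothetical functions illustrates. The deciding factor is which indices the loop body touches.

```python
def is_sorted(arr: list[int]) -> bool:
    # Each step reads arr[i] and arr[i + 1], so the last valid i is len(arr) - 2.
    for i in range(len(arr) - 1):
        if arr[i] > arr[i + 1]:  # strict: equal neighbours still count as sorted
            return False
    return True


def total(arr: list[int]) -> int:
    # Each step reads only arr[i], so every index up to len(arr) - 1 is needed.
    result = 0
    for i in range(len(arr)):
        result += arr[i]
    return result


print(is_sorted([1, 2, 2, 3]), total([1, 2, 3]))
```

Using range(len(arr)) in is_sorted would raise IndexError on the final step, and replacing > with >= would wrongly reject lists that contain duplicates.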

What happens if the input is empty?

Trace the function on an empty list. Does it raise, return a default, return None? If the behaviour is undefined, the test coverage is thin and the author probably did not write the edge case.

Why this exception type?

A KeyError caught and re-raised as a ValueError changes the contract of the function. The student should explain why the caller cares about the conversion, or admit that the catch was decorative.
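A minimal sketch of a conversion that is not decorative: the dict is an implementation detail, so the caller is told about a bad argument, not a missing key. PRICES and price_of are hypothetical names.

```python
PRICES = {"apple": 0.50, "pear": 0.75}  # hypothetical lookup table


def price_of(item: str) -> float:
    try:
        return PRICES[item]
    except KeyError as exc:
        # Report a bad argument; 'from exc' keeps the original error attached.
        raise ValueError(f"unknown item: {item!r}") from exc


try:
    price_of("kiwi")
except ValueError as err:
    print(err, "| caused by", type(err.__cause__).__name__)
```

If the lookup later moves from a dict to a database, callers that catch ValueError keep working, which is the kind of reason a TA is looking for.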

Walk me through this comprehension as a regular for loop

Comprehensions and generator expressions are the easiest place to spot copied code, because translating one back to a loop in real time requires understanding it. Slow walkthroughs reveal authorship.
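For practice, here is one dict comprehension and its loop translation side by side. The word list is made up for the example.

```python
words = ["alpha", "be", "gamma", "pi"]

# Comprehension: key expression, value expression, loop, then filter.
lengths_comprehension = {w: len(w) for w in words if len(w) > 2}

# The same logic unrolled: the loop comes first, the filter becomes an 'if',
# and the key/value expression becomes an assignment into the dict.
lengths_loop = {}
for w in words:
    if len(w) > 2:
        lengths_loop[w] = len(w)

assert lengths_comprehension == lengths_loop
print(lengths_loop)
```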

The 5-line rule

Pick 5 random lines from your own submission right now. For each one, answer in 30 seconds: what does this line do, why did I pick this approach, and what would break if I removed it? If 2 of 5 stump you, rewrite those sections from your own pseudocode before submitting. The class discussion is the same test, just with a TA on the other end of the table.

Section 6

Tools you can use, tools you cannot

Six tools and the line each one draws. The honest rule: a tool that helps you learn is fine; a tool that substitutes for your authorship is not. Most academic integrity policies are clearer than students assume, and most violations come from skipping the rubric, not from genuine confusion.

IDE autocomplete

Always fine. Tab-completion of method names, parameter hints, and import suggestions is not authorship transfer.

Official documentation

Always fine. Reading docs.python.org, pandas.pydata.org, and the standard library reference is required for any non-trivial program.

Stack Overflow for syntax recall

Fine if you read the answer, close the tab, then write the line from memory. Not fine if you copy a 20-line answer block. When you copy a recognisable idiom verbatim, cite the URL in a comment.

AI assistants for explanation

Fine for asking what a built-in does, why a traceback says what it says, or how a concept like list slicing works. Not fine for generating a function and submitting the output. Many courses prohibit submitting AI-generated code, with or without disclosure.

Past assignments and solution keys

Not fine. Even when a friend offers an old submission, looking at it then writing your own is a documented integrity violation under most honour codes. MOSS catches the pattern transfer easily.

Pair programming with a classmate

Fine if your course permits it and you cite the partnership. Not fine if the course brief specifies individual work. Read the rubric on collaboration before splitting any keyboard time.

Section 7

How DMPH handles originality

DoMyPythonHomework operates as a paid Python developer service for university students. The position on originality is explicit. Every delivery includes the four artefacts below.

  • MOSS and JPlag reports on request. Run before delivery against the student's course corpus when the cohort is small enough to be useful. The default outcome sits inside the 20% band documented in the refund policy's 20% similarity rule.
  • No code reuse across students. The same assignment ordered by two students in the same course produces two structurally different submissions. The internal pairing check runs MOSS against the prior 90 days of deliveries.
  • Named author defence walkthrough. Every delivery names the developer (Samuel P., Priya M., Daniel K., and 6 others) and ships with 2 to 3 sample TA questions and answers per file. The walkthrough is what the academic integrity section on the homepage describes as oral-defence preparation.
  • Version match and stylistic variance. Code is written to the version in the brief (Python 3.10 to 3.12, pandas 1.5 to 2.2, Django 3.2 to 5.0) and the developer rotates between 4 idiomatic styles, so two submissions from the same course never look like a template.

The service exists to help students who already know the problem and need a tested, documented reference implementation they can study and defend. Submission of any deliverable as if it were the student's own original work, without the study and class discussion preparation that the walkthrough enables, lands the student outside the academic integrity policy of the course. That risk sits with the student. The refund policy is explicit on the acceptance and use clause.

Need original Python code you can actually defend?

Send the brief, the rubric, and your course's Python version. A named developer ships tested code, a MOSS-aware similarity check, and a 2 to 3 question in-class walkthrough so you can explain every line. Pay 50% to start, 50% after the code runs on your data and the walkthrough makes sense.