pytest-repeated

A statistical unit testing plugin for pytest. 📊

What is pytest-repeated?

pytest-repeated enables statistical unit testing by repeating tests multiple times and evaluating results using statistical methods. It's designed for testing code with non-deterministic outputs like:

  • 🤖 Large Language Model (LLM) outputs
  • 📊 Machine learning model predictions
  • 🎲 Data science algorithms with randomness
  • 🔀 Any code that produces random variables

Why Statistical Testing?

Unit testing was originally fully deterministic because computer programs behaved deterministically. Today, with the rise of:

  • Data science algorithms (recommendation engines, etc.)
  • Machine learning models
  • Large Language Models (LLMs)

...computer outputs are random variables. Statistical testing has been part of QA processes in manufacturing for decades. It's time to incorporate it into software testing.

Looking for simple repetition?

If you just need to repeat tests without statistical evaluation, check out pytest-repeat. pytest-repeated is specifically for statistical testing where some failures are acceptable.

Three Testing Approaches

pytest-repeated offers three ways to evaluate repeated tests:

  1. Basic Usage - Simple threshold-based: "Pass if X out of Y tests succeed"
  2. Frequentist - Hypothesis testing with null hypothesis and confidence intervals
  3. Bayesian - Posterior probability that code meets success criteria

Choose the approach that best fits your team's statistical background and communication needs.
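To make the three approaches concrete, here is a minimal, standard-library-only sketch of how each style could judge the same batch of results. These helper functions are illustrative and are not part of the pytest-repeated API; the null-hypothesis rate `p0` and the minimum acceptable rate `p_min` are assumed example parameters.

```python
from math import comb

def basic_pass(successes: int, times: int, threshold: int) -> bool:
    """Basic: pass if at least `threshold` of `times` runs succeeded."""
    return successes >= threshold

def binomial_p_value(successes: int, times: int, p0: float) -> float:
    """Frequentist: one-sided p-value for H0 'true success rate <= p0'.
    Probability of observing `successes` or more if the rate were exactly p0."""
    return sum(comb(times, k) * p0**k * (1 - p0)**(times - k)
               for k in range(successes, times + 1))

def bayesian_posterior(successes: int, times: int, p_min: float,
                       grid: int = 100_000) -> float:
    """Bayesian: P(success rate >= p_min | data) under a uniform Beta(1, 1)
    prior, approximated by midpoint integration of the Beta posterior."""
    s, f = successes, times - successes
    pdf = lambda p: p**s * (1 - p)**f  # unnormalised Beta(s+1, f+1) posterior
    xs = [(i + 0.5) / grid for i in range(grid)]
    vals = [pdf(x) for x in xs]
    total = sum(vals)
    tail = sum(v for x, v in zip(xs, vals) if x >= p_min)
    return tail / total

# 19 successes out of 20 runs:
print(basic_pass(19, 20, threshold=19))            # True
print(binomial_p_value(19, 20, p0=0.8))            # small -> reject H0
print(bayesian_posterior(19, 20, p_min=0.9))       # posterior confidence
```

The basic check is the easiest to explain to a team; the frequentist p-value and the Bayesian posterior probability express the same evidence in the language of hypothesis tests and of direct probability statements, respectively.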

Quick Example

import pytest

@pytest.mark.repeated(times=20, threshold=19)
def test_llm_output():
    # call_llm is a placeholder for your own LLM client
    response = call_llm("What is 2+2?")
    assert response.strip() == "4"  # may occasionally fail due to LLM randomness

This test runs 20 times and passes if at least 19 iterations succeed.

Key Features

  • Flexible evaluation: Basic threshold, frequentist, or Bayesian approaches
  • 🛡️ Smart error handling: Stops on non-AssertionErrors (real bugs)
  • 📊 Statistical rigor: Proper hypothesis testing and confidence intervals
  • 🔍 Detailed reporting: Run-by-run results at high verbosity
  • ⚡ CI/CD ready: Integrates seamlessly into pytest workflows
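The error-handling rule above can be sketched in plain Python. This is an illustrative model of the behavior described, not the plugin's actual implementation: an AssertionError counts as an expected statistical failure, while any other exception is treated as a real bug and propagates immediately.

```python
def run_repeated(test_fn, times: int, threshold: int) -> bool:
    """Run test_fn up to `times` times; pass if >= `threshold` runs succeed."""
    successes = 0
    for _ in range(times):
        try:
            test_fn()
            successes += 1
        except AssertionError:
            pass  # expected statistical failure; keep counting
        # any other exception (TypeError, KeyError, ...) is not caught here,
        # so it propagates at once instead of being averaged away
    return successes >= threshold
```

A flaky assertion only lowers the success count, but a `NameError` from a typo surfaces on the first run rather than being masked by repetition.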

Next Steps