API Reference

Low-level API documentation for pytest-repeated internals and extension points.

Using mkdocstrings

This page is generated automatically from the source-code docstrings by mkdocstrings.

Plugin Module

pytest_repeated.plugin

bayes_one_sided_proportion_test(r, n, N, prior_successes=1, prior_failures=1, posterior_threshold_probability=0.95)

Perform a one-sided Bayesian test for a population proportion.

Uses Beta-Binomial conjugate prior to compute posterior probability that the true proportion p > r.

Parameters:

  • r (float): threshold proportion (0 <= r <= 1). Required.
  • n (int): number of successes observed (0 <= n <= N). Required.
  • N (int): total number of trials (N > 0). Required.
  • prior_successes (float): prior pseudo-count for successes (default 1, uninformative).
  • prior_failures (float): prior pseudo-count for failures (default 1, uninformative).
  • posterior_threshold_probability (float): credible threshold (default 0.95).

Returns:

  • dict: {
        "posterior_prob": float,  # P(p > r | data)
        "passes": bool,           # whether posterior_prob >= threshold
        "alpha": float,           # posterior Beta parameter
        "beta": float,            # posterior Beta parameter
    }

Source code in pytest_repeated/plugin.py
from mpmath import betainc  # needed below: mpmath's generalized incomplete beta

def bayes_one_sided_proportion_test(
    r,
    n,
    N,
    prior_successes=1,
    prior_failures=1,
    posterior_threshold_probability=0.95,
) -> dict:
    """Perform a one-sided Bayesian test for a population proportion.

    Uses Beta-Binomial conjugate prior to compute posterior probability
    that the true proportion p > r.

    Args:
        r (float): threshold proportion (0 <= r <= 1)
        n (int): number of successes observed (0 <= n <= N)
        N (int): total number of trials (N > 0)
        prior_successes (float): prior pseudo-count for successes (default 1, uninformative)
        prior_failures (float): prior pseudo-count for failures (default 1, uninformative)
        posterior_threshold_probability (float): credible threshold (default 0.95)

    Returns:
        dict: {
            "posterior_prob": float,  # P(p > r | data)
            "passes": bool,            # whether posterior_prob >= threshold
            "alpha": float,            # posterior Beta parameter
            "beta": float,             # posterior Beta parameter
        }
    """
    # Posterior parameters (Beta conjugate update)
    alpha = prior_successes + n
    beta = prior_failures + (N - n)

    # Posterior P(p > r) = 1 - F_Beta(r; alpha, beta)
    posterior_cdf = betainc(alpha, beta, 0, r, regularized=True)
    posterior_prob = float(1 - posterior_cdf)  # convert from mpmath to float

    return {
        "posterior_prob": posterior_prob,
        "passes": posterior_prob >= posterior_threshold_probability,
        "alpha": alpha,
        "beta": beta,
    }
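
Since `betainc` above comes from mpmath, the posterior tail probability can be cross-checked with the standard library alone: for integer Beta parameters, the regularized incomplete beta function reduces to a binomial tail sum. A minimal sketch (the helper name `beta_tail_prob` is illustrative, not part of the plugin):

```python
from math import comb

def beta_tail_prob(r, n, N, prior_successes=1, prior_failures=1):
    """P(p > r | data) under a Beta-Binomial conjugate update (integer priors)."""
    # Posterior Beta(a, b) after the conjugate update
    a = prior_successes + n
    b = prior_failures + (N - n)
    # For integer a, b: I_r(a, b) = sum_{j=a}^{a+b-1} C(a+b-1, j) r^j (1-r)^(a+b-1-j)
    m = a + b - 1
    cdf = sum(comb(m, j) * r**j * (1 - r)**(m - j) for j in range(a, m + 1))
    return 1.0 - cdf

# 19 passes in 20 runs against a 50% success-rate threshold
prob = beta_tail_prob(0.5, 19, 20)
print(prob, prob >= 0.95)  # posterior probability is well above the 0.95 default
```

This agrees with the mpmath-based computation for integer pseudo-counts; non-integer priors require the full incomplete beta function.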

one_sided_proportion_test(r, n, N, alpha=0.05)

Perform a one-sided hypothesis test for a population proportion.

Uses exact binomial test for small samples (N < 30 or N*r*(1-r) < 10), and normal approximation for larger samples.

H0: p <= r
H1: p > r

Parameters:

  • r (float): hypothesized proportion under null (0 <= r <= 1). Required.
  • n (int): number of successes observed (0 <= n <= N). Required.
  • N (int): total number of trials (N > 0). Required.
  • alpha (float): significance level for the test (default 0.05).

Returns:

  • dict: {"p_value": float, "reject": bool, "p_hat": float, "method": str}

Raises:

  • ValueError: if input parameters are out of bounds

Source code in pytest_repeated/plugin.py
import math  # needed below for sqrt and erfc

def one_sided_proportion_test(r, n, N, alpha=0.05) -> dict:
    """Perform a one-sided hypothesis test for a population proportion.

    Uses exact binomial test for small samples (N < 30 or N*r*(1-r) < 10),
    and normal approximation for larger samples.

    H0: p <= r
    H1: p > r

    Args:
        r (float): hypothesized proportion under null (0 <= r <= 1)
        n (int): number of successes observed (0 <= n <= N)
        N (int): total number of trials (N > 0)
        alpha (float): significance level for the test (default 0.05)

    Returns:
        dict: {"p_value": float, "reject": bool, "p_hat": float, "method": str}

    Raises:
        ValueError: if input parameters are out of bounds
    """
    if not (0 <= r <= 1):
        raise ValueError("r must be in [0, 1]")
    if not (0 <= n <= N):
        raise ValueError("n must be in [0, N]")
    if N == 0:
        raise ValueError("N must be > 0")

    # Observed proportion
    p_hat = n / N

    # Handle edge cases for r = 0 or r = 1
    if r == 0:
        # H0: p <= 0 means any n>0 rejects immediately
        return {
            "p_value": 0.0 if n > 0 else 1.0,
            "reject": (n > 0),
            "p_hat": p_hat,
            "method": "exact",
        }
    if r == 1:
        # H0: p <= 1 is always true unless we need perfect success
        return {
            "p_value": 1.0 if n < N else 0.0,
            "reject": False,
            "p_hat": p_hat,
            "method": "exact",
        }

    # Decide which test to use based on sample size
    use_exact = N < 30 or N * r * (1 - r) < 10

    if use_exact:
        # Exact binomial test
        # P-value = P(X >= n | X ~ Binomial(N, r))
        # = sum_{k=n}^{N} C(N,k) * r^k * (1-r)^(N-k)

        # Use log probabilities to avoid overflow
        from math import comb, exp, log

        # Compute log(P(X = k)) for k from n to N
        log_probs = []
        for k in range(n, N + 1):
            # log(C(N,k) * r^k * (1-r)^(N-k))
            log_prob = log(comb(N, k)) + k * log(r) + (N - k) * log(1 - r)
            log_probs.append(log_prob)

        # Use log-sum-exp trick for numerical stability
        max_log_prob = max(log_probs)
        p_value = sum(exp(lp - max_log_prob) for lp in log_probs) * exp(
            max_log_prob
        )

        # Clamp to [0, 1] due to numerical errors
        p_value = max(0.0, min(1.0, p_value))

        return {
            "p_value": p_value,
            "reject": p_value < alpha,
            "p_hat": p_hat,
            "method": "exact_binomial",
        }
    else:
        # Normal approximation with continuity correction
        # Standard error from null proportion r
        se = math.sqrt(r * (1 - r) / N)

        # Apply continuity correction: use (n - 0.5) instead of n
        # This gives better approximation for discrete -> continuous
        Z = ((n - 0.5) / N - r) / se

        # One-sided p-value: P(Z >= z)
        # For right-tailed test: P(Z >= z) = 1 - Φ(z) = 0.5 * erfc(z/sqrt(2))
        p_value = 0.5 * math.erfc(Z / math.sqrt(2))

        return {
            "p_value": p_value,
            "reject": p_value < alpha,
            "p_hat": p_hat,
            "method": "normal_approximation",
        }
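
For intuition, the exact-binomial branch is an upper-tail binomial sum; a self-contained sketch without the log-sum-exp guard the plugin uses for numerical stability (the function name is illustrative):

```python
from math import comb

def exact_binomial_p_value(r, n, N):
    """P(X >= n) for X ~ Binomial(N, r): the upper-tail p-value for H1: p > r."""
    return sum(comb(N, k) * r**k * (1 - r)**(N - k) for k in range(n, N + 1))

# 18 successes in 20 trials against a null proportion of 0.5
p = exact_binomial_p_value(0.5, 18, 20)
print(f"p-value: {p:.6f}", p < 0.05)  # strongly rejects H0: p <= 0.5
```

For small N this direct sum is fine; the plugin's log-space version matters when binomial terms would otherwise underflow.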

pytest_report_teststatus(report, config)

Customize terminal output for repeated tests.

Source code in pytest_repeated/plugin.py
def pytest_report_teststatus(report, config):
    """Customize terminal output for repeated tests."""
    if hasattr(report, "_repeated_summary") and report.when == "call":
        passes, total = report._repeated_summary

        # short progress character: use '+' for full pass, '.' otherwise
        short = "+" if passes == total else "."

        # verbose string shown in -v/-vv
        if hasattr(report, "_posterior_prob"):
            posterior_prob = report._posterior_prob
            success_rate_threshold = getattr(
                report, "_success_rate_threshold", 0.5
            )
            verbose = (
                f"PASSED (P(p>{success_rate_threshold}|tests)={posterior_prob:.3f})"
                if report.outcome == "passed"
                else f"FAILED (P(p>{success_rate_threshold}|tests)={posterior_prob:.3f})"
            )
        elif hasattr(report, "_p_value"):
            p_value = report._p_value
            verbose = (
                f"PASSED (p={p_value:.3f})"
                if report.outcome == "passed"
                else f"FAILED (p={p_value:.3f})"
            )
        else:
            verbose = (
                f"PASSED ({passes}/{total})"
                if report.outcome == "passed"
                else f"FAILED ({passes}/{total})"
            )

        # Return correct tuple shape
        return (report.outcome, short, verbose)
    return None  # use default

pytest_runtest_logreport(report)

Print detailed run results for -vvv.

Source code in pytest_repeated/plugin.py
import pytest  # needed for the hookimpl decorator

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_logreport(report):
    """Print detailed run results for -vvv."""
    _ = yield
    if hasattr(report, "_repeated_summary") and report.when == "call":
        if hasattr(report, "config"):
            config = report.config
            verbosity = config.option.verbose
            if verbosity >= 3 and hasattr(report, "_repeated_run_details"):
                tw = config.get_terminal_writer()
                tw.line()
                tw.sep("-", "repeated details")
                tw.line("Run-by-run results:")
                for run_num, status, error in report._repeated_run_details:
                    if status == "PASS":
                        tw.line(f"  Run {run_num}: {status}")
                    else:
                        # At -vvv, show full error without truncation
                        tw.line(f"  Run {run_num}: {status} - {error}")

Key Functions

pytest_runtest_call

Handles test repetition and error detection. This is the core hook that runs tests multiple times.

pytest_runtest_makereport

Applies statistical evaluation (basic/frequentist/Bayesian) to determine overall test pass/fail.

pytest_runtest_logreport

Displays run-by-run results at verbosity level 3 (-vvv).

Extension Points

pytest-repeated integrates with pytest's hook system. If you're extending or modifying the plugin, these are the key hooks:

  • pytest_configure: Plugin registration and configuration
  • pytest_runtest_protocol: Test execution protocol override
  • pytest_runtest_call: Individual test run execution
  • pytest_runtest_makereport: Test result reporting
  • pytest_runtest_logreport: Logging and verbosity handling

Statistical Methods

Wilson Score Interval (Frequentist)

pytest-repeated uses the Wilson score interval for constructing confidence intervals in frequentist testing. This method provides better coverage than the normal approximation, especially for small sample sizes.
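
The Wilson bounds are straightforward to compute directly; a minimal sketch (z = 1.96 for a 95% interval; the function name is illustrative, not the plugin's API):

```python
import math

def wilson_interval(n, N, z=1.96):
    """Wilson score confidence interval for a binomial proportion n/N."""
    p_hat = n / N
    denom = 1 + z**2 / N
    center = (p_hat + z**2 / (2 * N)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / N + z**2 / (4 * N**2))
    return center - half, center + half

lo, hi = wilson_interval(18, 20)
print(f"95% CI for 18/20: [{lo:.3f}, {hi:.3f}]")  # roughly [0.699, 0.972]
```

Note how the interval stays inside [0, 1] and does not collapse at n = 0 or n = N, which is where the plain normal approximation misbehaves.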

Beta-Binomial Model (Bayesian)

The Bayesian approach uses a Beta-Binomial conjugate prior model:

  • Prior: Beta(α, β) distribution over success rate θ
  • Likelihood: Binomial(n, θ) for observed successes
  • Posterior: Beta(α + successes, β + failures)

The test evaluates P(θ ≥ threshold | data) using the posterior CDF.
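
As a worked example of the conjugate update, assuming the default uniform Beta(1, 1) prior:

```python
# Uniform Beta(1, 1) prior; 18 successes and 2 failures observed
alpha, beta = 1 + 18, 1 + 2            # posterior is Beta(19, 3)
posterior_mean = alpha / (alpha + beta)
print(f"posterior mean success rate: {posterior_mean:.3f}")
```

The threshold test then asks how much posterior mass lies above the configured success rate, not just where the mean falls.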

Next Steps