2026-04-28

SciPy QMC Is Great Locally. What Do You Add When You Need Persistence And Governance?

SciPy's quasi-random samplers generate better parameter coverage than grid or uniform random search. What they do not give you is execution, persistence, or any record of what ran. Here is what you add when that starts to matter.

What SciPy QMC Does Well

scipy.stats.qmc is a genuinely good module. Halton sequences, scrambled Sobol sequences, and Latin hypercube sampling all produce better parameter space coverage than either a grid or uniform random draws — the points are more evenly spread, less prone to clumping, and require fewer samples to achieve a given coverage density.

For a three-parameter model with bounded continuous inputs, the setup is straightforward:

Python
from scipy.stats import qmc
import numpy as np

sampler = qmc.Halton(d=3, scramble=True)
sample = sampler.random(n=200)

# scale to your parameter ranges
l_bounds = [1e-3, 0.1, 0.0]
u_bounds = [10.0, 5.0, 1.0]
scaled = qmc.scale(sample, l_bounds, u_bounds)

# scaled is a (200, 3) array — alpha, beta, gamma values

Two hundred joint samples, well distributed. The math here is correct and the ergonomics are reasonable.

What You Still Have To Build

That (200, 3) array is data. It is not a sweep.

Getting from that array to actual model outputs means writing everything that turns a list of parameter combinations into a collection of executed, recorded results:

Execution. You need to call your model for each of the 200 rows. A loop works. If the model takes five seconds per evaluation, that is about 17 minutes single-threaded. If you parallelize, you now own subprocess management, exception propagation, and result ordering.
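As a rough sketch of the parallel case, the standard library's ThreadPoolExecutor can fan the rows out while keeping outputs in input order. The simulate function here is a hypothetical stand-in, and threads only help if the model waits on I/O or releases the GIL; a CPU-bound model would need processes instead, which brings the pickling and startup concerns mentioned above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

from scipy.stats import qmc


def simulate(params):
    # Hypothetical stand-in model: unpack one row and pretend to do work.
    alpha, beta, gamma = params
    time.sleep(0.01)
    return alpha * beta + gamma


sample = qmc.Halton(d=3, scramble=True, seed=42).random(n=200)
scaled = qmc.scale(sample, [1e-3, 0.1, 0.0], [10.0, 5.0, 1.0])

# pool.map yields results in the same order as the input rows,
# so outputs[i] corresponds to scaled[i] even though work overlaps.
with ThreadPoolExecutor(max_workers=16) as pool:
    outputs = list(pool.map(simulate, scaled))
```

Note that pool.map re-raises the first exception it encounters when you iterate the results, so this version still stops collecting at the first failing row.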

Failure handling. If row 147 throws an exception, a naive loop stops or silently drops the result depending on how you wrote it. You need to decide: catch and continue, or let it propagate? Either way, you write the logic.
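A minimal catch-and-continue version might look like the following. The model, its failure condition, and the record fields (run, status, output, error) are all made up for illustration; the point is only that every row ends up with a status, whether it succeeded or not.

```python
def simulate(alpha, beta, gamma):
    # Hypothetical model that fails in one region of the parameter space.
    if beta < 0.2:
        raise ZeroDivisionError("beta too small")
    return alpha / beta + gamma


def run_all(rows):
    """Evaluate every row, recording a status instead of stopping on errors."""
    records = []
    for i, row in enumerate(rows):
        try:
            output = simulate(*row)
            records.append({"run": i, "status": "ok", "output": output, "error": None})
        except Exception as exc:
            # Keep the failure next to its row index so it can be inspected later.
            message = f"{type(exc).__name__}: {exc}"
            records.append({"run": i, "status": "failed", "output": None, "error": message})
    return records


rows = [(1.0, 0.5, 0.1), (2.0, 0.1, 0.3), (3.0, 1.5, 0.0)]
records = run_all(rows)
failed = [r for r in records if r["status"] == "failed"]
```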

Persistence. Your results are in memory. Restart the kernel — results are gone. Close the terminal — results are gone. If you want the outputs to outlive the process that ran them, you write them to disk, to a database, or to some timestamp-prefixed results directory that accumulates over the course of a project.

Traceability. Each result needs to be mapped back to the parameter combination that produced it. If you are building a DataFrame after the fact, this is bookkeeping you manage manually. If the sweep ran in parallel, you need to ensure ordering is correct.

Retrieval. Six months from now, or when a teammate asks what parameters produced that particular output, you need a way to look up the record. That means either you built a lookup system at the start or you are searching log files.
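One common do-it-yourself answer to persistence, traceability, and retrieval together is a JSON Lines file where each line stores a run's inputs next to its output. The sketch below assumes a hypothetical record layout and file name, and uses a two-row sweep with made-up outputs to stay short.

```python
import json
import tempfile
from pathlib import Path

PARAM_NAMES = ("alpha", "beta", "gamma")


def save_records(path, params, outputs):
    """Append one JSON line per run, storing inputs and output together."""
    with open(path, "a", encoding="utf-8") as fh:
        for i, (row, out) in enumerate(zip(params, outputs)):
            record = {
                "run": i,
                "params": dict(zip(PARAM_NAMES, map(float, row))),
                "output": out,
            }
            fh.write(json.dumps(record) + "\n")


def load_records(path):
    """Read every run back as a list of dicts."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh]


# Illustrative data only: two parameter rows and invented outputs.
params = [(0.5, 1.0, 0.2), (2.0, 3.0, 0.9)]
outputs = [{"score": 0.4}, {"score": 0.6}]

path = Path(tempfile.mkdtemp()) / "sweep.jsonl"
save_records(path, params, outputs)

# Later, possibly in a different process: which parameters gave the best score?
records = load_records(path)
best = max(records, key=lambda r: r["output"]["score"])
```

This works for one person and one file. It does not say which code version produced the file, where the file lives, or how a teammate finds it.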

None of this is hard in isolation. Across a project, across a team, it accumulates.

What Combinate Does Instead

Combinate is not a QMC replacement. Its Halton, Sobol, and Latin hypercube sampling is powered by scipy.stats.qmc under the hood — the same library, the same engines. The point is not which tool generates the samples. The point is what happens after.

The equivalent sweep submitted through Combinate looks like this:

Python
import combinate as cb

def simulate(alpha: float, beta: float, gamma: float) -> dict:
    # your model — nothing changes here
    ...

result = cb.sweep(
    simulate,
    params={
        "alpha": {"type": "range", "min": 1e-3, "max": 10.0},
        "beta":  {"type": "range", "min": 0.1,  "max": 5.0},
        "gamma": {"type": "range", "min": 0.0,  "max": 1.0},
    },
    sampling_spec={
        "method": "random",
        "sampler": "halton",
        "samples": 200,
        "seed": 42,
    },
)

print(result.describe())

The parameter space is specified as bounds rather than a pre-generated array. The sampling happens inside the platform with the same Halton sequence you would get from SciPy. The execution fans out across cloud workers.

What This Changes In Practice

Each combination is an independent task. A failure at parameter combination 147 does not abort the remaining 199. Every task has a status, and failed tasks are inspectable without affecting the rest of the sweep:

Python
for task in result.failed_tasks:
    print(task.task_id, task.error_summary)

The sweep has a durable identifier. Every submission gets a sweep_id that exists independently of the notebook or process that submitted it. The record persists after the kernel restarts:

Python
print(result.sweep_id)
# → swp_01j8ktx3p2...

You can retrieve the same result from a different machine, share it with a teammate, or reference it in a bug report. The sweep is a record, not an in-memory object.

Outputs come back as structured Python. You do not have to correlate results back to inputs manually — each task carries its parameter values and outputs together:

Python
import pandas as pd

rows = [
    {**task.parameter_values, **task.inline_output}
    for task in result.succeeded_tasks
]
df = pd.DataFrame(rows)

No manual bookkeeping. No alignment logic. The structure is there because it was designed in, not bolted on after.

Where SciPy QMC Still Belongs

To be direct: if you are using SciPy QMC today, you do not have to stop. In several cases it is still the right tool to reach for:

Custom distributions. scipy.stats.qmc composes with other SciPy distributions cleanly. If your parameters are log-normally distributed, come from a truncated distribution, or are correlated through a copula, SciPy gives you the transformation machinery. Combinate's range spec handles linear and log-scaled uniform bounds — more complex distributions are not in scope.
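For example, pushing unit-cube QMC points through a distribution's inverse CDF (ppf) gives low-discrepancy samples from that distribution. This sketch assumes a log-normal alpha and a truncated-normal beta; the parameter names and distribution settings are illustrative only.

```python
from scipy.stats import lognorm, qmc, truncnorm

# 2**8 = 256 scrambled Sobol points in the unit square.
sampler = qmc.Sobol(d=2, scramble=True, seed=42)
u = sampler.random_base2(m=8)

# Map each uniform column through the inverse CDF of the target distribution.
alpha = lognorm(s=0.5, scale=1.0).ppf(u[:, 0])                   # median 1.0
beta = truncnorm(a=-2, b=2, loc=1.0, scale=0.25).ppf(u[:, 1])    # support [0.5, 1.5]
```

In truncnorm, a and b are expressed in standard deviations from loc, which is why bounds of -2 and 2 with scale 0.25 give a support of 0.5 to 1.5.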

Validation and visualization outside the sweep. Generating a reference QMC sample to plot coverage, compare sampling strategies, or check against analytic expectations is a different task from running a sweep. SciPy is the right tool for that analysis.
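A quick way to compare strategies numerically is qmc.discrepancy, which measures how far a point set deviates from perfectly even coverage of the unit cube; lower values mean more even coverage. The sample size, dimension, and seeds below are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import qmc

n, d = 256, 3
rng = np.random.default_rng(0)

samples = {
    "uniform random": rng.random((n, d)),
    "halton": qmc.Halton(d=d, scramble=True, seed=0).random(n),
    "sobol": qmc.Sobol(d=d, scramble=True, seed=0).random_base2(m=8),
    "latin hypercube": qmc.LatinHypercube(d=d, seed=0).random(n),
}

# Centered discrepancy (the default method) for each point set.
discrepancies = {name: qmc.discrepancy(pts) for name, pts in samples.items()}

for name, value in sorted(discrepancies.items(), key=lambda kv: kv[1]):
    print(f"{name:>16}: {value:.2e}")
```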

QMC-enhanced Monte Carlo integration. Using quasi-random points to estimate integrals is a specific numerical technique, entirely distinct from parameter sweeps. That is not what Combinate is for.
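As a small illustration of that technique, the mean of a function over scrambled Sobol points estimates its integral over the unit cube. The integrand here is chosen only because its exact integral, (e - 1)^3, is known in closed form.

```python
import numpy as np
from scipy.stats import qmc


def f(x):
    # Integrand exp(x1 + x2 + x3), evaluated row-wise on an (n, 3) array.
    return np.exp(x.sum(axis=1))


exact = (np.e - 1.0) ** 3

# 2**12 = 4096 scrambled Sobol points; powers of two keep the sequence balanced.
points = qmc.Sobol(d=3, scramble=True, seed=0).random_base2(m=12)
qmc_estimate = f(points).mean()
qmc_error = abs(qmc_estimate - exact)

print(f"estimate={qmc_estimate:.6f} exact={exact:.6f} error={qmc_error:.2e}")
```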

If your goal is to run a parameter study, record every result durably, and retrieve it later as structured data — that is where Combinate fits and where a SciPy array by itself leaves you doing infrastructure work.


Combinate is currently in private beta. If you are running parameter studies in Python and want structured cloud execution, join the beta list.
