Validate Before You Scale — What to Check Before a Large Cloud Sweep

Local validation before a cloud sweep matters for two distinct reasons, and they get conflated.

The first is fatal flaws: bugs that affect every task or a large fraction of them. A typo in a unit conversion, a return type that isn't a dict, a reference to a variable that doesn't exist in the enclosing scope, an import inside the function body that fails the first time the function is actually called. These reach the cloud less as individual failures than as a wasted submission. You spend setup time, push the package, kick off 500 tasks, and watch all 500 fail with the same error.

The second is correlated regional failures — bugs that only trigger in a specific corner of parameter space. Cloud sweeps isolate task failures, so the run completes and you get partial results, but the failures cluster geometrically: every configuration with fin_count > 20 crashes, every configuration where two parameters interact a certain way returns NaN. The sweep finishes, you pay for it, and the result has a structured hole exactly where you most wanted data. A 1000-task sweep that returns 850 successes and 150 failures all clustered in one corner is not a failed sweep — but it is an incomplete one, and the incompleteness is correlated with the geometry you were trying to explore.

local_sweep catches both classes. The fix is not to run more manual tests against hand-picked configurations. It is to sample the actual parameter space locally before submitting to the cloud, using the same expansion logic the cloud will use.
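To make the second class concrete, one rough way to tell scattered failures from clustered ones is to compare the range each parameter covers among the failed tasks with the range it covers across the whole sample. The sketch below is a minimal, library-independent illustration. It works on plain dicts of parameter values, and the helper name `failure_ranges` and the sample data are hypothetical, not part of combinate.

```python
def failure_ranges(all_params, failed_params):
    """For each parameter, compare the range covered by all tasks
    with the range covered by the failed tasks."""
    summary = {}
    if not failed_params:
        return summary
    for name in all_params[0]:
        all_vals = [p[name] for p in all_params]
        bad_vals = [p[name] for p in failed_params]
        full_span = max(all_vals) - min(all_vals)
        bad_span = max(bad_vals) - min(bad_vals)
        summary[name] = {
            "all": (min(all_vals), max(all_vals)),
            "failed": (min(bad_vals), max(bad_vals)),
            # Fraction of the sampled range that the failures occupy;
            # a small value means failures are confined to one region.
            "coverage": bad_span / full_span if full_span else 1.0,
        }
    return summary


# Hypothetical data: failures only occur at high fin_count.
all_params = [
    {"fin_count": n, "airflow_m_s": a}
    for n in (2, 8, 14, 20, 24)
    for a in (0.5, 4.0, 8.0)
]
failed_params = [p for p in all_params if p["fin_count"] > 20]

for name, info in failure_ranges(all_params, failed_params).items():
    print(name, info)
```

A parameter whose failed range is much narrower than its full range is a candidate for the axis along which the failures cluster. A parameter whose failures span the full range is probably unrelated to the bug.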

A worked example: heat sink design sweep

Suppose you are sizing an extruded aluminum heat sink for a forced-air application. You want to sweep fin geometry and airflow to find the lowest thermal resistance for a fixed 80mm base width. Here is a first-pass model:

● Python
import math

def heat_sink_resistance(
    fin_count: int,
    fin_height_mm: float,
    fin_thickness_mm: float,
    airflow_m_s: float,
) -> dict:
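    BASE_WIDTH_MM = 80.0
    K_AL = 205.0  # W/m-K, aluminum

    # Gap between adjacent fins across the base width
    spacing_mm = (BASE_WIDTH_MM - fin_count * fin_thickness_mm) / (fin_count - 1)
    # Convection coefficient (W/m^2-K) from a simple power-law correlation
    h = 12.0 * (airflow_m_s ** 0.6) * math.pow(spacing_mm / 5.0, 0.3)

    t_m = fin_thickness_mm / 1000.0
    L_m = fin_height_mm / 1000.0
    # Fin parameter m and fin efficiency tanh(mL) / (mL)
    m = math.sqrt(2 * h / (K_AL * t_m))
    efficiency = math.tanh(m * L_m) / (m * L_m)

    # Two faces per fin; the fin's other dimension is taken as the base width
    fin_area_m2 = fin_count * 2 * L_m * (BASE_WIDTH_MM / 1000.0)
    R = 1.0 / (h * fin_area_m2 * efficiency)
    return {"thermal_resistance_C_per_W": round(R, 4)}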

You want to run a 500-point Sobol sweep across the design space. Before submitting to the cloud, run it locally with 40 samples:

● Python
from combinate import local_sweep

result = local_sweep(
    heat_sink_resistance,
    params={
        "fin_count": {"type": "range", "min": 1, "max": 25},
        "fin_height_mm": {"type": "range", "min": 10, "max": 60},
        "fin_thickness_mm": {"type": "range", "min": 0.8, "max": 4.0},
        "airflow_m_s": {"type": "range", "min": 0.5, "max": 8.0},
    },
    sampling_spec={"method": "random", "sampler": "sobol", "samples": 40, "seed": 42},
    max_tasks=40,
)
print(result.describe())

Round 1: a divide-by-zero you never tested for

● Output
40 tasks: 36 succeeded, 4 failed

Four failures in 40 — but isolated, so the run finished. Inspect them:

● Python
for task in result.failed_tasks:
    print(task.parameter_values, "->", task.error_summary)

● Output
{'fin_count': 1, ...} -> ZeroDivisionError: float division by zero
{'fin_count': 1, ...} -> ZeroDivisionError: float division by zero
{'fin_count': 1, ...} -> ZeroDivisionError: float division by zero
{'fin_count': 1, ...} -> ZeroDivisionError: float division by zero

Four of the 40 Sobol points landed on fin_count=1, and the spacing formula divides by (fin_count - 1). A single fin is a real configuration, it just has no spacing, but the function assumes at least two fins. Two fixes are reasonable: raise the parameter floor to 2, or special-case the single-fin geometry. You set min: 2 because a single-fin heat sink is not in the design space you care about.

Round 2: a math domain error in a corner of the space

Re-run with fin_count floor at 2:

● Output
40 tasks: 31 succeeded, 9 failed

Worse. What happened?

● Python
for task in result.failed_tasks[:3]:
    print(task.parameter_values, "->", task.error_summary)

● Output
{'fin_count': 22, 'fin_thickness_mm': 3.7, ...} -> ValueError: math domain error
{'fin_count': 21, 'fin_thickness_mm': 3.9, ...} -> ValueError: math domain error
{'fin_count': 24, 'fin_thickness_mm': 3.8, ...} -> ValueError: math domain error

All failures cluster in the high-fin-count, high-fin-thickness corner. When fin_count * fin_thickness_mm exceeds the 80 mm base width, spacing goes negative, and the 0.3 power of a negative spacing ratio has no real value, so the task fails. This is exactly the kind of correlated gap described at the start: if you had skipped local validation and run 500 cloud tasks, roughly 110 (9 in 40 is about 22%) would have failed in this same corner, leaving a hole in the results exactly where high-density fin designs live.
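The arithmetic behind the failing corner is easy to check by hand. The sketch below recomputes the spacing for a few configurations and shows how Python treats a fractional power of a negative number: `math.pow` raises a `ValueError`, whereas the `**` operator returns a complex number instead of raising. The configurations are illustrative values chosen for this sketch, not output from the sweep.

```python
import math

BASE_WIDTH_MM = 80.0


def spacing_mm(fin_count, fin_thickness_mm):
    # Same spacing formula as the model above.
    return (BASE_WIDTH_MM - fin_count * fin_thickness_mm) / (fin_count - 1)


for n, t in [(10, 2.0), (22, 3.7), (24, 3.8)]:
    s = spacing_mm(n, t)
    try:
        factor = math.pow(s / 5.0, 0.3)
        status = f"ok, factor={factor:.3f}"
    except ValueError as exc:
        status = f"ValueError: {exc}"
    print(f"fin_count={n}, thickness={t} mm -> spacing={s:.3f} mm, {status}")

# The ** operator does not raise for a negative base; it yields a complex value.
print(type((-0.5) ** 0.3).__name__)
```

Either way the task cannot produce a real thermal resistance, so the right fix is to detect the infeasible geometry before the power is evaluated.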

The fix is geometric infeasibility handling. First attempt:

● Python
def heat_sink_resistance(fin_count, fin_height_mm, fin_thickness_mm, airflow_m_s):
    spacing_mm = (80.0 - fin_count * fin_thickness_mm) / (fin_count - 1)
    if spacing_mm < 0.5:
        return None
    # ... rest unchanged

Round 3: returning None breaks the result schema

Re-run:

● Output
40 tasks: 31 succeeded, 9 failed

Same numbers, but inspection shows a different error:

● Output
{'fin_count': 22, ...} -> TypeError: function did not return a dict

A task that returns None is counted as a failure, not a success. That is technically correct, since the task did not produce a result, but it pollutes failed_tasks with configurations that are not bugs, just out-of-bounds geometry. You want infeasible configs to succeed cleanly with a flag so downstream analysis can filter them.

● Python
def heat_sink_resistance(fin_count, fin_height_mm, fin_thickness_mm, airflow_m_s):
    spacing_mm = (80.0 - fin_count * fin_thickness_mm) / (fin_count - 1)
    if spacing_mm < 0.5:
        return {
            "thermal_resistance_C_per_W": float("nan"),
            "valid": False,
            "reason": "geometric_infeasible",
        }
    # ... rest unchanged, plus "valid": True in success return

Using float("nan") rather than None for the resistance field keeps that field a float in every row, so the pandas column stays numeric and downstream arithmetic never has to special-case None.
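Put together, the function after the first three rounds might look like the sketch below. It only assembles the fragments shown so far into one runnable definition. The constant name `MIN_SPACING_MM` is introduced here for readability, and the function still relies on the fin_count floor of 2 from round 1 rather than guarding against a single fin itself.

```python
import math

BASE_WIDTH_MM = 80.0
K_AL = 205.0  # W/m-K, aluminum
MIN_SPACING_MM = 0.5  # feasibility threshold from the check above


def heat_sink_resistance(fin_count, fin_height_mm, fin_thickness_mm, airflow_m_s):
    # Assumes fin_count >= 2, enforced by the parameter floor.
    spacing_mm = (BASE_WIDTH_MM - fin_count * fin_thickness_mm) / (fin_count - 1)
    if spacing_mm < MIN_SPACING_MM:
        # Infeasible geometry: succeed with a flag instead of raising.
        return {
            "thermal_resistance_C_per_W": float("nan"),
            "valid": False,
            "reason": "geometric_infeasible",
        }

    h = 12.0 * (airflow_m_s ** 0.6) * ((spacing_mm / 5.0) ** 0.3)

    t_m = fin_thickness_mm / 1000.0
    L_m = fin_height_mm / 1000.0
    m = math.sqrt(2 * h / (K_AL * t_m))
    efficiency = math.tanh(m * L_m) / (m * L_m)

    fin_area_m2 = fin_count * 2 * L_m * (BASE_WIDTH_MM / 1000.0)
    R = 1.0 / (h * fin_area_m2 * efficiency)
    return {"thermal_resistance_C_per_W": round(R, 4), "valid": True}


print(heat_sink_resistance(10, 40.0, 2.0, 3.0))  # feasible geometry
print(heat_sink_resistance(24, 40.0, 3.8, 3.0))  # fins wider than the base
```

Because the feasibility check runs before the power-law term, the spacing passed to it is always at least 0.5 mm, so the negative-base problem from round 2 cannot occur.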

Round 4: clean run, sanity-check the distribution

● Output
40 tasks: 40 succeeded, 0 failed

Now look at what came back:

● Python
import pandas as pd

rows = [
    {**task.parameter_values, **task.inline_output}
    for task in result.succeeded_tasks
]
df = pd.DataFrame(rows)
print(df["valid"].value_counts())
print(df[df["valid"]]["thermal_resistance_C_per_W"].describe())

● Output
True     31
False     9

count    31.000000
mean      0.847
std       0.412
min       0.231
max       1.943

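9 of the 40 samples (about 22%) are flagged as geometrically infeasible, the same nine that came back as failures in round 3. Check that share against what the bounds imply before accepting it. The valid configurations span thermal resistances from 0.23 to 1.94 °C/W, which matches the rough range you'd estimate for a heat sink this size, and the spread (standard deviation 0.41 around a mean of 0.85) shows the output responds to the inputs rather than collapsing to a constant. The distribution looks sane.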

You are ready to submit the full 500-sample sweep to the cloud.

The upgrade is one word

● Python
from combinate import sweep, CombinateConfig

result = sweep(                    # was: local_sweep
    heat_sink_resistance,
    params={
        "fin_count": {"type": "range", "min": 2, "max": 25},
        "fin_height_mm": {"type": "range", "min": 10, "max": 60},
        "fin_thickness_mm": {"type": "range", "min": 0.8, "max": 4.0},
        "airflow_m_s": {"type": "range", "min": 0.5, "max": 8.0},
    },
    sampling_spec={"method": "random", "sampler": "sobol", "samples": 500, "seed": 42},
    config=CombinateConfig(project_id="your-project-id"),
)

Same function. Same params. The sampling spec differs only in samples, raised from 40 to 500. Beyond the function name, the only addition is config, and the local max_tasks cap is gone. The SweepResult you get back has the same shape as the local one (succeeded_tasks, failed_tasks, task.parameter_values, task.inline_output), so any analysis code you wrote against the local result runs unchanged on the cloud result.

Three rounds of local iteration eliminated three classes of failure — a divide-by-zero at a parameter boundary, a math domain error in a correlated corner, and a return-type contract mismatch — that would otherwise have shown up at scale, costing real cloud time and producing results with structured holes.

The things local_sweep does not catch

Being honest about the limits: local_sweep runs your function in-process with no isolation. A few things it will not catch:

Import failures on the worker image. If your function imports a package that is not installed in the cloud worker environment, that will fail in the cloud but not locally.
Memory or time limits. Cloud tasks run in containers with capped memory and a wall-clock timeout. A function that runs fine locally for 2 seconds might hit the container limit at scale if the parameter combination triggers a slower code path.
Serialization issues. The cloud path pickles your function and parameter set to ship it to the worker. If your function closes over an object that cannot be pickled, that fails at submission time, not at execution time — and local_sweep will not trigger it because it calls the function directly.
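The serialization case can be partly pre-checked locally. The sketch below uses the standard library's pickle module to test each object a function closes over. It is an approximation, assuming the cloud path's serializer rejects the same objects that pickle does, which may not hold if it uses a different pickling library. The helper name `unpicklable_closure_vars` is hypothetical and not part of combinate.

```python
import pickle
import threading


def unpicklable_closure_vars(fn):
    """Return {variable name: error} for closed-over objects that pickle rejects."""
    names = fn.__code__.co_freevars
    cells = fn.__closure__ or ()
    problems = {}
    for name, cell in zip(names, cells):
        try:
            pickle.dumps(cell.cell_contents)
        except Exception as exc:  # pickle raises several exception types
            problems[name] = f"{type(exc).__name__}: {exc}"
    return problems


def make_model(scale, lock):
    # The returned function closes over both `scale` and `lock`.
    def model(x):
        with lock:
            return {"y": scale * x}
    return model


model = make_model(2.0, threading.Lock())
print(unpicklable_closure_vars(model))
```

This only inspects closure variables. Module-level globals the function refers to are not covered, so a small cloud run remains the real test.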

None of these are reasons to skip local validation. They are reasons to think of local_sweep as the first of two checks, not a replacement for a small real cloud run before the big one.

A reasonable workflow: local sweep with 50 samples, then cloud sweep with 25 samples as a paid smoke test, then cloud sweep at full scale. The local step is free and catches function-level issues. The small cloud step catches environment and serialization issues. The full run then has a much higher probability of succeeding cleanly.