Using AI Coding Agents To Explore Parameter Spaces Without Losing Control

What Agents Are Naturally Good At

An AI coding agent given a simulation function will do something reasonable: look at the function signature, propose a set of parameter ranges, write a loop, and start suggesting how to analyze the output. That part works.

The instinct is correct. A parameter sweep is mechanical: enumerate the space, call the function, collect results. An agent can construct that structure faster than a human writing it by hand, and it can usually propose ranges that cover the interesting region of the space without extensive manual thought.

Here is what that looks like with a structural analysis function:

● Python
import combinate as cb
import numpy as np

# agent-constructed sweep over a stress simulation
result = cb.sweep(
    simulate_stress,
    params={
        "load_n":    np.linspace(100.0, 5000.0, 10).tolist(),
        "thickness": [2.0, 4.0, 6.0, 8.0, 10.0],
        "material":  ["steel", "aluminum", "titanium"],
    },
)

print(result.describe())

One hundred fifty combinations (10 loads × 5 thicknesses × 3 materials), declarative parameter spec, structured result. The agent did not need to write a loop, manage concurrency, or decide how to store the outputs.

Where The Naive Version Breaks

The infrastructure problems that bite individual engineers hit agentic workflows harder, not less.

An agent running unsupervised can generate ten parameter sweep attempts in the time a human would run one. If those sweeps are local Python loops:

Each one runs synchronously, blocking the agent from reasoning about results until it finishes
Results are in memory and disappear when the process ends
If the agent is restarted, or the session times out, there is no record of what ran
There is no cost accounting — the agent has no concept of how many compute-hours it is spending

The agent's productivity advantage turns into a liability. It generates more experiments, accumulates more invisible technical debt, and produces a pile of ephemeral results with no traceable record of which inputs produced which outputs.

The Right Split Of Responsibility

The correct division is not "agent controls everything" or "human controls everything." It is a specific split based on what each party is better positioned to handle:

The human authors the model. The simulation function — its assumptions, its physical or mathematical validity, its output contract — is not something an agent should be writing or modifying autonomously. It encodes domain knowledge and its correctness can't be verified by running it against more test inputs.

The agent specifies the parameter space. Given the function signature and some constraints, an agent can construct a reasonable parameter space: sensible ranges, log-scaling for parameters that span orders of magnitude, representative discrete values for categorical inputs. This is mechanical and mostly correct.
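
As an illustration of the log-scaling point, the sketch below builds a log-spaced range using only the standard library. The helper name `log_spaced` and the `stiffness` parameter are hypothetical and not part of Combinate; `numpy.geomspace` does the same job if NumPy is already in use.

```python
import math

def log_spaced(low: float, high: float, count: int) -> list[float]:
    """Return `count` values from low to high, evenly spaced in log10 space."""
    if low <= 0 or high <= 0:
        raise ValueError("log spacing needs strictly positive bounds")
    if count < 2:
        raise ValueError("count must be at least 2")
    log_low, log_high = math.log10(low), math.log10(high)
    step = (log_high - log_low) / (count - 1)
    return [10 ** (log_low + i * step) for i in range(count)]

# Hypothetical parameter spanning three orders of magnitude:
# each value is a constant factor larger than the previous one.
stiffness = log_spaced(1.0, 1000.0, 4)
print([round(value, 6) for value in stiffness])
```

A linear range over the same interval would put almost every sample in the top decade, which is why a log-spaced proposal is usually the better default for such parameters.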

The platform executes and records. Every combination runs as an independent task with a durable sweep identifier. Failures are isolated and inspectable. The record exists independently of the session that submitted it.
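
What "executes and records" means can be shown with a local stand-in. The sketch below is not Combinate's implementation. It is a standard-library approximation in which `run_recorded_sweep`, the `swp_` ID format, the JSON Lines layout, and the toy `simulate` function are all assumptions made for illustration.

```python
import itertools
import json
import os
import tempfile
import uuid

def run_recorded_sweep(func, params, log_path):
    """Run every combination as an isolated task; append one JSON record per task."""
    sweep_id = f"swp_{uuid.uuid4().hex[:12]}"  # hypothetical ID format
    names = list(params)
    with open(log_path, "a", encoding="utf-8") as log:
        for index, values in enumerate(itertools.product(*params.values())):
            kwargs = dict(zip(names, values))
            record = {"sweep_id": sweep_id, "task_id": index, "parameter_values": kwargs}
            try:
                record["output"] = func(**kwargs)
                record["status"] = "succeeded"
            except Exception as exc:  # one bad combination must not stop the rest
                record["status"] = "failed"
                record["error_summary"] = f"{type(exc).__name__}: {exc}"
            log.write(json.dumps(record) + "\n")
            log.flush()  # push each record to the file as soon as the task ends
    return sweep_id

def simulate(load_n, thickness):
    # Toy stand-in for a human-authored model; rejects non-physical input.
    if thickness <= 0:
        raise ValueError("thickness must be positive")
    return {"stress": load_n / thickness}

log_path = os.path.join(tempfile.mkdtemp(), "sweeps.jsonl")
sweep_id = run_recorded_sweep(
    simulate,
    {"load_n": [100.0, 200.0], "thickness": [0.0, 2.0]},
    log_path,
)

# The record is read back from disk, not from in-memory state.
with open(log_path, encoding="utf-8") as log:
    records = [json.loads(line) for line in log]
failed = [r for r in records if r["status"] == "failed"]
print(sweep_id, len(records), "tasks,", len(failed), "failed")
```

The point of the sketch is the shape of the record: inputs, outcome, and sweep identifier travel together, and a failing combination leaves an inspectable entry instead of aborting the run.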

The human interprets results that drive real decisions. The agent can surface the most interesting tasks, flag failures, and compute summary statistics. The decision about whether an output is good enough to act on belongs with the human.

What The Agent Gets Back

The SweepResult is designed to be agent-friendly. Everything is structured; nothing requires string parsing:

● Python
import pandas as pd

# `result` is the SweepResult returned by cb.sweep above

# inspect failures without manual log parsing
for task in result.failed_tasks:
    print(task.task_id, task.error_summary)

# build an analysis frame: one row per succeeded task,
# with input parameters and outputs side by side
rows = [
    {**task.parameter_values, **task.inline_output}
    for task in result.succeeded_tasks
]
df = pd.DataFrame(rows)

The agent does not have to correlate results back to inputs, because each task already carries both. It does not have to parse log output to find partial failures either: failed_tasks and succeeded_tasks are already separated.

Why Sweep IDs Matter Here Specifically

In a single-session human workflow, a sweep ID is a nice-to-have. In an agentic workflow, it is the primary mechanism for accountability.

● Python
print(result.sweep_id)
# → swp_01j8ktx3p2...

Every submission produces a stable identifier that:

Survives session boundaries. If the agent process restarts, the sweep that was already submitted continues running. The ID can be logged and referenced on the next startup.
Creates an audit trail. Which agent session submitted which sweep, with what parameters, at what time — all traceable from the ID.
Enables cost attribution. A sweep ID is the natural unit for compute cost tracking. You can see exactly what was spent on each exploration rather than looking at aggregate machine costs.

Without durable identifiers, agentic exploration generates invisible compute spend with no way to answer "what did we actually learn from that run?"
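
On the agent side, taking advantage of a durable ID can be as simple as writing it down at submission time. The sketch below keeps a local append-only ledger so that a restarted session can recover what was already submitted. The function names, the ledger fields, and the example IDs are hypothetical placeholders, not Combinate's API or ID format.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

def record_submission(ledger_path, sweep_id, session, params):
    """Append one audit entry per submitted sweep."""
    entry = {
        "sweep_id": sweep_id,
        "session": session,
        "params": params,
        "submitted_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(ledger_path, "a", encoding="utf-8") as ledger:
        ledger.write(json.dumps(entry) + "\n")

def load_submissions(ledger_path):
    """Read back every recorded sweep, e.g. when a restarted agent starts up."""
    if not os.path.exists(ledger_path):
        return []
    with open(ledger_path, encoding="utf-8") as ledger:
        return [json.loads(line) for line in ledger if line.strip()]

ledger_path = os.path.join(tempfile.mkdtemp(), "sweep_ledger.jsonl")
record_submission(ledger_path, "swp_example_1", "agent-session-a", {"thickness": [2.0, 4.0]})
record_submission(ledger_path, "swp_example_2", "agent-session-a", {"thickness": [6.0, 8.0]})

# A later session recovers the IDs without any in-memory state.
known_ids = [entry["sweep_id"] for entry in load_submissions(ledger_path)]
print(known_ids)
```

Each ledger line answers the audit questions above (which session, which parameters, when), and the ID is the key for looking up results or cost on the platform side.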

What Should Not Be Autonomous

Being explicit about the boundary:

The model function should not change autonomously. An agent that rewrites the simulation between sweeps based on intermediate results is not exploring a parameter space — it is changing the experiment mid-run. The results from before and after the change are not comparable.
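
One way to make an unintended model change visible is to fingerprint the function before each sweep and refuse to compare results across different fingerprints. The sketch below hashes the function's compiled bytecode, constants, and referenced names. The `model_fingerprint` helper is hypothetical, and the approach has known limits: it only makes sense within one Python version, it does not see changes in helpers the model calls, and it is unstable for functions that contain nested functions.

```python
import hashlib

def model_fingerprint(func) -> str:
    """Short hash of a function's bytecode, constants, and referenced names."""
    code = func.__code__
    payload = (
        code.co_code
        + repr(code.co_consts).encode()
        + repr(code.co_names).encode()
    )
    return hashlib.sha256(payload).hexdigest()[:16]

def simulate_v1(load_n, thickness):
    return load_n / thickness

def simulate_v1_copy(load_n, thickness):
    return load_n / thickness

def simulate_v2(load_n, thickness):
    return load_n / (thickness * 1.1)  # a changed modelling assumption

same = model_fingerprint(simulate_v1) == model_fingerprint(simulate_v1_copy)
changed = model_fingerprint(simulate_v1) != model_fingerprint(simulate_v2)
print("identical logic matches:", same)
print("modified logic detected:", changed)
```

Stored alongside each sweep ID, a fingerprint like this lets a reviewer confirm that two sweeps ran against the same model before comparing their outputs.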

Budget and quota decisions should require human approval. An agent can observe that a wider parameter range would improve coverage. It should not unilaterally decide to run a sweep ten times larger than what was originally scoped.
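
A budget gate of this kind can be a few lines that run before any submission. The sketch below counts the combinations in a full-factorial grid and stops when the count exceeds a human-approved limit. The names `grid_size`, `check_budget`, and `BudgetExceeded`, as well as the limit of 200 tasks, are assumptions for illustration; the scoped grid mirrors the stress-simulation example from earlier in this article.

```python
import math

class BudgetExceeded(Exception):
    """Raised when a proposed sweep is larger than the human-approved limit."""

def grid_size(params):
    """Number of combinations in a full-factorial grid."""
    return math.prod(len(values) for values in params.values())

def check_budget(params, approved_max_tasks):
    """Return the grid size, or raise if it exceeds the approved limit."""
    size = grid_size(params)
    if size > approved_max_tasks:
        raise BudgetExceeded(
            f"sweep needs {size} tasks; approved limit is {approved_max_tasks}"
        )
    return size

# The originally scoped grid: ten loads, five thicknesses, three materials.
scoped = {
    "load_n": [100.0 + i * 4900.0 / 9 for i in range(10)],
    "thickness": [2.0, 4.0, 6.0, 8.0, 10.0],
    "material": ["steel", "aluminum", "titanium"],
}
# An agent-proposed refinement with ten times as many load values.
widened = {**scoped, "load_n": [100.0 + i * 4900.0 / 99 for i in range(100)]}

print("scoped sweep size:", check_budget(scoped, approved_max_tasks=200))
try:
    check_budget(widened, approved_max_tasks=200)
except BudgetExceeded as exc:
    print("needs human approval:", exc)
```

The agent stays free to propose the wider grid; the gate only ensures the proposal reaches a human before it reaches the compute budget.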

Results that drive production changes need human sign-off. A sweep that finds an optimal parameter set is an input to a decision, not the decision itself. The agent can surface the finding; a human should decide whether to act on it.

Combinate's execution model supports this division naturally: the agent submits, the platform executes within the plan's quota, and the result is a structured record that a human can inspect before any downstream action is taken.

Combinate is currently in private beta. If you are building agent-assisted simulation workflows in Python, join the beta list.