Core Concepts¶
ShinkaEvolve works best when the task boundary is explicit: one initial program, one evaluator, one results directory, and a search loop that mutates code while preserving correctness.
Task Contract¶
Every runnable task centers on two files:
| File | Purpose |
|---|---|
| initial.<ext> | Seed program with mutable code regions |
| evaluate.py | Evaluation harness that executes candidates, validates outputs, and returns metrics |
At minimum, the evaluator surfaces a combined_score and whether the candidate
is functionally correct. This creates a stable interface for both the Hydra
launcher and the task-directory CLI.
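As a minimal sketch of that contract, an evaluator could look like the following. The `evaluate(program_path, results_dir)` signature, the metrics.json filename, and the trivial correctness check are all illustrative assumptions, not ShinkaEvolve's fixed API; a real harness would execute the candidate and validate its outputs.

```python
import json
import os


def evaluate(program_path: str, results_dir: str) -> dict:
    """Report metrics for one candidate (illustrative sketch).

    The keys mirror the contract above: a scalar combined_score
    plus a flag for functional correctness.
    """
    # Hypothetical check: the candidate file must exist and be non-empty.
    correct = os.path.isfile(program_path) and os.path.getsize(program_path) > 0

    metrics = {
        "combined_score": 1.0 if correct else 0.0,
        "correct": correct,
    }

    # Persist metrics so a launcher or CLI can pick them up.
    os.makedirs(results_dir, exist_ok=True)
    with open(os.path.join(results_dir, "metrics.json"), "w") as f:
        json.dump(metrics, f)
    return metrics
```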
Evolution Loop¶
Each generation follows the same loop:
- Sample a parent or inspiration context from the database and archive
- Ask one or more LLMs to propose a patch or full candidate program
- Validate and materialize the candidate
- Run the evaluator and collect metrics
- Persist the result into the program database
- Update archive state, island state, and optional meta-memory
The async runner overlaps proposal generation and evaluation to keep worker slots busy when LLM sampling is slower than the evaluator.
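The steps above can be condensed into a schematic generation function. Every class and function here is a toy stand-in for the corresponding ShinkaEvolve component, not the library's real interface; archive, island, and meta-memory updates are omitted.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Program:
    code: str
    score: float = 0.0


@dataclass
class ProgramDB:
    """Toy stand-in for the program database and archive."""
    programs: list = field(default_factory=list)

    def sample_parent(self) -> Program:
        # Simplified selection: best of a small random tournament.
        pool = random.sample(self.programs, min(3, len(self.programs)))
        return max(pool, key=lambda p: p.score)

    def add(self, program: Program) -> None:
        self.programs.append(program)


def propose(parent: Program) -> Program:
    # Placeholder for the LLM patch-proposal step.
    return Program(code=parent.code + "  # mutated")


def evaluate(candidate: Program) -> float:
    # Placeholder evaluator: longer code scores higher here.
    return float(len(candidate.code))


def run_generation(db: ProgramDB) -> Program:
    parent = db.sample_parent()            # 1. sample parent context
    candidate = propose(parent)            # 2. LLM proposes a candidate
    candidate.score = evaluate(candidate)  # 3-4. validate and evaluate
    db.add(candidate)                      # 5. persist into the database
    return candidate                       # 6. archive/island updates omitted
```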
Runtime Layers¶
The runtime splits into three major configuration domains:
| Domain | Controls |
|---|---|
| EvolutionConfig | Mutation behavior, model selection, budgets, prompt evolution, async proposal controls |
| DatabaseConfig | Archive sizing, island behavior, migration, parent selection, archive ranking |
| JobConfig variants | Where and how candidate programs are executed |
See Configuration for a field-level reference.
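Purely as an illustration of the split, the three domains can be pictured as separate config objects. The field names below are assumptions chosen for readability, not ShinkaEvolve's actual schema; consult Configuration for the real fields.

```python
from dataclasses import dataclass


@dataclass
class EvolutionConfig:
    # Mutation behavior, model selection, budgets (illustrative fields).
    num_generations: int = 100
    llm_models: tuple = ("gpt-4o-mini",)
    max_parallel_jobs: int = 4


@dataclass
class DatabaseConfig:
    # Archive sizing and island behavior (illustrative fields).
    archive_size: int = 40
    num_islands: int = 4
    migration_interval: int = 10


@dataclass
class LocalJobConfig:
    # Where and how candidates are executed (illustrative fields).
    eval_program_path: str = "evaluate.py"


# The runner would consume all three domains together.
config = (EvolutionConfig(), DatabaseConfig(), LocalJobConfig())
```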
Archives and Islands¶
The database stores both the current population and derived state used to guide future sampling:
- Global archive of high-value programs
- Island-local populations to preserve diversity
- Migration settings for cross-island transfer
- Parent and inspiration selection strategies
Together, these mechanisms let Shinka balance local exploitation with broader search over qualitatively different candidate programs.
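A toy sketch of the island mechanism: each island keeps its own population, and migration periodically copies top candidates to a neighboring island. The ring topology and raw-score populations are simplifications for illustration; real migration moves program records under the library's own settings.

```python
def migrate(islands: list[list[float]], k: int = 1) -> None:
    """Copy each island's top-k scores into the next island (ring topology).

    Illustrative only: preserves diversity by keeping islands separate
    while letting strong candidates spread gradually.
    """
    n = len(islands)
    # Snapshot the top-k of every island before mutating any of them.
    top = [sorted(island, reverse=True)[:k] for island in islands]
    for i in range(n):
        islands[(i + 1) % n].extend(top[i])


islands = [[0.1, 0.5], [0.9, 0.2], [0.4, 0.3]]
migrate(islands)
```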
Prompt Co-Evolution¶
Shinka can evolve not only programs but also the system prompts that generate them:
- System-prompt archive
- Prompt mutation operators
- Prompt sampling controls
- Prompt fitness tracking over time
Use prompt co-evolution when task quality depends heavily on mutation style or when different prompts unlock distinct improvement modes.
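Prompt fitness tracking can be pictured as fitness-weighted sampling over the system-prompt archive. This sketch is a deliberate simplification: the archive layout and weighting scheme are assumptions, not ShinkaEvolve's actual sampler.

```python
import random


def sample_prompt(archive: dict[str, float], rng: random.Random) -> str:
    """Pick a system prompt with probability proportional to its fitness.

    `archive` maps prompt text to a running fitness estimate
    (hypothetical structure).
    """
    prompts = list(archive)
    # Floor weights at a tiny positive value so no prompt is starved.
    weights = [max(archive[p], 1e-6) for p in prompts]
    return rng.choices(prompts, weights=weights, k=1)[0]
```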
Execution Modes¶
| Mode | Description |
|---|---|
| Local current interpreter | Default; runs in active Python env |
| Local sourced environment | Via activate_script |
| SLURM + Conda | Conda-backed cluster environments |
| SLURM + Docker | Container-backed cluster jobs |
This keeps the search loop decoupled from the execution environment of the candidate code.
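That decoupling means the search loop can hand the same candidate to different executors. A minimal dispatch sketch, with hypothetical mode names and only the local path implemented:

```python
import subprocess
import sys


def run_candidate(path: str, mode: str = "local") -> int:
    """Execute a candidate program under the chosen mode (sketch).

    Cluster modes would build a SLURM submission instead; the branch
    names here are illustrative, not the library's identifiers.
    """
    if mode == "local":
        # Default: run in the active Python interpreter's environment.
        return subprocess.run([sys.executable, path]).returncode
    if mode in ("slurm_conda", "slurm_docker"):
        raise NotImplementedError("cluster submission not shown in this sketch")
    raise ValueError(f"unknown mode: {mode}")
```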
Results and Inspection¶
Each run writes artifacts into a results directory:
- Persisted program records and metrics
- Candidate code snapshots and diffs
- Timing metadata
- Prompt-evolution artifacts when enabled
Inspect results through the built-in WebUI, notebooks in the examples directory, or custom post-processing scripts.
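For custom post-processing, a results directory can be scanned with a few lines of stdlib code. The per-candidate metrics.json layout here is an assumption for illustration, not a documented on-disk format.

```python
import json
from pathlib import Path


def best_candidate(results_dir: str):
    """Return (directory, score) of the highest-scoring candidate (sketch).

    Walks the results tree looking for hypothetical metrics.json files
    that carry a combined_score field.
    """
    best_path, best_score = None, float("-inf")
    for metrics_file in Path(results_dir).rglob("metrics.json"):
        record = json.loads(metrics_file.read_text())
        score = record.get("combined_score", float("-inf"))
        if score > best_score:
            best_path, best_score = metrics_file.parent, score
    return best_path, best_score
```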