Core Runtime API

Primary runtime objects for constructing and running an evolution loop from Python.


EvolutionConfig

Controls mutation behavior, model selection, budgets, prompt evolution, and async proposal targeting.

EvolutionConfig dataclass

EvolutionConfig(
    task_sys_msg: Optional[str] = DEFAULT_TASK_SYS_MSG,
    patch_types: List[str] = default_patch_types(),
    patch_type_probs: List[float] = default_patch_type_probs(),
    num_generations: int = 50,
    max_patch_resamples: int = 3,
    max_patch_attempts: int = 1,
    job_type: str = "local",
    language: str = "python",
    llm_models: List[str] = default_llm_models(),
    llm_dynamic_selection: Optional[Union[str, BanditBase]] = "ucb",
    llm_dynamic_selection_kwargs: dict = default_llm_dynamic_selection_kwargs(),
    llm_kwargs: dict = default_llm_kwargs(),
    meta_rec_interval: Optional[int] = 10,
    meta_llm_models: Optional[List[str]] = None,
    meta_llm_kwargs: dict = {},
    meta_max_recommendations: int = 5,
    sample_single_meta_rec: bool = True,
    embedding_model: Optional[str] = "text-embedding-3-small",
    init_program_path: Optional[str] = "initial.py",
    results_dir: Optional[str] = None,
    max_novelty_attempts: int = 3,
    code_embed_sim_threshold: float = 0.99,
    novelty_llm_models: Optional[List[str]] = None,
    novelty_llm_kwargs: dict = {},
    use_text_feedback: bool = False,
    max_api_costs: Optional[float] = None,
    inspiration_sort_order: str = "ascending",
    enable_controlled_oversubscription: bool = True,
    proposal_target_mode: str = "adaptive",
    proposal_target_min_samples: int = 5,
    proposal_target_ratio_cap: float = 2.0,
    proposal_buffer_max: int = 2,
    proposal_target_hard_cap: Optional[int] = None,
    proposal_target_ewma_alpha: float = 0.3,
    evolve_prompts: bool = False,
    prompt_patch_types: List[str] = default_prompt_patch_types(),
    prompt_patch_type_probs: List[float] = default_prompt_patch_type_probs(),
    prompt_evolution_interval: Optional[int] = None,
    prompt_archive_size: int = 10,
    prompt_llm_models: Optional[List[str]] = None,
    prompt_llm_kwargs: dict = {},
    prompt_ucb_exploration_constant: float = 1.0,
    prompt_epsilon: float = 0.1,
    prompt_evo_top_k_programs: int = 3,
    prompt_percentile_recompute_interval: int = 20,
)
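
As a sketch, a config that overrides a few of the defaults above might look like this. Field names and defaults are taken from the signature; the `shinka.core` import path is an assumption based on the note at the bottom of this page.

```python
# Hypothetical usage sketch; the import path is an assumption.
from shinka.core import EvolutionConfig

evo_config = EvolutionConfig(
    num_generations=100,    # evolve for longer than the default 50
    max_patch_resamples=5,  # retry failed patches more often
    max_api_costs=25.0,     # stop once this API budget is exhausted
    evolve_prompts=True,    # also evolve the system prompts
)
```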

ShinkaEvolveRunner

Main async runtime. Coordinates proposal generation, evaluation submission, persistence, and side-effect handling.

ShinkaEvolveRunner

ShinkaEvolveRunner(
    evo_config: EvolutionConfig,
    job_config: JobConfig,
    db_config: DatabaseConfig,
    banner_style: BannerStyle = "full",
    verbose: bool = True,
    max_evaluation_jobs: int = 2,
    max_proposal_jobs: int = 1,
    max_db_workers: int = 4,
    debug: bool = False,
    init_program_str: Optional[str] = None,
    evaluate_str: Optional[str] = None,
)

Fully asynchronous evolution runner with concurrent proposal generation.

Parameters:

  • evo_config (EvolutionConfig, required): Evolution configuration.
  • job_config (JobConfig, required): Job configuration.
  • db_config (DatabaseConfig, required): Database configuration.
  • verbose (bool, default True): Enable verbose logging.
  • max_evaluation_jobs (int, default 2): Maximum number of concurrent evaluation jobs.
  • max_proposal_jobs (int, default 1): Maximum number of concurrent proposal generation tasks.
  • max_db_workers (int, default 4): Maximum number of concurrent async DB worker threads.
  • init_program_str (Optional[str], default None): Optional string content for the initial program; saved to the results directory, with the path updated in evo_config.
  • evaluate_str (Optional[str], default None): Optional string content for the evaluate script; saved to the results directory, with the path updated in job_config.

run

run()

Synchronous convenience wrapper for script/CLI usage.
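
The pieces wire together roughly as follows. This is a sketch, not a verbatim recipe: only ShinkaEvolveRunner's own parameters are taken from this page, while the JobConfig and DatabaseConfig import paths and constructor arguments are assumptions, since those types are documented elsewhere.

```python
# Hypothetical wiring sketch; config import paths are assumptions.
from shinka.core import EvolutionConfig, ShinkaEvolveRunner
from shinka.launch import JobConfig          # assumed import path
from shinka.database import DatabaseConfig   # assumed import path

runner = ShinkaEvolveRunner(
    evo_config=EvolutionConfig(num_generations=20),
    job_config=JobConfig(),      # assumed to accept defaults
    db_config=DatabaseConfig(),  # assumed to accept defaults
    max_evaluation_jobs=2,  # at most two candidate evaluations in flight
    max_proposal_jobs=1,    # one concurrent LLM proposal task
)
runner.run()  # synchronous wrapper around the async loop
```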


run_shinka_eval

Helper for evaluators: standard way to execute candidate programs and aggregate metrics.

run_shinka_eval

run_shinka_eval(
    program_path: str,
    results_dir: str,
    experiment_fn_name: str,
    num_runs: int,
    get_experiment_kwargs: Optional[Callable[[int], Dict[str, Any]]] = None,
    aggregate_metrics_fn: Optional[Callable[[List[Any]], Dict[str, Any]]] = None,
    validate_fn: Optional[Callable[[Any], Tuple[bool, Optional[str]]]] = None,
    plotting_fn: Optional[Callable[[Any], List[Any]]] = None,
    default_metrics_on_error: Optional[Dict[str, Any]] = None,
    early_stop_method: Optional[Union[str, EarlyStopMethod]] = None,
    early_stop_threshold: Optional[float] = None,
    early_stop_score_fn: Optional[Callable[[Any], float]] = None,
    early_stop_kwargs: Optional[Dict[str, Any]] = None,
    run_workers: int = 1,
    max_workers_cap: Optional[int] = None,
) -> Tuple[Dict[str, Any], bool, Optional[str]]

Runs an experiment multiple times, collects the results, optionally validates each run, computes aggregate metrics, and saves them to disk. Supports early stopping.

Parameters:

  • program_path (str, required): Path to the Python script/module to evaluate.
  • results_dir (str, required): Directory in which to save metrics.json and correct.json.
  • experiment_fn_name (str, required): Name of the function to call in the loaded module.
  • num_runs (int, required): Number of times to run the experiment function.
  • get_experiment_kwargs (Optional[Callable[[int], Dict[str, Any]]], default None): Optional function mapping a zero-based run index to a kwargs dict for the experiment. If None, a seed is passed instead.
  • aggregate_metrics_fn (Optional[Callable[[List[Any]], Dict[str, Any]]], default None): Optional function mapping the list of raw results to a metrics dict. If None, basic run stats (count, time) are recorded.
  • validate_fn (Optional[Callable[[Any], Tuple[bool, Optional[str]]]], default None): Optional function mapping a run result to (is_valid, error_msg); affects overall correctness.
  • plotting_fn (Optional[Callable[[Any], List[Any]]], default None): Optional function mapping extra data to a list of (Figure|Animation, title) tuples. The title is used as the filename; figures are saved as PNG/PDF, animations as GIF.
  • default_metrics_on_error (Optional[Dict[str, Any]], default None): Metrics to report on evaluation failure. Uses a predefined default if None.
  • early_stop_method (Optional[Union[str, EarlyStopMethod]], default None): Early stopping method, either a string ("none", "bayesian", "ci", "hybrid") or an EarlyStopMethod instance. None disables early stopping.
  • early_stop_threshold (Optional[float], default None): Target threshold to beat. Required if early_stop_method is set.
  • early_stop_score_fn (Optional[Callable[[Any], float]], default None): Function to extract a score from a run result. If None, the result is assumed to be a numeric score.
  • early_stop_kwargs (Optional[Dict[str, Any]], default None): Additional kwargs for create_early_stop_method (e.g., prob_cutoff, ci_confidence, min_trials).
  • run_workers (int, default 1): Number of worker processes for per-run evaluations. 1 keeps sequential behavior; values greater than 1 enable process-based parallelism.
  • max_workers_cap (Optional[int], default None): Optional upper bound on the effective worker count, applied after run_workers and num_runs. Useful for externally constraining CPU use.
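
The early-stopping parameters combine roughly as shown below. The key names in the kwargs dict are taken from the early_stop_kwargs description above; the score extraction helper and all values are hypothetical, assuming run results shaped as dicts with a "score" entry.

```python
from typing import Any, Dict


def score_from_result(result: Any) -> float:
    # Hypothetical: assumes each run result is a dict with a "score" key.
    return float(result["score"])


# Extra options forwarded to create_early_stop_method; key names are taken
# from the early_stop_kwargs description, values are illustrative.
early_stop_kwargs: Dict[str, Any] = {
    "prob_cutoff": 0.95,
    "min_trials": 2,
}

# These would be passed to run_shinka_eval as, e.g.:
#   early_stop_method="bayesian",
#   early_stop_threshold=0.8,
#   early_stop_score_fn=score_from_result,
#   early_stop_kwargs=early_stop_kwargs,
```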

Returns:

  • Tuple[Dict[str, Any], bool, Optional[str]]: A tuple (metrics, overall_correct_flag, first_error_message).
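
For example, an evaluate script might supply per-run kwargs and an aggregation function like the helpers below. The helpers are plain Python; the program path, experiment function name, and metric names are hypothetical, and only the run_shinka_eval signature is taken from this page.

```python
import statistics
from typing import Any, Dict, List


def get_experiment_kwargs(run_idx: int) -> Dict[str, Any]:
    # One deterministic seed per zero-based run index.
    return {"seed": 1000 + run_idx}


def aggregate_metrics(results: List[Any]) -> Dict[str, Any]:
    # Assumes each raw result is a numeric score.
    scores = [float(r) for r in results]
    return {
        "combined_score": statistics.mean(scores),
        "score_std": statistics.pstdev(scores),
        "num_runs": len(scores),
    }


def main() -> None:
    # Requires shinka; imported here so the helpers above stay usable
    # without it. Paths and the function name are hypothetical.
    from shinka.core import run_shinka_eval

    metrics, correct, err = run_shinka_eval(
        program_path="initial.py",
        results_dir="results",
        experiment_fn_name="run_experiment",
        num_runs=3,
        get_experiment_kwargs=get_experiment_kwargs,
        aggregate_metrics_fn=aggregate_metrics,
    )
    print(metrics, correct, err)
```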


Supporting Runtime Types

Lower-level components for customization:

  • PromptSampler
  • MetaSummarizer
  • NoveltyJudge / AsyncNoveltyJudge
  • SystemPromptEvolver
  • SystemPromptSampler

Available from shinka.core. Most integrations should start with the runner and config objects above.