Core Runtime API

Primary runtime objects for constructing and running an evolution loop from Python.


EvolutionConfig

Controls mutation behavior, model selection, budgets, prompt evolution, and async proposal targeting.

EvolutionConfig dataclass

EvolutionConfig(
    task_sys_msg: Optional[str] = DEFAULT_TASK_SYS_MSG,
    patch_types: List[str] = default_patch_types(),
    patch_type_probs: List[float] = default_patch_type_probs(),
    num_generations: int = 50,
    max_patch_resamples: int = 3,
    max_patch_attempts: int = 1,
    job_type: str = "local",
    language: str = "python",
    llm_models: List[str] = default_llm_models(),
    llm_dynamic_selection: Optional[Union[str, BanditBase]] = "ucb",
    llm_dynamic_selection_kwargs: dict = default_llm_dynamic_selection_kwargs(),
    llm_kwargs: dict = default_llm_kwargs(),
    meta_rec_interval: Optional[int] = 10,
    meta_llm_models: Optional[List[str]] = None,
    meta_llm_kwargs: dict = {},
    meta_max_recommendations: int = 5,
    sample_single_meta_rec: bool = True,
    embedding_model: Optional[str] = "text-embedding-3-small",
    init_program_path: Optional[str] = "initial.py",
    results_dir: Optional[str] = None,
    max_novelty_attempts: int = 3,
    code_embed_sim_threshold: float = 0.99,
    novelty_llm_models: Optional[List[str]] = None,
    novelty_llm_kwargs: dict = {},
    use_text_feedback: bool = False,
    max_api_costs: Optional[float] = None,
    inspiration_sort_order: str = "ascending",
    enable_controlled_oversubscription: bool = True,
    proposal_target_mode: str = "adaptive",
    proposal_target_min_samples: int = 5,
    proposal_target_ratio_cap: float = 2.0,
    proposal_buffer_max: int = 2,
    proposal_target_hard_cap: Optional[int] = None,
    proposal_target_ewma_alpha: float = 0.3,
    evolve_prompts: bool = False,
    prompt_patch_types: List[str] = default_prompt_patch_types(),
    prompt_patch_type_probs: List[float] = default_prompt_patch_type_probs(),
    prompt_evolution_interval: Optional[int] = None,
    prompt_archive_size: int = 10,
    prompt_llm_models: Optional[List[str]] = None,
    prompt_llm_kwargs: dict = {},
    prompt_ucb_exploration_constant: float = 1.0,
    prompt_epsilon: float = 0.1,
    prompt_evo_top_k_programs: int = 3,
    prompt_percentile_recompute_interval: int = 20,
)
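
As a sketch, a config that overrides a few of the defaults above might look like this. Field names and defaults are taken from the signature; the `shinka.core` import path is an assumption based on the note at the bottom of this page.

```python
# Hypothetical usage sketch; the import path is an assumption.
from shinka.core import EvolutionConfig

evo_config = EvolutionConfig(
    num_generations=100,    # evolve for longer than the default 50
    max_patch_resamples=5,  # retry failed patches more often
    max_api_costs=25.0,     # stop once this API budget is exhausted
    evolve_prompts=True,    # also evolve the system prompts
)
```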

ShinkaEvolveRunner

Main async runtime. Coordinates proposal generation, evaluation submission, persistence, and side-effect handling.

ShinkaEvolveRunner

ShinkaEvolveRunner(
    evo_config: EvolutionConfig,
    job_config: JobConfig,
    db_config: DatabaseConfig,
    banner_style: BannerStyle = "full",
    verbose: bool = True,
    max_evaluation_jobs: int = 2,
    max_proposal_jobs: int = 1,
    max_db_workers: int = 4,
    debug: bool = False,
    init_program_str: Optional[str] = None,
    evaluate_str: Optional[str] = None,
)

Fully asynchronous evolution runner with concurrent proposal generation.

Parameters:

  • evo_config (EvolutionConfig, required): Evolution configuration.
  • job_config (JobConfig, required): Job configuration.
  • db_config (DatabaseConfig, required): Database configuration.
  • verbose (bool, default True): Enable verbose logging.
  • max_evaluation_jobs (int, default 2): Maximum number of concurrent evaluation jobs.
  • max_proposal_jobs (int, default 1): Maximum number of concurrent proposal generation tasks.
  • max_db_workers (int, default 4): Maximum number of concurrent async DB worker threads.
  • init_program_str (Optional[str], default None): Optional string content for the initial program; saved to the results directory, with the path updated in evo_config.
  • evaluate_str (Optional[str], default None): Optional string content for the evaluate script; saved to the results directory, with the path updated in job_config.

run

run()

Synchronous convenience wrapper for script/CLI usage.
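
The pieces wire together roughly as follows. This is a sketch, not a verbatim recipe: only ShinkaEvolveRunner's own parameters are taken from this page, while the JobConfig and DatabaseConfig import paths and constructor arguments are assumptions, since those types are documented elsewhere.

```python
# Hypothetical wiring sketch; config import paths are assumptions.
from shinka.core import EvolutionConfig, ShinkaEvolveRunner
from shinka.launch import JobConfig          # assumed import path
from shinka.database import DatabaseConfig   # assumed import path

runner = ShinkaEvolveRunner(
    evo_config=EvolutionConfig(num_generations=20),
    job_config=JobConfig(),      # assumed to accept defaults
    db_config=DatabaseConfig(),  # assumed to accept defaults
    max_evaluation_jobs=2,  # at most two candidate evaluations in flight
    max_proposal_jobs=1,    # one concurrent LLM proposal task
)
runner.run()  # synchronous wrapper around the async loop
```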


run_shinka_eval

Helper for evaluators: standard way to execute candidate programs and aggregate metrics.

run_shinka_eval

run_shinka_eval(
    program_path: str,
    results_dir: str,
    experiment_fn_name: str,
    num_runs: int,
    get_experiment_kwargs: Optional[Callable[[int], Dict[str, Any]]] = None,
    aggregate_metrics_fn: Optional[Callable[[List[Any]], Dict[str, Any]]] = None,
    validate_fn: Optional[Callable[[Any], Tuple[bool, Optional[str]]]] = None,
    plotting_fn: Optional[Callable[[Any], List[Any]]] = None,
    default_metrics_on_error: Optional[Dict[str, Any]] = None,
    early_stop_method: Optional[Union[str, EarlyStopMethod]] = None,
    early_stop_threshold: Optional[float] = None,
    early_stop_score_fn: Optional[Callable[[Any], float]] = None,
    early_stop_kwargs: Optional[Dict[str, Any]] = None,
    run_workers: int = 1,
    max_workers_cap: Optional[int] = None,
) -> Tuple[Dict[str, Any], bool, Optional[str]]

Runs an experiment multiple times, collects the results, optionally validates each run, computes aggregate metrics, and saves them to disk. Supports early stopping.

Parameters:

  • program_path (str, required): Path to the Python script/module to evaluate.
  • results_dir (str, required): Directory in which to save metrics.json and correct.json.
  • experiment_fn_name (str, required): Name of the function to call in the loaded module.
  • num_runs (int, required): Number of times to run the experiment function.
  • get_experiment_kwargs (Optional[Callable[[int], Dict[str, Any]]], default None): Optional function mapping a zero-based run index to a kwargs dict for the experiment. If None, a seed is passed instead.
  • aggregate_metrics_fn (Optional[Callable[[List[Any]], Dict[str, Any]]], default None): Optional function mapping the list of raw results to a metrics dict. If None, basic run stats (count, time) are recorded.
  • validate_fn (Optional[Callable[[Any], Tuple[bool, Optional[str]]]], default None): Optional function mapping a run result to (is_valid, error_msg); affects overall correctness.
  • plotting_fn (Optional[Callable[[Any], List[Any]]], default None): Optional function mapping extra data to a list of (Figure|Animation, title) tuples. The title is used as the filename; figures are saved as PNG/PDF, animations as GIF.
  • default_metrics_on_error (Optional[Dict[str, Any]], default None): Metrics to report on evaluation failure. Uses a predefined default if None.
  • early_stop_method (Optional[Union[str, EarlyStopMethod]], default None): Early stopping method, either a string ("none", "bayesian", "ci", "hybrid") or an EarlyStopMethod instance. None disables early stopping.
  • early_stop_threshold (Optional[float], default None): Target threshold to beat. Required if early_stop_method is set.
  • early_stop_score_fn (Optional[Callable[[Any], float]], default None): Function to extract a score from a run result. If None, the result is assumed to be a numeric score.
  • early_stop_kwargs (Optional[Dict[str, Any]], default None): Additional kwargs for create_early_stop_method (e.g., prob_cutoff, ci_confidence, min_trials).
  • run_workers (int, default 1): Number of worker processes for per-run evaluations. 1 keeps sequential behavior; values greater than 1 enable process-based parallelism.
  • max_workers_cap (Optional[int], default None): Optional upper bound on the effective worker count, applied after run_workers and num_runs. Useful for externally constraining CPU use.
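
The early-stopping parameters combine roughly as shown below. The key names in the kwargs dict are taken from the early_stop_kwargs description above; the score extraction helper and all values are hypothetical, assuming run results shaped as dicts with a "score" entry.

```python
from typing import Any, Dict


def score_from_result(result: Any) -> float:
    # Hypothetical: assumes each run result is a dict with a "score" key.
    return float(result["score"])


# Extra options forwarded to create_early_stop_method; key names are taken
# from the early_stop_kwargs description, values are illustrative.
early_stop_kwargs: Dict[str, Any] = {
    "prob_cutoff": 0.95,
    "min_trials": 2,
}

# These would be passed to run_shinka_eval as, e.g.:
#   early_stop_method="bayesian",
#   early_stop_threshold=0.8,
#   early_stop_score_fn=score_from_result,
#   early_stop_kwargs=early_stop_kwargs,
```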

Returns:

  • Tuple[Dict[str, Any], bool, Optional[str]]: A tuple (metrics, overall_correct_flag, first_error_message).
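
For example, an evaluate script might supply per-run kwargs and an aggregation function like the helpers below. The helpers are plain Python; the program path, experiment function name, and metric names are hypothetical, and only the run_shinka_eval signature is taken from this page.

```python
import statistics
from typing import Any, Dict, List


def get_experiment_kwargs(run_idx: int) -> Dict[str, Any]:
    # One deterministic seed per zero-based run index.
    return {"seed": 1000 + run_idx}


def aggregate_metrics(results: List[Any]) -> Dict[str, Any]:
    # Assumes each raw result is a numeric score.
    scores = [float(r) for r in results]
    return {
        "combined_score": statistics.mean(scores),
        "score_std": statistics.pstdev(scores),
        "num_runs": len(scores),
    }


def main() -> None:
    # Requires shinka; imported here so the helpers above stay usable
    # without it. Paths and the function name are hypothetical.
    from shinka.core import run_shinka_eval

    metrics, correct, err = run_shinka_eval(
        program_path="initial.py",
        results_dir="results",
        experiment_fn_name="run_experiment",
        num_runs=3,
        get_experiment_kwargs=get_experiment_kwargs,
        aggregate_metrics_fn=aggregate_metrics,
    )
    print(metrics, correct, err)
```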


Supporting Runtime Types

Lower-level components for customization:

  • PromptSampler
  • MetaSummarizer
  • NoveltyJudge / AsyncNoveltyJudge
  • SystemPromptEvolver
  • SystemPromptSampler

Available from shinka.core. Most integrations should start with the runner and config objects above.