# Core Runtime API
Primary runtime objects for constructing and running an evolution loop from Python.
## EvolutionConfig
Controls mutation behavior, model selection, budgets, prompt evolution, and async proposal targeting.
`EvolutionConfig` (dataclass):

```python
EvolutionConfig(
    task_sys_msg: Optional[str] = DEFAULT_TASK_SYS_MSG,
    patch_types: List[str] = default_patch_types(),
    patch_type_probs: List[float] = default_patch_type_probs(),
    num_generations: int = 50,
    max_patch_resamples: int = 3,
    max_patch_attempts: int = 1,
    job_type: str = "local",
    language: str = "python",
    llm_models: List[str] = default_llm_models(),
    llm_dynamic_selection: Optional[Union[str, BanditBase]] = "ucb",
    llm_dynamic_selection_kwargs: dict = default_llm_dynamic_selection_kwargs(),
    llm_kwargs: dict = default_llm_kwargs(),
    meta_rec_interval: Optional[int] = 10,
    meta_llm_models: Optional[List[str]] = None,
    meta_llm_kwargs: dict = (lambda: {})(),
    meta_max_recommendations: int = 5,
    sample_single_meta_rec: bool = True,
    embedding_model: Optional[str] = "text-embedding-3-small",
    init_program_path: Optional[str] = "initial.py",
    results_dir: Optional[str] = None,
    max_novelty_attempts: int = 3,
    code_embed_sim_threshold: float = 0.99,
    novelty_llm_models: Optional[List[str]] = None,
    novelty_llm_kwargs: dict = (lambda: {})(),
    use_text_feedback: bool = False,
    max_api_costs: Optional[float] = None,
    inspiration_sort_order: str = "ascending",
    enable_controlled_oversubscription: bool = True,
    proposal_target_mode: str = "adaptive",
    proposal_target_min_samples: int = 5,
    proposal_target_ratio_cap: float = 2.0,
    proposal_buffer_max: int = 2,
    proposal_target_hard_cap: Optional[int] = None,
    proposal_target_ewma_alpha: float = 0.3,
    evolve_prompts: bool = False,
    prompt_patch_types: List[str] = default_prompt_patch_types(),
    prompt_patch_type_probs: List[float] = default_prompt_patch_type_probs(),
    prompt_evolution_interval: Optional[int] = None,
    prompt_archive_size: int = 10,
    prompt_llm_models: Optional[List[str]] = None,
    prompt_llm_kwargs: dict = (lambda: {})(),
    prompt_ucb_exploration_constant: float = 1.0,
    prompt_epsilon: float = 0.1,
    prompt_evo_top_k_programs: int = 3,
    prompt_percentile_recompute_interval: int = 20,
)
```
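Two of these knobs are easy to illustrate in isolation. The sketch below is not ShinkaEvolve's actual implementation: the function names and the patch-type labels are illustrative. It shows how a weighted draw over paired `patch_types`/`patch_type_probs` lists and an exponentially weighted moving average with smoothing factor `proposal_target_ewma_alpha` plausibly behave:

```python
import random


def sample_patch_type(patch_types, patch_type_probs, rng=random):
    # Weighted draw over mutation operators, mirroring how
    # patch_types and patch_type_probs are paired in the config.
    return rng.choices(patch_types, weights=patch_type_probs, k=1)[0]


def ewma_update(current, observation, alpha=0.3):
    # Exponentially weighted moving average with smoothing factor alpha,
    # the role played by proposal_target_ewma_alpha when the adaptive
    # proposal target tracks observed throughput.
    return alpha * observation + (1 - alpha) * current


rng = random.Random(0)
# Patch-type labels here are illustrative, not ShinkaEvolve's.
choice = sample_patch_type(["diff", "full", "cross"], [0.6, 0.3, 0.1], rng)
target = ewma_update(current=4.0, observation=6.0, alpha=0.3)  # ~4.6
```

With `alpha=0.3`, each new observation moves the smoothed target 30% of the way toward the observed value, which damps noisy per-generation fluctuations.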
## ShinkaEvolveRunner
Main async runtime. Coordinates proposal generation, evaluation submission, persistence, and side-effect handling.
```python
ShinkaEvolveRunner(
    evo_config: EvolutionConfig,
    job_config: JobConfig,
    db_config: DatabaseConfig,
    banner_style: BannerStyle = "full",
    verbose: bool = True,
    max_evaluation_jobs: int = 2,
    max_proposal_jobs: int = 1,
    max_db_workers: int = 4,
    debug: bool = False,
    init_program_str: Optional[str] = None,
    evaluate_str: Optional[str] = None,
)
```
Fully async evolution runner with concurrent proposal generation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `evo_config` | `EvolutionConfig` | Evolution configuration. | *required* |
| `job_config` | `JobConfig` | Job configuration. | *required* |
| `db_config` | `DatabaseConfig` | Database configuration. | *required* |
| `verbose` | `bool` | Enable verbose logging. | `True` |
| `max_evaluation_jobs` | `int` | Maximum concurrent evaluation jobs. | `2` |
| `max_proposal_jobs` | `int` | Maximum concurrent proposal generation tasks. | `1` |
| `max_db_workers` | `int` | Maximum concurrent async DB worker threads. | `4` |
| `init_program_str` | `Optional[str]` | Optional string content for the initial program (saved to the results dir; the path in `evo_config` is updated). | `None` |
| `evaluate_str` | `Optional[str]` | Optional string content for the evaluate script (saved to the results dir; the path in `job_config` is updated). | `None` |
## run_shinka_eval
Helper for evaluators: standard way to execute candidate programs and aggregate metrics.
```python
run_shinka_eval(
    program_path: str,
    results_dir: str,
    experiment_fn_name: str,
    num_runs: int,
    get_experiment_kwargs: Optional[Callable[[int], Dict[str, Any]]] = None,
    aggregate_metrics_fn: Optional[Callable[[List[Any]], Dict[str, Any]]] = None,
    validate_fn: Optional[Callable[[Any], Tuple[bool, Optional[str]]]] = None,
    plotting_fn: Optional[Callable[[Any], List[Any]]] = None,
    default_metrics_on_error: Optional[Dict[str, Any]] = None,
    early_stop_method: Optional[Union[str, EarlyStopMethod]] = None,
    early_stop_threshold: Optional[float] = None,
    early_stop_score_fn: Optional[Callable[[Any], float]] = None,
    early_stop_kwargs: Optional[Dict[str, Any]] = None,
    run_workers: int = 1,
    max_workers_cap: Optional[int] = None,
) -> Tuple[Dict[str, Any], bool, Optional[str]]
```
Runs an experiment multiple times, collects results, optionally validates, computes metrics, and saves them. Supports early stopping.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `program_path` | `str` | Path to the Python script/module to evaluate. | *required* |
| `results_dir` | `str` | Directory to save computed metrics. | *required* |
| `experiment_fn_name` | `str` | Name of the function to call in the loaded module. | *required* |
| `num_runs` | `int` | Number of times to run the experiment function. | *required* |
| `get_experiment_kwargs` | `Optional[Callable[[int], Dict[str, Any]]]` | Optional fn (`run_idx_0_based -> kwargs_dict`) for experiment args. A seed is passed if `None`. | `None` |
| `aggregate_metrics_fn` | `Optional[Callable[[List[Any]], Dict[str, Any]]]` | Optional fn (`raw_results_list -> metrics_dict`) for aggregation. If `None`, basic run stats (count, time) are recorded. | `None` |
| `validate_fn` | `Optional[Callable[[Any], Tuple[bool, Optional[str]]]]` | Optional fn (`result -> (is_valid, error_msg)`) to validate each run. Affects overall correctness. | `None` |
| `plotting_fn` | `Optional[Callable[[Any], List[Any]]]` | Optional fn (`extra_data -> List[(Figure\|Animation, title)]`). The title is used as the filename; figures are saved as PNG/PDF, animations as GIF. | `None` |
| `default_metrics_on_error` | `Optional[Dict[str, Any]]` | Metrics to report on evaluation failure. Uses a predefined default if `None`. | `None` |
| `early_stop_method` | `Optional[Union[str, EarlyStopMethod]]` | Early stopping method: a string (`"none"`, `"bayesian"`, `"ci"`, `"hybrid"`) or an `EarlyStopMethod` instance. `None` disables early stopping. | `None` |
| `early_stop_threshold` | `Optional[float]` | Target threshold to beat. Required if `early_stop_method` is set. | `None` |
| `early_stop_score_fn` | `Optional[Callable[[Any], float]]` | Function to extract a score from a run result. If `None`, the result is assumed to be a numeric score. | `None` |
| `early_stop_kwargs` | `Optional[Dict[str, Any]]` | Additional kwargs for `create_early_stop_method` (e.g., `prob_cutoff`, `ci_confidence`, `min_trials`). | `None` |
| `run_workers` | `int` | Number of worker processes for per-run evaluations. | `1` |
| `max_workers_cap` | `Optional[int]` | Optional upper bound on the effective worker count, applied after `run_workers`. | `None` |
Returns:

| Type | Description |
|---|---|
| `Tuple[Dict[str, Any], bool, Optional[str]]` | A tuple `(metrics, overall_correct_flag, first_error_message)`. |
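The documented contract (run `num_runs` times, build per-run kwargs, validate each result, aggregate, and return `(metrics, overall_correct_flag, first_error_message)`) can be sketched in a few lines. This is a simplified illustration, not the real helper: `run_eval_loop` is a hypothetical name, and the actual function additionally handles module loading, plotting, saving, worker pools, and early stopping:

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def run_eval_loop(
    experiment_fn: Callable[..., Any],
    num_runs: int,
    get_experiment_kwargs: Optional[Callable[[int], Dict[str, Any]]] = None,
    validate_fn: Optional[Callable[[Any], Tuple[bool, Optional[str]]]] = None,
    aggregate_metrics_fn: Optional[Callable[[List[Any]], Dict[str, Any]]] = None,
) -> Tuple[Dict[str, Any], bool, Optional[str]]:
    results: List[Any] = []
    overall_correct, first_error = True, None
    for run_idx in range(num_runs):
        # Per-run kwargs; a seed is passed when no factory is given.
        kwargs = get_experiment_kwargs(run_idx) if get_experiment_kwargs else {"seed": run_idx}
        result = experiment_fn(**kwargs)
        if validate_fn is not None:
            is_valid, err = validate_fn(result)
            if not is_valid:
                overall_correct = False
                first_error = first_error or err
                continue  # invalid runs are excluded from aggregation
        results.append(result)
    if aggregate_metrics_fn is not None:
        metrics = aggregate_metrics_fn(results)
    else:
        metrics = {"num_valid_runs": len(results)}  # basic run-stats fallback
    return metrics, overall_correct, first_error


metrics, ok, err = run_eval_loop(
    experiment_fn=lambda seed: seed * 2,
    num_runs=3,
    validate_fn=lambda r: (r >= 0, None if r >= 0 else "negative"),
    aggregate_metrics_fn=lambda rs: {"mean": sum(rs) / len(rs)},
)
# metrics == {"mean": 2.0}, ok is True, err is None
```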
## Supporting Runtime Types
Lower-level components for customization:
- `PromptSampler`
- `MetaSummarizer`
- `NoveltyJudge` / `AsyncNoveltyJudge`
- `SystemPromptEvolver`
- `SystemPromptSampler`

All are available from `shinka.core`. Most integrations should start with the runner and config objects above.