LLM API

Provider selection, request fan-out, and structured-output querying.


LLMClient

Batch-oriented synchronous client for sampling candidate responses.

LLMClient

LLMClient(
    model_names: Union[List[str], str] = "gpt-5.1",
    temperatures: Union[float, List[float]] = 0.75,
    max_tokens: Union[int, List[int]] = 4096,
    reasoning_efforts: Union[str, List[str]] = "disabled",
    model_sample_probs: Optional[List[float]] = None,
    output_model: Optional[BaseModel] = None,
    verbose: bool = True,
)
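The constructor accepts either a scalar or a per-model list for `temperatures`, `max_tokens`, and `reasoning_efforts`, which suggests scalar values are broadcast across `model_names`. A minimal sketch of that normalization, using a hypothetical helper name (`broadcast_param` is not part of the shinka API):

```python
from typing import List, Union

def broadcast_param(value, num_models: int) -> list:
    """Expand a scalar setting to one entry per model; pass lists through.

    Hypothetical helper illustrating how a single `temperatures` or
    `max_tokens` value could apply to every model in `model_names`.
    """
    if isinstance(value, list):
        if len(value) != num_models:
            raise ValueError("per-model list must match the number of models")
        return value
    return [value] * num_models

# Placeholder model names, for illustration only.
models = ["gpt-5.1", "claude-x"]
temps = broadcast_param(0.75, len(models))        # scalar -> [0.75, 0.75]
maxes = broadcast_param([4096, 8192], len(models))  # list passed through
```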

batch_query

batch_query(
    num_samples: int,
    msg: Union[str, List[str]],
    system_msg: Union[str, List[str]],
    msg_history: Union[List[Dict], List[List[Dict]]] = [],
    llm_kwargs: List[Dict] = [],
) -> List[QueryResult]

Batch query the LLM with the given message and system message.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `msg` | `str` | The message to query the LLM with. | *required* |
| `system_msg` | `str` | The system message to query the LLM with. | *required* |

batch_kwargs_query

batch_kwargs_query(
    num_samples: int,
    msg: Union[str, List[str]],
    system_msg: Union[str, List[str]],
    msg_history: Union[List[Dict], List[List[Dict]]] = [],
    model_sample_probs: Optional[List[float]] = None,
) -> List[QueryResult]

Batch query the LLM with the given message and system message.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `msg` | `str` | The message to query the LLM with. | *required* |
| `system_msg` | `str` | The system message to query the LLM with. | *required* |
| `model_sample_probs` | `Optional[List[float]]` | Sampling probabilities for each model. | `None` |

get_kwargs

get_kwargs(model_sample_probs: Optional[List[float]] = None)

Get model kwargs for sampling.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_sample_probs` | `Optional[List[float]]` | Sampling probabilities for each model. | `None` |
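`get_kwargs` apparently draws one per-model configuration according to `model_sample_probs`, falling back to uniform sampling when it is `None`. A self-contained sketch of that sampling step (illustrative, not the actual shinka implementation; model names are placeholders):

```python
import random

def sample_kwargs(model_configs, model_sample_probs=None, rng=None):
    """Pick one per-model kwargs dict, weighted by sampling probabilities.

    Sketch of the behavior suggested by `get_kwargs`: uniform choice when
    `model_sample_probs` is None, weighted choice otherwise.
    """
    rng = rng or random.Random()
    if model_sample_probs is None:
        return rng.choice(model_configs)
    return rng.choices(model_configs, weights=model_sample_probs, k=1)[0]

configs = [
    {"model_name": "gpt-5.1", "temperature": 0.75, "max_tokens": 4096},
    {"model_name": "claude-x", "temperature": 0.5, "max_tokens": 8192},
]
picked = sample_kwargs(configs, model_sample_probs=[0.9, 0.1],
                       rng=random.Random(0))
```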

AsyncLLMClient

Async counterpart for the same provider abstraction.

AsyncLLMClient

AsyncLLMClient(
    model_names: Union[List[str], str] = "gpt-5.1",
    temperatures: Union[float, List[float]] = 0.75,
    max_tokens: Union[int, List[int]] = 4096,
    reasoning_efforts: Union[str, List[str]] = "disabled",
    model_sample_probs: Optional[List[float]] = None,
    output_model: Optional[BaseModel] = None,
    verbose: bool = True,
)

batch_query async

batch_query(
    num_samples: int,
    msg: Union[str, List[str]],
    system_msg: Union[str, List[str]],
    msg_history: Union[List[Dict], List[List[Dict]]] = [],
    llm_kwargs: List[Dict] = [],
) -> List[QueryResult]

Batch query the LLM with the given message and system message asynchronously.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `msg` | `str` | The message to query the LLM with. | *required* |
| `system_msg` | `str` | The system message to query the LLM with. | *required* |

batch_kwargs_query async

batch_kwargs_query(
    num_samples: int,
    msg: Union[str, List[str]],
    system_msg: Union[str, List[str]],
    msg_history: Union[List[Dict], List[List[Dict]]] = [],
    model_sample_probs: Optional[List[float]] = None,
) -> List[QueryResult]

Batch query the LLM with the given message and system message asynchronously.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `msg` | `str` | The message to query the LLM with. | *required* |
| `system_msg` | `str` | The system message to query the LLM with. | *required* |
| `model_sample_probs` | `Optional[List[float]]` | Sampling probabilities for each model. | `None` |

get_kwargs

get_kwargs(model_sample_probs: Optional[List[float]] = None)

Get model kwargs for sampling.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_sample_probs` | `Optional[List[float]]` | Sampling probabilities for each model. | `None` |

Direct Query Helpers

Lower-level provider dispatch:

query

query(
    model_name: str,
    msg: str,
    system_msg: str,
    msg_history: List = [],
    output_model: Optional[BaseModel] = None,
    model_posteriors: Optional[Dict[str, float]] = None,
    **kwargs
) -> QueryResult

Query the LLM.
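Since `query` takes a bare `model_name`, a common pattern for this kind of helper is prefix-based provider routing. The sketch below illustrates that pattern only; the routing table and helper name are assumptions, not shinka's actual dispatch logic:

```python
def route_provider(model_name: str) -> str:
    """Map a model name to a provider label by prefix.

    Illustrative routing table; the real dispatch in shinka may differ.
    """
    prefixes = {
        "gpt-": "openai",
        "claude-": "anthropic",
        "gemini-": "google",
    }
    for prefix, provider in prefixes.items():
        if model_name.startswith(prefix):
            return provider
    raise ValueError(f"no provider registered for {model_name!r}")
```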


query_async async

query_async(
    model_name: str,
    msg: str,
    system_msg: str,
    msg_history: List = [],
    output_model: Optional[BaseModel] = None,
    model_posteriors: Optional[Dict[str, float]] = None,
    **kwargs
) -> QueryResult

Query the LLM asynchronously.


Model Prioritization

Bandit-style model prioritization strategies live in shinka.llm.prioritization. They dynamically shift sampling probability across models based on observed utility and cost.
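As a rough illustration of the idea, a bandit-style prioritizer can track a running utility estimate per model and renormalize sampling probabilities with a softmax. The update rule below is an assumption for illustration, not the actual algorithm in shinka.llm.prioritization:

```python
import math

class ModelPrioritizer:
    """Softmax bandit over per-model utility estimates (illustrative only)."""

    def __init__(self, model_names, temperature=1.0):
        self.utilities = {name: 0.0 for name in model_names}
        self.counts = {name: 0 for name in model_names}
        self.temperature = temperature

    def update(self, model_name: str, reward: float) -> None:
        # Incremental mean of observed reward (e.g. utility minus cost).
        self.counts[model_name] += 1
        n = self.counts[model_name]
        self.utilities[model_name] += (reward - self.utilities[model_name]) / n

    def sample_probs(self) -> dict:
        # Softmax over utilities: higher utility draws more sampling mass.
        scores = [u / self.temperature for u in self.utilities.values()]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return {name: e / total for name, e in zip(self.utilities, exps)}
```

The resulting probabilities could then be fed back into `model_sample_probs` on subsequent batch queries.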