
Embeddings API

Code similarity, archive analysis, and provider integration across OpenAI, Azure, Gemini, OpenRouter, and local OpenAI-compatible backends.


EmbeddingClient

Synchronous embedding client with token counting and cost estimation.

EmbeddingClient

EmbeddingClient(model_name: str = 'text-embedding-3-small', verbose: bool = False)

Initialize the EmbeddingClient.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | The OpenAI, Azure, or Gemini embedding model name to use. | `'text-embedding-3-small'` |
| `verbose` | `bool` | Enable verbose logging. | `False` |

count_tokens

count_tokens(text: Union[str, List[str]]) -> Union[int, List[int]]

Count tokens with tiktoken for accurate, model-specific token counts.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `text` | `Union[str, List[str]]` | A string or list of strings to count tokens for. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `Union[int, List[int]]` | Token count (`int`) for a single string, or a list of counts for a list of strings. |
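The return type mirrors the input shape: one count for a string, a list of counts for a list. A minimal, self-contained sketch of that dispatch, using a whitespace split as a stand-in for tiktoken's encoder (the real method encodes with the tokenizer matching `model_name`):

```python
from typing import List, Union

def count_tokens(text: Union[str, List[str]]) -> Union[int, List[int]]:
    """Return one count for a string, a list of counts for a list.

    Whitespace splitting stands in for tiktoken's encoder here; the
    real client uses the tokenizer that matches the model name.
    """
    if isinstance(text, str):
        return len(text.split())
    return [len(t.split()) for t in text]

print(count_tokens("hello world"))     # → 2
print(count_tokens(["a b c", "d e"]))  # → [3, 2]
```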

get_embedding

get_embedding(
    code: Union[str, List[str]],
) -> Union[Tuple[List[float], float], Tuple[List[List[float]], float]]

Computes the text embedding for a string or list of strings.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `code` | `Union[str, List[str]]` | The text as a string or list of strings. | *required* |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `tuple` | `Union[Tuple[List[float], float], Tuple[List[List[float]], float]]` | `(embedding_vector(s), cost)` |
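Both call shapes return an `(embedding, cost)` tuple; batched input yields a list of vectors and a single aggregate cost. A sketch of that return-shape contract, with a dummy three-dimensional embedder and an illustrative per-token rate standing in for the real provider call:

```python
from typing import List, Tuple, Union

PRICE_PER_TOKEN = 0.00000002  # illustrative rate, not a real price

def _embed_one(text: str) -> List[float]:
    # Dummy embedder: real vectors come from the provider API.
    return [float(len(text)), float(text.count(" ")), 1.0]

def get_embedding(
    code: Union[str, List[str]],
) -> Union[Tuple[List[float], float], Tuple[List[List[float]], float]]:
    if isinstance(code, str):
        cost = len(code.split()) * PRICE_PER_TOKEN
        return _embed_one(code), cost
    vectors = [_embed_one(c) for c in code]
    cost = sum(len(c.split()) for c in code) * PRICE_PER_TOKEN
    return vectors, cost

vec, cost = get_embedding("def add(a, b): return a + b")
vecs, batch_cost = get_embedding(["x = 1", "y = 2"])
```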

get_column_embedding

get_column_embedding(df: DataFrame, column_name: Union[str, List[str]]) -> DataFrame

Computes the text embedding for a batch of strings in DataFrame columns.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | A pandas DataFrame with the column(s) to embed. | *required* |
| `column_name` | `Union[str, List[str]]` | The name of the column (or columns) to embed. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | A pandas DataFrame containing the embedded column(s). |
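Column embedding amounts to mapping an embedding call over one or more DataFrame columns. A sketch of that pattern with pandas, again with a dummy embedder in place of the provider request (the real method also tracks cost and batches requests):

```python
from typing import List, Union
import pandas as pd

def _embed_one(text: str) -> List[float]:
    # Stand-in for a provider embedding call.
    return [float(len(text)), float(text.count(" "))]

def get_column_embedding(
    df: pd.DataFrame, column_name: Union[str, List[str]]
) -> pd.DataFrame:
    columns = [column_name] if isinstance(column_name, str) else column_name
    out = df.copy()
    for col in columns:
        # One new vector column per embedded source column.
        out[f"{col}_embedding"] = out[col].map(_embed_one)
    return out

df = pd.DataFrame({"code": ["x = 1", "def f(): pass"]})
embedded = get_column_embedding(df, "code")
```

The `{col}_embedding` naming is an assumption for the sketch; the library may name the output column differently.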


AsyncEmbeddingClient

Async embedding client used by the async runtime.

AsyncEmbeddingClient

AsyncEmbeddingClient(model_name: str = 'text-embedding-3-small', verbose: bool = False)

Async version of EmbeddingClient for non-blocking API calls.

Initialize the AsyncEmbeddingClient.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | The OpenAI, Azure, or Gemini embedding model name to use. | `'text-embedding-3-small'` |
| `verbose` | `bool` | Enable verbose logging. | `False` |

get_embedding

get_embedding(
    code: Union[str, List[str]],
) -> Union[Tuple[List[float], float], Tuple[List[List[float]], float]]

Synchronous wrapper provided for compatibility. Note: calling it blocks and defeats the purpose of the async client; use embed_async() instead.
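The point of the async client is concurrent, non-blocking embedding requests. A minimal sketch of that usage pattern with asyncio, where a dummy coroutine stands in for the provider SDK call (the `embed_async` name follows the note above; its real signature is an assumption here):

```python
import asyncio
from typing import List, Tuple

async def embed_async(code: str) -> Tuple[List[float], float]:
    # Dummy non-blocking embed; the real client awaits the provider SDK.
    await asyncio.sleep(0)
    return [float(len(code)), 1.0], 0.0

async def main() -> List[Tuple[List[float], float]]:
    # Issuing many embeds concurrently is what the async client is for.
    snippets = ["x = 1", "y = 2", "z = 3"]
    return await asyncio.gather(*(embed_async(s) for s in snippets))

results = asyncio.run(main())
```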


Backend Resolution Helpers

Provider-specific client construction:

resolve_embedding_backend

resolve_embedding_backend(model_name: str) -> ResolvedEmbeddingModel

Resolve runtime backend info for embedding model identifiers.
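A common way to implement such a resolver is prefix dispatch on the model identifier. The sketch below is an assumption about that scheme: the prefix strings and the `ResolvedEmbeddingModel` fields shown are illustrative, not the library's actual ones.

```python
from dataclasses import dataclass

@dataclass
class ResolvedEmbeddingModel:
    provider: str  # e.g. "openai", "azure", "gemini" (illustrative fields)
    model: str     # provider-local model name

def resolve_embedding_backend(model_name: str) -> ResolvedEmbeddingModel:
    # Hypothetical "provider/model" prefix scheme; the real rules may differ.
    for prefix in ("azure/", "gemini/", "openrouter/", "local/"):
        if model_name.startswith(prefix):
            return ResolvedEmbeddingModel(prefix.rstrip("/"), model_name[len(prefix):])
    return ResolvedEmbeddingModel("openai", model_name)  # assumed default

r = resolve_embedding_backend("gemini/text-embedding-004")
```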


get_client_embed

get_client_embed(model_name: str) -> Tuple[Any, str]

Get the client and model for the given embedding model name.


get_async_client_embed

get_async_client_embed(model_name: str) -> Tuple[Any, str]

Get the async client and model for the given embedding model name.