
Embeddings API

Code similarity, archive analysis, and provider integration across OpenAI, Azure, Gemini, OpenRouter, and local OpenAI-compatible backends.


EmbeddingClient

Synchronous embedding client with token counting and cost estimation.

EmbeddingClient

EmbeddingClient(model_name: str = 'text-embedding-3-small', verbose: bool = False)

Initialize the EmbeddingClient.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | The OpenAI, Azure, or Gemini embedding model name to use. | `'text-embedding-3-small'` |
| `verbose` | `bool` | Enable verbose logging. | `False` |

count_tokens

count_tokens(text: Union[str, List[str]]) -> Union[int, List[int]]

Count tokens with tiktoken for accurate, model-specific token counts.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `text` | `Union[str, List[str]]` | A string or list of strings to count tokens for. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `Union[int, List[int]]` | Token count (`int`) for a single string, or a list of counts for a list of strings. |
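The return type mirrors the input shape: one count for a string, a list of counts for a list. A minimal, self-contained sketch of that dispatch, using a whitespace split as a stand-in for tiktoken's encoder (the real method encodes with the tokenizer matching `model_name`):

```python
from typing import List, Union

def count_tokens(text: Union[str, List[str]]) -> Union[int, List[int]]:
    """Return one count for a string, a list of counts for a list.

    Whitespace splitting stands in for tiktoken's encoder here; the
    real client uses the tokenizer that matches the model name.
    """
    if isinstance(text, str):
        return len(text.split())
    return [len(t.split()) for t in text]

print(count_tokens("hello world"))     # → 2
print(count_tokens(["a b c", "d e"]))  # → [3, 2]
```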

get_embedding

get_embedding(
    code: Union[str, List[str]],
) -> Union[Tuple[List[float], float], Tuple[List[List[float]], float]]

Computes the text embedding for a string or list of strings.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `code` | `Union[str, List[str]]` | The text as a string or list of strings. | *required* |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `tuple` | `Union[Tuple[List[float], float], Tuple[List[List[float]], float]]` | `(embedding_vector(s), cost)` |
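Both call shapes return an `(embedding, cost)` tuple; batched input yields a list of vectors and a single aggregate cost. A sketch of that return-shape contract, with a dummy three-dimensional embedder and an illustrative per-token rate standing in for the real provider call:

```python
from typing import List, Tuple, Union

PRICE_PER_TOKEN = 0.00000002  # illustrative rate, not a real price

def _embed_one(text: str) -> List[float]:
    # Dummy embedder: real vectors come from the provider API.
    return [float(len(text)), float(text.count(" ")), 1.0]

def get_embedding(
    code: Union[str, List[str]],
) -> Union[Tuple[List[float], float], Tuple[List[List[float]], float]]:
    if isinstance(code, str):
        cost = len(code.split()) * PRICE_PER_TOKEN
        return _embed_one(code), cost
    vectors = [_embed_one(c) for c in code]
    cost = sum(len(c.split()) for c in code) * PRICE_PER_TOKEN
    return vectors, cost

vec, cost = get_embedding("def add(a, b): return a + b")
vecs, batch_cost = get_embedding(["x = 1", "y = 2"])
```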

get_column_embedding

get_column_embedding(df: DataFrame, column_name: Union[str, List[str]]) -> DataFrame

Computes the text embedding for a batch of strings in DataFrame columns.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | A pandas DataFrame with the column(s) to embed. | *required* |
| `column_name` | `Union[str, List[str]]` | The name of the column (or columns) to embed. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | A pandas DataFrame containing the embedded column(s). |
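Column embedding amounts to mapping an embedding call over one or more DataFrame columns. A sketch of that pattern with pandas, again with a dummy embedder in place of the provider request (the real method also tracks cost and batches requests):

```python
from typing import List, Union
import pandas as pd

def _embed_one(text: str) -> List[float]:
    # Stand-in for a provider embedding call.
    return [float(len(text)), float(text.count(" "))]

def get_column_embedding(
    df: pd.DataFrame, column_name: Union[str, List[str]]
) -> pd.DataFrame:
    columns = [column_name] if isinstance(column_name, str) else column_name
    out = df.copy()
    for col in columns:
        # One new vector column per embedded source column.
        out[f"{col}_embedding"] = out[col].map(_embed_one)
    return out

df = pd.DataFrame({"code": ["x = 1", "def f(): pass"]})
embedded = get_column_embedding(df, "code")
```

The `{col}_embedding` naming is an assumption for the sketch; the library may name the output column differently.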


AsyncEmbeddingClient

Async embedding client used by the async runtime.

AsyncEmbeddingClient

AsyncEmbeddingClient(model_name: str = 'text-embedding-3-small', verbose: bool = False)

Async version of EmbeddingClient for non-blocking API calls.

Initialize the AsyncEmbeddingClient.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | The OpenAI, Azure, or Gemini embedding model name to use. | `'text-embedding-3-small'` |
| `verbose` | `bool` | Enable verbose logging. | `False` |

get_embedding

get_embedding(
    code: Union[str, List[str]],
) -> Union[Tuple[List[float], float], Tuple[List[List[float]], float]]

Synchronous wrapper provided for compatibility. Note: calling it blocks and defeats the purpose of the async client; use embed_async() instead.
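The point of the async client is concurrent, non-blocking embedding requests. A minimal sketch of that usage pattern with asyncio, where a dummy coroutine stands in for the provider SDK call (the `embed_async` name follows the note above; its real signature is an assumption here):

```python
import asyncio
from typing import List, Tuple

async def embed_async(code: str) -> Tuple[List[float], float]:
    # Dummy non-blocking embed; the real client awaits the provider SDK.
    await asyncio.sleep(0)
    return [float(len(code)), 1.0], 0.0

async def main() -> List[Tuple[List[float], float]]:
    # Issuing many embeds concurrently is what the async client is for.
    snippets = ["x = 1", "y = 2", "z = 3"]
    return await asyncio.gather(*(embed_async(s) for s in snippets))

results = asyncio.run(main())
```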


Backend Resolution Helpers

Provider-specific client construction:

resolve_embedding_backend

resolve_embedding_backend(model_name: str) -> ResolvedEmbeddingModel

Resolve runtime backend info for embedding model identifiers.
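A common way to implement such a resolver is prefix dispatch on the model identifier. The sketch below is an assumption about that scheme: the prefix strings and the `ResolvedEmbeddingModel` fields shown are illustrative, not the library's actual ones.

```python
from dataclasses import dataclass

@dataclass
class ResolvedEmbeddingModel:
    provider: str  # e.g. "openai", "azure", "gemini" (illustrative fields)
    model: str     # provider-local model name

def resolve_embedding_backend(model_name: str) -> ResolvedEmbeddingModel:
    # Hypothetical "provider/model" prefix scheme; the real rules may differ.
    for prefix in ("azure/", "gemini/", "openrouter/", "local/"):
        if model_name.startswith(prefix):
            return ResolvedEmbeddingModel(prefix.rstrip("/"), model_name[len(prefix):])
    return ResolvedEmbeddingModel("openai", model_name)  # assumed default

r = resolve_embedding_backend("gemini/text-embedding-004")
```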


get_client_embed

get_client_embed(model_name: str) -> Tuple[Any, str]

Get the client and model for the given embedding model name.


get_async_client_embed

get_async_client_embed(model_name: str) -> Tuple[Any, str]

Get the async client and model for the given embedding model name.