Embeddings API
Code similarity, archive analysis, and provider integration across OpenAI, Azure, Gemini, OpenRouter, and local OpenAI-compatible backends.
EmbeddingClient
Synchronous embedding client with token counting and cost estimation.
Initialize the EmbeddingClient.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_name` | `str` | The OpenAI, Azure, or Gemini embedding model name to use. | `'text-embedding-3-small'` |
| `verbose` | `bool` | Enable verbose logging. | `False` |
count_tokens
Count tokens using tiktoken for accurate token counting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `Union[str, List[str]]` | A string or list of strings to count tokens for. | required |
Returns:
| Type | Description |
|---|---|
| `Union[int, List[int]]` | Token count (`int`) for a single string, or a list of counts for a list of strings. |
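`count_tokens` follows a str-or-list contract: one string yields an `int`, a list yields one count per element. The sketch below is a minimal illustration of that dispatch plus a linear cost estimate, assuming cost scales with token count at a fixed per-million-token rate. `estimate_cost`, the rate value, and the whitespace tokenizer are hypothetical stand-ins, not part of this API. With tiktoken installed, `tiktoken.get_encoding("cl100k_base").encode` (the encoding used by the `text-embedding-3-*` models) could be passed as `encode` instead.

```python
from typing import Callable, List, Sequence, Union

# Any callable mapping text to a sequence of token ids, e.g.
# tiktoken.get_encoding("cl100k_base").encode when tiktoken is installed.
Tokenizer = Callable[[str], Sequence[int]]


def count_tokens(text: Union[str, List[str]], encode: Tokenizer) -> Union[int, List[int]]:
    """Return one count for a string, or one count per item for a list."""
    if isinstance(text, str):
        return len(encode(text))
    return [len(encode(t)) for t in text]


def estimate_cost(n_tokens: Union[int, List[int]], usd_per_million: float) -> float:
    """Hypothetical helper: linear per-token pricing, summed over a batch."""
    total = n_tokens if isinstance(n_tokens, int) else sum(n_tokens)
    return total * usd_per_million / 1_000_000


def whitespace_encode(s: str) -> List[int]:
    """Stand-in tokenizer: one fake token id per whitespace-separated word."""
    return list(range(len(s.split())))


print(count_tokens("def add(a, b): return a + b", whitespace_encode))
print(count_tokens(["x = 1", "y = 2 + 3"], whitespace_encode))
print(estimate_cost([3, 5], usd_per_million=0.02))  # illustrative rate only
```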
get_embedding

`get_embedding(code: Union[str, List[str]]) -> Union[Tuple[List[float], float], Tuple[List[List[float]], float]]`
Computes the text embedding for a string or list of strings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `code` | `Union[str, List[str]]` | The text to embed, as a single string or a list of strings. | required |
Returns:
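| Name | Type | Description |
|---|---|---|
| `tuple` | `Union[Tuple[List[float], float], Tuple[List[List[float]], float]]` | `(embedding_vector(s), cost)`: a single vector for a string input or a list of vectors for a list input, together with the estimated cost. |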
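Because `get_embedding` returns an `(embedding_vector(s), cost)` tuple, callers unpack it before comparing vectors. One common way to score code similarity is cosine similarity; the sketch below assumes that metric (the library may use another) and uses hard-coded toy vectors in place of a real API response.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


# Toy stand-in for get_embedding([...]) on a list input, which returns
# (list_of_vectors, cost). Real embeddings have far more dimensions.
result: Tuple[List[List[float]], float] = (
    [[1.0, 0.0, 1.0], [0.9, 0.1, 1.1], [0.0, 1.0, 0.0]],
    0.0,
)
vectors, cost = result

# Compare the first snippet (the query) against the remaining candidates.
query, *candidates = vectors
scores = [cosine_similarity(query, c) for c in candidates]
best = max(range(len(candidates)), key=lambda i: scores[i])
print(scores, best, cost)
```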
get_column_embedding
Computes text embeddings for the strings in one or more DataFrame columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | A pandas DataFrame with the column(s) to embed. | required |
| `column_name` | `(str, list)` | The name of the column to embed, or a list of column names. | required |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A pandas DataFrame containing the embedded column(s). |
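A rough sketch of what embedding DataFrame columns can look like, assuming each column is sent as one batch and the vectors are stored in a new `<column>_embedding` column. The output column naming and `fake_embed` are hypothetical; check the actual return value of `get_column_embedding` before relying on a particular layout. Requires pandas.

```python
from typing import List, Tuple, Union

import pandas as pd


def fake_embed(texts: List[str]) -> Tuple[List[List[float]], float]:
    """Hypothetical stand-in for get_embedding on a list input."""
    vectors = [[float(len(t)), float(t.count("\n"))] for t in texts]
    return vectors, 0.0


def embed_columns(df: pd.DataFrame, column_name: Union[str, List[str]]) -> pd.DataFrame:
    """Embed each requested column as one batch; store vectors in '<col>_embedding'."""
    columns = [column_name] if isinstance(column_name, str) else list(column_name)
    out = df.copy()  # leave the caller's DataFrame unchanged
    for col in columns:
        texts = out[col].astype(str).tolist()  # one batch request per column
        vectors, _cost = fake_embed(texts)
        # Object dtype keeps each row's vector as a single Python list.
        out[f"{col}_embedding"] = pd.Series(vectors, index=out.index, dtype=object)
    return out


df = pd.DataFrame({"code": ["x = 1", "def f():\n    return 2"]})
print(embed_columns(df, "code"))
```

Wrapping the vectors in an object-dtype `Series` keeps one list per row; assigning a nested list directly may be interpreted by pandas as 2-D data.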
AsyncEmbeddingClient
Async embedding client used by the async runtime.
Async version of EmbeddingClient for non-blocking API calls.
Initialize the AsyncEmbeddingClient.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_name` | `str` | The OpenAI, Azure, or Gemini embedding model name to use. | `'text-embedding-3-small'` |
| `verbose` | `bool` | Enable verbose logging. | `False` |
get_embedding

`get_embedding(code: Union[str, List[str]]) -> Union[Tuple[List[float], float], Tuple[List[List[float]], float]]`
Synchronous wrapper kept for compatibility. Note: it blocks the caller, which defeats the purpose of the async client; use `embed_async()` instead.
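The compatibility note is about event-loop blocking. The sketch below uses a hypothetical `fake_embed_async` in place of a real request to show the general trade-off: with coroutines, several batches can be awaited concurrently via `asyncio.gather`, while a synchronous wrapper has to drive its own loop (here with `asyncio.run`), so each call blocks until it finishes and cannot be made from code already running inside an event loop. How the library's own wrapper is implemented is not documented here.

```python
import asyncio
from typing import List, Tuple


async def fake_embed_async(texts: List[str]) -> Tuple[List[List[float]], float]:
    """Hypothetical stand-in for a non-blocking embedding request."""
    await asyncio.sleep(0.05)  # simulated network latency
    return [[float(len(t))] for t in texts], 0.0


async def embed_batches(batches: List[List[str]]) -> Tuple[List[List[float]], float]:
    """Issue all batch requests concurrently; gather preserves input order."""
    results = await asyncio.gather(*(fake_embed_async(b) for b in batches))
    vectors = [v for batch_vectors, _ in results for v in batch_vectors]
    total_cost = sum(cost for _, cost in results)
    return vectors, total_cost


def embed_blocking(texts: List[str]) -> Tuple[List[List[float]], float]:
    """Sync wrapper: asyncio.run blocks until done and raises inside a running loop."""
    return asyncio.run(fake_embed_async(texts))


vectors, cost = asyncio.run(embed_batches([["a", "bb"], ["ccc"]]))
print(vectors, cost)
```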
Backend Resolution Helpers
Provider-specific client construction:
resolve_embedding_backend
Resolve runtime backend info for embedding model identifiers.
get_client_embed
Get the client and model for the given embedding model name.
get_async_client_embed
Get the async client and model for the given embedding model name.
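These helpers separate deciding which provider a model identifier belongs to from constructing the client. A minimal sketch of the resolution step, assuming model identifiers carry a provider prefix; the prefixes, the `EmbeddingBackend` dataclass, and `resolve_backend` are all hypothetical and need not match `resolve_embedding_backend`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingBackend:
    provider: str  # e.g. "openai", "azure", "gemini", "openrouter", "local"
    model: str  # model name as sent to the provider's API


# Hypothetical prefix conventions; the real helpers may use different rules.
_PREFIXES = (
    ("azure-", "azure"),
    ("openrouter/", "openrouter"),
    ("local/", "local"),
)


def resolve_backend(model_name: str) -> EmbeddingBackend:
    """Map a model identifier to a provider and the bare model name."""
    for prefix, provider in _PREFIXES:
        if model_name.startswith(prefix):
            return EmbeddingBackend(provider, model_name[len(prefix):])
    if model_name.startswith("gemini"):
        return EmbeddingBackend("gemini", model_name)
    return EmbeddingBackend("openai", model_name)  # default provider


print(resolve_backend("text-embedding-3-small"))
print(resolve_backend("azure-text-embedding-3-small"))
```

A factory in the spirit of `get_client_embed` could then choose the matching client class and endpoint from `provider`, keeping the per-provider differences in one place.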