# VectorStackAI's Embeddings API Reference

## Embeddings Operations

### `vectorstackai.Client.embed`

```python
embed(texts, model, is_query=False, instruction='')
```

Generates embeddings for a batch of text inputs using the specified model.

This method encodes a batch of text documents or queries into dense vector representations using the selected embedding model. It supports both document and query embeddings, with an optional instruction for instruction-tuned models.
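As a minimal sketch of the two modes (the client setup and texts are illustrative; the model name follows the example further down):

```python
import vectorstackai

client = vectorstackai.Client(api_key="your_api_key")

# Documents: is_query defaults to False
doc_emb = client.embed(
    texts=["Consumers have 30 days to return a defective product."],
    model="vstackai-law-1",
)

# Queries: set is_query=True so the texts are encoded as search queries
query_emb = client.embed(
    texts=["How long do I have to return a faulty product?"],
    model="vstackai-law-1",
    is_query=True,
)
```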

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `texts` | `List[str]` | Batch of text strings to embed. Each string represents either a document or a query. | *required* |
| `model` | `str` | The name of the embedding model to use (e.g., `"vstackai-law-1"` for legal documents). | *required* |
| `is_query` | `bool` | Whether the input texts are queries (`True`) or documents (`False`). | `False` |
| `instruction` | `str` | An optional instruction to guide the model when embedding queries. Recommended for instruction-tuned models. | `''` |
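
Continuing the sketch above, an instruction can steer query embeddings for instruction-tuned models (the instruction wording here is illustrative, not a required format):

```python
# Reuses the client from the sketch above; per the parameter description,
# the instruction guides the model when embedding queries
query_emb = client.embed(
    texts=["Is a verbal agreement legally binding?"],
    model="vstackai-law-1",
    is_query=True,
    instruction="Represent the legal question for retrieving relevant clauses",
)
```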

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `EmbeddingsObject` | `EmbeddingsObject` | An object holding the embeddings for the batch of texts. The embeddings are stored as a NumPy array of shape `(num_texts, embedding_dimension)`, accessible via the `embeddings` attribute. |
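
The returned array can be used directly with NumPy. Continuing the sketch above, for example, documents can be scored against a query by cosine similarity:

```python
import numpy as np

docs = doc_emb.embeddings     # shape (num_docs, embedding_dimension)
query = query_emb.embeddings  # shape (1, embedding_dimension)

# Cosine similarity between the query and every document
docs_norm = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_norm = query / np.linalg.norm(query, axis=1, keepdims=True)
scores = docs_norm @ query_norm.T  # shape (num_docs, 1)
print(scores.ravel())
```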

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `texts` is not a list of strings, if `model` is not a string, if `is_query` is not a boolean, or if `instruction` is not a string. |
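
These are plain `ValueError`s raised client-side before any request is sent (see the validation block in the source below), so they can be handled as usual:

```python
try:
    # 'texts' must be a list of strings, so a bare string raises ValueError
    client.embed(texts="not a list", model="vstackai-law-1")
except ValueError as err:
    print(f"Invalid input: {err}")
```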

Example

```python
import vectorstackai

client = vectorstackai.Client(api_key="your_api_key")

texts = [
    "The defendant was charged with violation of contract terms.",
    "Consumers have 30 days to return a defective product."
]

embeddings = client.embed(texts=texts, model="vstackai-law-1", is_query=False)

print(embeddings.embeddings.shape)  # (2, 1536)
```
Source code in `src/vectorstackai/client.py`

```python
def embed(
    self,
    texts: List[str],
    model: str,
    is_query: bool = False,
    instruction: str = "",
) -> EmbeddingsObject:
    """
    Generates embeddings for a batch of text inputs using the specified model.

    This method encodes a batch of text documents or queries into dense vector 
    representations using the selected embedding model. It supports both 
    document and query embeddings, with an optional instruction for 
    instruction-tuned models.

    Args:
        texts (List[str]): 
            Batch of text strings to be embedded as a list of strings. 
            Each string represents either a document or a query.
        model (str): 
            The name of the embedding model to use (e.g., `"vstackai-law-1"` 
            for legal documents).
        is_query (bool, optional): 
            A flag indicating whether the input texts are queries (`True`) 
            or documents (`False`). Defaults to `False`.
        instruction (str, optional): 
            An optional instruction to guide the model when embedding queries. 
            Recommended for instruction-tuned models. Defaults to an empty string.

    Returns:
        EmbeddingsObject: 
            An object that holds embeddings for the batch of texts. 
            The embeddings are stored as a NumPy array of shape
            `(num_texts, embedding_dimension)`, accessible via the 
            `embeddings` attribute.

    Raises:
        ValueError: 
            If `texts` is not a list of strings.
            If `model` is not a string.
            If `is_query` is not a boolean.
            If `instruction` is not a string.

    Example:
        ```python
        client = vectorstackai.Client(api_key="your_api_key")

        texts = [
            "The defendant was charged with violation of contract terms.",
            "Consumers have 30 days to return a defective product."
        ]

        embeddings = client.embed(texts=texts, model="vstackai-law-1", is_query=False)

        print(embeddings.embeddings.shape)  # (2, 1536)
        ```
    """ 
    # Validate input arguments
    if not isinstance(texts, list) or not all(isinstance(text, str) for text in texts):
        raise ValueError("'texts' must be a list of strings")
    if not isinstance(model, str):
        raise ValueError("'model' must be a string")
    if not isinstance(is_query, bool):
        raise ValueError("'is_query' must be a boolean")
    if not isinstance(instruction, str):
        raise ValueError("'instruction' must be a string")

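    # Issue the encode request; self.retry_controller governs retries of failed attempts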
    for attempt in self.retry_controller:
        with attempt:
            response_json = api_resources.Embedding.encode(
                texts=texts,
                model=model,
                is_query=is_query,
                instruction=instruction,
                connection_params=self.connection_params
            )
    return EmbeddingsObject(response_json, batch_size=len(texts))
```