First-party SDKs for Python and TypeScript, a CLI that pipes into any workflow, a VS Code extension for inline testing, and full OpenAI compatibility so you can switch in two lines.
The Alveare Python SDK is designed for production use in backend services, data pipelines, and ML workflows. It provides typed responses, automatic retries with exponential backoff, connection pooling, and both synchronous and asynchronous interfaces. Install it with pip and start making inference requests in three lines of code.
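The automatic retry behavior described above can be pictured roughly as follows. This is an illustrative sketch of an exponential-backoff policy, not the SDK's actual internals; the `retry_with_backoff` helper and its parameters are hypothetical.

```python
import random
import time


def retry_with_backoff(call, max_attempts=4, base_delay=0.5, max_delay=8.0):
    """Retry a callable on transient errors with exponential backoff and jitter.

    Illustrative sketch of the kind of retry policy the SDK applies
    automatically; the real implementation details may differ.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Double the delay each attempt, capped, with random jitter
            # so concurrent clients do not retry in lockstep.
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay * random.uniform(0.5, 1.0))


attempts = []

def flaky():
    """Simulated request that fails twice before succeeding."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient network failure")
    return "ok"

print(retry_with_backoff(flaky, base_delay=0.05))  # succeeds on the third attempt
```

The jitter term is a common refinement: without it, many clients that fail at the same moment would all retry at the same moment too.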
For high-throughput services running on asyncio, FastAPI, or similar frameworks, the async client
avoids blocking your event loop. It uses httpx under the hood and supports the same
interface as the synchronous client.
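The payoff of the async client is easiest to see with a stub. The `infer` coroutine below is a stand-in that simulates network latency; only the concurrency pattern is the point — ten awaited calls complete in roughly the time of one, because none of them blocks the event loop.

```python
import asyncio


async def infer(prompt: str) -> str:
    """Stand-in for an async SDK call; simulates network latency without blocking."""
    await asyncio.sleep(0.1)
    return f"result for {prompt!r}"


async def main() -> list:
    # Each call awaits instead of blocking, so the event loop interleaves
    # all ten requests concurrently.
    return await asyncio.gather(*(infer(f"prompt {i}") for i in range(10)))


results = asyncio.run(main())
print(len(results))  # 10
```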
Every response is a typed InferResult object with .text, .tokens_used, .latency_ms, .specialist, and .cached fields.
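As a mental model, the response shape described above is roughly the dataclass below. This is an illustration, not the SDK's actual class definition, and the field types are assumptions.

```python
from dataclasses import dataclass


@dataclass
class InferResult:
    """Illustrative model of the response fields; types are assumptions."""
    text: str          # the generated output
    tokens_used: int   # tokens consumed by the request
    latency_ms: float  # end-to-end latency in milliseconds
    specialist: str    # which specialist handled the request
    cached: bool       # whether the response was served from cache


result = InferResult(
    text="Hello!",
    tokens_used=12,
    latency_ms=48.3,
    specialist="support-triage",
    cached=False,
)
print(result.text, result.cached)
```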
Errors raise typed exceptions: AlveareAuthError, AlveareRateLimitError, and AlveareValidationError.
Rate limit errors include a .retry_after field in seconds.
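A sketch of how typed exceptions with a .retry_after field let callers handle rate limits cleanly. The class bodies and the `handle` helper here are illustrative, not the SDK's actual hierarchy.

```python
class AlveareError(Exception):
    """Illustrative base class; the SDK's actual hierarchy may differ."""


class AlveareRateLimitError(AlveareError):
    def __init__(self, message: str, retry_after: float):
        super().__init__(message)
        self.retry_after = retry_after  # seconds to wait before retrying


def handle(call):
    try:
        return call()
    except AlveareRateLimitError as err:
        # A caller can sleep for err.retry_after seconds and retry,
        # instead of parsing error strings.
        return f"rate limited, retry in {err.retry_after}s"


def limited():
    raise AlveareRateLimitError("too many requests", retry_after=2.5)


print(handle(limited))
```

Because the exception is typed, the except clause narrows precisely to rate-limit failures; auth and validation errors propagate unchanged.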
Streaming is supported via infer_stream(), which returns an iterator of chunks.

The TypeScript SDK provides full type safety, native Promise support, and works across Node.js 18+, Deno, and Bun. It ships as both ESM and CommonJS, is tree-shakeable, and adds less than 15 KB to your bundle. Every method returns strongly typed responses so your IDE catches errors before runtime.
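The Python client's infer_stream() — an iterator of chunks — can be pictured as a plain generator. The chunking below is a simulation for illustration, not the SDK's wire format.

```python
from typing import Iterator


def infer_stream(prompt: str) -> Iterator[str]:
    """Simulated streaming interface: yields response text chunk by chunk."""
    response = f"Echoing: {prompt}"
    for i in range(0, len(response), 8):
        yield response[i:i + 8]


# Consume chunks as they arrive instead of waiting for the full response.
chunks = list(infer_stream("hello stream"))
print("".join(chunks))
```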
Every request and response is fully typed. The SDK exports interfaces for InferRequest, InferResult, Specialist, UsageStats, and WebhookEvent.
Discriminated union types for error responses mean your catch
blocks can narrow by error type without casting.
The Alveare CLI is a single binary that lets you manage specialists, run inference, monitor usage, and automate batch jobs from your terminal. It is built for engineers who live in the terminal and want to pipe any file through Alveare without writing code.
The Alveare VS Code extension brings inference directly into your editor. Select text, right-click, and send it to any specialist. The response appears in a side panel with latency, token count, and cached status. No context switching. No terminal required.
Select any block of text in your editor, right-click, and choose "Alveare: Summarise". The summary appears in a side panel in under 300ms. Works with any file type.
Highlight a support ticket, log entry, or any text. Right-click and classify it instantly. The label appears as an inline decoration next to your selection.
View all your configured specialists in a sidebar tree view. Edit system prompts, temperature, and max tokens directly from VS Code. Changes deploy to your hive immediately.
A dedicated panel for testing prompts against different specialists and comparing outputs. Adjust parameters in real time and see how they affect quality and latency.
A persistent status bar item shows your current request count and plan allocation. Click it to see per-specialist breakdowns, average latency, and cache hit rates.
Compare outputs from different specialists or different parameter configurations side by side. Essential for prompt engineering and output quality validation.
Install from the VS Code Marketplace: search for "Alveare" or run
ext install alveare.alveare-vscode
from the command palette. Requires an Alveare API key.
If you already use the OpenAI API, you do not need to learn a new SDK. Alveare's inference endpoint is wire-compatible with the OpenAI chat completions API. Change two lines of code (the base URL and the API key) and your existing application works without modification.
This is not a subset. We support system messages, multi-turn conversations, streaming responses (SSE), temperature, top_p, max_tokens, stop sequences, JSON mode, and function calling. The Alveare endpoint also accepts OpenAI model names and maps them to the appropriate specialist automatically, so you do not even need to change your model parameter if you prefer.
The OpenAI compatibility layer means you can evaluate Alveare without modifying a single line of application code beyond the configuration. Run your existing test suite against Alveare, compare latency and quality, and make a decision based on production-grade evidence.
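A minimal sketch of the wire format involved: the request body is standard OpenAI chat-completions JSON, and only the base URL and key change. The base URL below is a placeholder, not a documented Alveare endpoint, and the model name is one example of an OpenAI name the mapping layer would accept.

```python
import json

# The only two values that change when switching from OpenAI:
BASE_URL = "https://api.alveare.example/v1"  # placeholder, not a documented endpoint
API_KEY = "alv-..."                          # your Alveare API key

# The request body itself is unchanged OpenAI chat-completions JSON.
payload = {
    "model": "gpt-4o-mini",  # OpenAI model names are accepted and mapped
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarise this ticket."},
    ],
    "temperature": 0.2,
    "stream": False,
}

body = json.dumps(payload)
print(f"POST {BASE_URL}/chat/completions")
print(body[:60] + "...")
```

Because the body is byte-for-byte what an OpenAI client would send, any existing OpenAI-compatible HTTP client or SDK can produce it; only the destination and credentials differ.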
Install the SDK, get an API key, make your first request. Full documentation for every tool.
Get Started Free