Engineering Workflows

Workflow Integration

Alveare fits into the tools you already use: CI/CD pipelines, webhook-driven architectures, batch processing jobs, real-time streaming, and multi-environment deployments. No new infrastructure to manage.

CI/CD Integration

Use Alveare in your continuous integration pipelines to automate classification, summarization, and extraction tasks as part of your build process. The CLI installs in seconds and authenticates via environment variable, making it a drop-in addition to GitHub Actions, GitLab CI, Jenkins, or any CI system that runs shell commands.

.github/workflows/classify-prs.yml -- Auto-classify PR descriptions on every push
name: Classify PR
on: [pull_request]
jobs:
  classify:
    runs-on: ubuntu-latest
    steps:
      - name: Install Alveare CLI
        run: curl -fsSL https://get.alveare.ai | sh
      - name: Classify PR description
        env:
          ALVEARE_API_KEY: ${{ secrets.ALVEARE_API_KEY }}
        run: |
          LABEL=$(echo "${{ github.event.pull_request.body }}" | \
            alveare infer --specialist classify --max-tokens 20)
          echo "PR classified as: $LABEL"
          # Apply the label to the PR via GitHub API
          gh pr edit ${{ github.event.number }} --add-label "$LABEL"
.gitlab-ci.yml -- Summarise changelog on release
summarise-release:
  stage: release
  script:
    - curl -fsSL https://get.alveare.ai | sh
    - git log --oneline $(git describe --tags --abbrev=0 HEAD~1)..HEAD > changes.txt
    - cat changes.txt | alveare infer --specialist summarise --max-tokens 512 > RELEASE_NOTES.md
  artifacts:
    paths:
      - RELEASE_NOTES.md
  only:
    - tags

The CLI authenticates via the ALVEARE_API_KEY environment variable, so there is no interactive login step in CI. Use a dedicated API key for CI with restricted permissions -- read-only access to specific specialists -- so a compromised key cannot modify your configuration.


Webhook Integration

Alveare sends real-time webhook notifications for billing events, usage alerts, and specialist health changes. Configure endpoints in the dashboard and receive HMAC-signed payloads that your backend can verify and act on automatically.

Example webhook payload: usage threshold alert
POST https://your-app.com/webhooks/alveare
Content-Type: application/json
X-Alveare-Signature: sha256=a1b2c3d4e5f6...

{
  "event": "usage.threshold",
  "timestamp": "2026-03-17T14:30:00Z",
  "data": {
    "plan": "professional",
    "current_requests": 450000,
    "limit": 500000,
    "percentage": 90,
    "projected_overage": 75000,
    "projected_overage_cost": "$187.50",
    "days_remaining": 8
  }
}

Supported Webhook Events

All payloads include an HMAC-SHA256 signature computed with your webhook secret. Verify the signature before processing any event to prevent spoofed requests. The signing secret is generated when you create the webhook endpoint and can be rotated at any time without downtime.
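A minimal verification sketch in Python, assuming the signature header has the form `sha256=<hex digest>` (as in the example payload above) and that the digest is computed over the raw, unmodified request body. The `WEBHOOK_SECRET` value is a placeholder; confirm the exact signing scheme against your endpoint settings in the dashboard.

```python
import hashlib
import hmac

WEBHOOK_SECRET = b"whsec_example"  # placeholder: your endpoint's signing secret

def verify_signature(raw_body: bytes, header_value: str) -> bool:
    """Check the X-Alveare-Signature header against the raw request body.

    Assumes the header is "sha256=<hex digest>" where the digest is
    HMAC-SHA256 over the body bytes, keyed with the webhook secret.
    """
    scheme, _, received = header_value.partition("=")
    if scheme != "sha256" or not received:
        return False
    expected = hmac.new(WEBHOOK_SECRET, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking the signature via timing
    return hmac.compare_digest(expected, received)
```

Compare with `hmac.compare_digest` rather than `==` so the check runs in constant time, and always verify against the raw body bytes before JSON parsing, since re-serialising the payload can change the byte sequence.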


Batch Processing

Process thousands of documents, support tickets, or records in a single command. The Alveare CLI and SDKs support batch mode with configurable concurrency, progress tracking, and automatic error handling. Failed items are retried automatically and logged separately for manual review.

[Diagram: input data (CSV, JSONL, PDF documents) flows through the Alveare Batch API specialists (extract, classify, summarise) at 50 concurrent requests, producing structured JSON/CSV results -- ~1,000 documents/minute, avg 8.1 ms/row, automatic retry on failure, progress streaming]
Batch processing with the CLI
# Process a CSV: classify every support ticket
alveare batch \
  --specialist classify \
  --input tickets.csv \
  --column description \
  --output classified.csv \
  --concurrency 50

Processing 12,847 rows...
[========================================] 100% | 12,847/12,847 | 2m 14s
Results written to classified.csv
Errors: 0 | Cache hits: 1,923 (15.0%)

# Process a directory of PDFs: extract key terms
alveare batch \
  --specialist extract \
  --input-dir ./contracts/ \
  --pattern "*.pdf" \
  --output-dir ./extracted/ \
  --output-format json
Batch processing with the Python SDK
from alveare import Alveare

client = Alveare(api_key="alv_live_abc123...")

# Process 10,000 support tickets with progress callback
results = client.infer_batch(
    specialist="classify",
    prompts=ticket_descriptions,  # list of 10,000 strings
    max_tokens=20,
    concurrency=50,
    on_progress=lambda done, total: print(f"{done}/{total}")
)

# Results maintain input order, so zip against the source ticket objects
for ticket, result in zip(tickets, results):  # tickets[i].description == ticket_descriptions[i]
    db.update_ticket(ticket.id, category=result.text)

Streaming

For chat interfaces and real-time applications, Alveare supports server-sent events (SSE) streaming. Tokens are delivered as they are generated, so your users see the response forming in real time rather than waiting for the full completion. Both the SDKs and the API support streaming natively.

Python streaming
from alveare import Alveare

client = Alveare(api_key="alv_live_abc123...")

# Stream tokens as they are generated
for chunk in client.infer_stream(
    specialist="chat",
    prompt="Explain the benefits of private inference for healthcare",
    max_tokens=512
):
    print(chunk.text, end="", flush=True)
    # Each chunk: {"text": "Private", "token_index": 0, "finish_reason": null}

# Or with the OpenAI-compatible endpoint
import openai

client = openai.OpenAI(base_url="https://api.alveare.ai/v1", api_key="alv_live_...")
stream = client.chat.completions.create(
    model="chat",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)
for chunk in stream:
    print(chunk.choices[0].delta.content, end="")

Streaming uses the standard SSE protocol, so it works with any HTTP client that supports chunked transfer encoding. Time to first token (TTFT) is typically 50-80ms for a 7B model, meaning your users see the response start almost immediately.
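To illustrate what an SSE consumer does under the hood, here is a minimal parser for a stream of `data:` lines. This is a sketch only: it assumes each event is a single `data: {...}` line terminated by a `data: [DONE]` sentinel, as in OpenAI-compatible APIs, and the SDK's streaming methods handle all of this for you.

```python
import json

def parse_sse_lines(lines):
    """Yield decoded JSON chunks from an SSE body, one per `data:` line.

    Skips blank keep-alive lines and comments; stops at the `[DONE]`
    sentinel used by OpenAI-compatible streaming endpoints.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank keep-alives, ": comment" lines, etc.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)
```

Feed it the decoded lines of any chunked HTTP response to consume the stream without an SDK.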


Built-in Response Caching

Alveare includes an automatic response cache that stores results for identical requests. If the same specialist receives the same prompt with the same parameters within the cache TTL, the cached response is returned without running inference. This reduces costs by 15-30% for workloads with repetitive inputs and drops latency to under 10ms for cache hits.

How It Works

For a Professional plan customer processing 500K requests/month with a 20% cache hit rate, that is 100,000 requests served from cache each month. Priced at the Starter plan's overage rate of $4.00 per 1K requests, that is $400/month in effective savings from caching alone.
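The lookup described above can be sketched as a deterministic key over (specialist, prompt, parameters) plus a TTL check. This is an illustration of the concept only, not Alveare's actual cache implementation; the TTL value here is invented.

```python
import hashlib
import json
import time

CACHE_TTL_SECONDS = 3600  # illustrative TTL; the real value is service-defined
_cache = {}  # key -> (stored_at, response)

def cache_key(specialist: str, prompt: str, params: dict) -> str:
    """Identical requests hash to the same key; any parameter change misses."""
    material = json.dumps([specialist, prompt, params], sort_keys=True)
    return hashlib.sha256(material.encode()).hexdigest()

def cached_infer(specialist, prompt, params, run_inference):
    """Return a cached response when fresh, otherwise run inference and store it."""
    key = cache_key(specialist, prompt, params)
    hit = _cache.get(key)
    if hit is not None and time.time() - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]  # cache hit: no inference runs
    result = run_inference(specialist, prompt, params)
    _cache[key] = (time.time(), result)
    return result
```

Sorting the serialised parameters (`sort_keys=True`) is what makes the key order-independent, so `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` hit the same entry.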


Rate Limit Handling

All Alveare SDKs include automatic rate limit handling with configurable retry behavior. When your request hits the rate limit, the SDK reads the Retry-After header and waits the appropriate time before retrying. No request is dropped unless you exceed the maximum retry count.

Configuring retry behavior
from alveare import Alveare, RetryConfig

client = Alveare(
    api_key="alv_live_abc123...",
    retry=RetryConfig(
        max_retries=5,                 # default: 3
        initial_backoff=0.5,           # seconds, default: 1.0
        max_backoff=30,                # seconds, default: 60
        backoff_multiplier=2,          # exponential, default: 2
        retry_on=[429, 500, 502, 503]  # HTTP status codes to retry
    )
)

# Rate limits by plan:
#   Starter:       100 req/s sustained,  200 req/s burst
#   Professional:  500 req/s sustained, 1000 req/s burst
#   Scale:        2000 req/s sustained, 5000 req/s burst
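The behaviour those settings configure can be approximated by the following loop: honour `Retry-After` when the server sends it, otherwise fall back to exponential backoff. This is a sketch of the idea, not the SDK's source; the `send` callable and its `(status, headers, body)` return shape are assumptions for the example.

```python
import time

def send_with_retry(send, max_retries=3, initial_backoff=1.0,
                    max_backoff=60.0, multiplier=2.0,
                    retry_on=(429, 500, 502, 503), sleep=time.sleep):
    """Call send() until success or the retry budget is exhausted.

    send() returns (status, headers, body). A Retry-After header, when
    present, overrides the computed exponential backoff for that attempt.
    """
    backoff = initial_backoff
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status not in retry_on:
            return status, body
        if attempt == max_retries:
            break  # budget exhausted; surface the last error to the caller
        wait = float(headers.get("Retry-After", backoff))
        sleep(min(wait, max_backoff))
        backoff = min(backoff * multiplier, max_backoff)
    return status, body
```

Injecting `sleep` as a parameter keeps the loop testable without real delays, which is also a convenient pattern if you wrap retries yourself.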

Rate limits are applied per API key using a token bucket algorithm. Sustained limits refill continuously; burst limits allow short spikes above the sustained rate. Both limits are visible in the response headers (X-RateLimit-Remaining, X-RateLimit-Reset) so you can implement client-side throttling if needed.
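For client-side throttling, the same token-bucket idea can be sketched in a few lines. This is illustrative only, using the plan limits quoted above as example numbers; it is not how the server enforces limits.

```python
import time

class TokenBucket:
    """Sustained `rate` tokens/s refill, with up to `burst` tokens of headroom."""

    def __init__(self, rate: float, burst: float, now=time.monotonic):
        self.rate, self.burst, self.now = rate, burst, now
        self.tokens = burst  # start full: an initial burst is allowed
        self.last = now()

    def try_acquire(self) -> bool:
        """Take one token if available, refilling continuously by elapsed time."""
        current = self.now()
        self.tokens = min(self.burst, self.tokens + (current - self.last) * self.rate)
        self.last = current
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should wait or shed load
```

Gating outbound requests with a bucket matched to your plan (e.g. `TokenBucket(rate=100, burst=200)` on Starter) avoids ever hitting a 429 in the first place.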


Environment Management

Alveare supports separate environments for development, staging, and production through API key scoping and sandbox mode. Each environment can have its own API keys, specialist configurations, and usage limits, so you never accidentally send production traffic to a test endpoint or vice versa.

Sandbox vs Production Keys

Sandbox keys (alv_test_...) connect to an isolated sandbox environment with its own specialist pool and rate limits. Sandbox requests do not count against your production allocation. Production keys (alv_live_...) connect to your production hive.
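A small guard in application startup code can catch a misconfigured key before any traffic is sent. This sketch keys off the documented `alv_test_` / `alv_live_` prefixes; the helper function and its error messages are our own, not part of the SDK.

```python
import os

def assert_key_matches_env(api_key: str, environment: str) -> None:
    """Fail fast if a sandbox key reaches production, or vice versa."""
    is_live = api_key.startswith("alv_live_")
    is_test = api_key.startswith("alv_test_")
    if not (is_live or is_test):
        raise ValueError("Unrecognised Alveare API key prefix")
    if environment == "production" and not is_live:
        raise ValueError("Production must use an alv_live_ key")
    if environment != "production" and is_live:
        raise ValueError(f"{environment} must not use a production (alv_live_) key")

# Typical startup call (env var names are your own convention):
# assert_key_matches_env(os.environ["ALVEARE_API_KEY"],
#                        os.environ.get("APP_ENV", "development"))
```

Running this once at boot turns a silent "test traffic hit production" incident into an immediate, obvious crash.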

Structuring dev / staging / production
# .env.development
ALVEARE_API_KEY=alv_test_dev_abc123         # sandbox, personal dev key
ALVEARE_BASE_URL=https://api.alveare.ai/v1  # same URL, key determines env

# .env.staging
ALVEARE_API_KEY=alv_test_staging_def456     # sandbox, shared staging key

# .env.production
ALVEARE_API_KEY=alv_live_prod_ghi789        # production, restricted permissions

# Key permissions can be scoped:
#   - Specific specialists only (e.g., classify, summarise)
#   - Read-only (inference only, no config changes)
#   - IP allowlist (only accept from your server IPs)
#   - Time-limited (auto-expire after 90 days)

We recommend a minimum of three API keys: one personal sandbox key per developer, one shared staging key for integration tests, and one production key stored in your secrets manager (AWS Secrets Manager, HashiCorp Vault, or your CI/CD secrets). Production keys should be scoped to the minimum permissions required and rotated every 90 days.

Fits your workflow, not the other way around

Start a 7-day free trial and integrate Alveare into your existing pipelines. No infrastructure changes required.

Get Started Free