Engineering Workflows

Workflow Integration

Alveare fits into the tools you already use: CI/CD pipelines, webhook-driven architectures, batch processing jobs, real-time streaming, and multi-environment deployments. No new infrastructure to manage.

CI/CD Integration

Use Alveare in your continuous integration pipelines to automate classification, summarization, and extraction tasks as part of your build process. The CLI installs in seconds and authenticates via environment variable, making it a drop-in addition to GitHub Actions, GitLab CI, Jenkins, or any CI system that runs shell commands.

.github/workflows/classify-prs.yml -- Auto-classify PR descriptions on every push
name: Classify PR
on: [pull_request]
jobs:
  classify:
    runs-on: ubuntu-latest
    steps:
      - name: Install Alveare CLI
        run: curl -fsSL https://get.alveare.ai | sh
      - name: Classify PR description
        env:
          ALVEARE_API_KEY: ${{ secrets.ALVEARE_API_KEY }}
        run: |
          LABEL=$(echo "${{ github.event.pull_request.body }}" | \
            alveare infer --specialist classify --max-tokens 20)
          echo "PR classified as: $LABEL"
          # Apply the label to the PR via GitHub API
          gh pr edit ${{ github.event.number }} --add-label "$LABEL"
.gitlab-ci.yml -- Summarise changelog on release
summarise-release:
  stage: release
  script:
    - curl -fsSL https://get.alveare.ai | sh
    - git log --oneline $(git describe --tags --abbrev=0 HEAD~1)..HEAD > changes.txt
    - cat changes.txt | alveare infer --specialist summarise --max-tokens 512 > RELEASE_NOTES.md
  artifacts:
    paths:
      - RELEASE_NOTES.md
  only:
    - tags

The CLI authenticates via the ALVEARE_API_KEY environment variable, so there is no interactive login step in CI. Use a dedicated API key for CI with restricted permissions -- read-only access to specific specialists -- so a compromised key cannot modify your configuration.


Webhook Integration

Alveare sends real-time webhook notifications for billing events, usage alerts, and specialist health changes. Configure endpoints in the dashboard and receive HMAC-signed payloads that your backend can verify and act on automatically.

Example webhook payload: usage threshold alert
POST https://your-app.com/webhooks/alveare
Content-Type: application/json
X-Alveare-Signature: sha256=a1b2c3d4e5f6...

{
  "event": "usage.threshold",
  "timestamp": "2026-03-17T14:30:00Z",
  "data": {
    "plan": "professional",
    "current_requests": 450000,
    "limit": 500000,
    "percentage": 90,
    "projected_overage": 75000,
    "projected_overage_cost": "$187.50",
    "days_remaining": 8
  }
}

Supported Webhook Events

All payloads include an HMAC-SHA256 signature computed with your webhook secret. Verify the signature before processing any event to prevent spoofed requests. The signing secret is generated when you create the webhook endpoint and can be rotated at any time without downtime.
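A minimal verification sketch in Python, assuming the signature header has the form `sha256=<hex digest>` (as in the example payload above) and that the digest is computed over the raw, unmodified request body. The `WEBHOOK_SECRET` value is a placeholder; confirm the exact signing scheme against your endpoint settings in the dashboard.

```python
import hashlib
import hmac

WEBHOOK_SECRET = b"whsec_example"  # placeholder: your endpoint's signing secret

def verify_signature(raw_body: bytes, header_value: str) -> bool:
    """Check the X-Alveare-Signature header against the raw request body.

    Assumes the header is "sha256=<hex digest>" where the digest is
    HMAC-SHA256 over the body bytes, keyed with the webhook secret.
    """
    scheme, _, received = header_value.partition("=")
    if scheme != "sha256" or not received:
        return False
    expected = hmac.new(WEBHOOK_SECRET, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking the signature via timing
    return hmac.compare_digest(expected, received)
```

Compare with `hmac.compare_digest` rather than `==` so the check runs in constant time, and always verify against the raw body bytes before JSON parsing, since re-serialising the payload can change the byte sequence.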


Batch Processing

Process thousands of documents, support tickets, or records in a single command. The Alveare CLI and SDKs support batch mode with configurable concurrency, progress tracking, and automatic error handling. Failed items are retried automatically and logged separately for manual review.

[Diagram: input data (CSV, JSONL, PDF documents) flows through the Alveare Batch API specialists (extract, classify, summarise) at 50 concurrent requests, producing structured JSON/CSV results -- ~1,000 documents/minute, avg 8.1 ms/row, automatic retry on failure, progress streaming]
Batch processing with the CLI
# Process a CSV: classify every support ticket
alveare batch \
  --specialist classify \
  --input tickets.csv \
  --column description \
  --output classified.csv \
  --concurrency 50

Processing 12,847 rows...
[========================================] 100% | 12,847/12,847 | 2m 14s
Results written to classified.csv
Errors: 0 | Cache hits: 1,923 (15.0%)

# Process a directory of PDFs: extract key terms
alveare batch \
  --specialist extract \
  --input-dir ./contracts/ \
  --pattern "*.pdf" \
  --output-dir ./extracted/ \
  --output-format json
Batch processing with the Python SDK
from alveare import Alveare

client = Alveare(api_key="alv_live_abc123...")

# Process 10,000 support tickets with progress callback
results = client.infer_batch(
    specialist="classify",
    prompts=ticket_descriptions,  # list of 10,000 strings
    max_tokens=20,
    concurrency=50,
    on_progress=lambda done, total: print(f"{done}/{total}")
)

# Results maintain input order, so zip against the source ticket objects
for ticket, result in zip(tickets, results):  # tickets[i].description == ticket_descriptions[i]
    db.update_ticket(ticket.id, category=result.text)

Streaming

For chat interfaces and real-time applications, Alveare supports server-sent events (SSE) streaming. Tokens are delivered as they are generated, so your users see the response forming in real time rather than waiting for the full completion. Both the SDKs and the API support streaming natively.

Python streaming
from alveare import Alveare

client = Alveare(api_key="alv_live_abc123...")

# Stream tokens as they are generated
for chunk in client.infer_stream(
    specialist="chat",
    prompt="Explain the benefits of private inference for healthcare",
    max_tokens=512
):
    print(chunk.text, end="", flush=True)
    # Each chunk: {"text": "Private", "token_index": 0, "finish_reason": null}

# Or with the OpenAI-compatible endpoint
import openai

client = openai.OpenAI(base_url="https://api.alveare.ai/v1", api_key="alv_live_...")
stream = client.chat.completions.create(
    model="chat",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)
for chunk in stream:
    print(chunk.choices[0].delta.content, end="")

Streaming uses the standard SSE protocol, so it works with any HTTP client that supports chunked transfer encoding. Time to first token (TTFT) is typically 50-80ms for a 7B model, meaning your users see the response start almost immediately.
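To illustrate what an SSE consumer does under the hood, here is a minimal parser for a stream of `data:` lines. This is a sketch only: it assumes each event is a single `data: {...}` line terminated by a `data: [DONE]` sentinel, as in OpenAI-compatible APIs, and the SDK's streaming methods handle all of this for you.

```python
import json

def parse_sse_lines(lines):
    """Yield decoded JSON chunks from an SSE body, one per `data:` line.

    Skips blank keep-alive lines and comments; stops at the `[DONE]`
    sentinel used by OpenAI-compatible streaming endpoints.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank keep-alives, ": comment" lines, etc.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)
```

Feed it the decoded lines of any chunked HTTP response to consume the stream without an SDK.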


Built-in Response Caching

Alveare includes an automatic response cache that stores results for identical requests. If the same specialist receives the same prompt with the same parameters within the cache TTL, the cached response is returned without running inference. This reduces costs by 15-30% for workloads with repetitive inputs and drops latency to under 10ms for cache hits.

How It Works

For a Professional plan customer processing 500K requests/month with a 20% cache hit rate, that is 100,000 requests served from cache each month. Priced at the Starter plan's overage rate of $4.00 per 1K requests, that is $400/month in effective savings from caching alone.
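The lookup described above can be sketched as a deterministic key over (specialist, prompt, parameters) plus a TTL check. This is an illustration of the concept only, not Alveare's actual cache implementation; the TTL value here is invented.

```python
import hashlib
import json
import time

CACHE_TTL_SECONDS = 3600  # illustrative TTL; the real value is service-defined
_cache = {}  # key -> (stored_at, response)

def cache_key(specialist: str, prompt: str, params: dict) -> str:
    """Identical requests hash to the same key; any parameter change misses."""
    material = json.dumps([specialist, prompt, params], sort_keys=True)
    return hashlib.sha256(material.encode()).hexdigest()

def cached_infer(specialist, prompt, params, run_inference):
    """Return a cached response when fresh, otherwise run inference and store it."""
    key = cache_key(specialist, prompt, params)
    hit = _cache.get(key)
    if hit is not None and time.time() - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]  # cache hit: no inference runs
    result = run_inference(specialist, prompt, params)
    _cache[key] = (time.time(), result)
    return result
```

Sorting the serialised parameters (`sort_keys=True`) is what makes the key order-independent, so `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` hit the same entry.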


Rate Limit Handling

All Alveare SDKs include automatic rate limit handling with configurable retry behavior. When your request hits the rate limit, the SDK reads the Retry-After header and waits the appropriate time before retrying. No request is dropped unless you exceed the maximum retry count.

Configuring retry behavior
from alveare import Alveare, RetryConfig

client = Alveare(
    api_key="alv_live_abc123...",
    retry=RetryConfig(
        max_retries=5,                 # default: 3
        initial_backoff=0.5,           # seconds, default: 1.0
        max_backoff=30,                # seconds, default: 60
        backoff_multiplier=2,          # exponential, default: 2
        retry_on=[429, 500, 502, 503]  # HTTP status codes to retry
    )
)

# Rate limits by plan:
#   Starter:       100 req/s sustained,  200 req/s burst
#   Professional:  500 req/s sustained, 1000 req/s burst
#   Scale:        2000 req/s sustained, 5000 req/s burst
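The behaviour those settings configure can be approximated by the following loop: honour `Retry-After` when the server sends it, otherwise fall back to exponential backoff. This is a sketch of the idea, not the SDK's source; the `send` callable and its `(status, headers, body)` return shape are assumptions for the example.

```python
import time

def send_with_retry(send, max_retries=3, initial_backoff=1.0,
                    max_backoff=60.0, multiplier=2.0,
                    retry_on=(429, 500, 502, 503), sleep=time.sleep):
    """Call send() until success or the retry budget is exhausted.

    send() returns (status, headers, body). A Retry-After header, when
    present, overrides the computed exponential backoff for that attempt.
    """
    backoff = initial_backoff
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status not in retry_on:
            return status, body
        if attempt == max_retries:
            break  # budget exhausted; surface the last error to the caller
        wait = float(headers.get("Retry-After", backoff))
        sleep(min(wait, max_backoff))
        backoff = min(backoff * multiplier, max_backoff)
    return status, body
```

Injecting `sleep` as a parameter keeps the loop testable without real delays, which is also a convenient pattern if you wrap retries yourself.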

Rate limits are applied per API key using a token bucket algorithm. Sustained limits refill continuously; burst limits allow short spikes above the sustained rate. Both limits are visible in the response headers (X-RateLimit-Remaining, X-RateLimit-Reset) so you can implement client-side throttling if needed.
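For client-side throttling, the same token-bucket idea can be sketched in a few lines. This is illustrative only, using the plan limits quoted above as example numbers; it is not how the server enforces limits.

```python
import time

class TokenBucket:
    """Sustained `rate` tokens/s refill, with up to `burst` tokens of headroom."""

    def __init__(self, rate: float, burst: float, now=time.monotonic):
        self.rate, self.burst, self.now = rate, burst, now
        self.tokens = burst  # start full: an initial burst is allowed
        self.last = now()

    def try_acquire(self) -> bool:
        """Take one token if available, refilling continuously by elapsed time."""
        current = self.now()
        self.tokens = min(self.burst, self.tokens + (current - self.last) * self.rate)
        self.last = current
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should wait or shed load
```

Gating outbound requests with a bucket matched to your plan (e.g. `TokenBucket(rate=100, burst=200)` on Starter) avoids ever hitting a 429 in the first place.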


Environment Management

Alveare supports separate environments for development, staging, and production through API key scoping and sandbox mode. Each environment can have its own API keys, specialist configurations, and usage limits, so you never accidentally send production traffic to a test endpoint or vice versa.

Sandbox vs Production Keys

Sandbox keys (alv_test_...) connect to an isolated sandbox environment with its own specialist pool and rate limits. Sandbox requests do not count against your production allocation. Production keys (alv_live_...) connect to your production hive.
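A small guard in application startup code can catch a misconfigured key before any traffic is sent. This sketch keys off the documented `alv_test_` / `alv_live_` prefixes; the helper function and its error messages are our own, not part of the SDK.

```python
import os

def assert_key_matches_env(api_key: str, environment: str) -> None:
    """Fail fast if a sandbox key reaches production, or vice versa."""
    is_live = api_key.startswith("alv_live_")
    is_test = api_key.startswith("alv_test_")
    if not (is_live or is_test):
        raise ValueError("Unrecognised Alveare API key prefix")
    if environment == "production" and not is_live:
        raise ValueError("Production must use an alv_live_ key")
    if environment != "production" and is_live:
        raise ValueError(f"{environment} must not use a production (alv_live_) key")

# Typical startup call (env var names are your own convention):
# assert_key_matches_env(os.environ["ALVEARE_API_KEY"],
#                        os.environ.get("APP_ENV", "development"))
```

Running this once at boot turns a silent "test traffic hit production" incident into an immediate, obvious crash.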

Structuring dev / staging / production
# .env.development
ALVEARE_API_KEY=alv_test_dev_abc123         # sandbox, personal dev key
ALVEARE_BASE_URL=https://api.alveare.ai/v1  # same URL, key determines env

# .env.staging
ALVEARE_API_KEY=alv_test_staging_def456     # sandbox, shared staging key

# .env.production
ALVEARE_API_KEY=alv_live_prod_ghi789        # production, restricted permissions

# Key permissions can be scoped:
#   - Specific specialists only (e.g., classify, summarise)
#   - Read-only (inference only, no config changes)
#   - IP allowlist (only accept from your server IPs)
#   - Time-limited (auto-expire after 90 days)

We recommend a minimum of three API keys: one personal sandbox key per developer, one shared staging key for integration tests, and one production key stored in your secrets manager (AWS Secrets Manager, HashiCorp Vault, or your CI/CD secrets). Production keys should be scoped to the minimum permissions required and rotated every 90 days.

Fits your workflow, not the other way around

Start a 7-day free trial and integrate Alveare into your existing pipelines. No infrastructure changes required.

Get Started Free