Real-world examples of how companies use private AI inference to save money, protect data, and ship faster. Each use case includes example API calls, cost breakdowns, and compliance outcomes.
The problem. A mid-size SaaS company receives 5,000 support tickets per month. Their current workflow uses OpenAI's GPT-3.5 Turbo to classify each ticket into categories (billing, technical, feature request, bug report) and then draft an initial response using their knowledge base. The system works, but the costs are adding up: approximately $3,200 per month in API calls. More critically, their compliance team has flagged a growing concern -- every customer complaint, account detail, and billing dispute is being sent to OpenAI's servers. For a company handling financial data, this is becoming a liability that keeps the CISO up at night.
The support team has also noticed latency issues during peak hours. OpenAI's rate limits throttle their requests, meaning tickets pile up during Monday morning surges when customers return from the weekend. The average classification time has crept up to 800ms during peak, with occasional timeouts that require manual intervention.
With Alveare. The team replaces their OpenAI integration with two Alveare specialists running on a dedicated hive:
- classify specialist categorises each incoming ticket into one of 12 categories with sub-200ms latency, even during peak load. No rate limiting, because the GPU is dedicated to their workload.
- chat specialist drafts a response by combining the ticket content with their internal knowledge base context. The specialist uses a tuned system prompt that matches their brand voice and includes common resolution steps.

The Solo plan at $49/month handles up to 10,000 requests -- enough for the 5,000 classifications plus 5,000 response drafts. If the company scales beyond that, the Starter plan at $499/month provides 100,000 requests on a fully dedicated hive, which accommodates growth to 50,000 tickets per month without any architecture changes.
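A minimal sketch of the classification step. The endpoint URL, payload keys, and the `{category, confidence}` response shape are assumptions for illustration, not Alveare's documented schema; the routing logic shows one way to send low-confidence results to human triage.

```python
# Hypothetical endpoint -- Alveare's actual API may differ.
ALVEARE_URL = "https://api.alveare.example/v1/specialists/classify"

def build_classify_request(ticket_id: str, subject: str, body: str) -> dict:
    """Assemble the JSON body sent to the classify specialist (assumed shape)."""
    return {
        "input": f"{subject}\n\n{body}",
        "metadata": {"ticket_id": ticket_id},
    }

def route_ticket(response: dict) -> str:
    """Pick a queue from the specialist's (assumed) {category, confidence} output."""
    if response["confidence"] < 0.6:
        return "human_triage"  # low confidence goes to a person
    return response["category"]

payload = build_classify_request("T-1041", "Card declined", "My card was charged twice...")

# Simulated specialist outputs, shaped like the assumed response:
print(route_ticket({"category": "billing", "confidence": 0.93}))    # -> billing
print(route_ticket({"category": "technical", "confidence": 0.41}))  # -> human_triage
```

The confidence cutoff is a tuning knob: raising it sends more tickets to humans, lowering it increases the auto-resolution rate at the cost of more misroutes.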
84-98% cost reduction ($3,200/mo down to $499 or $49/mo, depending on plan). Zero customer data leaves their boundary, resolving the compliance team's concerns about financial data exposure. Classification latency dropped from 800ms at peak to a consistent sub-200ms. No more rate limit throttling during Monday morning surges. The support team now auto-resolves 40% of tickets without human intervention.
The problem. A mid-size law firm processes approximately 200 commercial contracts per week. Each contract needs key terms extracted: parties, effective dates, termination clauses, payment obligations, indemnification provisions, governing law, and assignment restrictions. Currently, junior associates and paralegals spend 3-4 hours per contract on this extraction work. That is 600-800 billable hours per week spent on document review that could be automated.
The firm evaluated OpenAI and other cloud LLM providers. The technology works -- GPT-4 extracts contract terms with high accuracy. But the firm's managing partner shut it down immediately. Attorney-client privilege is not negotiable. Sending client contracts to any third-party API creates a privilege waiver risk. If opposing counsel discovers that privileged documents were transmitted to OpenAI's servers, the consequences range from sanctions to malpractice claims. The firm's professional liability insurer has explicitly stated that use of external AI APIs for client documents is not covered under their current policy.
With Alveare. The firm deploys an extraction pipeline using Alveare's dedicated hive infrastructure:
- extract specialist processes each contract and returns structured JSON with all key terms, clauses, and obligations identified and categorised.

The firm processes 200 contracts per week, generating approximately 400 API calls (extraction plus a verification pass). The Starter plan at $499/month handles this volume comfortably within the 100,000 monthly request allocation. The extracted data feeds into their contract management system, where attorneys review and approve the AI-generated summaries rather than performing the extraction manually.
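A post-processing sketch for the verification pass: before an extraction result enters the contract management system, check that every required term is present. The field names below are illustrative, not Alveare's actual output schema.

```python
# Required term keys -- illustrative names, assumed to mirror the firm's checklist.
REQUIRED_TERMS = [
    "parties", "effective_date", "termination", "payment_obligations",
    "indemnification", "governing_law", "assignment",
]

def missing_terms(extraction: dict) -> list[str]:
    """Return the required keys that are absent or empty in an extraction result."""
    return [t for t in REQUIRED_TERMS if not extraction.get(t)]

sample = {
    "parties": ["Acme Ltd", "Globex Inc"],
    "effective_date": "2025-03-01",
    "termination": "Either party, 60 days written notice",
    "payment_obligations": "Net 30",
    "governing_law": "England and Wales",
}

print(missing_terms(sample))  # -> ['indemnification', 'assignment']
```

Any contract with missing terms gets flagged for attorney review rather than silently entering the system, which is where the 94% first-pass accuracy figure meets its human backstop.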
What took paralegals 3-4 hours per contract now takes 30 seconds. The firm reclaimed approximately 700 billable hours per week. Zero attorney-client privilege risk -- contract text never reaches any shared infrastructure. The managing partner approved the system after reviewing the data boundary architecture. Extraction accuracy: 94% on first pass, with attorneys reviewing edge cases.
The problem. An online marketplace processes 50,000 user-generated product listings per day. Each listing needs to be checked for prohibited content (counterfeit goods, regulated items, scams), policy violations (misleading descriptions, fake reviews), and legal compliance (banned substances, age-restricted items). The marketplace currently uses OpenAI's moderation endpoint combined with custom GPT-3.5 classification. Three compounding issues are making this unsustainable: the API bill has passed $15,000 per month and spikes further during promotional events; rate limits during traffic surges create a publication queue, so new listings sit unpublished while sellers wait; and sending user-generated content to a third-party API complicates the marketplace's GDPR obligations.
With Alveare. The marketplace deploys a high-throughput moderation pipeline on Alveare's Scale plan:
- classify specialist runs first-pass moderation on every listing. It categorises content as approved, flagged_for_review, or rejected, with a confidence score and category label.

The Scale plan at $2,999/month replaces $15,000+ in OpenAI costs. During promotional events, the dedicated hives absorb the traffic increase without rate limiting or additional charges -- the flat monthly price covers 2 million requests, regardless of when they arrive.
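The decision logic downstream of the specialist can be sketched as a small routing function. The `{label, confidence, category}` response shape and the 0.85 auto-action threshold are assumptions for illustration.

```python
def moderate(result: dict, auto_threshold: float = 0.85) -> str:
    """Map the classify specialist's (assumed) output to a publication decision.

    result: {"label": "approved" | "flagged_for_review" | "rejected",
             "confidence": float, "category": str}
    """
    label, conf = result["label"], result["confidence"]
    if label == "approved" and conf >= auto_threshold:
        return "publish"       # listing goes live instantly
    if label == "rejected" and conf >= auto_threshold:
        return "block"         # never published
    return "review_queue"      # humans handle the uncertain middle

print(moderate({"label": "approved", "confidence": 0.97, "category": "electronics"}))       # publish
print(moderate({"label": "rejected", "confidence": 0.99, "category": "counterfeit"}))       # block
print(moderate({"label": "flagged_for_review", "confidence": 0.70, "category": "regulated"}))  # review_queue
```

Only the confident extremes are automated; everything in between lands in the human review queue, which is how the 31% improvement in counterfeit catch rate can coexist with fewer false takedowns.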
Real-time moderation with sub-200ms latency -- listings go live instantly after passing moderation. 80% cost reduction ($15,000/mo to $2,999/mo). GDPR-compliant content processing with no user data sent to third parties. Seller satisfaction scores improved 23% after eliminating the publication queue. Counterfeit listings caught at the gate increased 31% due to faster, more consistent AI moderation.
The problem. A 40-person engineering team at a fintech company wants AI-assisted code review on every pull request. They also want to auto-generate documentation -- function docstrings, API endpoint descriptions, and changelog entries -- from code changes. The engineering manager evaluated GitHub Copilot, but the security team rejected it. Their codebase contains proprietary trading algorithms, risk models, and regulatory compliance logic. Sending source code to Microsoft/OpenAI servers violates their information security policy and their SOC 2 controls around intellectual property protection.
The team also considered self-hosting an open-source model (CodeLlama, StarCoder) on their own GPU infrastructure. The estimated cost: $8,000/month in cloud GPU instances, plus 2-3 months of engineering time to build the serving infrastructure, prompt engineering, and CI/CD integration. The engineering manager does not have the headcount to dedicate two engineers to an internal AI infrastructure project.
With Alveare. The team deploys two specialists and integrates them directly into their development workflow:
- code specialist reviews diffs on every pull request. It identifies potential bugs, suggests improvements, flags security issues, and checks for style consistency. The specialist's system prompt is configured with the team's coding standards and common pitfalls specific to their Python/Rust codebase.
- summarise specialist generates docstrings for new functions, updates API documentation when endpoint signatures change, and writes changelog entries from commit messages.

The Professional plan at $1,499/month gives the team 3 dedicated hives and 500,000 requests per month. With 40 engineers averaging 3 PRs per week, plus ad-hoc documentation generation and VS Code queries, they use approximately 15,000-20,000 requests per month -- well within the allocation. The remaining capacity handles batch documentation regeneration runs when they refactor large modules.
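A hedged CI sketch: split a PR's unified diff into per-file chunks and build one request per chunk for the code specialist. Per-file chunking and the payload keys are assumptions here, not Alveare's documented interface.

```python
def split_diff(diff: str) -> dict[str, str]:
    """Split a unified diff into {filename: hunk_text} using 'diff --git' markers."""
    chunks: dict[str, str] = {}
    current = None
    for line in diff.splitlines():
        if line.startswith("diff --git"):
            current = line.split(" b/")[-1]  # filename after the 'b/' side
            chunks[current] = ""
        elif current is not None:
            chunks[current] += line + "\n"
    return chunks

DIFF = """diff --git a/pricing.py b/pricing.py
+def fee(amount):
+    return amount * 0.01
diff --git a/risk.rs b/risk.rs
+fn var(p: f64) -> f64 { p * 1.65 }
"""

# One review request per touched file (payload shape is an assumption):
requests_out = [
    {"specialist": "code", "file": name, "diff": body}
    for name, body in split_diff(DIFF).items()
]
print([r["file"] for r in requests_out])  # -> ['pricing.py', 'risk.rs']
```

Chunking per file keeps each request small and lets review comments be posted back against the right file in the PR.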
Every PR gets AI-powered code review within 30 seconds of opening. Documentation stays current automatically -- when a function signature changes, the docstring updates in the same PR. The team catches 15-20% more bugs before code reaches the main branch. Total cost: $1,499/month vs the estimated $8,000/month for self-hosted infrastructure plus 2-3 months of engineering setup time. Source code and proprietary algorithms never leave the deployment boundary.
The problem. A health-tech startup builds clinical workflow software for outpatient clinics. Their platform processes patient intake forms, lab reports, and clinical notes. Physicians want three AI-powered features: structured data extraction from handwritten and dictated clinical notes, patient history summarisation before appointments, and a Q&A interface to quickly answer questions about a patient's medication history or prior diagnoses.
The technical challenge is straightforward -- these are well-understood NLP tasks. The compliance challenge is the hard part. HIPAA requires that all entities processing Protected Health Information (PHI) operate within controlled boundaries. OpenAI is not a HIPAA-eligible service for most use cases. Even if OpenAI signs a BAA, the shared infrastructure model raises questions during audits. The startup's compliance counsel has advised that sending PHI to any shared inference endpoint creates a risk that no BAA can fully mitigate, because the data co-resides (however briefly) on infrastructure serving thousands of other customers.
Self-hosting is not viable either. The startup has 12 engineers. They cannot afford to dedicate 2-3 people to building and maintaining GPU infrastructure, model serving, failover, and monitoring. They need a managed service that meets HIPAA requirements.
With Alveare. The startup deploys three specialists on a dedicated hive with a Business Associate Agreement:
- extract specialist pulls structured data from clinical notes -- diagnoses (ICD-10 codes), medications (with dosages and frequencies), vital signs, lab results, and follow-up instructions. Output is structured JSON that feeds directly into their EHR integration.
- summarise specialist creates concise patient summaries for physicians before appointments. It condenses months of visit notes, lab results, and medication changes into a one-page brief that takes 30 seconds to read instead of 15 minutes of chart review.
- qa specialist answers natural-language questions about patient history. "When was the last time this patient's A1C was checked?" or "What medications has this patient tried for hypertension?" -- answered in under a second with citations to specific chart entries.

The startup uses the Starter plan at $499/month. Their 15 clinic customers generate approximately 8,000 patient encounters per month, each requiring 2-3 API calls (extraction, summarisation, and occasional Q&A queries). Total monthly usage: approximately 16,000-24,000 requests, well within the 100,000 allocation.
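A sketch of the EHR hand-off: flattening the extract specialist's (assumed) JSON into rows the integration can insert. The field names are illustrative, and because PHI never leaves the hive, this post-processing runs inside the same boundary.

```python
def to_ehr_rows(extraction: dict) -> list[tuple[str, str, str]]:
    """Flatten diagnoses and medications into (kind, code_or_name, detail) rows.

    The input shape is an assumed example of the extract specialist's output.
    """
    rows = []
    for dx in extraction.get("diagnoses", []):
        rows.append(("diagnosis", dx["icd10"], dx["description"]))
    for med in extraction.get("medications", []):
        rows.append(("medication", med["name"], f'{med["dose"]} {med["frequency"]}'))
    return rows

sample = {
    "diagnoses": [{"icd10": "E11.9", "description": "Type 2 diabetes mellitus"}],
    "medications": [{"name": "metformin", "dose": "500 mg", "frequency": "BID"}],
}

for row in to_ehr_rows(sample):
    print(row)
```

Rows the physician rejects during review are corrected before insertion, which is where the 91% first-pass accuracy gets reconciled with the chart of record.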
AI-powered clinical workflows without HIPAA violations. Physicians save 10-15 minutes per patient encounter on chart review. Structured data extraction accuracy: 91% on first pass (physicians review and correct edge cases). The startup passed their SOC 2 Type II audit with the Alveare integration fully documented. BAA in place. No PHI logged, stored, or transmitted outside the inference boundary.
The problem. A boutique investment firm analyses approximately 500 corporate earnings reports per quarter -- that is roughly 170 reports per month, with heavy clustering around earnings season when 300+ reports drop within a two-week window. Each 10-K or earnings call transcript needs three things: a concise summary (3-5 bullet points of what matters), sentiment classification (bullish, bearish, or neutral with confidence), and key metric extraction (revenue, gross margin, operating margin, EPS, and forward guidance numbers pulled into a structured format for their financial models).
Currently, a team of four junior analysts spends earnings season manually reading reports and populating spreadsheets. Each report takes 45-60 minutes. During peak weeks, the team works 14-hour days and still falls behind. The firm tried automating with OpenAI, and the accuracy was excellent. But the Chief Compliance Officer raised two objections: first, the firm's proprietary analysis prompts (which encode their investment thesis and what they consider material) are themselves confidential. Sending those prompts to OpenAI risks exposing their analytical methodology. Second, processing earnings data through external APIs creates a potential information leakage vector that conflicts with their SOX compliance controls and internal data handling policies.
With Alveare. The firm deploys three specialists that form an automated earnings analysis pipeline:
- summarise specialist condenses 50-page earnings reports and call transcripts to 3-5 actionable bullet points. The system prompt encodes the firm's analytical framework -- what they consider material information, how they weight revenue growth vs margin expansion, and their sector-specific criteria.
- classify specialist determines sentiment: bullish, bearish, or neutral, with a confidence score and the key factors driving the classification. The specialist's system prompt includes the firm's proprietary sentiment framework, which weights management tone, guidance language, and comparable period metrics differently than generic sentiment analysis.
- extract specialist pulls revenue, gross margin, operating margin, EPS (GAAP and non-GAAP), free cash flow, and forward guidance numbers into structured JSON. The output feeds directly into their Excel models via a Python script that populates template spreadsheets.

The firm uses the Professional plan at $1,499/month. Each report requires 3 API calls (summarise, classify, extract), so 500 reports generate 1,500 requests per quarter. Even with ad-hoc analysis queries throughout the quarter, they use fewer than 10,000 requests per month -- well within the 500,000 allocation. They chose the Professional plan for the custom specialist capability, which lets them fine-tune the system prompts to match their proprietary analytical framework.
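The three-call-per-report pipeline can be sketched as a simple loop with a stubbed transport. `call_specialist` stands in for the real HTTP call, whose endpoint and schema are assumptions; the request arithmetic matches the capacity figures above.

```python
CALLS_PER_REPORT = ("summarise", "classify", "extract")

def call_specialist(name: str, text: str) -> dict:
    """Stub for the real API call -- returns a placeholder result."""
    return {"specialist": name, "chars_in": len(text)}

def analyse_quarter(reports: list[str]) -> tuple[list[dict], int]:
    """Run all three specialists over every report; also count requests used."""
    results, requests_used = [], 0
    for report in reports:
        for specialist in CALLS_PER_REPORT:
            results.append(call_specialist(specialist, report))
            requests_used += 1
    return results, requests_used

_, used = analyse_quarter(["10-K text..."] * 500)
print(used)  # -> 1500 requests for a 500-report quarter
```

Because the calls are independent per report, the batch can run overnight in parallel during earnings season, which is how results land by 6 AM.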
What took four analysts a combined 500+ hours per quarter now takes under two hours of compute time plus 30 minutes of review per analyst. During earnings season, results are ready by 6 AM the next morning. The firm's proprietary analysis prompts and investment methodology never leave the inference boundary. Compliant with SOX internal controls and the firm's data handling policies. Zero data exposure risk.
These are real patterns. Customer support, legal docs, content moderation, code review, healthcare, finance -- the common thread is private inference at a fraction of the cost. Start with a 7-day free trial and see the difference for yourself.