You're right to ask: if you're calling our API, how is your data protected? Here's exactly what happens -- and what doesn't.
The question everyone asks
"If I send my data to api.alveare.ai, aren't I sending it to a third party -- just like OpenAI?"
This is a legitimate concern, and we're not going to dismiss it with marketing language. So let's walk through what actually happens.
Yes, your request travels over the internet to our API endpoint. Your HTTP request containing your prompt text is sent from your infrastructure to ours. In that narrow sense, your data does leave your network and arrive at a system managed by Alveare. This is true for any hosted API service -- AWS Lambda, Google Cloud Functions, Stripe's payment API, or Twilio's messaging API.
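To make that narrow sense concrete, here is roughly what such a request looks like from the client side. This is an illustrative sketch: the endpoint path and JSON field names are assumptions for the example, not documented Alveare API details.

```python
import json

# Hypothetical request shape -- the "/v1/inference" path and the field
# names are illustrative assumptions, not Alveare's documented API.
API_URL = "https://api.alveare.ai/v1/inference"

def build_request(prompt: str, api_key: str) -> dict:
    """Assemble the HTTPS request that travels to the API endpoint."""
    return {
        "url": API_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        # The prompt rides in the request body: this is the data that
        # briefly leaves your network, encrypted in transit by TLS.
        "body": json.dumps({"prompt": prompt}),
    }
```

Everything interesting in this post is about what happens to that body after it arrives.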
But what happens after your request arrives is fundamentally different from what happens at OpenAI, Anthropic, or any other shared-infrastructure LLM provider. The difference is not in the network layer. It is in the compute layer, the storage layer, and the data handling policy.
OpenAI vs Alveare: what actually happens
Here is exactly what happens to your data at each step, compared side-by-side.
The critical architectural difference is compute isolation. At OpenAI, your prompt is processed on a GPU that is simultaneously serving hundreds or thousands of other customers. At Alveare, your prompt is processed on a GPU dedicated to your workload. No other customer's data, model weights, or inference requests exist on that machine.
This is not a theoretical distinction. It has practical implications for security audits, compliance certifications, and incident response. If there is a vulnerability in the inference stack, the blast radius at OpenAI includes every customer on that GPU cluster. At Alveare, the blast radius is limited to your hive.
What is the data boundary?
The "data boundary" is a set of architectural guarantees about how your data is handled during inference. It is not a brand name or a marketing concept. It is a specific set of technical controls that you can audit, verify, and hold us accountable to.
1. Isolation
Your hive runs on a dedicated GPU instance. No other customer's code, model weights, or data is on that machine. The instance is provisioned exclusively for your account, with its own memory space, its own model checkpoint, and its own network namespace. This is equivalent to a dedicated EC2 instance -- not a shared Lambda function.
2. No Prompt Logging
We log metadata for billing and monitoring: timestamp, token count, latency, specialist used, request ID, and HTTP status code. We do not log your prompt text, the model's response, or any content from your request body. This is enforced at the infrastructure level -- the logging pipeline has no access to the inference payload. Our logging system physically cannot capture prompt content because it receives only the metadata struct, not the request body.
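A minimal sketch of what "receives only the metadata struct" means in practice. All names here (RequestMetadata, emit_log, run_model) are hypothetical illustrations of the pattern, not our actual internals.

```python
import time
import uuid
from dataclasses import dataclass, asdict

LOG: list[dict] = []  # stand-in for the logging pipeline

def emit_log(record: dict) -> None:
    LOG.append(record)

def run_model(prompt: str) -> str:
    return "generated response"  # stand-in for real inference

# The record type has no field that could hold prompt or response text,
# so the pipeline structurally cannot capture content.
@dataclass(frozen=True)
class RequestMetadata:
    request_id: str
    timestamp: float
    token_count: int
    latency_ms: float
    specialist: str
    status_code: int

def handle_inference(prompt: str) -> str:
    start = time.monotonic()
    response = run_model(prompt)
    meta = RequestMetadata(
        request_id=str(uuid.uuid4()),
        timestamp=time.time(),
        token_count=len(prompt.split()),  # stand-in for real tokenisation
        latency_ms=(time.monotonic() - start) * 1000,
        specialist="contracts-v1",
        status_code=200,
    )
    emit_log(asdict(meta))  # only the metadata dict reaches the log
    return response
```

The point of the pattern is that the separation is a type-level boundary, not a filtering rule someone could misconfigure.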
3. No Training
Your data is never used to train, fine-tune, or improve any model -- ours or anyone else's. This is not an opt-out setting you need to remember to toggle. It is a structural guarantee. Our inference infrastructure has no training pipeline connected to it. There is no mechanism by which inference data could flow into a training job, because the systems are architecturally separate.
4. No Retention
After we return the response to you, the prompt and response are discarded from GPU memory. Nothing is written to disk. The VRAM used by your request is overwritten by the next inference operation. There is no buffer, no queue, no temporary file, and no database entry containing your prompt or response text. Once the HTTP response is sent, the data exists only in your systems.
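The CPU-side analogue of that VRAM behaviour can be sketched in a few lines. This is conceptual only: on the GPU, the real mechanism is simply the next inference operation overwriting the same memory.

```python
def process_ephemeral(prompt: str) -> str:
    """Conceptual 'process and discard': the prompt lives only in a
    working buffer that is overwritten before the function returns."""
    buf = bytearray(prompt, "utf-8")        # mutable working copy
    result = f"processed {len(buf)} bytes"  # stand-in for inference
    for i in range(len(buf)):               # overwrite before release
        buf[i] = 0
    return result
```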
5. Encryption
TLS 1.3 for all data in transit. Every connection to api.alveare.ai uses TLS 1.3 with modern cipher suites. KMS-managed encryption at rest for any stored data (model checkpoints, usage records, account information). Your prompts are never stored, so encryption at rest is not applicable to inference data -- but every other piece of stored data is encrypted with AES-256 via AWS KMS with automatic key rotation.
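You can check the transit-encryption claim yourself from any machine with network egress. The sketch below uses Python's standard `ssl` module and refuses to negotiate anything older than TLS 1.3; it opens a real connection, so treat it as a manual check rather than a unit test.

```python
import socket
import ssl

def negotiated_tls_version(host: str, port: int = 443) -> str:
    """Connect to host and return the TLS version actually negotiated."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # refuse anything older
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()

# Manual check (requires network access):
# print(negotiated_tls_version("api.alveare.ai"))
```

If the server only offered an older protocol, the handshake would fail instead of silently downgrading.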
6. Region Control
You choose where your hive runs. US-East is available today. EU-West (Frankfurt) is planned and will enable full GDPR data residency compliance. When you select a region, your data stays in that region -- the API gateway, the GPU instance, and the metadata logs are all co-located. No cross-region data transfer occurs during inference.
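One plausible client-side shape for region pinning is shown below. The hostnames are illustrative assumptions, not published Alveare URLs; the point is that the client targets exactly one region and all processing stays there.

```python
# Hypothetical region-pinned endpoints -- hostnames are illustrative
# assumptions, not published Alveare URLs.
REGION_ENDPOINTS = {
    "us-east": "https://api.us-east.alveare.ai",
    # "eu-west" (Frankfurt) is planned but not yet available
}

def endpoint_for(region: str) -> str:
    try:
        return REGION_ENDPOINTS[region]
    except KeyError as exc:
        raise ValueError(f"unknown or unavailable region: {region}") from exc
```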
"But you still see my data momentarily?"
The honest answer: yes, briefly.
For the few hundred milliseconds that your request is being processed, your prompt text exists in GPU VRAM on your dedicated instance. This is unavoidable -- the model needs to read your text to process it. You cannot run inference on data you cannot see.
Here is what makes this different from OpenAI:
It is your dedicated instance. No other customer's data is on that machine. The GPU, the CPU, the RAM, and the network interface are allocated to your account. This is single-tenant hardware, not a multi-tenant cluster.
It is in GPU VRAM only. Your prompt is loaded into GPU memory for inference. It is not written to disk. It is not stored in a database. It is not buffered in a message queue. It exists in volatile memory only, and only during the inference operation.
It is discarded after inference completes. When the model finishes generating the response, the VRAM used by your prompt is released and overwritten by the next operation. There is no retention period -- the data is gone as soon as the response is sent.
It is never logged, stored, or transmitted elsewhere. No copy of your prompt is sent to any logging system, analytics pipeline, or monitoring tool. The inference process is a dead end for your data -- it goes in, the result comes out, and nothing persists.
No human at Alveare ever sees it. Our operations team has access to infrastructure metrics (CPU usage, GPU utilisation, memory pressure, latency percentiles) but no access to inference payloads. Even if an engineer SSH'd into your instance (which would trigger an alert and require documented justification), the prompt data no longer exists in memory after inference completes.
This is the same security model as using AWS Lambda, Google Cloud Functions, or Azure Functions. Your code runs on their hardware momentarily, but it is isolated, ephemeral, and not accessible to the provider's employees. The difference between Alveare and OpenAI is that we treat inference like a stateless compute operation -- process and discard. OpenAI treats it like a data pipeline -- process, log, retain, and optionally train.
"What about the Shared Hive on Solo?"
This is an important distinction, and we want to be transparent about it.
Solo plan customers ($49/month) share a hive with other Solo users. This means multiple Solo customers' requests may be processed on the same GPU. Here is exactly what that means and what it does not mean:
Requests are processed sequentially, not concurrently. Your prompt is loaded into GPU memory, processed, and the result is returned before the next customer's request is loaded. At no point are two customers' prompts in GPU memory simultaneously. The inference engine processes one request at a time per GPU.
No prompt logging applies equally. Whether you are on the Solo plan or the Scale plan, we do not log prompt content. The no-logging guarantee is the same across all plans. It is an infrastructure-level control, not a plan-level feature.
No customer can see another customer's data. There is no shared state between requests. Each inference call starts with a clean context. The model has no memory of previous requests -- every call is independent. The specialist system prompt is injected fresh for each request.
The security model is equivalent to a shared web server. When you use any SaaS product, your HTTP requests are processed on servers that also process other customers' requests. The isolation is at the request level, not the hardware level. This is standard for shared infrastructure.
If infrastructure-level isolation is critical for your compliance requirements, the Starter plan ($499/month) and above provide a fully dedicated hive. Your inference runs on a GPU instance that serves only your account. For HIPAA, SOC 2, or any regulatory framework that requires dedicated compute, the Starter plan is the minimum we recommend.
To summarise the difference plainly: Solo gives you request-level isolation (same security guarantees around logging and retention, but shared hardware). Starter and above give you hardware-level isolation (a dedicated GPU that only processes your workload).
How do we verify this?
Trust but verify. We do not expect you to take our word for any of this. Here is how you can independently validate our data boundary claims.
SOC 2 Type II Audit
Independent third-party audit of our security controls, data handling, and operational practices. The audit is currently in progress, and the report will be available to customers and prospects under NDA upon completion.
Architecture Documentation
Detailed technical documentation of our infrastructure architecture is available for security review. We provide this to your security team during vendor evaluation. Request it at security@alveare.ai.
BAA for HIPAA
Business Associate Agreement available for HIPAA-covered entities and their business associates. The BAA covers all data processed through your dedicated hive on Starter plan and above.
DPA for GDPR
Data Processing Agreement available for organizations subject to GDPR. Covers data processing terms, subprocessor lists, data transfer mechanisms, and breach notification procedures.
Penetration Testing
We welcome responsible security research. If you want to conduct penetration testing against your own Alveare deployment, contact us to coordinate. We support responsible disclosure and have a published security policy.
Vendor Security Review
Send us your vendor security questionnaire. We complete SIG, CAIQ, and custom questionnaires regularly. Contact security@alveare.ai with your review requirements.
The bottom line
Here is the comparison that matters. Three options for running AI inference, evaluated honestly across the dimensions that your security and compliance team cares about.
OpenAI vs Self-Hosted vs Alveare
| Dimension | OpenAI | Self-Hosted (vLLM) | Alveare |
| --- | --- | --- | --- |
| Data leaves your network | Yes | No | Yes (encrypted, ephemeral) |
| Shared infrastructure | Yes (multi-tenant GPUs) | No | No (Starter+) |
| Prompt logging | Yes (30 days minimum) | Your choice | No (metadata only) |
| Used for training | Opt-out required | No | Never |
| Data retention | 30 days | Your choice | Zero |
| Compliance ready | Limited (SOC 2 only) | Full control | HIPAA / SOC 2 / GDPR |
| Ops burden | Zero | High (GPU mgmt, model serving, monitoring) | Zero |
| Monthly cost (100K req) | $3,000-5,000 | $2,000+ plus eng time | $499 |
| Time to production | Hours | Weeks to months | Minutes |
| Model version control | Vendor-controlled | Full control | Customer-controlled |
The trade-off is clear. Self-hosting gives you maximum control but demands significant engineering investment and ongoing operational burden. OpenAI gives you zero ops burden but sends your data to shared infrastructure with logging, retention, and potential training use. Alveare gives you the operational simplicity of a managed API with data isolation guarantees that approach those of self-hosting -- at a fraction of the cost of either alternative.
If your compliance requirements allow third-party API usage with proper controls (which most do -- your company already uses AWS, Stripe, and Twilio), Alveare provides the strongest data boundary available in a managed inference service. If your requirements prohibit any third-party data processing (rare, but some government and defense applications require this), self-hosting is your only option.
For everyone else -- which is the vast majority of companies we talk to -- the question is not whether to use a managed service. It is whether to use one that logs your data and trains on it, or one that does not.
Start your free trial
7 days, no credit card. Test the data boundary with your own workload. Review our architecture documentation.
Send us your vendor security questionnaire. We will earn your trust.