Amazon Bedrock Architecture for Enterprise: Where Models Run, Where Data Flows, and How Custom Models Fit
For: AI platform leads, infrastructure architects, and application teams designing generative AI on AWS.
Key Points:
- Bedrock puts multiple foundation models behind AWS governance primitives
- Data residency depends on In-Region, Geographic, or Global routing choices
- Custom models can be imported, but on-prem GPUs cannot become Bedrock's managed inference backend
For enterprises already standardized on AWS, generative AI is not only a model-quality question. The harder questions are where prompts travel, where logs are stored, which identity plane controls access, and how the network path fits existing VPC rules.
This article answers one architecture question: how should Amazon Bedrock be understood as an enterprise AI runtime, not just a model catalog?
Bedrock Is a Governed Entry Point
Amazon Bedrock is a managed service for accessing foundation models through AWS. The official documentation says Bedrock supports 100+ foundation models from providers including Amazon, Anthropic, DeepSeek, Moonshot AI, MiniMax, and OpenAI [1].
The practical difference from calling the ChatGPT API or the Claude API directly is not only model capability. It is that authentication, billing, encryption, network control, and logging all fit the existing AWS operating model.
If your company already uses AWS Organizations, IAM Identity Center, CloudTrail, VPC, and KMS, Bedrock reduces the need to create a separate governance plane just for generative AI. That is the enterprise value.
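As one illustration of reusing the existing identity plane, the sketch below builds an IAM policy document that allows invocation of an approved model list only. It is a minimal sketch, not a recommended production policy: the function name `build_invoke_policy`, the Region, and the example model ID are assumptions for illustration, and workloads that use cross-Region inference profiles would likely need additional resource entries.

```python
import json


def build_invoke_policy(region: str, model_ids: list[str]) -> dict:
    """Return an IAM policy document that allows invoking only the listed models."""
    # Foundation-model ARNs have an empty account field.
    resources = [
        f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
        for model_id in model_ids
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowApprovedModelsOnly",
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": resources,
                # Restrict calls to the chosen Region as well.
                "Condition": {"StringEquals": {"aws:RequestedRegion": region}},
            }
        ],
    }


# Example values are illustrative; substitute the approved model list.
policy = build_invoke_policy(
    "ap-northeast-1", ["anthropic.claude-3-haiku-20240307-v1:0"]
)
print(json.dumps(policy, indent=2))
```

The point of the sketch is that model access becomes an ordinary IAM review item rather than a separately managed API key.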
Where Data Flows
Bedrock calls do not have to rely on public internet paths. With AWS PrivateLink, applications can connect to Amazon Bedrock through VPC interface endpoints [2].
A simplified enterprise path looks like this.
```mermaid
flowchart LR
    subgraph onprem["On-Prem / Corporate Network"]
        USER_APP["Business App / RAG App"]
    end
    subgraph vpc["Customer VPC"]
        APP["Lambda / ECS / EKS"]
        EP["VPC Interface Endpoint<br/>AWS PrivateLink"]
    end
    subgraph bedrock["Amazon Bedrock"]
        API["bedrock-runtime / Converse API"]
        GUARD["Guardrails / KMS / Logging"]
        ROUTER["Model Routing"]
    end
    subgraph compute["Inference Infrastructure"]
        TRAINIUM["AWS Trainium / Inferentia"]
        GPU["GPU Infrastructure"]
    end
    USER_APP -->|"Direct Connect / VPN"| APP
    APP --> EP
    EP --> API
    API --> GUARD
    GUARD --> ROUTER
    ROUTER --> TRAINIUM
    ROUTER --> GPU
```

The important part is ownership. Your team controls the application, VPC endpoint, IAM policy, logging design, and regional choice. AWS controls the managed Bedrock service plane and the underlying inference infrastructure.
That means you do not manage the EC2 instance that runs the model. You manage the path into Bedrock and the governance around it.
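To make the customer-owned part of that path concrete, the following sketch assembles the parameters for an interface endpoint to the Bedrock runtime. It only builds a dictionary; nothing is created in AWS. The helper name `bedrock_endpoint_request` and all IDs are hypothetical, and the keys are chosen to mirror the EC2 `create_vpc_endpoint` parameters on the assumption that the result is later passed to that call.

```python
def bedrock_endpoint_request(region, vpc_id, subnet_ids, security_group_ids):
    """Build parameters for an interface VPC endpoint to bedrock-runtime."""
    return {
        "VpcEndpointType": "Interface",
        # PrivateLink service name for the Bedrock runtime API in this Region.
        "ServiceName": f"com.amazonaws.{region}.bedrock-runtime",
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Private DNS lets the default runtime hostname resolve inside the VPC.
        "PrivateDnsEnabled": True,
    }


# Placeholder IDs for illustration only.
request = bedrock_endpoint_request(
    region="ap-northeast-1",
    vpc_id="vpc-0123456789abcdef0",
    subnet_ids=["subnet-0aaa", "subnet-0bbb"],
    security_group_ids=["sg-0ccc"],
)
print(request["ServiceName"])
```

Keeping the request as plain data makes it easy to review the Region, subnets, and security groups before anything is provisioned.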
Regional Control Depends on Routing
Data residency in Bedrock depends on both the model and the inference routing option. AWS documents three options: In-Region, Geographic, and Global routing [3].
| Routing option | Data residency model | Best fit |
|---|---|---|
| In-Region | Processed within one specified Region | Strict single-Region compliance requirements |
| Geographic | Routed within a geography such as Japan, EU, or US | Residency tied to a geography, not one Region |
| Global | Routed across commercial Regions | Performance and price matter more than location constraints |
So the correct statement is not "Tokyo Region always keeps everything local." The safer statement is: with In-Region routing, and if the required model is available in that Region, the workload can be designed as single-Region processing.
For Japanese enterprises, the architecture decision is often whether Tokyo alone is required, or whether Japan Geographic routing is acceptable. That decision should be made with legal, security, and cloud platform teams using the model availability table.
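The decision described above can be sketched as a small lookup: given a residency requirement and a model, list the routing options that satisfy both. This is an illustrative sketch only. The requirement labels, the model names, and the `AVAILABILITY` table are hypothetical; real availability must come from the AWS model availability table.

```python
from enum import Enum


class Routing(Enum):
    IN_REGION = "In-Region"
    GEOGRAPHIC = "Geographic"
    GLOBAL = "Global"


# Hypothetical availability data, for illustration only.
AVAILABILITY = {
    "example-model-a": {Routing.IN_REGION, Routing.GEOGRAPHIC, Routing.GLOBAL},
    "example-model-b": {Routing.GEOGRAPHIC, Routing.GLOBAL},
}

# Residency requirement -> routing options that satisfy it, strictest first.
ACCEPTABLE = {
    "single-region": [Routing.IN_REGION],
    "geography": [Routing.IN_REGION, Routing.GEOGRAPHIC],
    "none": [Routing.IN_REGION, Routing.GEOGRAPHIC, Routing.GLOBAL],
}


def acceptable_routings(model: str, requirement: str) -> list[Routing]:
    """Return routing options that meet the requirement and that the model supports."""
    if requirement not in ACCEPTABLE:
        raise ValueError(f"unknown residency requirement: {requirement!r}")
    supported = AVAILABILITY.get(model, set())
    return [option for option in ACCEPTABLE[requirement] if option in supported]


print(acceptable_routings("example-model-b", "single-region"))
print(acceptable_routings("example-model-b", "geography"))
```

An empty result is the useful signal: the model cannot be used under that residency requirement, so either the model or the requirement has to change.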
Where Models Actually Run
Behind Bedrock, AWS Trainium has become strategically important. In 2026, Amazon CEO Andy Jassy said that much of Bedrock inference runs on Trainium [4].
AWS announced general availability of Trn2 instances in December 2024, describing 30-40% better price performance than GPU-based EC2 P5e and P5en instances [5]. The same announcement also notes Anthropic's use of large Trainium2 clusters to serve Claude for Bedrock customers.
The lesson is that Bedrock is not simply an API layer over NVIDIA GPUs. AWS is trying to control the inference cost curve with its own silicon.
For enterprise architecture, this matters in two ways. First, Bedrock price and capacity are tied to AWS's silicon strategy. Second, choosing a model provider is no longer separate from choosing the cloud infrastructure where inference runs.
Three Ways to Bring Your Own Model
If you already have a fine-tuned or internally trained model, Bedrock gives you several routes. The tradeoff is simple: more freedom usually means more work outside Bedrock; more operational integration means staying closer to Bedrock.
| Route | What it does | Best fit |
|---|---|---|
| Custom Model Import | Import supported model weights from S3 into Bedrock | You trained or customized a supported model elsewhere and want Bedrock serving |
| Bedrock customization | Fine-tune or continue pre-training inside Bedrock | You want the training and serving workflow closer to AWS governance |
| SageMaker training, Bedrock serving | Train in SageMaker, then import into Bedrock | You need training flexibility and managed inference integration |
Custom Model Import lets you import supported model architectures from Amazon S3. AWS documentation describes using Hugging Face model files and Safetensors-formatted weights for supported architectures [6].
A minimal import shape looks like this.
```shell
aws bedrock create-model-import-job \
  --job-name my-llama3-import \
  --imported-model-name my-llama3-jp \
  --role-arn arn:aws:iam::123456789012:role/BedrockImportRole \
  --model-data-source '{"s3DataSource":{"s3Uri":"s3://my-model-bucket/llama3/"}}'
```
This does not mean every model can be imported. Architecture support, file format, serving mode, and throughput requirements still need to be checked before the architecture is approved.
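Part of that check can be run locally before anything is uploaded to S3. The sketch below inspects a Hugging Face style model directory for a `config.json`, a declared architecture, and Safetensors weight files. It is a rough pre-flight check under stated assumptions: the allowlist `ASSUMED_SUPPORTED_ARCHITECTURES` is hypothetical and must be replaced with the architectures listed in the current Custom Model Import documentation, and passing this check does not guarantee that an import job will succeed.

```python
import json
from pathlib import Path

# Hypothetical allowlist for illustration; verify against current AWS documentation.
ASSUMED_SUPPORTED_ARCHITECTURES = {"LlamaForCausalLM", "MistralForCausalLM"}


def precheck_model_dir(model_dir) -> list[str]:
    """Return a list of problems found in a local model directory (empty if none)."""
    problems = []
    root = Path(model_dir)

    config_path = root / "config.json"
    if not config_path.is_file():
        problems.append("missing config.json")
    else:
        config = json.loads(config_path.read_text(encoding="utf-8"))
        # Hugging Face configs list model classes under "architectures".
        architectures = config.get("architectures") or []
        if not set(architectures) & ASSUMED_SUPPORTED_ARCHITECTURES:
            problems.append(f"architecture not in assumed allowlist: {architectures}")

    if not any(root.glob("*.safetensors")):
        problems.append("no .safetensors weight files found")

    return problems
```

A typical use would be `precheck_model_dir("./llama3")` in a CI step, failing the pipeline when the returned list is not empty. Serving mode and throughput cost still need a separate review.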
On-Prem GPUs Do Not Become Bedrock Backends
Companies with existing on-prem GPU clusters often ask whether those GPUs can be attached behind Bedrock. The direct answer is no: on-prem GPUs cannot become the managed inference backend for Amazon Bedrock.
The practical design is routing by workload.
- Send highly sensitive workloads to on-prem or dedicated self-managed inference
- Send general internal AI and RAG workloads to Bedrock
- Train or fine-tune outside Bedrock, then import supported weights through S3
- Use Direct Connect or VPN as a network path, not as a way to move Bedrock compute on-prem
This distinction is easy to miss. A private network connection to AWS does not move Bedrock's managed compute into your data center.
If you need to preserve on-prem investment, build the routing decision in the application layer. Do not assume Bedrock can absorb every existing inference runtime.
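A routing decision in the application layer can be as small as the sketch below, which maps a workload's sensitivity label to a backend. The labels `"high"` and `"general"`, the backend names, and the `Workload` type are hypothetical; a real implementation would take them from the organization's data classification scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    sensitivity: str  # hypothetical labels: "high" or "general"


def choose_backend(workload: Workload) -> str:
    """Pick an inference backend based on the workload's sensitivity label."""
    if workload.sensitivity == "high":
        # Highly sensitive workloads stay on-prem or on dedicated self-managed inference.
        return "self-managed"
    if workload.sensitivity == "general":
        # General internal AI and RAG workloads go to Bedrock.
        return "bedrock"
    # Fail closed: an unlabeled workload is not routed anywhere by default.
    raise ValueError(f"unclassified workload: {workload.name!r}")


print(choose_backend(Workload("contract-review", "high")))
print(choose_backend(Workload("internal-faq-rag", "general")))
```

Raising on an unknown label is a deliberate choice in this sketch, so that a new workload cannot reach either backend before someone classifies it.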
Enterprise Checks Before Adoption
Model comparison tables are not enough. Before choosing Bedrock for production, check these architecture questions first.
- Is the required model available in the target Region and routing mode?
- How will PrivateLink, KMS, CloudTrail, and CloudWatch be designed?
- Do Guardrails, Knowledge Bases, and Agents have the same regional coverage?
- If using a custom model, are architecture support and throughput cost understood?
- Where do SageMaker, EKS, and on-prem GPU workloads stop and Bedrock begin?
If the team gets stuck here, the issue may not be Bedrock itself. The real issue may be that the organization has not yet defined which data can be sent to which model under which permissions.
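One way to make that definition explicit is to write it down as data. The sketch below encodes a hypothetical policy table that states, for each data class, which models may receive it and which roles may send it. Every class name, model name, and role in it is an assumption for illustration; the real table would come out of the agreement between security, legal, and platform teams.

```python
# Hypothetical policy table: data class -> permitted models and roles.
POLICY = {
    "public": {
        "models": {"example-model-a", "example-model-b"},
        "roles": {"analyst", "engineer"},
    },
    "internal": {
        "models": {"example-model-a"},
        "roles": {"engineer"},
    },
    # Restricted data may not be sent to any model in this example.
    "restricted": {"models": set(), "roles": set()},
}


def is_allowed(data_class: str, model: str, role: str) -> bool:
    """Return True only if the policy explicitly permits this combination."""
    rule = POLICY.get(data_class)
    if rule is None:
        # Unknown data classes are denied by default.
        return False
    return model in rule["models"] and role in rule["roles"]


print(is_allowed("internal", "example-model-a", "engineer"))
print(is_allowed("internal", "example-model-b", "engineer"))
```

If the team cannot fill in a table like this, that gap is the blocker to resolve before evaluating Bedrock features.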
Summary
Bedrock is not just a wrapper around LLM APIs. It is a way to place generative AI inside AWS governance, networking, logging, and infrastructure decisions.
One new implication follows from that: Bedrock adoption is not owned by the AI team alone. Cloud platform, security, legal, and workflow owners need to agree on what AWS will own, what the company will retain, and where the boundary is drawn.