
Building an AI Chatbot with AWS Bedrock

February 15, 2026

The chatbot on this portfolio, Gueka, answers questions about my background using AWS Bedrock. You ask it something, it searches a knowledge base built from my resume and experience, and it responds as a slightly-too-confident AI assistant with the personality of Star-Lord. Getting there was a good education in the messier corners of serverless AI on AWS.

Here's what I learned building it, how the knowledge base works, and how I put guardrails in place so it doesn't go off-script.


Architecture Overview

The stack is deliberately lean: Bedrock (Claude Haiku) for inference, a Bedrock Knowledge Base backed by S3 for retrieval, Lambda for the API, Cognito for auth, and DynamoDB for rate limiting. Everything is defined as infrastructure-as-code with AWS SAM.

The flow: the frontend sends a message with a Cognito JWT → Lambda verifies the token → checks the rate limit → retrieves relevant knowledge base chunks → calls the model with those chunks as context → returns a response.
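
A minimal sketch of that pipeline in Python, where the helpers (verify_jwt, consume_rate_limit, retrieve_chunks, format_context, generate_answer) are hypothetical stand-ins fleshed out in the sections below:

```python
import json

MAX_INPUT_CHARS = 500  # illustrative cap; see the guardrails section

def handler(event, context):
    """Lambda entry point: auth -> rate limit -> retrieve -> generate."""
    # HTTP API payloads lowercase header names.
    claims = verify_jwt(event["headers"].get("authorization", ""))
    if claims is None:
        return respond(401, {"error": "invalid token"})

    message = json.loads(event["body"]).get("message", "")
    if not message or len(message) > MAX_INPUT_CHARS:
        return respond(400, {"error": "message missing or too long"})

    if not consume_rate_limit(claims["sub"]):
        return respond(429, {"error": "rate limit exceeded"})

    chunks = retrieve_chunks(message)                           # knowledge base lookup
    answer = generate_answer(message, format_context(chunks))   # Claude Haiku call
    return respond(200, {"answer": answer})

def respond(status, payload):
    return {"statusCode": status, "body": json.dumps(payload)}
```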


The Knowledge Base

The chatbot answers from a curated markdown file covering work history, skills, projects, education, and current focus. It's written in natural language paragraphs rather than bullet-point resume format. That matters for RAG: the vector store chunks by paragraph, so conversational prose retrieves better than terse bullets.

Updating it is straightforward — edit the file, sync it to S3, and trigger a re-sync in the Bedrock console to rebuild the vector index. No model retraining, no redeployment.
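
The same sync can be scripted with boto3 instead of clicking through the console; start_ingestion_job is the API behind the console's sync button. A sketch, with the bucket name and IDs as placeholders:

```python
import boto3

s3 = boto3.client("s3")
bedrock_agent = boto3.client("bedrock-agent")

# Push the updated document to the knowledge base bucket.
s3.upload_file("knowledge-base.md", "kb-bucket", "knowledge-base.md")

# Rebuild the vector index for that data source.
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KB_ID",        # placeholder
    dataSourceId="DATA_SOURCE_ID",  # placeholder
)
print(job["ingestionJob"]["status"])  # STARTING, then IN_PROGRESS / COMPLETE
```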

One tuning detail worth knowing: without capping and deduplicating retrieval results, a single large document can crowd out everything else. I limit how many chunks can come from the same source, so the chatbot doesn't become myopic about one part of my background when the question is broader.
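
A sketch of that cap, assuming results shaped like the Bedrock Retrieve response (a relevance score plus an S3 source location per chunk); MAX_PER_SOURCE is a made-up tuning knob:

```python
from collections import defaultdict

MAX_PER_SOURCE = 2  # assumption; tune for your knowledge base

def cap_by_source(results, limit=MAX_PER_SOURCE):
    """Keep at most `limit` chunks per source document, best scores first."""
    per_source = defaultdict(int)
    kept = []
    for r in sorted(results, key=lambda r: r["score"], reverse=True):
        uri = r["location"]["s3Location"]["uri"]
        if per_source[uri] < limit:
            per_source[uri] += 1
            kept.append(r)
    return kept
```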


Guardrails

Left unguarded, the chatbot would answer anything — write code for people, give free architecture advice, go off on unrelated tangents. I put two layers in place.

Infrastructure level: Bedrock has a native Guardrails feature. I use it to block off-topic requests such as "write me a function" or "debug this script". When a guardrail fires, instead of an empty response or an error, the Lambda returns one of several pre-written, in-character deflection messages, like "My legal team says I can't help with that. But I can tell you about Marco's work." A 200 with personality beats a silent error.
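
With the Converse API an intervention surfaces as a stop reason, so the deflection is a single branch. A sketch, with the guardrail ID as a placeholder and the second deflection invented for illustration; SYSTEM_PROMPT is sketched in the next section:

```python
import random
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

DEFLECTIONS = [
    "My legal team says I can't help with that. But I can tell you about Marco's work.",
    "Hard pass. Ask me about Marco instead.",  # illustrative stand-in
]

def generate_answer(message, context_text):
    response = bedrock_runtime.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        system=[{"text": SYSTEM_PROMPT}],
        messages=[{
            "role": "user",
            "content": [{"text": f"Context:\n{context_text}\n\nQuestion: {message}"}],
        }],
        guardrailConfig={
            "guardrailIdentifier": "GUARDRAIL_ID",  # placeholder
            "guardrailVersion": "1",
        },
    )
    if response["stopReason"] == "guardrail_intervened":
        return random.choice(DEFLECTIONS)  # in-character 200, not a silent error
    return response["output"]["message"]["content"][0]["text"]
```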

Model level: Independently of the guardrail, the model receives a system prompt that defines its scope and personality. This catches edge cases the topic policy misses: softer requests that don't match the guardrail patterns but still wander off-topic. The general principle: constrain the model to what you actually want it to do, give it a voice, and be explicit about what it should never do.
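
The shape of that prompt, paraphrased rather than quoted; the real wording is longer and tuned over time:

```python
# Paraphrased structure, not the production prompt.
SYSTEM_PROMPT = """\
You are Gueka, the assistant on Marco's portfolio site. You have the swagger
of Star-Lord: confident, a little cocky, always friendly.

Scope:
- Only answer questions about Marco's background, skills, projects, and experience.
- Base answers on the provided context. If the context doesn't cover it, say so.

Never:
- Write or debug code for the user.
- Give general architecture or consulting advice.
- Wander into topics unrelated to Marco.
"""
```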

There's also a server-side input length cap as a third layer, shown as the length check in the handler sketch above: defense in depth against clients that bypass the frontend UI. Trusting only the client is naive.


Key Lessons

Hoist expensive initialization out of the handler. The JWT verifier fetches Cognito's public keys on first use and caches them. Create it at module scope and it persists across warm Lambda invocations. Create it inside the handler and every invocation pays the initialization cost. Small thing, meaningful difference at scale.
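
A sketch using PyJWT's JWKS client (the actual library choice may differ); the pool and client IDs are placeholders, and the point is where the verifier lives:

```python
import jwt  # PyJWT

ISSUER = "https://cognito-idp.us-east-1.amazonaws.com/USER_POOL_ID"  # placeholder
APP_CLIENT_ID = "APP_CLIENT_ID"                                      # placeholder

# Module scope: the JWKS fetch happens once per container and the keys are
# cached, so warm invocations skip the network round trip entirely.
jwks_client = jwt.PyJWKClient(f"{ISSUER}/.well-known/jwks.json", cache_keys=True)

def verify_jwt(token):
    try:
        signing_key = jwks_client.get_signing_key_from_jwt(token)
        # audience applies to Cognito ID tokens; access tokens carry a
        # client_id claim instead.
        return jwt.decode(token, signing_key.key, algorithms=["RS256"],
                          audience=APP_CLIENT_ID, issuer=ISSUER)
    except jwt.PyJWTError:
        return None
```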

CORS should be per-origin, not wildcard. A JWT-gated API with a wildcard CORS header is contradictory: the browser will let any page call the API with an auth token and read the response. Computing the allowed origin dynamically per request, and only returning it if it matches the allow-list, is the right pattern.
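
A sketch of that, with placeholder origins:

```python
ALLOWED_ORIGINS = {
    "https://example.com",    # placeholder for the production domain
    "http://localhost:3000",  # local development
}

def cors_headers(event):
    """Echo the origin back only if it's allow-listed; never a wildcard."""
    origin = event.get("headers", {}).get("origin", "")
    if origin in ALLOWED_ORIGINS:
        return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    return {}  # no CORS header at all for unknown origins
```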

Rate limiting needs atomic writes. A naive read-then-write pattern has a race condition: two concurrent requests both read the same count, both increment, and both get through. The fix is a conditional update — the increment only applies if the limit hasn't already been hit.
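
A sketch with DynamoDB's conditional update, under a few assumptions of mine: the table is keyed on a single string pk, counters are per user per UTC day, and the limit is illustrative. The expiresAt attribute also sets up the TTL reset in the next lesson:

```python
import boto3
from botocore.exceptions import ClientError
from datetime import datetime, timedelta, timezone

table = boto3.resource("dynamodb").Table("rate-limits")  # placeholder name
DAILY_LIMIT = 50  # illustrative

def consume_rate_limit(user_id):
    """Atomically increment the counter; refuse if the limit is already hit."""
    now = datetime.now(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    try:
        table.update_item(
            # Keying per user per day means a new day starts a fresh counter
            # even before TTL cleanup deletes the old record.
            Key={"pk": f"{user_id}#{now:%Y-%m-%d}"},
            UpdateExpression=(
                "ADD #c :one "
                "SET expiresAt = if_not_exists(expiresAt, :ttl)"
            ),
            ConditionExpression="attribute_not_exists(#c) OR #c < :limit",
            ExpressionAttributeNames={"#c": "count"},  # COUNT is a reserved word
            ExpressionAttributeValues={
                ":one": 1,
                ":limit": DAILY_LIMIT,
                ":ttl": int(next_midnight.timestamp()),
            },
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False
        raise
```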

Use TTL for counter resets, not a scheduled job. Rather than a cron Lambda to reset rate limit counters, I store an expiry timestamp on each counter record. DynamoDB deletes it automatically when it expires. The next request after midnight creates a fresh counter. Zero ops overhead.

Split retrieval from generation. Bedrock's RetrieveAndGenerate API does both steps in one call. I chose to split them: retrieve first, then generate with the results injected into the context. The split gives control: deduplicate results, inspect what was retrieved, format the context exactly as needed. The combined API is a black box.
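
A sketch of the retrieval half using the standalone Retrieve API, with the knowledge base ID as a placeholder; generation then proceeds with the Converse call from the guardrails section:

```python
import boto3

kb_runtime = boto3.client("bedrock-agent-runtime")

def retrieve_chunks(question, top_k=8):
    response = kb_runtime.retrieve(
        knowledgeBaseId="KB_ID",  # placeholder
        retrievalQuery={"text": question},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k},
        },
    )
    # Results arrive scored; the cap_by_source helper from earlier applies here.
    return cap_by_source(response["retrievalResults"])

def format_context(results):
    """Deduplicated, inspectable context block: the payoff of the split."""
    seen, parts = set(), []
    for r in results:
        text = r["content"]["text"]
        if text not in seen:
            seen.add(text)
            parts.append(text)
    return "\n\n---\n\n".join(parts)
```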


What I'd Do Differently

The knowledge base has evolved since the initial build — it now has several documents covering different aspects of my background, not just the professional resume. That made a noticeable difference in response quality and range. If I were starting over, I'd invest in that content structure earlier rather than treating it as an afterthought.

Conversation history is also something I'd reconsider from the start. The current implementation doesn't persist history — each message is stateless. That keeps things simple and costs predictable, but it means the chatbot has no memory within a conversation. For a portfolio context that's an acceptable trade-off, but worth thinking through upfront rather than discovering it later.