Knowledge Base Q&A


Foundation + Atlas + Protocol compose to answer a question from an ingested corpus.

Step 1 — User Question Received

Foundation's AI adapter receives the user question and prepares to retrieve relevant context from Atlas.

```typescript
// foundation/ai/adapter.ts
import { AIAdapter } from "foundation/ai/adapter";

const question =
  "What chunking strategy works best for long technical documents?";

// The adapter wraps the model; retrieval itself is delegated to Atlas (Step 2).
const adapter = new AIAdapter({ model: "gpt-4o-mini" });

console.log("Question:", question);
console.log("Retrieving top-3 chunks from corpus...");
```

Output:

```
Question: What chunking strategy works best for long technical documents?
Retrieving top-3 chunks from corpus...
```

Step 2 — Atlas Retrieves Top-3 Chunks

Atlas queries the vector corpus and returns the three most relevant chunks with similarity scores and source citations.

```typescript
// atlas/retrieval/query.ts
import { AtlasRetriever } from "atlas/retrieval/query";

const retriever = new AtlasRetriever({ corpus: "dblxl-docs" });

const chunks = await retriever.query(question, { topK: 3 });

chunks.forEach((c, i) => {
  console.log(`[${i + 1}] score=${c.score.toFixed(3)} source=${c.source}`);
  console.log("    ", c.text.slice(0, 160) + "...");
});
```

Output:

```
[1] score=0.941 source=chunking-strategies.md
     Recursive character splitting with a 512-token window and 64-token overlap
     performs best for long technical documents because...

[2] score=0.887 source=embedding-models.md
     When embedding technical prose, models fine-tuned on code and documentation
     (e.g. text-embedding-3-small) outperform general-purpose embeddings...

[3] score=0.812 source=retrieval-augmented-generation.md
     Hybrid retrieval combining dense vector search with BM25 sparse scoring
     reduces hallucination on domain-specific Q&A by up to 34%...
```
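The scores above come from vector similarity between the question's embedding and each chunk's embedding. As a rough illustration of how such a ranking is computed (the `cosineSimilarity` and `rankChunks` helpers below are hypothetical, not Atlas APIs), cosine similarity over precomputed embeddings looks like this:

```typescript
// Illustrative only: a dense retriever scores chunks by cosine similarity
// between the query embedding and each chunk embedding, then keeps the top K.
interface Chunk {
  text: string;
  source: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function rankChunks(queryEmbedding: number[], chunks: Chunk[], topK: number) {
  return chunks
    .map(c => ({ ...c, score: cosineSimilarity(queryEmbedding, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

In practice the embeddings come from the corpus index built at ingestion time, and production retrievers often blend this dense score with a sparse signal like BM25, as chunk [3] above notes.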

Step 3 — Protocol Validates Structured Answer

Protocol's schema validator confirms the LLM answer conforms to the QAAnswer contract before returning it.

```typescript
// protocol/schema/validator.ts
import { validate, extractJSON } from "protocol/schema/validator";
import { QAAnswerSchema } from "protocol/schemas/qa";

const answer = await adapter.complete({
  system: "Answer using only the provided context. Cite sources.",
  user: question,
  context: chunks.map(c => c.text).join("\n\n"),
});

// Pull the JSON payload out of the model's prose, then validate it
// against the QAAnswer contract.
const structured = extractJSON(answer.prose, { schema: QAAnswerSchema });
const result = validate(structured, QAAnswerSchema);

console.log("valid:", result.valid);
```

Output:

```
valid: true
```
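The final result in the next section implies which fields the QAAnswer contract carries. As an illustrative sketch (the `QAAnswer` interface and `isQAAnswer` check below are assumptions inferred from that result, not Protocol's actual schema), the structural validation a schema validator performs might look like:

```typescript
// Assumed shape of a validated answer, inferred from the final result below.
interface Citation {
  source: string;
  score: number;
}

interface QAAnswer {
  question: string;
  answer: string;
  confidence: number; // expected in [0, 1]
  citations: Citation[];
}

// A minimal structural check of the kind a schema validator performs:
// every required field present, correctly typed, and within range.
function isQAAnswer(value: unknown): value is QAAnswer {
  const o = value as QAAnswer;
  return (
    typeof o === "object" &&
    o !== null &&
    typeof o.question === "string" &&
    typeof o.answer === "string" &&
    typeof o.confidence === "number" &&
    o.confidence >= 0 &&
    o.confidence <= 1 &&
    Array.isArray(o.citations) &&
    o.citations.every(
      c => typeof c.source === "string" && typeof c.score === "number",
    )
  );
}
```

Validating before returning means a malformed model response fails fast at the boundary instead of propagating an unparseable answer to the client.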

Final Result — Answer with Source Citations

The complete validated answer, with confidence score and source citations, is ready to return to the client.

```typescript
// Composed pipeline result
const qaResult = await knowledgeBaseQA(question);

console.log(JSON.stringify(qaResult, null, 2));
```

Output:

```json
{
  "question": "What chunking strategy works best for long technical documents?",
  "answer": "Recursive character splitting with a 512-token window and 64-token overlap performs best for long technical documents. Pairing this with a hybrid retrieval strategy (dense + BM25) reduces hallucination on domain-specific Q&A by up to 34%.",
  "confidence": 0.94,
  "citations": [
    { "source": "chunking-strategies.md", "score": 0.941 },
    { "source": "retrieval-augmented-generation.md", "score": 0.812 }
  ],
  "pipeline": ["foundation/ai", "atlas/retrieval", "protocol/validator"]
}
```
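The `knowledgeBaseQA` call above is not defined on this page. A minimal sketch of how it might compose the three steps, with the retrieval and completion functions injected as assumed interfaces (none of these signatures are the actual Foundation, Atlas, or Protocol APIs, and the confidence heuristic is illustrative):

```typescript
// Assumed shapes, inferred from the calls shown in Steps 1-3.
interface RetrievedChunk {
  text: string;
  source: string;
  score: number;
}

interface QAResult {
  question: string;
  answer: string;
  confidence: number;
  citations: { source: string; score: number }[];
  pipeline: string[];
}

type Retrieve = (q: string, opts: { topK: number }) => Promise<RetrievedChunk[]>;
type Complete = (req: {
  system: string;
  user: string;
  context: string;
}) => Promise<string>;

async function knowledgeBaseQA(
  question: string,
  retrieve: Retrieve,
  complete: Complete,
): Promise<QAResult> {
  // Step 2: retrieve the top-K chunks from the corpus.
  const chunks = await retrieve(question, { topK: 3 });

  // Step 1: answer grounded in the retrieved context.
  const answer = await complete({
    system: "Answer using only the provided context. Cite sources.",
    user: question,
    context: chunks.map(c => c.text).join("\n\n"),
  });

  // Step 3 would validate this object against QAAnswerSchema before returning.
  return {
    question,
    answer,
    // Illustrative heuristic: use the best retrieval score as confidence.
    confidence: chunks[0]?.score ?? 0,
    citations: chunks.map(c => ({ source: c.source, score: c.score })),
    pipeline: ["foundation/ai", "atlas/retrieval", "protocol/validator"],
  };
}
```

Injecting `retrieve` and `complete` rather than importing them directly keeps the pipeline testable with stubs and independent of any one retriever or model adapter.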