Do I need a trained model first, or can the retrieval layer stand alone with a wrapper?

It can stand alone with a wrapper. The retrieval layer is independent of the inference layer: the same ingestion pipelines, vector store, and citation rendering work whether inference runs on a fine-tuned customer-owned model or a third-party API. That scope is stated on day one; many buyers start retrieval-first and add the fine-tune later, once the corpus stabilizes.

Which vector store do you recommend?

For buyers already on Postgres, pgvector: it keeps the vector data inside the existing database operations boundary. For larger corpora or stricter latency budgets, self-hosted Qdrant or Weaviate. For the largest scale, Milvus. The recommendation depends on corpus size, latency target, the existing platform stack, and the operations team's comfort, and it is settled during scoping, before any code is written.

How do you prevent the model from hallucinating sources?

Citation-first retrieval. The response template requires the model to attach the retrieved passage and the source document to every answer; the prompt structure does not give it room to fabricate a source. Refusal is the fallback when retrieval returns nothing useful — the model says it does not know rather than inventing an answer. The eval harness scores citation correctness alongside answer correctness.
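The refusal fallback described above can be sketched in a few lines of Python. This is a minimal illustration, not the production layer: the `Passage` and `Answer` types, the `generate` callable, and the `min_score` threshold of 0.75 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Passage:
    doc_id: str    # hypothetical source-document identifier
    section: str
    text: str
    score: float   # retriever similarity score, higher is better


@dataclass(frozen=True)
class Answer:
    text: str
    citation: Optional[Passage]  # None only when the answer is a refusal
    refused: bool


REFUSAL_TEXT = "I don't know. No source in the knowledge base covers this."


def answer_with_citation(
    question: str,
    passages: Sequence[Passage],
    generate: Callable[[str, Passage], str],
    min_score: float = 0.75,  # assumed threshold; tune per corpus
) -> Answer:
    """Return an answer that always carries its source, or a refusal."""
    usable = [p for p in passages if p.score >= min_score]
    if not usable:
        # Retrieval returned nothing useful: refuse instead of inventing.
        return Answer(text=REFUSAL_TEXT, citation=None, refused=True)
    best = max(usable, key=lambda p: p.score)
    # The model only ever sees the retrieved passage it must cite.
    return Answer(text=generate(question, best), citation=best, refused=False)
```

The point of the shape is that a non-refused `Answer` cannot be constructed here without a passage attached, which is what the eval harness can then score.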

Knowledge Base Ingestion & Retrieval

The auditable retrieval layer that complements your trained model — answers cite their sources, sources are versioned, the corpus is yours.

Retrieval is the layer that lets the trained model answer questions about facts that change faster than retraining cycles can keep up with. Product specs change weekly. Pricing changes monthly. New articles ship daily. The trained model is the voice and the reasoning; the retrieval layer is the source-of-truth handoff.

The buyer for this page is a knowledge-management or support-operations leader whose answers must be auditable. That means regulated industries, technical-product support, and anywhere "where did the bot get that from" is a question the auditor will ask. The parent category, Customer Service AI Models, describes the trained-model side; this page is about the retrieval side that complements it.

Ingestion sources

Confluence, SharePoint, Notion, Zendesk, Salesforce, Freshdesk, file shares (SMB, S3, Azure Blob), and bespoke databases over JDBC or ODBC. Sync is incremental: the pipeline does not refetch the whole corpus every night. Documents are versioned, so a retrieval can cite the version that was current at the time of the answer, even if the document has since been edited.
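One way to picture incremental sync with document-level versioning is the in-memory sketch below. It assumes change detection by content hash; a real connector might rely on the source system's change feed or modified timestamps instead. The `VersionedStore` class and its method names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    # doc_id -> list of (version_number, content_hash, text), oldest first
    versions: dict = field(default_factory=dict)

    def sync(self, doc_id: str, text: str) -> bool:
        """Record a new version only when the content actually changed.

        Returns True when the document needs re-embedding, False when the
        fetched copy matches the latest stored version.
        """
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        history = self.versions.setdefault(doc_id, [])
        if history and history[-1][1] == digest:
            return False  # unchanged: skip re-chunking and re-embedding
        history.append((len(history) + 1, digest, text))
        return True

    def latest_version(self, doc_id: str) -> int:
        return self.versions[doc_id][-1][0]

    def text_at(self, doc_id: str, version: int) -> str:
        """Return the text as it stood at a given version number."""
        for number, _, text in self.versions.get(doc_id, []):
            if number == version:
                return text
        raise KeyError(f"{doc_id} has no version {version}")
```

An answer that stores `(doc_id, version)` alongside its citation can later be checked against `text_at`, even after the source page has been edited.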

Embeddings and vector store

The customer chooses the embedding model — self-hosted (Sentence Transformers, BGE, GTE families) or vendor-supplied (OpenAI, Cohere, Voyage). The vector store lives on infrastructure the customer controls — pgvector on Postgres, Qdrant, Weaviate, or Milvus, scoped per the buyer's existing platform stack. I do not ship a retrieval layer that depends on a third-party hosted vector database the customer cannot inspect or export from.

Citation-first retrieval

Every answer surfaces the source document, the section, and the passage the model drew from. The retrieved passage is rendered in the response or in a sidebar — the customer chooses. No silent paraphrase. No fabricated sources. No answers without provenance. If retrieval returned nothing useful, the model says so and escalates rather than confabulating.

What I ship

Ingestion pipelines. Confluence, SharePoint, Notion, Zendesk, Salesforce, Freshdesk, file shares, bespoke databases. Incremental, versioned.
Embedding pipeline. Customer choice of embedding model; chunking strategy tuned to the corpus shape.
Vector store on customer infrastructure. pgvector, Qdrant, Weaviate, or Milvus.
Citation-first retrieval layer. Every answer cites the source document and the passage; no answer without provenance.
Document-level versioning. The retrieval can surface what was current at the time of the answer.
Retrieval-quality dashboard. Hit rate, refusal rate, citation coverage — measurable, not vibes-based.
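To make the dashboard metrics concrete, here is a sketch of how the three numbers could be computed from an evaluation set. The definitions are assumptions for illustration: hit rate as the share of queries where the expected document was retrieved, refusal rate as the share of queries refused, and citation coverage as the share of non-refused answers that carry a citation. The `EvalRecord` type is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class EvalRecord:
    retrieved_doc_ids: Tuple[str, ...]  # what the retriever returned
    expected_doc_id: str                # labelled ground-truth source
    refused: bool                       # True if the model declined to answer
    cited_doc_id: Optional[str]         # source attached to the answer, if any


def retrieval_metrics(records: Sequence[EvalRecord]) -> dict:
    """Compute hit rate, refusal rate, and citation coverage."""
    total = len(records)
    if total == 0:
        raise ValueError("need at least one evaluation record")
    hits = sum(r.expected_doc_id in r.retrieved_doc_ids for r in records)
    refusals = sum(r.refused for r in records)
    answered = [r for r in records if not r.refused]
    cited = sum(r.cited_doc_id is not None for r in answered)
    return {
        "hit_rate": hits / total,
        "refusal_rate": refusals / total,
        # Undefined when every query was refused, so report None.
        "citation_coverage": cited / len(answered) if answered else None,
    }
```

In a citation-first layer, citation coverage below 1.0 on answered queries would be a defect to investigate rather than a number to trade off.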

Where it fits

Auditable answers

Healthcare, finance, defense-adjacent, regulated technical support. Every answer must trace to a source. The retrieval layer makes that traceability native to the response — not a separate audit export the operations team has to assemble after the fact.

Knowledge sprawled across systems

The customer's source of truth is in Confluence for some topics, SharePoint for others, Zendesk macros for support patterns, and a Salesforce knowledge base for the field team. I integrate all of them behind a single retrieval layer with deduplication and conflict resolution, so the model sees one corpus rather than five.
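Deduplication and conflict resolution can take many forms; the sketch below shows one simple policy, stated here as an assumption rather than the policy used on any given engagement. Exact duplicates are dropped after whitespace and case normalization, and when two sources disagree on the same topic, the most recently updated document wins. The `SourceDoc` type and the `topic` key are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class SourceDoc:
    source: str    # e.g. "confluence", "sharepoint", "zendesk"
    topic: str     # hypothetical key identifying what the doc is about
    text: str
    updated: date


def _fingerprint(text: str) -> str:
    # Normalize case and whitespace so trivially different copies match.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def unify(docs: Iterable[SourceDoc]) -> List[SourceDoc]:
    """Collapse several source systems into one corpus view."""
    # Step 1: drop duplicates, keeping the first copy seen.
    seen = set()
    unique = []
    for doc in docs:
        fp = _fingerprint(doc.text)
        if fp not in seen:
            seen.add(fp)
            unique.append(doc)
    # Step 2: resolve conflicts per topic, newest document wins (assumed policy).
    latest = {}
    for doc in unique:
        current = latest.get(doc.topic)
        if current is None or doc.updated > current.updated:
            latest[doc.topic] = doc
    return list(latest.values())
```

Other policies, such as a fixed source-of-record ranking per topic, fit the same shape by changing the comparison in step 2.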

"Where did the bot get that from?"

An auditor, a customer, or a regulator asks where a specific answer came from. The audit log surfaces the retrieval, the source document, the passage, the document version at the time, and the model version that interpreted it. The question is answered in seconds, not weeks.
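The audit trail described above amounts to one structured record per answer. A minimal sketch follows, assuming a JSON-lines log; the `AuditRecord` field names and the `find_answer` helper are illustrative, not a fixed schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AuditRecord:
    answer_id: str
    query: str          # what was asked
    doc_id: str         # source document the answer drew from
    doc_version: int    # document version current at answer time
    passage: str        # the exact retrieved passage
    model_version: str  # model that interpreted the passage
    answered_at: str    # ISO 8601 timestamp, UTC


def to_log_line(record: AuditRecord) -> str:
    """Serialize one record as a single JSON line."""
    return json.dumps(asdict(record), sort_keys=True)


def find_answer(log_lines: Iterable[str], answer_id: str) -> Optional[dict]:
    """Look up the provenance of a specific answer by its id."""
    for line in log_lines:
        entry = json.loads(line)
        if entry["answer_id"] == answer_id:
            return entry
    return None
```

A linear scan is enough to show the idea; a production log would more likely be indexed by answer id so the lookup stays fast as volume grows.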

How I work

Discovery scopes the source systems, the corpus volume, the embedding choice, the vector store choice, the chunking strategy, and the retention requirements. Implementation runs in two-week sprints, with the first source system live at the end of sprint one and additional sources added one per sprint. The retrieval-quality dashboard is wired up before any production rollout. Patterns from prior retrieval engagements are in the research notes.

Engagement model

Discovery runs one to two weeks. Implementation runs four to ten weeks depending on the source-system count and corpus volume. Handover includes the ingestion pipelines, the vector store, the retrieval layer, the citation-rendering integration, the quality dashboard, the runbook, and a 30-day support window. To scope a retrieval engagement, get in touch.