Customer Service AI Models — Your Data, Not a Generic Wrapper

I train customer service AI on your data, not on someone else’s API — you own the weights and choose where the model runs.

Self-model (customer-trained)

A base model fine-tuned on your corpus, deployed on infrastructure you control.

The base model is chosen on technical fit and license. The fine-tune runs against your transcripts, knowledge base, and product documentation until the model speaks your domain natively. You own the weights, the eval harness, and the retention policy. There is no per-token tax and no quarterly notice that a vendor is sunsetting the model your application depends on.

You own the weights. Training data, eval set, and retention policy stay inside your perimeter.
Predictable latency floor. Set by your hardware, not a public API rate-limited during news cycles.
Capital up front, capped opex. Hardware + power, not a per-token bill that scales with ticket volume.
No vendor lock-in. Weights are portable; the training pipeline retargets to a refreshed base on your schedule.
Regulated-industry fit. The model lives inside the compliance boundary; PII never leaves.

Generic wrapper (third-party API + prompt)

A system prompt and a retrieval layer pointed at a hosted model from a third party.

A wrapper is a thin application layer that sends a prompt, often stuffed with examples and retrieved knowledge-base chunks, to GPT, Claude, Gemini, or a hosted Ollama instance. The third party owns the model, runs the inference, and sees every interaction in the context window. For low-volume, generic-domain, non-sensitive use cases this is the right answer and it ships in days. I will scope one honestly when the math points there.

Vendor sees every prompt. Retention policy is whatever the vendor publishes that quarter.
Variable latency. Floor set by a public API; rate-limited during peak traffic.
Per-token opex. Cheap to demo Monday, scales linearly with ticket volume by Friday.
Vendor coupling. API surface, pricing, model versions, and sunset notices are on the vendor’s calendar.
Days, not weeks. This is the wrapper’s genuine advantage — ship a working demo this week.

Self-models cost more up front than wrappers. They take weeks rather than days. They need ongoing retraining as the corpus drifts. For low-volume, English-only, non-regulated use cases, a wrapper is the right answer and I will scope one honestly. For regulated buyers, high-volume buyers, and buyers whose product IS the model’s behavior, the self-model pays back over the operating life. The fork is named on day one, not after the contract.

THE EIGHT DIMENSIONS

Dimension	Self-model (customer-trained)	Wrapper (third-party API + prompt)
Data ownership	Customer owns the weights, training data, eval set, and retention policy. Nothing leaves the perimeter.	Vendor sees every prompt, completion, and retrieved chunk. Retention is whatever the vendor publishes that quarter.
Latency	Floor set by customer hardware. Predictable. Tunable via quantization and the inference stack.	Floor set by a public API. Variable. Rate-limited during news cycles.
Cost model	Capital up front (training compute, corpus prep, eval harness). Operating cost is hardware and power — predictable and capped.	Per-token. Scales linearly with ticket volume. Cheap to demo, expensive at scale.
Vendor lock-in	None. Weights are portable. The base model can be retrained against a different family later.	High. Coupled to the vendor’s API surface, pricing, model versions, and sunset notices.
Regulated-industry fit	Strong. The model lives inside the compliance boundary. PII never leaves the perimeter.	Weak by default; possible with vendor enterprise contracts and BAAs where the vendor offers them.
Custom behavior	Domain is baked into the weights. The model speaks customer vocabulary natively. No load-bearing system prompt to leak.	Domain is bolted on via prompt and retrieval. Effective for generic tasks; brittle on niche vocabulary.
Time to ship	Eight to sixteen weeks. Longer for regulated or multi-channel rollouts.	Days. This is the wrapper’s genuine advantage.
Ongoing maintenance	Periodic retraining as the corpus drifts. The eval harness runs on every retraining pass.	Vendor handles model updates — a benefit until a sunset notice arrives or a version change shifts behavior.

MYTH-BUSTING Q&A

Q.Isn’t a self-model just an open-source LLM you downloaded?

No. Downloading Llama-3.1 or Qwen-2.5 gives you the base. A self-model is that base — chosen on technical fit, license, and deployment target — fine-tuned on your corpus until it speaks your domain natively. The base is a generalist; the self-model is the specialist that emerges after the fine-tune. The training pipeline, the eval harness, and the retention policy are part of the deliverable.

Q.Don’t I still need GPT-5 or Claude for the hard questions?

Sometimes, and that is fine. A self-model handles the routine 80–95% of interactions where the domain matters. For edge cases the deployment can route to a frontier model as a fallback. Pure self-model where the corpus covers the surface; pure wrapper where volume does not justify training; hybrid where both pay back. The architecture decision is named on day one, not improvised at month three.

Q.Won’t this be ten times more expensive than just using GPT-5?

The math depends on volume. At low volume a wrapper is cheaper — no capex to amortize. Around tens of thousands of monthly interactions, per-token bills overtake the amortized capex of a self-model on a single GPU. At high volume the gap widens. I scope the volume on day one and recommend the cheaper option for the actual volume — including the wrapper when that is the math.

Q.Can’t I just use a longer system prompt and RAG over my docs?

You can, and for many use cases that is the right answer. Prompt-and-RAG works when domain vocabulary is close to general English, tasks are routine, and volume is low. It breaks down when jargon is dense, when tasks require domain reasoning the base has not internalized, or when the prompt-leak vector is unacceptable — regulated, defense-adjacent, or IP-sensitive deployments. I will tell you which bucket your use case sits in before any contract is signed.

Q.What happens when the open-source base becomes outdated?

The same thing that happens when a vendor sunsets a closed model — except you own the weights, so you have time. The training pipeline is reproducible; re-run it against a refreshed base on your schedule, not the vendor’s. License compatibility and upgrade-path tradeoffs are scoped in the base-model selection report before the first training run.

Q.Will my data train someone else’s model?

No, by construction. The fine-tune runs on infrastructure you control. Training data and transcripts never flow into a third-party training pipeline. The answer is the same ten months from now as today — there is no vendor terms-of-service to change.

The rest of this page covers what I ship inside a self-model engagement — the training pipeline, the channel deployment, the citation-first retrieval layer, the self-hosted operations stack — how each fits a real buyer profile, and how the engagement is scoped end to end. Read on, or get in touch if the scope conversation is the part you need next.

I train customer service AI on your data, not on someone else's API. You own the weights, choose where the model runs, and never pay a per-token tax to a third party.

A self-trained AI customer service model — fine-tuned on the customer's own corpus, deployed on infrastructure the customer controls — replacing the thin-wrapper "prompt + RAG over a public LLM" approach that most vendors are selling under the same label. The customer owns the weights. The customer chooses where the model runs. The customer sets the retention policy.

Most "custom AI" offers on the market today are a system prompt and a retrieval layer pointed at GPT-5 or Claude. That is wrapper work, sold under a self-model label. I will recommend a wrapper when the volume is low, the domain is generic, and the data is non-sensitive — and I will say so on day one. I will not recommend a wrapper to a regulated buyer, a high-volume buyer, or a buyer whose product IS the model's behavior.

The engagement scope is the trained model, the auditable retrieval layer that complements it, the channel deployment across web chat, SMS, voice, and email, and the operations layer that keeps the model running on your VPC or on-prem GPU pool. No per-token tax to a third party. No customer transcripts flowing into a vendor's training pipeline. No quarterly notice that the model your application depends on is being sunsetted.

James Henderson is a computer scientist with 25-plus years as an architect, engineer, and application designer. He pairs disciplined engineering with current AI tooling — Claude Code, Codex, and a working judgment about when to trust them — to ship products faster and at higher quality than traditional staffing allows. For customer service AI specifically, that means model selection that respects the deployment target, fine-tuning runs that produce a published eval score before promotion, and operations layers that look like real production systems rather than notebooks running on a laptop in a closet. There is no offshore team behind the page. The principal carries the work end to end: scope, build, deploy, support. A B.S. in Computer Science from the University of Houston and service in the U.S. Army ground the tone — evidence over assertion, plans before code, verification before delivery.

What "self-model" actually means

A self-model is not a marketing label. It is a base model — chosen on technical fit, license, and deployment target — fine-tuned on the customer's own corpus until it speaks the customer's domain natively. The customer owns the weights. The customer chooses where it runs. The customer controls the upgrade cadence, the eval harness, and the retention policy. There is no per-token tax. There is no quarterly notice that the vendor is sunsetting the model the application depends on. The latency floor is set by the customer's hardware, not by a public API that occasionally rate-limits during news cycles.

The phrase matters because the market is full of "custom AI" offers that are a system prompt and a retrieval layer pointed at a third-party model. That is wrapper work. It can be the right answer. It is not a self-model, and the distinction decides whether the buyer's IP, customer transcripts, and competitive moat live inside or outside their perimeter.

Self-model vs wrapper — the honest comparison

The wrapper approach. A "GPT wrapper" or "Ollama wrapper" is a thin application layer that sends a prompt — usually stuffed with a system message, some examples, and retrieved knowledge-base chunks — to a third-party model. The third party owns the model. The third party owns the inference infrastructure. The third party sees every customer interaction that flows through the context window, subject to whatever retention policy is current that quarter. The application is cheap to demo on Monday and depending on the vendor's pricing, painful to scale by Friday.

The self-model approach. A self-model is owned by the customer end to end. Weights, hosting, eval harness, retention policy. There is no per-token tax. There is no surprise model sunset. The latency floor is set by your hardware, not by a public API.

The honest tradeoffs. Self-models are not free. They cost more up front — corpus preparation, training compute, eval harness construction, ops setup. They ship slower than wrappers — weeks of training and evaluation rather than days of prompt iteration. They need ongoing retraining as the corpus drifts. They require operational maturity some buyers do not yet have. I scope all of this on day one.

Wrap a public model when the volume is low, the domain is generic, the data is non-sensitive, and the moat does not depend on the model itself.
Train a self-model when the interactions are the product, are regulated, are high-volume, or contain IP the buyer cannot afford to leak into a third-party training pipeline.
The fraud is selling a wrapper as a "custom AI" because the system prompt is two paragraphs long. I will not do that, and neither will anyone reading this who reaches out for the same reason.

The background that informs that scope discipline is described on the about page. Patterns and post-mortems live in the research notes. If you would rather skip to scoping, get in touch.

What I ship

Self-model training and fine-tuning. Corpus preparation, base-model selection, LoRA / QLoRA fine-tuning, distillation where the deployment target demands it, reproducible training runs with versioned hyperparameters.
Customer service bot deployment. Web chat, SMS, voice, and email channels from a single served model, with identity and conversation-state management so returning customers are not treated as strangers.
Knowledge base ingestion and retrieval. Citation-first retrieval with versioned source documents — no silent paraphrase, no fabricated sources, no answers without provenance.
Self-hosted model operations. Inference serving on customer VPC, on-prem GPU, or hybrid; observability, drift monitoring, written retraining-trigger policy.
Eval harness construction. Built against the customer's actual service tasks, not against a public benchmark; pre- and post-training scores published with the deliverable.
Wrapper engagements when appropriate. Stated on day one, scoped honestly, priced for what they are — not sold as a self-model.
Channel and identity integration. Twilio for SMS and voice, web-chat widgets with session continuity and human handoff, email response generation with human review queues.

Engagement model

Discovery runs one to two weeks: corpus inventory, deployment-target assessment, eval-task definition, and a written plan with milestones. From there, two- to three-week training and integration sprints with a working demo at the end of each — a small fine-tuned model on a representative slice of the corpus first, then a full run once the eval harness is calibrated. Handover includes the trained weights, the eval scores published against the customer's actual tasks, the inference serving stack on infrastructure the customer controls, a runbook for retraining, and a 30-day support window for the questions that surface after launch. To scope a build, get in touch.

FREQUENTLY ASKED

What is the difference between a "self-model" and a "wrapper"?

A wrapper sends a prompt to a third-party model (GPT, Claude, Gemini, an Ollama instance) and styles the response. The third party owns the weights, sees the data, and controls the model lifecycle. A self-model is a base model fine-tuned on your corpus — you own the weights, you choose where it runs, you set the retention policy, and there is no per-token tax. Both are real engineering. The difference matters for regulation, volume, and IP retention.

Will you recommend a wrapper if that is the right answer?

Yes, on day one. For low-volume, generic-domain, non-sensitive use cases, a wrapper ships in days for a fraction of the cost of a fine-tune. The honest scope is named before the contract, not after. The fraud is selling a wrapper as a "custom AI" because the system prompt is two paragraphs long, and I do not do that.

How long does a self-model engagement take, end to end?

Most engagements run eight to sixteen weeks: one to two weeks of discovery and corpus preparation, two to three weeks of base-model selection plus a calibration fine-tune on a representative slice, three to six weeks of full training and eval-harness construction, and two to three weeks of deployment, observability, and handover. Heavily regulated deployments or multi-channel rollouts extend that. A wrapper engagement runs days, not weeks.

RELATED SERVICES