I train customer service AI on your data, not on someone else's API. You own the weights, choose where the model runs, and never pay a per-token tax to a third party.
A self-trained AI customer service model — fine-tuned on the customer's own corpus, deployed on infrastructure the customer controls — replacing the thin-wrapper "prompt + RAG over a public LLM" approach that most vendors are selling under the same label. The customer owns the weights. The customer chooses where the model runs. The customer sets the retention policy.
Most "custom AI" offers on the market today are a system prompt and a retrieval layer pointed at GPT-5 or Claude. That is wrapper work, sold under a self-model label. I will recommend a wrapper when the volume is low, the domain is generic, and the data is non-sensitive — and I will say so on day one. I will not recommend a wrapper to a regulated buyer, a high-volume buyer, or a buyer whose product IS the model's behavior.
The engagement scope is the trained model, the auditable retrieval layer that complements it, the channel deployment across web chat, SMS, voice, and email, and the operations layer that keeps the model running on your VPC or on-prem GPU pool. No per-token tax to a third party. No customer transcripts flowing into a vendor's training pipeline. No quarterly notice that the model your application depends on is being sunsetted.
James Henderson is a computer scientist with 25-plus years as an architect, engineer, and application designer. He pairs disciplined engineering with current AI tooling — Claude Code, Codex, and a working judgment about when to trust them — to ship products faster and at higher quality than traditional staffing allows. For customer service AI specifically, that means model selection that respects the deployment target, fine-tuning runs that produce a published eval score before promotion, and operations layers that look like real production systems rather than notebooks running on a laptop in a closet. There is no offshore team behind the page. The principal carries the work end to end: scope, build, deploy, support. A B.S. in Computer Science from the University of Houston and service in the U.S. Army ground the tone — evidence over assertion, plans before code, verification before delivery.
What "self-model" actually means
A self-model is not a marketing label. It is a base model — chosen on technical fit, license, and deployment target — fine-tuned on the customer's own corpus until it speaks the customer's domain natively. The customer owns the weights. The customer chooses where it runs. The customer controls the upgrade cadence, the eval harness, and the retention policy. There is no per-token tax. There is no quarterly notice that the vendor is sunsetting the model the application depends on. The latency floor is set by the customer's hardware, not by a public API that occasionally rate-limits during news cycles.
The phrase matters because the market is full of "custom AI" offers that are a system prompt and a retrieval layer pointed at a third-party model. That is wrapper work. It can be the right answer. It is not a self-model, and the distinction decides whether the buyer's IP, customer transcripts, and competitive moat live inside or outside their perimeter.
Self-model vs wrapper — the honest comparison
The wrapper approach. A "GPT wrapper" or "Ollama wrapper" is a thin application layer that sends a prompt — usually stuffed with a system message, some examples, and retrieved knowledge-base chunks — to a third-party model. The third party owns the model. The third party owns the inference infrastructure. The third party sees every customer interaction that flows through the context window, subject to whatever retention policy is current that quarter. The application is cheap to demo on Monday and depending on the vendor's pricing, painful to scale by Friday.
The self-model approach. A self-model is owned by the customer end to end. Weights, hosting, eval harness, retention policy. There is no per-token tax. There is no surprise model sunset. The latency floor is set by your hardware, not by a public API.
The honest tradeoffs. Self-models are not free. They cost more up front — corpus preparation, training compute, eval harness construction, ops setup. They ship slower than wrappers — weeks of training and evaluation rather than days of prompt iteration. They need ongoing retraining as the corpus drifts. They require operational maturity some buyers do not yet have. I scope all of this on day one.
- Wrap a public model when the volume is low, the domain is generic, the data is non-sensitive, and the moat does not depend on the model itself.
- Train a self-model when the interactions are the product, are regulated, are high-volume, or contain IP the buyer cannot afford to leak into a third-party training pipeline.
- The fraud is selling a wrapper as a "custom AI" because the system prompt is two paragraphs long. I will not do that, and neither will anyone reading this who reaches out for the same reason.
The background that informs that scope discipline is described on the about page. Patterns and post-mortems live in the research notes. If you would rather skip to scoping, get in touch.
What I ship
- Self-model training and fine-tuning. Corpus preparation, base-model selection, LoRA / QLoRA fine-tuning, distillation where the deployment target demands it, reproducible training runs with versioned hyperparameters.
- Customer service bot deployment. Web chat, SMS, voice, and email channels from a single served model, with identity and conversation-state management so returning customers are not treated as strangers.
- Knowledge base ingestion and retrieval. Citation-first retrieval with versioned source documents — no silent paraphrase, no fabricated sources, no answers without provenance.
- Self-hosted model operations. Inference serving on customer VPC, on-prem GPU, or hybrid; observability, drift monitoring, written retraining-trigger policy.
- Eval harness construction. Built against the customer's actual service tasks, not against a public benchmark; pre- and post-training scores published with the deliverable.
- Wrapper engagements when appropriate. Stated on day one, scoped honestly, priced for what they are — not sold as a self-model.
- Channel and identity integration. Twilio for SMS and voice, web-chat widgets with session continuity and human handoff, email response generation with human review queues.
Engagement model
Discovery runs one to two weeks: corpus inventory, deployment-target assessment, eval-task definition, and a written plan with milestones. From there, two- to three-week training and integration sprints with a working demo at the end of each — a small fine-tuned model on a representative slice of the corpus first, then a full run once the eval harness is calibrated. Handover includes the trained weights, the eval scores published against the customer's actual tasks, the inference serving stack on infrastructure the customer controls, a runbook for retraining, and a 30-day support window for the questions that surface after launch. To scope a build, get in touch.