Self-Model Training & Fine-Tuning
End-to-end training pipeline for a customer-owned model — corpus to deployable artifact, with a published eval score before promotion.
<p>This is the training engagement. The deliverable is a customer-owned model — base architecture selected for the deployment target, fine-tuned on a corpus the customer prepared with me, packaged with a reproducible training pipeline, and shipped with an eval score against the customer's actual service tasks. The weights live with the customer. The training code lives in the customer's repository. The eval harness lives next to it.</p>
<p>The buyer for this page is a support or product leader who already has a corpus — helpdesk transcripts, knowledge base articles, recorded calls, product documentation — and has decided the model is strategic rather than a side feature. The decision matters. A wrapper is shipped in a week; a self-model takes a quarter. If the corpus does not yet exist or the strategic case has not been made, the wrapper engagement on the parent <a href="/services/customer-service-ai">Customer Service AI Models</a> page is the honest answer.</p>
<h2>The pipeline, end to end</h2>
<h3>Corpus preparation</h3>
<p>Extraction from helpdesk transcripts (Zendesk, Salesforce, Freshdesk, Intercom), knowledge bases (Confluence, Notion, SharePoint), recorded calls (transcribed with the customer's choice of pipeline), and product documentation. PII redaction with reviewable rules. Deduplication. Quality scoring so the training set is not weighted toward the noisiest tickets.</p>
<h3>Base model selection</h3>
<p>Open-weights families for self-hosting; license-compatible options for commercial use; parameter count chosen against the latency and hardware budget. The decision is technical fit, not brand preference. I scope this against the customer's deployment target — on-prem GPU pool, VPC, or hybrid — before any training compute is committed.</p>
<h3>Fine-tuning approach</h3>
<p>LoRA or QLoRA where the corpus size and the hardware budget justify them; full fine-tuning where the volume warrants it; distillation where the deployment target is edge inference. Reproducible training runs with versioned hyperparameters, so the run that produced the deployable weights can be replayed against a different base or a refreshed corpus.</p>
<h3>Eval harness</h3>
<p>Built against the customer's actual service tasks — answering a returning customer about a known ticket, summarizing a long thread for a human agent, escalating cleanly when the confidence threshold drops. Pre- and post-training scores published with the deliverable. The eval harness lives with the customer and is run on every retraining run, not just the first.</p>
<h2>What I ship</h2>
<ul>
<li><strong>Corpus extraction and preparation.</strong> Helpdesk, KB, calls, product docs — extracted, redacted, deduplicated, quality-scored.</li>
<li><strong>Base-model selection report.</strong> Open-weights candidates, license review, parameter-count tradeoffs against the deployment target.</li>
<li><strong>Fine-tuning pipeline.</strong> LoRA, QLoRA, full, or distillation — versioned hyperparameters, reproducible runs.</li>
<li><strong>Eval harness against real customer tasks.</strong> Pre- and post-training scores published with the deliverable.</li>
<li><strong>Retraining-trigger policy.</strong> Written, with corpus-drift threshold and eval gate.</li>
<li><strong>Handover documentation and 30-day support.</strong> Runbook a second engineer can execute.</li>
</ul>
<h2>Where it fits</h2>
<h3>The corpus already exists</h3>
<p>Three years of Zendesk. A Confluence KB. A library of recorded calls. The data is there; the pipeline that turns it into a trained model is what is missing.</p>
<h3>The wrapper is hitting limits</h3>
<p>The per-token bill climbs with ticket volume; the hallucination rate on domain vocabulary will not improve with more prompt engineering. The next step is a fine-tune.</p>
<h3>Regulation requires in-perimeter inference</h3>
<p>Healthcare, finance, defense-adjacent. The data cannot leave the compliance boundary. The model needs to live inside it.</p>
<h2>How I work</h2>
<p>Every engagement opens with a written training brief: corpus inventory, base-model candidates with tradeoffs, fine-tuning approach, eval-task definition, hardware and license constraints, and a phased plan. The brief ships before any training compute is committed. Calibration runs come first; the full run follows only when the calibration eval confirms the approach is sound. The principal carrying the work is described <a href="/about">on the about page</a>.</p>
<h2>Engagement model</h2>
<p>Discovery and corpus preparation run two weeks. Calibration fine-tune runs one to two weeks. Full training and eval-harness construction run three to six weeks. Handover runs one to two weeks. A 30-day support window follows. Wrapper engagements — when that is the honest scope — ship in days. The decision is named on day one.</p>
Scope This Engagement
One principal, plan first, working code on every checkpoint.
Request Consultation