Meta's Llama 3 combined with LoRA lets you build a custom model adapted to your business domain without massive GPUs or a large-company budget. Here is how to do it, step by step.
When to fine-tune instead of prompting
Fine-tuning is not always the right answer. For most use cases, a good system prompt plus retrieval-augmented generation (RAG) gives better results with less effort. Fine-tuning earns its place when you need a very specific writing style, proprietary terminology, or high-volume repetitive behaviour.
LoRA: efficient adaptation
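LoRA (Low-Rank Adaptation) leaves the original model weights frozen. It adds a pair of small low-rank matrices alongside selected layers, typically the attention projections, and trains only those. After training, the adapter can be kept as a separate file or merged into the base weights for deployment. Result: more than 99% of parameters stay frozen, and only about 0.1-1% are trained.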
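To make the "0.1-1%" figure concrete, here is a rough parameter count. It is a sketch under stated assumptions: the layer shapes are those commonly published for Llama 3 8B (hidden size 4096, a 1024-wide key/value projection, 32 layers), the adapters sit on the four attention projections only, and the base parameter count is rounded.

```python
# Back-of-the-envelope count of LoRA trainable parameters.
# Assumed shapes: Llama 3 8B attention projections (hidden size 4096,
# 1024-wide key/value projections from grouped-query attention, 32 layers).

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """A LoRA adapter adds two matrices: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)

# (input width, output width) of each adapted projection: assumed values
ATTENTION_PROJECTIONS = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}
NUM_LAYERS = 32
BASE_PARAMS = 8_030_000_000  # approximate total for the 8B model

def trainable_params(rank: int) -> int:
    """Total adapter parameters across all layers for a given LoRA rank."""
    per_layer = sum(
        lora_params(d_in, d_out, rank)
        for d_in, d_out in ATTENTION_PROJECTIONS.values()
    )
    return per_layer * NUM_LAYERS

if __name__ == "__main__":
    for rank in (8, 16, 64):
        n = trainable_params(rank)
        print(f"rank={rank}: {n:,} trainable ({100 * n / BASE_PARAMS:.2f}% of base)")
```

Under these assumptions, rank 16 gives about 13.6 million trainable parameters, roughly 0.17% of the model. Adapting the MLP layers as well, or raising the rank, moves the share towards the upper end of the range.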
QLoRA: even more efficient
QLoRA quantises the base model to 4-bit to cut GPU memory, then trains LoRA adapters on top of the frozen quantised weights. With short sequences and a small batch size, you can fine-tune Llama 3 8B on a single 8 GB GPU.
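The memory saving follows from simple arithmetic on the weights alone. The sketch below assumes a rounded count of 8 billion parameters and ignores everything else that lives on the GPU (adapters, optimizer state, activations, quantisation constants), so treat the numbers as a lower bound.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

PARAMS_8B = 8e9  # rounded parameter count, an assumption for illustration

if __name__ == "__main__":
    print(f"16-bit weights: {weight_memory_gb(PARAMS_8B, 16):.1f} GB")
    print(f" 4-bit weights: {weight_memory_gb(PARAMS_8B, 4):.1f} GB")
```

At 16 bits the weights alone need about 16 GB. At 4 bits they need about 4 GB, which is what leaves room for training on an 8 GB card, although the margin is tight.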
The complete pipeline
Step 1: prepare your data
Your dataset must be in instruction-following format: each example pairs an instruction (plus optional input) with the response you expect. Minimum: 200 high-quality examples. Optimal: 1,000-5,000 carefully curated examples.
Golden rule on data: 200 excellent examples will usually outperform 2,000 mediocre ones. Dataset quality is the single biggest factor in the quality of the fine-tuned model.
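A basic cleaning pass might look like the sketch below. The field names ("instruction", "input", "output") are an assumed schema, not one the pipeline requires, and the sample records are invented for illustration. It drops incomplete records and exact duplicates, then writes one JSON object per line.

```python
import json

REQUIRED_KEYS = ("instruction", "output")  # assumed schema; "input" is optional

def clean_examples(records):
    """Drop records with missing or empty required fields, and exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Every required field must be a non-empty string
        if not all(isinstance(rec.get(k), str) and rec[k].strip() for k in REQUIRED_KEYS):
            continue
        key = (
            rec["instruction"].strip(),
            (rec.get("input") or "").strip(),
            rec["output"].strip(),
        )
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"instruction": key[0], "input": key[1], "output": key[2]})
    return cleaned

def to_jsonl(records):
    """Serialise records as JSON Lines, one example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

if __name__ == "__main__":
    raw = [  # hypothetical sample data
        {"instruction": "Summarise the ticket.", "input": "Printer offline.", "output": "The printer is offline."},
        {"instruction": "Summarise the ticket.", "input": "Printer offline.", "output": "The printer is offline."},
        {"instruction": "", "output": "Orphan answer."},
    ]
    print(to_jsonl(clean_examples(raw)))
```

This only catches mechanical problems. Judging whether an answer is actually good still needs a human review pass.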
Step 2: training with Unsloth
Unsloth is the reference library in 2026 for efficient fine-tuning: its maintainers report training 2-5x faster than the standard Hugging Face Trainer, with lower memory use and no loss in quality.
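As a starting point, loading a 4-bit base model and attaching LoRA adapters with Unsloth might look like the sketch below. The checkpoint name, sequence length and LoRA hyperparameters are assumptions chosen for illustration, not recommendations from this article, and the code needs a CUDA GPU with unsloth installed. The training loop itself is usually driven by TRL's SFTTrainer, whose arguments vary between versions, so it is left out here.

```python
MODEL_NAME = "unsloth/llama-3-8b-bnb-4bit"  # assumed pre-quantised checkpoint
MAX_SEQ_LENGTH = 2048                       # assumed; shorter sequences use less memory

# Assumed LoRA hyperparameters, adapting the attention projections only
LORA_SETTINGS = {
    "r": 16,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "bias": "none",
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}

def load_model_with_adapters():
    """Load the 4-bit base model and wrap it with trainable LoRA adapters."""
    # Imported lazily because unsloth requires a GPU environment
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=MODEL_NAME,
        max_seq_length=MAX_SEQ_LENGTH,
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(model, **LORA_SETTINGS)
    return model, tokenizer
```

Calling load_model_with_adapters() returns a model in which only the adapter weights require gradients, ready to hand to a supervised fine-tuning trainer.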
Evaluation and deployment
Use LLM-as-a-judge: ask GPT-4o to compare your fine-tuned model's responses with the base model's responses on your test examples.
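A pairwise comparison can be sketched as follows. The judge is passed in as a plain callable that takes a prompt and returns "A", "B" or "TIE", so the harness makes no assumption about which API client you use. The prompt wording and the record fields ("question", "base", "tuned") are hypothetical. The order of the two answers is shuffled per example because judge models can favour whichever answer comes first.

```python
import random

def build_judge_prompt(question, answer_a, answer_b):
    """Assemble the comparison prompt shown to the judge model."""
    return (
        f"Question:\n{question}\n\n"
        f"Answer A:\n{answer_a}\n\n"
        f"Answer B:\n{answer_b}\n\n"
        "Which answer is better? Reply with exactly one of: A, B, TIE."
    )

def compare(examples, judge, seed=0):
    """Tally judge verdicts for the fine-tuned model against the base model.

    examples: list of dicts with "question", "base" and "tuned" keys.
    judge: callable taking a prompt string and returning "A", "B" or "TIE".
    """
    rng = random.Random(seed)
    tally = {"tuned": 0, "base": 0, "tie": 0}
    for ex in examples:
        # Randomise which model appears as answer A to limit position bias
        tuned_first = rng.random() < 0.5
        a, b = (ex["tuned"], ex["base"]) if tuned_first else (ex["base"], ex["tuned"])
        verdict = judge(build_judge_prompt(ex["question"], a, b)).strip().upper()
        if verdict == "A":
            winner = "tuned" if tuned_first else "base"
        elif verdict == "B":
            winner = "base" if tuned_first else "tuned"
        else:
            winner = "tie"  # anything unparseable counts as a tie
        tally[winner] += 1
    return tally
```

In practice, judge would wrap a call to GPT-4o or another strong model. A win rate taken from the tally over a held-out test set gives a single number to track between training runs.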