The rise of SLMs (small language models) and why they matter in LATAM

Bigger isn't always better

In 2024 everyone talked about giant LLMs. In 2026 most of the solutions we ship to Mexican companies use small models, between 3 and 30 billion parameters, that run on a single GPU or even on CPU.

Why SLMs matter so much in LATAM

Cost: a mid-size Mexican company won't tolerate a 15,000 USD/month OpenAI bill. A self-hosted SLM can run for under 1,000 USD/month.
Latency: running the model in a Mexico City or Querétaro datacenter avoids the 200-300 ms round trip to US-hosted APIs.
Privacy and data sovereignty: regulated companies (banking, health, government) often can't send data to APIs abroad.
Specialization: an SLM fine-tuned on your domain can beat a general-purpose model like GPT-5 at narrow tasks.
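
To make the cost point concrete, here is a rough break-even sketch. The 1,000 USD/month self-hosting figure comes from the list above; the per-request API price is a made-up placeholder, and the function names are only illustrative. Plug in your own numbers before drawing conclusions.

```python
def monthly_api_cost(requests_per_month: int, usd_per_request: float) -> float:
    """Pay-per-use cost: grows linearly with volume."""
    return requests_per_month * usd_per_request


def break_even_requests(self_hosted_usd_per_month: float, usd_per_request: float) -> float:
    """Monthly volume at which a flat self-hosting bill equals the API bill."""
    if usd_per_request <= 0:
        raise ValueError("usd_per_request must be positive")
    return self_hosted_usd_per_month / usd_per_request


# Figure from the article: a self-hosted SLM for under 1,000 USD/month.
SELF_HOSTED_USD_PER_MONTH = 1_000.0
# ASSUMPTION: hypothetical blended API price per request, not a real quote.
USD_PER_REQUEST = 0.01

break_even = break_even_requests(SELF_HOSTED_USD_PER_MONTH, USD_PER_REQUEST)
print(f"Break-even volume: {break_even:,.0f} requests/month")

for volume in (50_000, 500_000, 1_500_000):
    api = monthly_api_cost(volume, USD_PER_REQUEST)
    cheaper = "self-hosted SLM" if api > SELF_HOSTED_USD_PER_MONTH else "API"
    print(f"{volume:>9,} req/month: API {api:>9,.0f} USD vs self-hosted "
          f"{SELF_HOSTED_USD_PER_MONTH:,.0f} USD -> {cheaper}")
```

The sketch treats self-hosting as a flat monthly cost and ignores engineering time, so it probably understates the real break-even volume.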

SLMs we're using

Llama 4 8B and 30B: our workhorse. Great quality-to-cost ratio and easy to fine-tune.
Qwen 3: strong reasoning and code, solid multilingual support.
Phi-5: from Microsoft. Surprisingly good for its size.
Mistral Small: still great for simple tools and function calling.

When NOT to use an SLM

When you need extended multi-step reasoning: frontier models like GPT-5 or Claude still win clearly.
When your volume is low (under 100k requests/month): the fixed operational cost of self-hosting isn't justified.
When you don't have DevOps/MLOps capacity: hosting an SLM isn't trivial.
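
The three criteria above can be turned into a quick checklist. This is a minimal sketch, not a formal decision framework: the `Workload` class and its field names are hypothetical, and the only number taken from the text is the 100k requests/month threshold.

```python
from dataclasses import dataclass

# Threshold from the article: below this volume, self-hosting is hard to justify.
MIN_MONTHLY_REQUESTS = 100_000


@dataclass
class Workload:
    """Hypothetical description of a project; field names are illustrative."""
    requests_per_month: int
    needs_extended_reasoning: bool
    has_mlops_capacity: bool


def reasons_against_slm(workload: Workload) -> list[str]:
    """Return every criterion from the list above that argues against an SLM."""
    reasons = []
    if workload.needs_extended_reasoning:
        reasons.append("needs extended multi-step reasoning")
    if workload.requests_per_month < MIN_MONTHLY_REQUESTS:
        reasons.append("volume under 100k requests/month")
    if not workload.has_mlops_capacity:
        reasons.append("no DevOps/MLOps capacity")
    return reasons


example = Workload(
    requests_per_month=250_000,
    needs_extended_reasoning=False,
    has_mlops_capacity=True,
)
blockers = reasons_against_slm(example)
print(blockers if blockers else "no blockers: an SLM is worth evaluating")
```

An empty list only means none of the listed blockers apply. It says nothing about whether a specific model is good enough for the task, which still needs an evaluation on your own data.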

If you meet the volume and privacy criteria, a well-tuned SLM is probably the best cost/benefit decision you'll make this year.