SLM · Open source · LATAM
The rise of SLMs (small language models) and why they matter in LATAM
Why small models are eating a large slice of the market, especially in countries with higher infrastructure costs.
April 12, 2026 · Lixto Labs Team · 1 min read
Bigger isn't always better
In 2024 everyone talked about giant LLMs. In 2026 most of the solutions we ship to Mexican companies use small models, between 3 and 30 billion parameters, that run on a single GPU or even on CPU.
Why SLMs matter so much in LATAM
- Cost: a mid-size Mexican company won't tolerate a 15,000 USD/month OpenAI bill. A self-hosted SLM can run for under 1,000 USD/month.
- Latency: running the model in a Mexico City or Querétaro datacenter eliminates the 200-300 ms round trip to US-hosted APIs.
- Privacy and data sovereignty: regulated companies (banking, health, government) often can't send data to APIs abroad.
- Specialization: an SLM fine-tuned on your domain can beat a generic GPT-5 at narrow tasks.
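The cost argument above can be sketched as a quick break-even calculation. This is a rough illustration, not a pricing model: the 15,000 and 1,000 USD/month figures come from the list above, while the one-million-requests volume (and the per-request price derived from it) is an assumption made up for the example. The function names are hypothetical.

```python
def break_even_requests(self_hosted_usd_per_month: float,
                        api_usd_per_request: float) -> float:
    """Monthly request volume at which the API bill equals the self-hosted bill.

    Treats self-hosting as a flat monthly cost (GPU plus ops), which is a
    simplification: real deployments add capacity in steps as traffic grows.
    """
    if api_usd_per_request <= 0:
        raise ValueError("api_usd_per_request must be positive")
    return self_hosted_usd_per_month / api_usd_per_request


def monthly_savings(requests_per_month: int,
                    api_usd_per_request: float,
                    self_hosted_usd_per_month: float) -> float:
    """Positive when self-hosting is cheaper than paying per request."""
    api_bill = requests_per_month * api_usd_per_request
    return api_bill - self_hosted_usd_per_month


# Figures from the text: 15,000 USD/month API bill, 1,000 USD/month self-hosted.
API_BILL_USD = 15_000
SELF_HOSTED_USD = 1_000

# ASSUMPTION (not from the text): that API bill corresponds to 1M requests/month.
ASSUMED_REQUESTS = 1_000_000
usd_per_request = API_BILL_USD / ASSUMED_REQUESTS

if __name__ == "__main__":
    threshold = break_even_requests(SELF_HOSTED_USD, usd_per_request)
    saved = monthly_savings(ASSUMED_REQUESTS, usd_per_request, SELF_HOSTED_USD)
    print(f"break-even: {threshold:,.0f} requests/month")
    print(f"savings at assumed volume: {saved:,.0f} USD/month")
```

Under these assumptions the break-even lands at roughly 67,000 requests per month. Below that volume the per-request API is the cheaper option; above it the flat self-hosted cost wins. Plug in your own per-request price before drawing conclusions.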
SLMs we're using
- Llama 4 8B and 30B: our workhorse. Great quality-to-cost ratio and easy to fine-tune.
- Qwen 3: strong reasoning and code, solid multilingual support.
- Phi-5: Microsoft. Surprisingly good for its size.
- Mistral Small: still great for simple tools and function calling.
When NOT to use an SLM
- When you need extended multi-step reasoning: GPT-5 or Claude still win clearly.
- When your volume is low (under 100k requests/month): the savings don't cover the operational cost of self-hosting.
- When you don't have DevOps/MLOps capacity: hosting an SLM isn't trivial.
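The three criteria above can be written down as a small checklist function. This is a minimal sketch assuming each criterion is a hard blocker; the `Workload` fields and function names are hypothetical, and the only number taken from the text is the 100k requests/month threshold. Privacy requirements are deliberately not modeled here.

```python
from dataclasses import dataclass

# Threshold taken from the list above; everything else is illustrative.
MIN_MONTHLY_REQUESTS = 100_000


@dataclass(frozen=True)
class Workload:
    monthly_requests: int
    needs_extended_reasoning: bool
    has_mlops_capacity: bool


def slm_blockers(workload: Workload) -> list[str]:
    """Return the reasons NOT to self-host an SLM; an empty list means none apply."""
    blockers = []
    if workload.needs_extended_reasoning:
        blockers.append("needs extended multi-step reasoning")
    if workload.monthly_requests < MIN_MONTHLY_REQUESTS:
        blockers.append("volume under 100k requests/month")
    if not workload.has_mlops_capacity:
        blockers.append("no DevOps/MLOps capacity")
    return blockers


def is_slm_candidate(workload: Workload) -> bool:
    """True when none of the three blockers apply."""
    return not slm_blockers(workload)
```

For example, `slm_blockers(Workload(40_000, False, True))` returns a single blocker (low volume), while a workload with 500,000 requests per month, no extended reasoning needs, and an MLOps team returns an empty list. Returning the reasons rather than a bare boolean makes the outcome easier to discuss with a client.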
If you meet the volume and privacy criteria, a well-tuned SLM is probably the best cost/benefit decision you'll make this year.