Qwen 3.5 Small
Alibaba's compact multimodal AI model family (0.8B to 9B parameters) delivering surprisingly strong performance for on-device applications, edge computing, and lightweight agent tasks.
What is Qwen 3.5 Small?
The Qwen 3.5 Small Model Series, launched by Alibaba's Qwen team on March 2, 2026, is a family of four models ranging from 0.8B to 9B parameters, built for on-device and edge deployment. All four share the same Qwen 3.5 architecture, with native multimodal support for text and images in a single model. The series inherits the architectural upgrades and scaled reinforcement learning of the larger Qwen 3.5 models. The 0.8B and 2B variants target edge and mobile devices, the 4B model serves as a multimodal base for lightweight agents, and the 9B model narrows the gap with much larger models. The models are available on Hugging Face Hub and ModelScope under open-source licenses.
Best for: On-device deployment · Edge computing · Lightweight multimodal tasks · Privacy-first local AI
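Since the four sizes map to different hardware, a rough weight-memory estimate can help shortlist a variant before downloading anything. The sketch below is a back-of-the-envelope calculation, not a published sizing guide: it assumes the nominal parameter counts named above (0.8B, 2B, 4B, 9B), treats 16-bit and 4-bit weights as illustrative precisions, and counts weights only, ignoring activations, KV cache, and runtime overhead. The function names are hypothetical.

```python
# Nominal parameter counts (in billions) for the four sizes named in the listing.
# Actual checkpoints may differ slightly from these rounded figures.
MODEL_SIZES_B = {"0.8B": 0.8, "2B": 2.0, "4B": 4.0, "9B": 9.0}


def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight-only footprint in GB (1 GB = 1e9 bytes)."""
    # params * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB.
    return params_billions * bits_per_weight / 8


def largest_fitting(budget_gb: float, bits_per_weight: int):
    """Return the largest size whose weights fit the budget, or None."""
    fitting = [
        name
        for name, size in MODEL_SIZES_B.items()
        if weight_memory_gb(size, bits_per_weight) <= budget_gb
    ]
    # The dict is ordered smallest to largest, so the last match is the largest.
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for name, size in MODEL_SIZES_B.items():
        fp16 = weight_memory_gb(size, 16)
        int4 = weight_memory_gb(size, 4)
        print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{int4:.1f} GB at 4-bit")
```

Treat the result as a lower bound: real memory use is higher once the runtime, image inputs, and context length are taken into account, so leave headroom when matching a variant to a device.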

Developer Stack Fit
Quick read on where Qwen 3.5 Small fits in a software team's AI stack. Validate final fit against your codebase, data policy, and deployment model.
- Stack layer: Self-Hosted
- Deployment model: Self-hosted or local option
- Open-source status: Yes or source-available
- API support: Not a primary API tool
- MCP support: No MCP signal found
- Security posture: Stronger controls worth validating
- Best use case: On-device deployment

Key Features
- 01 Native multimodal text and image processing: native multimodal support in compact models
- 02 Four model sizes for different use cases: on-device deployment ready
- 03 Optimized for on-device and edge deployment: strong performance per parameter
- 04 Scaled reinforcement learning training
- 05 Architectural improvements over Qwen 3.0
- 06 Async RL frameworks for agent scaffolds
- 07 High multimodal training efficiency
- 08 Open weights on Hugging Face and ModelScope
- 09 Base and instruct/chat variants
- 10 Competitive with much larger models
Pros & Cons
What stands out
- Exceptional performance for model size
- True on-device multimodal capability
- Completely free and open-source
- Multiple size options for different hardware
- Strong alternative to proprietary small models
Watch outs
- Larger models still needed for complex reasoning
- Limited context window compared to frontier models
- Requires optimization for specific edge hardware
- Community and ecosystem smaller than Llama/Mistral
Pricing Plans
Qwen 3.5 Small Pricing
Open Source
FAQ
What is Qwen 3.5 Small and how does it work?
Qwen 3.5 Small is Alibaba's compact multimodal AI model family (0.8B to 9B parameters), aimed at on-device applications, edge computing, and lightweight agent tasks. Each model accepts text and images as input in a single model and generates a response.
Is Qwen 3.5 Small free to use?
Yes. The models are free and open-source, with weights available on Hugging Face Hub and ModelScope, so you can get started without paying anything.
Is there a free plan or trial?
No trial is needed. The only tier listed is the free Open Source one, so there are no paid plans to trial.
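What can Qwen 3.5 Small do?
It processes text and images in a single model, with sizes suited to on-device deployment, edge computing, lightweight multimodal tasks, and privacy-first local AI.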
More Development Tools
Cursor
AI-powered code editor with autonomous agents, multi-model support, and Automations for triggering agents via code changes, Slack, or timers.
TurboQuant
Revolutionary KV cache compression achieving 6x memory reduction and 8x speedup for LLM inference with zero accuracy loss.
Ollama
Local-first LLM runtime for running models on your hardware with local privacy, no per-token API costs, and offline-capable workflows.