Step 3.5 Flash

Frontier open-source MoE model built for OpenClaw agents

Website: static.stepfun.com

Overview

What it is

Step 3.5 Flash is StepFun’s 196B-parameter sparse Mixture-of-Experts (MoE) model, which activates only 11B parameters per token. It delivers frontier reasoning and strong agentic performance at high efficiency. Native OpenClaw integration makes it one of the strongest open models for running production agents.

Intent

I need it when

Solve advanced mathematical and reasoning problems with high accuracy and speed

Step 3.5 Flash achieves 97.3% on AIME 2025, 96.2% on HMMT 2025, and 85.4% on IMOAnswerBench. With Python code execution enabled, the AIME 2025 score rises to 99.8%. The model's 3-way Multi-Token Prediction (MTP) keeps long reasoning chains responsive, making it suitable for competitive math, logic puzzles, and analytical problem-solving.

Build autonomous agents that reason deeply and execute complex, multi-step tasks reliably in real-world scenarios

Step 3.5 Flash is purpose-built for agentic tasks with a scalable RL framework, achieving 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0. Its sparse MoE architecture (11B active of 196B parameters) enables fast generation (100–300 tok/s) while maintaining frontier-level reasoning depth, making it well suited to orchestrating tool use, code execution, and long-horizon workflows.
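
To put the sparsity and throughput figures in perspective, the short sketch below works out what share of the weights is active per token and how long one agent step would take to decode. Only the 196B, 11B, and 100–300 tok/s figures come from the listing; the 2,000-token step size and the function names are illustrative assumptions.

```python
# Figures quoted in the listing; everything else is illustrative.
TOTAL_PARAMS = 196e9   # total parameters
ACTIVE_PARAMS = 11e9   # parameters activated per token


def active_fraction(active: float = ACTIVE_PARAMS, total: float = TOTAL_PARAMS) -> float:
    """Share of the model's weights used for any single token."""
    return active / total


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Decode time for num_tokens at a steady throughput (prompt processing excluded)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


print(f"active share per token: {active_fraction():.1%}")

# Hypothetical agent step of 2,000 generated tokens at both ends of the quoted range.
for tps in (100, 300):
    print(f"{tps} tok/s: {generation_seconds(2000, tps):.1f} s per 2,000-token step")
```

Roughly one weight in eighteen is touched per token, which is where the efficiency claim comes from. Real step times will also depend on prompt length and tool latency, neither of which is modeled here.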

Deploy a capable AI model locally with data privacy and without reliance on proprietary cloud APIs

Step 3.5 Flash is an open-source foundation model optimized for local deployment on high-end consumer hardware (Mac Studio M4 Max, NVIDIA DGX Spark). Its sparse MoE architecture reduces computational overhead while preserving frontier-level capability, enabling secure, private inference without sending data to external servers.
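
When sizing hardware for local deployment, a rough weight-memory estimate is a useful first check. The sketch below is back-of-the-envelope arithmetic only: the 196B parameter count is from the listing, while the bit widths are hypothetical examples, since the listing does not say which precision or quantization is used on the named machines. It also ignores KV cache and runtime overhead.

```python
TOTAL_PARAMS = 196e9  # total parameter count from the listing


def weight_memory_gb(bits_per_param: float, total_params: float = TOTAL_PARAMS) -> float:
    """Approximate weight storage in GB (10^9 bytes); KV cache and overhead not included."""
    return total_params * bits_per_param / 8 / 1e9


# Hypothetical precisions, chosen only to show how the footprint scales.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(bits):.0f} GB")
```

Sparse activation lowers compute per token, but in a typical MoE deployment all experts still need to be resident in memory (or offloaded), so the full 196B count drives the footprint rather than the 11B active count.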

Process and reason over very long documents or codebases efficiently

Step 3.5 Flash supports a cost-efficient 256K context window using 3:1 Sliding Window Attention (SWA), integrating three SWA layers for every one full-attention layer. This hybrid approach ensures consistent performance across massive datasets or long codebases while significantly reducing computational overhead compared to standard long-context models.
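
The 3:1 pattern can be sketched as a layer schedule, along with a rough count of how many query-key pairs each layer type scores. This is an illustration under stated assumptions, not StepFun's implementation: the listing gives only the 3:1 ratio and the 256K context, so the ordering of layers within each group, the layer count (48), and the window size (512) are all hypothetical.

```python
from typing import List


def layer_schedule(num_layers: int, swa_per_full: int = 3) -> List[str]:
    """Repeat `swa_per_full` sliding-window layers followed by one full-attention layer.

    The ordering within each group is an assumption; the listing states only the ratio.
    """
    period = swa_per_full + 1
    return ["full" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def attended_pairs(seq_len: int, kind: str, window: int) -> int:
    """Number of (query, key) pairs one causal attention layer scores."""
    if kind == "full":
        return seq_len * (seq_len + 1) // 2
    # Sliding window: each token sees at most `window` recent tokens, itself included.
    w = min(window, seq_len)
    return w * (w + 1) // 2 + (seq_len - w) * w


def relative_attention_cost(seq_len: int, num_layers: int, window: int) -> float:
    """Scored pairs in the hybrid stack divided by an all-full-attention stack."""
    hybrid = sum(attended_pairs(seq_len, kind, window) for kind in layer_schedule(num_layers))
    dense = num_layers * attended_pairs(seq_len, "full", window)
    return hybrid / dense


print(layer_schedule(8))
# Hypothetical: 48 layers, 512-token window, 256K taken as 256 * 1024 tokens.
print(f"relative attention cost: {relative_attention_cost(256 * 1024, 48, 512):.3f}")
```

Because the sliding-window layers cost almost nothing next to full attention at long context, the ratio approaches one quarter as the sequence grows: the one full-attention layer in every four dominates. The count covers attention scoring only, not the expert or projection compute.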

Automate software engineering tasks including code generation, repository-level refactoring, and end-to-end development workflows

Step 3.5 Flash achieves 86.4% on LiveCodeBench-V6 and is compatible with Claude Code as an efficient backend for agent-led development. Its long-context reasoning and precise tool invocation support repository-level tasks, code verification, dependency mapping, and sustained development loops.

Drop

Not a fit when

  • User requires proprietary model guarantees or SLA commitments; Step 3.5 Flash is open-source, and self-hosted deployments may lack enterprise support
  • User needs real-time web search or live data integration beyond the model's training cutoff without external tool orchestration
  • User operates in highly regulated industries (finance, healthcare, legal) requiring certified model audits or compliance documentation, which the available material does not mention
  • User's hardware falls below high-end consumer level (e.g., Mac Studio M4 Max, NVIDIA DGX Spark); the model is parameter-heavy despite MoE efficiency
  • User prioritizes ultra-low-latency inference (under 50 ms); the model's 100–300 tok/s throughput may not meet that requirement
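
The latency point in the last bullet can be checked with simple arithmetic. The sketch below converts the quoted 100–300 tok/s throughput into a per-token interval and counts how many decoded tokens fit in a 50 ms budget. The function names are illustrative, and time to first token (prompt processing) is not modeled because the listing gives no figure for it.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Average interval between decoded tokens, in milliseconds."""
    return 1000.0 / tokens_per_second


def tokens_within_budget(budget_ms: int, tokens_per_second: int) -> int:
    """Whole tokens that can be decoded inside the budget (integer arithmetic)."""
    return (budget_ms * tokens_per_second) // 1000


for tps in (100, 300):
    print(
        f"{tps} tok/s: {ms_per_token(tps):.1f} ms per token, "
        f"{tokens_within_budget(50, tps)} tokens in 50 ms"
    )
```

At the quoted throughput, a 50 ms budget covers roughly 5 to 15 decoded tokens, so any response longer than a short phrase will exceed it.
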
Commercials

Pricing

CNY 9.9 to CNY 369 per month