Handle long-context inference efficiently within memory constraints
The MiniCPM-SALA variant supports million-token contexts through a hybrid architecture that combines sparse and linear attention. MiniCPM5-1B includes native long-context support, so extended documents can be processed efficiently on limited hardware.
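
To give a rough sense of why attention design matters at this scale, the sketch below compares the memory of a standard key/value cache, which grows with context length, against the fixed-size recurrent state that a linear-attention layer keeps. The layer count, head count, and head dimension are hypothetical placeholders, not the published MiniCPM-SALA or MiniCPM5-1B configuration, and the function names are illustrative only.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Memory for a full-attention KV cache: one key and one value vector
    per token, per KV head, per layer. Grows linearly with seq_len."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


def linear_state_bytes(num_layers, num_heads, head_dim, bytes_per_elem=2):
    """Memory for a linear-attention recurrent state: one head_dim x head_dim
    matrix per head, per layer. Independent of sequence length."""
    return num_layers * num_heads * head_dim * head_dim * bytes_per_elem


# Hypothetical dimensions chosen only for illustration (fp16, 2 bytes/element).
LAYERS, HEADS, HEAD_DIM = 32, 8, 128
CONTEXT = 1_000_000  # one million tokens

full = kv_cache_bytes(LAYERS, HEADS, HEAD_DIM, CONTEXT)
linear = linear_state_bytes(LAYERS, HEADS, HEAD_DIM)

print(f"full-attention KV cache: {full / 1e9:.1f} GB")
print(f"linear-attention state:  {linear / 1e6:.1f} MB")
```

Under these assumed dimensions the full cache lands in the hundred-gigabyte range while the linear state stays in the megabyte range. A hybrid model falls somewhere between the two, depending on how many layers keep a token-level cache and how aggressively the sparse layers prune it.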
