Tom Hippensteel · AI Research · 1 min read

Static Parallelism Is Killing Your GPU Budget

Current LLM training frameworks pick one parallel strategy and pray. ParaDySe hot-switches strategies layer-by-layer based on actual input length.

That’s the claim from researchers at China’s National University of Defense Technology.

Current LLM training frameworks pick one parallel strategy and pray. Short sequences? Wasted overhead. Long sequences? OOM crash.

It’s like assigning the same size crew to every construction job. Small repair? Workers standing around. Massive build? Not enough hands.

ParaDySe

It hot-switches strategies layer-by-layer based on actual input length. No restart. No reconfiguration.

The foreman reassigns workers in real-time based on what’s actually needed.
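To make the idea concrete, here's a minimal sketch of what length-aware, per-layer strategy selection could look like. The thresholds, strategy names, and functions (`choose_strategy`, `plan`) are my own illustrative assumptions, not ParaDySe's actual API or switching logic.

```python
# Hypothetical sketch: pick a parallel strategy per layer from the batch's
# actual sequence length, recomputed every step (no restart, no reconfig).
from enum import Enum

class Strategy(Enum):
    DATA_PARALLEL = "data"          # short inputs: minimal sharding overhead
    TENSOR_PARALLEL = "tensor"      # medium inputs: shard weights to fit memory
    SEQUENCE_PARALLEL = "sequence"  # long inputs: shard tokens to avoid OOM

def choose_strategy(seq_len: int, layer_idx: int) -> Strategy:
    """Choose a strategy for one layer based on this batch's sequence length.
    Thresholds here are invented for illustration only."""
    if seq_len < 4_096:
        return Strategy.DATA_PARALLEL
    if seq_len < 64_000:
        return Strategy.TENSOR_PARALLEL
    return Strategy.SEQUENCE_PARALLEL

def plan(seq_len: int, num_layers: int) -> list[Strategy]:
    """Build a per-layer plan for the current batch; rebuilt on every step."""
    return [choose_strategy(seq_len, i) for i in range(num_layers)]

if __name__ == "__main__":
    for n in (1_024, 32_000, 624_000):
        print(n, {s.value for s in plan(n, num_layers=4)})
```

The point of the sketch: the plan is a cheap per-batch decision, so short batches stop paying for sharding they don't need and long batches get the memory headroom they do.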

Sequences up to 624K tokens. 89% faster training on long sequences. 181% longer sequence support than static baselines.

Note: NUDT is a military university under China’s Central Military Commission. Practical AI infrastructure work with open-source code is not what I’d expect from a defense institution.
