Tom Hippensteel · AI Research · 1 min read
Static Parallelism Is Killing Your GPU Budget
Current LLM training frameworks pick one parallel strategy and pray. ParaDySe hot-switches strategies layer by layer based on actual input length.
That’s the claim from researchers at China’s National University of Defense Technology.
Current LLM training frameworks pick one parallel strategy and pray. Short sequences? You pay communication overhead you don't need. Long sequences? Out-of-memory crash.
It’s like assigning the same size crew to every construction job. Small repair? Workers standing around. Massive build? Not enough hands.
ParaDySe
It hot-switches parallel strategies layer by layer based on the actual sequence length of the input. No restart. No reconfiguration.
The foreman reassigns workers in real-time based on what’s actually needed.
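To make the idea concrete, here's a minimal Python sketch of length-aware, per-layer strategy selection. This is not ParaDySe's actual API: the strategy set, the activation cost model, and the thresholds are placeholder assumptions, just enough to show a planner picking different strategies per layer as the memory budget fills up.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum, auto


class Strategy(Enum):
    DATA_PARALLEL = auto()      # replicate the layer; cheapest when activations are small
    TENSOR_PARALLEL = auto()    # shard weights across GPUs; moderate communication cost
    SEQUENCE_PARALLEL = auto()  # shard activations along the sequence axis; survives long inputs


@dataclass
class LayerPlan:
    layer_idx: int
    strategy: Strategy


def activation_bytes(seq_len: int, hidden: int, bytes_per_elem: int = 2) -> int:
    """Crude per-layer activation footprint: a linear term plus a naive seq_len^2 attention term."""
    return seq_len * hidden * bytes_per_elem + seq_len * seq_len * bytes_per_elem


def plan_layers(seq_len: int, num_layers: int, hidden: int, gpu_mem_budget: int,
                seq_shards: int = 8) -> list[LayerPlan]:
    """Pick a strategy for each layer of the current batch, switching as the budget fills up."""
    plan, used = [], 0
    for i in range(num_layers):
        footprint = activation_bytes(seq_len, hidden)
        if footprint < gpu_mem_budget // 8 and used + footprint <= gpu_mem_budget:
            strategy = Strategy.DATA_PARALLEL
        elif used + footprint <= gpu_mem_budget:
            strategy = Strategy.TENSOR_PARALLEL
        else:
            strategy = Strategy.SEQUENCE_PARALLEL
            footprint //= seq_shards  # sharding the sequence shrinks what this rank must hold
        used += footprint
        plan.append(LayerPlan(i, strategy))
    return plan


if __name__ == "__main__":
    # Short, medium, and very long batches get different per-layer plans, no restart in between.
    for seq_len in (4_096, 100_000, 524_288):
        plan = plan_layers(seq_len, num_layers=32, hidden=4_096, gpu_mem_budget=80 * 1024**3)
        counts = Counter(p.strategy.name for p in plan)
        print(f"{seq_len:>7} tokens -> {dict(counts)}")
```

In this toy version, short batches stay on cheap replication, mid-length batches shard weights until memory gets tight, and very long batches shard the sequence itself, decided layer by layer without restarting the job. The real framework's cost models and switching mechanics are in the paper.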
624K tokens supported. 89% faster training on long sequences. 181% longer sequence support than baselines.
Note: NUDT is a military university under China’s Central Military Commission. Practical AI infrastructure work with open-source code is not what I’d expect from a defense institution.
Sources:
- Code: https://github.com/Carrie-ou/ParaDySe (code repository is currently minimal)
- arXiv: https://arxiv.org/abs/2511.13198