Static Parallelism Is Killing Your GPU Budget
Current LLM training frameworks pick one parallel strategy and pray. ParaDySe hot-switches strategies layer by layer based on the actual input length.