deepspeed
Functions
| convert_zero_checkpoint_to_fp32_state_dict | Convert ZeRO 2 or 3 checkpoint into a single fp32 consolidated state_dict file. |
Utilities that can be used with DeepSpeed.
lightning.pytorch.utilities.deepspeed.convert_zero_checkpoint_to_fp32_state_dict(checkpoint_dir, output_file, tag=None)
Convert a ZeRO 2 or 3 checkpoint into a single fp32 consolidated state_dict file that can be loaded with torch.load(file) + load_state_dict() and used for training without DeepSpeed. The script gets copied into the top-level checkpoint directory, so the user can easily do the conversion at any point in the future. Once extracted, the weights don't require DeepSpeed and can be used in any application. Additionally, the script has been modified to keep the Lightning state inside the state dict, so that LightningModule.load_from_checkpoint('...') can still be run on the output.

Parameters
- checkpoint_dir (Union[str, Path]) – path to the desired checkpoint folder (one that contains the tag folder, like global_step14)
- output_file (Union[str, Path]) – path to the PyTorch fp32 state_dict output file (e.g. path/pytorch_model.bin)
- tag (Optional[str]) – checkpoint tag used as a unique identifier for the checkpoint. If not provided, the function attempts to load the tag from the file named latest in the checkpoint folder, e.g. global_step14
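
The tag lookup described for the tag parameter can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code Lightning or DeepSpeed actually runs: resolve_tag is a hypothetical helper name, and the sketch assumes the latest file contains nothing but the tag name.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_tag(checkpoint_dir: Union[str, Path], tag: Optional[str] = None) -> str:
    """Hypothetical helper: return `tag`, falling back to the `latest` file."""
    checkpoint_dir = Path(checkpoint_dir)

    if tag is None:
        latest_file = checkpoint_dir / "latest"
        if not latest_file.is_file():
            raise FileNotFoundError(
                f"No tag given and no 'latest' file found in {checkpoint_dir}"
            )
        # Assumption: the file holds only the tag name, e.g. "global_step14".
        tag = latest_file.read_text().strip()

    # The tag is expected to name a sub-folder of the checkpoint directory.
    if not (checkpoint_dir / tag).is_dir():
        raise FileNotFoundError(
            f"Tag folder '{tag}' not found in {checkpoint_dir}"
        )
    return tag
```

Running a check like this before the conversion fails early with a readable message when checkpoint_dir points at the wrong folder, for example at the tag folder itself rather than its parent.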
 
Examples:

    from lightning.pytorch.utilities.deepspeed import convert_zero_checkpoint_to_fp32_state_dict

    # Lightning DeepSpeed has saved a directory instead of a file
    convert_zero_checkpoint_to_fp32_state_dict(
        "lightning_logs/version_0/checkpoints/epoch=0-step=0.ckpt/",
        "lightning_model.pt",
    )

Return type