Optimizing Distributed Training and Inference for Intel® Gaudi® AI Accelerators
Master expert techniques for distributing AI workloads across Intel® Data Center CPUs and GPUs, improving training and inference.

The complexity of deep learning models is surging, demanding enhanced training and inference in distributed compute environments. This session focuses on the essential techniques for using Intel® Data Center CPUs and GPUs to balance distributed AI workloads, meeting data center challenges while improving efficiency and performance.

Within the session, explore Intel® Extension for PyTorch, which optimizes neural network operations on Intel® hardware, and learn how DeepSpeed can be integrated to perform training operations at scale.

Topics covered include:
- Tackle model scalability in a distributed environment, handling workloads efficiently across Intel Data Center CPUs and GPUs.
- Gain familiarity with essential Intel tools that simplify operations, including PyTorch Distributed Data Parallel (DDP), the Intel® oneAPI Collective Communications Library (oneCCL), and the DeepSpeed library, which streamlines network training at scale (illustrative sketches of both follow this section).
- Deploy practical solutions that maximize hardware efficiency and refine strategies that deliver top performance for AI development.
- Review sample code and benchmarking milestones, using tools such as IPEX-LLM, that illustrate performance achievements.

Sign up today.

Speakers: Alex Sin, AI Software Solutions Engineer; Yuning Qiu, AI Software Solutions Engineer.
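
For orientation, here is a minimal sketch of the kind of setup the session covers: wrapping a model in PyTorch DDP over the oneCCL communication backend, with Intel® Extension for PyTorch applied first. It assumes the intel_extension_for_pytorch and oneccl_bindings_for_pytorch packages are installed; the model and environment defaults are illustrative placeholders, not session material.

```python
import os
import torch
import torch.distributed as dist
import intel_extension_for_pytorch as ipex
import oneccl_bindings_for_pytorch  # noqa: F401 -- importing registers the "ccl" backend
from torch.nn.parallel import DistributedDataParallel as DDP

# Illustrative single-node defaults; a launcher (e.g., mpirun or torchrun)
# normally sets these environment variables.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
rank = int(os.environ.get("RANK", "0"))
world_size = int(os.environ.get("WORLD_SIZE", "1"))

# Initialize the process group over oneCCL.
dist.init_process_group(backend="ccl", rank=rank, world_size=world_size)

# Placeholder model and optimizer; ipex.optimize applies Intel-specific
# operator fusions and memory-layout optimizations to both.
model = torch.nn.Linear(512, 512)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model.train()
model, optimizer = ipex.optimize(model, optimizer=optimizer)

# DDP all-reduces gradients across ranks (via oneCCL) during backward().
ddp_model = DDP(model)
```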
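
And a hedged sketch of how DeepSpeed might be initialized for training at scale. The config values (batch size, optimizer, ZeRO stage) are assumptions for illustration, not recommendations from the speakers; in practice this script would be run under a distributed launcher such as `deepspeed train.py`.

```python
import torch
import deepspeed

model = torch.nn.Linear(512, 512)  # placeholder model

# Illustrative config: ZeRO stage 2 shards optimizer state and gradients
# across data-parallel ranks to reduce per-device memory.
ds_config = {
    "train_batch_size": 32,
    "optimizer": {"type": "SGD", "params": {"lr": 0.01}},
    "zero_optimization": {"stage": 2},
}

# deepspeed.initialize returns an engine wrapping the model; the engine's
# backward() and step() handle gradient sharding and communication.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```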