Overcoming the Hidden Performance Traps of Variable-Shaped Tensors: Efficient Data Sampling in… | PyTorch Model Performance Analysis and Optimization — Part 11 | 3d ago
On the Challenge of Converting TensorFlow Models to PyTorch | How to Upgrade and Optimize Legacy AI/ML Models | Nov 4
Optimizing PyTorch Model Inference on AWS Graviton | Tips for Accelerating AI/ML on CPU — Part 2 | Oct 31
Optimizing PyTorch Model Inference on CPU | Flyin’ Like a Lion on Intel Xeon | Oct 19
Capturing and Deploying PyTorch Models with torch.export | A Demonstration of PyTorch’s Exciting New Export Feature on a HuggingFace Model | Aug 14
Maximizing AI/ML Model Performance with PyTorch Compilation | Practical Tips for Getting the Most Out of torch.compile | Aug 7
The Crucial Role of NUMA Awareness in High-Performance Deep Learning | PyTorch Model Performance Analysis and Optimization — Part 10 | Jul 7
Pipelining AI/ML Training Workloads With CUDA Streams | PyTorch Model Performance Analysis and Optimization — Part 9 | Jun 21
A Caching Strategy for Identifying Bottlenecks on the Data Input Pipeline | PyTorch Model Performance Analysis and Optimization — Part 8 | Jun 6
The Case for Centralized AI Model Inference Serving | Optimizing Highly Parallel AI Algorithm Execution | Mar 18