The Case for Centralized AI Model Inference Serving
Optimizing Highly Parallel AI Algorithm Execution
Mar 18

Debugging the Dreaded NaN
Capturing and Reproducing Failures in PyTorch Training with Lightning
Feb 26

Streaming Data from Cloud Storage with Mountpoint for Amazon S3
A First Look at a New Solution for Mounting Cloud Based Data
Feb 10

Efficient Metric Collection in PyTorch: Avoiding the Performance Pitfalls of TorchMetrics
PyTorch Model Performance Analysis and Optimization — Part 7
Feb 4

Optimizing Transformer Models for Variable-Length Input Sequences
How PyTorch NestedTensors, FlashAttention2, and xFormers Can Boost Performance and Reduce AI Costs
Published in TDS Archive, Nov 26, 2024

Increasing Transformer Model Efficiency Through Attention Layer Optimization
How paying “better” attention can drive ML cost savings
Published in TDS Archive, Nov 18, 2024

On the Programmability of AWS Trainium and Inferentia
Accelerating AI/ML Model Training with Custom Operators — Part 4
Published in TDS Archive, Nov 1, 2024

AI Model Optimization on AWS Inferentia and Trainium
Tips for accelerating ML with AWS Neuron SDK
Published in TDS Archive, Oct 20, 2024

Implementing Sequential Algorithms on TPU
Accelerating AI/ML Model Training with Custom Operators — Part 3.A
Published in TDS Archive, Oct 7, 2024

The Rise of Pallas: Unlocking TPU Potential with Custom Kernels
Accelerating AI/ML Model Training with Custom Operators — Part 3
Published in TDS Archive, Oct 6, 2024