Streaming Data from Cloud Storage with Mountpoint for Amazon S3. A First Look at a New Solution for Mounting Cloud Based Data (Feb 10)
Efficient Metric Collection in PyTorch: Avoiding the Performance Pitfalls of TorchMetrics. PyTorch Model Performance Analysis and Optimization — Part 7 (Feb 4)
Optimizing Transformer Models for Variable-Length Input Sequences. How PyTorch NestedTensors, FlashAttention2, and xFormers can Boost Performance and Reduce AI Costs (Published in TDS Archive, Nov 26, 2024)
Increasing Transformer Model Efficiency Through Attention Layer Optimization. How paying “better” attention can drive ML cost savings (Published in TDS Archive, Nov 18, 2024)
On the Programmability of AWS Trainium and Inferentia. Accelerating AI/ML Model Training with Custom Operators — Part 4 (Published in TDS Archive, Nov 1, 2024)
AI Model Optimization on AWS Inferentia and Trainium. Tips for accelerating ML with AWS Neuron SDK (Published in TDS Archive, Oct 20, 2024)
Implementing Sequential Algorithms on TPU. Accelerating AI/ML Model Training with Custom Operators — Part 3.A (Published in TDS Archive, Oct 7, 2024)
The Rise of Pallas: Unlocking TPU Potential with Custom Kernels. Accelerating AI/ML Model Training with Custom Operators — Part 3 (Published in TDS Archive, Oct 6, 2024)
Training AI Models on CPU. Revisiting CPU for ML in an Era of GPU Scarcity (Published in TDS Archive, Sep 1, 2024)
Unleashing the Power of Triton: Mastering GPU Kernel Optimization in Python. Accelerating AI/ML Model Training with Custom Operators — Part 2 (Published in TDS Archive, Aug 13, 2024)