Published in Towards Data Science:

- Optimizing Transformer Models for Variable-Length Input Sequences: How PyTorch NestedTensors, FlashAttention2, and xFormers Can Boost Performance and Reduce AI Costs (Nov 26, 2024)
- Increasing Transformer Model Efficiency Through Attention Layer Optimization: How Paying "Better" Attention Can Drive ML Cost Savings (Nov 18, 2024)
- On the Programmability of AWS Trainium and Inferentia: Accelerating AI/ML Model Training with Custom Operators, Part 4 (Nov 1, 2024)
- AI Model Optimization on AWS Inferentia and Trainium: Tips for Accelerating ML with the AWS Neuron SDK (Oct 20, 2024)
- Implementing Sequential Algorithms on TPU: Accelerating AI/ML Model Training with Custom Operators, Part 3.A (Oct 7, 2024)
- The Rise of Pallas: Unlocking TPU Potential with Custom Kernels: Accelerating AI/ML Model Training with Custom Operators, Part 3 (Oct 6, 2024)
- Training AI Models on CPU: Revisiting CPU for ML in an Era of GPU Scarcity (Sep 1, 2024)
- Unleashing the Power of Triton: Mastering GPU Kernel Optimization in Python: Accelerating AI/ML Model Training with Custom Operators, Part 2 (Aug 13, 2024)
- Accelerating AI/ML Model Training with Custom Operators: On the Potential Benefits of Creating Model-Specific GPU Kernels and Their Application to Optimizing the Use of Dynamically Shaped Tensors (Aug 11, 2024)
- Multi-Framework AI/ML Development with Keras 3: All Hail the Return of Keras (Jun 16, 2024)