Published in Towards Data Science:

- Optimizing Transformer Models for Variable-Length Input Sequences
  How PyTorch NestedTensors, FlashAttention2, and xFormers can Boost Performance and Reduce AI Costs (Nov 26)

- Increasing Transformer Model Efficiency Through Attention Layer Optimization
  How paying “better” attention can drive ML cost savings (Nov 18)

- On the Programmability of AWS Trainium and Inferentia
  Accelerating AI/ML Model Training with Custom Operators — Part 4 (Nov 1)

- AI Model Optimization on AWS Inferentia and Trainium
  Tips for accelerating ML with AWS Neuron SDK (Oct 20)

- Implementing Sequential Algorithms on TPU
  Accelerating AI/ML Model Training with Custom Operators — Part 3.A (Oct 7)

- The Rise of Pallas: Unlocking TPU Potential with Custom Kernels
  Accelerating AI/ML Model Training with Custom Operators — Part 3 (Oct 6)

- Training AI Models on CPU
  Revisiting CPU for ML in an Era of GPU Scarcity (Sep 14)

- Unleashing the Power of Triton: Mastering GPU Kernel Optimization in Python
  Accelerating AI/ML Model Training with Custom Operators — Part 2 (Aug 13)

- Accelerating AI/ML Model Training with Custom Operators
  On the potential benefits of creating model-specific GPU kernels and their application to optimizing the use of dynamically shaped tensors (Aug 11)

- Multi-Framework AI/ML Development with Keras 3
  All hail the return of Keras (Jun 16)