Published in Towards Data Science · On the Programmability of AWS Trainium and Inferentia · Accelerating AI/ML Model Training with Custom Operators — Part 4 · Nov 1
Published in Towards Data Science · AI Model Optimization on AWS Inferentia and Trainium · Tips for accelerating ML with AWS Neuron SDK · Oct 20
Published in Towards Data Science · Implementing Sequential Algorithms on TPU · Accelerating AI/ML Model Training with Custom Operators — Part 3.A · Oct 7
Published in Towards Data Science · The Rise of Pallas: Unlocking TPU Potential with Custom Kernels · Accelerating AI/ML Model Training with Custom Operators — Part 3 · Oct 6
Published in Towards Data Science · Training AI Models on CPU · Revisiting CPU for ML in an Era of GPU Scarcity · Sep 14
Published in Towards Data Science · Unleashing the Power of Triton: Mastering GPU Kernel Optimization in Python · Accelerating AI/ML Model Training with Custom Operators — Part 2 · Aug 13
Published in Towards Data Science · Accelerating AI/ML Model Training with Custom Operators · On the potential benefits of creating model-specific GPU kernels and their application to optimizing the use of dynamically shaped tensors · Aug 11
Published in Towards Data Science · Multi-Framework AI/ML Development with Keras 3 · All hail the return of Keras · Jun 16
Published in Towards Data Science · AI Model Training with JAX · Hit the road to super-fast AI/ML development · May 29
Published in Towards Data Science · PyTorch Native FP8 · Accelerating PyTorch Training Workloads with FP8 — Part 2 · May 21