Streaming Data from Cloud Storage with Mountpoint for Amazon S3 | A First Look at a New Solution for Mounting Cloud Based Data | Feb 10
Efficient Metric Collection in PyTorch: Avoiding the Performance Pitfalls of TorchMetrics | PyTorch Model Performance Analysis and Optimization — Part 7 | Feb 4
Optimizing Transformer Models for Variable-Length Input Sequences | How PyTorch NestedTensors, FlashAttention2, and xFormers can Boost Performance and Reduce AI Costs | Published in TDS Archive | Nov 26, 2024
Increasing Transformer Model Efficiency Through Attention Layer Optimization | How paying "better" attention can drive ML cost savings | Published in TDS Archive | Nov 18, 2024
On the Programmability of AWS Trainium and Inferentia | Accelerating AI/ML Model Training with Custom Operators — Part 4 | Published in TDS Archive | Nov 1, 2024
AI Model Optimization on AWS Inferentia and Trainium | Tips for accelerating ML with AWS Neuron SDK | Published in TDS Archive | Oct 20, 2024
Implementing Sequential Algorithms on TPU | Accelerating AI/ML Model Training with Custom Operators — Part 3.A | Published in TDS Archive | Oct 7, 2024
The Rise of Pallas: Unlocking TPU Potential with Custom Kernels | Accelerating AI/ML Model Training with Custom Operators — Part 3 | Published in TDS Archive | Oct 6, 2024
Training AI Models on CPU | Revisiting CPU for ML in an Era of GPU Scarcity | Published in TDS Archive | Sep 1, 2024
Unleashing the Power of Triton: Mastering GPU Kernel Optimization in Python | Accelerating AI/ML Model Training with Custom Operators — Part 2 | Published in TDS Archive | Aug 13, 2024
Accelerating AI/ML Model Training with Custom Operators | On the potential benefits of creating model-specific GPU kernels and their application to optimizing the use of dynamically shaped tensors | Published in TDS Archive | Aug 11, 2024
Multi-Framework AI/ML Development with Keras 3 | All hail the return of Keras | Published in TDS Archive | Jun 16, 2024
AI Model Training with JAX | Hit the road to super-fast AI/ML development | Published in TDS Archive | May 29, 2024
PyTorch Native FP8 | Accelerating PyTorch Training Workloads with FP8 — Part 2 | Published in TDS Archive | May 21, 2024
A Priority Based Scheduler for Amazon SageMaker Training Jobs | Optimizing the use of limited AI training accelerators — Part 2 | Published in TDS Archive | Mar 8, 2024
Retaining Amazon SageMaker Instance Capacity with SageMaker Managed Warm Pools | An Alternative Solution to Cloud Instance Reservation | Feb 27, 2024
Maximizing the Utility of Scarce AI Resources: A Kubernetes Approach | Optimizing the use of limited AI training accelerators | Published in TDS Archive | Feb 13, 2024
How to Implement a Custom Training Solution Based on Amazon EC2 | A Simple Solution for Managing Cloud-Based ML-Training — Part 2 | Published in TDS Archive | Jan 30, 2024
Optimizing Instance Type Selection for AI Development in Cloud Spot Markets | Instance Selection for Deep Learning — Part 2 | Published in TDS Archive | Jan 22, 2024
Debugging and Tuning Amazon SageMaker Training Jobs with SageMaker SSH Helper | A new tool that increases the debuggability of managed training workloads | Published in TDS Archive | Dec 27, 2023