Thoughts and Theory

Harnessing the Power of Dedicated DNN Training Chips

One of the driving forces behind the success of deep learning over the past decade has been the immense computing power offered by Graphics Processing Units (GPUs). Although originally designed for rendering images to display devices, their highly parallel structure enabled training speed-ups of orders of magnitude. Over time GPUs…


Making Sense of Big Data

A Simple Technique that Can Save You Bucketloads of Money and How to Combine it with Mixed Precision Learning

Motivated by the desire to accelerate learning, a common practice in deep learning today is to distribute training across multiple workers (e.g. GPUs). …


Making Sense of Big Data

Simplify data management by unifying the file format across different kinds of machine learning workloads

Machine learning is all about the data. To successfully train a sophisticated model you will need a high quality training dataset; a dataset that is sufficiently large, accurately labeled, and correctly represents the distribution of data samples in the real world. However, no less important is proper management of the…


Making Sense of Big Data

Reduce CPU Load on the Training Instance by Processing Data During its Retrieval

Two months ago (in March of 2021) AWS announced the Amazon S3 Object Lambda feature, a new capability that enables one to process data that is being retrieved from Amazon S3 before it reaches the calling application. …


What to look out for when scaling your training to multiple workers

These days, data distributed training is all the rage. In data distributed training, learning is performed on multiple workers in parallel. The workers can reside on one or more training machines. Each worker starts off with its own identical copy of the full model and performs each training step…


Making Sense of Big Data

Dynamically Adapt your Training Session Based on Worker System Availability

Horovod is a popular framework for running distributed training on multiple GPU workers and across multiple hosts. Elastic Horovod is an exciting new feature of Horovod that introduces support for fault-tolerance, enabling training to continue uninterrupted, even in the face of failing or resuming hosts. …


Back to Basics: Rethinking Development Best Practices in the Age of Cloud Computing

My previous posts have been mostly technical, covering a range of topics on training in the cloud and advanced TensorFlow development. This post is different. You might consider it more of an opinion piece.

In the course of my career I have rarely seen tensions run higher than when discussing…


Making Sense of Big Data

Maximize Training Resource Utilization, Accelerate Learning, Save Money

In a previous post, I spoke about the importance of profiling the runtime performance of your DNN training sessions as a means of making the most of your training resources, accelerating your training, and saving money. I described a typical training pipeline (see the diagram below), reviewed some of the…


Making Sense of Big Data

How to Increase Your Efficiency and Reduce Cost When Training in the Cloud

This blog post accompanies a talk I gave at AWS re:Invent 2020, in which I described some of the ways in which my team at Mobileye (officially known as Mobileye, an Intel Company) uses Amazon SageMaker Debugger in its daily DNN development.

Monitoring the Learning Process

A critical part of training machine learning models…


How to Implement a Non-trivial TensorFlow Keras Loss Function

One of the main ingredients of a successful deep neural network is the model loss function. At Mobileye (officially known as Mobileye, an Intel Company), we spend a lot of time cultivating our loss functions and fine-tuning them to the precise problems that we are trying to solve. While we…

Chaim Rand

I am a Machine Learning Algorithm Developer working on Autonomous Vehicle technologies at Mobileye, an Intel Company.
