The Automotive Data Challenge: Enter the Colosseum
Nothing lasts forever. It’s a hard truth that everyone must face, but it comes with living in a world that’s constantly changing. These days, the pace of change is often measured in bits and bytes: we’ve been told that by 2025 we’ll be generating 463 exabytes of data globally every day, or the equivalent of over 200 million DVDs.
By one estimate, the digital universe doubles in size every two years, and manufacturing is a major contributor to that growth. By Acerta’s estimates, the auto industry currently generates 93 petabytes (PB) of on-road data from connected vehicles and a whopping 809 PB of manufacturing data each year.
With so much information, machine learning (ML) is becoming a necessity for automakers.
Implementing ML involves more than just technical challenges, but manufacturers tend to struggle with two issues in particular: data quality and ever-shifting data characteristics. Even when they have a solution, it comes with an expiry date. Machine learning solutions for manufacturing naturally decay over time in response to shifts in production data. These shifts result from changes in staffing, raw materials, tooling, even the weather, which raises a pair of intriguing questions for industrial machine learning:
How do we best react to all these changes to our data, and does each type warrant a different reaction?
At Acerta, we’ve been tackling these questions head-on, and today I’m excited to share our answer.
Data, Data Everywhere But Not Enough to Think
You might expect that the 809PB of data per year coming from the automotive industry would be more than enough for any data scientists to work with, but—for all the talk of Industry 4.0—there’s a surprising paucity of manufacturing data.
It’s not a problem of volume, but quality. Many things can go wrong during data collection, and even more can threaten the usefulness of the data generated during production. It might be human error during data entry, a lack of traceability between operations, or impossible-to-synchronize data from multiple devices. It could even be a poor sampling rate!
Sampling rate issues are particularly insidious. It comes down to the Nyquist rate: to reliably reconstruct the original signal, the sampling rate must be at least twice the signal’s bandwidth. If you’re not measuring at the Nyquist rate, you’re basically doomed to lie to yourself about what’s happening. This isn’t just a common problem in manufacturing; it’s often an ignored one.
To understand why, we need to consider the CAN bus, the network over which a vehicle’s components communicate by sending very small packets (each one has an information “payload” of only eight bytes). Because the packets are so small, automakers have to make sacrifices: the number of packets they can send is limited by the bandwidth of the CAN bus, which ultimately constrains the data pipeline. The options are to add more buses, reduce the rate of sampling, or sample aperiodically, and the end result in each case usually isn’t good.
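To get a feel for how tight this constraint is, here’s a back-of-the-envelope capacity estimate. The bitrate, frame size, and bus-load figures below are illustrative assumptions, not numbers from any particular vehicle:

```python
# Back-of-the-envelope capacity estimate for a classic CAN bus.
# All figures below are illustrative assumptions.
BUS_BITRATE = 500_000   # bits/s, a common high-speed CAN rate
BITS_PER_FRAME = 135    # worst case for a standard frame with an 8-byte payload
MAX_BUS_LOAD = 0.5      # keep the bus half-idle to leave headroom for bursts

frames_per_second = BUS_BITRATE * MAX_BUS_LOAD / BITS_PER_FRAME

def max_signals(sample_rate_hz: float, signals_per_frame: int = 4) -> int:
    """How many signals fit on the bus if each one is sampled at
    sample_rate_hz and four 2-byte signals are packed per payload."""
    frames_needed_per_signal = sample_rate_hz / signals_per_frame
    return int(frames_per_second / frames_needed_per_signal)

print(max_signals(100))    # slow signals: plenty of room
print(max_signals(1000))   # fast signals crowd each other out quickly
```

Even under these generous packing assumptions, only a handful of fast-changing signals fit on one bus, which is exactly the trade-off described above.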
A lot of the data Acerta has collected from complete vehicles under test comes from electronic control units (ECUs) talking to each other (and over each other) via the CAN bus. And what we see when we try to build a story about these signals is that some of them change so rapidly that the CAN bus can’t capture them.
Automakers are no strangers to this problem, so when they build cars, they don’t build them to send information about signals that are impossible to measure as a consequence of the above constraint. The problem is that they’re still collecting all this data from connected cars and trying to solve issues with it that the data was never intended to solve. As the auto industry tries to do more complex things with that same data, more people are realizing that it’s not of a sufficient quality to be useful.
Getting Better Manufacturing Data
There are ways of dealing with CAN bus limitations. For example, because CAN bus signals aren’t always generated in exact lockstep, you get random jitter, so you have to look at the average sampling rate, not just the sample rate you’ve set.
The graph above shows the result of a random sampling rate: even though we’re sampling at the Nyquist rate on average, because the samples are randomly distributed, we can’t guarantee that we have enough resolution at any given point. In other words, because we’re trying to measure a lot of signals with a system that is bandwidth-limited, we can’t send messages perfectly periodically, and because a lot of them aren’t safety-critical, we end up with randomly delayed or advanced messages, which is a problem.
This is the sampling rate for the previous graph. You can see the average sampling period is around 1.0, but we have to halve this in order to approximate the Nyquist rate. By taking roughly twice as many samples, we get an expected interarrival time of 0.5, meaning we’ve hit our Nyquist rate.
And here’s the result:
To sum up: we need to ensure that the average time between samples satisfies the Nyquist rate, rather than just checking whether the whole thing, on average, looks right.
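The check on the average interarrival time can be sketched in a few lines. The jitter magnitudes and the 1 Hz bandwidth below are made-up values for illustration:

```python
import random

def mean_interarrival(timestamps):
    """Average gap between consecutive samples."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return sum(gaps) / len(gaps)

def nyquist_ok_on_average(timestamps, bandwidth_hz):
    # Nyquist: the average sampling rate must be at least twice the signal
    # bandwidth, i.e. the average gap must be at most 1 / (2 * bandwidth).
    return mean_interarrival(timestamps) <= 1.0 / (2.0 * bandwidth_hz)

random.seed(0)
# Nominal 1.0 s period with random jitter: far too slow for a 1 Hz signal.
slow = [i + random.uniform(-0.2, 0.2) for i in range(100)]
# Sampling more than twice as often brings the average gap under 0.5 s.
fast = [0.45 * i + random.uniform(-0.1, 0.1) for i in range(200)]

print(nyquist_ok_on_average(slow, 1.0))   # too slow on average
print(nyquist_ok_on_average(fast, 1.0))   # meets Nyquist on average
```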
Data Resolution - An Aside
As any meteorologist will tell you: measurements aren’t inherently useful. We’ve seen this already in the issues with CAN bus data, but there are countless other examples in manufacturing. In two separate instances, we’ve had clients using calipers with a precision of 0.01, but their tolerances were also in the 0.01 range. That’s not a good idea.
It might seem silly, but people tend to reach for the most familiar tool that’s served them well in the past. You might think it’s a one-off mistake; somebody screwed up and it wouldn’t happen again, but we’ve seen it twice this year already!
The moral is: Don’t measure millimeters with meter sticks.
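This kind of mismatch can be caught with a one-line check. The 10:1 threshold below is a common rule of thumb for measurement capability, not a figure from our clients:

```python
def resolution_adequate(resolution_um: float, tolerance_um: float,
                        ratio: float = 10.0) -> bool:
    """10:1 rule of thumb: the instrument should resolve at least
    `ratio` times finer than the tolerance it is checking."""
    return tolerance_um / resolution_um >= ratio

# Calipers resolving 10 um (0.01 mm) against a 10 um tolerance: inadequate.
print(resolution_adequate(10, 10))
# The same tolerance wants an instrument resolving 1 um or finer.
print(resolution_adequate(1, 10))
```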
Signal Drift & Model Decay in Manufacturing
Signal drift denotes a change in the character of your data over time, for any given method of characterization. Models decay not only because of signal drift, but also as a result of changes in the relationships between measured signals. That’s why we tackle this in two ways: we monitor the data for changes in its character, and we monitor the model’s behavior for changes as well.
Basically, the problem is that we’re modeling the world, and the world is not so cooperative.
Here’s an example from one of our clients:
You can see clear shifts in the distribution of this data, month over month. The means have changed, even the standard deviations and the general shapes in some cases—and this isn’t a small sample. It’s data generated over five months; 1,000 units per day, every day. In one instance, we had a signal with a tolerance of 0.01 and the mean was shifting by almost that value. And this is data coming from a full-scale production line, which is typically designed to produce products as precisely as possible.
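A minimal sketch of the kind of monitoring this calls for: compare a recent window of a signal against a baseline window and flag a shift in mean or spread. The thresholds and data here are illustrative, not Acerta’s actual detector:

```python
import statistics

def drifted(baseline, recent, mean_tol=0.25, std_tol=0.25):
    """Flag drift when the recent window's mean or spread moves by more
    than a fraction of the baseline's standard deviation.
    Thresholds are illustrative, not production values."""
    mu0, sd0 = statistics.mean(baseline), statistics.stdev(baseline)
    mu1, sd1 = statistics.mean(recent), statistics.stdev(recent)
    mean_shift = abs(mu1 - mu0) / sd0
    std_shift = abs(sd1 - sd0) / sd0
    return mean_shift > mean_tol or std_shift > std_tol

baseline = [10.0, 10.1, 9.9, 10.05, 9.95, 10.0, 10.1, 9.9]
recent   = [10.3, 10.4, 10.25, 10.35, 10.3, 10.45, 10.2, 10.3]
print(drifted(baseline, recent))  # the mean has wandered well away
```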
Most manufacturers get an intuitive sense of these changes because they notice changes in their first-time-through (FTT) rate, which can lead them down the path of further investigation. Industry 4.0 represents the dream of having your finger on the pulse of production at all times, so you know the instant that anything changes. We need to track this pulse to make good models and to maintain them.
But there are lots of ways a pulse can change. The company might change the sensors on the production line or add new ones—that would be a schema shift, where the information included in the data set changes over time. There’s also covariate drift, where the distribution of values of input signals change over time.
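Schema shift, at least, is straightforward to detect mechanically: compare the set of signals in the current data against a baseline. A minimal sketch (the signal names are invented):

```python
def detect_schema_shift(baseline_columns, current_columns):
    """Schema shift: the set of signals in the data set itself changes."""
    added = set(current_columns) - set(baseline_columns)
    removed = set(baseline_columns) - set(current_columns)
    return added, removed

added, removed = detect_schema_shift(
    ["torque", "temp", "vibration"],
    ["torque", "temp", "pressure"],
)
print(added)    # a new sensor appeared on the line
print(removed)  # an old one disappeared
```

Covariate drift is subtler, since the columns stay the same while their distributions move, which is why it needs statistical monitoring rather than a set comparison.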
Obviously, things on the production line are changing all the time: new operations, new equipment, and—hopefully—changes made based on our recommendations.
So, how do we keep up with it all?
Enter the Colosseum
Most people are familiar with the concept of DevOps—the development practice whereby we take a piece of software, test it, vet it, and deploy it in a reliable way. At Acerta, we’re taking a similar approach to machine learning with our MLOps. In other words, we’re looking at how to take a machine learning model, test it, vet it, and deploy it in a reliable way. Additionally, we want to automate as much of that process as possible. That’s where the Colosseum comes in.
It started as a benchmarking tool—a way to compare the performance of different models. Since our models all speak the same language, we can set up benchmarking tools that don’t care about what the models are actually doing. They just test the models and return the results.
Once we had the benchmarking tools in place, we built Ludus Magnus, which takes JSON specifications for an ML experiment, including references to training data, testing data and metadata. This lets us treat an entire training job as a JSON file and distribute those jobs automatically on our auto-scaling compute platform. The result is that we can now programmatically and automatically generate these JSONs to run all kinds of benchmarking, hyperparameter optimization, or cross-validation experiments to determine which model is best for which job.
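To make that concrete, here’s what such a spec might look like. The field names, model, and paths are hypothetical, invented for illustration; they are not Ludus Magnus’s actual schema:

```python
import json

# Hypothetical experiment spec; every field name and value below is
# invented for illustration, not Ludus Magnus's actual schema.
experiment = {
    "model": "isolation_forest",
    "training_data": "s3://acerta-demo/line-3/train.parquet",
    "testing_data": "s3://acerta-demo/line-3/test.parquet",
    "metadata": {"client": "demo", "line": "line-3"},
    "hyperparameters": {"n_estimators": 200, "contamination": 0.01},
}

spec = json.dumps(experiment, indent=2)
print(spec)  # this one document describes the entire training job
```

Because a job is just a document, generating thousands of variants for a hyperparameter sweep is a loop over dictionaries rather than a manual process.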
This is Acerta’s Colosseum: the arena where our models are continually pitted against each other to determine which is best for a given application. In other words, our clients are Caesars (Hey, better than being judges at a dog show, right?), and our models are the fearless gladiators battling it out for their glory—by which, of course, I mean manufacturing efficiency.
And the best part is that because we’re doing this with JSON, we can have lots of “battles” playing out simultaneously by scaling up our infrastructure. In one instance, we ran 7,000 experiments over the course of a day, more or less in parallel.
Although we originally built this framework as an internal tool to evaluate different model architectures, we’ve repurposed it by adding the ability to monitor data and deploy models if they “win” in the Colosseum. So, whenever there’s a change in a client’s data, we generate events, and an event handler chooses which competitors to pit against each other on the new data in the Colosseum. Once the best model emerges, it goes through something closer to a standard DevOps process: tests for validity, robustness, and bias are automatically executed, and it’s benchmarked against the currently deployed model.
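The promotion step can be sketched in a few lines. Everything here (the event shape, the benchmark signature, the toy models) is a simplification for illustration, not our production code:

```python
def handle_data_change(event, candidates, deployed, benchmark):
    """Re-benchmark every candidate on the new data and promote the
    winner only if it beats the currently deployed model.
    (A simplification for illustration, not production code.)"""
    scores = {name: benchmark(model, event["data"])
              for name, model in candidates.items()}
    winner = max(scores, key=scores.get)
    if scores[winner] > benchmark(deployed, event["data"]):
        return winner   # the challenger takes the arena
    return None         # the incumbent keeps its crown

# Toy stand-ins: "models" are plain numbers and the benchmark just
# multiplies model by data, so the comparison logic is easy to follow.
result = handle_data_change({"data": 3}, {"a": 1, "b": 2}, 1.5,
                            lambda m, d: m * d)
print(result)
```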
Here’s the kicker: if the new model runs better, it’s delivered almost instantly!
We built this because we were doing a lot of anomaly detection and we couldn’t find a good tool for that kind of experimentation in general-purpose machine learning frameworks. There was no existing tool that approached semi-supervised anomaly detection the way we needed. So, we made our own.
Nothing lasts forever, but some things do stand the test of time. The Roman Colosseum has been standing (more or less) for nearly 2,000 years. Our Colosseum may not be around for quite that long, but we’re hoping it will be put to better use.
For More Information...
Check out other blog posts or read our case studies.