I've been an expert in and advocate of distributed systems for decades. I was lucky enough to find my way into the automotive world at GM, where multiple computers cooperate to make a car operational. From there, luck struck again as I stumbled into Grid Computing at IBM, where I learned about enterprise distribution, its inherent challenges, and the promise of moving compute and storage as close as possible to where data and decisions live. One of my greatest successes was getting Duke Energy to reverse course and push simple computations, which they had planned to put on their mainframe and overload it, down to the smart grid controller instead. I proved to Todd Arnold, then SVP Smart Grid, that only through distribution could they even come close to providing their customers with the service they designed.
I've been considering writing this post for a week, fearful it was too far out there for people to grasp. Thankfully Aravind Srinivas, the CEO of Perplexity, gets it. He stated last Saturday that on-device AI threatens the massive data center build-out strategy employed by just about everyone in the AI space today. Remember, this is a company backed by Nvidia, the ones making billions by filling those AI data centers with their product.
Why is AI's future distributed? Cost.
Time is money, and automated systems need answers faster than humans staring at a screen. It takes time for input data to be shipped out, a decision to be made, and for that decision to trek back. Data has a unidirectional trend: growth! The amount of data we generate and consume grows every year, and it's not just cat videos. All those billions and billions of Internet-connected devices, the sensors and controllers being distributed everywhere to control everything, are generating data that AI will need to consume. There's too much data to move quickly enough to meet the millisecond needs of next-generation systems.
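To make that concrete, here's a rough back-of-envelope sketch. The uplink bandwidth, round-trip latency, inference times, and payload size are my own illustrative assumptions, not measurements, but they show how quickly the trip to the cloud eats a millisecond-class budget.

```python
# Back-of-envelope: can a cloud round trip meet a tight decision deadline?
# All numbers below are illustrative assumptions, not measurements.

UPLINK_MBPS = 50            # assumed uplink from the device or site
NETWORK_RTT_MS = 40         # assumed round-trip latency to the nearest region
CLOUD_INFERENCE_MS = 10     # assumed model execution time in the data center
PAYLOAD_KB = 500            # assumed sensor frame shipped per decision

def cloud_decision_latency_ms(payload_kb: float) -> float:
    """Ship the payload up, run inference remotely, get the answer back."""
    transfer_ms = (payload_kb * 8) / (UPLINK_MBPS * 1000) * 1000
    return transfer_ms + NETWORK_RTT_MS + CLOUD_INFERENCE_MS

def local_decision_latency_ms(local_inference_ms: float = 25) -> float:
    """On-device: no transfer, no round trip, just (slower) local inference."""
    return local_inference_ms

print(f"cloud: {cloud_decision_latency_ms(PAYLOAD_KB):.0f} ms")  # ~130 ms
print(f"local: {local_decision_latency_ms():.0f} ms")            # ~25 ms
```

Even with a weaker processor and a slower model, the local decision wins because the data never has to travel.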
Beyond speed, there's raw cost: compared to the cost of moving data, everything else is practically free. There's a reason cloud hyperscalers charge nothing for data ingress but gleefully bill for data egress.
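A quick bit of arithmetic makes the asymmetry obvious. The egress rate and monthly volume below are illustrative assumptions, not any provider's actual pricing.

```python
# The ingress/egress asymmetry, using assumed illustrative rates.
EGRESS_USD_PER_GB = 0.09   # assumed list-price-style rate; check your provider
INGRESS_USD_PER_GB = 0.00  # ingress is typically free

monthly_tb_moved = 100     # assumed fleet-wide volume shipped back out each month
print(f"ingress bill: ${monthly_tb_moved * 1000 * INGRESS_USD_PER_GB:,.0f}/month")
print(f"egress bill:  ${monthly_tb_moved * 1000 * EGRESS_USD_PER_GB:,.0f}/month")
# If the decision is made on the device, most of that data never makes the trip.
```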
As we see announcements of hundreds of billions of dollars being dumped into AI data centers, generating fear of a bubble, it breaks my core belief that success comes from leveraging other people's assets. That phone in your pocket is capable of MUCH more than the cat videos you use it for. As a thought experiment, the CTO of Wells Fargo challenged me a decade ago to come up with a solution for providing 100% uptime. It was a test. Would I BS him, or admit there is no such thing? What he didn't expect was me handing him a workable solution within 30 minutes: store an emergency copy of each customer's financial transaction data on their phone, inside the secure sandbox your app creates. Yes, it would work, and he knew it. We are surrounded by smart devices that are I/O bound, meaning they spend most of their lives doing NOTHING! That's what Aravind gets.
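Here's a minimal sketch of that emergency-copy idea, written in Python for readability. A real implementation would live inside the mobile app's sandbox and keep the key in the platform keystore; the directory, file name, and transaction fields below are placeholders I've invented for illustration.

```python
# Minimal sketch: encrypt a snapshot of recent transactions and park it on
# the device, so the app can still show something if the back end is down.
import json
from pathlib import Path
from cryptography.fernet import Fernet  # pip install cryptography

SANDBOX_DIR = Path("./app_sandbox")     # stand-in for the app's private storage

def write_emergency_copy(transactions: list[dict], key: bytes) -> Path:
    """Encrypt the latest transaction snapshot and store it locally."""
    SANDBOX_DIR.mkdir(exist_ok=True)
    blob = Fernet(key).encrypt(json.dumps(transactions).encode())
    out = SANDBOX_DIR / "emergency_copy.bin"
    out.write_bytes(blob)
    return out

def read_emergency_copy(key: bytes) -> list[dict]:
    """If the bank's systems are unreachable, decrypt and show the snapshot."""
    blob = (SANDBOX_DIR / "emergency_copy.bin").read_bytes()
    return json.loads(Fernet(key).decrypt(blob))

key = Fernet.generate_key()  # in practice: derived from the device keystore
write_emergency_copy([{"id": "txn-001", "amount": -42.50}], key)
print(read_emergency_copy(key))
```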
Here's a wrinkle though. What if the price of RAM goes up so high that companies pull back on IoT deployments? What if they have to scale down the memory headroom of their devices to keep costs down? What if laptops and desktops surge in price by 50% or 100% because we're competing with IBM and Google for silicon? Perhaps that's why, despite the obvious business value of narrowing down to the AI market, Nvidia has maintained availability of its products for consumers, unlike Crucial, which is withdrawing to focus its silicon capacity on commercial products exclusively.
How it plays out will be interesting, but just as I advocated for what we now call Edge Computing over a decade ago, I'll advocate for pushing AI workloads out to the endpoints, onto devices that are ready now in ways people don't understand.
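To show how little ceremony this takes today, here's a sketch of running an AI workload on the endpoint itself. It assumes a small quantized model already exported to ONNX; the file name and input shape are placeholders, and onnxruntime is just one of several on-device runtimes that could fill this role.

```python
# Sketch: run the model on the endpoint; nothing leaves the device.
import numpy as np
import onnxruntime as ort  # pip install onnxruntime

session = ort.InferenceSession("sensor_model.onnx")  # placeholder model file
input_name = session.get_inputs()[0].name

def decide_locally(sensor_frame: np.ndarray) -> np.ndarray:
    """Run inference on the device itself, with no round trip to a data center."""
    outputs = session.run(None, {input_name: sensor_frame.astype(np.float32)})
    return outputs[0]

# Example: one fake 1x32 sensor reading, decided in place.
print(decide_locally(np.random.rand(1, 32)))
```

That's the whole point: the hardware in our pockets and on our factory floors is already idle and already capable; the workloads just have to move to it.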

