For those who came in late, last spring rumours surfaced that the iPhone maker was working with Broadcom on its first AI server silicon, internally dubbed "Baltra", with TSMC’s 3nm N3E process pencilled in.
At the time, the chatter suggested the design would be wrapped up within 12 months, which in Apple time usually means whenever it feels like it. Actual deployment is now expected in 2027, even though Jobs' Mob started shipping US-made servers in October 2025.
The interesting bit is not when the chip lands, but what Apple plans to do with it. For starters, the outfit is not expected to train large AI models any time soon. It has already cut a deal with Google to use a customised 1.2-trillion-parameter "Gemini" model to power its so-called Apple Intelligence in the cloud.
That arrangement reportedly costs Apple $1 billion per year (€920 million), which buys it someone else’s brain while it pretends to be an AI pioneer. Given that setup, Baltra looks less like a training monster and more like a glorified inference workhorse.
For those not in the know, inference is what happens when pre-trained models churn out answers, summaries or yet another dreary autogenerated email. It is less about raw mathematical heroics and more about latency, throughput and doing it cheaply at scale.
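For readers who want to see how dull that reality is, here is a minimal sketch of inference in PyTorch. Everything in it is illustrative: the resnet18 model and the fake batch of requests are our stand-ins, not anything Apple or Broadcom has confirmed about Baltra.

```python
# A minimal sketch of "inference": a frozen, pre-trained model
# answering requests as fast and cheaply as possible.
# Assumes PyTorch and torchvision are installed; resnet18 is an
# arbitrary illustrative choice, not an Apple workload.
import time
import torch
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT)
model.eval()  # no training, no gradients: the weights are frozen

batch = torch.randn(8, 3, 224, 224)  # stand-in for real requests

with torch.inference_mode():  # drops autograd bookkeeping entirely
    start = time.perf_counter()
    logits = model(batch)
    elapsed = time.perf_counter() - start

# What gets optimised is latency and throughput, not maths heroics.
print(f"latency: {elapsed * 1000:.1f} ms, "
      f"throughput: {batch.shape[0] / elapsed:.1f} images/s")
```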
Inference silicon does not need the high-precision maths used in model training; it can lean on cheap, low-precision formats such as INT8 instead and skip the heavyweight number crunching. That strongly suggests Apple and Broadcom are tuning Baltra for efficiency rather than ambition.
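To make the INT8 trade concrete, here is a hedged sketch using PyTorch's dynamic quantisation. The toy two-layer model is hypothetical; the point is only that the quantised version gives nearly the same answers while moving far fewer bytes.

```python
# A sketch of the efficiency trade described above: convert FP32
# Linear layers to INT8 weights via dynamic quantisation.
# The toy model is hypothetical; Baltra's internals are anyone's guess.
import torch
import torch.nn as nn

fp32_model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
).eval()

# INT8 weights mean less memory traffic and cheaper matmuls,
# at the cost of slightly lossier answers.
int8_model = torch.ao.quantization.quantize_dynamic(
    fp32_model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 1024)
fp32_out = fp32_model(x)
int8_out = int8_model(x)

# Close but not identical: inference chips bet on "close enough".
print("max abs diff:", (fp32_out - int8_out).abs().max().item())
```

The small numerical drift barely matters when the job is serving answers, which is exactly why inference parts chase INT8-style formats while training rigs stick with fatter floating-point numbers.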
What this means is that while the Tame Apple Press will tell you that Apple's new chips are a cure for cancer, the clever bits of AI still come from elsewhere, no matter how shiny the silicon sounds.


