Discussion about this post

User's avatar
Destiny S. Harris's avatar

I'm just excited to see it happen

Expand full comment
Neural Foundry's avatar

The SPAD concept is the most intresting part of this breakdown. Seperating prefill and decode workloads at the hardware level makes so much sense when you think about how different the computational profiles are. Most companies treat inference as one monolithic problem but Google splitting readers and writers into distinct physical architectures could be a huge efficiency win.

Expand full comment

No posts

Ready for more?