
The Next Stage in Generative AI is On-Device

Generative AI is 2023's hottest topic, and everyone's trying to figure out how to fit it into their business. Whether it's transforming customer service, retail planning, content creation, or powering other fresh ideas, the momentum behind generative AI in business looks unstoppable.

But generative AI's demands will run up big dollar costs unless queries move onto the billions of devices already in our hands. TIRIAS Research estimates that amortized data center capital and operating costs for generative AI will exceed $70 billion by 2028 if something doesn't change. These massive costs will be a drag on the whole industry, passed along to ISVs, enterprises, and consumers.

Fortunately, more than 2 billion devices have already shipped with the Qualcomm® AI Engine, and they're not just phones: they're also PCs, tablets, security cameras, robots, vehicles, and more. Shifting AI workloads from data centers, where processing performance is assumed to be effectively infinite, to power-constrained, low-cost devices takes new thinking. New forms of compilation, low-bit quantization, and tuning are already taking shape, both at major AI research firms and in the open-source community, creating language, vision, and other models optimized to run on devices without a cloud provider's service fee.
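To make "low-bit quantization" concrete, here is a minimal sketch of symmetric per-tensor weight quantization in plain NumPy. It is illustrative only: production toolchains, including those targeting hardware like the Qualcomm AI Engine, use more sophisticated per-channel scales, calibration data, and quantization-aware training, and every name below is ours, not a real SDK API.

```python
import numpy as np

def quantize_symmetric(weights: np.ndarray, bits: int = 4):
    """Symmetric per-tensor quantization: map float weights to signed ints."""
    qmax = 2 ** (bits - 1) - 1               # e.g. 7 for 4-bit
    scale = np.abs(weights).max() / qmax     # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# Storing weights in 4 bits instead of 32 cuts weight memory roughly 8x,
# much of what makes multi-billion-parameter models feasible on devices.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_symmetric(w, bits=4)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```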

Within the past few months, Meta introduced its Llama 2 model in 7-billion and 13-billion parameter versions, and Google introduced its even smaller Gecko. By some measures, the 13-billion parameter Llama models outperformed the 175-billion parameter GPT-3 model, the root of the original version of ChatGPT. While there's definitely a race going on (the big models get better as well as the small ones), the device-friendly models are becoming more and more capable.

And those are only the largest, buzziest language models. In the IoT world, enterprises often rely on image-recognition and other vision models that frequently have fewer than a billion parameters. Qualcomm President and CEO Cristiano Amon has said that devices will easily run models with more than 10 billion parameters by the end of the year, which makes for a massive shift from costly cloud decision-making to more efficient on-device processing.

Beyond cost, on-device AI enables smarter decisions by taking advantage of personal, location, and sensor data that customers may not want shared with a cloud service. In a device-centric, hybrid AI world, smaller, more privacy-sensitive models run securely on devices, while queries that only giant models can handle get sent to the cloud, offering the best of both approaches.
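As a rough illustration, the routing decision at the heart of hybrid AI can be expressed in a few lines of Python. Everything here is hypothetical, not a real Qualcomm API, and a production system would use an actual on-device privacy classifier rather than keywords.

```python
# Hypothetical hybrid-AI router; every name here is illustrative, not a real SDK.
SENSITIVE_HINTS = ("my location", "my contacts", "health", "camera feed")

def is_sensitive(prompt: str) -> bool:
    """Crude keyword stand-in for a real on-device privacy classifier."""
    return any(hint in prompt.lower() for hint in SENSITIVE_HINTS)

def route(prompt: str) -> str:
    # Personal and sensor-derived queries stay on the device;
    # heavyweight general queries go to a large cloud model.
    return "on-device model" if is_sensitive(prompt) else "cloud model"

print(route("What's near my location right now?"))      # -> on-device model
print(route("Draft a press release for our new SoC."))  # -> cloud model
```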

Combine multiple models in a hybrid approach and new forms of magic happen. In a car, a personal assistant could use a voice-to-text model within the vehicle, upload chat queries to the cloud, and have a backup on-device chat model for when connectivity gets weak. All of this would be seamless and invisible to the user. It would just work — reliably, speedily, and at low cost.
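Sketched in the same spirit, the in-car flow might look like the stub below, with hypothetical function names standing in for real speech and chat models. The design point is the try/except: the cloud model is preferred when connectivity holds, and the on-device backup takes over transparently when it doesn't.

```python
# Hypothetical in-car assistant flow; stubs stand in for real models.
def transcribe_on_device(audio: bytes) -> str:
    """Local voice-to-text model: low latency, audio never leaves the car."""
    return "navigate to the nearest charging station"

def cloud_chat(prompt: str) -> str:
    """Large cloud model; raises when the vehicle hits a dead zone."""
    raise ConnectionError("weak connectivity")  # simulated network failure

def local_chat(prompt: str) -> str:
    """Smaller on-device chat model kept as a backup."""
    return "[local] Routing you to the nearest charging station."

def assistant(audio: bytes) -> str:
    prompt = transcribe_on_device(audio)  # step 1: speech-to-text in the vehicle
    try:
        return cloud_chat(prompt)         # step 2: prefer the big model when online
    except ConnectionError:
        return local_chat(prompt)         # step 3: seamless on-device fallback

print(assistant(b"<microphone audio>"))
```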

Ziad Asghar, senior vice president, product management, Qualcomm Technologies, Inc., will discuss how AI is elevating the customer experience at MWC Las Vegas on Wednesday, September 27, at 11:30 a.m. on Stage A in the West Hall. To follow the latest developments in on-device AI, sign up for our newsletter.


Qualcomm AI Engine is a product of Qualcomm Technologies, Inc. and/or its subsidiaries.