
New Method Runs Big LLMs on Smartphones

Jim Griffin June 26, 2024


Background

There’s a big breakthrough that just came out for running large language models on smartphones. It’s called PowerInfer-2, and what it does is examine every option for processing an LLM on a particular smartphone and pick the fastest approach for that particular model on that particular device. For example, it uses completely different computation patterns for the early (prefill) and later (decoding) phases of the pipeline, and it breaks the work down into small tasks organized around which neurons are most likely to activate, which increases efficiency a lot. The final step assigns each task to whichever processing unit will finish it faster.
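To make the scheduling idea concrete, here’s a minimal Python sketch of phase-aware, neuron-cluster dispatch. This is not PowerInfer-2’s actual implementation: the activation predictor, the cluster size, and the CPU/NPU assignment rule are all illustrative assumptions.

```python
import numpy as np

CLUSTER_SIZE = 128  # hypothetical task granularity (neurons per cluster)

def predict_active_neurons(x, W_pred, b_pred, threshold=0.5):
    """Cheap stand-in for a learned activation predictor: guess which
    FFN neurons will fire for input x before doing the full matmul."""
    scores = 1.0 / (1.0 + np.exp(-(W_pred @ x + b_pred)))
    return set(np.nonzero(scores > threshold)[0])

def schedule_clusters(active_ids, n_neurons, phase):
    """Split the FFN rows into small neuron-cluster tasks and assign each
    to a processing unit. Prefill is dense and regular, which suits an
    NPU; decode touches only predicted-active clusters, which favors
    flexible CPU cores. The assignment rule here is illustrative."""
    tasks = []
    for start in range(0, n_neurons, CLUSTER_SIZE):
        ids = list(range(start, min(start + CLUSTER_SIZE, n_neurons)))
        if phase == "prefill":
            tasks.append(("npu", ids))            # dense: compute everything
        else:
            hot = [i for i in ids if i in active_ids]
            if hot:                               # sparse: skip cold clusters
                tasks.append(("cpu", hot))
    return tasks

def run_ffn_rows(x, W, tasks):
    """Compute only the scheduled rows of an FFN projection; a real
    runtime would dispatch each task to the unit named in the tuple."""
    y = np.zeros(W.shape[0])
    for unit, ids in tasks:
        y[ids] = W[ids] @ x
    return y

# Toy shapes (a 7B-class FFN layer would be roughly 4096 x 11008).
rng = np.random.default_rng(0)
d_model, n_neurons = 256, 1024
x = rng.standard_normal(d_model)
W = rng.standard_normal((n_neurons, d_model))
W_pred = 0.1 * rng.standard_normal((n_neurons, d_model))
b_pred = np.zeros(n_neurons)

active = predict_active_neurons(x, W_pred, b_pred)
tasks = schedule_clusters(active, n_neurons, phase="decode")
y = run_ffn_rows(x, W, tasks)
print(f"{len(active)} predicted-active neurons -> {len(tasks)} cpu tasks")
```

The point of the small cluster granularity is flexibility: each cluster can be scheduled, skipped, or routed independently as the activation predictions change from token to token.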

Add it all up, and the performance difference is very impressive: up to 29x faster.

This video starts with a review of the six strategies generally used to prepare large language models for use on a smartphone, with examples of each, and then presents a side-by-side demo of PowerInfer-2 vs. llama.cpp.

The speed difference is remarkable.
