
New Method Runs Big LLMs on Smartphones

Jim Griffin June 26, 2024


Background

There’s a big breakthrough that just came out for running large language models on smartphones. It’s called PowerInfer-2, and what it does is examine every option for processing an LLM on a particular smartphone and pick the fastest approach for that particular model on that particular device. For example, it uses completely different computation patterns for the early (prefill) and later (decoding) phases of the pipeline, and it breaks the work down into small tasks organized around which neurons are most likely to activate, which increases efficiency a lot. The final step assigns each task to whichever processing unit will finish it faster.
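To make the scheduling idea concrete, here’s a minimal Python sketch of phase-aware, neuron-cluster dispatch. This is not PowerInfer-2’s actual implementation: the activation predictor, the cluster size, and the CPU/NPU assignment rule are all illustrative assumptions.

```python
import numpy as np

CLUSTER_SIZE = 128  # hypothetical task granularity (neurons per cluster)

def predict_active_neurons(x, W_pred, b_pred, threshold=0.5):
    """Cheap stand-in for a learned activation predictor: guess which
    FFN neurons will fire for input x before doing the full matmul."""
    scores = 1.0 / (1.0 + np.exp(-(W_pred @ x + b_pred)))
    return set(np.nonzero(scores > threshold)[0])

def schedule_clusters(active_ids, n_neurons, phase):
    """Split the FFN rows into small neuron-cluster tasks and assign each
    to a processing unit. Prefill is dense and regular, which suits an
    NPU; decode touches only predicted-active clusters, which favors
    flexible CPU cores. The assignment rule here is illustrative."""
    tasks = []
    for start in range(0, n_neurons, CLUSTER_SIZE):
        ids = list(range(start, min(start + CLUSTER_SIZE, n_neurons)))
        if phase == "prefill":
            tasks.append(("npu", ids))            # dense: compute everything
        else:
            hot = [i for i in ids if i in active_ids]
            if hot:                               # sparse: skip cold clusters
                tasks.append(("cpu", hot))
    return tasks

def run_ffn_rows(x, W, tasks):
    """Compute only the scheduled rows of an FFN projection; a real
    runtime would dispatch each task to the unit named in the tuple."""
    y = np.zeros(W.shape[0])
    for unit, ids in tasks:
        y[ids] = W[ids] @ x
    return y

# Toy shapes (a 7B-class FFN layer would be roughly 4096 x 11008).
rng = np.random.default_rng(0)
d_model, n_neurons = 256, 1024
x = rng.standard_normal(d_model)
W = rng.standard_normal((n_neurons, d_model))
W_pred = 0.1 * rng.standard_normal((n_neurons, d_model))
b_pred = np.zeros(n_neurons)

active = predict_active_neurons(x, W_pred, b_pred)
tasks = schedule_clusters(active, n_neurons, phase="decode")
y = run_ffn_rows(x, W, tasks)
print(f"{len(active)} predicted-active neurons -> {len(tasks)} cpu tasks")
```

The point of the small cluster granularity is flexibility: each cluster can be scheduled, skipped, or routed independently as the activation predictions change from token to token.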

Add it all up, and the performance difference is very impressive: up to 29x faster.

This video starts with a review of the six strategies generally used to prepare large language models for use on a smartphone, with examples of each, and then presents a side-by-side demo of PowerInfer-2 vs. llama.cpp.

The speed difference is remarkable.
