Nemotron-4 is BIG in More Ways than One
Last week, NVIDIA announced Nemotron-4, which consists of three models: Base, Instruct and Reward. These three models work together within the NeMo framework to enable the creation and fine-tuning of […]
There’s a big breakthrough that just came out for running large language models on smartphones. It’s called PowerInfer-2, and what it does is look at every option for processing an LLM on a particular smartphone and pick the fastest way to run that particular LLM on that particular device. For example, it uses completely different computation patterns for the early versus the later phases of the pipeline, and it breaks the work down into small tasks, organizing them by which neurons are most likely to activate, which greatly improves efficiency. The final step picks which processing units to use, based on which one will do the job fastest.
Add it all up, and the performance difference is very impressive: 29x faster.
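To make the idea concrete, here is a minimal, hypothetical sketch of that kind of scheduling in Python. This is not PowerInfer-2’s actual code; the NeuronCluster type, the estimate_latency_ms cost model, and the CPU/NPU constants are all assumptions made purely for illustration of the ordering-by-activation and pick-the-faster-unit ideas described above.

```python
# Hypothetical sketch only -- not the PowerInfer-2 implementation.
from dataclasses import dataclass

@dataclass
class NeuronCluster:
    cluster_id: int
    activation_prob: float   # how likely this group of neurons is to fire
    num_neurons: int

def estimate_latency_ms(cluster: NeuronCluster, unit: str) -> float:
    """Toy cost model (made-up constants): the NPU is assumed faster on
    large clusters but pays a fixed launch overhead; the CPU has no
    overhead and so wins on small, sparse clusters."""
    if unit == "npu":
        return 0.02 * cluster.num_neurons + 0.5
    return 0.05 * cluster.num_neurons

def schedule(clusters: list[NeuronCluster]) -> list[tuple[int, str]]:
    # 1) Order tasks so the neurons most likely to activate are handled
    #    first, and unlikely ones can be deferred or skipped.
    ordered = sorted(clusters, key=lambda c: c.activation_prob, reverse=True)
    plan = []
    for c in ordered:
        # 2) For each small task, pick whichever processing unit the
        #    cost model predicts will finish faster on this device.
        unit = min(("cpu", "npu"), key=lambda u: estimate_latency_ms(c, u))
        plan.append((c.cluster_id, unit))
    return plan

if __name__ == "__main__":
    clusters = [
        NeuronCluster(0, activation_prob=0.9, num_neurons=512),
        NeuronCluster(1, activation_prob=0.2, num_neurons=32),
        NeuronCluster(2, activation_prob=0.7, num_neurons=2048),
    ]
    print(schedule(clusters))  # e.g. [(0, 'npu'), (2, 'npu'), (1, 'cpu')]
```

In the real system the cost model would be profiled per device rather than hard-coded, but the sketch shows why the same model can end up scheduled very differently on different phones.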
This video starts with a review of the six strategies generally used to prepare large language models for use on a smartphone, with examples of each, and then presents a side-by-side demo of PowerInfer-2 vs. llama.cpp.
The speed difference is remarkable.