How an 8B Model Beat an Industry Giant

Jim Griffin November 7, 2024


Background

This video describes how a system called ‘AgentStore’ took the top spot on a benchmark for AI agents, beating a much larger model with a small one.

AgentStore is a platform and method for aggregating specialized agents that perform real-world tasks on devices running macOS, Windows and Ubuntu. In that system, a meta agent selects the best resource (or combination of resources) for each user request. The top benchmark score was achieved using a small 8B model, outperforming industry heavyweight Claude 3.5 Sonnet.

The testing was done on OSWorld, an environment for benchmarking agents on 369 different computer tasks. The tasks involve popular web and desktop workflows and span multiple applications, ranging from Google Chrome and Microsoft Office to Thunderbird and PDF. The video describes some of the tasks that are part of this difficult benchmark. Testing was also done on APPAgent, which is a similar benchmark for mobile applications.

The video reviews the test results and the capabilities of the agents, as well as the overall system design, including a special class of token that identifies what each agent can do. The meta agent uses the information in those tokens to pick the most suitable resource for each task.
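
To make the routing idea concrete, here is a minimal Python sketch of a meta agent choosing among agents by comparing a task against per-agent tokens. It is only an illustration under stated assumptions, not AgentStore's actual implementation: the names `AgentToken` and `rank_agents`, the three example agents, the hand-written three-dimensional vectors, and the use of cosine similarity as the scoring rule are all hypothetical. In a real system the vectors would be learned rather than typed in by hand.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class AgentToken:
    """Hypothetical stand-in for a token that summarizes what one agent can do."""
    agent_name: str
    embedding: tuple  # toy capability vector; assumed, not taken from AgentStore


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_agents(task_embedding, tokens, top_k=1):
    """Return the names of the top_k agents whose tokens best match the task."""
    scored = sorted(
        tokens,
        key=lambda t: cosine(task_embedding, t.embedding),
        reverse=True,
    )
    return [t.agent_name for t in scored[:top_k]]


# Toy registry: each axis loosely means (web, spreadsheet, email). Purely illustrative.
TOKENS = [
    AgentToken("browser_agent", (1.0, 0.0, 0.0)),
    AgentToken("spreadsheet_agent", (0.0, 1.0, 0.0)),
    AgentToken("email_agent", (0.0, 0.0, 1.0)),
]

# A mostly web-flavored task routes to a single agent.
print(rank_agents((0.9, 0.1, 0.0), TOKENS))

# A mixed task can be routed to a combination of agents by raising top_k.
print(rank_agents((0.1, 0.7, 0.6), TOKENS, top_k=2))
```

Raising `top_k` mirrors the case described above where a request is best served by a combination of resources rather than a single one.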
