
Andrew Ng at Snowflake: AI Agent Battle Royale

Jim Griffin November 28, 2024


Background

Andrew Ng gave the keynote on Day Two of last week's Snowflake BUILD conference. In that talk, he shared results from testing different kinds of agentic workflows on the HumanEval benchmark.

This video is a deep dive into those test results. It pays particular attention to the two best-performing agentic tools in the DeepLearning.AI evaluation, Reflexion and AgentCoder, both of which scored above 95% on the demanding HumanEval benchmark. It’s probably not a coincidence that the two top performers are quite similar, so the video describes the similarities and differences between them. It concludes with a summary of everything that was tested, presented in a way that helps stack-rank the frameworks from highest to lowest performance.
