Andrew Ng at Snowflake: AI Agent Battle Royale

Jim Griffin 28/11/2024

Andrew Ng was the keynote speaker last week on Day Two of the Snowflake BUILD conference, and in that talk he shared results from testing different kinds of agentic workflows on the HumanEval benchmark.

This video is a deep dive into those test results, paying particular attention to the two best-performing agentic tools in the evaluation done by DeepLearning.AI: Reflexion and AgentCoder, both of which scored above 95% on the demanding HumanEval benchmark. It is probably not a coincidence that the two top performers are quite similar, so the video describes the similarities and differences between them. It then concludes with a summary of all the models that were tested, presented so that the frameworks can be ranked from highest to lowest performance.

Sequoia Capital: Move 37 is Here!

Jim Griffin 14/11/2024

This is a special edition of the ‘AI World’ video series covering the release of OpenAI o1 (also known as Q* and Strawberry). By whatever name, this is a very powerful new kind […]

Raghav Ram: 40 LLMs, One Answer

Jim Griffin 24/11/2025

Michael Koved: The Economics of Generative AI

Jim Griffin 10/06/2025