Andrew Ng was the keynote speaker last week on Day Two of the Snowflake BUILD conference, where he shared results from testing different kinds of agentic workflows on the HumanEval benchmark.
This video is a deep dive into those test results, paying particular attention to the two best-performing agentic frameworks in the evaluation panel run by DeepLearning.AI: Reflexion and AgentCoder, both of which scored above 95% on the demanding HumanEval benchmark. It is probably no coincidence that the two top performers are quite similar, so the video walks through their similarities and differences. It concludes with a summary of all the frameworks tested, ranked from highest to lowest performance.
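To make the idea concrete, here is a minimal sketch of the generate-test-reflect loop that reflection-style agentic frameworks like Reflexion build on. The function names and the stand-in "drafts" list are hypothetical: a real framework would call an LLM to produce each draft and would feed the test feedback back into the prompt.

```python
# Minimal sketch of a Reflexion-style generate-test-reflect loop.
# Hypothetical illustration only: the "drafts" list stands in for an
# LLM that revises its own code after seeing test feedback.

def run_tests(code, tests):
    """Execute candidate code against unit tests.

    Returns an error message on failure, or None if all tests pass.
    """
    namespace = {}
    try:
        exec(code, namespace)
        for call, expected in tests:
            result = eval(call, namespace)
            if result != expected:
                return f"{call} returned {result!r}, expected {expected!r}"
        return None
    except Exception as exc:
        return f"exception: {exc}"

def reflexion_loop(drafts, tests, max_iters=3):
    """Try each draft in turn, carrying test feedback forward.

    In a real agent, `feedback` would be inserted into the next LLM
    prompt so the model can repair its own mistake.
    """
    feedback = None
    for code in drafts[:max_iters]:
        error = run_tests(code, tests)
        if error is None:
            return code, feedback  # passing solution found
        feedback = error
    return None, feedback  # no draft passed within the budget

# Hypothetical example: the first draft has an off-by-one bug,
# the "reflected" second draft fixes it.
buggy = "def add(a, b):\n    return a + b + 1\n"
fixed = "def add(a, b):\n    return a + b\n"
tests = [("add(2, 3)", 5)]
best, last_feedback = reflexion_loop([buggy, fixed], tests)
```

The loop's core property, shared by both top-scoring frameworks, is that test results are fed back to the generator rather than discarded, which is what lets an iterative workflow outperform a single-shot completion.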
Copyright AI Master Group 2023-24