Latest Videos

10:57

Willow Quantum Computer, Amazing Milestones

This video explains why this week’s announcement of Google’s Willow Quantum Computer is significant, starting with the fact that Willow was able to solve an exceedingly complex problem in under 5 minutes that the world’s fastest traditional supercomputer could not solve at all – even if it had been running for the entire age of the universe. (It would need more time than that.)

To ground the discussion, the video first describes what quantum computing is and how it’s related to quantum mechanics, the fundamental theory that describes the behavior of the smallest known particles. It also includes fascinating details about quantum computing and covers the key differences between quantum computers and traditional computers, which use bits with a state of 0 or 1.
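
As a quick illustration of that difference (a toy sketch of our own, not something from the video), a classical bit is exactly 0 or 1, while a qubit holds two amplitudes at once until it is measured:

    import numpy as np

    # A classical bit is simply 0 or 1.
    classical_bit = 1

    # A qubit is a normalized pair of complex amplitudes over the states |0> and |1>.
    # This one is an equal superposition, so a measurement returns 0 or 1 with 50/50 odds.
    alpha, beta = 1 / np.sqrt(2), 1 / np.sqrt(2)
    assert np.isclose(abs(alpha) ** 2 + abs(beta) ** 2, 1.0)

    # Measuring collapses the superposition to a single classical outcome.
    outcome = np.random.choice([0, 1], p=[abs(alpha) ** 2, abs(beta) ** 2])
    print("measured:", outcome)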

Along the way, the video describes key concepts and milestones that are important for understanding the significance of this week’s announcement, including helpful layman’s descriptions of “random circuit sampling” and of the “critical quantum error correction threshold.”

11:10

AI 2025 Forecast: Agents Dominate

This video is our second annual forecast for key trends or developments most likely to define the coming year for AI in the private sector.

By far, the biggest one is AI agents. If we had to choose just one development that is set to define AI next year, this would be it.

Using OpenAI to illustrate this, the video walks through the reasons why the company might plausibly achieve its aspirational goal of growing from 250 million weekly active users to a billion users next year by offering AI agents that help people with their day-to-day tasks.

Meanwhile, Microsoft, Anthropic, Google and Elon Musk (xAI) have all announced plans to launch AI agents of their own in the coming year. The video explains why this is where the money is – and why agents are set to be the defining AI trend of next year.

The other three trends covered are:

  1. Further advances in Generative AI, including multi-modal applications that process images, audio or video, as well as text
  2. A proliferation of lightweight models that perform well on edge devices at low cost, and
  3. New or expanded use cases in robotics that go beyond repetitive manufacturing tasks

12:19

Inner Workings of OpenAI-o1? A First Glimpse

Since the architecture of OpenAI’s powerful o1 reasoning model has not been disclosed, there’s a lot of curiosity about how it works. To get a better understanding, this video pulls together information from OpenAI itself, along with systematic tests published in a recent paper by members of OpenO1, a group that hopes to create an open-source version of the o1 model.

First, performance of the o1 model is compared against four well-known open-source methods that are designed to achieve similar results.

Next, six types of reasoning strategies exhibited by the o1 model are described, and those methods are mapped to four very different problem sets: HotpotQA, Collie, USACO and AIME, covering commonsense reasoning, coding and math. The analysis shows that the choice of reasoning methods deployed by the o1 model is far from random. To the contrary, the choices the model makes about its problem-solving strategies are well matched to the problems presented.

7:32

Andrew Ng at Snowflake: AI Agent Battle Royale

Andrew Ng was the keynote speaker last week on Day Two of the Snowflake BUILD conference, and in that talk he shared results from testing different kinds of agentic workflows on the HumanEval benchmark.

This video is a deep dive into those test results, paying particular attention to the two best-performing agentic tools in the evaluation panel done by DeepLearning.AI: Reflexion and AgentCoder, both of which surpassed a 95% score on the demanding HumanEval benchmark. It’s probably not a coincidence that these two frameworks are quite similar, so the video describes the similarities and differences between them. It then concludes with a summary of all the frameworks that were tested, stack-ranked from highest to lowest performance.
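
For readers unfamiliar with what an agentic workflow looks like in code, here is a minimal sketch of the generate-test-reflect loop that frameworks of this kind build on (an illustration under our own assumptions, not the actual Reflexion or AgentCoder code; call_llm and run_unit_tests are hypothetical placeholders):

    # Minimal sketch of a reflect-and-retry coding agent (illustrative only).
    # call_llm and run_unit_tests are hypothetical placeholders for a model API
    # and a test harness.
    def solve_with_reflection(problem, max_rounds=3):
        feedback = ""
        code = ""
        for _ in range(max_rounds):
            # Ask the model for code, including feedback from earlier attempts.
            code = call_llm(f"Write a Python solution.\n{problem}\n{feedback}")
            passed, errors = run_unit_tests(code, problem)
            if passed:
                return code
            # Ask the model to reflect on the failures before trying again.
            feedback = call_llm(f"The code failed these tests:\n{errors}\nWhat should change?")
        return code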

16:54

Sequoia Capital: Move 37 is Here!

This is a special edition of the ‘AI World’ video series covering the release of OpenAI-o1 (alias Q* and Strawberry). By whatever name, this is a very powerful new kind of model that has demonstrated remarkable reasoning abilities.

The video starts with a look back in time at “Move 37” – an iconic moment in AI history during the 2016 match between AlphaGo and Lee Sedol. That was the moment the world saw AI do something that looked a lot like reasoning or strategy, and the latent promise implied by it now seems to be coming to life.

For its storyline, the video draws on two very recent (and very important) papers:

  1. “Generative AI’s Act o1: The Agentic Reasoning Era Begins” by Sequoia Capital
  2. “Learning to Reason with LLMs” by OpenAI

First, to illustrate the new model’s capabilities, the video showcases that model’s success at decoding an encrypted message, which is definitely not something that a basic language model would be able to do.

And with that as context, the focus then turns to the Sequoia Capital investment hypothesis: that considerable value will be unlocked by companies that apply agentic AI in a domain-specific context, especially if those use cases target specialized pools of work. To illustrate this, the video presents XBOW, a company that has been able to use agentic AI to replace highly skilled experts who do cyber-security penetration testing.

Building on the implications of that example, the video concludes with reflections on the enormous potential impact of these new capabilities – opportunities and risks that can be measured in the trillions of dollars.

11:37

How an 8B Model Beat an Industry Giant

This video describes how a system called ‘AgentStore’ was able to gain the top spot on a benchmark for AI agents – beating out a gigantic model with a small one.

AgentStore is a platform and method for aggregating specialized agents that perform real-world tasks on digital devices running macOS, Windows and Ubuntu. In that system, a meta agent selects the best resource (or combination of resources) for each user request. The new benchmark result was achieved using a small 8B model, outperforming the industry heavyweight Claude 3.5 Sonnet.

The testing was done on OSWorld, an environment for benchmarking agents on 369 different computer tasks involving popular web and desktop workflows, spanning multiple applications ranging from Google Chrome and Microsoft Office to Thunderbird and PDF. The video describes some of the tasks that are part of this difficult benchmark. Testing was also done on APPAgent, a similar benchmark for mobile applications. The video reviews the test results and the capabilities of the agents, as well as the overall system design, including a special class of token that identifies what each agent can do; the meta agent uses the information in those tokens to pick the most suitable resource for each task.
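
To make the routing idea concrete, here is a toy sketch (our own illustration, not AgentStore’s actual code) of a meta agent that matches a request against short capability descriptions and dispatches to the best-matching agent; the agent names and descriptions are made up:

    # Toy meta-agent routing: pick the agent whose declared capabilities
    # best overlap with the words in the user's request.
    AGENT_CAPABILITIES = {
        "browser_agent": "open websites fill web forms download files in chrome",
        "office_agent": "edit documents spreadsheets and slides",
        "email_agent": "read search and send email in thunderbird",
    }

    def route(request: str) -> str:
        words = set(request.lower().split())
        scores = {
            name: len(words & set(caps.split()))
            for name, caps in AGENT_CAPABILITIES.items()
        }
        return max(scores, key=scores.get)

    print(route("search my email for the invoice and send a reply"))  # -> email_agent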

8:03

Mesh Anything (except a Pink Hippo Ballerina)

The developers at MeshAnything have just released new code that offers an important improvement in how the surface of 3D objects can be encoded. The new method builds out the shape by always seeking to find and encode an adjacent face that shares an edge, which requires only about half as many tokens as other methods to represent the same information. That yields a four-fold reduction in the memory needed for the same task, which in turn enabled MeshAnything to double the maximum number of faces it can handle on a single object to 1,600, as compared to 800 for current methods.
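
As a rough illustration of that adjacent-face idea (a toy sketch of our own, not the MeshAnything V2 implementation), the code below orders a mesh’s faces so that each new face shares an edge with the previous one, which is what lets each face be encoded relative to its neighbor rather than from scratch:

    # Toy sketch: order mesh faces so that consecutive faces share an edge.
    # Faces are triangles given as tuples of vertex indices.
    def shares_edge(a, b):
        return len(set(a) & set(b)) >= 2

    def adjacency_order(faces):
        ordered, remaining = [faces[0]], list(faces[1:])
        while remaining:
            prev = ordered[-1]
            # Prefer a face adjacent to the previous one; fall back to any face.
            nxt = next((f for f in remaining if shares_edge(prev, f)), remaining[0])
            remaining.remove(nxt)
            ordered.append(nxt)
        return ordered

    faces = [(0, 1, 2), (2, 3, 0), (1, 2, 3), (3, 4, 1)]
    print(adjacency_order(faces))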

This video starts by comparing the new method with the current one. After that, we generate a 3D object from a text prompt on the Rodin website (a pink hippopotamus ballerina character with white tutu), and we check it on the Sketchfab website. Then we run the code that was provided by MeshAnything on GitHub, and we check the output on Sketchfab, comparing before and after side-by-side. The results confirm the final words of the paper, which state that “the accuracy of MeshAnything V2 is still insufficient for industrial applications. More efforts are needed.” Nonetheless, this new computational approach is elegant, and the video concludes with a prediction that we’ll likely see improvements that build on the foundations laid by MeshAnything V2.

8:36

Can Robots Win at Table Tennis? Take a Look!

Google DeepMind has just achieved a new level of robotic skill – the ability to compete and win at table tennis, a game that requires years of training for people who want to compete at an expert level.

This video shows the robot in action against an array of competitors, ranging from beginner level to tournament pro, and, in doing so, describes both the hardware and the AI aspects, including how the system was trained and a summary of the key innovations contributed by this project.

It also gives summary results of the live matches, segmented by experience level of opponents. As a bonus, I looked at the performance data and have shared four insider tips for how to beat this robot at table tennis. The video ends on a light note, describing something called RoboCup, which has the goal of fielding a team of robots that will be ready to take on the World Cup soccer champion team by 2050. You’ll quickly see that we have a very long way to go on that particular goal.

10:36

Shark Alert! YOLO AI-Vision in Action

Last week, several news outlets ran a story about SharkEye, which is an AI-vision shark detection program, developed at the University of California, Santa Barbara, and deployed at California’s Padaro Beach, which is an area where surfers and great white sharks are both frequently found.

After quickly describing the program itself, the video identifies the underlying technology that was used for the vision aspect, confirming from the project’s GitHub page that YOLO v8 by Ultralytics was used. Basically, Ultralytics created an abstraction layer that simplifies the deployment of computer vision models, so that even developers with almost no experience in computer vision can quickly implement sophisticated projects. To illustrate, the video shows a demo of an object detection and identification task being set up and run on Google Colab, and it concludes with examples of the types of projects that can be implemented with Ultralytics YOLO v8.
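
To give a sense of how little code that abstraction layer requires, here is a minimal sketch of the kind of detection task shown in the Colab demo (the model size and image path are our own placeholders; the notebook in the video may differ):

    # Minimal Ultralytics YOLOv8 detection sketch (pip install ultralytics).
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")              # small pretrained detection model
    results = model("path/to/beach.jpg")    # placeholder image path

    for box in results[0].boxes:
        label = model.names[int(box.cls)]   # detected class name
        print(label, float(box.conf))       # and its confidence score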

11:03

AI Can do That?? Silver Medal in Pure Math

AI has just achieved an amazing milestone. A pair of Alpha models by Google DeepMind scored silver-medal-level performance in a globally recognized competition in advanced mathematics: IMO 2024.

This video starts by setting the context for this latest achievement, going back to significant milestones in 2022 and 2023 that helped set the stage for what just happened, sharing along the way the story of two remarkable mathematicians, and comparing their achievements to those of the Alpha models.

With the stage set in that way, the video then describes key details of the contest, including the scoring system and how DeepMind scored on each problem, with a close look at a very difficult geometry problem that was solved in a matter of seconds. Next, the video describes details about the training that was done for the AlphaProof and AlphaGeometry 2 models. Finally, it assesses the implications of this accomplishment, including some of the fields in which this kind of capability might make significant contributions.

8:12

Will Open-Source Llama Beat GPT-4o?

Last week Meta launched its newest family of models, Llama 3.1, including a new benchmark-setter: an open-source foundation model with 405 billion parameters. With this, Zuckerberg predicted that Meta AI will surpass OpenAI’s 200 million monthly active users by the end of this year.

Hubris aside, this video looks at six reasons why we need to pay attention to this announcement, including Zuckerberg’s assertion that open source will eventually win for language models, for the same reasons that Linux eventually won out against an array of closed-source Unix variants.

It then describes a situation where a company has already been building solutions with an OpenAI or Anthropic model, for example, but then decides to get an informed point of view on the open-source option by creating a challenger model using the new Llama releases. For that situation, the video suggests which model size to use, recommends platform options for the pilot, and identifies four types of projects that would be good candidates for a head-to-head test of this sort. Finally, it concludes with a light-hearted description of the battle ahead.

10:20

Call a Doctor! –Blue Screen Lessons Learned

Companies worldwide grappled on Friday with what Troy Hunt famously described as “the largest IT outage in history,” caused by a faulty sensor configuration update that the cyber-security giant CrowdStrike pushed to Microsoft Windows systems, resulting in a $31 billion loss in market capitalization for the company.

Specific information about the bug is not yet publicly available, but this video presents 12 top suspects, including two primary ones. From there, it focuses on lessons learned, with the help of a live interview with fractional CTO and senior solutions architect, Dave Stern, who is the author of the recent best-selling book Hackproof Your Startup.

7:07

Amazing Milestone! Million Experts Model

A top researcher at Google DeepMind just released an important paper, “Mixture of a Million Experts.” As the paper’s title announces, it describes an approach that resulted in the first-known Transformer model with more than a million experts.

For context, the number of experts currently seen in smaller models varies between 4 and 32, and ranges up to 128 for most of the bigger ones.

This video reviews the Mixture-of-Experts method, including why and where it’s used, and the computational challenges associated with doing this. Next, it summarizes the findings of another important paper from earlier this year, where a new scaling law was introduced for Mixture-of-Experts models. That sets us up to review the “Million Experts” paper by Xu He.

The video then describes two key strategies that enabled scaling to over a million experts by making each expert only a single neuron large. Next, it shares a process map for the new approach, and concludes with ideas about where this might be most relevant, including applications that involve continuous data streams.
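
To make the single-neuron-expert idea more tangible, here is a toy sketch (our own illustration, not the paper’s architecture) of a layer that keeps a large pool of one-neuron experts and routes each input through only the top-k of them:

    import numpy as np

    # Toy layer with a large pool of single-neuron "experts".
    # Each expert is just one weight vector; only the top-k fire per input.
    d, num_experts, k = 16, 100_000, 8
    rng = np.random.default_rng(0)
    keys = rng.normal(size=(num_experts, d))     # used to score (route) experts
    experts = rng.normal(size=(num_experts, d))  # one "neuron" per expert

    def tiny_moe(x):
        scores = keys @ x                            # relevance of every expert
        top = np.argsort(scores)[-k:]                # keep only the k best matches
        gates = np.exp(scores[top]) / np.exp(scores[top]).sum()
        # Each chosen expert contributes one scalar activation, mixed by its gate.
        return float(gates @ np.tanh(experts[top] @ x))

    print(tiny_moe(rng.normal(size=d)))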

10:15

Behind the Curtain of Figma AI

The recent announcement of Figma AI generated both excitement and controversy. In under three minutes, this video summarizes the new AI features of this popular design tool, which is used for creating prototypes of digital experiences.

Next, the video looks at the underlying technology that was used to enable the new AI features, including OpenAI language models and the Amazon Titan diffusion model, drawing conclusions about Figma’s strategy, based on the choices they made – especially the decision to use two different vendors for key parts of Figma AI.

8:49

How a Language Model Aced a Top Leaderboard

This video shares details about a remarkable experiment by researchers in Tokyo, who teamed up with Oxford and Cambridge Universities to study whether large language models might now be able to write code that improves their own performance.

The answer was Yes.

Not only that, the model created a whole new approach that placed it at the top of a leaderboard, using a novel method that had not yet been tried or documented in any academic research paper. How can that happen?

The video describes how the model alternated between different kinds of strategies, just like a data scientist might do, resulting in an innovative new loss function, with several interesting properties. In short, the model was systematically generating hypotheses and testing them. Finally, the video identifies five aspects of the research question that can potentially be generalized, and it names three ways in which the findings might be applied to new problem sets, including to virtual reality. . .

6:29

New Method Runs Big LLMs on Smartphones

There’s a big breakthrough that just came out for handling large language models on smartphones. It’s called PowerInfer-2, and what it does is look at every option for processing an LLM on a particular smartphone, then pick the fastest way for that particular LLM on that particular device. For example, it uses completely different computation patterns for the early vs. the later phases of the pipeline. It also breaks the work down into small tasks and organizes them based on which neurons are most likely to activate, which increases efficiency a lot. The final step picks which processing units to use, based on which one will do the job faster.
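
The core scheduling idea can be sketched very simply (this is our own illustration of the pick-the-fastest-unit concept, not PowerInfer-2’s actual code, and the cost numbers are made-up placeholders):

    # Toy dispatcher: send each small inference task to whichever unit is fastest.
    COST_PER_OP = {"big_cpu_core": 1.0, "little_cpu_core": 2.5, "npu": 0.3}  # assumed costs

    def dispatch(task_ops, npu_supported=True):
        units = dict(COST_PER_OP)
        if not npu_supported:            # some operations cannot run on the NPU
            units.pop("npu")
        # Estimated time = number of operations * cost per operation on that unit.
        return min(units, key=lambda u: task_ops * units[u])

    print(dispatch(task_ops=5_000))                       # -> npu
    print(dispatch(task_ops=5_000, npu_supported=False))  # -> big_cpu_core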

Add it all up, and the performance difference is very impressive: 29x faster.

This video starts with a review of the six strategies that are generally used to prepare large language models for use on a smartphone, with examples of each, and then it presents a side-by-side demo of PowerInfer-2 vs. llama.cpp.

The speed difference is remarkable.

10:02

Nemotron-4 is BIG in More Ways than One

Last week, NVIDIA announced Nemotron-4, which consists of three models: Base, Instruct and Reward. These three models work together within the NeMo framework to enable the creation and fine-tuning of new large language models.

At 340 billion parameters, this new entrant is far bigger than any other open-source model, but the really big news is that Nemotron-4 comes with a permissive license that allows us to use the model to generate synthetic data at scale, for the purpose of creating new models of our own.

Until now, most big models and APIs have had clauses in their user agreements that explicitly forbid using the data they generate to create a new model. This video provides a full summary of the size, performance, technical report, and competitive position of Nemotron-4, and it describes what each of the three models does, including the production of synthetic data and the five-dimension framework that’s used for model evaluation.

13:53

Testing Ollama on Hard Questions

Ollama is a popular platform for running language models on your local machine, with access to almost 100 different open source models, including llama-3 from Meta, Phi3 from Microsoft, Aya 23 from Cohere, the Gemma models from DeepMind and Mistral.

This video shows llama-3 being run on a laptop, using Ollama. Three difficult questions are presented in turn to each of GPT-4o, Gemini and llama-3. The results yield good insight into the comparative strengths and weaknesses of these three options.
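
For anyone who wants to reproduce a test like this locally, a minimal sketch looks something like the following (it assumes Ollama is running and the model has been pulled with ollama pull llama3; the prompt is just an example):

    # Minimal sketch: query a locally running Ollama model over its REST API.
    # Assumes `ollama serve` is running and `ollama pull llama3` has completed.
    import json
    import urllib.request

    payload = {
        "model": "llama3",
        "prompt": "Explain, in one paragraph, why the sky is blue.",
        "stream": False,
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])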

8:28

Hacking Passwords with ChatGPT?

The latest edition of the Hive Systems password table is now available, and it shows ChatGPT as by far the fastest option for hacking passwords – which certainly requires some explanation!

This video looks at the assumptions that go into the time it takes for a hacker to get a password by brute force. Along the way, we look at hashing algorithms like MD5 and bcrypt, and at hardware like NVIDIA RTX 4090 GPUs and NVIDIA A100s – which is where ChatGPT enters the story. (It turns out that Hive Systems modeled a theoretical situation that involves using about $300 million worth of ChatGPT hardware to hack a single 8-digit password!)
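
The underlying arithmetic is simple, as the rough sketch below shows (the hash rates are illustrative assumptions, not Hive Systems’ figures): divide the size of the password search space by how many guesses per second the attacker’s hardware can compute.

    # Rough brute-force estimate: time = keyspace / guesses-per-second.
    # The hash rates below are illustrative assumptions, not measured figures.
    charset_size = 26 + 26 + 10 + 32        # lowercase, uppercase, digits, symbols
    password_length = 8
    keyspace = charset_size ** password_length

    assumed_rates = {"MD5": 1e11, "bcrypt": 1e5}   # guesses per second
    for algo, rate in assumed_rates.items():
        seconds = keyspace / rate
        print(f"{algo}: about {seconds / 86_400:,.1f} days in the worst case")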

The video ends with an announcement about the new AI Master Group podcast which will feature interviews with people who are on the front lines, doing innovative work related to AI. The podcast will launch on July 7.

9:07

What is AGI? –the Ultimate Test!

Since there’s lots of attention right now on AGI, it’s time to finally define what that is – digging deeper into the underlying implications of the three words “artificial general intelligence,” and producing a succinct one-sentence definition.

This video reviews information suggesting that we either have AGI already, or we are very close to having it.

Along the way, we distinguish between “AGI” and a related concept known as “Strong AI” (which refers to AI that has developed consciousness, possibly including emotions), and we finish by taking a playful look at “the Ultimate test” of AGI – and the many issues we’ve all seen when our fellow humans fail that test.

But why does the video compare the thickness of a human hair to the height of the Eiffel Tower? Listen in to learn why. . .