AI Master Group

Latest Videos

8:03

Mesh Anything (except a Pink Hippo Ballerina)

The developers at MeshAnything have just released new code that offers an important improvement in how the surfaces of 3D objects are encoded. The new method builds out a shape by always seeking to find and encode an adjacent face that shares an edge with one already encoded, which requires only about half as many tokens as other methods to represent the same information. That translates into roughly a four-fold reduction in the memory needed for the same task, and it enabled MeshAnything to double the maximum number of faces it can handle on a single object, from 800 under current methods to 1600.
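For intuition, here is a minimal sketch of the encoding idea (my own illustration, not the MeshAnything V2 code): a naive tokenizer emits all three vertices of every face, while an adjacency-aware tokenizer emits only the single new vertex whenever the next face shares an edge with the previous one.

```python
# Minimal sketch of adjacency-aware face encoding (illustrative only,
# not MeshAnything V2's actual tokenizer). Faces are triples of vertex ids.

def naive_tokens(faces):
    """Emit all three vertex ids for every face."""
    tokens = []
    for face in faces:
        tokens.extend(face)
    return tokens

def adjacent_tokens(faces):
    """Emit only the vertex not shared with the previous face when two
    consecutive faces share an edge; otherwise emit all three vertices."""
    tokens = []
    prev = None
    for face in faces:
        if prev is not None and len(set(face) & set(prev)) == 2:
            tokens.append((set(face) - set(prev)).pop())  # 1 token instead of 3
        else:
            tokens.extend(face)                            # start of a new strip
        prev = face
    return tokens

# Two triangles sharing the edge (2, 3):
faces = [(1, 2, 3), (2, 3, 4)]
print(len(naive_tokens(faces)), len(adjacent_tokens(faces)))  # prints: 6 4
```

The exact savings depend on how often consecutive faces actually share an edge, but on typical meshes the effect is the roughly two-fold token reduction described above.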

This video starts by comparing the new method with the current one. After that, we generate a 3D object from a text prompt on the Rodin website (a pink hippopotamus ballerina character with a white tutu), and we check it on the Sketchfab website. Then we run the code that MeshAnything provided on GitHub, and we check the output on Sketchfab, comparing before and after side by side. The results confirm the final words of the paper, which state that “the accuracy of MeshAnything V2 is still insufficient for industrial applications. More efforts are needed.” Nonetheless, this new computational approach is elegant, and the video concludes with a prediction that we’ll likely see improvements that build on the foundations laid by MeshAnything V2.

8:36

Can Robots Win at Table Tennis? Take a Look!

Google DeepMind has just achieved a new level of robotic skill: the ability to compete and win at table tennis, a game that requires years of training for people who want to compete at an expert level.

This video shows the robot in action against an array of competitors, ranging from beginners to a tournament pro, and in doing so describes both the hardware and the AI aspects, including how the robot was trained and a summary of the key innovations contributed by this project.

It also gives summary results of the live matches, segmented by the experience level of the opponents. As a bonus, I looked at the performance data and have shared four insider tips for how to beat this robot at table tennis. The video ends on a light note, describing something called RoboCup, which has the goal of fielding a team of robots that will be ready to take on the World Cup soccer champions by 2050. You’ll quickly see that we have a very long way to go on that particular goal.

10:36

Shark Alert! YOLO AI-Vision in Action

Last week, several news outlets ran a story about SharkEye, an AI-vision shark detection program developed at the University of California, Santa Barbara, and deployed at California’s Padaro Beach, an area where surfers and great white sharks are both frequently found.

After quickly describing the program itself, the video identifies the underlying technology used for the vision aspect, confirming from the project’s GitHub page that YOLOv8 by Ultralytics was used. Essentially, Ultralytics created an abstraction layer that simplifies the deployment of computer vision models, so that even developers with almost no experience in computer vision can quickly implement sophisticated projects. To illustrate, the video shows a demo of an object detection and identification task being set up and run on Google Colab, and it concludes with examples of the types of projects that can be built with Ultralytics YOLOv8.
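For reference, here is a minimal example of the kind of Colab demo shown in the video, using the Ultralytics package with a pretrained YOLOv8 model (the sample image URL comes from the Ultralytics docs and is just a placeholder).

```python
# pip install ultralytics
from ultralytics import YOLO

# Load a small pretrained YOLOv8 detection model (weights download on first use)
model = YOLO("yolov8n.pt")

# Run object detection on an image (local path or URL); returns a list of Results
results = model("https://ultralytics.com/images/bus.jpg")

# Print the detected class names and confidence scores
for box in results[0].boxes:
    cls_id = int(box.cls[0])
    conf = float(box.conf[0])
    print(results[0].names[cls_id], round(conf, 2))
```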

11:03

AI Can do That?? Silver Medal in Pure Math

AI has just achieved an amazing milestone. A pair of Alpha models from Google DeepMind scored silver-medal-level performance in a globally recognized competition in advanced mathematics: IMO 2024.

This video starts by setting the context for this latest achievement, going back to significant milestones in 2022 and 2023 that helped set the stage for what just happened, sharing the story along the way of two remarkable mathematicians, and comparing their achievements to those of the Alpha models.

With the stage set in that way, the video then describes key details of the contest, including the scoring system and how DeepMind scored on each problem, with details of a very difficult geometry problem that was solved in a matter of seconds. Next, the video describes the training that was done for the AlphaProof and AlphaGeometry 2 models. Finally, it assesses the implications of this accomplishment, including some of the fields in which this kind of capability might make significant contributions.

8:12

Will Open-Source Llama Beat GPT-4o?

Last week Meta launched its newest family of models, Llama 3.1, including a new benchmark for open source: a foundation model with 405 billion parameters. Alongside the launch, Zuckerberg predicted that Meta AI will surpass OpenAI’s 200 million monthly active users by the end of this year.

Hubris aside, this video looks at six reasons why we need to pay attention to this announcement, including Zuckerberg’s assertion that open source will eventually win for language models, for the same reasons that Linux eventually won out against an array of closed-source Unix variants.

It then describes a situation where a company has already been building solutions with a model from OpenAI or Anthropic, for example, and now decides to get an informed point of view about the open source option by building a challenger model with the new Llama releases. For that situation the video suggests which model size to use, along with recommendations for the best platform options for the pilot and four types of projects that would be good candidates for a head-to-head test of this sort. Finally, it concludes with a light-hearted description of the battle ahead.

10:20

Call a Doctor! –Blue Screen Lessons Learned

Companies worldwide grappled on Friday with what Troy Hunt famously described as “the largest IT outage in history,” caused by a faulty sensor configuration update that the cyber-security giant CrowdStrike pushed to Microsoft Windows machines, resulting in a $31 billion loss in market capitalization for the company.

Specific information about the bug is not yet publicly available, but this video presents 12 top suspects, including two primary ones. From there, it focuses on lessons learned, with the help of a live interview with fractional CTO and senior solutions architect Dave Stern, author of the recent best-selling book Hackproof Your Startup.

7:07

Amazing Milestone! Million Experts Model

A top researcher at Google DeepMind just released an important paper, “Mixture of a Million Experts.” As the title announces, it describes an approach that resulted in the first known Transformer model with more than a million experts.

For context, the number of experts currently seen in smaller models varies between 4 and 32, and goes up to 128 for most of the bigger ones.

This video reviews the Mixture-of-Experts method, including why and where it’s used and the computational challenges it raises. Next, it summarizes the findings of another important paper from earlier this year, in which a new scaling law was introduced for Mixture-of-Experts models. That sets us up to review the “Million Experts” paper by Xu He.

The video then describes the two key strategies that enabled scaling to over a million experts by making each expert only a single neuron in size. Next, it shares a process map for the new approach, and concludes with ideas about where this might be most relevant, including applications that involve continuous data streams.
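To make the “single-neuron expert” idea concrete, here is a simplified sketch (my own illustration, not the paper’s code): each expert is just one input weight vector and one output weight vector, and for each token only the top-k experts by key similarity are activated. The actual paper retrieves experts with a more efficient product-key scheme so that selection over a million experts stays cheap.

```python
# Simplified sketch of a layer of single-neuron experts (illustrative only).
import torch
import torch.nn.functional as F

class TinyExpertLayer(torch.nn.Module):
    def __init__(self, d_model, num_experts, k=16):
        super().__init__()
        self.k = k
        self.keys = torch.nn.Parameter(torch.randn(num_experts, d_model))   # router keys
        self.w_in = torch.nn.Parameter(torch.randn(num_experts, d_model))   # expert input vectors
        self.w_out = torch.nn.Parameter(torch.randn(num_experts, d_model))  # expert output vectors

    def forward(self, x):                        # x: (batch, d_model)
        scores = x @ self.keys.T                 # similarity to every expert key
        topk = scores.topk(self.k, dim=-1)       # pick k experts per token
        gates = F.softmax(topk.values, dim=-1)   # (batch, k)
        w_in = self.w_in[topk.indices]           # (batch, k, d_model)
        w_out = self.w_out[topk.indices]         # (batch, k, d_model)
        h = F.gelu((w_in * x.unsqueeze(1)).sum(-1))         # each expert is one neuron
        return ((gates * h).unsqueeze(-1) * w_out).sum(1)   # weighted sum of expert outputs

layer = TinyExpertLayer(d_model=64, num_experts=10_000, k=16)
print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```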

10:15

Behind the Curtain of Figma AI

The recent announcement of Figma AI generated both excitement and controversy. In under three minutes, this video summarizes the new AI features of this popular design tool, which is used for creating prototypes of digital experiences.

Next, the video looks at the underlying technology that was used to enable the new AI features, including OpenAI language models and the Amazon Titan diffusion model, drawing conclusions about Figma’s strategy, based on the choices they made – especially the decision to use two different vendors for key parts of Figma AI.

8:49

How a Language Model Aced a Top Leaderboard

This video shares details about a remarkable experiment by researchers in Tokyo, who teamed up with Oxford and Cambridge Universities to study whether large language models might now be able to write code that improves their own performance.

The answer was Yes.

Not only that, the model created a whole new approach that placed it at the top of a leaderboard, using a novel method that had not yet been tried or documented in any academic research paper. How can that happen?

The video describes how the model alternated between different kinds of strategies, just as a data scientist might do, resulting in an innovative new loss function with several interesting properties. In short, the model was systematically generating hypotheses and testing them. Finally, the video identifies five aspects of the research question that can potentially be generalized, and it names three ways in which the findings might be applied to new problem sets, including virtual reality. . .

6:29

New Method Runs Big LLMs on Smartphones

There’s a big breakthrough that just came out for handling large language models on smartphones. It’s called PowerInfer-2, and what it does is look at every option for processing an LLM on a particular smartphone and pick the fastest way for that particular LLM on that particular device. For example, it uses completely different computation patterns for the early versus the later phases of the pipeline, and it breaks the work down into small tasks, organized by which neurons are most likely to activate, which greatly increases efficiency. The final step then picks which processing units to use, based on which one will do the job faster.
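As a rough feel for the scheduling idea (a generic toy, not PowerInfer-2’s actual code, with made-up overheads and throughputs), the sketch below estimates how long a fine-grained task would take on each processing unit on the phone and dispatches it to whichever finishes first.

```python
# Toy illustration of the scheduling idea only (not PowerInfer-2's actual code).
# Hypothetical (startup overhead in ms, throughput in work-units per ms) per unit.
UNITS = {"big_cpu_core": (0.2, 4.0), "little_cpu_core": (0.1, 1.5), "npu": (20.0, 12.0)}

def fastest_unit(task_size, units=UNITS):
    """Return (unit_name, estimated_ms) for the unit expected to finish first."""
    estimates = {name: overhead + task_size / speed
                 for name, (overhead, speed) in units.items()}
    best = min(estimates, key=estimates.get)
    return best, estimates[best]

# Neuron clusters most likely to activate are scheduled first ("hot" before "cold").
tasks = [("hot_neurons", 480), ("warm_neurons", 240), ("cold_neurons", 60)]

for name, size in tasks:
    unit, ms = fastest_unit(size)
    print(f"{name}: run on {unit} (~{ms:.1f} ms)")
```

With these invented numbers, the big neuron clusters land on the NPU while the small one stays on a CPU core, which is the flavor of decision being described.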

Add it all up, and the performance difference is very impressive: 29x faster.

This video starts with a review of the six strategies that are generally used to prepare large language models for use on a smartphone, with examples of each, and then it presents a side-by-side demo of PowerInfer-2 vs. llama.cpp.

The speed difference is remarkable.

10:02

Nemotron-4 is BIG in More Ways than One

Last week, NVIDIA announced Nemotron-4, which consists of three models: Base, Instruct and Reward. These three models work together within the NeMo framework to enable the creation and fine-tuning of new large language models.

At 340 billion parameters, this new entrant is far bigger than any other open source model, but the really big news is that Nemotron-4 comes with a permissive license that allows us to use the model to generate synthetic data at scale, for the purpose of creating new models of our own.

Until now, most big models and APIs have had clauses in their user agreements that explicitly forbid using the data they generate to create a new model. This video provides a full summary of the size, performance, technical report, and competitive position of Nemotron-4, and it describes what each of the three models does, including the production of synthetic data and the five-dimension framework used for model evaluation.

13:53

Testing Ollama on Hard Questions

Ollama is a popular platform for running language models on your local machine, with access to almost 100 different open source models, including Llama 3 from Meta, Phi-3 from Microsoft, Aya 23 from Cohere, the Gemma models from Google DeepMind, and Mistral.

This video shows Llama 3 being run on a laptop, using Ollama. Three difficult questions are presented in turn to each of GPT-4o, Gemini, and Llama 3. The results yield good insight into the comparative strengths and weaknesses of these three options.
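If you want to try a similar local run yourself, here is a minimal sketch that queries a model served by Ollama through its local REST API (it assumes Ollama is running and the model has already been pulled with `ollama pull llama3`; the question is just a placeholder, not one of the three used in the video).

```python
# Minimal sketch: query a locally served Ollama model through its REST API.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "A farmer has 17 sheep. All but 9 run away. How many are left?",
        "stream": False,   # return one JSON object instead of a token stream
    },
    timeout=120,
)
print(resp.json()["response"])
```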

8:28

Hacking Passwords with ChatGPT?

The latest edition of the Hive Systems password table is now available, and it shows ChatGPT as by far the fastest option for hacking passwords, which certainly requires some explanation!

This video looks at the assumptions that go into the time it takes for a hacker to get a password by brute force. Along the way, we look at hashing algorithms like MD5 and bcrypt, and we look at hardware like NVIDIA RTX 4090 GPUs and NVIDIA A100s, which is where ChatGPT enters the story. (It turns out that Hive Systems modeled a theoretical situation that involves using about $300 million worth of ChatGPT hardware to hack a single 8-digit password!)
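To get a feel for the arithmetic behind such a table, here is a back-of-the-envelope calculation: the worst-case cracking time is simply the number of possible passwords divided by the number of guesses per second. The hash rates below are illustrative placeholders, not Hive Systems’ figures.

```python
# Back-of-the-envelope brute-force time: keyspace size / hashes per second.
# The hash rates are illustrative placeholders, not Hive Systems' numbers.

def brute_force_seconds(charset_size, length, hashes_per_second):
    return charset_size ** length / hashes_per_second   # try every possible password

def pretty(seconds):
    if seconds < 3600:
        return f"{seconds:,.2f} seconds"
    if seconds < 86400 * 365:
        return f"{seconds / 3600:,.1f} hours"
    return f"{seconds / (86400 * 365):,.1f} years"

# 8 characters drawn from lowercase + uppercase + digits (62 symbols)
for label, rate in [("single GPU, fast hash (e.g. MD5)", 1e11),
                    ("single GPU, slow hash (e.g. bcrypt)", 1e5),
                    ("massive A100 cluster, fast hash", 1e15)]:
    print(f"{label}: {pretty(brute_force_seconds(62, 8, rate))}")
```

The same formula explains why the choice of hashing algorithm matters at least as much as the hardware thrown at the problem.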

The video ends with an announcement about the new AI Master Group podcast which will feature interviews with people who are on the front lines, doing innovative work related to AI. The podcast will launch on July 7.

9:07

What is AGI? –the Ultimate Test!

Since there’s a lot of attention right now on AGI, it’s time to finally define what it is: digging deeper into the underlying implications of the three words “artificial general intelligence,” and producing a succinct one-sentence definition.

This video reviews information suggesting that we either already have AGI, or we are very close to having it.

Along the way, we distinguish between “AGI” and a related concept known as “Strong AI” (which refers to AI that has developed consciousness, possibly including emotions), and we finish by taking a playful look at “the Ultimate test” of AGI – and the many issues we’ve all seen when our fellow humans fail that test.

But why does the video compare the thickness of a human hair to the height of the Eiffel Tower? Listen in to learn why. . .

7:27

GPT-4o Rapid Fire Highlights

The launch of GPT-4o is a big deal. Here’s a rapid-fire summary of the highlights.

This video is a mix-down of the 5 key announcements from the original 26-minute video, delivered in under one minute.

Then, you get a rapid-fire demo of 7 key abilities of GPT-4o in under 7 minutes. You will certainly be amazed.

By the way, does that voice sound like it’s from Scarlett Johansson? You be the judge. . .

7:44

Happy Birthday SETI@Home!

SETI@home was officially launched on May 17, 1999, which makes it 25 years old this week, so Happy Birthday SETI!

As you might recall, SETI stands for Search for Extraterrestrial Intelligence.

This video describes the origins and background of SETI, and the amazing scale that it achieved worldwide. It then goes on to discuss the legacy of SETI@Home for large-scale projects and grid computing.

After that, it continues with a playful look at some of the better-known “evidence” from SETI regarding the possible existence of extraterrestrial intelligence.

5:25

Summarize THIS!

This is a demo of Any Summary, which is a tool that uses OpenAI on the back end to summarize 12 different file types, up to 100 MB each, including PDFs, audio files, videos, and web pages.

The context for the demo was an email I received on a Friday, asking me to review the content of a webpage and give my feedback by Monday. That page contained 51 videos, consisting of about 18 hours of content, and none of it was on YouTube.

11:24

Mr. Bongo Makes a GPT

This week, we joined up with Mr. Bongo to create a custom GPT, using Retrieval Augmented Generation (RAG), so that it draws only on our own internal documents and is available only to our own internal users. This was done in a step-by-step manner, like a ‘How To’ video.

For this demo, we built a custom GPT about Nouvelle Cuisine, which is a style of cooking that creates lighter, more delicate dishes, with a big emphasis on how the food is presented on the plate.

We ended up with 20 documents in the final upload, consisting of 663 pages. In the final group, there were 11 PDFs, 6 doc files, 2 PowerPoint files, and a text file that contains the full transcript of an hour-long video. That group had 9 documents in English, 9 in Spanish, and one each in French and Portuguese. The GPT was created on CustomGPT.ai using OpenAI as the backend.
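For readers who want to see what the RAG pattern looks like in code, here is a bare-bones sketch using the OpenAI Python client (a generic illustration, not how CustomGPT.ai is implemented): embed the internal documents, retrieve the most similar ones for each question, and pass only those to the model. The two sample documents are invented stand-ins for the Nouvelle Cuisine upload.

```python
# Bare-bones RAG sketch (generic illustration, not CustomGPT.ai's implementation).
# pip install openai numpy
import numpy as np
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

documents = [
    "Nouvelle Cuisine emphasizes lighter, delicately prepared dishes.",
    "Presentation on the plate is central to Nouvelle Cuisine.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(documents)

def answer(question, top_k=1):
    q_vec = embed([question])[0]
    # cosine similarity between the question and each document
    sims = doc_vectors @ q_vec / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec))
    context = "\n".join(documents[i] for i in sims.argsort()[::-1][:top_k])
    chat = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer only from this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return chat.choices[0].message.content

print(answer("What matters most about how the food looks?"))
```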

The video concludes with a demo of the new GPT, involving three questions that test different capabilities, including a ‘trick’ question.

But, who is Mr. Bongo? Don’t worry. You’re about to find out. . .

7:56

AI Speech Gets Real: BASE TTS

Amazon has introduced an amazing new model called BASE TTS (TTS = text-to-speech). These are models that accept written text as input and then speak that text for us; they’re what we use to create talking avatars and chatbots, among many other use cases.

BASE stands for Big Adaptive Streamable Emergent.

The top TTS models until now have been YourTTS, Bark and Tortoise-TTS. They’ve all been pushing speech synthesis closer and closer to human-like speech, so BASE from Amazon set out to beat them by training on more data than they did. It’s a billion-parameter model trained on 100,000 hours of audio data.

The video covers seven areas where text-to-speech is known to stumble sometimes. In ascending order of difficulty, those are:

  1. Compound nouns
  2. Syntactically-complex sentences
  3. Foreign words
  4. Unusual punctuation
  5. Questions
  6. Paralinguistics (things like groans, laughs, and whispers),
    and – most difficult of all . . .
  7. Emotions.

The video then presents 8 audio samples created by BASE TTS, each of which illustrates the model attempting one of the especially difficult tasks described above.

The results are quite impressive. Give a listen and see what you think!

9:23

Segment of One – Now it’s Real

“Segment of One” is an approach in which every customer in a database of millions can be treated in a different way. Although there’s been buzz about this since at least 1989, true Segment of One is still rare.

This video looks under the hood at the model and approach for a true Segment of One framework, as developed by a very talented team, under the technical leadership of a guy named Ramsu.

Along the way, we describe some fundamental challenges that all marketers face, including the Cold Start Problem and Eclipse, and we see how customer genomes can be used in a Segment of One implementation to solve both of those problems.