AI Master Group

Latest Videos

10:15

Behind the Curtain of Figma AI

The recent announcement of Figma AI generated both excitement and controversy. This video summarizes, in under three minutes, the new AI features of this popular tool for designing and prototyping digital experiences.

Next, the video looks at the underlying technology used to enable the new AI features, including OpenAI language models and the Amazon Titan diffusion model, and draws conclusions about Figma’s strategy based on the choices the company made – especially the decision to use two different vendors for key parts of Figma AI.

8:49

How a Language Model Aced a Top Leaderboard

This video shares details about a remarkable experiment by researchers in Tokyo, who teamed up with Oxford and Cambridge Universities to study whether large language models might now be able to write code that improves their own performance.

The answer was Yes.

Not only that, the model created a whole new approach that placed it at the top of a leaderboard, using a novel method that had not yet been tried or documented in any academic research paper. How can that happen?

The video describes how the model alternated between different kinds of strategies, just as a data scientist might, resulting in an innovative new loss function with several interesting properties. In short, the model was systematically generating hypotheses and testing them. Finally, the video identifies five aspects of the research question that can potentially be generalized, and it names three ways in which the findings might be applied to new problem sets, including virtual reality. . .
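To make that loop concrete, here is a minimal, hypothetical sketch of the propose-and-test cycle described above. It illustrates the general technique, not the researchers’ actual code; ask_llm and evaluate_on_benchmark are placeholder functions.

```python
# Hypothetical sketch of an LLM-driven "generate hypotheses and test them"
# loop for discovering loss functions. `ask_llm` and `evaluate_on_benchmark`
# are placeholders, not part of any real library.

def discover_loss(ask_llm, evaluate_on_benchmark, rounds=10):
    best_code, best_score = None, float("-inf")
    history = []  # past attempts, so the model can refine its strategy
    for _ in range(rounds):
        prompt = (
            "Propose a new loss function as Python code. "
            f"Previous attempts and their scores: {history}"
        )
        candidate = ask_llm(prompt)               # the model states a hypothesis
        score = evaluate_on_benchmark(candidate)  # the hypothesis is tested empirically
        history.append((candidate, score))
        if score > best_score:                    # keep the best-performing variant
            best_code, best_score = candidate, score
    return best_code, best_score
```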

6:29

New Method Runs Big LLMs on Smartphones

There’s a big breakthrough that just came out for running large language models on smartphones. It’s called PowerInfer-2, and what it does is examine every option for processing an LLM on a particular smartphone, then pick the fastest approach for that particular LLM on that particular device. For example, it uses completely different computation patterns for the early vs. the later phases of the pipeline, and it breaks the work down into small tasks, organizing them based on which neurons are most likely to activate, which substantially increases efficiency. The final step picks which processing units to use, based on which one will do the job faster.
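As a simplified illustration of that last scheduling step (not PowerInfer-2’s actual code), the idea is to estimate how long each processing unit would take on a task and assign it accordingly; the cost estimators below are assumed placeholders.

```python
# Simplified illustration of per-task processor selection, in the spirit of
# PowerInfer-2's approach; the cost estimators are assumed placeholders.

def schedule(tasks, estimate_cpu_time, estimate_npu_time):
    """Assign each small task to whichever unit is predicted to finish faster."""
    plan = []
    for task in tasks:
        unit = "cpu" if estimate_cpu_time(task) < estimate_npu_time(task) else "npu"
        plan.append((task, unit))
    return plan
```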

Add it all up, and the performance difference is very impressive: up to 29x faster.

This video starts with a review of the six strategies that are generally used to prepare large language models for use on a smartphone, with examples of each, and then it presents a side-by-side demo of PowerInfer-2 vs. llama.cpp.

The speed difference is remarkable.

10:02

Nemotron-4 is BIG in More Ways than One

Last week, NVIDIA announced Nemotron-4, which consists of three models: Base, Instruct and Reward. These three models work together within the NeMo framework to enable the creation and fine-tuning of new large language models.

At 340 billion parameters, this new entrant is far bigger than any other open source model, but the really big news is that Nemotron-4 comes with a permissive license that allows us to use the model to generate synthetic data at scale, for the purpose of creating new models of our own.

Until now, most big models and APIs have had clauses in their user agreements that explicitly forbid using the data they generate to create a new model. This video provides a full summary of the size, performance, technical report, and competitive position of Nemotron-4, and it describes what each of the three models does, including the production of synthetic data and the five-dimension framework used for model evaluation.
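To make the synthetic-data workflow concrete, here is a hypothetical outline in which the Instruct model drafts examples and the Reward model filters them. The generate and score functions are placeholders for illustration, not actual NeMo APIs.

```python
# Hypothetical outline of a synthetic-data pipeline in the spirit of the
# video's description: an instruct model writes candidate examples and a
# reward model gates them. `generate` and `score` are placeholders.

def build_synthetic_dataset(prompts, generate, score, threshold=0.8):
    dataset = []
    for prompt in prompts:
        response = generate(prompt)               # Instruct model drafts an answer
        if score(prompt, response) >= threshold:  # Reward model filters for quality
            dataset.append({"prompt": prompt, "response": response})
    return dataset
```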

13:53

Testing Ollama on Hard Questions

Ollama is a popular platform for running language models on your local machine, with access to almost 100 different open source models, including Llama 3 from Meta, Phi-3 from Microsoft, Aya 23 from Cohere, the Gemma models from DeepMind, and Mistral.

This video shows Llama 3 being run on a laptop using Ollama. Three difficult questions are presented in turn to each of GPT-4o, Gemini, and Llama 3. The results yield good insight into the comparative strengths and weaknesses of these three options.
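If you want to try something similar yourself, here is one way to query a local model through Ollama’s official Python client, assuming Ollama is installed and the model has been pulled with "ollama pull llama3". The question shown is just a stand-in, not necessarily one from the video.

```python
import ollama  # pip install ollama

# Send a single chat turn to a locally-running Llama 3 via Ollama.
response = ollama.chat(
    model="llama3",
    messages=[{"role": "user",
               "content": "Which weighs more: two pounds of feathers or one kilogram of steel?"}],
)
print(response["message"]["content"])
```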

8:28

Hacking Passwords with ChatGPT?

The latest edition of the Hive Systems password table is now available, and it shows ChatGPT as by far the fastest option for hacking passwords, which certainly requires some explanation!

This video looks at the assumptions that go into the time it takes for a hacker to get a password by brute force. Along the way, we look at hashing algorithms like MD5 and bcrypt, and at hardware like NVIDIA RTX 4090 GPUs and NVIDIA A100s – which is where ChatGPT enters the story. (It turns out that Hive Systems modeled a theoretical situation that involves using about $300 million worth of ChatGPT hardware to hack a single 8-digit password!)
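The core arithmetic behind any such table is simple: the size of the keyspace divided by the attacker’s hash rate. Here is a back-of-the-envelope version; the hash rate is an illustrative assumption, not Hive Systems’ figure.

```python
# Back-of-the-envelope brute-force estimate. The hash rate is an assumed
# figure for a fast GPU against a weak hash like MD5, not Hive Systems' number.

charset_size = 10                  # digits only, as in an 8-digit password
length = 8
keyspace = charset_size ** length  # 10**8 = 100,000,000 candidates

hashes_per_second = 150e9          # assumed MD5 rate for a single high-end GPU
worst_case_seconds = keyspace / hashes_per_second
print(f"Worst case: {worst_case_seconds:.6f} seconds")  # well under a second
```

Swap in bcrypt, which is deliberately slow to compute, and the same keyspace takes dramatically longer; that contrast is at the heart of the video.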

The video ends with an announcement about the new AI Master Group podcast which will feature interviews with people who are on the front lines, doing innovative work related to AI. The podcast will launch on July 7.

9:07

What is AGI? –the Ultimate Test!

Since there’s lots of attention on AGI right now, it’s time to finally define what it is – digging deeper into the underlying implications of the three words “artificial general intelligence,” and producing a succinct one-sentence definition.

This video reviews information suggesting that we either have AGI already, or are very close to having it.

Along the way, we distinguish between “AGI” and a related concept known as “Strong AI” (which refers to AI that has developed consciousness, possibly including emotions), and we finish by taking a playful look at “the Ultimate Test” of AGI – and the many issues we’ve all seen when our fellow humans fail that test.

But why does the video compare the thickness of a human hair to the height of the Eiffel Tower? Listen in to learn why. . .

7:27

GPT-4o Rapid Fire Highlights

The launch of GPT-4o is a big deal. Here’s a rapid-fire summary of the highlights.

This video is a mix-down of the 5 key announcements from the original 26-minute video, in under one minute.

Then, you get a rapid-fire demo of 7 key abilities of GPT-4o in under 7 minutes. You will certainly be amazed.

By the way, does that voice sound like it’s from Scarlett Johansson? You be the judge. . .

7:44

Happy Birthday SETI@Home!

SETI@home was officially launched on May 17, 1999, which makes it 25 years old this week, so Happy Birthday, SETI!

As you might recall, SETI stands for Search for Extraterrestrial Intelligence.

This video describes the origins and background of SETI, and the amazing scale that it achieved worldwide. It then goes on to discuss the legacy of SETI@Home for large-scale projects and grid computing.

After that, it continues with a playful look at some of the better-known “evidence” from SETI regarding the possible existence of extraterrestrial intelligence.

5:25

Summarize THIS!

This is a demo of Any Summary, which is a tool that uses OpenAI on the back end to summarize 12 different file types, up to 100 MB each, including PDFs, audio files, videos, and web pages.

The context for the demo was an email I received on a Friday, asking me to review the content of a webpage and give my feedback by Monday. That page contained 51 videos, consisting of about 18 hours of content, and none of it was on YouTube.

11:24

Mr. Bongo Makes a GPT

This week, we joined up with Mr. Bongo to create a custom GPT, using Retrieval Augmented Generation (RAG), so it uses only our own internal documents, and is available only to our own internal users. This was done in a step-by-step manner, like a ‘How To’ video.

For this demo, we built a custom GPT about Nouvelle Cuisine, which is a style of cooking that creates lighter, more delicate dishes, with a big emphasis on how the food is presented on the plate.

We ended up with 20 documents in the final upload, consisting of 663 pages. In the final group, there were 11 PDFs, 6 doc files, 2 PowerPoint files, and a text file containing the full transcript of an hour-long video. That group had 9 documents in English, 9 in Spanish, and one each in French and Portuguese. The GPT was created on CustomGPT.ai, using OpenAI as the backend.
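For anyone curious about what RAG does under the hood, here is a minimal sketch of the retrieve-then-answer pattern. It illustrates the general technique only, not CustomGPT.ai’s implementation; embed and llm are placeholder functions.

```python
# Minimal retrieve-then-answer sketch of RAG. `embed` turns text into a
# vector; `llm` answers a prompt. Real systems pre-compute and index the
# chunk embeddings instead of embedding them per query.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer(question, chunks, embed, llm, k=3):
    """Retrieve the k most similar document chunks, then ground the answer in them."""
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q_vec), reverse=True)
    context = "\n\n".join(ranked[:k])
    prompt = f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)
```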

The video concludes with a demo of the new GPT, involving three questions that test different capabilities, including a ‘trick’ question.

But, who is Mr. Bongo? Don’t worry. You’re about to find out. . .

7:56

AI Speech Gets Real: BASE TTS

Amazon has introduced an amazing new model called BASE TTS (TTS = text-to-speech). These are the models that accept written text as an input, and then speak that text for us, which is what we use to create talking avatars and chatbots, among many other use cases.

BASE stands for Big Adaptive Streamable TTS with Emergent abilities.

The top TTS models until now have been YourTTS, Bark and Tortoise-TTS. They’ve all been pushing speech synthesis closer and closer to human-like speech, so BASE from Amazon set out to beat them by training on more data than they did. It’s a billion-parameter model trained on 100,000 hours of audio data.

The video covers seven areas where text-to-speech is known to stumble sometimes. In ascending order of difficulty, those are:

  1. Compound nouns
  2. Syntactically-complex sentences
  3. Foreign words
  4. Unusual punctuation
  5. Questions
  6. Paralinguistics (things like groans, laughs, and whispers),
    and – most difficult of all . . .
  7. Emotions.

The video then presents 8 audio samples created by BASE TTS, each of which illustrates BASE TTS attempting one of the especially difficult tasks described above.

The results are quite impressive. Give a listen and see what you think!

9:23

Segment of One – Now it’s Real

“Segment of One” is where every customer in a database of millions can be treated in a different way. Although there’s been buzz about that since at least 1989, true Segment of One is still rare.

This video looks under the hood at the model and approach for a true Segment of One framework, as developed by a very talented team, under the technical leadership of a guy named Ramsu.

Along the way, we describe some fundamental challenges that all marketers face, including the Cold Start Problem, and also Eclipse, and we see how customer genomes can be used, in a Segment of One implementation, to solve both of those problems.

11:15

Virtual AI Announcers – Good Better Best

Last week another new AI-generated reporter took the stage on a top television channel, this time in Thailand. The avatar’s name is Natcha and she’s described in the press release as having been created from “the most advanced algorithms.”

In this installment, we take a careful look at the quality of this new AI personality, and at the opportunities for improvement, and we compare her to three other well-known examples: one from China, one from India, and one from Kuwait.

Next, we turn our attention to how to create our own talking avatar, for marketing, training, explainer videos, or other uses. As part of that, we look at Synthesia, Meta Celebrity Avatar licensing, and Murf.ai. Then, in the spirit of Good-Better-Best, we look at an amazing new technology called EMO (short for Emote Portrait Alive), which far surpasses everything else.

Near the end of today’s installment, we’ll see an animation of the Mona Lisa speaking the lines of Rosalind, from William Shakespeare’s play: As You Like It.

Yes, one, and in this manner.
He was to imagine me his love, his mistress,
and I set him every day to woo me; at which time
would I, being but a moonish youth, grieve, be
effeminate, changeable, longing and liking, proud,
fantastical, apish, shallow,

EMO is a very exciting new technology that outshines, by a big margin, all the AI Presenters on national television channels around the world – including the one that launched last week and was hailed as having been created by “the most advanced algorithms.” That illustrates how fast this technology is evolving!

9:43

How to Make (Even More) Money with Generative AI

A review of six months’ worth of developer posts on the OpenAI Developer Community page shows a range of markups from a low(!!) of 100% to a high of 100x cost.

After exploring how those markups get determined, this video moves on to an even harder problem. As one developer wrote:

“Not trying to limit users tokens, but not trying to be on the hook for tokens used, but not trying to confuse the end user, but trying to explain the relationship between queries, tokens and costs… Man, that’s hard!”

Yes, it is!

This video explores two competing schools of thought about how to handle this: (1) What we’ll call the “cell phone company” approach vs. (2) the “pay-as-you-go” approach. In doing that, we’ll unpack four different trade-offs between those choices.
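To see the trade-off in numbers, here is a toy comparison of the two approaches. All prices, bundle sizes, and the markup are made-up assumptions for illustration only.

```python
# Toy comparison of "cell phone company" vs. "pay-as-you-go" billing.
# All numbers here are illustrative assumptions, not real prices.

PROVIDER_COST_PER_1K = 0.01  # assumed raw API cost, USD per 1K tokens
MARKUP = 3.0                 # assumed 3x markup (the video cites 100% to 100x)

def pay_as_you_go(tokens):
    """User pays exactly for what they use, marked up."""
    return tokens / 1000 * PROVIDER_COST_PER_1K * MARKUP

def cell_phone_plan(tokens, monthly_fee=20.0, included_tokens=500_000):
    """Flat fee covers a bundle of tokens; overage is metered."""
    overage = max(0, tokens - included_tokens)
    return monthly_fee + pay_as_you_go(overage)

for usage in (100_000, 500_000, 2_000_000):
    print(f"{usage:>9} tokens: metered ${pay_as_you_go(usage):6.2f}"
          f" vs. plan ${cell_phone_plan(usage):6.2f}")
```

The flat plan shields casual users from token math, but puts the developer on the hook when heavy users blow past the bundle, which is exactly the tension in the developer quote above.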

It turns out that the best way to balance risk and reward between these options might depend, to a very large extent, on the target market itself. In this video, we’ll see why.

But . . . Why is there dance music at the end of the video??

Stay tuned to the end to find out!

8:40

No Free LL-unch

This video covers some of the emerging tools and platforms for tuning and comparative testing of Large Language Models, including Vellum.ai, Gradio, Lambda Labs and Paperspace.

“No Free Lunch” is a reference to a classic paper by computer scientist and statistician David Wolpert, who used that phrase in a theorem indicating that no algorithm is universally superior across all types of tasks, and that there will be performance trade-offs when choosing between appropriate model options (which certainly applies to LLMs) – hence the idea of “No Free LL-unch.”

The video also covers 6 criteria on which LLMs are frequently compared.

9:22

Enter the “Chief AI Officer”! (… what’s that?)

Last week, Morgan Stanley promoted Jeff McMillan to a newly created position as its Managing Director, Head of Artificial Intelligence – basically what’s generally referred to in our world as a Chief AI Officer.

This is a very hot job right now, so this video breaks down the role into eight key capabilities, and uses Jeff McMillan as an example to illustrate those in real life. Along the way, we also cover salary and growth trends for the role.

8:16

What is Pinecone?

As you get ready to implement a generative AI project, you’ll probably start hearing about Pinecone. That’s because you’ll need a vector database (or some alternative) as the memory layer, and Pinecone was an early mover that helped create the vector database category.

In this video, we’ll cover what Pinecone is, and why it’s often a key part of the tech stack for a generative AI project. Along the way, we’ll cover Transformer models, text embeddings, vectors, and vector databases, since all of those fit together to explain the role of Pinecone.

We’ll also cover alternatives to Pinecone, from direct competitors to other kinds of options, including Weaviate on Kubernetes and FAISS. Then we’ll cover the key benefits of using a vector database – especially one that is serverless.
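As a small, self-contained taste of the core operation that Pinecone, Weaviate, and FAISS all provide, here is a nearest-neighbor search with FAISS, which runs locally. The vectors are random stand-ins for real text embeddings.

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim = 384                               # typical size for a small embedding model
index = faiss.IndexFlatL2(dim)          # exact (brute-force) L2 index

doc_vectors = np.random.rand(1000, dim).astype("float32")
index.add(doc_vectors)                  # store 1,000 document embeddings

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5) # find the 5 nearest documents
print(ids[0])                           # row indices of the closest matches
```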

9:07

Sora Preview: OpenAI’s Text to Video Surprise!

OpenAI created quite a stir with its new text-to-video model, Sora, which I’ll demo for you in this video.

You’ll quickly understand why filmmaker Tyler Perry decided to scrap an $800 million expansion of his 330-acre film studio after he saw what you’ll see here.

As we watch the video samples, I’ll cover some of the surprising achievements of Sora, including:

-3D consistency of characters as the camera pans in various directions

-The ability for characters and visual style to persist across shots, even if they go out of view temporarily.

You’ll also see several examples of actual text prompts and the resulting videos.

And you’ll also see an example of video in multiple formats, plus an example of a still image being animated.

In the course of the demo, we’ll also cover how a diffusion model like this works, and we’ll talk about OpenAI’s recaptioning method, which produces the rich descriptions that are an enabler for the successful results you’ll see.
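For the curious, here is a bare-bones sketch of the denoising loop at the heart of any diffusion model. It is a generic illustration, not Sora’s architecture; predict_noise stands in for the trained, text-conditioned network.

```python
import numpy as np

def sample(predict_noise, prompt, shape=(64, 64), steps=50):
    """Generate a sample by iteratively removing predicted noise."""
    x = np.random.randn(*shape)            # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps = predict_noise(x, t, prompt)  # network predicts the noise at step t
        x = x - eps / steps                # crude update; real samplers use schedules
    return x                               # progressively denoised output
```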

9:19

Has AI Learned to Lie? New Findings!

This video describes an experiment in which a Large Language Model was convinced by a series of prompts that it should not tell the truth, and so it intentionally gave false information.

That example is explored against a backdrop of some of the more famous cases where AI has deceived people, or gotten the advantage in various ways. These examples are brought together in a way that helps visualize how things could turn out unexpectedly in some cases, based on the reward system we create for our AI.

Although I end on a humorous note, the emerging field of machine psychology is gaining increasing attention now, as LLMs steadily gain in reasoning abilities.