The following is the transcript of Jensen Huang’s keynote. I am publishing it here as it was originally delivered. Later, I will go back through it and work to better explain it to you in linked pages.
Here is the Original Transcript:
This is how intelligence is made: a new kind of factory. A generator of tokens, the building blocks of AI. Tokens have opened a new frontier, the first step into an extraordinary world where endless possibilities are born.
[Music]
Tokens transform words into knowledge and breathe life into images. They turn ideas into videos and help us safely navigate any environment. Tokens teach robots to move like the masters…
[Music]
…and inspire new ways to celebrate our victories.
“A martini, please.”
“Coming right up.”
“Thank you, Adam.”
…and give us peace of mind when we need it most.
“Hi, Maroka.”
“Hi, Anna. It’s good to see you again.”
“Hi, Emma. We’re going to take your blood sample today.”
“Okay, don’t worry, I’m going to be here the whole time.”
They bring meaning to numbers, to help us better understand the world around us…
[Music]
…predict the dangers that surround us…
[Music]
…and find cures for the threats within us.
[Music]
Tokens can bring our visions to life…
[Music]
…and restore what we’ve lost.
[Music] [Applause]
“Zachary, I got my voice back, buddy.”
They help us move forward one small step at a time…
[Music]
…and one giant leap together.
And here is where it all begins.
Welcome to the stage, NVIDIA founder and CEO, Jensen…
[Music] [Applause] [Music] [Applause]
Jensen Huang (on stage):
Welcome to CES! Are you excited to be in Las Vegas? Do you like my jacket? I thought I’d go the other way from Gary Shapiro. I’m in Las Vegas, after all. If this doesn’t work out, if all of you object—well, just get used to it! I really think you have to let this sink in. In another hour or so, you’re going to feel good about it.
Well, welcome to NVIDIA. In fact, you’re inside NVIDIA’s digital twin, and we’re going to take you to NVIDIA. Ladies and gentlemen, welcome to NVIDIA. You’re inside our digital twin. Everything here is generated by AI.
It has been an extraordinary journey, an extraordinary year here, and it all started in 1993 with NV1. We wanted to build computers that could do things normal computers couldn’t, and NV1 made it possible to have a game console in your PC.
Our programming architecture was called UDA—missing the letter “C” until a little while later—but UDA, “Unified Device Architecture.” The first developer for UDA and the first application that ever worked on UDA was SEGA’s Virtua Fighter. Six years later, we invented, in 1999, the programmable GPU, and it started 20-plus years of incredible advance in this incredible processor called the GPU. It made modern computer graphics possible.
And now, 30 years later, SEGA’s Virtua Fighter is completely cinematic. This is the new Virtua Fighter project that’s coming. I just can’t wait. Absolutely incredible.
Six years after that—six years after 1999—we invented CUDA so that we could explain or express the programmability of our GPUs to a rich set of algorithms that could benefit from it. CUDA initially was difficult to explain, and it took years—in fact, it took approximately six years. Somehow, six years later or so, in 2012, Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton discovered CUDA, used it to process AlexNet, and the rest is history.
AI has been advancing at an incredible pace since. It started with perception AI—we now can understand images and words and sounds—to generative AI. We can generate images and text and sounds. And now, agentic AI: AIs that can perceive, reason, plan, and act. Then the next phase—some of which we’ll talk about tonight—physical AI.
Fast forward from 2012 to 2018, and something happened that was pretty incredible. Google’s Transformer was released as BERT, and the world of AI really took off. Transformers, as you know, completely changed the landscape for artificial intelligence. In fact, it completely changed the landscape for computing altogether.
We recognized, properly, that AI was not just a new application with a new business opportunity, but that AI—more importantly, machine learning enabled by Transformers—was going to fundamentally change how computing works. And today, computing is revolutionized in every single layer: from hand-coding instructions that run on CPUs to create software tools that humans use, to machine learning that creates and optimizes neural networks that run on GPUs and create artificial intelligence. Every single layer of the technology stack has been completely changed—an incredible transformation in just 12 years.
Well, we can now understand information of just about any modality. Surely you’ve seen text and images and sounds and things like that. But not only can we understand those; we can understand amino acids. We can understand physics. We understand them, we can translate them, and generate them. The applications are just completely endless.
In fact, almost any AI application that you see out there—what modality is the input that it learned from, what modality of information did it translate to, and what modality of information is it generating? If you ask these three fundamental questions, just about every single application could be inferred. So when you see application after application that is AI-driven, AI-native, at the core of it, this fundamental concept is there. Machine learning has changed how every application is going to be built, how computing will be done, and the possibilities beyond.
Well, GPUs—GeForce, in a lot of ways, all of this with AI is the house that GeForce built. GeForce enabled AI to reach the masses, and now AI is coming home to GeForce. There are so many things that you can’t do without AI. Let me show you some of it now.
[Music] [Applause]
[Music] [Applause]
[Music]
That was real-time computer graphics. No computer graphics researcher, no computer scientist, would have told you that it is possible for us to ray-trace every single pixel at this point. Ray-tracing is a simulation of light. The amount of geometry that you saw was absolutely insane. It would have been impossible without artificial intelligence.
There are two fundamental things that we did. We used, of course, programmable shading and ray-traced acceleration to produce incredibly beautiful pixels. But then we have artificial intelligence be conditioned, be controlled, by that pixel to generate a whole bunch of other pixels. Not only is it able to generate pixels spatially because it’s aware of what the colors should be—it has been trained on a supercomputer back at NVIDIA—and so the neural network that’s running on the GPU can infer and predict the pixels that we did not render.
Not only can we do that—it’s called DLSS—the latest generation of DLSS also generates beyond frames. It can predict the future, generating three additional frames for every frame that we calculate. Think of it as four frames: we render one and generate three. Four frames at 4K resolution is about 33 million pixels. Out of those 33 million pixels, we compute only about 2 million. It is an absolute miracle that we can use programmable shaders and our ray-tracing engine to compute 2 million pixels and have AI predict the other 31 million.
As a result, we’re able to render at incredibly high performance, because AI does a lot less computation. It takes, of course, an enormous amount of training to produce that, but once you train it, the generation is extremely efficient. So this is one of the incredible capabilities of artificial intelligence, and that’s why there are so many amazing things that are happening.
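To make that arithmetic concrete, here is a rough back-of-the-envelope sketch of my own (not NVIDIA code) of the rendered-versus-generated pixel ratio just described:

```python
# Back-of-the-envelope illustration of DLSS frame generation as described above.
# Assumption: "4K" here means 3840 x 2160; the rendered frame itself is also
# upscaled from a lower internal resolution, which is why only ~2M pixels are computed.

RES_4K = 3840 * 2160          # ~8.3 million pixels per frame
rendered_frames = 1           # frames actually ray-traced / shaded
generated_frames = 3          # frames predicted by the neural network

total_pixels = RES_4K * (rendered_frames + generated_frames)   # ~33 million
computed_pixels = 2_000_000   # rough figure quoted in the keynote

print(f"total pixels:    {total_pixels / 1e6:.1f} M")
print(f"computed pixels: {computed_pixels / 1e6:.1f} M")
print(f"AI-generated:    {100 * (1 - computed_pixels / total_pixels):.0f}% of all pixels")
```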
We used GeForce to enable artificial intelligence, and now artificial intelligence is revolutionizing GeForce. Everyone, today we’re announcing our next generation: the RTX Blackwell family. Let’s take a look.
[Music]
Here it is: our brand-new GeForce RTX 50 Series, Blackwell architected. The GPU is just a beast: 92 billion transistors, 4,000 AI TOPS—four petaflops of AI, three times higher than the last generation, Ada—and we need all of it to generate those pixels that I showed you. 380 ray-tracing teraflops, so that for the pixels we do have to compute, we compute the most beautiful image we possibly can. And of course, 125 shader teraflops—there is actually a concurrent integer unit of equal performance, so dual shaders: one for floating point, one for integer.
GDDR7 (G7) memory from Micron: 1.8 terabytes per second, twice the performance of our last generation. And we now have the ability to intermix AI workloads with computer graphics workloads. One of the amazing things about this generation is that the programmable shader is also able to process neural networks. So the shader can carry these neural networks, and as a result we invented neural texture compression and neural material shading. You get these amazingly beautiful images that are only possible because we use AI to learn the texture, learn a compression algorithm, and as a result get extraordinary results.
Okay, so this is the brand-new RTX 5090. Even the mechanical design is a miracle. Look at this—it’s got two fans; this whole graphics card is just one giant fan. So the question is: where’s the graphics card? Is it literally this big? The voltage regulator design is state-of-the-art, an incredible design. The engineering team did a great job. So here it is. Thank you.
Okay, so those are the speeds and feeds. How does it compare? Well, this is the RTX 4090. I know, I know many of you have one. Look, it’s $1,599. It is one of the best investments you could possibly make. For $1,599, you bring it home to your $10,000 PC entertainment command center. Isn’t that right? Don’t tell me that’s not true. Don’t be ashamed—it’s liquid-cooled, fancy lights all over it, you lock it when you leave. It’s the modern home theater. It makes perfect sense. And now, for $1,599, you get to upgrade that and turbocharge the living daylights out of it.
Well, now, with the Blackwell family, RTX 5070: 4090 performance at $549. [Applause] Impossible without artificial intelligence, impossible without the four petaops of AI tensor cores, impossible without the G7 memories. Okay, so 5070, 4090 performance, $549. And here’s the whole family, starting from 5070 all the way up to 5090. 5090: twice the performance of a 4090, starting—of course we’re producing at very large scale—availability starting January.
Well, it is incredible, but we managed to put these gigantic-performance GPUs into a laptop. This is a 5070 laptop for $1,299, and it has 4090 performance. I think there’s one here somewhere; let me show you this.
[Sound of searching]
This is a… look at this thing here. Let me… here—there’s only so many pockets. Ladies and gentlemen, Janine! [Applause]
So can you imagine? You get this incredible graphics card here—Blackwell—and we’re going to shrink it and put it in there. Does that make any sense? Well, you can’t do that without artificial intelligence, and the reason is that we’re generating most of the pixels using our tensor cores. We ray-trace only the pixels we need and generate, using artificial intelligence, all the other pixels. As a result, the energy efficiency is just off the charts.
The future of computer graphics is neural rendering—the fusion of artificial intelligence and computer graphics. And what’s really amazing is… oh, here we go, thank you. This is a surprisingly kinetic keynote.
What’s really amazing is the family of GPUs we’re going to put in here. The 5090—the 5090—will fit into a thin laptop. That last laptop was 14.9 mm. There’s also a 5080, a 5070 Ti, and a 5070. Okay, so ladies and gentlemen, the RTX Blackwell family!
[Applause]
Artificial Intelligence Discussion:
Well, GeForce brought AI to the world, democratized AI. Now AI has come back and revolutionized GeForce. Let’s talk about artificial intelligence. Let’s go to somewhere else at NVIDIA—this is literally our office. This is literally NVIDIA’s headquarters, okay?
So let’s talk about AI. The industry is chasing and racing to scale artificial intelligence, and the scaling law is a powerful model. It’s an empirical law that has been observed and demonstrated by researchers and industry over several generations, and it says that the more training data you have, the larger the model, and the more compute you apply to it, the more effective or capable your model will become. And so the scaling law continues.
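For readers who want the scaling law as a formula: a commonly cited empirical form (from the published Chinchilla study, not an NVIDIA-specific result) models loss as a power law in parameter count and training tokens. A minimal sketch, with the published constants treated as illustrative:

```python
# A minimal sketch of an empirical scaling law: loss falls as a power law in
# model size N (parameters) and data D (training tokens). Constants roughly
# follow the published Chinchilla fit; treat them as illustrative only.

def scaling_law_loss(n_params: float, n_tokens: float,
                     e: float = 1.69, a: float = 406.4, b: float = 410.7,
                     alpha: float = 0.34, beta: float = 0.28) -> float:
    """Chinchilla-style loss estimate L(N, D) = E + A/N^alpha + B/D^beta."""
    return e + a / n_params**alpha + b / n_tokens**beta

# Bigger model + more data => lower (better) loss.
print(scaling_law_loss(7e9, 1.4e12))    # ~7B params, ~1.4T tokens
print(scaling_law_loss(70e9, 15e12))    # ~70B params, ~15T tokens
```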
What’s really amazing is that the internet is producing about twice the amount of data every single year as it did the year before. I think in the next couple of years, humanity will produce more data than all of humanity has produced since the beginning. And so we’re still producing a gigantic amount of data, and it’s becoming more multimodal: video and images and sound. All of that data could be used to train the fundamental knowledge, the foundational knowledge, of an AI.
But there are in fact two other scaling laws that have now emerged, and it’s somewhat intuitive. The second scaling law is post-training scaling law. Post-training scaling law uses technologies and techniques like reinforcement learning, human feedback—basically the AI produces and generates answers based on a human query, then the human, of course, gives feedback. It’s much more complicated than that, but the reinforcement learning system, with a fair number of very high-quality prompts, causes the AI to refine its skills. It could fine-tune its skills for particular domains—it could be better at solving math problems, better at reasoning, so on and so forth. It’s essentially like having a mentor or having a coach give you feedback after you’re done going to school.
And so you get tested, you get feedback, you improve yourself. We also have reinforcement learning AI feedback, and we have synthetic data generation. These techniques are rather akin to, if you will, self-practice. You know the answer to a particular problem, and you continue to try it until you get it right. So an AI could be presented with a very complicated and difficult problem that is verifiable functionally and has an answer that we understand—maybe proving a theorem, maybe solving a geometry problem. These problems would cause the AI to produce answers, and using reinforcement learning, it would learn how to improve itself. That’s called post-training. Post-training requires an enormous amount of computation, but the end result produces incredible models.
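Here is a toy sketch of my own of that “self-practice on verifiable problems” loop—generate an answer, check it against a known result, feed the outcome back. A real pipeline would use reinforcement-learning updates rather than the stand-in update below:

```python
# Minimal sketch of post-training on verifiable problems (simplified RL-style
# loop; a stand-in "update" replaces a real reinforcement-learning step).
import random

def generate_answer(model: dict, problem: dict) -> int:
    """Toy 'model': guesses a sum with a learned bias term."""
    return problem["a"] + problem["b"] + model["bias"]

def verify(problem: dict, answer: int) -> bool:
    """Functional verifier: the ground-truth answer is known."""
    return answer == problem["a"] + problem["b"]

model = {"bias": 3}  # starts out systematically wrong
problems = [{"a": random.randint(0, 9), "b": random.randint(0, 9)} for _ in range(200)]

for problem in problems:
    answer = generate_answer(model, problem)
    reward = 1 if verify(problem, answer) else -1
    if reward < 0:                                   # negative feedback nudges the model
        model["bias"] -= 1 if model["bias"] > 0 else 0

print(model)  # bias shrinks toward 0 as the model "practices"
```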
We now have a third scaling law, and this third scaling law has to do with what’s called test-time scaling. Test-time scaling is basically when you’re being used—when you’re using the AI—the AI has the ability to now apply a different resource allocation. Instead of improving its parameters, now it’s focused on deciding how much computation to use to produce the answers it wants to produce. Reasoning is a way of thinking about this—“long thinking” is a way to think about this. Instead of a direct inference or one-shot answer, you might reason about it; you might break down the problem into multiple steps, you might generate multiple ideas and evaluate… your AI system would evaluate which one of the ideas that you generated was the best one. Maybe it solves the problem step by step, so on and so forth.
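One simple concrete form of “generate multiple ideas and evaluate them” is best-of-N sampling. A minimal sketch, with hypothetical stand-ins for the model call and the scorer, to show where the extra test-time compute goes:

```python
# Minimal best-of-N sketch of test-time scaling: spend more compute at
# inference by sampling several candidate answers and keeping the best one.
# `sample_answer` and `score_answer` are hypothetical stand-ins for a real
# model call and a real verifier / reward model.
import random

def sample_answer(prompt: str) -> str:
    return f"candidate-{random.randint(0, 999)} for: {prompt}"

def score_answer(answer: str) -> float:
    return random.random()   # stand-in for a reward model or verifier

def best_of_n(prompt: str, n: int) -> str:
    candidates = [sample_answer(prompt) for _ in range(n)]   # n times the compute
    return max(candidates, key=score_answer)                 # keep the best idea

print(best_of_n("Prove the triangle inequality.", n=1))   # one-shot answer
print(best_of_n("Prove the triangle inequality.", n=16))  # "long thinking" budget
```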
So now test-time scaling has proven to be incredibly effective. You’re watching this sequence of technology, and all of these scaling laws emerge as we see incredible achievements, from ChatGPT to OpenAI o1 and o3, and now Gemini Pro. All of these systems are going through this journey step by step: pre-training, to post-training, to test-time scaling.
Well, the amount of computation that we need, of course, is incredible, and we would like, in fact, that society has the ability to scale the amount of computation to produce more and more novel and better intelligence. Intelligence, of course, is the most valuable asset that we have, and it can be applied to solve a lot of very challenging problems. So scaling law is driving enormous demand for NVIDIA computing. It’s driving an enormous demand for this incredible chip we call Blackwell.
Let’s take a look at Blackwell. Blackwell is in full production, and it is incredible what it looks like. Every single cloud service provider now has systems up and running. We have systems here from about 15 computer makers. It’s being made in about 200 different SKUs, 200 different configurations: liquid-cooled, air-cooled, x86, NVIDIA Grace CPU versions, NVLink 36×2, NVLink 72×1—a whole bunch of different types of systems so that we can accommodate just about every single data center in the world.
These systems are being currently manufactured in some 45 factories. It tells you how pervasive artificial intelligence is and how much the industry is jumping onto artificial intelligence in this new computing model. The reason why we’re driving it so hard is because we need a lot more computation, and it’s very clear… it’s very clear that… Janine, you know, it’s hard to tell—you don’t ever want to reach your hands into a dark place. Hang on a second, is this a good idea?
[Sound of moving something]
[Applause] [Music]
Wait for it, wait for it. I thought I was worthy. Apparently, Mjolnir didn’t think I was worthy. All right, this is my show and tell. So this NVLink system—this right here, this NVLink system—this is GB200, NVLink 72. It is one and a half tons, 600,000 parts, approximately equal to 20 cars. 120 kilowatts. It has a spine behind it that connects all of these GPUs together: two miles of copper cable, 5,000 cables. This is being manufactured in 45 factories around the world. We build them, we liquid-cool them, we test them, we disassemble them, shipping parts to the data centers—because it’s one and a half tons—we reassemble it outside the data centers and install them. The manufacturing is insane.
But the goal of all of this is that the scaling laws are driving computing so hard. At this level of computation, Blackwell, over our last generation, improves performance per watt by a factor of four and performance per dollar by a factor of three. That basically says that in one generation we reduce the cost of training these models by a factor of three—or, if you want to increase the size of your model by a factor of three, it’s about the same cost. But the important thing is this: these systems are generating tokens that are being used by all of us when we use ChatGPT, or when we use Gemini, or when we use our phones. In the future, just about all of these applications are going to be consuming AI tokens, and those tokens are being generated by these systems. Every single data center is limited by power, and so if the performance per watt of Blackwell is four times our last generation, then the revenue that can be generated—the amount of business that can be generated in the data center—is increased by a factor of four.
So these AI factory systems really are factories today. Now, the goal of all of this is so that we can create one giant chip. The amount of computation we need is really quite incredible, and this is basically one giant chip. If we had to build this as one chip—wait, sorry guys… you see, that’s cool? Look at that—disco lights in here, right? If we had to build this as one chip, obviously it would be the size of the wafer, but that doesn’t include the impact of yield—it would have to be probably three or four times the size. What we basically have here is 72 Blackwell GPUs, or 144 dies. This one chip is 1.4 exaFLOPS of AI floating-point performance; the world’s largest, fastest supercomputer—an entire room of a supercomputer—only recently achieved more than an exaFLOP. It has 14 terabytes of memory, but here’s the amazing thing: the memory bandwidth is 1.2 petabytes per second. That’s basically the entire internet traffic that’s happening right now—the entire world’s internet traffic is being processed across these chips.
And we have 130 trillion transistors in total, 2,592 CPU cores, and a whole bunch of networking. I wish I could do this—I don’t think I will. So these are the Blackwells, these are our ConnectX networking chips, these are the NVLink… and we’re trying to depict the NVLink spine here, but that’s not possible. And these are all of the HBM memories—14 terabytes of HBM memory. This is what we’re trying to do, and this is the miracle of the Blackwell system. The Blackwell die right here is the largest single chip the world’s ever made. Yet the miracle is more than that—this is the Grace Blackwell system.
Well, the goal of all of this, of course, is so that we can… [Exhales] Thank you. Thanks. Boy, is there a chair I could sit down in for a second? Can I have a Michelob Ultra? How is it possible that we’re in the Michelob Ultra Arena and we don’t have one? It’s like coming to NVIDIA and we don’t have a GPU for you.
So, we need enormous computation because we want to train larger and larger models, and inference used to be a single pass. But in the future, the AI is going to be talking to itself. It’s going to be thinking, internally reflecting, processing. Today, when tokens are being generated for you, as long as they come out at 20 or 30 tokens per second, that’s basically as fast as anybody can read. However, in the future—and right now with Gemini Pro and the new OpenAI o1 and o3 models—the models are talking to themselves, reflecting, thinking.
So as you can imagine, the rate at which the tokens could be ingested is incredibly high, and so we need the token rates, the token generation rates, to go way up, and we also have to drive the cost way down simultaneously so that the quality of service can be extraordinary, and the cost to customers can continue to be low, and we’ll continue to scale. That’s the fundamental purpose, the reason why we created NVLink.
Well, one of the most important things that’s happening in the world of enterprise is agentic AI. Agentic AI is basically a perfect example of test-time scaling. An agentic AI is a system of models: some of it is understanding and interacting with the customer, with the user; some of it is retrieving information from storage, a semantic AI system like RAG. Maybe it’s going onto the internet, maybe it’s studying a PDF file. It might be using tools, it might be using a calculator, and it might be using generative AI to generate charts and such. It’s taking the problem you gave it, breaking it down step by step, and iterating through all these different models.
Well, in order for an AI to respond to a customer in the future—it used to be that you ask a question and it just starts spewing out an answer. In the future, you ask a question, and a whole bunch of models are going to be working in the background. So with test-time scaling, the amount of computation used for inference is going to go through the roof. It’s going to go through the roof because we want better and better answers.
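To make that pattern concrete, here is a minimal agent-loop sketch of my own—plan a step, call a tool (retrieval, calculator, generator), record the observation, repeat. Every helper is a hypothetical placeholder, not an NVIDIA API:

```python
# Minimal agentic loop: plan -> pick a tool (retrieval, calculator, generator)
# -> observe -> repeat. Every helper here is a hypothetical placeholder.
from typing import Callable

def plan_next_step(mission: str, history: list[str]) -> str:
    # Stand-in for an LLM planning call: finish after three steps.
    steps = ["retrieve", "calculate", "report"]
    return steps[len(history)] if len(history) < len(steps) else "done"

TOOLS: dict[str, Callable[[str], str]] = {
    "retrieve":  lambda m: f"retrieved documents about '{m}'",    # e.g. a RAG query
    "calculate": lambda m: "computed the requested figures",      # e.g. a calculator tool
    "report":    lambda m: "drafted a summary chart and answer",  # e.g. a generative model
}

def run_agent(mission: str) -> list[str]:
    history: list[str] = []
    while (step := plan_next_step(mission, history)) != "done":
        history.append(f"{step}: {TOOLS[step](mission)}")   # test-time compute grows per step
    return history

for line in run_agent("Summarize Q4 revenue drivers"):
    print(line)
```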
Well, to help the industry build agentic AI, our go-to-market is not direct to enterprise customers. Our go-to-market is to work with software developers and the IT ecosystem to integrate our technology and make new capabilities possible, just like we did with CUDA libraries; we now want to do that with AI libraries. And just as the computing model of the past had APIs for computer graphics or linear algebra or fluid dynamics, in the future, on top of those CUDA acceleration libraries, there will be AI libraries.
We’ve created three things for helping the ecosystem build agentic AI:
- NVIDIA NIMs, which are essentially AI microservices, all packaged up. It takes all of this really complicated CUDA software—cuDNN, CUTLASS, TensorRT-LLM, Triton, all these different really complicated pieces of software—and the model itself; we package it up, we optimize it, we put it into a container, and you can take it wherever you like. So we have models for vision, for understanding language, for speech, for animation, for digital biology, and we have some new, exciting models coming for physical AI. These AI models run in every single cloud because NVIDIA’s GPUs are now available in every single cloud, and they’re available from every single OEM. So you could literally take these models, integrate them into your software packages, create AI agents that run on Cadence, or ServiceNow agents, or SAP agents, and deploy them to your customers and run them wherever the customers want to run the software. (A minimal call sketch follows after this list.)
- The next layer is what we call NVIDIA NeMo. NeMo is essentially a digital employee onboarding and training evaluation system. In the future, these AI agents are essentially digital workforce that are working alongside your employees, doing things for you on your behalf. And so the way that you would bring these specialized agents into your company is to onboard them just like you onboard an employee. So we have different libraries that help these AI agents be trained for the type of language in your company—maybe the vocabulary is unique to your company, the business process is different, the way you work is different. So you would give them examples of what the work product should look like, and they would try to generate, and you would give feedback, and then you would evaluate them, so on and so forth. And you would guardrail them. You’d say, “These are the things that you’re not allowed to do. These are things you’re not allowed to say. And we even give them access to certain information.” Okay, so that entire pipeline—a digital employee pipeline—is called NeMo. In a lot of ways, the IT department of every company is going to be the HR department of AI agents in the future. Today, they manage and maintain a bunch of software from the IT industry. In the future, they will maintain, nurture, onboard, and improve a whole bunch of digital agents and provision them to the companies to use.
- On top of that, we provide a whole bunch of blueprints that our ecosystem could take advantage of. All of this is completely open source, so you could take it and modify the blueprints. We have blueprints for all kinds of different types of agents.
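As referenced in the NIM item above, here is a minimal, hedged sketch of what calling a deployed NIM language-model microservice can look like. Hosted and containerized NIMs generally expose an OpenAI-compatible chat endpoint, but the URL, model name, and key below are placeholders you would replace with values from the service’s own documentation:

```python
# Hedged sketch: calling a NIM language-model microservice over its
# OpenAI-compatible chat endpoint. URL, model name, and key are placeholders;
# check the specific NIM's documentation for the exact values.
import os
import requests

NIM_URL = os.environ.get("NIM_URL", "http://localhost:8000/v1/chat/completions")
API_KEY = os.environ.get("NIM_API_KEY", "")   # typically not needed for a local container

payload = {
    "model": "your-deployed-model-name",      # placeholder model identifier
    "messages": [{"role": "user", "content": "Summarize our onboarding policy."}],
    "max_tokens": 256,
}
headers = {"Authorization": f"Bearer {API_KEY}"} if API_KEY else {}

response = requests.post(NIM_URL, json=payload, headers=headers, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```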
Well, today we’re also announcing that we’re doing something that’s really cool and, I think, really clever: we’re announcing a whole family of models based on Llama—the NVIDIA Llama Nemotron language foundation models. Llama 3.1 is a complete phenomenon. Llama 3.1 has been downloaded from Meta something like 350,000 to 650,000 times, and it has been derived and fine-tuned into about 60,000 other models. It is singularly the reason why just about every single enterprise and every single industry has been activated to start working on AI.
Well, the thing that we did was we realized that the Llama models could be better fine-tuned for enterprise use, and so we fine-tuned them, using our expertise and our capabilities, and turned them into the Llama Nemotron suite of open models. There are small ones that respond very, very fast—extremely small. There are what we call Llama Nemotron Supers—basically the mainstream versions of the models—and the Ultra model. The Ultra model could be used as a teacher model for a whole bunch of other models: it could be a reward model, an evaluator, a judge for other models, looking at answers and deciding whether they’re good or not—basically giving feedback to other models. It could be distilled in a lot of different ways—basically a teacher model, a knowledge-distillation model. Very large, very capable.
And so all of this is now available online. These models are incredible. They’re #1 on leaderboards—the chat leaderboard, the instruction-following leaderboard, retrieval—the different types of functionality necessary for AI agents around the world. These are going to be incredible models for you.
We’re also working with the ecosystem. All of our NVIDIA AI technologies are integrated into the IT industry. We have great partners and really great work being done at ServiceNow, at SAP, at Siemens for industrial AI. Cadence is doing great work. Synopsys is doing great work. I’m really proud of the work that we do with Perplexity—as you know, they revolutionized search. Really fantastic stuff. Codium—for every software engineer in the world, software coding is going to be the next giant AI application, the next giant AI service, period. There are thirty million software engineers around the world, and everybody is going to have a software assistant helping them code. If not, obviously, you’re just going to be way less productive and write lower-quality code.
So that’s 30 million software engineers. There are a billion knowledge workers in the world. It is very, very clear that AI agents are probably the next robotics industry, and likely a multi-trillion-dollar opportunity.
Well, let me show you some of the blueprints that we’ve created and some of the work that we’ve done with our partners, with these AI agents.
[Video begins]
AI agents are the new digital workforce, working for and with us. AI agents are a system of models that reason about a mission, break it down into tasks, and retrieve data or use tools to generate a quality response. NVIDIA’s agentic AI building blocks—NIM, pre-trained models, and the NeMo framework—let organizations easily develop AI agents and deploy them anywhere. We will onboard and train our agentic workforces on our company’s methods, like we do for employees. AI agents are domain-specific task experts.
Let me show you four examples for the billions of knowledge workers and students. AI research assistant agents ingest complex documents like lectures, journals, and financial results, and generate interactive podcasts for easy learning. By combining a U-Net regression model with a diffusion model, CorrDiff can downscale global weather forecasts from 25 km to 2 km resolution.
Developers like at NVIDIA manage software security—AI agents that continuously scan software for vulnerabilities, alerting developers to what action is needed. Virtual Lab AI agents help researchers design and screen billions of compounds to find promising drug candidates faster than ever.
NVIDIA analytics AI agents, built on an NVIDIA Metropolis blueprint including NVIDIA COSMOS Nemotron vision-language models, Llama Nemotron LLMs, and NeMo Retriever… Metropolis agents analyze content from the billions of cameras generating 100,000 petabytes of video per day. They enable interactive search, summarization, and automated reporting, and help monitor traffic flows, flagging congestion or danger. In industrial facilities, they monitor processes and generate recommendations for improvement. Metropolis agents centralize data from hundreds of cameras and can reroute workers or robots when incidents occur.
The age of agentic AI is here, for every organization.
[Video ends]
Okay, that was the first pitch at a baseball. That was not generated—I just felt that none of you were impressed.
AI at Home, AI on PC Discussion:
AI was created in the cloud and for the cloud—and for enjoying AI on phones, of course, it’s perfect. Very, very soon, we’re going to have a continuous AI that’s going to be with you, and when you use those Meta glasses you can, of course, point at something, look at something, and ask it whatever information you want. So AI is perfect in the cloud—it was created in the cloud. However, we would love to be able to take that AI everywhere.
I’ve mentioned already that you can take NVIDIA AI to any cloud, but you can also put it inside your company. But the thing that we want to do more than anything is put it on our PC as well. And so, as you know, Windows 95 revolutionized the computer industry. It made possible this new suite of multimedia services, and it changed the way that applications were created forever.
Windows 95—this model of computing, of course, is not perfect for AI. And so, the thing that we would like to do is we would like to have, in the future, your AI basically become your AI assistant. And instead of just the 3D APIs and the sound APIs and the video API, you would have generative APIs: generative APIs for 3D, for language, for sound, and so on. We need a system that makes that possible while leveraging the massive investment that’s in the cloud. There’s no way that the world can create yet another way of programming AI models—it’s just not going to happen.
So if we could figure out a way to make Windows PC a world-class AI PC… it would be completely awesome. It turns out the answer is Windows—it’s Windows WSL 2, Windows WSL 2. Windows WSL 2 basically is two operating systems within one. It works perfectly. It’s developed for developers, and it’s developed so that you can have access to bare metal. It’s been optimized for cloud-native applications, it is optimized for—and very importantly, it’s been optimized for CUDA.
So WSL 2 supports CUDA perfectly out of the box. As a result, everything that I showed you with NVIDIA NIM, NVIDIA NeMo, the blueprints that we develop that are going to be up in ai.nvidia.com, so long as the computer fits it—so long as you can fit that model, and we’re going to have many models that fit, whether it’s vision models or language models or speech models or these animation, digital human models, all kinds of different types of models—are going to be perfect for your PC. You would download it, and it should just run.
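As a quick way to sanity-check the “CUDA out of the box” claim from inside a WSL 2 distribution, here is a small check of my own; it assumes a CUDA-enabled PyTorch build is installed, which is an addition on my part rather than part of the stack described above:

```python
# Quick check from inside WSL 2 that the GPU and CUDA are visible to Python.
# Assumes a CUDA-enabled PyTorch build is installed in this environment.
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU:", torch.cuda.get_device_name(0))
    # Tiny smoke test: a matrix multiply on the GPU.
    x = torch.randn(1024, 1024, device=device)
    print("matmul OK, result norm:", (x @ x).norm().item())
else:
    print("CUDA not visible - check the NVIDIA driver and WSL 2 GPU support.")
```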
So our focus is to turn Windows WSL 2—the Windows PC—into a first-class target platform that we will support and maintain for as long as we shall live. This is an incredible thing for engineers and developers everywhere. Let me show you something we can do with that. This is one of the examples of a blueprint we just made for you:
[Video begins]
Generative AI synthesizes amazing images from simple text prompts, yet image composition can be challenging to control using only words. With NVIDIA NIM microservices, creators can use simple 3D objects to guide AI image generation. Let’s see how a concept artist can use this technology to develop the look of a scene. They start by laying out 3D assets, created by hand or generated with AI, then use an image-generation NIM, such as Flux, to create a visual that adheres to the 3D scene. Add or move objects to refine the composition, change camera angles to frame the perfect shot, or reimagine the whole scene with a new prompt. Assisted by generative AI and NVIDIA NIM, an artist can quickly realize their vision.
[Video ends]
NVIDIA AI for your PCs—hundreds of millions of PCs in the world with Windows—and so we can get them ready for AI. OEMs, all the PC OEMs we work with—basically all of the world’s leading PC OEMs—are going to get their PCs ready for this stack, and so AI PCs are coming to a home near you. Linux is good.
Physical AI Discussion:
Let’s talk about physical AI. Speaking of Linux, let’s talk about physical AI. Physical AI—imagine, whereas your large language model… you give it your context, your prompt on the left, and it generates tokens one at a time to produce the output. That’s basically how it works. The amazing thing is this model in the middle is quite large, has billions of parameters. The context length is incredibly large, because you might decide to load in a PDF. In my case, I might load in several PDFs before I ask it a question. Those PDFs are turned into tokens; the basic attention characteristic of a transformer has every single token find its relationship and relevance against every other token. So you could have hundreds of thousands of tokens, and the computational load increases quadratically. It does this with all of the parameters, all of the input sequence, process it through every single layer of the transformer, and it produces one token. That’s the reason why we needed Blackwell.
And then the next token is produced. When the current token is done, it puts the current token into the input sequence and takes that whole thing and generates the next token. It does it one at a time—that is the transformer model, and it’s the reason why it is so, so incredibly effective (and computationally demanding).
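Here is a minimal sketch of that one-token-at-a-time loop, generic to any autoregressive transformer; `model_forward` is a hypothetical stand-in for the full pass over every layer and every parameter:

```python
# Minimal autoregressive decoding loop: the model sees the whole input
# sequence, produces one next token, appends it, and repeats.
# `model_forward` is a hypothetical stand-in for a transformer forward pass;
# the attention inside it is what scales quadratically with sequence length.
import random

def model_forward(tokens: list[int]) -> int:
    """Return the id of the next token given the full context (placeholder)."""
    return random.randint(0, 49_999)           # pretend vocabulary of 50k tokens

def generate(prompt_tokens: list[int], max_new_tokens: int, eos_id: int = 0) -> list[int]:
    tokens = list(prompt_tokens)               # context: e.g. the tokenized PDFs + question
    for _ in range(max_new_tokens):
        next_token = model_forward(tokens)     # every layer, every parameter, full context
        tokens.append(next_token)              # the new token joins the input sequence
        if next_token == eos_id:               # stop when the model emits end-of-sequence
            break
    return tokens

print(len(generate(prompt_tokens=[12, 345, 678], max_new_tokens=20)))
```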
What if, instead of PDFs, it’s your surroundings? And what if, instead of a prompt in the form of a question, it’s a request: “Go over there, pick up that box, and bring it back”? And instead of producing text tokens, it produces action tokens. What I just described is a very sensible picture of the future of robotics, and the technology is right around the corner.
But what we need to do is we need to create effectively the “world model,” as opposed to GPT, which is a language model. This world model has to understand the language of the world—it has to understand physical dynamics, things like gravity and friction and inertia. It has to understand geometric and spatial relationships. It has to understand cause and effect: if you drop something, it falls to the ground; if you poke at it, it tips over. It has to understand object permanence: if you roll a ball over the kitchen counter, when it goes off the other side, the ball didn’t leap into another quantum universe. It’s still there.
And so all of these types of understanding are intuitive understanding that we know most models today have a very hard time with. So we would like to create a world… we need a world foundation model. Today, we’re announcing a very big thing: NVIDIA COSMOS, a world foundation model that was created to understand the physical world. And the only way for you to really understand this is to see it. Let’s play.
[Video begins]
The next frontier of AI is physical AI. Model performance is directly related to data availability, but physical world data is costly to capture, curate, and label. NVIDIA COSMOS is a world foundation model development platform to advance physical AI. It includes autoregressive world foundation models, diffusion-based world foundation models, advanced tokenizers, and an NVIDIA CUDA- and AI-accelerated data pipeline.
COSMOS models ingest text, image, or video prompts and generate virtual world states as videos. COSMOS generations prioritize the unique requirements of AV and robotics use cases, like real-world environments, lighting, and object permanence. Developers use NVIDIA Omniverse to build physics-based, geospatially accurate scenarios, then output Omniverse renders into COSMOS, which generates photoreal, physically-based synthetic data—whether diverse objects or environments, conditions like weather or time of day, or edge-case scenarios.
Developers use COSMOS to generate worlds for reinforcement learning AI feedback to improve policy models or to test and validate model performance, even across multi-sensor views. COSMOS can generate tokens in real time, bringing the power of foresight and multiverse simulation to AI models, generating every possible future to help the model select the right path. Working with the world’s developer ecosystem, NVIDIA is helping advance the next wave of physical AI.
[Video ends]
NVIDIA COSMOS! NVIDIA COSMOS—the world’s first world foundation model. It is trained on 20 million hours of video—20 million hours focused on physical, dynamic things: dynamic nature, humans walking, hands moving, manipulating things, fast camera movements. It’s really about teaching the AI not to generate creative content but to understand the physical world.
From this physical AI foundation, there are many downstream things we can do. We can do synthetic data generation to train models; we can distill it and turn it into, effectively, the seed—the beginnings—of a robotics model. You could have it generate multiple physically based, physically plausible scenarios of the future—basically do a “Doctor Strange.” Because this model understands the physical world—you saw a whole bunch of generated video—it can also do captioning. It could take videos and caption them incredibly well, and that captioning and video could be used to train large language models, multimodal large language models. So you could use this foundation model to train robots as well as larger language models.
So this is the NVIDIA COSMOS platform. It has an autoregressive model for real-time applications, has a diffusion model for very high-quality image generation, an incredible tokenizer basically learning the vocabulary of the real world, and a data pipeline so that if you would like to take all of this and then train it on your own data, this data pipeline (because there’s so much data involved) we’ve accelerated everything end to end for you. This is the world’s first data-processing pipeline that’s CUDA-accelerated as well as AI-accelerated.
All of this is part of the COSMOS platform, and today we’re announcing that COSMOS is openly licensed—it’s open and available on GitHub. There are small, medium, and large versions: very fast models, mainstream models, and teacher models, basically knowledge-transfer models. We really hope this moment—the COSMOS world foundation model being open—will do for the world of robotics and industrial AI what Llama 3 has done for enterprise AI.
The magic happens when you connect COSMOS to Omniverse. And the reason fundamentally is this: Omniverse is a physics-grounded, not just physically grounded but physics-grounded—it’s algorithmic, physics-principled, simulation-grounded system. It’s a simulator. When you connect that to COSMOS, it provides the grounding, the ground truth, that can control and condition the COSMOS generation. As a result, what comes out of COSMOS is grounded on truth. This is exactly the same idea as connecting a large language model to a RAG, a retrieval-augmented generation system—you want to ground the AI generation on ground truth.
And so the combination of the two gives you a physically simulated, physically grounded multiverse generator. The use cases are really quite exciting. For robotics and industrial applications, it is very, very clear: Omniverse plus COSMOS represents the third computer that’s necessary for building robotic systems. Every robotics company will ultimately have to build three computers. A robotics system could be a factory, a robotics system could be a car, it could be a robot. You need three fundamental computers.
One computer, of course, to train the AI—we call it the DGX computer to train the AI. Another, of course, when you’re done, to deploy the AI—we call that AGX. That’s inside the car, in the robot, or in an AMR, or in a stadium, or whatever it is. These computers are at the edge, and they’re autonomous. But to connect the two, you need a digital twin. And this is all the simulations that you were seeing—the digital twin is where the AI that has been trained goes to practice, to be refined, to do its synthetic data generation, reinforcement learning, AI feedback, and such. So it’s the digital twin of the AI.
These three computers are going to be working interactively. NVIDIA’s strategy for the industrial world—and we’ve been talking about this for some time—is this three-computer system. Instead of a three-body problem, we have a three-computer solution. It’s the NVIDIA robotics.
Let me give you three examples, all right? The first example is how we apply all of this to industrial digitalization. There are millions of factories, hundreds of thousands of warehouses—that’s basically the backbone of a $50 trillion manufacturing industry. All of that has to become software-defined. All of that has to have automation in the future, and all of it will be infused with robotics.
We’re partnering with Kion, the world’s leading warehouse automation solutions provider, and Accenture, the world’s largest professional services provider—and they have a big focus in digital manufacturing—and we’re working together to create something that’s really special. I’ll show you that in a second. But our go-to-market is essentially the same as all of the other software platforms and technologies we have: through the developers and ecosystem partners. We have just a growing number of ecosystem partners connecting to Omniverse. The reason for that is very clear: everybody wants to digitalize the future of industries. There’s so much waste, so much opportunity for automation, in that $50 trillion of the world’s GDP.
Let’s take a look at that, this one example that we’re doing with Kion and Accenture.
[Video begins]
Kion, the supply-chain solution company; Accenture, a global leader in professional services; and NVIDIA are bringing physical AI to the $1 trillion warehouse and distribution center market. Managing high-performance warehouse logistics involves navigating a complex web of decisions influenced by constantly shifting variables. These include daily and seasonal demand changes, space constraints, workforce availability, and the integration of diverse robotic and automated systems. Predicting operational KPIs of a physical warehouse is nearly impossible today.
To tackle these challenges, Kion is adopting MEGA, an NVIDIA Omniverse blueprint for building industrial digital twins to test and optimize robotic fleets. First, Kion’s warehouse management solution assigns tasks to the industrial AI brains in the digital twin, such as moving a load from a buffer location to a shuttle storage solution. The robots’ brains are in a simulation of a physical warehouse digitalized into Omniverse, using OpenUSD connectors to aggregate CAD, video and image, 3D LiDAR, point cloud, and AI-generated data.
The fleet of robots executes tasks by perceiving and reasoning about their Omniverse digital twin environment, planning their next motion and acting. The robot brains can see the resulting state through sensor simulations and decide their next action. The loop continues while MEGA precisely tracks the state of everything in the digital twin.
Now Kion can simulate infinite scenarios at scale while measuring operational KPIs such as throughput, efficiency, and utilization—all before deploying changes to the physical warehouse. Together with NVIDIA, Kion and Accenture are reinventing industrial autonomy.
[Video ends]
Is that incredible? Everything is in simulation. In the future, every factory will have a digital twin, and that digital twin operates exactly like the real factory. In fact, you can use Omniverse with COSMOS to generate a whole bunch of future scenarios, and then an AI decides which of the scenarios is the most optimal for whatever KPIs you care about. That becomes the programming constraints—the program, if you will—for the AI that will be deployed into the real factories.
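As a rough illustration of that “generate many futures, score them against KPIs, deploy the best” loop, here is a toy sketch of my own; the simulator and the KPI function are placeholders, not the MEGA blueprint itself:

```python
# Sketch: evaluate many simulated warehouse scenarios and pick the one with
# the best KPI. `simulate_scenario` stands in for an Omniverse/COSMOS digital
# twin run; `throughput_kpi` stands in for whatever KPI the operator cares about.
import random

def simulate_scenario(seed: int) -> dict:
    random.seed(seed)
    return {
        "seed": seed,
        "throughput": random.uniform(800, 1200),   # picks per hour (simulated)
        "utilization": random.uniform(0.6, 0.95),  # robot fleet utilization (simulated)
    }

def throughput_kpi(result: dict) -> float:
    # Weighted KPI: favor throughput, lightly reward utilization.
    return result["throughput"] + 100 * result["utilization"]

candidates = [simulate_scenario(seed) for seed in range(1000)]   # many possible futures
best = max(candidates, key=throughput_kpi)                       # the plan to deploy
print(best)
```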
Autonomous Vehicles Discussion:
The next example: autonomous vehicles. The AV revolution has arrived. After so many years, Waymo’s success and Tesla’s success—it is very, very clear autonomous vehicles have finally arrived. Our offering to this industry is the three computers: the training systems (training the AIs), the simulation systems, and the synthetic data-generation systems (Omniverse and now COSMOS), and also the computer that’s inside the car. Each car company might work with us in a different way—use one, or two, or three of the computers.
We’re working with just about every major car company around the world: Waymo and Zoox, and Tesla of course in their data center; BYD, the largest EV company in the world; JLR has a really cool car coming; Mercedes has a fleet of cars coming with NVIDIA, starting this year and going to production. And I’m super, super pleased to announce that today Toyota and NVIDIA are going to partner together to create their next-generation AVs. There are just so many cool companies—Lucid and Rivian and Xpeng, and of course Volvo—so many different companies. Waabi is building self-driving trucks, and Aurora—we announced this week that Aurora is going to use NVIDIA to build autonomous, self-driving trucks.
A hundred million cars are built each year, there are a billion vehicles on the road all over the world, and a trillion miles are driven around the world each year—and all of that is going to be either highly autonomous or fully autonomous. So this is going to be a very large industry. I predict that this will likely be the first multi-trillion-dollar robotics industry. For us, with just a few of these cars starting to ramp into the world, this business is already $4 billion, and this year probably on a run rate of about $5 billion—a really significant business already. This is going to be very large.
Today, we’re announcing that our next-generation processor for the car—our next-generation computer for the car—is called Thor. I have one right here. Hang on a second.
[Sound of retrieving item]
Okay, this is Thor. This is Thor. This is a robotics computer. It takes sensors—just a massive amount of sensor information: 18 cameras, high-resolution radars, LiDARs—they’re all coming into this chip, and this chip has to process all that sensor data, turn it into tokens, put the tokens into a transformer, and predict the next path. This AV computer is now in full production. Thor has 20 times the processing capability of our last generation, Orin, which is really the standard of autonomous vehicles today. So this is just really quite incredible.
Thor is in full production. This robotics processor, by the way, also goes into a full robot. So it could be an AMR, it could be a humanoid robot—it could be the brain, it could be the manipulator. This processor basically is a universal robotics computer.
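To make that sensor-to-path description a little more concrete, here is a heavily simplified sketch of my own of that pipeline shape; every function is a hypothetical placeholder, not DRIVE or Thor software:

```python
# Sketch of the AV pipeline described above: sensor data -> tokens ->
# transformer -> predicted trajectory. All functions are hypothetical
# placeholders for the real perception / planning stack.
from typing import Sequence

def tokenize_sensors(cameras: Sequence[bytes], radar: bytes, lidar: bytes) -> list[int]:
    """Turn raw sensor frames into a token sequence (placeholder)."""
    return list(range(len(cameras) + 2))        # say, one token per sensor stream

def transformer_policy(tokens: list[int]) -> list[tuple[float, float]]:
    """Predict the next few waypoints (x, y) from the token sequence (placeholder)."""
    return [(0.0, float(step)) for step in range(1, 6)]   # drive straight ahead, 5 steps

camera_frames = [b""] * 18                       # 18 cameras, as in the keynote
trajectory = transformer_policy(tokenize_sensors(camera_frames, b"", b""))
print(trajectory)
```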
The second part of our DRIVE system that I’m incredibly proud of is the dedication to safety. DRIVE OS, I’m pleased to announce, is now the first software-defined, programmable AI computer that has been certified up to ASIL-D, which is the highest standard of functional safety for automobiles—the only and the highest. So I’m really, really proud of this. ASIL—ISO 26262—it is the work of some 15,000 engineering years. This is just extraordinary work, and as a result of that, CUDA is now a functionally safe computer. And so if you’re building a robot, NVIDIA CUDA—yay!
So now I told you I was going to show you what we would use Omniverse and COSMOS to do in the context of self-driving cars. And you know, today, instead of showing you a whole bunch of videos of cars driving on the road—I’ll show you some of that too—but I want to show you how we use the car to reconstruct digital twins automatically using AI, and use that capability to train future AV models. Let’s play it.
[Video begins]
The autonomous vehicle revolution is here. Building autonomous vehicles, like all robots, requires three computers: NVIDIA DGX to train AI models, Omniverse to test drive and generate synthetic data, and DRIVE AGX, a supercomputer in the car.
Building safe autonomous vehicles means addressing edge scenarios, but real-world data is limited, so synthetic data is essential for training. The autonomous vehicle data factory, powered by NVIDIA Omniverse, AI models, and COSMOS, generates synthetic driving scenarios that enhance training data by orders of magnitude. First, OmniMap fuses map and geospatial data to construct drivable 3D environments. Driving scenario variations can be generated from replayed drive logs or AI traffic generators. Next, a neural reconstruction engine uses autonomous vehicle sensor logs to create high-fidelity 4D simulation environments. It replays previous drives in 3D and generates scenario variations to amplify training data.
Finally, Edify 3DS automatically searches through existing asset libraries or generates new assets to create sim-ready scenes. The Omniverse scenarios are used to condition COSMOS to generate massive amounts of photorealistic data, reducing the sim-to-real gap. With text prompts, generate near-infinite variations of the driving scenario. With COSMOS Nemotron video search, the massively scaled synthetic dataset, combined with recorded drives, can be curated to train models. NVIDIA’s AI data factory scales hundreds of drives into billions of effective miles, setting the standard for safe and advanced autonomous driving.
[Video ends]
Is that incredible? We take thousands of drives and turn them into billions of miles. We are going to have mountains of training data for autonomous vehicles. Of course, we still need actual cars on the road, and we will continuously collect data for as long as we shall live. However, synthetic data generation using this multiverse, physically based, physically grounded capability lets us generate training data that is physically grounded, accurate, and plausible, so that we have an enormous amount of data to train with.
The AV industry is here. This is an incredibly exciting time. Super, super, super excited about the next several years. I think you’re going to see, just as computer graphics was revolutionized at such an incredible pace, you’re going to see the pace of AV development increasing tremendously over the next several years.
General Robotics / Humanoid Robots Discussion:
I think the next part is robotics. Human(oid) robots. My friends…
[Applause]
…the ChatGPT moment for general robotics is just around the corner. In fact, all of the enabling technologies that I’ve been talking about are going to make it possible for us, in the next several years, to see very rapid breakthroughs—surprising breakthroughs—in general robotics.
Now, the reason why general robotics is so important is that whereas robots with tracks and wheels require special environments to accommodate them, there are three robots in the world that we can make that require no greenfields—brownfield adaptation is perfect. If we can possibly build these amazing robots, we can deploy them in exactly the world that we’ve built for ourselves. These three robots are: (1) agentic robots (agentic AI), because they’re information workers—so long as they can accommodate the computers that we have in our offices, it’s going to be great; (2) self-driving cars, because we spent 100-plus years building roads and cities; and (3) humanoid robots.
If we have the technology to solve these three, this will be the largest technology industry the world’s ever seen. So we think that robotics era is just around the corner. The critical capability is how to train these robots. In the case of humanoid robots, the imitation information is rather hard to collect. And the reason for that is, in the case of a car, you just drive it—we’re driving cars all the time. In the case of these humanoid robots, the imitation information, the human demonstration, is rather laborious to do.
So we need to come up with a clever way to take hundreds or thousands of human demonstrations and somehow use artificial intelligence and Omniverse to synthetically generate millions of motions, and from those motions the AI can learn how to perform a task. Let me show you how that’s done.
[Video begins]
Developers around the world are building the next wave of physical AI—embodied robots, humanoids. Developing general-purpose robot models requires massive amounts of real-world data, which is costly to capture and curate. NVIDIA Isaac G.R.O.O.T. helps tackle these challenges, providing humanoid robot developers with four things: robot foundation models, data pipelines, simulation frameworks, and a Thor robotics computer.
The NVIDIA Isaac G.R.O.O.T. blueprint for synthetic motion generation is a simulation workflow for imitation learning, enabling developers to generate exponentially large datasets from a small number of demonstrations. First, G.R.O.O.T. TeleOp enables skilled human workers to “portal” into a digital twin of their robot using the Apple Vision Pro. This means operators can capture data even without a physical robot, and they can operate the robot in a risk-free environment, eliminating the chance of physical damage or wear and tear.
To teach a robot a single task, operators capture motion trajectories through a handful of teleoperated demonstrations. Then use G.R.O.O.T. Mimic to multiply these trajectories into a much larger dataset. Next, they use G.R.O.O.T. Gen, built on Omniverse and COSMOS, for domain randomization and 3D-to-real upscaling, generating an exponentially larger dataset. The Omniverse and COSMOS multiverse simulation engine provides a massively scaled dataset to train the robot policy.
Once the policy is trained, developers can perform software-in-the-loop testing and validation in Isaac Sim before deploying to the real robot. The age of general robotics is arriving, powered by NVIDIA Isaac G.R.O.O.T.
[Video ends]
We’re going to have mountains of data to train robots with—NVIDIA Isaac G.R.O.O.T. This is our platform to provide technology elements to the robotics industry, to accelerate the development of general robotics.
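To illustrate the multiplication step in that workflow—a handful of teleoperated demonstrations expanded into a much larger synthetic set—here is a toy sketch of my own. Real systems use domain randomization and simulation rather than simple noise, so treat this purely as an illustration of the idea:

```python
# Sketch: expand a few recorded demonstration trajectories into a much larger
# synthetic dataset by random perturbation (a stand-in for domain randomization
# and variation generation in simulation).
import random

def perturb(trajectory: list[tuple[float, float, float]],
            noise: float = 0.01) -> list[tuple[float, float, float]]:
    """Jitter each (x, y, z) waypoint slightly to create a new plausible variant."""
    return [(x + random.gauss(0, noise),
             y + random.gauss(0, noise),
             z + random.gauss(0, noise)) for x, y, z in trajectory]

# A "handful" of human demonstrations (here: 5 toy trajectories of 50 waypoints).
demos = [[(i * 0.01, 0.0, 0.5) for i in range(50)] for _ in range(5)]

VARIANTS_PER_DEMO = 1_000   # scale this up (and add real randomization) in practice
synthetic_dataset = [perturb(demo) for demo in demos for _ in range(VARIANTS_PER_DEMO)]
print(len(demos), "demos ->", len(synthetic_dataset), "synthetic trajectories")
```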
Well, I have one more thing that I want to show you. None of this would be possible if not for an incredible project that we started about a decade ago inside the company, called Project DIGITS: Deep Learning GPU Intelligence Training System. Before we launched it, I shrunk the name to DGX, to harmonize it with RTX, AGX, OVX, and all the other “X”s that we have in the company. And DGX-1 really revolutionized artificial intelligence.
The reason why we built it was because we wanted to make it possible for researchers and startups to have an out-of-the-box AI supercomputer. Imagine the way supercomputers were built in the past—you really have to build your own facility, and you have to go build your own infrastructure and really engineer it into existence. So we created a supercomputer for AI development, for researchers and startups, that comes literally out of the box.
I delivered the first one to a startup company in 2016 called OpenAI, and Elon was there, and Ilya Sutskever was there, and many of NVIDIA engineers were there. We celebrated the arrival of DGX-1, and obviously it revolutionized artificial intelligence and computing.
But now artificial intelligence is everywhere. It’s not just in researchers’ and startups’ labs. You know, we want artificial intelligence, as I mentioned in the beginning. This is now the new way of doing computing—this is the new way of doing software. Every software engineer, every engineer, every creative artist, everybody who uses computers today as a tool will need an AI supercomputer.
And so I just wished… I just wish that DGX-1 was smaller. So imagine, ladies and gentlemen: this is NVIDIA’s latest AI supercomputer, and for now it’s called Project DIGITS. If you have a good name for it, reach out to us. Here’s the amazing thing: this is an AI supercomputer. It runs the entire NVIDIA AI stack—all of NVIDIA’s software runs on this. DGX Cloud runs on this. It sits, well, somewhere—wireless or connected to your computer—it’s even a workstation if you’d like it to be—and you can access it like a cloud supercomputer. NVIDIA’s AI works on it.
It’s based on a super-secret chip that we’ve been working on called GB10, the smallest Grace Blackwell that we make. And I have… well, you know what, let’s show everybody the inside. Isn’t it just… it’s just so cute. This is the chip that’s inside, and this top-secret chip is in production. We did it in collaboration with MediaTek—the Grace CPU is built for NVIDIA in collaboration with MediaTek, the world’s leading SoC company. They worked with us to build this CPU SoC and connect it with chip-to-chip NVLink to the Blackwell GPU. This little thing here is in full production.
We’re expecting this computer to be available around the May time frame, so it’s coming at you. It’s just incredible what we can do. I was trying to figure out: do I need more hands or more pockets? All right, so imagine—this is what it looks like. Who doesn’t want one of those? Whether you use a PC, a Mac, anything—because it’s a cloud computing platform that sits on your desk—you could also use it as a Linux workstation if you like. If you would like to have double DIGITS, this is what it looks like: you connect them together with ConnectX, and it has NVGPU Direct and all of that out of the box. It’s like a supercomputer: our entire supercomputing stack is available.
So, NVIDIA Project DIGITS. [Applause]
Conclusion / Wrap-Up:
Okay, well, let me tell you what I told you. I told you that we are in production with three new Blackwells. Not only are the Grace Blackwell supercomputers—the NVLink 72s—in production all over the world; we now have three new Blackwell systems in production. One amazing AI foundation model—the world’s first physical AI world foundation model—is open and available to activate the world’s industries of robotics and such. And three robots: agentic AI, humanoid robots, and self-driving cars.
It’s been an incredible year. I want to thank all of you for your partnership. Thank all of you for coming. I made you a short video to reflect on last year and look forward to the next year. Play, please.
[Video begins, montage]
[Music] [Applause]
[Music]
[Music]
[Music] [Applause]
[Music]
[Music]
Have a great CES, everybody. Happy New Year. Thank you.