Good morning. We have a banger for you today: we're going to launch ChatGPT agent. But before jumping into that, I'd like to ask the team to introduce themselves, starting with Yash.
Hi, I'm Yash. I work on the agent team, and before that I worked on Operator.
Hi, I'm Jing. I work on agents research, and previously worked on deep research.
Hi, I'm Casey. I'm a researcher on agents, formerly on Operator.
Hi, I'm Isa. I'm a researcher on agent, formerly on deep research.
So we started launching agents earlier this year. We launched deep research, we launched Operator, and people were very excited about this. People could see that AI was now going off to do complex tasks for them. But it became clear to us that what people really wanted was for us to bring those capabilities, and more, together. People wanted a unified agent that could go off, use its own computer, and do real, complex tasks for them; one that could seamlessly transition from thinking about something to taking actions, using lots of tools, using the terminal, clicking around the web, even producing things like spreadsheets and slides, and much more. And people want to be able to do this over a long time horizon and for a sort of universal set of tasks. So the team has been working super hard to bring that together, and today we have ChatGPT agent. It's probably easier to show it to you than to keep talking about it. Watching it work is one of the feel-the-AGI moments for me. So, let's take a look.
Awesome. Thanks, Sam. Hello, everyone. I'm very excited to share ChatGPT agent with everybody, and as Sam said, let's dive right into the demo. Okay, so we're on ChatGPT, as we all know and love it. To turn on agent mode, you just click the tools menu and select agent. You can also type "agent" in the composer bar, and it'll take you to agent mode. Edward and I have a wedding to go to later this year, for one of our mutual friends. Should we have the agent plan it?
Yeah, let's do it. I need an outfit. And don't forget the gift.
Okay, great. We won't forget the gift. It's a slightly longer prompt, so I have it copied in my buffer, and I'm just going to paste it. Okay, let's see what it says. Our friends Minia and Sarah are getting married later this year, as I said, and we want the agent to help us find an outfit that matches the dress code, propose a few nice, mid-luxury options, and take into account the venue and the weather. We also want it to find us some hotels and, as Edward said, not to forget the gift. So let's send the prompt away. As Sam said, agent uses a computer, so in the beginning it sets up its environment. That takes, you know, not a minute or two but really about 5 seconds. In this case, as you can see, it understands the prompt and is asking me for a clarification. I'm just going to let it continue working anyway. I think it got confused about exactly what the date and time of the wedding are; I think it'll figure that out using the website. Okay, cool. Now it's kicked off: it's processing the prompt, and it's opened up a browser. To walk you through what's happening, here's Isa.
Yeah. So, as mentioned, we gave the agent access to its own virtual computer. The computer has many different tools installed, and the agent can choose which to use as it works through the task. In ChatGPT, you can see a visualization of the agent's computer screen with its chain of thought overlaid in text: that's what it's thinking as it works through the task and decides what to do next. We gave the agent two different ways to browse the internet. First, we gave it a text browser, which is similar to the deep research tool. This is what lets it quickly and efficiently search and read many web pages. We also gave it a visual browser, which is similar to the Operator tool. This is what lets it actually interact with the UI of a web page: it can drag things, use the cursor to click around, open UI components, fill out forms, and enter text in text areas. It's very flexible, so those two tools are very complementary. We also gave it its own terminal, so that it can run code and generate and analyze files like slide decks and spreadsheets. Through the terminal, it can also call APIs: both public APIs and APIs that access your private data sources, like Google Drive, Google Calendar, GitHub, SharePoint, and many others, but only if you explicitly connect them, similar to deep research connectors. It also has access to the image gen API, so it can create nice visuals for slide decks and other things as it works through its tasks.
How is it deciding which tools to use here?
Yes, we trained the model to move between these capabilities with reinforcement learning. This is the first model we've trained that has access to this unified toolbox: a text browser, a GUI browser, and a terminal, all in one virtual machine. To guide its learning, we created hard tasks that require using all these tools. This allows the model to learn not only how to use these tools, but also when to use which tool, depending on the task at hand. At the beginning of training, the model might attempt to use all these tools to solve a relatively simple problem. Over time, as we reward the model for solving problems correctly and efficiently, it learns to make smarter tool choices. For example, if you ask the model to find a restaurant with specific requirements and make a reservation, it may typically start with deep research in the text browser to find some candidates, then switch to the GUI browser to view photos of the food, check availability, and complete the booking. Similarly, for a creative task like creating an artifact, the model will first search online for public resources, then switch to the terminal to do some code editing and compile the artifact, and finally verify the output in the GUI browser. With this, we truly feel like we've brought together the best of deep research and Operator, and added some extra sparkle.
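The training signal is described only at a high level: the model is rewarded for solving problems correctly and efficiently. As a rough illustration, the toy reward below pays 1.0 for a solved task minus a small cost per tool call, so a focused text-browser-then-GUI-browser trajectory (like the restaurant example) outranks one that reaches for every tool. The `Tool` names, the `toy_reward` function, and the 0.05 per-call cost are hypothetical; this is not OpenAI's reward, and it only shows how such a signal would rank trajectories, not the training itself.

```python
from dataclasses import dataclass
from enum import Enum


class Tool(Enum):
    """The three tool families described in the demo (names are illustrative)."""
    TEXT_BROWSER = "text_browser"
    GUI_BROWSER = "gui_browser"
    TERMINAL = "terminal"


@dataclass
class Trajectory:
    tool_calls: list[Tool]  # tools invoked, in order
    solved: bool            # did the final answer pass the task's check?


def toy_reward(traj: Trajectory, cost_per_call: float = 0.05) -> float:
    """Hypothetical reward: 1.0 for a correct answer minus a small cost per tool call.

    Incorrect trajectories score 0 no matter how few calls they made,
    so doing nothing can never beat actually solving the task.
    """
    if not traj.solved:
        return 0.0
    return max(0.0, 1.0 - cost_per_call * len(traj.tool_calls))


# Restaurant example: research in the text browser, then check and book in the GUI browser.
focused = Trajectory([Tool.TEXT_BROWSER, Tool.GUI_BROWSER, Tool.GUI_BROWSER], solved=True)
# Early-training behaviour: trying every tool on the same simple task.
scattershot = Trajectory(
    [Tool.TEXT_BROWSER, Tool.TERMINAL, Tool.GUI_BROWSER,
     Tool.TERMINAL, Tool.TEXT_BROWSER, Tool.GUI_BROWSER],
    solved=True,
)

print(round(toy_reward(focused), 2))      # 0.85
print(round(toy_reward(scattershot), 2))  # 0.7
```

Both trajectories solve the task; only the efficiency term separates them, which is the kind of pressure that would push a policy toward picking the right tool rather than all of them.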
That's right. Yeah. So to put this project in context, I want to give a bit of history. A few months ago, in January, we shipped Operator, our agent that lets you do online tasks like booking reservations and sending emails. Two weeks later, we shipped deep research, a tool that lets you do in-depth internet research and output high-quality research reports. After launch, we realized that these two approaches are actually deeply complementary. For example, Operator has some trouble reading super-long articles: it has to scroll, and it takes a long time. But that's something deep research is good at. Conversely, deep research isn't as good at interacting with web pages, their interactive elements, or highly visual pages, but that's something Operator excels at. So we felt these approaches were complementary, and we were also looking at customer feedback. For example, one of the most highly requested features for deep research was the ability to log into websites and access authenticated sources. That's something Operator can do.
I've been waiting for that for a long time.
Yeah. Another thing is that we were looking at the prompts people were trying in Operator, and we saw that many of them were actually more deep-research-type prompts, for example, "plan a trip and then book it." So we really feel like we're bringing the best of both worlds here. And on a personal note, we've all been friends for a while, and it's really exciting to be working together. So, speaking of matches made in heaven, how is the wedding planning going?
It's amazing to watch. This is an example of a task I hate doing. It can ruin multiple hours for me as I get sucked into these rabbit holes. So watching it click through all of this and do the whole thing while you've been talking is really quite remarkable.
Yeah, totally. It looks like it started off by figuring out the weather. One of the cool features is that, since some of these tasks take a little longer, you can go back and see what it was doing, so that's exactly what we're going to do. It looks like it went through the website; interestingly, it used the text browser for that. Now it's looking through suits for Edward. I think it'll find something good. Here you can see it switched over to the visual browser to make sure the suit will look really good on Edward. And now it's chugging along, figuring out what to do next. It's still on suits, and now it's probably getting to the gifts section. Okay, cool. This is going to take a while: as Sam said, these tasks can sometimes take a long time, so it's going to keep going, hopefully much faster than we would. Should we do something else while it's working? I think the team really wanted some stickers for the launch. Should we do that?
Yeah, cool.
All right. So, we have a team mascot, one of our colleagues, Bunny Doodle. Really, really cute, I tell you. We're going to try to get some laptop stickers for everybody. One of my favorite features of agent: since trajectories can take 15, 20, or 30 minutes depending on the complexity of the task, a lot of the time you might need to help the agent, because it might need to ask you for clarifications, confirmations, and things like that. So I love to use it on the go, and this time I'm going to use my mobile phone to send the query and see how it goes. Okay, let's see. We're on ChatGPT, and I've already selected agent mode. I've also attached a picture of our cute mascot, and I'm going to quickly paste a query. The query says: make some swag for the team, one-by-one laptop stickers, and order 500 of them. I'll also say that I like Sticker Mule, which we've used in the past, and send it off.
Okay. Just like it did on the web, it's going to take some time to think about what it's doing, and then it'll kick off the query. Oh, there we go. It's started working on it. It looks like it's starting to create the anime art. It'll probably use image gen, which Isa mentioned earlier, to create, hopefully, some nice anime art. We'll see how it comes out. While that's going, anything else we want to do?
Oh, yeah. I also need a pair of shoes, because my shoes got damaged.
How did they get damaged?
By the rain...
In SF.
Yes.
Cool. All right, let's get Edward a pair of shoes as well. "Can you also find us a pair of men's black dress shoes in size..."
9.5?
9.5.
So, one of the key capabilities of the model is that you can interrupt it. Since trajectories can take a long time, it's really important for it to feel very multi-turn, so that users can interject, direct it, and give it more or less guidance, whatever they want. That's what we're doing here. The model was chugging along, working through everything we'd asked for before, and we essentially said, "Hey, can you also get us a pair of men's black shoes?" Now it's thinking, and soon enough it'll hopefully take that into account and continue its trajectory. There we go. It acknowledged the interruption: "Okay, cool. I'll also research men's black shoes in size 9.5." And then it'll probably get on its way. But maybe Isa can tell us a little more about how that works.
Yeah, sure. As you can see, the agent is very collaborative, and this was really important to us when we were training the model and building the product. If you asked another person to do a task for you that would take them a really long time to complete, you'd probably give them some instructions to start, and then they might ask you some clarifying questions. Then they'd start the task and maybe realize they need more clarification from you, or your permission to sign into something or do something on your behalf. And then you might realize, "Oh, I forgot to mention this thing," or ask, "What's your status? How are you doing? Can I redirect you if you're going down the wrong path?" Similarly, for these really long-running agentic tasks, it's very important that both the user and the agent can initiate communication with each other, so that the agent can help you with your tasks as effectively as possible. This is something we actually trained into the model. We trained it to ask clarifying questions, though not every single time, the way deep research does. We also trained it to be interruptible, as Yash just showed. And sometimes it will ask you for clarification and confirmation mid-trajectory.
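Interruptibility is something the team says it trained into the model itself. The sketch below shows only the kind of surrounding control flow one might assume for a long-running task: the user can send a message at any time, it waits in a queue, and it is folded into the plan at the next step boundary, roughly what happened with the shoe request. `AgentRun`, `interrupt`, and `step` are hypothetical names, not part of any ChatGPT API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentRun:
    """Hypothetical long-running task that checks for user messages between steps."""
    plan: deque[str]
    inbox: deque[str] = field(default_factory=deque)  # messages sent mid-run
    log: list[str] = field(default_factory=list)

    def interrupt(self, message: str) -> None:
        # The user can speak at any time; the message waits for the next step boundary.
        self.inbox.append(message)

    def step(self) -> bool:
        """Take one action; return False when there is nothing left to do."""
        # Fold any new instructions into the plan before acting again.
        while self.inbox:
            msg = self.inbox.popleft()
            self.log.append(f"acknowledged: {msg}")
            self.plan.append(msg)
        if not self.plan:
            return False
        action = self.plan.popleft()
        self.log.append(f"did: {action}")
        return True


run = AgentRun(deque(["check venue weather", "shortlist suits", "pick a gift"]))
run.step()  # works on the first item
run.interrupt("find men's black dress shoes, size 9.5")
while run.step():
    pass
print(run.log)
```

Checking the inbox only between actions keeps each browser or terminal action atomic while still letting new instructions take effect within one step.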
Yeah. And part of working with agent is that, well, sometimes it'll make mistakes. That's why we felt it was important to train the model to ask you for confirmation at the final step of important actions. For example, before it sends an email, it'll ask you to take a look at the draft and check whether it makes sense and whether there are any embarrassing typos. If there are, you can either ask it to fix them, or you can directly take over the browser, jump right into the agent's environment, and correct them yourself. That way it feels collaborative, and you can really work with the agent.
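The confirmation step described here can be pictured as a gate in front of consequential actions. This is a minimal sketch assuming a fixed set of action types that need approval and a `confirm` callback standing in for the user. In ChatGPT agent the model learns when to ask, so the hard-coded `NEEDS_CONFIRMATION` set and the `execute` helper are illustrative assumptions only.

```python
from typing import Callable

# Hypothetical set of consequential action types; not OpenAI's actual policy.
NEEDS_CONFIRMATION = {"send_email", "purchase", "submit_form"}


def execute(action: str, payload: str, confirm: Callable[[str, str], bool]) -> str:
    """Run an action, pausing for user approval first if it is consequential."""
    if action in NEEDS_CONFIRMATION and not confirm(action, payload):
        # The user declined: hand control back instead of acting.
        return f"held {action} for user edits"
    return f"executed {action}"


def reviewer(action: str, payload: str) -> bool:
    """Simulated user who rejects any draft containing an obvious typo."""
    return "teh" not in payload


print(execute("read_page", "https://example.com", reviewer))        # executed read_page
print(execute("send_email", "See you at teh wedding!", reviewer))   # held send_email for user edits
print(execute("send_email", "See you at the wedding!", reviewer))   # executed send_email
```

Low-stakes actions such as reading a page pass straight through; only the last, irreversible step waits for a human.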
Should we look at maybe one more demo? We've got this fun tradition in livestreams of using our newest models to evaluate themselves or do something kind of meta. Anything like that we could do?
Yeah, let's do it. So...
I think people would love to know how good the model is.
Yes. This is a prompt we gave the agent yesterday. Basically, it asks the model to pull its own evaluation numbers from our Google Drive connector and make some slides. We wanted to keep it simple: no introduction, no conclusion, just present the results in charts. As you can see, the model is now connecting to the Google Drive API and searching within it. It looks like the first result is very relevant, so it's reading the first result, and now reading it in detail. Let's speed up this replay. The model might then read from the result again and write some code. Here you can see the model using the image generation tool to generate some decorations for the slides.
Let's see the first slide the model made. Here the model is writing some code that will be compiled into the final slides. This is the first slide the model made in this demo, which looks okay but isn't polished enough. One of the key behaviors the model learns through reinforcement learning is to review its own results and refine them to deliver a good final result. Let's see what the model finally gives us.
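The review-and-refine behavior can be sketched as a loop that critiques a draft, fixes the most important issue, and stops when the critic is satisfied or a round budget runs out. In the model this behavior is learned through reinforcement learning rather than hard-coded; `refine`, `slide_critic`, and `slide_fixer` are hypothetical stand-ins, and the slide checks are invented for illustration.

```python
from typing import Callable

Critic = Callable[[dict], list[str]]
Fixer = Callable[[dict, str], dict]


def refine(draft: dict, critique: Critic, fix: Fixer, max_rounds: int = 5) -> tuple[dict, int]:
    """Review-then-revise loop: stop when the critic finds nothing or the budget runs out."""
    for round_no in range(max_rounds):
        issues = critique(draft)
        if not issues:
            return draft, round_no
        draft = fix(draft, issues[0])  # address the most important issue first
    return draft, max_rounds


def slide_critic(slide: dict) -> list[str]:
    # Hypothetical checks for a slide that "looks okay but isn't polished enough".
    issues = []
    if not slide.get("title"):
        issues.append("missing title")
    if not slide.get("axis_labels"):
        issues.append("chart has no axis labels")
    return issues


def slide_fixer(slide: dict, issue: str) -> dict:
    fixed = dict(slide)  # never mutate the previous draft
    if issue == "missing title":
        fixed["title"] = "Humanity's Last Exam"
    elif issue == "chart has no axis labels":
        fixed["axis_labels"] = ("setting", "accuracy (%)")
    return fixed


final, rounds = refine({"chart": "bar"}, slide_critic, slide_fixer)
print(rounds)  # 2
```

The round budget matters: without it, a critic that is never satisfied would keep the loop running indefinitely.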
We can click skip, and the model gives us a good PowerPoint file. It's a real PowerPoint file that you can download and open in any software. Let's open it in Office and present the slides the model just generated.
First are two intelligence benchmarks. Humanity's Last Exam measures AI's ability to solve hard problems across a broad range of subjects. We evaluate the models in two settings: with and without tool use. We can see that the agent model's raw intelligence is already pretty strong, and with access to all its tools it nearly doubles its performance, to 42%. When evaluating models on Humanity's Last Exam, especially with browsing ability, we apply two layers of decontamination to ensure that the model doesn't cheat on this benchmark.
FrontierMath is a benchmark that measures models' advanced mathematical reasoning ability. Unlike our baselines, o4-mini and o3, which use Python through function calling, we gave the agent model all available tools, including a browser, a computer, and a terminal. With the help of all these tools, the agent achieves a new state of the art of 27% on this benchmark.
Next, we evaluated the model on two agentic benchmarks. WebArena measures web agents' ability to solve real-world web tasks. The agent model improves over the previous o3-based model that powers CUA, Operator's computer-using agent. BrowseComp is a benchmark we introduced earlier this year that measures browsing agents' ability to search for and locate information. The agent model significantly outperforms o3 and deep research on this benchmark, achieving a 69% pass rate.
Finally, we care about how users will benefit from our model in the real world. SpreadsheetBench measures a model's ability to edit spreadsheets derived from real-world use cases. Here, the agent model with LibreOffice and the computer tool can already solve 30% of the tasks, and giving the model access to the raw Excel file in the terminal further boosts performance to 45%.
Lastly, we evaluated the model on an internal banking benchmark. This benchmark evaluates the model's ability to do the tasks of a first- to third-year investment banking analyst, such as putting together a three-statement financial model for a Fortune 500 company. On this benchmark, the agent model significantly outperforms the previous deep research and o3 models. As you can see, this is one of the most powerful models we've ever trained.
It's not only good on benchmarks; it's also capable of reasoning, browsing, and tackling real-world tasks at a level we couldn't have imagined three months ago.
That's right. As Edward said, we think we've trained a very powerful model, and a lot of that power comes from its ability to browse the internet. And as we know, the internet can be a scary place. There are all sorts of hackers trying to steal your information, scams, and phishing attempts, and agent isn't immune to any of these things. One particular thing we're worried about is a new kind of attack called prompt injection.
Let's say you ask agent to buy you a book, and you give it your credit card information to do that. Agent might stumble upon a malicious website that tells it, "Oh, enter your credit card information here. It'll help you with your task." And agent, which is trained to be helpful, might decide that's a good idea.
We've done a lot of work to try to ensure that this doesn't happen. We've trained our model to ignore suspicious instructions on suspicious websites. We also have layers of monitors that peer over the agent's shoulder, watch it as it goes, and stop the trajectory if anything looks suspicious. We can even update these monitors in real time if new attacks are found in the wild.
That said, this is a cutting-edge product. This is a new surface, and we can't stop everything. That's why I feel it's very important for the audience to be aware of the risks involved in using agent. We encourage users to be proactive in thinking about how they share their information. If it's highly sensitive information, maybe don't share it, or use features like takeover mode to input your credit card information directly into the browser instead of giving it to agent. We feel like we've built a very powerful product, but again, it's important for our users to understand the risks involved.
Yeah, I really want to emphasize that I think this is a new level of capability in AI. It's a new way to use AI, but there will be a new set of attacks that come with it. Society and the technology will have to evolve and learn how to mitigate things we can't even really imagine yet, as people start doing more and more work this way. Before I wrap up, should we check in on some of the tasks you kicked off?
Yeah, let's do it. Okay, so I'm going to open a new tab and make sure we can see the progress of our stickers as well. Okay, let's see. All right, sounds like the stickers are ready. Let me see what it actually... Okay. So, cool thing: this is the end result. It took about 7 minutes, and it most likely figured out everything.
We'll go back and look at the trajectory and see how it did. But as the end result, it looks like it's added the stickers to the cart. This is the subtotal. I can go ahead and look at it, and then, as Casey said, I can take over at this point to enter my credit card information and place the order really quickly. The model is asking for confirmations, etc., as it's supposed to. Let's quickly browse through the trajectory and see what it actually did. Oh, it looks like it generated some stickers. Oh, look at that. That's the sticker it generated. Cool. So, yeah, that's the task. At this point I can finish up by myself, or I can ask the model to go ahead and do it for me as well.
Let's check on the wedding. Okay, great, looks like it just finished in the nick of time. Okay, cool. So in this case, as we said, we were looking for a hotel, dresses, suits, and also shoes. It's come out with a pretty comprehensive report: the wedding venue, the date and when it is, with the Zilla links, and the dress codes. It figured out what the suit recommendation should be and where you can buy it. Now I can go ahead and buy it myself, or I can ask the agent to buy it for me. It also figured out footwear and hotel options. It actually looked through all the availability; you can see it gives screenshots of what it checked. In this case, we used Booking.com, and it was able to do that. It also has gift suggestions, etc. As a next step, as the agent says, "If you need assistance purchasing any item or have any further adjustments, let me know," so we can do that.
And I want to show one last demo, which we didn't run live, but I think it's really cool, especially because the folks who are getting married are really into MLB. So we asked the agent to build an optimal itinerary for visiting all 30 MLB stadiums, just in case you're thinking of a sabbatical, and then design the optimal route, prioritize Hello Kitty nights and whatnot, and present a final plan as a detailed spreadsheet. I'll run through this really quickly. I think it's just so fun to see. So again, as we've shown throughout the livestream, it uses a multitude of tools, using the terminal in its container and the browser, working through all the details.
It'll probably go back to the browser again, figuring out Hello Kitty nights and then the stadiums and whatnot. Oh, let's see, did I miss the... Oh, here's the map: it's using code to build out a map. And overall, I think we get a pretty solid result at the end. It took 25 minutes to work out where the season starts and whatnot. You have a spreadsheet that you can quickly view right inside ChatGPT, and you can map the journey. Cool-looking map, I guess. And that's it. So this is ChatGPT agent. We hope you really like it, and over to Sam.
Amazing work, all of you, and to your teams. I think this is really something that's going to help people get work done and have more time to do the things they want to do. I think it's really amazing how much you've brought together to deliver this experience, and watching the agent use the internet, make these spreadsheets, make PowerPoints, whatever else, and do all this work is quite amazing. We're going live today for Pro, Plus, and Team users. Pro users will get 400 queries a month; Plus and Team users will get 40 a month. The rollout should be finished by the end of the day for Pro and very soon for Plus and Team users. We'll try to be live for Enterprise and Edu by the end of this month.
As Casey mentioned, although this is an extremely exciting new technology, there are new risks. People learned how to use the internet generally pretty safely, although of course there are still scammers and other attacks. People are going to need to learn to use AI agents, and society is going to need to learn to build up defenses against attacks on AI agents as well. So we're starting with a very robust system and lots of warnings. We will relax that over time as people get more comfortable with it. But we do want people to treat this as a new technology and a new risk surface, and use all of the caution that Casey talked about. That said, we hope you'll love it. This is still very early. We will improve it rapidly, and we're excited to see where it all goes. So, congrats again. Thank you very much. Hope you enjoy.