Has GPT-4 been dethroned?


Claude 3 has entered the chat

On Monday, Anthropic made an announcement many thought improbable: they'd shipped a model that outperforms GPT-4 (the model you get when you pay for ChatGPT Plus).

GPT-4 has been the undisputed heavyweight champion of large language models (LLMs) for nearly a year.

The Anthropic team is understandably proud, and they had a funny moment when testing the model...

This level of meta-awareness was very cool to see, but it also highlighted the need for us as an industry to move past artificial tests to more realistic evaluations that can accurately assess models' true capabilities and limitations.

The consensus among power users is: this is an excellent LLM.

Ethan Mollick (@emollick), 6:53 PM • Mar 6, 2024:
Claude 3 does a good job with the needle-in-a-Great-Gatsby test, where I load the entire text of the novel with a couple alterations into the context window. Much better than Claude 2.1 (no hallucinations!), not quite as good as Gemini (not quite as insightful about content). https://twitter.com/emollick/status/1760142889642852729

Quoting Ethan Mollick (@emollick):
Gemini Pro 1.5 with 1M token window vs. Claude 2.1 with 100k vs. OpenAI GPT-4 with RAG I uploaded the Great Gatsby with 2 alterations (mentioning an "iphone-in-a-box" and a laser lawnmower) Gemini nails it (& finds one more thing). Claude does but hallucinates. RAG doesn't work

20 Retweets, 169 Likes

Balaji Srinivasan even uploaded Morgan Stanley's financial statements and found Claude to be a shrewd financial analyst.

Spoiler alert: Claude was not surprised by Morgan's recent stock price decline.

Balaji (@balajis), 5:26 AM • Mar 7, 2024:
I asked Claude, the Finance God. Is Morgan in the red? Here's what it said. FROM CLAUDE ---------------- Based on the financial information provided in the images, there are several key observations and potential concerns regarding Morgan Stanley's financial position and risk… https://twitter.com/i/web/status/1765685404794290631 https://twitter.com/DarioCpx/status/1765560703434498464

Quoting JustDario 🏊‍♂️ (@DarioCpx):
#JustDarioDaily 🚨MORGAN STANLEY - BIG BALANCE SHEET LOSSES HIDDEN BEHIND EXOTIC DERIVATIVES CURTAINS (AGAIN)? 😳🚨 When you've spent enough time trading inside a bank and watching markets for hours a day sometimes you're able to spot if something suddenly starts behaving… https://twitter.com/i/web/status/1765560703434498464 https://twitter.com/dariocpx/status/1765388587280060434

264 Retweets, 1658 Likes

Do your technical reading with AI assistance

In one of my own tests, I uploaded a selection from the Medicare Inpatient Prospective Payment System Final Rule (a dense document full of circular references and legalese).

After a minute or so of reading, Claude was able to answer questions in plain English.

AI has become an essential tool for any technical reading I do, and Claude performs admirably.

With a context window (read: short-term memory) of 200,000 tokens (roughly 150,000 words), you can upload extremely long documents and still get good answers.
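If you want a quick gut check on whether a document will fit, here is a minimal Python sketch based on the numbers above. It assumes the rough ratio of 0.75 words per token implied by "200,000 tokens ≈ 150,000 words." Real tokenizers count differently depending on the text, so treat the result as an estimate. The function names and the `reserve_tokens` buffer are my own illustrative choices, not anything from Anthropic.

```python
# Rule of thumb from the figures above (an approximation, not a real tokenizer):
# 200,000 tokens ~ 150,000 words, i.e. about 0.75 words per token.
CONTEXT_WINDOW_TOKENS = 200_000
WORDS_PER_TOKEN = 150_000 / 200_000  # 0.75


def estimate_tokens(text: str) -> int:
    """Estimate the token count from a simple whitespace word count."""
    word_count = len(text.split())
    return round(word_count / WORDS_PER_TOKEN)


def fits_in_context(text: str, reserve_tokens: int = 4_000) -> bool:
    """Return True if the text likely fits, leaving a (hypothetical)
    buffer of reserve_tokens for your question and the model's answer."""
    return estimate_tokens(text) + reserve_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    document = "word " * 120_000  # stand-in for a 120,000-word document
    print("Estimated tokens:", estimate_tokens(document))
    print("Likely fits:", fits_in_context(document))
```

In practice you would read your own file into `document` instead of the stand-in string. A document right at the 150,000-word mark would fail this check because of the buffer, which is the point: leave yourself room to ask questions.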

Bottom Line

You can try Claude for free at claude.ai. For $20/month, you get chat access similar to ChatGPT Plus.

  • Sadly, Claude does not yet have tools like web browsing and Code Interpreter

If you're pondering the question 'Which LLM should I bet on for my team?' I recommend reading this excellent writeup:

Your guide to Google Gemini and Claude 3.0, compared to ChatGPT - Taren SK, AI Impact Lab

And finally, if you try Claude for yourself, hit 'Reply' and let me know what you notice. I read every email.

Until next time,
Adam


Whenever you're ready, here are 2 ways I can help you:

[Individual] One-off AI Coaching Call: Get unstuck! Solve a problem or address an opportunity with AI assistance

[Team] AI Accelerator Program: An interactive workshop series for teams. Accelerate your team’s AI adoption journey and start reaping the benefits of AI more fully

Adam Lorton

I help executives and their teams combine the power of AI with the principles of Deep Work to:

  • Get unstuck
  • Move faster
  • Deliver excellent experiences for customers

Subscribe for prompts, case studies, and stories in your inbox weekly!
