Abstract
Welcome to my latest notes on Grok 3. In this blog post, I'll share my observations and highlight some fascinating test cases comparing Grok 3 with DeepSeek-R1 and o3-mini.

Information

xAI has introduced Grok 3 along with two beta reasoning models: Grok 3 (Think) and Grok 3 mini (Think). According to xAI, these models were trained with reinforcement learning (RL) at an unprecedented scale, refining their chain-of-thought processes to enable advanced, data-efficient reasoning.
Below is a benchmark chart showing the performance of Grok 3's reasoning (Think) models:
[Figure: benchmark results for Grok 3 (Think) and Grok 3 mini (Think)]
 
The general (non-reasoning) Grok 3 model, which has a 1-million-token context window, also performs impressively:
[Figure: benchmark results for the general Grok 3 model]

Interesting Test Cases

Dave W Plummer ran a fascinating Breakout test with Grok 3. Here are the results:
 
The initial prompt was simple: "How about a colored version of Breakout?" The first revision requested, "Make the player move automatically under computer control, and make the ball go 10% faster each time it bounces off the paddle." The final revision addressed a gameplay issue: "Good, but the ball can get stuck in a vertical bounce. How did the original game handle that? Do the same! And make the player aim for remaining bricks."
For details, see: Breakout by Grok3
Theo-t3.gg argues that Grok 3 is not great at coding. Here is his demonstration case:
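The Breakout prompts ask for two mechanics: a 10% speed-up on every paddle hit and a fix for the ball getting stuck in a vertical bounce. Here is a minimal Python sketch of one common way to handle both, assuming the ball's outgoing angle depends on where it strikes the paddle (the original Breakout reportedly worked along these lines, but I have not compared this against the code Grok 3 actually produced). The names `Ball`, `bounce_off_paddle`, `MAX_BOUNCE_ANGLE`, and `MIN_BOUNCE_ANGLE`, and the angle limits, are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass

SPEEDUP = 1.10                       # 10% faster per paddle hit, as the prompt asks
MAX_BOUNCE_ANGLE = math.radians(60)  # hypothetical: steepest deflection from vertical
MIN_BOUNCE_ANGLE = math.radians(15)  # hypothetical: never leave the paddle closer to vertical than this


@dataclass
class Ball:
    x: float
    y: float
    vx: float
    vy: float


def bounce_off_paddle(ball: Ball, paddle_x: float, paddle_width: float) -> None:
    """Send the ball back up, 10% faster, at an angle set by where it hit the paddle."""
    speed = math.hypot(ball.vx, ball.vy) * SPEEDUP

    # Hit offset in [-1, 1]: -1 is the left edge, 0 the center, 1 the right edge.
    half = paddle_width / 2
    offset = (ball.x - (paddle_x + half)) / half
    offset = max(-1.0, min(1.0, offset))
    angle = offset * MAX_BOUNCE_ANGLE

    # Push near-vertical angles out to MIN_BOUNCE_ANGLE so the ball always keeps
    # some horizontal velocity and cannot loop straight up and down forever.
    if abs(angle) < MIN_BOUNCE_ANGLE:
        direction = angle if angle != 0 else (ball.vx if ball.vx != 0 else 1.0)
        angle = math.copysign(MIN_BOUNCE_ANGLE, direction)

    ball.vx = speed * math.sin(angle)
    ball.vy = -speed * math.cos(angle)  # screen y grows downward, so negative means "up"


# Usage: a ball falling straight down onto the paddle's center.
ball = Ball(x=50.0, y=0.0, vx=0.0, vy=5.0)
bounce_off_paddle(ball, paddle_x=0.0, paddle_width=100.0)
print(ball)
```

Clamping the angle away from vertical guarantees a nonzero horizontal velocity after every paddle hit, so a perfectly vertical bounce loop cannot form; wall and brick reflections only flip signs, so that horizontal component survives.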
 
Alex Prompter tested Grok 3 and DeepSeek V3 with the same set of critical prompts, and his extensive side-by-side comparison surfaced several insights. For more details, see: Grok 3 VS. DeepSeek V3
Andrej Karpathy conducted a thorough comparison of Grok 3 against OpenAI's o1-pro and DeepSeek-R1. In his tests, Grok 3 performed strongly on reasoning tasks such as generating a Settlers of Catan board and estimating the number of FLOPs needed to train GPT-2. However, it struggled with complex spatial tasks, particularly generating an accurate SVG image of a pelican riding a bicycle. For the complete analysis, see: Grok 3 test by Andrej Karpathy
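The GPT-2 FLOP question is a back-of-the-envelope estimate. A widely used rule of thumb for dense transformers is that training compute is about 6 FLOPs per parameter per training token, C ≈ 6·N·D (roughly 2 for the forward pass and 4 for the backward pass). The Python sketch below simply applies that rule; it is my own illustration, not the answer Grok 3 or Karpathy gave. The token count D is a hypothetical placeholder, since as far as I know OpenAI did not publish the exact number of tokens GPT-2 was trained on.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute for a dense transformer:
    ~6 FLOPs per parameter per token (forward + backward pass)."""
    return 6.0 * n_params * n_tokens


# The largest GPT-2 variant has roughly 1.5B parameters.
N_PARAMS = 1.5e9
# HYPOTHETICAL token count, used only to illustrate the arithmetic;
# it is not a published figure for GPT-2.
N_TOKENS = 1.0e11

flops = training_flops(N_PARAMS, N_TOKENS)
print(f"~{flops:.1e} FLOPs")  # prints "~9.0e+20 FLOPs"
```

The result scales linearly with the token estimate, which is why the hard part of the question is reasoning about D, not the multiplication itself.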
 
Chengsheng Deng