Don't Just Read the Paper, Implement It
Paperman converts bleeding-edge papers into hands-on coding challenges you can solve in your browser.
arXiv:1706.03762
Attention Is All You Need
Vaswani et al., 2017
Abstract
Attention(Q,K,V) = softmax(QKᵀ/√dₖ)V
PaperMan
Attention Is All You Need
Implement the scaled dot-product attention:
def attention(Q, K, V):
    # Your code here
    scores = Q @ K.T / sqrt(d_k)
3/5 tests passing
Used by engineers from
Google
Meta
OpenAI
DeepMind
Anthropic
NVIDIA
Microsoft
Apple
How it Works
1. Select a Paper
Choose from a curated list of top AI research papers.
2. Understand the Math
Grasp the underlying concepts with interactive explanations.
∂L/∂w = -2Σ(yᵢ - ŷᵢ)xᵢ
Attention(Q,K,V) = softmax(QKᵀ/√dₖ)V
σ(z) = 1 / (1 + e⁻ᶻ)
Interactive
Step-by-step
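As a quick illustration of the kind of formula these explanations cover, the sigmoid shown above, σ(z) = 1 / (1 + e⁻ᶻ), can be checked in a few lines of plain Python (a standalone sketch, not PaperMan's coding environment):

```python
import math

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))  # 0.5 — the sigmoid is centered at z = 0
print(sigmoid(2.0))  # ≈ 0.88
```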
3. Implement the Code
Build the model function by function in our live coding environment.
def attention(Q, K, V):
    # Your implementation
    scores = Q @ K.T / sqrt(d_k)
    weights = softmax(scores)
    return weights @ V
✓ 3/3 tests passed
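Filled in end to end, the snippet above corresponds to something like the following NumPy sketch of scaled dot-product attention, softmax(QKᵀ/√dₖ)V. The `softmax` helper and the example shapes are our own additions for illustration, not PaperMan's test harness:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores)  # each row sums to 1
    return weights @ V

# Tiny example: 2 queries, 3 keys/values, d_k = 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = attention(Q, K, V)
print(out.shape)  # (2, 4): one weighted mix of the value rows per query
```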
