In-context learning
The ability that shouldn't exist: learning from examples without updating any weights
Standard machine learning requires training: you show the model examples, run backpropagation, and update the weights. In-context learning (ICL) is different: the model is shown examples inside the prompt itself, as text, and generalises to new inputs without any weight updates. No backpropagation. No gradient step. The model "learns" from context that is just tokens in the input. The capability was first demonstrated at scale in the GPT-3 paper (Brown et al., 2020) and surprised much of the research community.
Zero-shot: no examples at all
Just ask the model to do something. "Translate this to French." "Summarise this article." "Solve this maths problem." At sufficient scale, models can perform many tasks zero-shot, relying only on the instruction and what they absorbed in training. GPT-2 hinted at this, and GPT-3 demonstrated it across a much wider range of tasks. ChatGPT's conversational ability is largely zero-shot generalisation from instruction tuning.
Few-shot: examples in the prompt
Provide a handful of (input, output) examples, typically 3–10, before the actual question. The model picks up the pattern from the prompt alone. GPT-3's few-shot results matched or exceeded fine-tuned models on some benchmarks and trailed them on others, all without updating a single weight. This was the key result that made LLMs practically useful: many tasks no longer needed task-specific training.
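The difference between zero-shot and few-shot prompting is easiest to see in the prompt text itself. The sketch below is a minimal illustration, not any particular library's API: `build_prompt`, the `Input:`/`Output:` labels, and the translation pairs are all assumptions made for this example. No model is called; the code only assembles the string a model would receive.

```python
def build_prompt(instruction, examples, query):
    """Assemble a plain-text prompt.

    With an empty `examples` list the result is a zero-shot prompt;
    with (input, output) pairs it becomes a few-shot prompt.
    The "Input:"/"Output:" labels are an arbitrary choice for this sketch.
    """
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final item has no output: the model is expected to continue from here.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical demonstration pairs, chosen only for illustration.
examples = [
    ("cheese", "fromage"),
    ("house", "maison"),
    ("book", "livre"),
]

zero_shot = build_prompt("Translate English to French.", [], "cat")
few_shot = build_prompt("Translate English to French.", examples, "cat")
print(few_shot)
```

Both prompts end with the same unanswered `Input: cat` item. The only thing that changes between the two settings is how many solved examples precede it, and the model's weights are identical in both cases.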
Chain-of-thought prompting
Showing the model how to think, not just what to answer
Wei et al. (2022) showed that including reasoning steps in few-shot examples substantially improves performance on complex tasks. Instead of showing (question, answer) pairs, you show (question, step-by-step reasoning, answer) triples. The model then generates its own reasoning chain before answering, which raises accuracy on maths, logic, and multi-step problems, with the largest gains appearing in larger models.
A common explanation is that reasoning is text. If the model predicts text well, and written reasoning appears in its training data, then the model has learned to generate reasoning. The chain-of-thought examples in the prompt simply activate this latent ability. In the discovery era, no additional training was required, only prompting.
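A chain-of-thought prompt differs from a plain few-shot prompt only in that each worked example includes its reasoning before the answer. The sketch below illustrates that shape under stated assumptions: the helper names, the `Q:`/`A:` layout, the "The answer is N." convention, and the arithmetic exemplar are all made up for this example, and the completion is hard-coded rather than produced by a model.

```python
import re


def format_cot_example(question, reasoning, answer):
    """Render one (question, reasoning, answer) triple as prompt text."""
    return f"Q: {question}\nA: {reasoning} The answer is {answer}."


def build_cot_prompt(examples, question):
    """Join the worked examples, then leave the new question unanswered."""
    blocks = [format_cot_example(*example) for example in examples]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def extract_answer(completion):
    """Pull the number out of a trailing 'The answer is N.' sentence, if present."""
    match = re.search(r"The answer is\s+(-?\d+(?:\.\d+)?)", completion)
    return match.group(1) if match else None


# Hypothetical exemplar written for this sketch.
examples = [
    (
        "A shelf holds 4 boxes with 6 pens each. 5 pens are removed. "
        "How many pens remain?",
        "4 boxes of 6 pens is 4 * 6 = 24 pens. Removing 5 leaves 24 - 5 = 19.",
        "19",
    ),
]

prompt = build_cot_prompt(
    examples,
    "A train has 3 cars with 20 seats each. 12 seats are taken. "
    "How many seats are free?",
)

# Hypothetical completion, hard-coded for illustration; no model is called here.
completion = "3 cars of 20 seats is 3 * 20 = 60 seats. 60 - 12 = 48. The answer is 48."
print(extract_answer(completion))
```

Fixing the final sentence format in the exemplars is a practical design choice: it gives the model a pattern to imitate and gives the caller a reliable string to parse.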
2026: Reasoning is now trained, not just prompted
Chain-of-thought is no longer only an emergent trick you elicit with clever prompts. Since late 2024, reasoning has been trained directly: models like OpenAI's o-series, DeepSeek-R1, and Claude's extended thinking are post-trained with reinforcement learning, typically on verifiable rewards (RLVR). These are maths and code problems where correctness can be checked automatically, so the model learns to produce long, useful reasoning chains on its own. This also changed the scaling story. Alongside scaling pretraining compute, the field now scales inference-time compute: letting a model think longer at answer time buys accuracy, which extends the earlier narrative built purely on pretraining scaling laws.
Then
Chain-of-thought was elicited by few-shot prompting (2022–23) — a latent ability you activated with examples, no training required.
Now · June 2026
Chain-of-thought is trained directly via RL on verifiable rewards (o-series, DeepSeek-R1, Claude extended thinking). "No training required" describes the discovery era, not current practice.
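What makes a reward "verifiable" is that a program, not a human or a learned judge, decides whether the final answer is correct. The sketch below shows one minimal way such a check could look for maths answers. It is an assumption-laden illustration, not the reward function used by any of the models named above: the `Answer:` convention, the function names, and the sample completions are all hypothetical.

```python
import re
from fractions import Fraction

# Assumed convention for this sketch: the completion ends with "Answer: <number>",
# where the number is an integer, a decimal, or a simple fraction like 3/4.
ANSWER_PATTERN = re.compile(r"Answer:\s*(-?\d+(?:\.\d+|/\d+)?)")


def parse_final_answer(completion):
    """Return the last stated answer as an exact Fraction, or None if absent."""
    matches = ANSWER_PATTERN.findall(completion)
    if not matches:
        return None
    try:
        return Fraction(matches[-1])
    except (ValueError, ZeroDivisionError):
        return None


def verifiable_reward(completion, reference):
    """Binary reward: 1.0 if the final answer equals the reference, else 0.0.

    Only the final answer is checked; the reasoning text itself is not scored.
    """
    predicted = parse_final_answer(completion)
    if predicted is None:
        return 0.0
    return 1.0 if predicted == Fraction(reference) else 0.0


# Hypothetical sampled completions for the question "What is half of 3/2?"
samples = [
    "Half of 3/2 is 3/4. Answer: 3/4",
    "3/2 divided by 2 is 0.75. Answer: 0.75",
    "I think it is about 1. Answer: 1",
    "Not sure.",
]
rewards = [verifiable_reward(sample, "3/4") for sample in samples]
print(rewards)  # [1.0, 1.0, 0.0, 0.0]
```

In an RL setup, scores like these would feed a policy-gradient update so that reasoning chains ending in correct answers become more likely. The update step itself is outside the scope of this sketch.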