Use a model whose drafts need the fewest style edits. The best writing pick is the one you rewrite least.
CheckAsk for three versions and choose the one with the strongest default taste.
Stop overthinking it. Match the shape of the job to the shape of the model.
Good questions to ask first:
Skip the quiz. Six common asks, six ways to narrow the field.
Use a model whose drafts need the fewest style edits. The best writing pick is the one you rewrite least.
CheckAsk for three versions and choose the one with the strongest default taste.
Coding work needs context and verification more than a clever single answer. Favor models and apps that can inspect files, edit, and run tests.
CheckGive it a small real bug and see whether it changes the right file.
Reasoning models are worth the extra latency when each mistake is expensive and the answer has multiple dependent steps.
CheckAsk for assumptions, not just the final answer.
The model cannot reason over pages it never saw. Context window matters most when the source is long and cross-referenced.
CheckAsk it to cite where in the source each claim came from.
For tagging, extraction, routing, and simple Q&A, the right answer is usually the cheapest model that clears your eval.
CheckRun a 50-item sample before paying flagship prices.
Open-weight models trade managed polish for privacy, offline use, customization, and predictable infrastructure costs.
CheckRead the license before building a business on it.
Don't see your ask? Scroll to the full directory and compare the traits underneath the recommendation.
Why these picks are these picks. Eight terms that quietly decide every choice.
working memory
Context window
Everything you paste โ prompt, attachments, prior chat โ has to fit. Bigger window means more code or more pages of a contract before you start chunking.
Use bigger windows for whole repos, long contracts, or many attached PDFs. Use smaller ones when the prompt is short and latency matters.
thinks before it answers
Reasoning model
A second class of model that runs an internal "scratchpad" before responding. Slower and pricier per query, but lands harder problems โ math, multi-step logic, code refactors.
Best for hard math, planning, and codebase refactors. Overkill for rewriting a paragraph or classifying a support ticket.
handles more than text
Multimodal
Reads images, audio, video, or PDFs as input โ not just text. Native multimodal models trained on all four feel sharper than ones bolted onto text-only bases.
Choose this when the task includes screenshots, charts, PDFs, audio, or video. Text-only work does not need the extra surface area.
the unit you pay by
Tokens
Sub-words the model reads and writes โ roughly 0.75 word per token. Pricing is "$X per million input tokens / $Y per million output." Output is usually 3โ5ร more expensive than input.
Long inputs and long outputs both cost money. Summaries are cheap; repeated analysis loops over giant files are where bills grow.
fast vs. flagship
Tier (Fast ยท Mid ยท Flagship)
Each lab ships a tiered family. Fast tiers are 5โ20ร cheaper and answer in <1s โ built for production pipelines. Flagship tiers cost more, think harder, and only make sense when quality matters more than latency.
Start with the cheapest tier that can do the job, then move up only when you see quality failures you can name.
can you run it yourself
Open vs. closed weights
Open-weight models publish the trained parameters โ you can self-host, fine-tune, or run offline. Closed-weight models live behind an API only. Open trades some quality for full control and zero per-call cost.
Use open weights for privacy, offline use, fine-tuning, or predictable unit economics. Use hosted APIs when managed quality matters more.
can it call other things
Tool use / agents
Whether the model can call APIs, run code, search the web, or read files mid-conversation โ instead of just generating text. The foundation under every "agent" headline.
Tool access is what turns a chat answer into a workflow: search, fetch, calculate, write, verify, repeat.
how recent its world is
Knowledge cutoff
The date training data ends. Anything after โ a release, a news event, a library version โ the model has to be told or look up. Models with web search smooth this over; pure-LLM answers don't.
For current events, prices, docs, or laws, use a model with retrieval or bring the sources yourself. Guessing from memory is the trap.
Match the term to the task. The Quick Picks above are shorthand โ these are the dials underneath them.
A deliberately stable snapshot as of April 2026. The labels are family-level on purpose: exact release names move faster than this page should.
Showing 15 models, sorted by their strongest practical signal.
Anthropic
The flagship shape. Use it when the answer has to be deeply considered.
DeepSeek
Open-source reasoning powerhouse. Shockingly cheap.
Midjourney
The artist. Don't ask it to write code.
OpenAI
The heavyweight thinker. Bring your hard problems.
Anthropic
The sweet spot. Smart, fast, and surprisingly affordable.
Black Forest Labs
The new hotness in image gen. Punches up.
The context monster. Feed it your whole codebase.
OpenAI
The smart cheap one. Reasoning without the flagship price tag.
OpenAI
Easy image gen inside the OpenAI world.
OpenAI
The all-around OpenAI pick. Strong ecosystem, strong tool use.
Meta
The open-weight workhorse. Good when control matters.
Fast, cheap, and surprisingly capable.
Mistral
The European contender. Solid all-around.
Anthropic
Claude's fast little sibling. Great bang for buck.
OpenAI
Cheap, fast, and good enough for most things.
Models retire and new ones land. Spot something missing? let me know.