What is a Context Window?

A context window is the maximum amount of text — measured in tokens — that an AI language model can process in a single interaction. It includes everything: your system prompt, the conversation history, any documents or data you paste in, and the model's own response. Think of it as the model's working memory. Once you exceed the context window, the model either refuses the request, silently truncates older content, or degrades in quality as important context gets pushed out. Understanding context windows is essential for designing prompts that work reliably, especially when working with long documents, multi-turn conversations, or complex instructions.
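The budgeting described above can be sketched in a few lines. This is a rough heuristic, not a real tokenizer: it assumes English text averages about four characters per token, and the function and parameter names (`estimate_tokens`, `fits_in_window`, the 128K default) are illustrative choices, not any vendor's API.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: English prose averages ~4 characters per token.
    Real tokenizers (BPE variants) differ by model, so treat this only as a
    budget check, not an exact count."""
    return max(1, len(text) // 4)

def fits_in_window(system_prompt: str, history: list[str],
                   reply_budget: int, window: int = 128_000) -> bool:
    """Check whether everything in the request -- system prompt, conversation
    history, and room reserved for the model's own reply -- fits in the window."""
    used = estimate_tokens(system_prompt) + sum(estimate_tokens(m) for m in history)
    return used + reply_budget <= window
```

For precise counts, model providers ship their own tokenizers; the heuristic is only useful for quick planning before a request is sent.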

Context window sizes vary dramatically across models. GPT-4o supports 128K tokens (roughly 96,000 words or a 300-page book). Claude offers models with up to 200K tokens, and some configurations extend to 1M tokens. Gemini 1.5 Pro supports up to 2M tokens. However, bigger is not always better — models tend to perform best when the most relevant information is positioned at the beginning or end of the context (the "lost in the middle" problem). A 200K context window does not mean you should always fill it. Focused, well-structured context with only the most relevant information typically outperforms dumping everything into a massive prompt.
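One practical response to the "lost in the middle" problem is to place the highest-relevance material at the edges of the prompt rather than concatenating chunks in arbitrary order. The sketch below assumes you already have relevance scores for each chunk (for example, from embedding similarity); the function name and alternating-placement scheme are illustrative, not a standard algorithm.

```python
def order_for_context(chunks: list[str], scores: list[float]) -> list[str]:
    """Arrange chunks so the highest-scoring ones sit at the beginning and end
    of the prompt, where models attend most reliably, and the lowest-scoring
    ones fall in the middle. Higher score = more relevant (assumed)."""
    ranked = [c for _, c in sorted(zip(scores, chunks),
                                   key=lambda pair: pair[0], reverse=True)]
    front, back = [], []
    for i, chunk in enumerate(ranked):
        # Alternate placement: rank 1 at the front, rank 2 at the back, etc.
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]  # least relevant material lands in the middle
```

Whether this helps depends on the model and task, but it costs nothing to try when you control how retrieved context is assembled.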

Strategies for working within context limits include chunking long documents and processing them in parts, using summarization to compress prior conversation history, prioritizing the most relevant sections of source material, and leveraging retrieval-augmented generation (RAG) to dynamically fetch only what is needed. For multi-turn conversations, be aware that every message in the history consumes tokens — long chats eventually push your original instructions out of the window. Reset or summarize periodically. Use a token calculator to measure exactly how much context you are consuming and plan your prompts accordingly.
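The chunking strategy can be sketched as a simple overlapping split. This version counts words rather than tokens, which is a simplification; the names `chunk_by_words`, `chunk_size`, and `overlap` are illustrative. The overlap ensures that a sentence cut at one chunk's boundary appears whole in the next chunk.

```python
def chunk_by_words(words: list[str], chunk_size: int = 800,
                   overlap: int = 80) -> list[str]:
    """Split a long document into overlapping chunks of `chunk_size` words.
    Each chunk repeats the last `overlap` words of the previous one so context
    is preserved across boundaries. Word counts stand in for token counts here;
    swap in a real tokenizer for precise budgeting."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk reached the end of the document
    return chunks
```

Each chunk can then be processed independently (summarized, searched, or answered against), with the per-chunk results merged in a final pass.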