How Copilot actually answers: grounding, the semantic index, and why it sometimes lies

Ask ten Copilot users how it works and nine will say some version of “it’s ChatGPT trained on our company data.” That belief is wrong in a way that costs them every single day — because it leads to prompts that fail for reasons they can’t see.

Here’s the model that actually predicts Copilot’s behavior.

The pipeline nobody explains

When you send a prompt to Microsoft 365 Copilot, roughly this happens:

1. Your prompt is interpreted: Copilot decides what kind of question this is and which tools or sources it needs (your files? mail? calendar? the web?).
2. Retrieval runs against the semantic index, a search layer Microsoft builds over your Microsoft Graph content: files, mail, meetings, Teams messages, People data. This is search, not memory. It returns a limited set of relevant chunks, security-trimmed to what you can access.
3. The model reasons over your prompt + those chunks, and only those. The LLM itself has never seen your tenant. It sees a question and a small briefing pack assembled seconds ago.
4. A response is generated and post-processed: citations attached, compliance filters applied.

The single most important consequence: Copilot’s answer quality is capped by what retrieval found. The model can’t reason over a document that didn’t make it into the briefing pack. When Copilot misses something obvious, it’s almost never because the model is dumb — it’s because retrieval didn’t surface the right chunk.
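To make the cap concrete, here is a minimal Python sketch of the four steps. Everything in it is a stand-in rather than Copilot's implementation: `interpret`, `retrieve`, `answer`, and the two-document `INDEX` are hypothetical names, and plain word overlap substitutes for whatever ranking the semantic index really uses.

```python
# Toy model of the pipeline. All names and data here are hypothetical.
INDEX = [
    ("Q3-Vendor-Assessment.docx", "vendor risk rating for each supplier"),
    ("Travel-Policy.docx", "travel booking rules and approval limits"),
]

def interpret(prompt):
    """Step 1 (toy): choose a grounding source by keyword."""
    return "web" if "news" in prompt.lower().split() else "work"

def retrieve(prompt, index, top_k=2):
    """Step 2 (toy): rank chunks by how many words they share with the prompt."""
    words = set(prompt.lower().split())
    scored = [(len(words & set(text.split())), doc, text) for doc, text in index]
    hits = sorted((s for s in scored if s[0] > 0), reverse=True)
    return [(doc, text) for _, doc, text in hits[:top_k]]

def answer(prompt, index):
    """Steps 3 and 4 (toy): the 'model' gets the prompt plus the pack, nothing else."""
    pack = retrieve(prompt, index) if interpret(prompt) == "work" else []
    return {
        "briefing_pack": [text for _, text in pack],
        "citations": [doc for doc, _ in pack],
    }

print(answer("vendor risk", INDEX)["citations"])           # ['Q3-Vendor-Assessment.docx']
print(answer("third-party exposure", INDEX)["citations"])  # []
```

Both prompts are about the same document, yet the second one comes back with an empty pack. At that point no amount of model quality can recover the answer.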

Every great Copilot prompt is secretly two instructions: what to find and what to do with it. Most people only write the second half.

Why this explains the weird stuff

Why Copilot nails one question about a document and whiffs the next. A 60-page document is split into chunks. Retrieval pulls the chunks that look most relevant to your wording. Ask about a topic using the document’s own vocabulary and the right chunk comes back. Paraphrase loosely and it may not. This is why rewording a “failed” question often fixes it — you changed the retrieval, not the reasoning.
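A rough sketch of that effect, assuming fixed-size word chunks and simple word-overlap scoring. The real semantic index uses richer matching than this, so treat `chunk_words`, `best_chunk`, and the sample text as illustrative assumptions only.

```python
import re

def tokens(text):
    """Lowercase alphanumeric tokens, punctuation dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def chunk_words(text, size):
    """Split a document into consecutive chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def best_chunk(question, chunks):
    """Return (chunk, score) for the chunk sharing the most words, or (None, 0)."""
    scores = [len(tokens(question) & tokens(c)) for c in chunks]
    top = max(scores)
    return (chunks[scores.index(top)], top) if top > 0 else (None, 0)

# Hypothetical three-sentence "document", eight words per sentence.
DOC = (
    "The FY26 Growth Framework sets three revenue targets. "
    "Hiring stays frozen until the second quarter ends. "
    "Travel above two thousand dollars needs director approval."
)
chunks = chunk_words(DOC, 8)

print(best_chunk("What does the FY26 Growth Framework say about revenue", chunks))
# ('The FY26 Growth Framework sets three revenue targets.', 5)
print(best_chunk("next year's strategy plan", chunks))
# (None, 0)
```

The loose paraphrase scores zero against every chunk, so in this toy nothing is retrieved even though the answer sits in the first sentence.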

Why attaching or referencing a file beats describing it. When you reference a file directly (the / picker in Word or Copilot Chat), you’re bypassing the retrieval lottery and guaranteeing the source is in the pack. This is the single biggest reliability upgrade available, and it’s free.

Why Copilot “lies.” When retrieval returns weak or partial material, the model still tries to be helpful: it generalizes, fills gaps, and rounds off uncertainty. The output sounds equally confident either way. In M365 Copilot, hallucination is most often a symptom of starved retrieval rather than a reasoning failure. The fix isn’t “trust nothing”; it’s learning to spot answers with thin citations.

Why answers differ between people. Retrieval is security-trimmed. Your colleague’s identical prompt runs against a different permission scope. If they can see the finance site and you can’t, you get different answers — silently. Copilot will never say “there’s a better document you don’t have access to.”
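Security trimming can be pictured as a filter that runs before anything is ranked. The sketch below is an assumption-laden toy: `LIBRARY`, the user names, and `trimmed_scope` are made up for illustration and say nothing about how Microsoft Graph permissions are actually stored.

```python
# Hypothetical library: document name -> set of users allowed to read it.
LIBRARY = {
    "finance/FY26-forecast.xlsx": {"ben"},
    "sales/pipeline-review.pptx": {"ana", "ben"},
    "hr/remote-work-policy.docx": {"ana", "ben"},
}

def trimmed_scope(user, library):
    """Return only the documents this user can read; the rest are silently dropped."""
    return sorted(doc for doc, readers in library.items() if user in readers)

ana = trimmed_scope("ana", LIBRARY)
ben = trimmed_scope("ben", LIBRARY)

# What Ben's retrieval can draw on that Ana's cannot.
print(sorted(set(ben) - set(ana)))  # ['finance/FY26-forecast.xlsx']
```

Nothing in Ana's result hints that a third document exists, which is exactly why the difference between the two answers is silent.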

Why fresh content is sometimes invisible. New or just-edited content takes time to be indexed. The document you saved four minutes ago may not be retrievable yet — reference it explicitly instead of expecting Copilot to find it.
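Here is one way to picture the indexing gap and why an explicit reference sidesteps it. The 15-minute `INDEX_LAG` is an invented number, not a documented figure, and `searchable` and `eligible_sources` are hypothetical helpers.

```python
from datetime import datetime, timedelta

INDEX_LAG = timedelta(minutes=15)  # hypothetical delay, chosen only for the example

def searchable(saved_at, now, lag=INDEX_LAG):
    """A file is findable by search only once the assumed lag has passed."""
    return now - saved_at >= lag

def eligible_sources(files, now, pinned=()):
    """files maps name -> last-saved time. Pinned names skip the index check."""
    found = {name for name, saved_at in files.items() if searchable(saved_at, now)}
    return sorted(found | set(pinned))

now = datetime(2025, 1, 6, 9, 0)
files = {
    "old-plan.docx": now - timedelta(days=3),
    "new-plan.docx": now - timedelta(minutes=4),
}

print(eligible_sources(files, now))                            # ['old-plan.docx']
print(eligible_sources(files, now, pinned=["new-plan.docx"]))  # ['new-plan.docx', 'old-plan.docx']
```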

The four levers this hands you

1. Speak the document’s language

Retrieval matches your words against content. If your org calls it the “FY26 Growth Framework,” asking about “next year’s strategy plan” is rolling dice. Use the proper nouns your documents use — project names, document titles, people. Specificity isn’t pedantry; it’s retrieval targeting.

2. Pin sources instead of hoping

Reference files, people, and meetings explicitly whenever the answer must come from a known place. “Summarize the risks in /Q3-Vendor-Assessment” pins the source, so the only thing left to vary is the model’s wording. “What are our vendor risks?” leaves the choice of source to retrieval ranking. Know which mode you’re in, and choose it deliberately.

3. Demand receipts

Add “quote the source and link the document for each claim” to anything that matters. This does two things: it biases Copilot toward retrieved content over generalization, and it makes thin retrieval visible — an answer with one weak citation is telling you retrieval starved, so rephrase and rerun.
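If you want to make the receipts check mechanical, a small helper like the one below would do. It assumes you have copied an answer into a list of claims with their cited sources by hand; Copilot does not hand you this structure, and `thin_claims` is a hypothetical name.

```python
def thin_claims(claims, min_sources=1):
    """Return the text of every claim backed by fewer than min_sources citations."""
    return [c["text"] for c in claims if len(c.get("sources", [])) < min_sources]

# Hypothetical answer, transcribed claim by claim.
response = [
    {"text": "Vendor A failed its security review.", "sources": ["Q3-Vendor-Assessment"]},
    {"text": "Vendor B is likely to raise prices.", "sources": []},
    {"text": "Most vendors are low risk."},  # no sources listed at all
]

flagged = thin_claims(response)
print(f"{len(flagged)} of {len(response)} claims have no source")
# 2 of 3 claims have no source
```

Two unsupported claims out of three is the signal to rephrase and rerun, not to trust the confident tone.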

4. Fix the library, not just the prompt

If Copilot keeps answering from a stale document, no prompt will save you: retrieval is faithfully returning the garbage that’s there. Duplicate files, draft copies, and retired policies that still rank are why “Copilot is wrong” tickets are usually content-hygiene tickets. (This matters even more for agents; see the knowledge-source audit prompt in the Vault before you point an agent at any SharePoint site.)
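A first hygiene pass can be as simple as grouping file names that differ only by version or draft markers. The sketch below assumes you already have a flat list of file names exported from a library; the `MARKERS` set and both function names are illustrative choices, not part of any Microsoft tool.

```python
import re
from collections import defaultdict

MARKERS = {"draft", "copy", "final", "old"}  # hypothetical noise words to ignore

def normalize(name):
    """Reduce a file name to its core title: lowercase, no extension, no markers."""
    stem = name.rsplit(".", 1)[0].lower()
    words = re.findall(r"[a-z0-9]+", stem)
    kept = [w for w in words if w not in MARKERS and not re.fullmatch(r"v\d+", w)]
    return " ".join(kept)

def duplicate_groups(names):
    """Return {core title: [file names]} for titles that appear more than once."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return {title: files for title, files in groups.items() if len(files) > 1}

names = [
    "Remote-Work-Policy.docx",
    "Remote Work Policy DRAFT.docx",
    "remote_work_policy_v2 (copy).docx",
    "Q3-Vendor-Assessment.docx",
]

for title, group in duplicate_groups(names).items():
    print(title, "->", len(group), "candidates")
# remote work policy -> 3 candidates
```

Each group it reports is a place where retrieval has several plausible versions to choose from, so someone needs to decide which one should survive.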

Web grounding is a different animal

Copilot Chat can also ground on web search. Two things to know:

Work and web are separate grounding modes. When you need tenant data, make sure you’re in work grounding; a surprising number of “Copilot can’t find my file” complaints come from people asking in web-grounded mode.
Web grounding answers from search results — same retrieval-cap logic, different corpus. The same “demand receipts” lever applies.

The 30-second self-test

You now know more about Copilot’s behavior than most rollout leads. Test yourself — why do these fail?

Failing prompt	Real cause
“Summarize our remote work policy” returns a 2022 version	Retrieval ranks the stale doc; the library needs hygiene, not a better prompt
“What did Sarah promise in yesterday’s meeting?” comes back empty	Transcription may have been off, or the recap isn’t indexed yet; check that the meeting artifact exists
“Analyze our sales data” gives generic advice	Nothing pinned: retrieval had no idea which data, so the model generalized
Your teammate gets a richer answer to the same question	Security trimming — different permissions, different briefing pack

Once you see Copilot as a reasoning engine fed by a search engine, everything else on this site is just learning to feed it well.