
Researcher: the best thing Microsoft has shipped since meeting recap

"Real multi-step research with citations across web and tenant — if you can wait the minutes it takes and answer its questions like an adult."

9/10

What's genuinely good

  • Genuine multi-step research, not search-and-summarize: it plans, digs, cross-references, and revises before writing
  • Web and work grounding in one report — the only mainstream tool that cites both your tenant's documents and the open web side by side
  • Citations on every claim, structured into a report you can actually forward
  • Included with the M365 Copilot license and callable from workflows via the M365 Copilot node — research as a building block, not just a chat trick

What sucks

  • Slow by design (minutes, not seconds), and plenty of users close the tab before it finishes
  • It asks clarifying questions, people answer them lazily, and the report quality dies right there — garbage scope in, verbose mush out
  • House-formatted and verbose: every report comes out in the same consultant register whether you asked for one or not
  • No guardrails on when to use it: it will happily spend ten minutes on a question Copilot Chat answers in five seconds, and people let it

The honest assessment

Researcher is the first Copilot feature since meeting recap where the honest review and the marketing slide say the same thing. Give it a real research question — “build me a brief on this company’s last six months and our history with them” — and it does what a competent junior analyst does: plans the work, runs multiple rounds of digging across the web and your tenant, cross-checks, and hands back a structured report with citations on the claims. That combination — web plus your own mail, files, and chats, in one cited document — exists nowhere else in mainstream software.

The score is a 9, not a 10, because Researcher has one structural enemy: time. It is deliberately slower than chat, and the entire product bets that users will trade minutes for quality. Many won’t. Watch a rollout and you’ll see the same two failures again and again. First, the clarifying questions, Researcher’s best feature on paper, get answered with “whatever you think” by users trained on instant chat. A lazy answer to a scoping question doesn’t degrade the output a little; it wrecks it, because every subsequent minute of research is aimed at the wrong target. Then people watch the spinner for ninety seconds, conclude it’s broken, and go back to chat.

The reports themselves are good but house-styled: a confident, consultant-flavored register with more words than your question needed. You can fight it with formatting instructions, but the default verbosity is real, and busy readers notice.

The workarounds that change the score

  1. Answer the clarifying questions like a brief, not a chat. When Researcher asks about scope, give it audience, length, sections, and what to exclude. Sixty seconds of real answers is the highest-ROI minute in the whole product.
  2. Fire and walk away. Researcher is asynchronous work, not conversation. Submit, go do something else, come back to a finished report. Teams that frame it this way (“it’s a research request, not a chat”) keep using it; teams that watch the spinner churn out of it.
  3. Pre-scope in the first prompt. Specify sections, word limit, and source mix up front, and the clarifying round gets shorter while the house style gets tamer. In workflows this is mandatory: when Researcher runs through the M365 Copilot node, nobody is there to answer its questions, so the prompt has to carry the whole scope.
  4. Route by question type. One-fact questions go to Copilot Chat; anything you’d assign to a person for an afternoon goes to Researcher. Teach that routing rule explicitly or people burn time using a research engine as a chatbot.

What Microsoft won’t tell you

  • The slowness is the product. Researcher deliberately takes minutes because that’s what multi-step verification costs, but nothing in the UI sells the wait, so users read the time spent on quality as a latency failure. Your rollout messaging has to do the job the UI doesn’t.
  • The clarifying-question step is where most bad reports are born, and it will never show up in any diagnostic. The tool did exactly what the lazy answer asked.
  • Workflow mode quietly changes the contract: no clarifying questions, prompt-as-full-spec. Teams that learned Researcher interactively get worse results from the workflow node until someone tells them why.

Bottom line

This is the rare Copilot feature that’s better than its demo. The catch isn’t capability — it’s discipline. Users who scope properly and let it run get analyst-grade cited reports included in a license they already pay for. Users who treat it like fast chat get slow chat, and tell everyone it’s overrated. Train the discipline; the tool already works.
