If you’ve been following along with my earlier Sebastian post, you know I’ve been rebuilding our family AI assistant on a different foundation: Open WebUI + Ollama instead of our previous OpenClaw setup. The new stack has been running great, but there was one capability I knew I was going to miss: the ability to do genuinely deep research. Not “search and summarize the first result” deep, but “read twenty sources, synthesize them, cite everything” deep. This week I finally added it.
The Problem with Regular Web Search
The web_search tool Sebastian already had was good for quick lookups: current news, recent prices, who said what. But when someone in the family asked something like “compare these three database options for our project” or “what’s the current state of the art for X and what should we actually use?” (the kind of question where you need to actually read and cross-reference sources rather than just fetch a snippet), it fell short. You’d get a confident-sounding answer backed by one or two pages, when what you really needed was something closer to a research brief.
That gap was the motivation. I wanted a second research mode: slower, deeper, with citations.
Choosing GPT-Researcher
I looked at a few options. I’d been reading about Alibaba’s Tongyi DeepResearch, and I’d previously experimented with an AB-MCTS pipeline that did multi-step research planning. Both were interesting, but each was either too tightly coupled to specific APIs or too much to build from scratch for what I needed right now.
GPT-Researcher turned out to be the right call. It’s Apache-2.0 licensed, it has a clean programmatic API (you give it a query, it plans subtasks, runs parallel searches, synthesizes a report), and critically, it’s configurable enough that I could point it at my own Ollama cloud models instead of OpenAI. It’s not perfect, but it’s battle-tested and I didn’t need to build the hard parts myself.
The philosophy here is the same as the rest of this stack: depend on it, don’t fork it. GPT-Researcher stays in its own Docker container with its own dependencies; our code just calls its API.
The Architecture
The design has three layers:
- research-service: a Docker container running GPT-Researcher behind a tiny FastAPI wrapper. It’s localhost-only (bound to 127.0.0.1:8001), requires a bearer token, and exposes one real endpoint: POST /research. The container reads API keys from a root-owned file at /etc/owui-research/research.env that never touches the Samba share or the repo.
- project-service endpoint: our existing FastAPI service got a /deep_research endpoint. This is where the folder-awareness lives. It checks whether the chat is inside an OWUI Folder, calls the research-service, then either (a) writes the full cited report into the project’s research/ directory and git-commits it, returning a summary to chat, or (b) returns the full report inline if there’s no folder context.
- OWUI tool: the deep_research tool is what the model actually sees. From Sebastian’s perspective it looks like any other tool call: pass a query, get a result. Folder detection happens automatically via the OWUI database, the same way the coder and project tools already do it.
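The research-service’s gatekeeping is small enough to sketch with the standard library. This is a minimal sketch, not the repo’s code: `parse_env`, `load_env_file` and `check_bearer` are hypothetical names, and `RESEARCH_API_TOKEN` is an assumed key name. It shows reading KEY=VALUE pairs from the root-owned env file and validating an `Authorization: Bearer <token>` header before any research runs.

```python
from __future__ import annotations

import hmac
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one layer of matching surrounding quotes, which many env files use.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def load_env_file(path: str = "/etc/owui-research/research.env") -> dict[str, str]:
    """Read the root-owned env file named in the post."""
    return parse_env(Path(path).read_text(encoding="utf-8"))


def check_bearer(authorization: str | None, expected_token: str) -> bool:
    """True only for an 'Authorization: Bearer <token>' value matching expected_token."""
    if not authorization or not expected_token:
        return False
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest is designed to resist timing attacks on the comparison.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

Inside the FastAPI wrapper, something like `check_bearer(request.headers.get("Authorization"), env["RESEARCH_API_TOKEN"])` would run before POST /research does any work, with a 401 returned when it fails.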
The LLM brain is Kimi K2 (via Ollama cloud) for the heavy reasoning, MiniMax M3 for fast subtask synthesis, and nomic-embed-text for local semantic deduplication. Search is Tavily as the primary provider, with Brave Search API as a fallback if Tavily quota runs out. Both keys live exclusively in that root-owned env file.
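The post doesn’t say where the Tavily-to-Brave fallback is implemented, so here is just the decision logic in isolation. It’s a hypothetical sketch: `QuotaExceeded` and `search_with_fallback` are illustrative names, and the provider functions passed in stand in for real Tavily and Brave client calls.

```python
from __future__ import annotations

from typing import Callable, Dict, List, Tuple

Result = Dict[str, str]
SearchFn = Callable[[str], List[Result]]


class QuotaExceeded(Exception):
    """Hypothetical error a provider wrapper raises when its quota is used up."""


def search_with_fallback(
    query: str, primary: SearchFn, fallback: SearchFn
) -> Tuple[str, List[Result]]:
    """Query the primary provider; switch to the fallback only on a quota error.

    Any other exception propagates, so a genuine outage is not hidden.
    """
    try:
        return "primary", primary(query)
    except QuotaExceeded:
        return "fallback", fallback(query)
```

Only a quota error triggers the switch here. Falling back on every exception would be more forgiving, but it would also quietly mask bugs in the primary integration.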
The Folder-Aware Design
This part I’m particularly happy with because it came out of something that was already working. Every chat in Sebastian lives inside an OWUI Folder that maps to a project directory. The coder tool already auto-detects which project you’re working in by reading the OWUI database: chat_id to folder_id to folder name to project path. The deep research tool reuses the exact same logic.
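The chat-to-project lookup is essentially a two-table join. Below is a sketch using `sqlite3` against a deliberately simplified schema. The `chat(id, folder_id)` and `folder(id, name)` tables are an assumption modeled loosely on Open WebUI’s database, not its exact layout (nested folders, for instance, are ignored), and `resolve_project_dir` is a hypothetical helper. The `/projects/<user>/<folder>` layout follows the example path in the next paragraph.

```python
from __future__ import annotations

import sqlite3
from pathlib import PurePosixPath

# Root taken from the example path in this post; adjust to your layout.
PROJECTS_ROOT = PurePosixPath("/projects")


def resolve_project_dir(
    conn: sqlite3.Connection, chat_id: str, user_dir: str
) -> PurePosixPath | None:
    """Map chat_id -> folder_id -> folder name -> project path.

    Returns None when the chat is unknown or not inside a folder, which is the
    signal to return the report inline instead of writing a file.
    """
    row = conn.execute(
        "SELECT f.name FROM chat AS c JOIN folder AS f ON f.id = c.folder_id "
        "WHERE c.id = ?",
        (chat_id,),
    ).fetchone()
    if row is None:
        return None
    name = row[0]
    # Refuse folder names that could escape the projects tree.
    if not name or "/" in name or name in (".", ".."):
        return None
    return PROJECTS_ROOT / user_dir / name
```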
So in practice: if you’re in a chat inside your “home-automation” folder and ask Sebastian to research something, the full multi-thousand-word cited report lands in /projects/YourName/home-automation/research/2026-06-09-your-query-slug.md and gets auto-committed to git. Sebastian tells you the file path and summarizes the first section. If you’re just in a plain chat with no folder, the report comes back inline instead. No configuration, no “which project?” prompting. It just does the right thing.
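The file naming and the folder-or-inline decision can be sketched as follows. `slugify`, `report_path` and `deliver` are hypothetical names, the 60-character slug limit is an arbitrary choice, and the commit step simply shells out to `git`; the real project-service may do all of this differently.

```python
from __future__ import annotations

import datetime as dt
import re
import subprocess
from pathlib import Path


def slugify(text: str, max_len: int = 60) -> str:
    """Lowercase, turn runs of non-alphanumerics into '-', trim to max_len."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "research"


def report_path(project_dir: Path, query: str, day: dt.date) -> Path:
    """<project>/research/YYYY-MM-DD-<slug>.md, mirroring the example path."""
    return project_dir / "research" / f"{day.isoformat()}-{slugify(query)}.md"


def deliver(report: str, query: str, project_dir: Path | None, commit: bool = True) -> dict:
    """Write and commit the report inside a project, or hand it back inline."""
    if project_dir is None:
        # No folder context: the whole report goes back to the chat.
        return {"mode": "inline", "report": report}
    path = report_path(project_dir, query, dt.date.today())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(report, encoding="utf-8")
    if commit:
        rel = str(path.relative_to(project_dir))
        subprocess.run(["git", "-C", str(project_dir), "add", rel], check=True)
        subprocess.run(
            ["git", "-C", str(project_dir), "commit", "-m", f"research: {query[:60]}"],
            check=True,
        )
    return {"mode": "file", "path": str(path)}
```

Passing `commit=False` is handy in tests. Note that `git commit` exits non-zero when there is nothing new to commit, which `check=True` turns into an exception.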
The Routing Rule
I also updated the operating instructions that get injected into every chat. The distinction is now explicit:
Quick facts, current events, recent news go to web_search. Deep, open-ended research questions, technology comparisons, and project-planning briefs that need citations go to deep_research. Never use deep_research for simple lookups.
In practice this means Sebastian picks the right tool without being asked. Ask about today’s news and it uses web_search. Ask “what are the tradeoffs between approach A and approach B for our use case” inside a project folder and it fires off deep_research, disappears for a few minutes, and comes back with a document.
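On the Open WebUI side, a tool is a Python `Tools` class whose method docstrings become the descriptions the model reads, so the docstring is a natural second place to restate the routing rule. The sketch below assumes Open WebUI passes the reserved `__chat_id__` argument to tool methods (check your version; `__metadata__` is another route to the chat ID). The project-service URL and the `result` response field are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request

# Hypothetical address of the project-service's /deep_research endpoint.
PROJECT_SERVICE_URL = "http://127.0.0.1:8000/deep_research"


def build_request(query: str, chat_id: str | None) -> urllib.request.Request:
    """POST body carries the query plus the chat ID used for folder detection."""
    body = json.dumps({"query": query, "chat_id": chat_id}).encode("utf-8")
    return urllib.request.Request(
        PROJECT_SERVICE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


class Tools:
    def deep_research(self, query: str, __chat_id__: str | None = None) -> str:
        """
        Slow, multi-source research that returns a cited report. Use for deep,
        open-ended questions, technology comparisons and project-planning briefs.
        Never use for simple lookups, quick facts or current events; use
        web_search for those.
        :param query: The research question to investigate.
        """
        req = build_request(query, __chat_id__)
        # Runs take minutes, so allow a generous timeout (in seconds).
        with urllib.request.urlopen(req, timeout=900) as resp:
            return json.loads(resp.read())["result"]
```

Keeping the request construction in a separate function makes it testable without a running project-service.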
What a Research Run Actually Looks Like
A typical run on “what are the best practices for SQLite in production applications?” produced an 8,500-word structured report covering WAL mode, connection pooling, backup strategies, indexing patterns, and more, with section headers, bullet points, and a sources list. It took about four minutes and cost maybe a cent in Tavily API credits.
The reports aren’t perfect. The source tracking is a known limitation of how GPT-Researcher works with non-OpenAI backends: the content is synthesized correctly but the URL list sometimes comes back empty. It’s on the roadmap to investigate. But the reports themselves are substantive and genuinely useful for planning, which is the whole point.
Open Source and Keys
The whole stack is in the owui-agent-stack repo under AGPL-3.0. The research-service component is fully self-contained in research-service/: you bring your own Tavily and Brave keys, drop them in a host-side env file, and run docker compose up -d --build. Full setup instructions are in docs/owui-deep-research.md.
We depend on GPT-Researcher (Apache-2.0) as a library. We don’t fork it, we don’t vendor it, we just run it in a container and call its API. That felt like the right way to handle it: get the benefit of their work, don’t create a maintenance burden of keeping a fork in sync, and give them full credit.
What’s Next
The source URL tracking issue is worth digging into. It’s probably solvable with a configuration change in how GPT-Researcher stores context when using Ollama as the backend. Beyond that, I want to experiment with the report types (GPT-Researcher supports multi-agent reports, outline reports, and detailed reports) and see if any of them suit our use case better for longer planning sessions.
But honestly, even in this initial state, having a “go think about this seriously” option in addition to the “quick search” option has already changed how we use Sebastian. That was the goal.