Scientific Data Enrichment Tool for Open WebUI


Want your AI to understand chemistry like a research scientist? Here is an open-source tool that automatically enriches your chemistry and materials science AI conversations with real scientific data from trusted databases.

What is it?

The Scientific Data Enrichment Tool is an Open WebUI extension that acts as a live science reference librarian. When you ask about chemicals, drugs, or materials, it automatically pulls data from PubChem, ChEMBL, and Materials Project, and computes molecular descriptors with RDKit, before your LLM even starts generating a response.

No more hallucinated molecular weights. No more guessed chemical properties. Real data, every time (hopefully).

How It Works

Ask: “What happens if I mix ibuprofen with H2SO4?” (Ridiculous, I know)

Behind the scenes:

  1. Tool detects “ibuprofen” (drug name) and “H2SO4” (chemical formula)
  2. Fetches from PubChem: molecular structure, properties, safety data
  3. Queries ChEMBL: drug bioactivity, clinical information
  4. Calculates with RDKit: molecular descriptors, Lipinski rule compliance
  5. Enriches your LLM’s context with all this data
  6. LLM responds with factual, grounded information

All automatic. All transparent.
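
The steps above can be sketched as a small detect, fetch, and format loop. This is only an illustration of the flow, not the tool's actual code: `build_context`, `stub_detect`, and `stub_pubchem` are hypothetical names, and the stub source returns placeholder text so the example runs offline.

```python
from typing import Callable, Optional

# Hypothetical types: a detector maps a query to entity names, and each
# source maps one entity name to a dict of facts (or None if nothing found).
Detector = Callable[[str], list]
Source = Callable[[str], Optional[dict]]


def build_context(query: str, detect: Detector, sources: dict) -> str:
    """Run detection, query every source, and format the results as text."""
    lines = []
    for entity in detect(query):
        for source_name, fetch in sources.items():
            facts = fetch(entity)
            if not facts:
                continue  # this source has nothing for the entity
            for key, value in facts.items():
                lines.append(f"[{source_name}] {entity}: {key} = {value}")
    return "\n".join(lines)


def stub_detect(query: str) -> list:
    # Stand-in detector: looks for two hard-coded entities only.
    return [w for w in ("ibuprofen", "H2SO4") if w.lower() in query.lower()]


def stub_pubchem(entity: str) -> Optional[dict]:
    # Placeholder value, not real database output.
    return {"record": f"<PubChem data for {entity}>"}


context = build_context(
    "What happens if I mix ibuprofen with H2SO4?",
    stub_detect,
    {"PubChem": stub_pubchem},
)
print(context)
```

The resulting text block is what would be handed to the LLM alongside the original question.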

Key Features

Multi-Database Integration:

  • PubChem (NIH): 100M+ chemical structures and properties
  • ChEMBL (EMBL-EBI): Bioactivity data for 2M+ compounds
  • Materials Project: Computational materials properties database
  • RDKit: On-the-fly molecular descriptor calculations
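
To give a feel for what a PubChem lookup involves, here is a minimal sketch that builds a PUG REST URL for a compound name. The helper name `pubchem_property_url` and the choice of properties are assumptions for illustration; the tool itself may query PubChem differently. The example only builds the URL and does not make a network call.

```python
from urllib.parse import quote

PUBCHEM_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_property_url(name: str, properties: list) -> str:
    """Build a PUG REST URL that looks a compound up by name."""
    # Names can contain spaces or slashes, so percent-encode them.
    encoded = quote(name, safe="")
    return (
        f"{PUBCHEM_BASE}/compound/name/{encoded}"
        f"/property/{','.join(properties)}/JSON"
    )


url = pubchem_property_url(
    "ibuprofen", ["MolecularFormula", "MolecularWeight", "IUPACName"]
)
print(url)
```

Fetching that URL with any HTTP client should return a JSON document containing the requested properties.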

Smart detection automatically recognizes:

  • Chemical formulas (H2SO4, C6H12O6, etc.)
  • Drug/chemical names (ibuprofen, aspirin, benzene)
  • Materials Project IDs (mp-xxx)
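
Detection like this could be approximated with a couple of regular expressions plus a name lookup. The sketch below is a guess at the idea, not the tool's real detector: the patterns, the `KNOWN_NAMES` set, and `detect_entities` are all hypothetical. Note that the formula pattern requires at least one digit, so it would miss formulas such as NaCl.

```python
import re

# A formula here is two or more element-like groups containing a digit,
# e.g. H2SO4 or C6H12O6. The digit requirement avoids matching "LLM" or "AI".
FORMULA_RE = re.compile(r"\b(?=[A-Za-z]*\d)(?:[A-Z][a-z]?\d*){2,}\b")
# Materials Project IDs look like "mp-" followed by digits.
MP_ID_RE = re.compile(r"\bmp-\d+\b")
# Tiny illustrative vocabulary; a real tool would need a far larger source.
KNOWN_NAMES = {"ibuprofen", "aspirin", "benzene", "caffeine"}


def detect_entities(text: str) -> dict:
    """Return formulas, known names, and material IDs found in the text."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    return {
        "formulas": FORMULA_RE.findall(text),
        "names": [w for w in words if w in KNOWN_NAMES],
        "material_ids": MP_ID_RE.findall(text),
    }


found = detect_entities("What happens if I mix ibuprofen with H2SO4?")
print(found)
```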

Comprehensive data. Each query returns:

  • Molecular properties (weight, formula, structure)
  • Drug bioactivity and clinical data
  • Lipinski rule compliance (drug-likeness)
  • Computational materials properties
  • Safety and handling information
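
For readers unfamiliar with Lipinski's rule of five, here is a rough sketch of the check itself. It assumes the four descriptors have already been computed (in the tool that would be RDKit's job); the `Descriptors` class and function names are hypothetical, and the caffeine values other than the molecular weight are approximate, included only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float   # g/mol
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> list:
    """List which of the four rule-of-five limits are exceeded."""
    checks = {
        "molecular weight > 500": d.mol_weight > 500,
        "logP > 5": d.logp > 5,
        "H-bond donors > 5": d.h_donors > 5,
        "H-bond acceptors > 10": d.h_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    # A common reading of the rule tolerates at most one violation.
    return len(lipinski_violations(d)) <= max_violations


# Approximate, illustrative values for caffeine.
caffeine = Descriptors(mol_weight=194.19, logp=-0.07, h_donors=0, h_acceptors=3)
print(lipinski_violations(caffeine), is_drug_like(caffeine))
```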

Why This Matters

LLMs are trained on text, but science requires precision. This tool bridges the gap by grounding your AI’s chemistry knowledge in real, verified databases. Perfect for:

  • Students learning chemistry and materials science
  • Researchers doing quick lookups during literature review
  • Educators demonstrating molecular properties
  • Hobbyists exploring chemical reactions safely
  • Anyone who needs accurate scientific information fast

Technical Stack

  • Backend: Python 3.11+, FastAPI
  • Data Sources: PubChem, ChEMBL, Materials Project, RDKit
  • Integration: Open WebUI function/tool system
  • Deployment: Docker (plug-and-play) or local Python

Real-World Example

Query: “Tell me about caffeine”

Tool enriches with:

  • IUPAC name: 1,3,7-trimethylpurine-2,6-dione (also known as 1,3,7-trimethylxanthine)
  • Molecular formula: C8H10N4O2
  • Molecular weight: 194.19 g/mol
  • Drug classification: CNS stimulant
  • Bioactivity: Adenosine receptor antagonist
  • Lipinski compliance: Yes (drug-like)
  • Safety data: LD50, toxicity warnings

The LLM then responds with accurate, detailed, scientifically grounded answers instead of vague generalities or hallucinations.
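
As a quick sanity check on the numbers above, the molecular weight can be recomputed from the formula with standard atomic weights. This is a small sketch for flat formulas only (no parentheses or hydrates); `molecular_weight` and the short `ATOMIC_WEIGHTS` table are illustrative, and an unknown element symbol raises a `KeyError`.

```python
import re

# Standard atomic weights (g/mol) for the few elements used here.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def molecular_weight(formula: str) -> float:
    """Sum atomic weights for a flat formula such as C8H10N4O2."""
    total = 0.0
    # Each match is an element symbol plus an optional count.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


print(round(molecular_weight("C8H10N4O2"), 2))  # 194.19
```

The result matches the 194.19 g/mol listed for caffeine.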

Get Started

GitHub: https://github.com/johnsonfarmsus/scientific-enrichment-tool

  1. Clone the repository
  2. Run docker-compose up (or install the Python dependencies)
  3. Add to Open WebUI as a function/filter
  4. Enable in Chat Controls before your query
  5. Start asking questions

Optional: Add a Materials Project API key for enhanced materials data.

Open Source, Community-Driven

Built for students, researchers, and science enthusiasts who want their AI to speak chemistry and materials science fluently. Free and ready to deploy.

This is a hobby project by a technology enthusiast. The scope and scale have room for improvement, and upgrades are welcome from anyone willing to dive in.