§ 04 The Existing Ecosystem
What already exists — and how QBiruni uses, learns from, and surpasses it.
Several powerful tools attack the scientific-literature problem. None of them solve it for quantum hardware engineering specifically. QBiruni does not compete with them — it is built on top of them, using each as the best available component for its role, while providing what none of them can: a domain-specific ontology, quantitative extraction with provenance, and tight integration with a running simulator.
Understanding this ecosystem is not optional. It is the fastest path to a working v0 — and the clearest map of where QBiruni's real moat lies.
Build on top of
PaperQA2 — FutureHouse
Open-source · State-of-the-art scientific QA with citation grounding
PaperQA2 is the current best-in-class for scientific question answering with provenance. It retrieves relevant passages, keeps citations attached to every answer fragment, and scores well on academic benchmarks. It is open-source and actively maintained.
→ Study its citation-grounding architecture before writing any extraction code. The problem of "every sentence must cite its source" is solved here.
→ Fork the retrieval-augmented QA loop as QBiruni's synthesis layer baseline.
→ Its weakness: no schema, no structured extraction. It returns prose, not typed rows. This is exactly where QBiruni takes over.
Use as infrastructure
Semantic Scholar API
Free API · 200M+ papers · Semantic search, citations, abstracts
Semantic Scholar is the primary paper database QBiruni should ingest from. It provides structured metadata (authors, venues, citation counts, semantic embeddings) and semantic search over 200M papers. It is not a competitor — it is the data layer.
→ Use the S2 API as the primary ingestion source alongside the arXiv API. Together they cover over 98% of the relevant corpus.
→ Use S2's citation graph to surface papers that cite a key reference — essential for tracking how experimental claims evolve over time.
→ Filter by venue: PRX Quantum, Nature Physics, APL, PRL, arXiv:quant-ph, cond-mat.supr-con.
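The ingestion step above can be sketched against the public Semantic Scholar Graph API. The endpoint path and field names below follow the documented `/graph/v1` interface; the query string, page size, and error handling are illustrative, not QBiruni's actual configuration.

```python
# Minimal sketch: pull candidate papers from the Semantic Scholar Graph API.
# Production ingestion should paginate, rate-limit, and retry on HTTP 429.
import json
import urllib.parse
import urllib.request

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"
FIELDS = "title,abstract,venue,year,citationCount,externalIds"

def build_search_url(query: str, limit: int = 100, offset: int = 0) -> str:
    """Construct a paper-search URL; kept separate so requests are inspectable."""
    params = urllib.parse.urlencode(
        {"query": query, "fields": FIELDS, "limit": limit, "offset": offset}
    )
    return f"{S2_SEARCH}?{params}"

def search_papers(query: str, limit: int = 100) -> list[dict]:
    """Fetch one page of search results as a list of paper-metadata dicts."""
    with urllib.request.urlopen(build_search_url(query, limit)) as resp:
        return json.load(resp).get("data", [])
```

The citation-graph lookup mentioned above uses the same pattern against `/graph/v1/paper/{paper_id}/citations`, which makes "who cites this key reference" a single paged request rather than a scraping job.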
Use as filtering layer
arXiv-sanity-lite
Open-source · Relevance filtering for arXiv feeds
arXiv-sanity-lite (Karpathy's tool) provides a simple but effective relevance-ranking mechanism for arXiv papers using TF-IDF and SVMs. It is not semantic search — but it is fast, interpretable, and requires no GPU.
→ Use it as a coarse pre-filter before expensive LLM extraction. Papers ranked below threshold get metadata-only storage, not full extraction.
→ Fine-tune the term weights on quantum-hardware vocabulary: TLS, transmon, Josephson junction, tantalum, niobium, T₁, T₂, quasiparticle.
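The pre-filter shape can be sketched with a stdlib-only stand-in for arXiv-sanity-lite's TF-IDF + SVM ranker: a weighted keyword score over the abstract, with papers below a threshold routed to metadata-only storage. The vocabulary weights and threshold here are invented for illustration, not tuned values.

```python
# Coarse relevance pre-filter in the spirit of arXiv-sanity-lite.
# The real tool fits an SVM over TF-IDF vectors; this stand-in just
# sums hand-set weights for quantum-hardware vocabulary terms.
import re

QH_VOCAB = {  # illustrative weights, not tuned
    "tls": 3.0, "transmon": 3.0, "josephson": 2.5, "tantalum": 2.0,
    "niobium": 2.0, "quasiparticle": 2.0, "coherence": 1.0, "qubit": 0.5,
}
EXTRACT_THRESHOLD = 3.0  # below this: metadata-only storage, no LLM extraction

def relevance_score(abstract: str) -> float:
    """Sum vocabulary weights over the distinct lowercase tokens in the abstract."""
    tokens = set(re.findall(r"[a-z]+", abstract.lower()))
    return sum(QH_VOCAB.get(tok, 0.0) for tok in tokens)

def should_extract(abstract: str) -> bool:
    """Gate: only abstracts scoring at or above threshold get full extraction."""
    return relevance_score(abstract) >= EXTRACT_THRESHOLD
```

The point of the gate is cost control: LLM extraction is the expensive step, so the cheap, interpretable filter runs first and its false negatives can be audited by reading the weights.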
Benchmark against
Elicit + Consensus
Commercial · General-purpose scientific literature search
Elicit and Consensus are the best generic scientific-literature AI tools available today. They search papers, summarize claims, and extract some structured data. Every quantum hardware team is already aware of them or using them.
→ Use them as a benchmark baseline. Before claiming QBiruni is better, test the TLS oxide question on both and document where they fail. Concretely: they return prose summaries, not typed measurements; they hallucinate numbers; they have no concept of "device regime."
→ Their failure modes are QBiruni's differentiation story, made concrete with examples.
Learn from failure
Galactica — Meta AI
Discontinued · Large language model trained on scientific text
Meta's Galactica (2022) was trained on 48M scientific papers, textbooks, and knowledge bases. It was pulled from public access after 3 days because it confidently hallucinated scientific claims — plausible-sounding but wrong numbers, fabricated citations, invented results.
→ The lesson is structural: an LLM trained on science still hallucinates when not grounded in retrieved source text. QBiruni must enforce retrieval-first — never generate a number without a source paragraph in context.
→ Every extraction must store the verbatim source paragraph. This is not optional. It is the architectural lesson Galactica teaches.
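The retrieval-first rule can be enforced mechanically rather than by convention. A minimal sketch, assuming a hypothetical `Extraction` record (field names are illustrative, not QBiruni's actual schema): any extracted number is rejected unless the verbatim source paragraph is present and literally contains it.

```python
# Structural guard against Galactica-style hallucination: no number
# without its verbatim source paragraph attached.
from dataclasses import dataclass

@dataclass(frozen=True)
class Extraction:
    field: str             # e.g. "T1_us" -- illustrative field name
    value: float
    source_paragraph: str  # verbatim text the value was read from

def validate(ex: Extraction) -> Extraction:
    """Refuse any extraction that is not grounded in its stored source text."""
    if not ex.source_paragraph.strip():
        raise ValueError(f"{ex.field}: no source paragraph stored")
    # Cheap grounding check: the number must appear verbatim in the source.
    # A real pipeline would also normalize units and scientific notation.
    if f"{ex.value:g}" not in ex.source_paragraph:
        raise ValueError(f"{ex.field}: value {ex.value} not found in source")
    return ex
```

A guard like this turns "never generate a number without a source paragraph" from a prompt instruction, which can fail silently, into a validation error, which cannot.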
Consider as extraction backbone
Llemma — EleutherAI
Open-source · Math and science LLM based on Code Llama
Llemma is a family of open-source models (7B and 34B) specifically trained on mathematical and scientific text. It outperforms general models on scientific reasoning tasks and runs locally — no API cost, no data leaving your environment.
→ Evaluate Llemma-34B as an extraction backbone for the structured ontology extraction step — potentially faster and more private than Claude API for high-volume extraction.
→ Trade-off: lower ceiling than Claude Sonnet on complex reasoning, but may be sufficient for schema-constrained extraction where the ontology does most of the work.
Study the interface pattern
SciSpace (Typeset)
Commercial · PDF reading and explanation for scientists
SciSpace allows researchers to upload PDFs and ask questions. It is widely used and has good UI/UX for the researcher workflow. It explains figures, tables, and equations in context.
→ Study its UI for the "explain this figure" and "what does this table mean" interactions — these are the interaction patterns QBiruni's researcher-facing interface should support in v1.
→ Its critical gap: it does not extract structured data, does not compare across papers, and has no simulation integration. QBiruni's value is exactly what SciSpace does not do.
Where QBiruni's moat actually lives
None of the above tools can do what QBiruni does: extract a T₁ value from a paper as a typed row — {device: tantalum transmon, substrate: sapphire, oxide_thickness: 3 nm, T₁: 312 ± 25 µs, temp: 15 mK, source: §3.2 para 4} — and automatically compare it against a QJosephson simulation output in the same schema. That tight ontological coupling between simulation and literature is the real moat. PaperQA2 does not know what a Josephson junction is. Semantic Scholar does not understand fabrication regimes. Elicit does not extract measurement uncertainty. QBiruni does all three — because it was built for this specific domain, with a schema designed jointly with QJosephson, by people who understand the physics.
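The typed-row coupling described above can be sketched as one shared schema that holds both a paper measurement and a simulation output, so comparison is a function call rather than a prose judgment. Field names, units, and the two-sigma tolerance below are illustrative assumptions, not QBiruni's actual design.

```python
# One schema for both literature extractions and QJosephson outputs,
# distinguished only by the provenance string.
from dataclasses import dataclass

@dataclass(frozen=True)
class T1Measurement:
    device: str
    substrate: str
    oxide_thickness_nm: float
    t1_us: float
    t1_err_us: float
    temp_mk: float
    source: str  # e.g. "paper:<doi> §3.2 para 4" or "sim:<run-id>"

def consistent(paper: T1Measurement, sim: T1Measurement,
               n_sigma: float = 2.0) -> bool:
    """Does the simulated T1 fall within n_sigma of the measured value?"""
    return abs(paper.t1_us - sim.t1_us) <= n_sigma * paper.t1_err_us
```

Because both sides live in the same schema, a simulation run can be scored against every matching extraction in the database with one query, which is exactly the comparison none of the generic tools can express.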
The secondary moat is accumulation. Every paper that passes through QBiruni, every extraction that is audited and verified, every verdict that is checked against reality — this makes the database richer and the extractor more accurate. After 500 papers, QBiruni knows things about the quantum hardware measurement landscape that no human team has ever systematically compiled. That knowledge does not expire and is not easily replicated.