AI search for health information in 60+ languages

A national health body engaged us to deploy AI search across its translated medical resources: a corpus of patient-facing health information published in more than 60 languages. A typical end-user query: "find COVID vaccine information in Arabic." The end user expects the right document in the right language, not a generic translation.

Background

The body's translated content was authoritative and high-quality but practically invisible to the users who most needed it. The existing search experience assumed users already knew which languages a document had been translated into and the local naming conventions for those translations. In practice, most users found the English original rather than the version they could actually read.

The brief: an AI search interface that understood multilingual queries, routed them to the right corpus, and surfaced answers in the user's preferred language with a citation to the source document.

Solution

We deployed Onyx with per-language analyzer configurations across the multilingual corpus and added cross-language query routing. A question asked in Arabic searches Arabic content; a question asked in English about COVID information for Arabic speakers returns Arabic results with a translated summary. The Onyx data model is language-aware end to end, so each citation points to the correct language variant of the source document.
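The routing rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Onyx's API or the production implementation: the index names, the language tables, and the use of Unicode character names as a crude stand-in for language identification are all hypothetical. The idea is that an explicit request ("in Arabic") picks the index to search, while the script of the query decides the language of the summary.

```python
import unicodedata
from typing import Optional, Tuple

# Hypothetical per-language index names (illustrative, not Onyx identifiers).
LANGUAGE_INDEXES = {
    "en": "health_docs_en",
    "ar": "health_docs_ar",
    "zh": "health_docs_zh",
    "ko": "health_docs_ko",
    "hi": "health_docs_hi",
    "ta": "health_docs_ta",
}

# Language names a user might mention explicitly ("... in Arabic").
LANGUAGE_NAMES = {
    "english": "en", "arabic": "ar", "chinese": "zh",
    "korean": "ko", "hindi": "hi", "tamil": "ta",
}

# Crude script-to-language proxy based on Unicode character names.
# Many languages share a script (Arabic script also covers Urdu and Persian),
# so a real system would use a proper language-identification model.
SCRIPT_PREFIXES = {
    "ARABIC": "ar",
    "CJK": "zh",
    "HANGUL": "ko",
    "DEVANAGARI": "hi",
    "TAMIL": "ta",
    "LATIN": "en",
}


def script_language(text: str) -> Optional[str]:
    """Guess the query language from the dominant script of its letters."""
    counts = {}
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        for prefix, lang in SCRIPT_PREFIXES.items():
            if name.startswith(prefix):
                counts[lang] = counts.get(lang, 0) + 1
                break
    return max(counts, key=counts.get) if counts else None


def explicit_language(text: str) -> Optional[str]:
    """Return the language named in an 'in <language>' phrase, if any."""
    words = [w.strip(".,?!") for w in text.lower().split()]
    for i in range(len(words) - 1):
        if words[i] == "in" and words[i + 1] in LANGUAGE_NAMES:
            return LANGUAGE_NAMES[words[i + 1]]
    return None


def route_query(query: str) -> Optional[Tuple[str, str]]:
    """Return (index to search, summary language), or None if unclear."""
    query_lang = script_language(query)
    target_lang = explicit_language(query) or query_lang
    if target_lang not in LANGUAGE_INDEXES:
        return None  # caller should ask the user rather than guess
    # Results come from the target-language index; the summary is written
    # in the language the user asked in (translated if the two differ).
    return LANGUAGE_INDEXES[target_lang], query_lang or target_lang
```

With these assumptions, `route_query("find COVID vaccine information in Arabic.")` returns `("health_docs_ar", "en")`: search the Arabic content, summarise in English. A query written in Arabic script with no explicit language routes to the Arabic index and is summarised in Arabic.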

PDF extraction handles the layout patterns common in health-information leaflets (multi-column flows, glossary panels, info boxes) across languages and scripts such as Arabic, Chinese, Korean, Hindi, Tamil, and others where the standard PDF-to-text path tends to break.
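One way to notice when the standard PDF-to-text path has broken is to sanity-check the extracted text against the script the leaflet is expected to use. The Python sketch below is an assumption for illustration, not the pipeline described here: the function names and the 0.8 threshold are hypothetical. It flags a page when the text contains replacement or private-use characters, or when too few of its letters belong to the expected script.

```python
import unicodedata

# Hypothetical threshold; it would need tuning against real leaflets, since
# Arabic or Tamil pages legitimately contain Latin terms such as "COVID-19".
MIN_SCRIPT_RATIO = 0.8


def script_ratio(text: str, script_prefix: str) -> float:
    """Fraction of letters whose Unicode name starts with script_prefix."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    in_script = sum(
        1 for ch in letters if unicodedata.name(ch, "").startswith(script_prefix)
    )
    return in_script / len(letters)


def extraction_looks_broken(text: str, script_prefix: str) -> bool:
    """Flag extracted page text that probably needs a different extraction path."""
    # U+FFFD replacement characters and private-use code points often appear
    # when glyphs could not be mapped back to Unicode.
    if any(ch == "\ufffd" or unicodedata.category(ch) == "Co" for ch in text):
        return True
    # Too few letters in the expected script suggests garbled or missing text.
    return script_ratio(text, script_prefix) < MIN_SCRIPT_RATIO
```

This check only looks at which characters came out, not their order, so Arabic text extracted in reversed visual order would still pass; catching that needs a separate check.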

Outcome

End users find authoritative health information in their own language, with citations they can trust and a path back to the canonical source document. Where the question is ambiguous, the agent prompts for language clarification rather than defaulting to English.
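The clarification behaviour amounts to a simple rule: commit to a language only when exactly one usable candidate exists, and otherwise ask. A minimal Python sketch, assuming an upstream step has already produced a set of candidate language codes for the query; `LanguageDecision`, `decide_language`, and the prompt wording are hypothetical, not the agent's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class LanguageDecision:
    language: Optional[str]       # chosen language code when unambiguous
    clarification: Optional[str]  # question to put to the user otherwise


def decide_language(candidates: Set[str], available: Set[str]) -> LanguageDecision:
    """Pick a single target language, or ask; never silently fall back to English."""
    usable = candidates & available
    if len(usable) == 1:
        return LanguageDecision(language=next(iter(usable)), clarification=None)
    if usable:
        options = ", ".join(sorted(usable))
        return LanguageDecision(None, f"Which language would you like results in: {options}?")
    return LanguageDecision(None, "Which language would you like results in?")
```

Returning a question instead of a default keeps English from becoming the silent fallback that sent users to the wrong version in the first place.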

The client is anonymised in this case study at their request. We're happy to discuss the engagement in detail under NDA.