Geolocation extraction using LLMs
Our research paper has been published at the Web Conference 2026
Our paper “Fair Geolocation Extraction from Humanitarian Documents Using Large Language Models” has been published in the Companion Proceedings of the Web Conference 2026 (WWW ’26).
This work addresses a critical challenge in humanitarian crisis response: existing automated systems for extracting geographic information from text often reproduce geographic and socioeconomic biases, leading to uneven visibility of crisis-affected regions. Our research investigates whether Large Language Models (LLMs) can mitigate these disparities.
We developed a two-step framework that combines few-shot LLM-based named entity recognition with an agent-based geocoding module that leverages context to resolve ambiguous toponyms. The system was evaluated using an enhanced version of the HumSet dataset of humanitarian documents.
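As a rough illustration of the two-step idea, the following minimal Python sketch chains a few-shot extraction step with a context-aware resolution step. It is not the implementation from the paper: the prompt text, the toy gazetteer, the `fake_llm` stand-in, and all function names are hypothetical. The agent-based geocoder is also replaced here by a simple rule that picks the candidate whose country is mentioned in the surrounding text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    country: str
    lat: float
    lon: float


# Hypothetical toy gazetteer; a real system would query a geocoding service.
GAZETTEER: Dict[str, List[Candidate]] = {
    "tripoli": [
        Candidate("Tripoli", "Libya", 32.89, 13.19),
        Candidate("Tripoli", "Lebanon", 34.44, 35.85),
    ],
    "lebanon": [Candidate("Lebanon", "Lebanon", 33.85, 35.86)],
}

# Hypothetical few-shot prompt: one worked example, then the query text.
FEW_SHOT_PROMPT = (
    "List the place names in the text, separated by semicolons.\n"
    "Text: Floods displaced families in Beira, Mozambique.\n"
    "Places: Beira; Mozambique\n"
    "Text: {text}\n"
    "Places:"
)


def extract_toponyms(text: str, llm: Callable[[str], str]) -> List[str]:
    """Step 1: ask the model for place names and parse its reply."""
    reply = llm(FEW_SHOT_PROMPT.format(text=text))
    return [name.strip() for name in reply.split(";") if name.strip()]


def resolve_toponym(
    name: str, context: str, gazetteer: Dict[str, List[Candidate]]
) -> Optional[Candidate]:
    """Step 2 (simplified): prefer a candidate whose country appears in the context."""
    candidates = gazetteer.get(name.lower(), [])
    if not candidates:
        return None
    context_lower = context.lower()
    for candidate in candidates:
        if candidate.country.lower() in context_lower:
            return candidate
    # No contextual hint found: fall back to the first gazetteer entry.
    return candidates[0]


def geolocate(
    text: str,
    llm: Callable[[str], str],
    gazetteer: Dict[str, List[Candidate]] = GAZETTEER,
) -> Dict[str, Optional[Candidate]]:
    """Run both steps and map each extracted name to its resolved candidate."""
    return {
        name: resolve_toponym(name, text, gazetteer)
        for name in extract_toponyms(text, llm)
    }


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: returns known names found in the query text."""
    query = prompt.rsplit("Text:", 1)[1].rsplit("Places:", 1)[0]
    return "; ".join(n for n in ("Tripoli", "Lebanon") if n in query)


if __name__ == "__main__":
    doc = "Clashes in Tripoli have displaced families across northern Lebanon."
    for name, match in geolocate(doc, fake_llm).items():
        print(name, "->", match)
```

In this sketch the ambiguous name "Tripoli" is resolved using the country mentioned elsewhere in the same document. This is the kind of contextual cue the geocoding module is described as exploiting, although the actual module presumably reasons over much richer context than a substring match.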
We show that LLM-based methods substantially outperform traditional pretrained models such as spaCy or RoBERTa in both precision and fairness of geolocation extraction. In particular, our system improves performance for underrepresented regions, especially lower-income countries and the Global South.