Our Methodology

ClimaEmpact is a framework designed to enhance the understanding of extreme weather events using large language models (LLMs). It bridges the data gap in underrepresented regions by analyzing news articles and other text sources to assess event severity, extract key themes, and evaluate public sentiment. This enables faster, more accurate decision-making when structured data is limited.

We use news articles as the primary data source. The ExtremeWeatherNews dataset comprises articles collected for 60 distinct extreme weather events, selected from the curated ClimaMeter list. These events include heatwaves, cold spells, extreme wind, and extreme precipitation.

Our reports are generated using Domain-Aligned Small Language Models (SLMs), fine-tuned with climate-specific reasoning paths and real-world news data.

FAQs on the Methodology

How is the ExtremeWeatherNews dataset collected?
News articles for the 60 extreme weather events are collected via web scraping with Google News RSS and newspaper3k. Articles are filtered for English text, date range (±1 month around the event), and relevance, with noisy and off-topic sentences removed.
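The collection step above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the query encoding and the 31-day window are assumptions, and the newspaper3k fetch (a network step) is shown only in comments.

```python
from datetime import datetime
from urllib.parse import quote_plus

def build_rss_url(query: str) -> str:
    # Google News RSS search endpoint; parameters beyond `q` and `hl`
    # (e.g., region codes) are omitted and may need adjusting.
    return f"https://news.google.com/rss/search?q={quote_plus(query)}&hl=en"

def within_window(published: datetime, event_date: datetime, days: int = 31) -> bool:
    # Keep only articles dated within roughly ±1 month of the event.
    return abs((published - event_date).days) <= days

# Fetching each feed entry's full text would use newspaper3k:
#   from newspaper import Article
#   a = Article(url); a.download(); a.parse()  # a.text holds the article body
```

The relevance and noise filters described above would then run on the downloaded text before sentence-level processing.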

How are sentences filtered by location?
The Flair NER tagger identifies geopolitical entities (GPEs) in each sentence. Sentences without location tags, or whose locations fall outside the event's geographical scope, are discarded to ensure precision.
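A sketch of this geo-filter, assuming a Flair model trained on OntoNotes (which emits GPE labels, unlike the default CoNLL-03 model, which uses LOC); the exact model and scope-matching rule used in the work are assumptions:

```python
# Tagging sketch (heavy model download, shown in comments only):
#   from flair.data import Sentence
#   from flair.models import SequenceTagger
#   tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")
#   s = Sentence(text); tagger.predict(s)
#   gpes = [span.text for span in s.get_spans("ner") if span.tag == "GPE"]

def keep_sentence(gpes: list[str], event_scope: set[str]) -> bool:
    """Discard sentences with no GPE, or whose GPEs all fall outside
    the event's geographical scope."""
    if not gpes:
        return False  # no location tag at all
    scope = {loc.lower() for loc in event_scope}
    return any(g.lower() in scope for g in gpes)
```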

Why are three keyword groups used during retrieval?
Three keyword groups—public (e.g., 'evacuation'), economic (e.g., 'insurance'), and weather-related (e.g., 'rain')—ensure comprehensive coverage of impacts, responses, and meteorological relevance during article retrieval.
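The group matching can be illustrated as below. Only 'evacuation', 'insurance', and 'rain' come from the text above; the remaining keywords are illustrative assumptions.

```python
KEYWORD_GROUPS = {
    "public": {"evacuation", "rescue", "shelter"},
    "economic": {"insurance", "damage", "losses"},
    "weather": {"rain", "wind", "storm"},
}

def matched_groups(text: str) -> set[str]:
    # Naive whitespace tokenization; real retrieval may use stemming or phrase queries.
    tokens = set(text.lower().split())
    return {name for name, kws in KEYWORD_GROUPS.items() if tokens & kws}
```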

What role does ClimaMeter play?
ClimaMeter provides the curated list of 60 extreme weather events, including type, location, and date, which guides the web-scraping targets and ensures a geographically and thematically diverse dataset.

How is the alignment data generated?
Using one-shot prompting with Qwen2.5-32B-Instruct, 30k samples are generated from the news sentences, each containing task-specific labels and a detailed reasoning path for alignment across the three tasks.
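A one-shot prompt of this kind might be assembled as below. The exemplar text and task wording are hypothetical; only the <think>/<output> structure and the use of a single worked example are from the text.

```python
# Hypothetical exemplar; the actual one-shot example used with
# Qwen2.5-32B-Instruct is not reproduced here.
ONE_SHOT_EXAMPLE = (
    "Sentence: Officials ordered evacuations as the river kept rising.\n"
    "<think>The sentence describes an active emergency response, with some "
    "implied impact.</think>\n"
    '<output>{"emergency": 0.8, "impact": 0.2, "vulnerability": 0.0}</output>'
)

def build_prompt(task_definition: str, sentence: str) -> str:
    # One-shot prompt: task definitions, a single worked example, then the target.
    return (
        f"{task_definition}\n\n"
        f"Example:\n{ONE_SHOT_EXAMPLE}\n\n"
        f"Sentence: {sentence}"
    )
```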

How is annotation quality ensured?
Prompts include detailed category definitions and structured reasoning templates. Annotators check clarity, relevance, and explanation consistency to ensure logical coherence and domain specificity.

What are the three alignment tasks?
Three tasks—(1) vulnerability/impact/emergency categorization, (2) topic/subtopic/keyword labeling, and (3) emotion analysis—capture practical, policy-relevant aspects of extreme weather impacts and public response.

How do the models express uncertainty?
LLMs are prompted to assign normalized probability scores (summing to 1) across predefined categories, capturing uncertainty and allowing multi-label reasoning in ambiguous or overlapping cases.
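Enforcing the sum-to-1 constraint during post-processing could look like this (a sketch; the uniform fallback for all-zero outputs is an assumption):

```python
def normalize_scores(scores: dict[str, float]) -> dict[str, float]:
    """Rescale raw category scores so they sum to 1; fall back to a
    uniform distribution if the model returns all zeros."""
    total = sum(scores.values())
    if total <= 0:
        return {k: 1.0 / len(scores) for k in scores}
    return {k: v / total for k, v in scores.items()}
```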

What format do the model responses follow?
Prompts follow a fixed format with <think> for reasoning and <output> for the final probabilities. They include category definitions and formatting rules to ensure interpretability and ease of post-processing.
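Parsing that fixed format for post-processing might be done as follows; the JSON payload inside <output> is an assumption about the exact serialization.

```python
import json
import re

def parse_response(text: str):
    """Extract the reasoning span and the probability dict from a
    fixed-format response; return (None, None) for malformed outputs
    so they can be filtered out downstream."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    output = re.search(r"<output>(.*?)</output>", text, re.DOTALL)
    if output is None:
        return None, None
    try:
        probs = json.loads(output.group(1))
    except json.JSONDecodeError:
        return None, None
    return (think.group(1).strip() if think else ""), probs
```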

What makes the alignment data high quality?
The data includes LLM-generated rationales, expert-aligned definitions, balanced task coverage, and strict quality control through filtering and formatting, leading to high consistency and task relevance.

What is EWRA?
EWRA is a two-stage fine-tuning approach in which SLMs are first trained on implicitly guided prompts and then on explicitly guided data, enabling them to learn both flexible domain reasoning and precise task adherence.
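The two prompt regimes can be sketched as below. The field names and wording are assumptions; only the implicit-then-explicit staging and the <think>/<output> format come from the text.

```python
def make_prompt(sample: dict, stage: int) -> str:
    """Sketch of the two EWRA prompt regimes.
    Stage 1 (implicit): the model sees only the task and sentence and must
    infer the reasoning pattern. Stage 2 (explicit): category definitions
    and the output format are spelled out."""
    if stage == 1:
        return f"Task: {sample['task']}\nSentence: {sample['text']}"
    return (
        f"Task: {sample['task']}\n"
        f"Definitions: {sample['definitions']}\n"
        "Respond with <think>reasoning</think> then <output>probabilities</output>.\n"
        f"Sentence: {sample['text']}"
    )
```

The SLM would be fine-tuned on stage-1 prompts first, then continue training on stage-2 prompts, so flexible reasoning is learned before strict format adherence.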