A data for policy workshop hosted in Zambia in 2025 to discuss the use of AI and data in government. Photo by IGC.

From promise to practice: Making AI work in LMIC governments

AI will not transform government overnight – but in the right hands, it can make capable systems more effective. For low- and middle-income countries, the art lies in knowing where to start (applying AI where it solves real problems, in low-risk use cases, and with local evidence), when to adapt quickly, and when to stop.

Artificial intelligence is moving from hype to habit in policymaking circles. For developing countries, the latest generation of AI tools offers three capabilities with immediate relevance to governments: supporting decision-making, making sense of complex administrative data, and unlocking institutional knowledge. 

Systems that can sift through records, flag anomalies across datasets, or surface guidance for caseworkers are no longer speculative – they are on the market and, in some administrations, already in use. Yet the promise is outpacing the preparedness to use it well.

AI can speed up public administration by reducing time lost to paperwork, facilitating swifter and more consistent triage, and leveraging dormant data more effectively. For low- and middle-income countries (LMICs), the risk is misallocation – investing scarce resources in projects with little public value or in tools that seem faster but are not. 

The remedy is a combination of humility and rigour: start with low-risk internal uses, measure the gains, strengthen data and governance scaffolding, and adapt systems to local realities. Done this way, AI becomes a lever for capability, not a multiplier for dysfunction.

Why AI should be on the agenda

The case for focusing on the use of AI is strong:

1. AI can unlock big near-term productivity gains in routine services. 

The Alan Turing Institute estimates that UK central government handles about one billion citizen-facing transactions a year, including roughly 143 million complex ones – 84% of which are judged to be highly automatable. Every minute saved adds up: automating even one minute from each of those complex, automatable transactions would free up roughly two million staff hours a year. Many developing countries are likely to have similarly high-volume back-office work that AI can help with.

2. AI can be used to raise data quality and unlock dormant value.

When staff and clean data are scarce, it is faster to augment than to rebuild. AI that standardises classifications, links records, and searches across silos can spot errors, fill gaps, and speed up decision-making processes. Used within existing workflows, it can deliver quick wins and build momentum for bigger data reforms. 

Working with Zambia’s national statistics agency, ZamStats, we tested OpenAI’s GPT-4 Turbo on expert-coded survey data. Enumerators working on Zambia’s labour statistics must translate open-ended survey responses into detailed job and industry codes using 300-page manuals – a tedious task. The model outperformed human coders on industry and broad occupation categories, and could save approximately 130 workdays per year if embedded in workflows.
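To make the task concrete, here is a minimal sketch of this kind of coding pipeline, assuming the OpenAI Python client; the code list, prompt, and model settings are illustrative and not the actual ZamStats setup.

```python
# Minimal sketch: asking a general-purpose model to map an open-ended survey
# response to a standard industry code. The code list and prompt below are
# illustrative only; real classification manuals run to hundreds of pages.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

INDUSTRY_CODES = {
    "01": "Crop and animal production",
    "47": "Retail trade",
    "85": "Education",
}

def code_industry(survey_answer: str) -> str:
    """Return the best-matching industry code for an open-ended answer."""
    options = "\n".join(f"{code}: {label}" for code, label in INDUSTRY_CODES.items())
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        temperature=0,  # deterministic output for consistent coding
        messages=[
            {"role": "system",
             "content": "You assign industry codes to survey answers. "
                        "Reply with the two-digit code only."},
            {"role": "user",
             "content": f"Codes:\n{options}\n\nAnswer: {survey_answer}\nCode:"},
        ],
    )
    return response.choices[0].message.content.strip()

print(code_industry("I sell vegetables at a market stall in Lusaka"))  # expected: 47
```

In practice, model-assigned codes would be checked against a sample of expert-coded records before being trusted in routine production.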

3. AI can help unlock institutional knowledge. 

Governments drown in PDFs, guidance notes and evaluations; the value is there, but buried. AI-powered retrieval can index that corpus and provide evidence-based answers to frontline officials. 

The World Bank’s ImpactAI points the way: a curated, generative assistant that turns impact-evaluation research into policy-ready summaries and side-by-side comparisons of interventions to help officials allocate scarce resources more effectively.
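As a rough illustration of how such retrieval works, the sketch below embeds a handful of documents and returns the passages most relevant to an official’s question; the embedding model and documents are placeholders, not details of ImpactAI itself.

```python
# Minimal retrieval sketch: embed documents once, then return the passages
# most similar to a question. Model name and documents are placeholders.
import numpy as np
from openai import OpenAI

client = OpenAI()

documents = [
    "Evaluation of a cash transfer pilot: school enrolment rose by 8 percentage points.",
    "Guidance note on procurement thresholds for district health offices.",
    "Impact evaluation of a teacher coaching programme on early-grade reading.",
]

def embed(texts):
    """Return unit-normalised embedding vectors for a list of texts."""
    result = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vectors = np.array([item.embedding for item in result.data])
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

doc_vectors = embed(documents)  # in practice, computed once and stored in an index

def retrieve(question, top_k=2):
    """Return the top_k documents most similar to the question (cosine similarity)."""
    query_vector = embed([question])[0]
    scores = doc_vectors @ query_vector
    best = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in best]

for passage in retrieve("What do we know about cash transfers and school enrolment?"):
    print(passage)
```

The retrieved passages would then be passed to a generative model to draft the summary, with citations back to the underlying documents.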

Where to start? Getting the basics right before scaling AI use

This potential collides with administrative reality in most countries: infrastructure is fragile, connectivity is patchy, specialised staff are scarce, and core administrative data is fragmented. Without clear rules and institutions, pilots stall, backfire, or fail to scale.

What is needed is scaffolding that makes AI usable at scale: a clear view of risks and use cases, disciplined procurement, public inventories, stronger data foundations, and human oversight. In practice, shoring up the data layer – inventories, quality, basic interoperability – delivers more value than flashy pilots. Get the basics right, then scale.

A sensible first move is to prioritise internal-facing use cases with low risk and measurable productivity gains, while building the necessary muscles (policy, assurance, skills) for higher-stakes deployments. Many high-income countries follow this approach – in Australia, for example, generative tools are used for administrative and analytical tasks, with caution exercised where services involve human rights or safety concerns.

Low-risk, high-reward examples include cleaning and linking records, drafting internal summaries of large files, assisting searches over policy and case guidance, structured extraction from forms, code scaffolding for analysts, and triage of support tickets (with a human in the loop). The reward is only real if the tool outperforms the current baseline, so instrument pilots to measure and publish the numbers.

Using AI for high-impact cases comes with risks

For higher-impact uses like fraud detection, eligibility triage, or resource allocation, stronger safeguards and more specialised capacity are non-negotiable. Risk frameworks converge on three habits: transparency, auditability, and contestability – NIST’s AI Risk Management Framework and the OECD’s guidance on generative AI are good anchors.

There are cautionary tales about algorithmic decision-making without adequate human oversight. In 2020, England used a statistical standardisation model to award GCSEs and A-levels (secondary school qualifications in the UK). This led to public anger, with confused students demanding explanations of why they had been assigned specific grades. Ultimately, the system was suspended, and students were awarded grades based on teacher assessments. 

In 2018, researchers found that commercial facial-analysis systems classifying gender made far more errors for darker-skinned women – error rates of up to 34.7% – than for lighter-skinned men, for whom error rates were below 1%. The bias stemmed from unrepresentative training data, and the study triggered calls for stricter testing across demographic groups before deployment.

What to prioritise when deploying AI – and how to sequence it

1. Start with principles, not promises

Capability, not just code, matters most. Public servants need AI literacy; lawyers and regulators must be able to uphold or contest algorithmic decisions; civil society needs tools to scrutinise claims. 

2. Invest in quality data and training

Above all, basic administrative data needs to be used well. AI applied to flawed data yields poor results. Treat inventories, data quality, linkage, and access as core infrastructure. 

Do so by investing in people within government: train policy teams to ask the right questions, and establish small, cross-functional teams that can deliver data and AI work and provide assurance.

3. Prove value where risk is low

Choose a few internal use cases where the time saved scales into months. Establish baselines, conduct quick A/B or before-and-after tests, and share the results. If the gains are weak, stop and try a different problem. Developing expertise through low-risk projects is an essential first step.
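A before-and-after comparison can be as simple as the sketch below, which checks whether task times under the new tool are meaningfully lower than the baseline; the figures are invented for illustration.

```python
# Minimal before/after sketch: compare task completion times (in minutes)
# with and without the AI tool. All figures are invented for illustration.
from statistics import mean
from scipy import stats

baseline_minutes = [34, 41, 29, 38, 45, 33, 40, 36, 31, 44]   # current process
with_tool_minutes = [27, 30, 25, 33, 29, 26, 35, 28, 24, 31]  # pilot with AI assistance

time_saved = mean(baseline_minutes) - mean(with_tool_minutes)
t_stat, p_value = stats.ttest_ind(baseline_minutes, with_tool_minutes)

print(f"Average time saved per task: {time_saved:.1f} minutes")
print(f"p-value for the difference: {p_value:.3f}")
# Publish the result either way: weak or negative gains are the signal to stop
# and pick a different problem, not to keep investing.
```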

4. Measure reality, not hopes

Do not assume every deployment of AI boosts productivity. A recent randomised trial of experienced open-source developers found that AI assistance made them about 19% slower, despite participants’ ex-ante beliefs that it would make them faster and their self-reported productivity gains.

Early evidence on "cognitive debt" from a study of AI-assisted essay-writing points the same way. This is why it is essential to measure success, not activity – tracking days from idea to pilot, the share of deployments with proper evaluation, the number of datasets lifted to a minimum quality standard, incident rates and resolution times, and staff satisfaction with guidance. If the metrics fall short, adjust course quickly.

5. Measure what matters

Avoid testing models only in unrealistic, lab-based settings – AI systems can post impressive scores on standardised tests yet have limited impact in practice. For example, state-of-the-art language models are claimed to have expert knowledge in high-stakes areas like healthcare and law, as measured by performance on standardised exams. Yet performance often drops sharply in realistic settings that involve information asymmetries, prolonged human interaction, and tacit knowledge.

6. Borrow, then localise

One advantage for LMICs is that others have already experimented. Public inventories (such as this one by the US Chief Information Officers Council) now catalogue thousands of AI uses – from mission support to citizen‑facing services – and flag those that are “rights‑ or safety‑impacting” and what safeguards apply. Treat these as menus, not blueprints; shortlist promising ideas, and then demand local evidence before scaling.

7. Find models that fit the context, not just “state‑of‑the‑art”

Foundation models are mainly trained on internet data steeped in North American and European language use and norms. Two problems follow for LMICs: performance drops in non-European languages, and models default to Western norms and categories.

Recent evaluations for African languages document the gaps. The remedy is to test models on your own data and tasks, require vendors to show performance for your languages and domains, and consider fine-tuned or retrieval-augmented systems grounded in local corpora – although in many cases, large general-purpose models may still outperform smaller specialised ones.
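One lightweight way to demand that local evidence is a small, locally labelled test set broken down by language; the sketch below scores candidate systems against such a set, with all names and data invented for illustration.

```python
# Minimal local-evaluation sketch: score candidate systems against a small,
# locally labelled test set, broken down by language. All data is illustrative.
from collections import defaultdict

# Each record: (language, gold label, {system name: predicted label})
test_set = [
    ("bemba",  "retail",      {"vendor_model": "retail",      "fine_tuned": "retail"}),
    ("bemba",  "agriculture", {"vendor_model": "education",   "fine_tuned": "agriculture"}),
    ("nyanja", "education",   {"vendor_model": "education",   "fine_tuned": "education"}),
    ("nyanja", "retail",      {"vendor_model": "agriculture", "fine_tuned": "retail"}),
]

def accuracy_by_language(system):
    """Share of correct predictions per language for one candidate system."""
    correct, total = defaultdict(int), defaultdict(int)
    for language, gold, predictions in test_set:
        total[language] += 1
        correct[language] += int(predictions[system] == gold)
    return {language: correct[language] / total[language] for language in total}

for system in ("vendor_model", "fine_tuned"):
    print(system, accuracy_by_language(system))
```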

8. A cross‑border technology with bordered regulation

Data and models move across borders, but regulation does not. Start with clear rules for your own government, even if minimal at first, and be mindful of spill-overs. The EU AI Act provides a practical taxonomy for classifying risk and obligations. Regional baselines, such as the African Union’s Data Policy Framework, can also guide safe data sharing and establish standards.

AI must augment existing systems to solve real problems

Fundamentally, AI will not fix broken systems, but it can make capable ones faster, more consistent, and more effective. For developing countries, the challenge is not to chase every frontier, but to apply technology where it solves real problems, at measured risk, and with evidence in hand. Start small, measure hard, adapt fast – and let capability, not novelty, set the pace.

The Zambia Evidence Lab is an initiative spearheaded by the Zambian Ministry of Finance and National Planning (MoFNP), supported by the International Growth Centre, to enhance the use of data in policy decisions.

Connect with the Zambia Evidence Lab