
Artificial intelligence in national statistics: Insights from Zambia
Can artificial intelligence help classify jobs more accurately and efficiently? In partnership with Zambia’s national statistical agency, the Zambia Evidence Lab is testing large language models on labour force survey data, with promising results that could speed up reporting and sharpen policy decisions without replacing human expertise.
As artificial intelligence (AI) continues to expand rapidly, public organisations are increasingly focused on how it can be used for good. One promising application is in the national statistical surveys that governments conduct around the world.
In Zambia, we are testing the use of AI to analyse the Labour Force Survey, a quarterly survey conducted to understand employment trends. This work is a partnership with the Zambia Statistics Agency (ZamStats), the country's national statistical agency.
The challenge of classifying survey responses
As part of the Labour Force Survey, ZamStats hires enumerators to conduct the survey on the ground. These enumerators ask respondents about their occupation and industry to understand employment and sectoral dynamics in Zambia.
After the interviews, enumerators are expected to assign responses to numerical, four-digit International Standard Classification of Occupations (ISCO) and International Standard Industrial Classification (ISIC) codes. ISCO codes describe a respondent’s occupation, while ISIC codes describe their industry.
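As a purely illustrative sketch of how these codes nest (the example below is ours, drawn from the published ISCO-08 structure, not from the survey data), each additional digit narrows the classification:

```python
# Purely illustrative: how a four-digit ISCO-08 code nests, from broadest group
# to most specific. Each additional digit narrows the classification.
isco_example = {
    "major group":     ("2",    "Professionals"),
    "sub-major group": ("21",   "Science and engineering professionals"),
    "minor group":     ("212",  "Mathematicians, actuaries and statisticians"),
    "unit group":      ("2120", "Mathematicians, actuaries and statisticians"),
}
```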
ZamStats approached the IGC expressing concerns about the accuracy of these classifications. The codebooks are extremely complex (exceeding 300 pages each), and survey respondents often provide insufficient detail for enumerators to identify the correct code. After monitoring the survey process in the field, we proposed using large language models – AI trained to answer natural language questions – to classify the responses.
Testing large language models against enumerators
To measure classification accuracy, we need ground truth against which to compare large language models and enumerators. Officials from the ZamStats headquarters in Lusaka, who oversee the survey for the entire country, offered their expertise and classified 1,059 responses into ISCO and ISIC codes.
We then give the same set of responses to a large language model, GPT-4 Turbo, and ask it to assign its own codes. The model receives each respondent's job title, job description, and main job activities from the survey – but no codebook guidance or successful examples – and selects ISCO and ISIC codes. We then compare both the model's codes and the enumerators' codes against the ground truth at each cumulative digit level, checking whether the first digit matches, the first two digits match, and so forth.
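The sketch below shows a minimal version of this cumulative-digit comparison, assuming each classification is stored as a four-digit string. The codes are invented for illustration; they are not the survey data or our analysis code.

```python
# Illustrative sketch only: invented codes, not the survey data or analysis code.
def match_rates(predicted, truth, max_digits=4):
    """Share of codes whose first k digits match the ground truth, for k = 1..max_digits."""
    return {
        k: sum(p[:k] == t[:k] for p, t in zip(predicted, truth)) / len(truth)
        for k in range(1, max_digits + 1)
    }

ground_truth = ["6111", "5223", "9621"]   # expert-assigned codes
llm_codes    = ["6111", "5222", "9112"]   # hypothetical model output
enum_codes   = ["6112", "5223", "9621"]   # hypothetical enumerator codes

print(match_rates(llm_codes, ground_truth))   # {1: 1.0, 2: 0.67, 3: 0.67, 4: 0.33} (rounded)
print(match_rates(enum_codes, ground_truth))  # {1: 1.0, 2: 1.0, 3: 1.0, 4: 0.67} (rounded)
```

In practice, the same calculation runs over all 1,059 expert-classified responses, once for the model's codes and once for the enumerators' codes.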
Large language models perform better, but not always
On the ISIC codes, the large language model outperforms the enumerators at every digit. The results for ISCO codes are more mixed. On the first two ISCO digits, the large language model performs better, although the difference on the second digit is small enough that it cannot be distinguished from random chance. On the third and fourth digits, however, the enumerators outperform the large language model.
The large language model performs worse on the third and fourth ISCO digits because classification complexity increases with each digit. It avoids simple human mistakes on the broader first and second digits, but is more likely to struggle on the final digits, which require greater country and interview context.
The figures below compare accuracy rates graphically. On the first ISCO digit (Figure 1), GPT-4 Turbo's match rate with the ground truth is nearly 7 percentage points higher than the enumerators'. However, it performs worse by 12 percentage points on the third digit and by 21 percentage points on the fourth.
Figure 1: ISCO match rates by digit level

On the second, third, and fourth ISIC digits (Figure 2), GPT-4 Turbo outperforms enumerators by seven to eight percentage points (the first digit for ISIC is omitted as the first two digits are always analysed in conjunction for policymaking purposes).
Figure 2: ISIC match rates by digit level

Future research can improve how large language models perform
With further refinement, we expect that large language models can outperform humans across all digits for both ISCO and ISIC codes. We are planning four steps for the next iteration of this research project:
- Implementing the method within ZamStats using a graphical user interface (GUI) or by embedding a large language model into the tablets used by enumerators.
- Exploring state-of-the-art models such as OpenAI’s o4-mini and Anthropic’s Claude Opus 4.
- Adjusting the prompt to include codebook excerpts and successful examples to guide the large language model in assigning the codes (an illustrative sketch follows this list).
- Incorporating a larger dataset with additional Labour Force Survey data shared by ZamStats.
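To illustrate the third of these steps, the sketch below shows how codebook excerpts and a worked example could be embedded in a prompt using OpenAI's Python client. The model name, codebook lines, and worked example are placeholders, not the project's actual prompt.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

# Placeholder codebook excerpt and worked example; the real prompt would draw on
# the official ISCO-08 codebook and on classifications verified by ZamStats experts.
CODEBOOK_EXCERPT = (
    "6111 Field crop and vegetable growers\n"
    "5223 Shop sales assistants\n"
    "9621 Messengers, package deliverers and luggage porters\n"
)
WORKED_EXAMPLE = (
    "Job title: maize farmer. Description: grows and sells maize on own plot. "
    "Main activities: planting, weeding, harvesting. -> ISCO code: 6111"
)

def classify_isco(job_title: str, description: str, activities: str) -> str:
    """Ask the model for a four-digit ISCO code, guided by codebook text and an example."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # placeholder; newer models could be swapped in
        messages=[
            {
                "role": "system",
                "content": (
                    "You classify Labour Force Survey responses into four-digit "
                    "ISCO-08 codes. Use the codebook excerpt below and reply with "
                    "the four-digit code only.\n\n"
                    f"Codebook excerpt:\n{CODEBOOK_EXCERPT}\n"
                    f"Worked example:\n{WORKED_EXAMPLE}"
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Job title: {job_title}\n"
                    f"Description: {description}\n"
                    f"Main activities: {activities}"
                ),
            },
        ],
    )
    return response.choices[0].message.content.strip()
```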
Policy implications of adopting AI in statistical processes
We offer comments on three areas that policymakers interested in applying this research in their own contexts may want to consider.
1. Time savings
Using AI to classify ISCO and ISIC codes enables time savings (up to 130 working days annually, according to our calculations), speeding up the production of official statistical reports. Because governments often conduct countercyclical economic policy in response to labour market shocks, fast-tracking the statistical production process would allow Zambian policymakers to respond more quickly to labour market changes.
2. Jobs
Some policymakers will understandably be concerned about job losses from AI. In this specific context, it is possible for AI to augment rather than replace humans, who will still be required to physically administer the survey and to check large language model classifications for accuracy. From a social welfare perspective, the goal should be to free officials at national statistics agencies to perform more tasks, more effectively, rather than to perform the same tasks within existing capacity constraints.
3. Accuracy
Increasing classification accuracy will alter the measured distribution of labour market categories, because some sectors may currently be over- or under-represented by enumerators relative to the ground truth. With a better understanding of Zambia’s labour force composition, policymakers can better determine which sectors to target for interventions.
What next?
As AI’s prominence increases, public sector organisations must devise ways to harness it for good. Our work demonstrates that national statistics is one such opportunity. By automating parts of the statistical production process, policymakers can save time and improve accuracy, freeing statisticians to work on more analytical tasks.