Monitoring progress on the new Global Goal for access to education will require research that captures data on the most disadvantaged children, particularly those excluded from formal schooling. In today’s blog, Ben Alcott and Pauline Rose argue that better data makes better policy. For educational access, this means gathering more data, over longer time periods, and working to integrate it with existing administrative data to produce richer evidence bases for policymakers.

This post forms part of a cross-blog series on the 2030 Agenda for Sustainable Development run by the IGC, Africa at LSE, and South Asia at LSE blogs. View more posts in this series.

Worldwide, 250 million children lack basic numeracy and literacy skills. The Sustainable Development Goals (SDGs) aim to rectify this by promoting equitable and universal access to quality education. The SDGs’ explicit reference to equity is both laudable and essential: the most disadvantaged children are the least likely to be learning and the most in need of support.

Leaving aside ongoing debates about which policies governments can adopt to deliver on this ambitious goal of quality education for all, we focus in this blog on the challenge of how to track and monitor progress. Effective monitoring will require an understanding of which kinds of data will best capture the full scope of educational inequality.

In recent years, two sets of surveys – the Young Lives study and the People’s Action for Learning (PAL) Network – have made great strides in improving our knowledge of the inequalities that persist in educational access. Both have the added benefit of being publicly available. Using them as a starting point, we highlight the most useful features of these datasets that can be incorporated into future data-gathering initiatives.

Reaching the most disadvantaged children

For data to tell us anything meaningful about equitable access to education, the first step is ensuring that data-collection instruments gather information on the most disadvantaged children. Relying purely on testing and learning assessments, however, means that only children who are already in school will be included in the data. This is insufficient. Current estimates indicate that approximately 58 million children are still out of school. These children are the most likely to face a range of disadvantages associated with poverty, gender, ethnicity, geography, and disabilities.

The solution might come from developing more comprehensive, representative samples based on household surveys, such as those adopted by the PAL Network. These surveys randomly select villages and households within districts. The data is then weighted to account for differences in district sizes, which helps to create a more representative picture of the extent to which children are learning, regardless of whether they are in school.
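To illustrate why weighting by district size matters, here is a minimal sketch in Python. It is not the PAL Network's actual weighting procedure, which the post does not describe; the district names, population figures, and survey counts are all invented for illustration, and the weighting shown is a simple population-proportional one.

```python
from dataclasses import dataclass


@dataclass
class DistrictSample:
    """Survey results for one district (all fields are hypothetical)."""
    name: str
    child_population: int   # assumed estimate of children living in the district
    children_surveyed: int  # children assessed at home, in school or not
    children_reading: int   # surveyed children who could read a simple text


def unweighted_share(samples):
    """Pool all surveyed children, ignoring how large each district is."""
    surveyed = sum(s.children_surveyed for s in samples)
    reading = sum(s.children_reading for s in samples)
    return reading / surveyed


def weighted_share(samples):
    """Weight each district's share by its child population."""
    total_population = sum(s.child_population for s in samples)
    weighted_sum = sum(
        s.child_population * s.children_reading / s.children_surveyed
        for s in samples
    )
    return weighted_sum / total_population


# Invented example: the same number of children surveyed in a large
# and a small district.
samples = [
    DistrictSample("District A", 90_000, 600, 300),
    DistrictSample("District B", 10_000, 600, 480),
]

print(f"Unweighted share reading: {unweighted_share(samples):.2f}")
print(f"Weighted share reading:   {weighted_share(samples):.2f}")
```

In this made-up case the small, better-performing district counts as much as the large one in the pooled figure, so the unweighted estimate overstates how many children are learning. Weighting brings the estimate back in line with where children actually live.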

It is worth noting that even the most comprehensive of household surveys cannot guarantee full coverage of the world’s population. Since refugees and internally displaced children face some of the worst conditions, their absence from educational data increases the risk of underestimating inequalities and thus the extent to which education systems must change in order to improve opportunities for all. In short, poor data hamstrings effective policy design.

Similarly, nomadic populations and street children are amongst the most disadvantaged, and also the most likely to be missing. Children with disabilities are usually systematically excluded from learning assessments, and special efforts will be needed to include them.

Nonetheless, given the considerable costs that any data collection entails, household surveys are likely to provide the most feasible and sustainable means of maximising scale. Robust sampling provides nationally representative data without the need to visit all households, as in a census. Household surveys are also useful in collecting information on key household measures of deprivation.

The sustainability of these initiatives is evident in the fact that they have been conducted for a decade in India, one of the world’s largest countries, and are now being trialled in Mexico and Nigeria.

More data is needed to understand how inequality changes over time

Young Lives is currently the main publicly available source of longitudinal data in low- and lower-middle-income countries. It has provided invaluable insights into how inequalities form and consolidate over time. For example, in Peru a child’s family wealth at age 1 explains 32% of the variability in children’s performance in mathematics in Grade 4. And in each of the four study sites (Ethiopia, Peru, Vietnam, and Andhra Pradesh in India), the richest quartile makes more progress than the poorest quartile in mathematics between ages 5 and 8.

Given that progress through primary schooling is uneven, and that it often depends on sources of inequality associated with inherited disadvantage, there is a need to track children’s learning over time to identify where interventions can have the greatest impact. The trade-off between depth and scale is a continual challenge for any longitudinal dataset.

The PAL Network surveys do not currently provide data that are longitudinal at the student level, but they have the potential to do so. These surveys currently visit certain villages over consecutive years, and this could be extended to ensure that the same households continue to be surveyed. This would no doubt increase the cost, but savings could be made by reducing the survey frequency to once every two or three years, rather than annually, as is done at present. The potential benefit to policy development, in assessing the learning trajectories of different groups of children over time, is immense.

Data should be linked between households, schools, and governments

Identifying who is and is not learning is just the first step. In principle, schools should be institutions that are engineered to rectify these otherwise inherited inequalities. Too often though, they have the opposite effect, resulting in a widening of gaps over a school cycle. We therefore need to understand better what schools need to do to address learning inequalities.

Both Young Lives and PAL Network data have made important contributions in this regard. Young Lives collects information on teacher characteristics, for example. This has led to important findings on the links between teaching quality and learning, such as the finding that teachers in Vietnam have a far more accurate sense of their students’ skill levels than do teachers in Andhra Pradesh. PAL Network surveys collect information on a sub-sample of schools that can, in turn, be linked to children who have been surveyed in households. Typically, surveyors visit a school within each village, offering a partial but important snapshot of local school conditions.

A significant extension of this would be to link household surveys systematically to pre-existing administrative data. For example, common identifiers for schools would allow data collected from households to be linked with data collected on schools, such as on finances, class size, school facilities, teacher preparation, and teaching practices.
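As a rough illustration of what linking through a common school identifier could look like, here is a minimal Python sketch. The field names (`school_id`, `class_size`, `has_library`, and so on) and all records are hypothetical; neither Young Lives nor the PAL Network is being described here, and real administrative datasets would need far more care around identifier quality and privacy.

```python
# Hypothetical household-survey records: one row per child, including
# children who are not enrolled anywhere (school_id is None).
household_records = [
    {"child_id": "c1", "school_id": "S001", "in_school": True, "can_read": True},
    {"child_id": "c2", "school_id": "S002", "in_school": True, "can_read": False},
    {"child_id": "c3", "school_id": None, "in_school": False, "can_read": False},
]

# Hypothetical administrative data, keyed by the same school identifier.
school_admin = {
    "S001": {"class_size": 32, "has_library": True},
    "S002": {"class_size": 58, "has_library": False},
}


def link_records(children, schools):
    """Attach school-level data to each child via the shared identifier.

    Out-of-school children, or children whose school is missing from the
    administrative data, are kept with school set to None, so the most
    disadvantaged are not silently dropped from the linked dataset.
    """
    linked = []
    for child in children:
        school = schools.get(child["school_id"])
        linked.append({**child, "school": school})
    return linked


linked = link_records(household_records, school_admin)
for row in linked:
    print(row["child_id"], row["can_read"], row["school"])
```

The design choice worth noting is that the link starts from the household records rather than the school records. Starting from schools would exclude exactly those children whom household surveys are designed to reach.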

The distinctive characteristics of Young Lives and the PAL Network surveys have made each essential to our understanding of global learning inequalities. To chart progress towards the SDGs, the ideal survey would synthesise the strengths of each across a wider range of countries, so that we can have comprehensive, longitudinal data at scale. Of course, this is easier said than done, but the major accomplishments of these projects provide a strong foundation for doing so.

This article gives the views of the authors, and not the position of the IGC.
