Pensions Regulator takes up machine learning

Project with Government Digital Service leads to more tailored approach to communications with different pension schemes

The Pensions Regulator has developed a machine learning model to predict how likely workplace pension schemes are to comply with requirements to submit key information, and to tailor how it communicates with them.

Peter Jackson, the organisation’s head of data, has outlined the project in a blogpost, saying it was developed with the data team at the Government Digital Service.

“So far we have only been able to look at historical data, but we were keen to start making predictions about what pensions schemes will do in the future,” Jackson says.

The data scientists opted for a supervised machine learning model, an approach in which every example used in the learning process is labelled with a known outcome. This suited the regulator’s data: it held large volumes of records labelled with whether each pension scheme had complied with the requirement to submit a regular scheme return, which can be used to predict how schemes will behave.

The project involved creating new columns of data to extract extra meaning, a process known as feature engineering, and doing most of the work with decision trees, which can provide ‘yes or no’ answers to questions and predict the likelihood of more complex outcomes.
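As a rough illustration of those two ideas, the sketch below derives new columns from raw ones and then searches for the single ‘yes or no’ question that best separates the labelled outcomes, which is the basic step a decision tree repeats at every branch. The records, column names and the helper functions `add_features`, `gini` and `best_threshold` are all invented for this example; the blogpost does not describe the regulator’s actual features or tooling.

```python
from collections import Counter
from datetime import date

DUE = date(2017, 3, 31)  # hypothetical deadline for last year's return

# Hypothetical labelled records (supervised learning needs a known outcome per row).
schemes = [
    {"members": 40,   "received": date(2017, 3, 20), "outcome": "on_time"},
    {"members": 500,  "received": date(2017, 3, 26), "outcome": "on_time"},
    {"members": 75,   "received": date(2017, 3, 31), "outcome": "on_time"},
    {"members": 30,   "received": date(2017, 4, 10), "outcome": "late"},
    {"members": 900,  "received": date(2017, 4, 30), "outcome": "late"},
    {"members": 2500, "received": date(2017, 6, 29), "outcome": "not_submitted"},
]

def add_features(row):
    """Feature engineering: build new columns from the existing ones."""
    out = dict(row)
    out["days_late_last_year"] = (row["received"] - DUE).days
    out["is_small_scheme"] = row["members"] < 100
    return out

def gini(labels):
    """Gini impurity: 0.0 when every label in the list is the same."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(rows, feature, target="outcome"):
    """Return (threshold, impurity) for the best question 'feature <= threshold?'."""
    values = sorted({r[feature] for r in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate split halfway between neighbouring values
        left = [r[target] for r in rows if r[feature] <= t]
        right = [r[target] for r in rows if r[feature] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[1]:
            best = (t, score)
    return best

engineered = [add_features(r) for r in schemes]
threshold, impurity = best_threshold(engineered, "days_late_last_year")
print(f"Best question: days_late_last_year <= {threshold}? (impurity {impurity:.3f})")
```

A full decision tree would apply the same search again to each side of the split, across every available feature, until the groups are pure enough or too small to divide further.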

In this case they could be used to predict whether a scheme would send in its return on time, send it late or fail to send it at all.


Schemes deemed likely to be late were further split into groups according to several types of variable. The final model splits them into around 30 groups, each of which is expected to behave differently.
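One way to picture how a tree produces such groups: every scheme follows the yes/no questions down to a leaf, and each leaf is a group with its own mix of historical outcomes. The sketch below uses a tiny hand-written tree with three leaves rather than 30; the questions, thresholds, group names and records are assumptions made for illustration, not details from the regulator’s model.

```python
from collections import Counter, defaultdict

# Hypothetical tree: internal nodes ask "feature <= threshold?", leaves name a group.
TREE = {
    "question": ("days_late_last_year", 5),
    "yes": {"leaf": "G1"},
    "no": {
        "question": ("members", 100),
        "yes": {"leaf": "G2"},
        "no": {"leaf": "G3"},
    },
}

# Hypothetical historical records with known outcomes.
history = [
    {"days_late_last_year": -3, "members": 40,   "outcome": "on_time"},
    {"days_late_last_year": 0,  "members": 500,  "outcome": "on_time"},
    {"days_late_last_year": 12, "members": 30,   "outcome": "late"},
    {"days_late_last_year": 20, "members": 80,   "outcome": "not_submitted"},
    {"days_late_last_year": 40, "members": 900,  "outcome": "late"},
    {"days_late_last_year": 60, "members": 2500, "outcome": "late"},
]

def assign_group(node, scheme):
    """Follow the yes/no answers from the root until a leaf (group) is reached."""
    while "leaf" not in node:
        feature, threshold = node["question"]
        node = node["yes"] if scheme[feature] <= threshold else node["no"]
    return node["leaf"]

def outcome_rates(tree, schemes):
    """Share of each outcome observed within each group."""
    by_group = defaultdict(list)
    for scheme in schemes:
        by_group[assign_group(tree, scheme)].append(scheme["outcome"])
    return {
        group: {label: count / len(labels) for label, count in Counter(labels).items()}
        for group, labels in by_group.items()
    }

rates = outcome_rates(TREE, history)
for group in sorted(rates):
    print(group, rates[group])
```

The per-group rates double as rough likelihoods: a new scheme that lands in a given leaf is assumed to behave like the schemes that landed there before.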

“This is great as we can tailor our communication strategy differently for each group; light touch when we expect schemes to make returns on time, firmer when we expect a scheme to be more problematic,” Jackson says.
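In code, that kind of tailoring could be as simple as a rule that maps each group’s predicted on-time likelihood to a tone. The sketch below is only an illustration: the group names, the probabilities and the 0.8 cut-off are invented, and the blogpost does not say how the regulator actually chooses between approaches.

```python
# Hypothetical forecasts per group; the numbers are made up for this example.
GROUP_FORECASTS = {
    "G1": {"on_time": 0.92, "late": 0.06, "not_submitted": 0.02},
    "G2": {"on_time": 0.55, "late": 0.35, "not_submitted": 0.10},
    "G3": {"on_time": 0.20, "late": 0.40, "not_submitted": 0.40},
}

def choose_tone(forecast, light_touch_at=0.8):
    """Light touch when an on-time return is likely, firmer otherwise.

    The 0.8 default is an assumed cut-off, not a figure from the source.
    """
    if forecast.get("on_time", 0.0) >= light_touch_at:
        return "light touch"
    return "firmer"

plan = {group: choose_tone(forecast) for group, forecast in GROUP_FORECASTS.items()}
for group, tone in plan.items():
    print(f"{group}: {tone}")
```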

He adds that the project has made it possible to further investigate data associated with the schemes, and to work on improving the quality of data held by the Pensions Regulator.

Image by Kami Phuc, CC BY 2.0 through flickr