
Managing the public sector’s data explosion

31/12/24

Industry Voice

Image source: istock.com/liulolo

There are ways of managing the massive volume of data while controlling costs, writes Berwyn Jones, head of UK public sector at Cribl

It is no secret that there has been an exponential growth in the volume of data used in public services, and there is every indication that it is going to continue over the next few years.

A number of factors are at work, including the increasing use of data from internet of things (IoT) technologies, the surge in activity around the use of AI, and the evolving threats of cyber attack that require the continual monitoring of large amounts of data on internet activity.

The latter point has assumed a new importance with the Government’s plan for a new Cyber Security and Resilience Bill, which includes expanding the remit of regulation to cover more digital services and supply chains and will demand oversight of even more data.

All of this has to be stored and managed, but the financial outlook for the public sector indicates that it will not receive new funding to respond to the demand, and it will have to develop new approaches to managing the data within existing budgets.

This is further complicated by the widespread use of hybrid cloud environments, with organisations using different combinations of public and private clouds and on-premises facilities for storing data.

UKA Live perspectives

It is becoming a major issue for digital teams in the sector, and it provided the focus of a recent UKA Live discussion, staged with Cribl, in which I took part alongside Richard Woolham, lead product owner for performance monitoring and data analytics at DWP Digital, Stuart Bowell, global head of observability at NETbuilder, and UKAuthority publisher Helen Olsen Bedford.

It identified some of the major challenges in the explosion of public services data and measures that could be taken to make it manageable within the expected constraints on budgets.

A prime challenge has emerged around data hoarding: organisations hold onto massive volumes on the basis that the data might be needed in the future, even though most of it is unlikely to be of any real use. This comes at a significant cost and complicates the task of making the data available and re-usable.

Organisations are having to make compromises that some fear could create operational blind spots and undermine their security posture and the reliability of their systems. Making the right choices involves understanding not just existing but future uses of the data, which is very difficult with all the unknowns around cyber threats and the possibilities with AI.

Legacy lock-in

There are worries about data being locked into legacy systems, with effective control resting in the hands of the suppliers. The discussion raised the point that there is a similar fear around cloud-based analytics platforms, which can make results visible but not easily available for transfer to other systems.

The sheer number of systems, often requiring different data formats, is adding to the complexity, as is the need for application programming interfaces (APIs) to support the sharing and re-use of data, and the growing number of agents on servers to gather data and protect systems. The problems around these are intensified by the growing need for real-time data to support operations and cyber security.

Another factor is that the value of data can lessen over time, and there could be a cut-off point when it is not worth retaining; but there are always variables that make it a challenge to assess when the right time comes.

The discussion brought out agreement that the ability to own and control data, with the freedom to decide where it should be stored, is a big factor in dealing with all these difficulties successfully without breaking budgets.

Techniques to take

There are techniques for achieving this. One is to get a clear view of the prime uses for the data to determine where and how it is stored. For observability and security purposes it is likely to be needed in real time, but for governance it is needed only in line with audit cycles, and can therefore be stored at low cost because there is no urgency in retrieving it.
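
As a rough illustration of this purpose-based routing, and not a description of any particular product's configuration, the sketch below tags each event with its intended use and sends real-time operational data to an analytics destination while governance material goes to low-cost archive storage. The purposes, fields and destination names are hypothetical.

```python
# Minimal sketch of purpose-based routing: real-time operational data goes to an
# analytics destination, while governance data goes to low-cost archive storage.
# The purposes, fields and destination names are hypothetical, for illustration only.

ROUTES = {
    "observability": "realtime_analytics",  # needed immediately for monitoring
    "security": "realtime_analytics",       # needed immediately for threat detection
    "governance": "low_cost_archive",       # retrieved only in line with audit cycles
}

def route_event(event: dict) -> str:
    """Return the destination for an event based on its declared purpose."""
    purpose = event.get("purpose", "governance")   # default to cheap storage
    return ROUTES.get(purpose, "low_cost_archive")

if __name__ == "__main__":
    events = [
        {"purpose": "security", "message": "repeated failed logins detected"},
        {"purpose": "governance", "message": "quarterly access review record"},
    ]
    for event in events:
        print(route_event(event), "<-", event["message"])
```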

Another is to ensure that any agreement with systems suppliers recognises that the data belongs to the public sector body and that it has the right to recover and move it at any time. This comes with ensuring there are no technical barriers, making sure that data will be stored in accessible formats over the long term, and that the systems have the necessary degree of interoperability.

This should be a crucial element of future procurements, although there is a challenge in breaking free of legacy terms and conditions when it is otherwise easier to stick with the same supplier.

Going as far as possible with the standardisation of data can provide benefits. There is a shortage of skills needed to build data pipelines, but standardising the structure of data and how it is stored makes it easier to train people to collect and use data without expensive external support.
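
As a simple illustration of what that standardisation can look like in practice, the sketch below maps records from two hypothetical source systems onto one common structure before storage; the field names are assumptions rather than a prescribed schema.

```python
# Minimal sketch of normalising records from different systems into one common
# structure before storage. Source systems and field names are hypothetical.
from datetime import datetime, timezone

def normalise(record: dict, source: str) -> dict:
    """Map a source-specific record onto a standard event structure."""
    return {
        "timestamp": record.get("time")
        or record.get("@timestamp")
        or datetime.now(timezone.utc).isoformat(),
        "source": source,
        "severity": str(record.get("level") or record.get("severity") or "info").lower(),
        "message": record.get("msg") or record.get("message") or "",
    }

if __name__ == "__main__":
    legacy = {"time": "2024-12-31T09:00:00Z", "level": "WARN", "msg": "disk usage at 85%"}
    modern = {"@timestamp": "2024-12-31T09:00:05Z", "severity": "info", "message": "job completed"}
    print(normalise(legacy, "legacy_case_system"))
    print(normalise(modern, "cloud_platform"))
```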

If these principles are fed into a data strategy, aligned with the organisation’s broader goals for the long term, they can do a lot to future-proof the data, ensuring that it retains its value while keeping costs in check.

The Cribl solution

There are also tools that can support this, notably the Cribl suite, which provides a competitively priced approach to data management in a hybrid cloud environment. It can take data from multiple proprietary sources and feed it into multiple destinations, making use of its wide range of prebuilt integrations and its different solutions.

These comprise: Cribl Search for querying data where it lives, searching all datasets at once and sending only what is needed to analytics tools; Cribl Stream for data routing, shaping and reduction; Cribl Lake for storage and easy access; Cribl Edge for data collection at scale with centralised management and configuration; and Cribl Copilot to generate pipelines and functions from natural language with troubleshooting guidance.

These help to provide complete ownership and control of the data, with the flexibility to optimise its use and reduce the costs of storage and processing. The suite makes it possible to collect data from any source, enrich it and send it where you want, improving its quality for input into different systems.

It also reduces the reliance on expensive, specialised skills to build the data pipelines for the future, enabling organisations to prepare data to fulfil the potential of AI in public services, and to respond to the changing demands on cyber security.

This can be a big part of any public service organisation’s strategy for preserving the value of its data for the future.
