Barriers to auto-classification for information governance and how to overcome them

In the second of this three-part blog series, records management expert Conni Christensen provides insights from her experience with information governance and auto-classification methodology.

After my last post, I got into conversations with colleagues about why we haven’t been able to leverage auto-classification more for information governance (IG).

Given that it’s no longer possible for us humans to process the vast quantities of information we create and receive, why can’t we take advantage of the tremendous advances made in the field of data analytics and auto-classification?

In other disciplines, automation has come about through a digital transformation process, where

1) a body of knowledge is translated into

2) schemes, rules and processes, which are then

3) parsed into data elements, which can then be

4) built into algorithms enabling automation of the process.

This is the means by which modern accounting systems (like Xero) have evolved, where computer-based processing is achieved through the creation of rules based on data recognition. Same with ERP, same with other business systems.

But not with information governance. Because we have failed to convert the information governance body of knowledge into a digital framework.

Barriers to information governance automation

Over the last 20 years, the IG body of knowledge has been well documented in standards such as ISO 15489 Records management, ISO 23081 Recordkeeping Metadata, and various government standards relating to security, privacy, data protection, recordkeeping etc. But the controls which support these standards, like retention schedules, are still written as documents for human interpretation and application.

Take this description from the Records Authority 2014/00247391 for records which are to be retained as national archives, i.e. permanently:

Records relating to arrangements, agreements, Memorandums of Understanding (MOUs) and contracts relating to the management, conservation and use of significant NCA managed land and assets such as National Memorials, diplomatic sites and estates, heritage value buildings and Lake Burley Griffin. Includes water abstraction agreements, Crown Leases for diplomatic sites and estates, agreements with external parties to undertake major works activities and projects, including those that do not proceed.

And this for records which can be destroyed 12 years after completion or termination of the agreement:

Agreements, Memorandums of Understanding (MOUs) and contracts relating to the management, conservation, maintenance and use of NCA managed land and assets, other than those covered by class 61534.

Based on these rules, a human appraising records would interpret them by knowing which document types, sites and buildings, and business activities to look out for. If we specified which document types, sites/buildings and activities were significant, the same work could be undertaken using a search engine.

To enable automation we need to transform our governance controls into taxonomies, ontologies and data models from which we can build algorithms to feed into the search engines, enabling recognition of the terms/term sets which indicate significance. Likewise for access and security controls.
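To make the idea concrete, here is a minimal sketch in Python of how the two retention rules quoted above might look once they are expressed as data rather than prose. It is an illustration only, not how any particular product works. The term lists, the labels and the function names (DOCUMENT_TYPES, SIGNIFICANT_TERMS, contains_any, classify) are my own assumptions, and a real term set would have to be agreed with the agency and would be far richer than simple keyword matching.

```python
import re

# Hypothetical term sets drawn from the wording of the two quoted classes.
# A real taxonomy would be maintained by the agency, not hard-coded.
DOCUMENT_TYPES = (
    "agreement",
    "memorandum of understanding",
    "memorandums of understanding",
    "mou",
    "contract",
    "crown lease",
)

# Terms assumed to signal "significant" land and assets (permanent retention).
SIGNIFICANT_TERMS = (
    "national memorial",
    "diplomatic",
    "heritage",
    "lake burley griffin",
    "water abstraction",
)

# Illustrative labels only; they are not official disposal class identifiers.
RETAIN_PERMANENTLY = "retain as national archives"
DESTROY_AFTER_12_YEARS = "destroy 12 years after completion or termination"


def contains_any(text, terms):
    """Return True if any term (or its simple plural) appears as a whole word."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(term) + r"s?\b", lowered) for term in terms
    )


def classify(text):
    """Apply the two rules in order: document type first, then significance."""
    if not contains_any(text, DOCUMENT_TYPES):
        return None  # not an agreement-type record, so neither rule applies
    if contains_any(text, SIGNIFICANT_TERMS):
        return RETAIN_PERMANENTLY
    return DESTROY_AFTER_12_YEARS


if __name__ == "__main__":
    samples = [
        "Crown Lease for a diplomatic site",
        "Contract for mowing services on NCA managed land",
        "Staff newsletter, March edition",
    ]
    for sample in samples:
        print(sample, "->", classify(sample))
```

The point of the sketch is the ordering of the tests: the significance terms are the part a human appraiser currently carries in their head, and once they are written down as a list, a search engine or classifier can apply them consistently.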

How can we break through with auto-classification?

Oddly enough, we already have an effective model for converting information governance requirements into a digital framework. It is called DIRKS (Designing and Implementing Recordkeeping Systems), and you’ll find remnants of the methodology in ISO 15489, the standard for records management. Over the last 15 years, consultants like myself working in Australian government have followed this methodology to develop classification schemes and retention schedules.

Whether they knew it or not, the Australian government regulators had developed an ontology containing many of the concepts and relationships that define recordkeeping rules. I developed much of my body of knowledge from following the methodology over and over again and then by developing a.k.a.® software to make the process a whole lot easier.

What’s holding us back from automating information governance?

The DIRKS methodology fell from grace in 2007 because the process mandated by the National Archives of Australia was highly prescriptive and costly. A bit like the parent of Baby Huey, NAA struggled to control their creation, and there was a backlash from government agencies. But in the intervening years no one has come up with an effective framework for modelling information governance requirements.

I believe that the DIRKS framework holds the keys to automation. In fact, I predict that once we re-examine the DIRKS data models, automation is going to become a whole lot easier.

Read more about the practical application of auto-tagging in this Pingar whitepaper.

Conni Christensen founded Synercon in 1998 and is the designer of a.k.a.® information governance software.

She has more than twenty years’ experience in records and information management, business consulting, training and software development. For many years, Conni has worked across the globe as a highly sought-after trainer, speaker and presenter.
