Entries by Conni Christensen

Barriers to auto-classification for information governance and how to overcome them


In the second of this three-part blog series, records management expert Conni Christensen provides insights from her experience with information governance and auto-classification methodology.

After my last post, I got into conversations with colleagues about why we haven’t been able to leverage auto-classification more for information governance (IG).

Given that it’s no longer possible for us humans to process the vast quantities of information we create and receive, why can’t we take advantage of the tremendous advances made in the field of data analytics and auto-classification?

What you see is a digital transformation process, where

1) a body of knowledge is translated into

2) schemes, rules and processes, which are then

3) parsed into data elements, which can then be

4) built into algorithms enabling automation of the process.

This is the means by which modern accounting systems (like Xero) have evolved, where computer-based processing is achieved through the creation of rules based on data recognition. Same with ERP, same with other business systems.

But not with information governance. Because we have failed to convert the information governance body of knowledge into a digital framework.

Barriers to information governance automation

Over the last 20 years, the IG body of knowledge has been well documented in standards such as ISO 15489 Records management, ISO 23081 Recordkeeping metadata, and various government standards relating to security, privacy, data protection, recordkeeping, etc. But the controls which support these standards, like retention schedules, are still written as documents for human interpretation and application.

Take this description from the Records Authority 2014/00247391 for records which are to be retained as national archives, i.e. permanently:

Records relating to arrangements, agreements, Memorandums of Understanding (MOUs) and contracts relating to the management, conservation and use of significant NCA managed land and assets such as National Memorials, diplomatic sites and estates, heritage value buildings and Lake Burley Griffin. Includes water abstraction agreements, Crown Leases for diplomatic sites and estates, agreements with external parties to undertake major works activities and projects, including those that do not proceed.

And this for records which can be destroyed 12 years after completion or termination of the agreement:

Agreements, Memorandums of Understanding (MOUs) and contracts relating to the management, conservation, maintenance and use of NCA managed land and assets, other than those covered by class 61534.

Based on these rules, a human appraising records would interpret them by knowing which document types, sites and buildings, and business activities to look out for. If we specified which document types, sites/buildings and activities were significant, the same work could be undertaken using a search engine.

To enable automation we need to transform our governance controls into taxonomies, ontologies and data models from which we can build algorithms to feed into the search engines, enabling recognition of the terms/term sets which indicate significance. Likewise for access and security controls.
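As a minimal sketch of what such a transformation might look like, the two example classes above can be restated as term sets that a program tests against document text. Everything named here is an illustrative assumption: the term lists were picked by hand from the quoted descriptions, the class labels are hypothetical rather than official class numbers, and the `appraise` function stands in for whatever a real search or classification engine would do.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Illustrative term sets only: picked by hand from the two class
# descriptions quoted above, not taken from any official data model.
DOCUMENT_TYPES = {"agreement", "memorandum of understanding", "mou",
                  "contract", "crown lease"}
SIGNIFICANT_ASSETS = {"national memorial", "diplomatic site", "diplomatic estate",
                      "heritage value building", "lake burley griffin"}


@dataclass(frozen=True)
class DisposalClass:
    label: str       # hypothetical label, not an official class number
    retention: str


PERMANENT = DisposalClass("significant-land-and-assets",
                          "retain as national archives")
TEMPORARY = DisposalClass("other-land-and-assets",
                          "destroy 12 years after completion or termination")


def mentions(text: str, terms: set[str]) -> bool:
    """True if any term (or its simple plural) occurs as a whole word or phrase."""
    return any(re.search(rf"\b{re.escape(t)}s?\b", text, re.IGNORECASE)
               for t in terms)


def appraise(text: str) -> DisposalClass | None:
    """Test the permanent class first, because the temporary class
    excludes anything the permanent class already covers."""
    if not mentions(text, DOCUMENT_TYPES):
        return None  # not an agreement-type record; outside both rules
    if mentions(text, SIGNIFICANT_ASSETS):
        return PERMANENT
    return TEMPORARY
```

For example, `appraise("Water abstraction agreement for Lake Burley Griffin")` returns the permanent class, while an agreement that names no significant asset falls through to the 12-year class. Plain term matching like this is far cruder than a real auto-classifier, but it shows how a rule written for human readers becomes data a machine can apply.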

How can we break through with auto-classification?

Oddly enough, we already have an effective model for converting information governance requirements into a digital framework. It was called DIRKS (Designing and Implementing Recordkeeping Systems), and you’ll find remnants of the methodology in the ISO 15489 standard for records management. Over the last 15 years, consultants like myself working in Australian government have followed this methodology to develop classification schemes and retention schedules.

Whether they knew it or not, the Australian government regulators had developed an ontology containing many of the concepts and relationships that define recordkeeping rules. I developed much of my body of knowledge from following the methodology over and over again and then by developing a.k.a.® software to make the process a whole lot easier.

What’s holding us back from automating information governance?

The DIRKS methodology fell from grace in 2007 because the process mandated by the National Archives of Australia (NAA) was highly prescriptive and costly. A bit like the parent of Baby Huey, the NAA struggled to control its creation, and there was a backlash from government agencies. But in the intervening years no one has come up with an effective framework for modelling information governance requirements.

I believe that the DIRKS framework holds the keys to automation. In fact I predict that once we re-examine the DIRKS data models, automation is going to become a whole lot easier.

Read more about the practical application of auto-tagging in this Pingar whitepaper.

Conni Christensen founded Synercon in 1998 and is the designer of a.k.a.® information governance software.

She has more than twenty years’ experience in records and information management, business consulting, training and software development. For many years, Conni has worked across the globe as a highly sought trainer, speaker and presenter.

Find out about PingarBot.

Ontologies to automate records appraisal and classification


Why you need ontologies to automate records appraisal and classification

In the first of this three-part blog series, records management expert Conni Christensen provides insights from her experience with information governance and auto-classification methodology.

What is auto-classification?

Auto-classification is the process by which documents are classified (tagged with metadata) by a machine, i.e. a software tool. You could easily think that the machine can make informed decisions about classification just by reading the document. In reality, the classification depends on the knowledge that you build into the auto-classification engine. Classification that supports information governance is significantly different from classification for search. There is more complexity because there are more aspects to consider, such as:

  • users wanting to be able to capture and classify their documents so they can find, use and share them.
  • information professionals having to classify with metadata that governs access, data protection (e.g. GDPR), retention and disposal, all in accordance with contemporary standards and legislation.
  • ICT professionals wanting to be able to manage information infrastructure more effectively, and
  • the C suite wanting everyone to be more efficient at information management, spending less money on consultants and more time delivering products and services (but at the same time they don’t want to expose the business to unnecessary risks through poor governance practices).

Is it possible to achieve all this through auto-classification?

Yes it is… but we need to develop fit-for-purpose, machine-readable data models, such as ontologies, that convey the requisite knowledge into the auto-classification platform.

What are ontologies?

Ontologies are linked data models for describing a domain. They list the types of objects and their instances, the relationships that connect them, and the constraints on the ways in which objects and relationships can be combined.

The term dictionary refers to an electronic vocabulary or lexicon, as used for example in spelling checkers. If a dictionary is arranged in a subtype-supertype hierarchy of concepts (or terms), it is called a taxonomy. If it also contains other relations between the concepts, it is called an ontology.

Unlike file plans, ontologies enable us to combine multiple concepts and define multiple types of relationships. An ontology can accommodate several taxonomies within the same scheme, so you can address the needs of all stakeholders. And because ontologies are built for machine application, scale is not an issue.
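The progression from dictionary to taxonomy to ontology can be sketched in a few lines of Python. This is only an illustration: the terms, the relation names (`is_a`, `relates_to`, `governed_by`) and the triple representation are assumptions chosen for brevity, not the format used by any particular ontology tool.

```python
# Dictionary: a flat vocabulary of terms.
dictionary = {"record", "agreement", "contract", "lease", "asset", "lake"}

# Taxonomy: terms arranged in a subtype -> supertype hierarchy.
taxonomy = {
    "contract": "agreement",
    "lease": "agreement",
    "agreement": "record",
    "lake": "asset",
}

# Ontology: the hierarchy plus other typed relations between concepts,
# stored here as (subject, relation, object) triples.
ontology = {(child, "is_a", parent) for child, parent in taxonomy.items()}
ontology |= {
    ("lease", "relates_to", "asset"),
    ("agreement", "governed_by", "disposal class"),
}


def supertypes(term):
    """Walk the is_a links upwards and return every ancestor of a term."""
    parents = {o for s, r, o in ontology if s == term and r == "is_a"}
    result = set(parents)
    for parent in parents:
        result |= supertypes(parent)
    return result


def related(term, relation):
    """Objects linked to the term, or to any of its supertypes, by a relation."""
    subjects = {term} | supertypes(term)
    return {o for s, r, o in ontology if s in subjects and r == relation}
```

Because a lease is a kind of agreement, `related("lease", "governed_by")` picks up the disposal class attached to agreements in general. That kind of inherited relationship is something a flat dictionary or a single hierarchy cannot express on its own.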

Ontologies hold the key to automated appraisal

Ontologies enable the automation of records appraisal. If we extract the knowledge built into contemporary disposal authorities we can create data models which enable the auto-classifier to recognise significant concepts, then tag documents with the appropriate disposal class.

Are ontologies difficult to build?

Not in my experience. In fact, I find ontologies far easier to build than file plans because the logic is more explicit. You start by defining your data model (i.e. what metadata you want to tag with) and the relationships between your concepts. Then you harvest the concepts (terms) from your existing controls (metadata libraries, business classification schemes and disposal authorities) and sort them into logical groups. Lastly, you connect the concepts together with relationships. Often the types of relationships you build depend on the functionality available within the target system. For example, DiscoveryOne is designed to use ontologies to auto-classify and appraise content.
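The build steps described above can be walked through in a short Python sketch. It is a simplified illustration under stated assumptions: the facet names, relation names and sample terms are hypothetical, and it does not reflect how a.k.a.® or DiscoveryOne store or apply ontologies.

```python
from collections import defaultdict

# Step 1: the data model, i.e. which facets we tag with and which
# relations are allowed between them (hypothetical names).
FACETS = {"function", "document_type", "disposal_class"}
RELATIONS = {
    ("document_type", "used_in", "function"),
    ("document_type", "disposed_under", "disposal_class"),
}

# Step 2: concepts harvested from existing controls, as
# (source control, term, facet) rows. Sample data for illustration only.
harvested = [
    ("business classification scheme", "Land Management", "function"),
    ("metadata library", "Contract", "document_type"),
    ("metadata library", "Crown Lease", "document_type"),
    ("disposal authority", "Retain as national archives", "disposal_class"),
    ("disposal authority", "Contract", "document_type"),  # repeated across controls
]

# Step 3: sort the concepts into logical groups; sets drop the repeats.
groups = defaultdict(set)
for _source, term, facet in harvested:
    if facet not in FACETS:
        raise ValueError(f"unknown facet: {facet}")
    groups[facet].add(term)

# Step 4: connect concepts, accepting only relations the data model defines.
links = set()


def connect(subject, relation, obj):
    facet_of = {t: f for f, terms in groups.items() for t in terms}
    if (facet_of[subject], relation, facet_of[obj]) not in RELATIONS:
        raise ValueError(f"{relation} is not defined between these facets")
    links.add((subject, relation, obj))


connect("Crown Lease", "used_in", "Land Management")
connect("Crown Lease", "disposed_under", "Retain as national archives")
```

Checking each link against the data model is what makes the logic explicit: a relationship that the model does not define, such as a function being "used in" a document type, is rejected rather than silently stored.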

Ontology tools

It’s easier to build ontologies with purpose-built tools. We used a.k.a.® software to build the ontology and the DiscoveryOne auto-classification platform to apply the ontology and tag content in a SharePoint library.

Finding meaning in mountains of data

Many businesses find themselves drowning in the massive amounts of data produced in a digital-first world. ECMS are key to managing this information flow, but often don’t have the functionality to allow people to quickly find meaningful data.

There’s an old saying in business that ‘cash flow is your business’s lifeblood’. This may have been true in the past, but in our fast-moving digital age, information is just as crucial a tool as revenue. The problem faced by many large, complex organisations is that a proliferation of data makes it difficult to sift through and filter out what’s meaningful, what can have an impact on your business, and what will help you to make better decisions.

To help you understand this issue and how to address it, we have prepared an eBook called The search for meaning: why ‘findability’ can help maximize your ECMS investment. It outlines how to find meaning in enterprise data, and how enhancing findability can add significant value to the investment your business makes in important tools like enterprise content management systems (ECMS). The eBook covers:

Data, data everywhere… but not a byte to use?

This chapter delves into the specifics of data proliferation and the impact it’s having on businesses everywhere. Most of this data is unstructured (emails, webpages, social media posts, spreadsheets, etc.), which makes it harder to find something meaningful. That’s why your ECMS needs ‘findability’ tools beyond its native functions, so that your business can remain competitive and make better decisions.

3 reasons your ECMS may not be delivering a sufficient ROI

Like all business tools, your ECMS needs to have a good ROI. There’s no point in investing in a system that isn’t providing value. This chapter looks at how relevant results, organization of documents, and managing risk are key factors in making sure your ECMS is extracting and managing the data you need.

The findability engine and the search for meaning

Using a tool that allows businesses to automatically categorize and tag documents can assist in information retrieval tasks, such as identifying which documents are legal agreements or which documents contain information about operational risk issues.

ECMS migration and upgrades

This is the ideal time to implement better content discovery. It’s a tool that enhances the value of your ECMS investment, helps you decide what data needs to be migrated, and enables automatic tagging and metadata creation. This chapter outlines why, when choosing an ECMS, it’s essential to select one that’s a true findability engine, one that delivers results for critical business needs.

The Pingar solution

Here we look at DiscoveryOne, a multi-platform content discovery system that supports a variety of ECM systems, enhancing their capability for extracting meaningful data from the huge volumes that are available.

Although there are massive amounts of data to be waded through, your business doesn’t need to be drowning in it. With the right ECMS tools, you can find the meaning in the information mountain, which will enable better decision making and business growth.

Download the eBook
