Why you need ontologies to automate records appraisal and classification

In the first of this three-part blog series, records management expert Conni Christensen provides insights from her experience with information governance and auto-classification methodology.

What is auto-classification?

Auto-classification is the process where documents are classified (tagged with metadata) by a machine, i.e. a software tool. You could easily think that the machine can make informed decisions about classification just by reading the document. In reality, the classification depends on the knowledge that you build into the auto-classification engine. Classification that supports information governance is significantly different to classification for search. It is more complex because there are more stakeholder needs to consider, such as:

  • users wanting to be able to capture and classify their documents so they can find, use and share them,
  • information professionals having to classify with metadata that governs access, data protection (e.g. GDPR), retention and disposal, all in accordance with contemporary standards and legislation,
  • ICT professionals wanting to be able to manage information infrastructure more effectively, and
  • the C-suite wanting everyone to be more efficient at information management, spending less money on consultants and more time delivering products and services (but at the same time not wanting to expose the business to unnecessary risks through poor governance practices).

Is it possible to achieve all this through auto-classification?

Yes, it is… but we need to develop fit-for-purpose, machine-readable data models, such as ontologies, that convey the requisite knowledge to the auto-classification platform.

What are ontologies?

Ontologies are linked data models for describing a domain. They list the types of objects and their instances, the relationships that connect them, and the constraints on the ways in which objects and relationships can be combined. A dictionary is an electronic vocabulary or lexicon, as used, for example, in spelling checkers. If the terms in a dictionary are arranged in a subtype-supertype hierarchy of concepts, it is called a taxonomy. If it also contains other relations between the concepts, it is called an ontology. Unlike file plans, ontologies enable us to combine multiple concepts and define multiple types of relationships. An ontology can accommodate several taxonomies within the same scheme, so you can address the needs of all stakeholders. And because ontologies are built for machine application, scale is not an issue.
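The distinction between a dictionary, a taxonomy and an ontology can be sketched in a few lines of Python. This is only an illustration: the terms, the relation names (`is_a`, `has_retention_class`, `has_access_level`) and the values are hypothetical and are not taken from any real scheme or tool.

```python
from dataclasses import dataclass

# All terms, relation names and values are hypothetical, for illustration only.

# A dictionary: a flat vocabulary of terms.
dictionary = {"Finance", "Accounts Payable", "Invoice", "Tax Return"}

# A taxonomy: the same terms arranged as subtype -> supertype.
taxonomy = {
    "Accounts Payable": "Finance",
    "Invoice": "Accounts Payable",
    "Tax Return": "Finance",
}


@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    obj: str


# An ontology: the taxonomy plus other typed relations between concepts.
ontology = [Relation(child, "is_a", parent) for child, parent in taxonomy.items()]
ontology += [
    Relation("Invoice", "has_retention_class", "Retain 7 years"),
    Relation("Invoice", "has_access_level", "Internal"),
]


def ancestors(term):
    """Walk up the subtype-supertype hierarchy from a term."""
    chain = []
    while term in taxonomy:
        term = taxonomy[term]
        chain.append(term)
    return chain


def related(term, predicate):
    """Return the objects linked to a term by a given relation type."""
    return [r.obj for r in ontology if r.subject == term and r.predicate == predicate]
```

Here `ancestors("Invoice")` answers a taxonomy question (where does this term sit in the hierarchy?), while `related("Invoice", "has_retention_class")` answers a governance question that only the extra relations of an ontology can express.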

Ontologies hold the key to automated appraisal

Ontologies enable the automation of records appraisal. If we extract the knowledge built into contemporary disposal authorities, we can create data models that enable the auto-classifier to recognise significant concepts and then tag documents with the appropriate disposal class.
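As a rough sketch of the idea, the snippet below matches concepts harvested from a disposal authority against document text and returns the disposal classes they point to. It is a deliberately naive keyword matcher, not how any particular auto-classification product works, and the class identifiers and trigger terms are invented for the example.

```python
import re

# Hypothetical extract from a disposal authority: each disposal class lists
# the concepts (trigger terms) that signal it. Not from any real authority.
DISPOSAL_CLASSES = {
    "FIN-01": ["invoice", "purchase order"],
    "HR-02": ["personnel file", "employment contract"],
}


def appraise(text):
    """Return (disposal_class, matched_terms) pairs for concepts found in text."""
    lowered = text.lower()
    matches = []
    for disposal_class, terms in DISPOSAL_CLASSES.items():
        # Whole-phrase match so "invoice" does not fire inside another word.
        found = [
            term for term in terms
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)
        ]
        if found:
            matches.append((disposal_class, found))
    return matches
```

A call such as `appraise("Please pay the attached invoice.")` would tag the document with the hypothetical class `FIN-01`. A production engine would need far richer knowledge than trigger terms, which is exactly the knowledge an ontology is meant to carry.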

Are ontologies difficult to build?

Not in my experience. In fact, I find ontologies far easier to build than file plans because the logic is more explicit. You start by defining your data model (i.e. what metadata you want to tag with) and the relationships between your concepts. Then you harvest the concepts (terms) from your existing controls (metadata libraries, business classification schemes and disposal authorities) and sort them into logical groups. Lastly, you connect the concepts together with relationships. Often the types of relationship you build depend on the functionality available within the target system. For example, DiscoveryOne is designed to use ontologies to auto-classify and appraise content.
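The build steps described above can be sketched as a small Python script: define the data model, harvest terms from existing controls, group them, then connect them. The facet names, relationship types and terms are all assumptions made for the example, and dedicated ontology tools handle this far more thoroughly.

```python
from collections import defaultdict

# Step 1: the data model (hypothetical): which metadata facets we tag with,
# and which relationship types are allowed between them.
FACETS = {"function", "activity", "disposal_class"}
RELATION_TYPES = {
    ("activity", "part_of", "function"),
    ("activity", "disposed_under", "disposal_class"),
}

# Step 2: terms harvested from existing controls, as (source, term, facet).
harvested = [
    ("business classification scheme", "Financial Management", "function"),
    ("business classification scheme", "Accounting", "activity"),
    ("metadata library", "Accounting", "activity"),  # duplicate across sources
    ("disposal authority", "FIN-01", "disposal_class"),
]

# Step 3: sort the terms into logical groups, one per facet (sets deduplicate).
groups = defaultdict(set)
for _source, term, facet in harvested:
    if facet not in FACETS:
        raise ValueError(f"unknown facet: {facet}")
    groups[facet].add(term)


def facet_of(term):
    """Look up which group a harvested term belongs to."""
    for facet, terms in groups.items():
        if term in terms:
            return facet
    raise KeyError(term)


# Step 4: connect concepts, accepting only relationships the data model allows.
relations = []


def connect(subject, predicate, obj):
    signature = (facet_of(subject), predicate, facet_of(obj))
    if signature not in RELATION_TYPES:
        raise ValueError(f"relationship not allowed by data model: {signature}")
    relations.append((subject, predicate, obj))


connect("Accounting", "part_of", "Financial Management")
connect("Accounting", "disposed_under", "FIN-01")
```

Validating each relationship against the data model is one way to keep the logic explicit: a link such as a disposal class being "part of" an activity is rejected rather than silently stored.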

Ontology tools

It’s easier to build ontologies with purpose-built tools. We used a.k.a.® software to build the ontology and the DiscoveryOne auto-classification platform to apply the ontology and tag content in a SharePoint library.

Conni Christensen founded Synercon in 1998 and is the designer of a.k.a.® information governance software. She has more than twenty years’ experience in records and information management, business consulting, training and software development. For many years, Conni has worked across the globe as a highly sought-after trainer, speaker and presenter.
