From Manual Creation to Machine Learning

Information Management

Do you struggle with organising and categorising your business documents? With the majority of corporate data being unstructured, it can be challenging to keep track of sensitive information, contracts, and other documents. This is where automatic classification and categorisation come in.
In this post, we'll focus on taxonomy creation and administration, an essential part of document categorisation. A taxonomy is a knowledge management structure that a business uses to categorise its documents; categorisation then assigns each document to a category based on specific features in the text. There are two ways to create a taxonomy: manually, using a solution such as the Pingar Taxonomy Editor, or by mirroring an existing file structure with the Pingar Taxonomy Trainer, which enables machine learning.

Manual Taxonomy Creation

Creating a taxonomy manually requires several subject matter experts from within the organisation to develop the knowledge structure, as well as taxonomy expertise to ensure that the relationships between terms are correct. Creating a taxonomy is a much faster process than creating an ontology. However, the effectiveness of any classification scheme directly reflects the expertise of the team that created it. To improve the effectiveness of a taxonomy, companies often use machine learning tools.

Taxonomy Editor and Trainer

The Taxonomy Editor, part of the DiscoveryOne suite, allows you to create or edit taxonomies and prepare them for document categorisation. When choosing which text features to use for categorisation, the category name and any alternative labels are a good place to start. However, don't expect these to do all the work: labels are often too general and can match text in documents belonging to multiple categories. To address this, the Taxonomy Editor lets you enhance your categories using rules.
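To illustrate why labels alone can be too general, here is a minimal Python sketch. It is not the Taxonomy Editor's rule syntax or API; the category names, the labels, and the `require_any` rule field are all hypothetical, chosen only to show a generic label matching two categories and a simple rule narrowing the result to one.

```python
import re

# Hypothetical categories. "labels" are the category name variants;
# "require_any" is an illustrative rule: at least one of these words
# must also appear in the document.
CATEGORIES = {
    "Employment Contracts": {
        "labels": ["contract", "employment agreement"],
        "require_any": ["employee", "salary"],
    },
    "Supplier Contracts": {
        "labels": ["contract", "supply agreement"],
        "require_any": ["supplier", "delivery"],
    },
}

def match_by_label(text):
    """Return every category with at least one label found in the text."""
    lowered = text.lower()
    return sorted(
        name for name, cat in CATEGORIES.items()
        if any(label in lowered for label in cat["labels"])
    )

def match_by_rule(text):
    """Keep only label matches whose rule words also occur in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        name for name in match_by_label(text)
        if words & set(CATEGORIES[name]["require_any"])
    )

doc = "This contract sets out the salary and leave entitlements of the employee."
print(match_by_label(doc))  # both categories match on the label "contract"
print(match_by_rule(doc))   # the rule leaves only "Employment Contracts"
```

In this sketch the label "contract" is shared by both categories, so the rule does the real work of telling them apart.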

On the other hand, if you don't have taxonomy expertise or you need categorisation immediately, the Taxonomy Trainer is the right option. It uses machine learning to analyse sample documents and identify the most common patterns among them. Once the patterns have been identified and analysed, DiscoveryOne can use this knowledge to start categorising immediately.
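As a rough illustration of learning patterns from sample documents, the Python sketch below keeps the terms that appear in every sample of a folder and in no other folder. This is a deliberately simple stand-in, not the algorithm the Taxonomy Trainer uses; the folder names, sample texts, stopword list, and the `common_terms` function are all assumptions made for the example.

```python
import re

# Hypothetical sample documents, keyed by the folder they were filed in.
SAMPLES = {
    "Legal/Contracts": [
        "The supplier agrees to the contract terms and termination clause.",
        "This contract includes a termination clause and liability terms.",
    ],
    "Finance/Invoices": [
        "Invoice total due within thirty days, payment by bank transfer.",
        "Please remit payment for the invoice total before the due date.",
    ],
}

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "to", "and", "a", "this", "for", "by",
             "within", "before", "please"}

def terms(text):
    """Lowercase the text and return its distinct non-stopword terms."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS}

def common_terms(samples):
    """For each folder, keep terms present in all of its samples
    and absent from the samples of every other folder."""
    doc_terms = {folder: [terms(d) for d in docs]
                 for folder, docs in samples.items()}
    patterns = {}
    for folder, term_sets in doc_terms.items():
        shared = set.intersection(*term_sets)
        elsewhere = set().union(*(
            s for other, sets in doc_terms.items() if other != folder
            for s in sets
        ))
        patterns[folder] = sorted(shared - elsewhere)
    return patterns

for folder, found in common_terms(SAMPLES).items():
    print(folder, found)
```

Because the folders double as categories, this also shows in miniature how an existing file structure can seed a taxonomy.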

Artificial Intelligence in Taxonomy Trainer

DiscoveryOne uses machine learning algorithms in conjunction with a taxonomy. Each leaf node of the taxonomy has a set of rules for classification, and the machine learning algorithms automatically identify and weight the rules for each category. Machine learning in the Taxonomy Trainer is a fast and powerful way to start categorising at scale.
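The idea of weighted rules on leaf nodes can be sketched in Python as follows. This is an assumption-laden illustration rather than DiscoveryOne's implementation: the taxonomy, the terms, and the weights are invented, and a trained system would derive the weights from sample documents instead of hard-coding them.

```python
import re

# Hypothetical taxonomy. Inner nodes map names to children; leaf nodes
# hold weighted rules (term -> weight). All weights are made up.
TAXONOMY = {
    "Legal": {
        "Contracts": {"rules": {"contract": 1.0, "termination": 2.0, "clause": 1.5}},
        "Policies": {"rules": {"policy": 2.0, "compliance": 1.5}},
    },
    "Finance": {
        "Invoices": {"rules": {"invoice": 2.0, "payment": 1.0, "contract": 0.2}},
    },
}

def leaves(node, path=()):
    """Yield (path, rules) for every leaf node in the taxonomy."""
    for name, child in node.items():
        if "rules" in child:
            yield path + (name,), child["rules"]
        else:
            yield from leaves(child, path + (name,))

def classify(text):
    """Score each leaf by summing the weights of its rule terms found
    in the text; return the best leaf (or None) and all scores."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {
        "/".join(path): sum(w for term, w in rules.items() if term in words)
        for path, rules in leaves(TAXONOMY)
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores

category, scores = classify("Payment is due on the invoice issued under the contract.")
print(category)
print(scores)
```

Here the word "contract" counts towards both leaves, but its low weight under Invoices and the stronger invoice and payment terms decide the outcome.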

In conclusion, taxonomies and categorisation are vital for any business dealing with unstructured data. The creation and administration of a taxonomy require careful planning and the right tools. With Pingar's DiscoveryOne suite, you can easily create and administer your taxonomies, enhance your categories using rules, and use machine learning to start categorisation immediately.