PROVING GROUND

Automated Subject Classification

The matchmaker is a prototype framework for semi-automated (interactive) label matching.

The intended use case is matching labels extracted from text (noun phrases represented as bags of words) against the most relevant, semantically related labels of concepts in a given SKOS taxonomy. The resulting matches can be used to annotate the text with SKOS concepts, to extend a SKOS-based knowledge graph with new concepts originating in unstructured data, or both.
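
As a rough illustration of the bag-of-words matching step, the sketch below ranks concept labels by token overlap (Jaccard similarity) with an extracted noun phrase. It is not the matchmaker's own code: the function names, the concept ids, and the labels in CONCEPTS are hypothetical, and plain token overlap stands in for whatever label comparison the framework actually uses.

```python
def bag_of_words(label):
    """Lower-case a label and split it into a set of tokens."""
    return set(label.lower().split())


def jaccard(a, b):
    """Token-set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_concepts(phrase, concept_labels):
    """Return (concept_id, score) pairs sorted by descending overlap."""
    query = bag_of_words(phrase)
    scored = [
        (concept_id, jaccard(query, bag_of_words(label)))
        for concept_id, label in concept_labels.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical concept ids and preferred labels, for illustration only.
CONCEPTS = {
    "c1": "machine learning",
    "c2": "supervised learning",
    "c3": "information retrieval",
}

ranking = rank_concepts("machine learning algorithms", CONCEPTS)
```

Here the phrase shares two of three tokens with "machine learning", so that concept is ranked first. A real SKOS taxonomy would also offer alternative labels per concept, which could be scored the same way.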

The framework continuously proposes likely matches (using fuzzy matching, semantic distance, and taxonomy neighbourhood information) and learns from the user's feedback, improving the quality and efficiency of the mapping process over time.
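
One way to picture this propose-and-learn loop is a small online model that combines the three signals into a single match score and adjusts its weights whenever the user accepts or rejects a proposal. The sketch below is an assumption-laden illustration, not the framework's implementation (which relies on WEKA): the FeedbackScorer class, its initial weights, and the learning rate are all hypothetical, the fuzzy score uses Python's standard difflib, and the semantic and neighbourhood scores are assumed to arrive as precomputed values in [0, 1].

```python
import math
from difflib import SequenceMatcher


def fuzzy_score(a, b):
    """Character-level similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


class FeedbackScorer:
    """Hypothetical logistic model over three signals, updated online."""

    def __init__(self, learning_rate=0.5):
        # Weight order: fuzzy match, semantic similarity, taxonomy neighbourhood.
        self.weights = [1.0, 1.0, 1.0]
        self.bias = -1.5
        self.learning_rate = learning_rate

    def score(self, features):
        """Probability-like match score in (0, 1)."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, accepted):
        """Take one gradient step on log loss for an accept/reject decision."""
        error = (1.0 if accepted else 0.0) - self.score(features)
        self.weights = [
            w + self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias += self.learning_rate * error


scorer = FeedbackScorer()
# Semantic (0.8) and neighbourhood (0.5) values are made-up placeholders.
features = [fuzzy_score("neural network", "neural networks"), 0.8, 0.5]
before = scorer.score(features)
scorer.update(features, accepted=True)
after = scorer.score(features)
```

Accepting a proposal raises the score that similar feature combinations receive next time, and rejecting one lowers it, which is the sense in which proposals improve with feedback.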

See results

You can see some samples of the generated mappings by following the "ccs" links included under some of my publications.

Tools and data

Key components involved:

  • WordNet (RDF) vocabulary and WordNet-based semantic similarity service,
  • WEKA machine learning library.

Example data includes:

Additional references

  • More info is available on GitHub.