Automated Subject Classification
The matchmaker is a prototype framework for semi-automated (interactive) label matching.
The intended use case is matching labels extracted from text (noun phrases represented as bags of words) against the labels of the most relevant, semantically related concepts in a given SKOS taxonomy. The resulting matches can then be used to annotate the text with SKOS concepts and/or to extend the SKOS-based knowledge graph with new concepts originating in unstructured data.
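The matching step can be sketched as follows. This is a minimal illustration, not the framework's actual matcher: it assumes each noun phrase and each SKOS label (prefLabel or altLabel) is reduced to a bag of lower-cased words, and it uses a "soft" Jaccard overlap in which near-identical tokens (such as "network" and "networks") count as shared, scored with Python's standard `difflib.SequenceMatcher`. The function names, the 0.8 threshold, and the `ex:` concept identifiers are hypothetical.

```python
from difflib import SequenceMatcher


def bag_of_words(label: str) -> set[str]:
    """Lower-case a label and split it into a set of word tokens."""
    return set(label.lower().replace("-", " ").split())


def token_similarity(a: str, b: str) -> float:
    """Character-level similarity of two tokens in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a, b).ratio()


def soft_jaccard(phrase: set[str], label: set[str], threshold: float = 0.8) -> float:
    """Jaccard overlap where near-identical tokens (e.g. plural forms) count as shared.

    Each phrase token is paired greedily with at most one unused label token,
    so the score always stays within [0, 1].
    """
    if not phrase or not label:
        return 0.0
    unused = sorted(label)
    matched = 0
    for token in sorted(phrase):
        best = max(unused, key=lambda cand: token_similarity(token, cand), default=None)
        if best is not None and token_similarity(token, best) >= threshold:
            unused.remove(best)
            matched += 1
    return matched / (len(phrase) + len(label) - matched)


def rank_concepts(phrase: str, concepts: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Rank concepts by the best soft-Jaccard score over all of their labels."""
    words = bag_of_words(phrase)
    scored = [
        (uri, max((soft_jaccard(words, bag_of_words(lbl)) for lbl in labels), default=0.0))
        for uri, labels in concepts.items()
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical concept IDs and labels, for illustration only.
concepts = {
    "ex:NeuralNetworks": ["Neural networks", "Artificial neural network"],
    "ex:NetworkSecurity": ["Network security"],
    "ex:Databases": ["Database management systems"],
}
ranking = rank_concepts("neural network", concepts)
print(ranking)  # ex:NeuralNetworks first (score 1.0), ex:Databases last (score 0.0)
```

Greedy one-to-one pairing is a simplification; an interactive matcher would typically keep the top few candidates rather than a single best label so the user has alternatives to choose from.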
The framework continuously proposes likely matches (using fuzzy matching, semantic distance, and taxonomy neighbourhood information) and learns from the user's feedback, improving the quality and efficiency of the mapping process over time.
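One way to picture the feedback loop is an online logistic-regression ranker: each candidate match is described by its fuzzy, semantic, and neighbourhood scores, and every accept or reject from the user nudges the feature weights by one stochastic-gradient step. This is a plain-Python stand-in written under that assumption; it does not use or mirror WEKA's API, and the class name, feature names, learning rate, and candidate IDs are all hypothetical.

```python
import math
from dataclasses import dataclass, field

# Hypothetical feature names, one per evidence source mentioned above.
FEATURES = ("fuzzy", "semantic", "neighbourhood")


@dataclass
class FeedbackRanker:
    """Online logistic regression over match features, updated from user feedback."""

    lr: float = 0.5
    weights: dict[str, float] = field(default_factory=lambda: {f: 1.0 for f in FEATURES})
    bias: float = 0.0

    def score(self, feats: dict[str, float]) -> float:
        """Estimated probability that the user will accept this candidate."""
        z = self.bias + sum(self.weights[f] * feats.get(f, 0.0) for f in FEATURES)
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, feats: dict[str, float], accepted: bool) -> None:
        """One SGD step on the log-loss for a single accept/reject decision."""
        error = (1.0 if accepted else 0.0) - self.score(feats)
        for f in FEATURES:
            self.weights[f] += self.lr * error * feats.get(f, 0.0)
        self.bias += self.lr * error

    def propose(self, candidates: dict[str, dict[str, float]], k: int = 3) -> list[str]:
        """Return the k highest-scoring candidate IDs (ties broken by ID)."""
        ranked = sorted(candidates, key=lambda c: (-self.score(candidates[c]), c))
        return ranked[:k]


ranker = FeedbackRanker()
candidates = {  # hypothetical feature scores for two candidate concepts
    "ex:ConceptA": {"fuzzy": 0.9, "semantic": 0.1, "neighbourhood": 0.0},
    "ex:ConceptB": {"fuzzy": 0.4, "semantic": 0.8, "neighbourhood": 0.0},
}
before = ranker.propose(candidates)  # B ranks first while all weights are equal
for _ in range(20):  # the user keeps accepting A and rejecting B
    ranker.update(candidates["ex:ConceptA"], accepted=True)
    ranker.update(candidates["ex:ConceptB"], accepted=False)
after = ranker.propose(candidates)  # A now ranks first
print(before, after)
```

With this feedback pattern the fuzzy weight grows relative to the semantic weight, so candidates whose evidence resembles the accepted ones move up the proposal list.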
You can see some samples of the generated mappings by following the "
Tools and data
Key components involved:
- the WordNet (RDF) vocabulary and a WordNet-based semantic similarity service,
- the WEKA machine learning library.
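A common WordNet-based semantic similarity measure is path similarity, defined as 1 / (1 + length of the shortest is-a path between two senses); NLTK's WordNet interface offers a comparable `path_similarity` method on synsets. The sketch below computes it with a breadth-first search over a tiny, hypothetical hypernym hierarchy rather than real WordNet data, and it is not necessarily the measure the framework's similarity service uses.

```python
from collections import deque

# Hypothetical, heavily simplified hypernym links (child -> parent); not real WordNet data.
HYPERNYMS = {
    "laptop": "computer",
    "workstation": "computer",
    "computer": "machine",
    "printer": "machine",
    "machine": "device",
}


def _undirected(hypernyms: dict[str, str]) -> dict[str, set[str]]:
    """Turn child -> parent links into an undirected adjacency map."""
    adj: dict[str, set[str]] = {}
    for child, parent in hypernyms.items():
        adj.setdefault(child, set()).add(parent)
        adj.setdefault(parent, set()).add(child)
    return adj


def path_similarity(a: str, b: str, hypernyms: dict[str, str] = HYPERNYMS) -> float:
    """1 / (1 + shortest is-a path length between a and b); 0.0 if not connected."""
    if a == b:
        return 1.0
    adj = _undirected(hypernyms)
    if a not in adj or b not in adj:
        return 0.0
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj[node]:
            if nxt == b:
                # b is reached in dist + 1 steps.
                return 1.0 / (dist + 2)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0


print(path_similarity("laptop", "workstation"))  # 1/3: two steps via "computer"
```

A score like this could serve as the "semantic distance" feature for a candidate match, alongside the fuzzy string score.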
Example data includes:
- noun phrases extracted from a sample of DBLP data using the GATE NLP library,
- the ACM Computing Classification System SKOS taxonomy.
More information is available on GitHub.
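Taxonomy neighbourhood information can be derived from a SKOS taxonomy such as the ACM Computing Classification System via `skos:broader` links. The sketch below assumes one plausible signal: a candidate concept scores higher when its parent, children, or siblings have already been accepted as matches for the same text. The fragment is a simplified illustration with hypothetical `ex:` identifiers (not real ACM CCS URIs), and the scoring rule is an assumption, not the framework's documented method.

```python
# Simplified, illustrative taxonomy fragment (concept -> skos:broader parent).
# Identifiers are hypothetical, not real ACM CCS URIs.
BROADER = {
    "ex:DataManagementSystems": "ex:InformationSystems",
    "ex:InformationRetrieval": "ex:InformationSystems",
    "ex:WorldWideWeb": "ex:InformationSystems",
    "ex:DatabaseDesign": "ex:DataManagementSystems",
    "ex:QueryLanguages": "ex:DataManagementSystems",
}


def neighbourhood(concept: str, broader: dict[str, str] = BROADER) -> set[str]:
    """Parent, children and siblings of a concept in the broader/narrower tree."""
    parent = broader.get(concept)
    narrower = {c for c, p in broader.items() if p == concept}
    siblings = {c for c, p in broader.items() if parent is not None and p == parent and c != concept}
    result = narrower | siblings
    if parent is not None:
        result.add(parent)
    return result


def neighbourhood_score(concept: str, accepted: set[str], broader: dict[str, str] = BROADER) -> float:
    """Fraction of a concept's neighbours already accepted as matches for the same text."""
    hood = neighbourhood(concept, broader)
    return len(hood & accepted) / len(hood) if hood else 0.0


accepted = {"ex:QueryLanguages"}  # hypothetical earlier user decision
print(neighbourhood_score("ex:DatabaseDesign", accepted))  # 0.5: sibling already accepted
```

Because it depends on earlier accepts, a signal like this grows more informative as the user works through a document.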