[P] entity resolution system for large-scale databases

Hello everyone,

I’d like to share some insights about a Wikimedia Foundation project I’ve been contributing to.

soweego is an entity resolution system that links the Wikidata knowledge base to large external databases through a set of supervised algorithms: https://soweego.readthedocs.io/

Specifically, we used Bernoulli naïve Bayes, linear support vector machines, single-layer perceptrons, and multi-layer perceptrons. Interestingly, the single-layer perceptron models work best on our input datasets, namely Discogs, IMDb, and MusicBrainz.
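To give a feel for what a single-layer perceptron does in an entity resolution setting, here is a minimal pure-Python sketch. It is not soweego's code: the features (name token overlap, matching birth year), the sigmoid output, the hyperparameters, and the toy records with invented names are all assumptions made for illustration.

```python
import math

def name_similarity(a, b):
    """Jaccard overlap between the lowercase tokens of two names."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def pair_features(wikidata_item, catalog_record):
    """Turn a candidate pair into a small numeric feature vector (hypothetical features)."""
    same_year = (
        wikidata_item["born"] is not None
        and wikidata_item["born"] == catalog_record["born"]
    )
    return [
        name_similarity(wikidata_item["name"], catalog_record["name"]),
        1.0 if same_year else 0.0,
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability that the pair refers to the same entity."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(samples, labels, epochs=500, learning_rate=0.5):
    """Fit one layer of weights with stochastic gradient descent on log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict(weights, bias, x) - y
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

# Invented training pairs: (Wikidata-side record, catalog-side record, is_match).
training_pairs = [
    ({"name": "Anna Rossi", "born": 1950}, {"name": "Anna Rossi", "born": 1950}, 1),
    ({"name": "Anna Rossi", "born": 1950}, {"name": "Anna Bianchi", "born": 1982}, 0),
    ({"name": "Marco Verdi", "born": 1971}, {"name": "Marco Verdi", "born": None}, 1),
    ({"name": "Marco Verdi", "born": 1971}, {"name": "Luca Neri", "born": 1971}, 0),
]
samples = [pair_features(wd, ext) for wd, ext, _ in training_pairs]
labels = [label for _, _, label in training_pairs]
weights, bias = train(samples, labels)

# Score an unseen candidate pair.
candidate = pair_features(
    {"name": "Sara Conti", "born": 1964}, {"name": "Sara Conti", "born": 1964}
)
print(round(predict(weights, bias, candidate), 3))
```

A model this small is easy to inspect: each weight says how much one feature contributes to the match probability.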

soweego partners with Mix’n’match, which mainly deals with small catalogs. soweego is currently uploading 255 k confident identifiers to Wikidata (see its activity there), while 126 k medium-confident links go to Mix’n’match for curation.
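The split between direct uploads and curated links can be pictured as a simple routing step on the classifier's confidence score. The sketch below is only an illustration: the two thresholds, the `route_links` helper, and the placeholder identifiers are hypothetical, not values or functions taken from soweego.

```python
# Hypothetical cut-offs, chosen only for this example.
UPLOAD_THRESHOLD = 0.8
CURATION_THRESHOLD = 0.5

def route_links(scored_links, upload=UPLOAD_THRESHOLD, curation=CURATION_THRESHOLD):
    """Split (qid, external_id, score) triples into confident and medium-confident links.

    Links scoring below the curation threshold are dropped.
    """
    to_wikidata, to_curation = [], []
    for qid, external_id, score in scored_links:
        if score >= upload:
            to_wikidata.append((qid, external_id))
        elif score >= curation:
            to_curation.append((qid, external_id))
    return to_wikidata, to_curation

# Placeholder identifiers, not real Wikidata items or catalog entries.
example_links = [
    ("Q_PLACEHOLDER_1", "ext-001", 0.97),
    ("Q_PLACEHOLDER_2", "ext-002", 0.64),
    ("Q_PLACEHOLDER_3", "ext-003", 0.21),
]
confident, medium = route_links(example_links)
print(confident)
print(medium)
```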

The soweego team has also worked hard to address the following community requests:

  1. sync Wikidata with external databases and cross-check the two to spot inconsistencies in Wikidata;
  2. import new databases with reasonable effort.
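As a rough illustration of the first request, one possible consistency check is to flag identifiers stored in Wikidata that no longer exist in the external database. This is a sketch under that assumption only: the `find_missing_identifiers` helper and the sample data are hypothetical, and soweego's actual checks may work differently.

```python
from typing import Dict, Set

def find_missing_identifiers(
    wikidata_links: Dict[str, str], catalog_ids: Set[str]
) -> Dict[str, str]:
    """Return the Wikidata links whose external identifier is absent from the catalog."""
    return {
        qid: external_id
        for qid, external_id in wikidata_links.items()
        if external_id not in catalog_ids
    }

# Placeholder data: item -> external identifier currently stored in Wikidata.
wikidata_links = {
    "Q_PLACEHOLDER_1": "ext-001",
    "Q_PLACEHOLDER_2": "ext-002",
    "Q_PLACEHOLDER_3": "ext-999",
}
# Identifiers found in a fresh dump of the external database.
catalog_ids = {"ext-001", "ext-002", "ext-003"}

print(find_missing_identifiers(wikidata_links, catalog_ids))
```

Flagged entries would then be candidates for review rather than automatic removal.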

If you like the project, please consider starring it on GitHub: https://github.com/Wikidata/soweego

submitted by /u/tupini07
