[P] entity resolution system for large-scale databases

Hello everyone,

I’d like to share some insights about a Wikimedia Foundation project I’ve been contributing to.

soweego is an entity resolution system that links the Wikidata knowledge base to large external databases using a set of supervised machine learning algorithms: https://soweego.readthedocs.io/
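Supervised entity resolution typically starts by turning each candidate pair (a Wikidata item plus an external record) into a numeric feature vector that a classifier can score. The sketch below is illustrative only, not soweego's code: the record fields `name` and `birth_year` and the use of `difflib.SequenceMatcher` as the string similarity are assumptions.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] via difflib's ratio()."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_features(wikidata_item: dict, external_record: dict) -> list[float]:
    """Turn one candidate pair into a numeric feature vector.

    Hypothetical schema: both records carry a 'name' string and an
    optional 'birth_year' integer.
    """
    name_sim = name_similarity(wikidata_item["name"], external_record["name"])
    wd_year = wikidata_item.get("birth_year")
    ext_year = external_record.get("birth_year")
    # 1.0 only when both years are known and equal, 0.0 otherwise
    same_year = 1.0 if wd_year is not None and wd_year == ext_year else 0.0
    return [name_sim, same_year]


features = pair_features(
    {"name": "Björk", "birth_year": 1965},
    {"name": "Bjork", "birth_year": 1965},
)
print(features)
```

Real systems usually compute many more features (dates, aliases, URLs), but each pair still ends up as one fixed-length vector.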

Specifically, we experimented with Bernoulli Naïve Bayes, Linear Support Vector Machines, Single-layer Perceptrons, and Multi-layer Perceptrons. Interestingly, the models based on Single-layer Perceptrons performed best on our input datasets: Discogs, IMDb, and MusicBrainz.
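For readers unfamiliar with the best-performing model family, here is a minimal single-layer perceptron trained with the classic perceptron learning rule on two-dimensional pair features (a name similarity score and a same-birth-year flag). It is a from-scratch sketch, not soweego's implementation; the learning rate, epoch cap, and toy data are all made up.

```python
def predict(weights, bias, x):
    """Step activation: 1 (match) if the weighted sum is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(X, y, epochs=20, lr=0.1):
    """Classic perceptron learning rule on 0/1 labels.

    Stops early once an epoch passes without any misclassification.
    """
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in zip(X, y):
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            if error:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias


# Hypothetical toy data: [name_similarity, same_birth_year] per candidate pair,
# labelled 1 for a true match and 0 for a non-match.
X = [[0.95, 1.0], [0.90, 1.0], [0.85, 0.0], [0.30, 1.0], [0.20, 0.0], [0.40, 0.0]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = train_perceptron(X, y)
print([predict(weights, bias, x) for x in X])
```

Because this toy data is linearly separable, the rule is guaranteed to stop making mistakes after finitely many updates; on noisy real-world pairs it need not converge, which is why the epoch cap is kept.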

soweego partners with Mix’n’match, which mainly deals with small catalogs. soweego is currently uploading 255k high-confidence identifiers to Wikidata (see its activity), while 126k medium-confidence links are going into Mix’n’match for human curation instead.
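The split between direct uploads and curation can be pictured as thresholding each link's classifier confidence. This is a minimal sketch: the cut-off values are hypothetical, since the post does not state them, and discarding links below the medium threshold is also an assumption.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: the post does not state soweego's real thresholds.
HIGH_CONFIDENCE = 0.8
MEDIUM_CONFIDENCE = 0.5


@dataclass
class Link:
    wikidata_qid: str    # e.g. "Q123" (made-up identifier)
    external_id: str     # identifier in the external catalog
    confidence: float    # classifier score in [0, 1]


def route_links(links):
    """Split scored links into (upload, curation, discarded) buckets.

    Dropping links below MEDIUM_CONFIDENCE is an assumption for this sketch.
    """
    to_wikidata, to_mixnmatch, discarded = [], [], []
    for link in links:
        if link.confidence >= HIGH_CONFIDENCE:
            to_wikidata.append(link)
        elif link.confidence >= MEDIUM_CONFIDENCE:
            to_mixnmatch.append(link)
        else:
            discarded.append(link)
    return to_wikidata, to_mixnmatch, discarded


upload, curate, dropped = route_links([
    Link("Q123", "ext-1", 0.95),
    Link("Q456", "ext-2", 0.62),
    Link("Q789", "ext-3", 0.10),
])
print(len(upload), len(curate), len(dropped))
```

Routing the uncertain middle band to human curators keeps automated edits to the cases the model is most sure about.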

The soweego team has also worked hard to address the following community requests:

  1. synchronize Wikidata with the external databases and check the two against each other to spot inconsistencies in Wikidata;
  2. make it possible to import new databases with reasonable effort.
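The first request amounts to comparing, field by field, what Wikidata states about an item with what the linked external record says. A minimal sketch, assuming both sides have already been fetched into plain dictionaries with shared, hypothetical field names; soweego's actual checks may work differently.

```python
def find_inconsistencies(wikidata_values: dict, external_values: dict) -> dict:
    """Return {field: (wikidata_value, external_value)} for conflicting fields.

    Only fields present in both sources are compared, and a missing (None)
    value on either side is not treated as a conflict.
    """
    conflicts = {}
    for field in sorted(wikidata_values.keys() & external_values.keys()):
        wd_value, ext_value = wikidata_values[field], external_values[field]
        if wd_value is not None and ext_value is not None and wd_value != ext_value:
            conflicts[field] = (wd_value, ext_value)
    return conflicts


# Hypothetical values for one linked item (the birth_year conflict is made up)
wikidata_side = {"name": "Björk", "birth_year": 1965, "country": None}
external_side = {"name": "Björk", "birth_year": 1966, "country": "Iceland", "genre": "pop"}
print(find_inconsistencies(wikidata_side, external_side))
```

Fields that only one side knows about (such as `country` here) are candidates for enrichment rather than conflicts, so the sketch leaves them out of the report.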

If you like the project, please consider starring it on GitHub: https://github.com/Wikidata/soweego

submitted by /u/tupini07