Matching Applicants
How could the same business applicant be identified across multiple datasets, and over time? How could we do this in new, or interesting ways?
Go to Challenge | 8 teams have entered this challenge.
Machine learning based entity resolution to the rescue
Databases somehow always end up with duplicate entries, but we can address that using machine learning based entity resolution (a.k.a. record linkage, fuzzy matching, etc.).
Entity resolution typically requires:
1) Deduplication (removal of exact copies of records)
2) Record Linkage (identifying records that may reference the same business)
3) Canonicalization (ensuring data with more than one representation are in a standardised form)
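The three steps above can be illustrated with a minimal Python sketch. It is not what csvdedupe does internally: it swaps the machine learning model for a plain string-similarity threshold from the standard library, and the function names, sample records and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def deduplicate(records):
    """Step 1: drop exact copies, keeping the first occurrence of each record."""
    return list(dict.fromkeys(records))


def link_records(records, threshold=0.85):
    """Step 2: group records whose normalised names are similar enough.

    Each record is compared with the first member of every existing cluster;
    the threshold is an illustrative assumption, not a tuned value.
    """
    clusters = []
    for record in records:
        key = normalise(record)
        for cluster in clusters:
            if SequenceMatcher(None, key, normalise(cluster[0])).ratio() >= threshold:
                cluster.append(record)
                break
        else:
            clusters.append([record])
    return clusters


def canonical_forms(clusters):
    """Step 3: pick one representative per cluster (here simply the first record)."""
    return [cluster[0] for cluster in clusters]


# Hypothetical sample data, not taken from the challenge dataset.
records = ["Acme Pty Ltd", "Acme Pty Ltd", "ACME Pty. Ltd.", "Bolt Engineering"]
unique_records = deduplicate(records)
clusters = link_records(unique_records)
businesses = canonical_forms(clusters)
```

A fixed threshold is the simplest possible choice; a trained model such as the one csvdedupe builds from labelled examples can weigh several fields at once instead of a single name.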
Only steps 1 and 2 were addressed during this challenge. Out of 47404 records, 1920 unique businesses were identified using csvdedupe (https://github.com/dedupeio/csvdedupe).
Perhaps the same approach could even be used during form filling and validation to prevent further duplicates from being entered.
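As a rough sketch of that idea, a form could check a newly typed business name against the names already on file and ask the applicant to confirm before creating a new record. The function name, sample names and the 0.8 cutoff below are hypothetical, and the matching uses the standard library's difflib rather than a trained model.

```python
from difflib import get_close_matches


def possible_duplicates(new_name, existing_names, cutoff=0.8):
    """Return existing names that closely resemble the name being entered."""
    # Map a lowercased, trimmed key back to the name as originally stored.
    normalised = {}
    for name in existing_names:
        normalised.setdefault(name.strip().lower(), name)
    matches = get_close_matches(
        new_name.strip().lower(), list(normalised), n=3, cutoff=cutoff
    )
    return [normalised[match] for match in matches]


# Hypothetical names already in the database.
existing = ["ACME Pty Ltd", "Bolt Engineering"]
suggestions = possible_duplicates("Acme Pty Ltd.", existing)
no_suggestions = possible_duplicates("Zenith Foods", existing)
```

If the returned list is not empty, the form could show the suggestions and let the applicant pick an existing business instead of adding a new one.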
NB. Step 1 was done in Excel and step 2 with csvdedupe, which is simply a CLI program, so the only evidence of work is the training data generated by the program.