
Outdated Benchmarks in Entity Linking? Why We Should Update Them
Entity Linking (EL) is a core task in natural language processing. It involves connecting terms or names in a text, so-called mentions, to the correct entries in a knowledge base such as Wikipedia or Wikidata. While the idea is simple, the execution is often challenging: a mention like “Apple” can refer to either the tech company or the fruit, depending on context.
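To make this concrete, a single gold annotation in an EL benchmark can be pictured as a mention span paired with a knowledge-base identifier. The field names below are illustrative, not taken from any specific benchmark:

```python
from dataclasses import dataclass

@dataclass
class Mention:
    text: str         # surface form as it appears in the document
    start: int        # character offset where the mention begins
    end: int          # character offset where the mention ends
    gold_entity: str  # knowledge-base identifier, e.g. a Wikipedia title

doc = "Apple unveiled a new iPhone on Tuesday."
# The gold annotation resolves the ambiguous surface form "Apple"
# to the tech company, not the fruit.
mention = Mention(text="Apple", start=0, end=5, gold_entity="Apple_Inc.")
```

An EL model is then scored by whether its predicted identifier matches `gold_entity` for each mention.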
To develop and compare EL models, researchers typically rely on standardized benchmark datasets such as AIDA-CoNLL or TAC KBP. These datasets contain pre-annotated entity links and provide a consistent framework for fair evaluation. However, there is a significant issue that is often overlooked: many of these benchmarks are outdated. The underlying Wikipedia articles may have been renamed, merged, or deleted, which means models are being trained and tested on references that no longer exist in their original form.
This is not just a minor inconvenience; it can seriously distort evaluation results. A model that correctly links a mention to the current version of an entity may still be marked as incorrect simply because the benchmark is stuck on an outdated label.
In my academic project, I am developing a tool that addresses this issue. It automatically detects when an entity reference in a benchmark is outdated and updates the link to a valid, current version, either as of today or as it existed at a specific historical timestamp. This enables fair and up-to-date evaluation of EL models.
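The core of such an update step can be sketched as follows. This is a toy illustration, not the project's actual code: the redirect table and the `update_link` helper are hypothetical stand-ins for data a real tool would obtain from the MediaWiki API or a Wikipedia/Wikidata dump at the chosen timestamp.

```python
from typing import Optional

# Hypothetical stand-in for redirect data fetched from Wikipedia:
# maps an outdated title to the title it now redirects to.
REDIRECTS = {"Apple_Computer": "Apple_Inc."}
# Titles that were deleted and have no current target.
DELETED = {"Some_Deleted_Page"}

def update_link(title: str) -> Optional[str]:
    """Follow redirect chains to a current title; return None if the
    entity no longer exists in the knowledge base."""
    seen = set()
    while title in REDIRECTS:
        if title in seen:  # guard against redirect cycles
            return None
        seen.add(title)
        title = REDIRECTS[title]
    return None if title in DELETED else title
```

For example, `update_link("Apple_Computer")` returns `"Apple_Inc."`, while a deleted title yields `None` and would be flagged for manual review rather than silently counted as a model error.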