Publication
In this paper, we delve into the crucial role of constraints in maintaining data integrity in knowledge graphs, with a specific focus on Wikidata, one of the most extensive collaboratively maintained open data knowledge graphs on the Web.
The World Wide Web Consortium (W3C) recommends the Shapes Constraint Language (SHACL) as the constraint language for validating knowledge graphs; SHACL comes in two levels of expressivity, SHACL-Core and SHACL-SPARQL. Despite the availability of SHACL, Wikidata currently represents its property constraints through its own RDF data model, which relies on Wikidata’s specific reification mechanism based on authoritative namespaces, and on natural language definitions that are partially ambiguous.
In the present paper, we investigate whether and how the semantics of Wikidata property constraints can be formalized using SHACL-Core, SHACL-SPARQL, or directly as SPARQL queries.
While the expressivity of SHACL-Core turns out to be insufficient for expressing all Wikidata property constraint types, we present SPARQL queries to identify violations for all 32 current Wikidata constraint types.
We compare the semantics of this unambiguous SPARQL formalization with Wikidata’s violation reporting system and discuss the limitations of evaluating it via Wikidata’s public SPARQL query endpoint, which stem from the endpoint’s current scalability limits.
Our study, on the one hand, sheds light on the unique characteristics of constraints defined by the Wikidata community, in order to improve the quality and accuracy of data in this collaborative knowledge graph. On the other hand, as a “byproduct”, our formalization extends existing benchmarks for both SHACL and SPARQL with a challenging, large-scale real-world use case.
N. Ferranti, J.F. De Souza, S. Ahmetaj, A. Polleres, Formalizing and validating Wikidata’s property constraints using SHACL and SPARQL, Semantic Web (2024), DOI 10.3233/SW-243611.
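To give a rough idea of what a violation-detecting SPARQL query for one constraint type might look like, the sketch below builds a query for a simplified reading of the single-value constraint: an item violates it if it carries two statements for the same property with different values. This is not the formalization from the paper. It ignores separators, exceptions, and statement ranks, and the helper name `single_value_violation_query` is hypothetical. The code only constructs the query text and does not contact any endpoint.

```python
import re

# Standard Wikidata namespaces. The public query service predeclares them,
# but spelling them out keeps the query portable to other SPARQL engines.
PREFIXES = """\
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
"""


def single_value_violation_query(property_id: str, limit: int = 10) -> str:
    """Return SPARQL text listing items with more than one distinct value.

    Hypothetical helper for illustration only: a simplified single-value
    check, not the query published by the paper's authors.
    """
    # Accept only well-formed property IDs such as "P569".
    if not re.fullmatch(r"P[1-9][0-9]*", property_id):
        raise ValueError(f"not a Wikidata property ID: {property_id!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")

    # p: links an item to its statement nodes (Wikidata's reification);
    # ps: links a statement node to its simple value.
    body = f"""SELECT DISTINCT ?item WHERE {{
  ?item p:{property_id} ?s1, ?s2 .
  ?s1 ps:{property_id} ?v1 .
  ?s2 ps:{property_id} ?v2 .
  FILTER(?s1 != ?s2 && ?v1 != ?v2)
}}
LIMIT {limit}
"""
    return PREFIXES + body


if __name__ == "__main__":
    # P569 is "date of birth" on Wikidata.
    print(single_value_violation_query("P569", limit=5))
```

The `LIMIT` clause is there because, as the abstract notes, evaluation against the public endpoint is constrained by its scalability; an unbounded query over a heavily used property may time out.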