Reuse of Fact Checks
On this page
- Identifying Claims
- An important caveat
- The authors and sources of a Claim
- Multiple people making a claim
- Aside: Authors in social media sightings
- Repeated claims by the same person
- The same claim made by different people
- A ClaimReview with multiple claims by different people
- How can related Claims be linked to one another?
- Additional linking between Claim, appearances and ClaimReview for API responses
- Google’s notes to fact checkers
- Recommendations
Producing a fact check takes time and effort. Full Fact are naturally keen to make the most of this work. One way to do that is to find repeated instances of the same Claim and then associate them with existing fact checks. This increases coverage of the content checked, without the need to redo the fact checking.
Full Fact already have a process in place to support this, by finding additional occurrences of a claim in the media. But the team are looking for guidance on the correct way to enrich the ClaimReview data with this information.
So, how can existing fact checks be linked to claims that are repeatedly made at different times and places, by the same or different people or organisations?
These notes consider this question from the following angles:
- Repeated claims by the same person
- Claims made by multiple people
- The same claim being made by different people
While the examples focus on people making claims, the conclusions and recommendations equally apply to organisations.
Also included are some brief notes on:
- how similar claims may be related to one another, and how this might be exposed via an API
- the need for reciprocal relationships between Claim, appearances and ClaimReview for API results
Identifying Claims
Within the Full Fact profile of Claim Review, a Claim doesn’t currently have an identifier.
In line with my general recommendation to add an @id property to assign URIs and link resources to API endpoints, I suggest that every Claim should have a unique identifier, added using an @id property and perhaps also an identifier property.
While claims are currently presented in the context of a ClaimReview (and the article within which they are embedded), from a data model perspective, a Claim is a key resource. So they would benefit from being uniquely identified to support discovery, linking, etc.
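As a minimal sketch of what assigning identifiers could look like when generating the data, the following Python adds an @id and identifier to a Claim that lacks them. The base URI and the use of random UUIDs are assumptions for illustration only; they are not part of the existing Full Fact data or any real endpoint.

```python
import uuid

# Hypothetical base URI for Claim resources (an assumption, not a real endpoint).
CLAIM_BASE = "https://api.example.org/claims/"


def with_identifier(claim: dict, base: str = CLAIM_BASE) -> dict:
    """Return a copy of a Claim with @id and identifier set, if missing."""
    if "@id" in claim:
        # Never re-mint: an identifier must stay stable once published.
        return dict(claim)
    local_id = str(uuid.uuid4())
    return {**claim, "@id": base + local_id, "identifier": local_id}


claim = with_identifier(
    {"@type": "Claim", "description": "Voter fraud is a significant problem"}
)
```

Whatever scheme is chosen, the important property is stability: the same Claim should keep the same @id across revisions of a fact check.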
An important caveat
The Schema.org definition of Claim is clearly a work in progress.
The description of Claim mentions a number of ways in which it might be used, for example:
- expressing the content of a claim using the text property
- linking “well-known” claims together using sameAs, and
- summarising claims by adding a name.
The under-discussion MediaReview proposal also includes an example of adding a description rather than name or text to a Claim.
But the documentation doesn’t go into any real detail about how any of this might be used in practice.
There is clearly more work to be done to flesh out the design. And this might invalidate some of the recommendations and thinking that follows.
As Dan Brickley notes here, the Schema.org team are prioritising attention on issues that relate to actual applications and data. This is backed up by my own experiences of engaging with the project. This creates an opportunity for Full Fact to take the lead by committing to specific patterns of data publication through use.
Obviously, there is also the perennial juggling act between how Google wants to consume Schema.org data in its applications, and how that model might be used in other contexts (e.g. for other fact checking use cases, or as the basis for building a Full Fact API).
The authors and sources of a Claim
The Google ClaimReview guidance notes that the publishers of ClaimReview data:
“…must clearly attribute the specific claim that you’re assessing to a distinct origin (separate from your website), whether it’s another website, public statement, social media, or other traceable source.”
In terms of the data model, a Claim is associated with a “distinct origin” using the appearance and firstAppearance properties.
So it’s a little surprising that these properties are only RECOMMENDED rather than REQUIRED in Google’s profile of the data, and also that, as noted previously, the Rich Results Tester does not warn if these properties are missing.
While the appearance properties are used to link to an article, speech, video, or other CreativeWork within which a Claim is made or recorded, they don’t help us to specify the person or organisation who is making that claim.
This is because the person (or organisation) making a claim may not always be the author of the work in which it appears.
It’s useful (and arguably, important) to distinguish between the author of a Claim and the author of the CreativeWork that is used as evidence of that claim.
For example, a blog post written by Dominic Cummings or a speech published on gov.uk by Boris Johnson might both contain claims. It would be correct to consider those individuals as the authors of both those works and the claims they contain.
A BBC video of Boris giving a speech, or a Guardian live blog that quoted from it, might both be valid sources to include as an appearance. But in neither case is Boris the author of those works; he’s the author of the Claim.
To give an example in JSON-LD:
{
  "@type": "Claim",
  "@id": "...",
  "appearance": [{
    "@type": "CreativeWork",
    "url": "https://bbc.co.uk/...",
    "author": {
      "@type": "Person",
      "name": "Laura Kuenssberg",
      "sameAs": "https://www.wikidata.org/wiki/Q6499096"
    }
  }],
  "author": {
    "@id": "...",
    "@type": "Person",
    "name": "Boris Johnson",
    "sameAs": "https://www.wikidata.org/wiki/Q180589"
  }
}
Multiple people making a claim
A claim might have multiple authors. For example, a team of scientists publishing a paper, a group of individuals publishing an open letter, or a group of organisations publishing a white paper.
This might be infrequent, but it’s worth noting how this might be captured within the current model.
A Claim is a CreativeWork and so can have multiple authors:
{
  "@type": "Claim",
  "author": [
    {
      "@type": "Person",
      "name": "person 1"
    },
    {
      "@type": "Person",
      "name": "person 2"
    }
  ]
}
Note: at the time of writing the Rich Results Tester does not generate any warnings or errors if the author property of a Claim is an array.
Aside: Authors in social media sightings
I previously noted some inconsistencies around how sightings in social media postings seem to be currently handled within the Full Fact data.
I suggested some changes to:
- better distinguish between the url used to reference a post and the link used as a sameAs reference, to more clearly identify the account making the posting and the origin of the claim
- use the description property rather than name, when editorial policy states that an individual should not be named
The same recommendations apply to expressing the author of a Claim:
{
  "@type": "Claim",
  "@id": "...",
  "appearance": [{
    "@type": "CreativeWork",
    "url": "https://twitter.com/jennyrickson/status/1384403401908314112",
    "datePublished": "2021-04-20",
    "author": {
      "@type": "Person",
      "sameAs": "https://twitter.com/jennyrickson",
      "description": "Twitter user"
    }
  }],
  "author": {
    "@type": "Person",
    "sameAs": "https://twitter.com/jennyrickson",
    "description": "Twitter user"
  }
}
Or, removing direct links to posts:
{
  "@type": "Claim",
  "@id": "...",
  "appearance": [{
    "@type": "CreativeWork",
    "datePublished": "2021-04-20",
    "author": {
      "@type": "Person",
      "sameAs": "https://twitter.com/jennyrickson",
      "description": "Twitter user"
    }
  }],
  "author": {
    "@type": "Person",
    "sameAs": "https://twitter.com/jennyrickson",
    "description": "Twitter user"
  }
}
As an aside, distinguishing between the author of a social media post and the author of a Claim might also be useful in some limited scenarios. For example, a journalist live tweeting an event might publish a quote attributed to a speaker. Until a better source is available this could be considered a suitable appearance, although clearly there are editorial decisions to be made around whether Full Fact responds to that type of coverage.
Repeated claims by the same person
Based on the above discussion, it’s hopefully clear that if we find multiple sources of the same claim being made by the same person, then these can be recorded as additional sightings of an existing Claim:
{
  "@type": "Claim",
  "@id": "...",
  "description": "Voter fraud is a significant problem",
  "appearance": [{
      "@type": "CreativeWork",
      "url": "https://gov.uk/...",
      "description": "Transcript of a speech given at...",
      "author": {
        "@type": "Person",
        "name": "Boris Johnson",
        "sameAs": "https://www.wikidata.org/wiki/Q180589"
      }
    },
    {
      "@type": "CreativeWork",
      "url": "https://www.telegraph.co.uk/...",
      "description": "An article published in The Telegraph...",
      "author": {
        "@type": "Person",
        "name": "Boris Johnson",
        "sameAs": "https://www.wikidata.org/wiki/Q180589"
      }
    }
  ],
  "author": {
    "@id": "...",
    "@type": "Person",
    "name": "Boris Johnson",
    "sameAs": "https://www.wikidata.org/wiki/Q180589"
  }
}
When we consider claims as being authored then it also clarifies the role of appearance and firstAppearance.
The appearance property provides an ordered list of sources in which a Claim from a specific person has been noted: a list of times that the same person made the same claim, but in different contexts. The firstAppearance is then the earliest time that person made the claim.
Whether a Claim can be considered to be the same will always be an editorial decision.
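The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: appearance is always held as a list, each sighting carries an ISO 8601 datePublished where known, and the editorial decision that the sighting is the same Claim has already been made. The function name is hypothetical.

```python
def add_appearance(claim: dict, work: dict) -> dict:
    """Return a copy of the Claim with `work` recorded as an appearance.

    Appearances are kept sorted by datePublished (ISO 8601 dates sort
    correctly as strings); undated works are placed last. firstAppearance
    is set to the earliest dated work, if there is one.
    """
    works = list(claim.get("appearance", []))
    # Skip the sighting if a work with the same url is already recorded.
    if work.get("url") is None or all(w.get("url") != work["url"] for w in works):
        works.append(work)
    works.sort(
        key=lambda w: (w.get("datePublished") is None, w.get("datePublished") or "")
    )
    updated = {**claim, "appearance": works}
    dated = [w for w in works if w.get("datePublished")]
    if dated:
        updated["firstAppearance"] = dated[0]
    return updated
```

Deriving firstAppearance from the dated sightings, rather than maintaining it by hand, keeps the two properties consistent as new sightings are added.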
When multiple sightings begin to accumulate for a Claim, then it is reasonable to expect these to be included in the fact checking articles that mention them.
How these later revisions might be expressed in the web pages is an editorial and design decision. But one approach might be to list these additional appearances in a clearly marked section of the article. Much as Full Fact already clearly identifies other corrections and revisions.
The same claim made by different people
If a Claim is always associated with an author then it’s clear that if we find different people making the same claim, then we need to express that as a new Claim.
While those two claims might share some attributes, e.g. the basic text of the claim (“Voter fraud is a growing issue”), they will vary based on who made them and where they appear. There might also be nuances in how the Claim is expressed (“Voter fraud is a growing issue”, “Voter fraud in the UK is on the rise”) that don’t impact the overall meaning, or the results of the fact checking, but which are important to capture and attribute correctly.
It also clearly wouldn’t be appropriate to just add the second person making a Claim as an additional author of an existing Claim. Firstly, this would make it unclear which appearances (sightings) relate to which person. And secondly those people aren’t necessarily acting in concert: they may be independently making a claim at different times and in different contexts.
(If multiple politicians are repeatedly making a Claim on behalf of their party, then it may be more appropriate to express this as the Organization making the claim, rather than the individuals. Again, this is an editorial decision.)
We’ll return to how to relate together similar Claims later in these notes.
A ClaimReview with multiple claims by different people
The most obvious way to update a ClaimReview to incorporate the same claim being made by a different person is to provide multiple values for the itemReviewed property:
{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  ...
  "itemReviewed": [
    {
      "@type": "Claim",
      "@id": "...",
      "description": "Voter fraud is a significant problem",
      "appearance": [{
          "@type": "CreativeWork",
          "url": "https://www.telegraph.co.uk/...",
          "author": {
            "@type": "Person",
            "name": "Boris Johnson",
            "sameAs": "https://www.wikidata.org/wiki/Q180589"
          }
        }
      ],
      "author": {
        "@id": "...",
        "@type": "Person",
        "name": "Boris Johnson",
        "sameAs": "https://www.wikidata.org/wiki/Q180589"
      }
    },
    {
      "@type": "Claim",
      "@id": "...",
      "description": "Voter fraud is a problem in the UK",
      "appearance": [{
          "@type": "CreativeWork",
          "url": "https://www.telegraph.co.uk/...",
          "author": {
            "@type": "Person",
            "name": "Jacob Rees-Mogg",
            "sameAs": "https://www.wikidata.org/wiki/Q574980"
          }
        }
      ],
      "author": {
        "@id": "...",
        "@type": "Person",
        "name": "Jacob Rees-Mogg",
        "sameAs": "https://www.wikidata.org/wiki/Q574980"
      }
    }
  ]
}
This approach fits within the existing design. Providing multiple values for a property seems to be generally accepted in how Schema.org is used in practice.
While the itemReviewed property does state that it refers to “the item that is being reviewed/rated”, there is an open issue that suggests that itemReviewed could be redefined to support reviews that cover multiple items. This was motivated by the above requirements. I would suggest pushing for that minor change in the core model.
As always there is a caveat about how well Google might support this in practice. Providing an array of Claims as the value of itemReviewed does not generate any warnings or errors in the Rich Results Tester at the time of writing. It also doesn’t appear that they currently rely on the itemReviewed property in their search results so issues seem unlikely.
Having discounted using multiple values for author, the only other alternative would be to generate a separate ClaimReview for every Claim that is checked, regardless of their similarity, who made them, etc. But that is unnecessarily verbose.
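To make the distinction between the two cases concrete, here is a rough Python sketch of how a newly sighted claim might be folded into an existing ClaimReview: a sighting by an author already covered becomes an extra appearance, while a different author results in a new Claim in itemReviewed. The function names are hypothetical, each Claim is assumed to have a single author and list-valued appearance, and it presumes the editorial decision that the fact check covers the claim has already been taken.

```python
def author_key(claim: dict):
    """Identify a Claim's author, preferring the sameAs URI over the name."""
    author = claim.get("author", {})
    return author.get("sameAs") or author.get("name")


def attach_claim(review: dict, new_claim: dict) -> dict:
    """Return a copy of the ClaimReview with `new_claim` incorporated."""
    items = review.get("itemReviewed", [])
    if isinstance(items, dict):
        # Normalise a single-valued itemReviewed to a list.
        items = [items]
    items = [dict(c) for c in items]
    key = author_key(new_claim)
    for existing in items:
        if key is not None and author_key(existing) == key:
            # Same person: record the new sightings on the existing Claim.
            existing["appearance"] = list(existing.get("appearance", [])) + list(
                new_claim.get("appearance", [])
            )
            break
    else:
        # Different person: add a separate Claim to the review.
        items.append(new_claim)
    return {**review, "itemReviewed": items}
```

Matching on sameAs first means two records for the same person with slightly different name strings are still treated as one author.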
How can related Claims be linked to one another?
If we have the same Claim made by different people at different times, then how might we indicate that they are related to one another?
One answer to that question is that this association is covered by the fact check itself:
- the article either contains multiple ClaimReview objects, each one for different (but likely related) claims. E.g. as seen in this article on plastic straws
- or, as we have explored above, the article checks the same claim made by different people, with each Claim being related to a single ClaimReview object
But there are other ways in which we can consider claims to be related:
- By source (People/Organisation): claims with the same author, matched via their names and (more reliably) Wikidata identifiers
- By time: claims appearing over time, based on the datePublished of a sighting (a CreativeWork) or the ClaimReview
- By where they are sighted: claims appearing in the same publication (or platform), based on the url and author associated with the CreativeWork
Some of these may be less interesting to Full Fact but might be useful ways to query and explore the data. E.g. to compile a list of fact checks during a specific moment in time, or a history of claims by an individual.
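As a small illustration of the first two groupings, the Python below groups Claims by author and filters them by sighting date. It is a sketch over in-memory dictionaries shaped like the examples in these notes; the function names are hypothetical and a real API would more likely run equivalent queries against a database.

```python
from collections import defaultdict


def claims_by_author(claims: list) -> dict:
    """Group Claims by author, keyed on sameAs where present, else name."""
    grouped = defaultdict(list)
    for claim in claims:
        author = claim.get("author", {})
        grouped[author.get("sameAs") or author.get("name")].append(claim)
    return dict(grouped)


def claims_sighted_between(claims: list, start: str, end: str) -> list:
    """Return Claims with at least one appearance dated within [start, end].

    Dates are compared as ISO 8601 strings; undated sightings never match.
    """
    return [
        claim
        for claim in claims
        if any(
            start <= work.get("datePublished", "") <= end
            for work in claim.get("appearance", [])
        )
    ]
```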
Using existing Schema.org markup, it is possible to extend this further to also add topics to a Claim.
As a CreativeWork, a Claim may have an about property that refers to an entity or topic which it covers.
This property could be used to indicate the topic for a Claim by referring to a DefinedTerm as follows:
{
  "@type": "Claim",
  "@id": "...",
  "description": "Voter fraud is a significant problem",
  "appearance": [{
    "@type": "CreativeWork",
    "url": "https://gov.uk/...",
    "author": {
      "@type": "Person",
      "name": "Boris Johnson",
      "sameAs": "https://www.wikidata.org/wiki/Q180589"
    }
  }],
  "author": {
    "@id": "...",
    "@type": "Person",
    "name": "Boris Johnson",
    "sameAs": "https://www.wikidata.org/wiki/Q180589"
  },
  "about": [
    {
      "@type": "DefinedTerm",
      "name": "Voter fraud",
      "sameAs": "https://www.wikidata.org/wiki/Q692209"
    }
  ]
}
The above example tags the Claim as being about the topic “Voter fraud”, and links that to the relevant Wikidata entity. Multiple topics could be added as required.
Using the same DefinedTerm across claims will make it easier to identify and find claims that are about the same topic.
The keywords property has a somewhat similar use to about. This is already being used on a few ClaimReview instances by Full Fact to tag third-party fact checks. I would suggest using keywords for more free-form tagging, and about for linking to more controlled terms, e.g. from Wikidata.
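To show why a shared, controlled term is helpful, here is a short Python sketch that finds Claims about a given topic by comparing the sameAs URIs of their about values. The helper names are hypothetical; the only assumption about the data is that topics are expressed as in the example above.

```python
def topic_ids(claim: dict) -> set:
    """Collect the sameAs URIs of the topics a Claim is about."""
    about = claim.get("about", [])
    if isinstance(about, dict):
        # Normalise a single topic to a list.
        about = [about]
    return {topic["sameAs"] for topic in about if "sameAs" in topic}


def claims_about(claims: list, topic_uri: str) -> list:
    """Return the Claims tagged with the given topic URI."""
    return [claim for claim in claims if topic_uri in topic_ids(claim)]
```

Because the match is on the URI rather than the name, claims tagged “Voter fraud” and “Electoral fraud” would still be grouped together if both point at the same Wikidata entity.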
Additional linking between Claim, appearances and ClaimReview for API responses
The current design of ClaimReview and Claim is largely focused on annotating fact checks. So the model and documentation emphasise ClaimReview as the primary entity in the model.
But, if we were building a RESTful API that allows users to look up or search for different types of resources, then a ClaimReview isn’t necessarily the starting point. The API might provide data about a person or Claim and the consumer will want to find related items.
From this perspective there are some missing reciprocal relationships in the model:
- a link from a CreativeWork to the Claim that it contains, i.e. the inverse of appearance
- a link from a Claim to the ClaimReview in which it is checked, i.e. the inverse of itemReviewed
Depending on the scope and design of the API, similar relationships might be required to associate instances of Person, Organization, and DefinedTerm with works, claims and reviews.
I would suggest designing the API around an extended profile of Claim Review, using JSON-LD to give some consistency across how data is published, adding extra properties as needed.
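The missing reciprocal relationships could be derived from the existing data rather than authored by hand. The following Python is a sketch of that idea: it walks a collection of ClaimReview objects and builds two lookup tables, one for each inverse. It assumes that every ClaimReview and Claim has an @id (as recommended earlier) and that sightings are identified by their url; the function name is hypothetical.

```python
from collections import defaultdict


def build_inverse_indexes(reviews: list):
    """Derive the inverse of appearance and of itemReviewed.

    Returns two dicts:
      - sighting url -> set of Claim @ids appearing in that work
      - Claim @id    -> set of ClaimReview @ids that check it
    """
    claims_in_work = defaultdict(set)
    reviews_of_claim = defaultdict(set)
    for review in reviews:
        items = review.get("itemReviewed", [])
        if isinstance(items, dict):
            items = [items]
        for claim in items:
            reviews_of_claim[claim["@id"]].add(review["@id"])
            for work in claim.get("appearance", []):
                if "url" in work:
                    claims_in_work[work["url"]].add(claim["@id"])
    return dict(claims_in_work), dict(reviews_of_claim)
```

An API response for a Claim could then include the matching ClaimReview identifiers under a custom property in the extended profile, without changing how the embedded markup is published.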
Google’s notes to fact checkers
As usual we need to be mindful of Google’s requirements around fact check metadata and whether any of these relate to how repeated claims might be handled.
In the eligibility requirements they note that:
There must not be any mismatch between the structured data and page content … Your readers can easily identify the claims and checks in the body of the article … You must clearly attribute the specific claim that you’re assessing to a distinct origin (separate from your website), whether it’s another website, public statement, social media, or other traceable source.
And in the technical guidelines they note that:
A single page can host multiple ClaimReview elements, each for a separate claim.
And provide some additional guidance around multiple ClaimReview elements.
I don’t believe any of this guidance is at odds with the above conclusions and recommendations.
However, as noted earlier, where a Claim is being associated with an existing ClaimReview, I think it will be important to ensure that the original page content is updated, both when additional sightings are added and when a new Claim by a different author is added to the page.
This will help to build confidence and full transparency around which appearances and whose claims are being assessed.
Recommendations
- Give unique identifiers to each Claim, via an @id and possibly an identifier property
- Use the author property of Claim to refer to the source of the Claim and the author property of a CreativeWork (appearance) to refer to the author of a post or article. These may not be the same for a single Claim
- Revisit handling of social media claims and sightings to ensure they’re consistent with the above
- Add repeated sightings of the same Claim by the same person or organisation as new values of appearance
- Make the itemReviewed property multi-valued
- When a different person makes a Claim that is covered by an existing fact check, then add that new Claim to the itemReviewed property for the ClaimReview
- Consider adding tags/topics for claims to provide another way to query or navigate the data, using the about property and linking to Wikidata
- Explore how to expose the full richness of the model via the API, providing mechanisms to look up Claims by person, organisation, time or topic
- Add additional custom properties to the data, to allow you to build a well-structured RESTful API around a consistent JSON-LD model