Reuse of Fact Checks

Producing a fact check takes time and effort. Full Fact are naturally keen to make the most of this work. One way to do that is to find repeated instances of the same Claim and then associate them with existing fact checks. This increases coverage of the content checked, without the need to redo the fact checking.

Full Fact already have a process in place to support this, by finding additional occurrences of a claim in the media. But the team are looking for guidance on the correct way to enrich the ClaimReview data with this information.

So, how can existing fact checks be linked to claims that are repeatedly made at different times and places, by the same or different people or organisations?

These notes consider this question from the following angles:

  • Repeated claims by the same person
  • Claims made by multiple people
  • The same claim being made by different people

While the examples focus on people making claims, the conclusions and recommendations equally apply to organisations.

Also included are some brief notes on:

  • how similar claims may be related to one another, and how this might be exposed via an API
  • the need for reciprocal relationships between Claim, appearances and ClaimReview for API results

Identifying Claims

Within the Full Fact profile of Claim Review, a Claim doesn’t currently have an identifier.

In line with my general recommendation to add an @id property to assign URIs and link resources to API endpoints, I suggest that every Claim should have a unique identifier, added using an @id property and perhaps also an identifier property.

While claims are currently presented in the context of a ClaimReview (and the article within which they are embedded), from a data model perspective a Claim is a key resource. Claims would therefore benefit from being uniquely identified to support discovery, linking, etc.
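
As a minimal sketch of this, a Claim carrying both kinds of identifier might look like the following. The example.org URI and the "claim-0001" value are hypothetical placeholders, not real Full Fact identifiers:

{
  "@type": "Claim",
  "@id": "https://example.org/claims/claim-0001",
  "identifier": "claim-0001",
  "description": "Voter fraud is a significant problem"
}

Here the @id gives the Claim a URI that other resources can link to, while identifier carries a local reference that does not depend on the URL structure.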

An important caveat

The Schema.org definition of Claim is clearly a work in progress.

The description of Claim mentions a number of ways in which it might be used, for example:

  • expressing the content of a claim using the text property
  • linking “well-known” claims together using sameAs and
  • summarising claims by adding a name.

The under-discussion MediaReview proposal also includes an example of adding a description rather than name or text to a Claim.

But the documentation doesn’t go into any real detail about how any of this might be used in practice.

There is clearly more work to be done to flesh out the design. And this might invalidate some of the recommendations and thinking that follows.

As Dan Brickley notes here, the Schema.org team are prioritising attention on issues that relate to actual applications and data. This is backed up by my own experiences of engaging with the project. This creates an opportunity for Full Fact to take the lead by committing to specific patterns of data publication through use.

Obviously, there is also the perennial juggling act between how Google wants to consume Schema.org data in its applications, and how that model might be used in other contexts (e.g. for other fact checking use cases, or as the basis for building a Full Fact API).

The authors and sources of a Claim

The Google ClaimReview guidance notes that the publishers of ClaimReview data:

“…must clearly attribute the specific claim that you’re assessing to a distinct origin (separate from your website), whether it’s another website, public statement, social media, or other traceable source.”

In terms of the data model, a Claim is associated with a “distinct origin” using the appearance and firstAppearance properties.

So it’s a little surprising that these properties are only RECOMMENDED rather than REQUIRED in Google’s profile of the data, and also that, as noted previously, the Rich Results Tester does not warn if these properties are missing.

While the appearance properties are used to link to an article, speech, video, or other CreativeWork within which a Claim is made or recorded, they don’t help us to specify the person or organisation who is making that claim.

This is because the person (or organisation) making a claim may not always be the author of the work in which it appears.

It’s useful (and arguably, important) to distinguish between the author of a Claim and the author of the CreativeWork that is used as evidence of that claim.

For example, a blog post written by Dominic Cummings or a speech published on gov.uk by Boris Johnson might both contain claims. It would be correct to consider those individuals as both the authors of those works and the authors of the claims they contain.

A BBC video of Boris giving a speech, or a Guardian live blog that quoted from it, might both be valid sources to include as an appearance. But in neither case is Boris the author of those works; he’s the author of the Claim.

To give an example in JSON-LD:

{
  "@type": "Claim",
  "@id": "...",
  "appearance": [{
      "@type": "CreativeWork",
      "url": "https://bbc.co.uk/...",
      "author": {
        "@type": "Person",
        "name": "Laura Kuenssberg",
        "sameAs": "https://www.wikidata.org/wiki/Q6499096"
      }
  }],
  "author": {
    "@id": "...",
    "@type": "Person",
    "name": "Boris Johnson",
    "sameAs": "https://www.wikidata.org/wiki/Q180589"
  }
}

Multiple people making a claim

A claim might have multiple authors. For example, a team of scientists publishing a paper, a group of individuals publishing an open letter, or a group of organisations publishing a white paper.

This might be infrequent, but it’s worth noting how this might be captured within the current model.

A Claim is a CreativeWork and so can have multiple authors:

{
  "@type": "Claim",
  "author": [
    {
      "@type": "Person",
      "name": "person 1"
    },
    {
      "@type": "Person",
      "name": "person 2"
    }
  ]
}

Note: at the time of writing the Rich Results Tester does not generate any warnings or errors if the author property of a Claim is an array.

Aside: Authors in social media sightings

I previously noted some inconsistencies around how sightings in social media postings seem to be currently handled within the Full Fact data.

I suggested some changes to:

  • better distinguish between the url used to reference a post and the link used as a sameAs reference, to more clearly identify the account making the posting and the origin of the claim
  • use the description property rather than name, when editorial policy states that an individual should not be named

The same recommendations apply to expressing the author of a Claim:

{
  "@type": "Claim",
  "@id": "...",
  "appearance": [{
    "@type": "CreativeWork",
    "url": "https://twitter.com/jennyrickson/status/1384403401908314112",
    "datePublished": "2021-04-20",
    "author": {
      "@type": "Person",
      "sameAs": "https://twitter.com/jennyrickson",
      "description": "Twitter user"
    }
  }],
  "author": {
    "@type": "Person",
    "sameAs": "https://twitter.com/jennyrickson",
    "description": "Twitter user"
  }
}

Or, removing direct links to posts:

{
  "@type": "Claim",
  "@id": "...",
  "appearance": [{
    "@type": "CreativeWork",
    "datePublished": "2021-04-20",
    "author": {
      "@type": "Person",
      "sameAs": "https://twitter.com/jennyrickson",
      "description": "Twitter user"
    }
  }],
  "author": {
    "@type": "Person",
    "sameAs": "https://twitter.com/jennyrickson",
    "description": "Twitter user"
  }
}

As an aside, distinguishing between the author of a social media post and the author of a Claim might also be useful in some limited scenarios. For example, a journalist live tweeting an event might publish a quote attributed to a speaker. Until a better source is available this could be considered a suitable appearance, although clearly there are editorial decisions to be made around whether Full Fact responds to that type of coverage.
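
To sketch that scenario, assume a hypothetical journalist account and speaker. The account name, the speaker and the date below are placeholders rather than real sightings. The author of the post and the author of the Claim then differ:

{
  "@type": "Claim",
  "@id": "...",
  "appearance": [{
    "@type": "CreativeWork",
    "url": "https://twitter.com/example_journalist/status/...",
    "datePublished": "2021-04-20",
    "author": {
      "@type": "Person",
      "sameAs": "https://twitter.com/example_journalist",
      "description": "Journalist live tweeting the event"
    }
  }],
  "author": {
    "@type": "Person",
    "name": "Example Speaker",
    "description": "Speaker quoted in the post"
  }
}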

Repeated claims by the same person

Based on the above discussion, it’s hopefully clear that if we find multiple sources of the same claim being made by the same person, then these can be recorded as additional sightings of an existing Claim:

{
  "@type": "Claim",
  "@id": "...",
  "description": "Voter fraud is a significant problem",
  "appearance": [{
      "@type": "CreativeWork",
      "url": "https://gov.uk/...",
      "description": "Transcript of a speech given at...",
      "author": {
        "@type": "Person",
        "name": "Boris Johnson",
        "sameAs": "https://www.wikidata.org/wiki/Q180589"
      }
    },
    {
      "@type": "CreativeWork",
      "url": "https://www.telegraph.co.uk/...",
      "description": "An article published in The Telegraph...",
      "author": {
        "@type": "Person",
        "name": "Boris Johnson",
        "sameAs": "https://www.wikidata.org/wiki/Q180589"
      }
    }
  ],
  "author": {
    "@id": "...",
    "@type": "Person",
    "name": "Boris Johnson",
    "sameAs": "https://www.wikidata.org/wiki/Q180589"
  }
}

When we consider claims as being authored then it also clarifies the role of appearance and firstAppearance.

The appearance property provides an ordered list of sources in which a Claim from a specific person has been noted: a list of times that the same person made the same claim, but in different contexts. The firstAppearance is then the earliest time that person made the claim.
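
A sketch of how the two properties might sit together is shown below. The dates are invented for illustration, and repeating the earliest sighting in both properties is one possible convention, not a documented requirement. JSON-LD arrays are unordered by default, so the datePublished values, not the array positions, are what reliably convey the sequence:

{
  "@type": "Claim",
  "@id": "...",
  "description": "Voter fraud is a significant problem",
  "firstAppearance": {
    "@type": "CreativeWork",
    "url": "https://gov.uk/...",
    "datePublished": "2021-03-01"
  },
  "appearance": [
    {
      "@type": "CreativeWork",
      "url": "https://gov.uk/...",
      "datePublished": "2021-03-01"
    },
    {
      "@type": "CreativeWork",
      "url": "https://www.telegraph.co.uk/...",
      "datePublished": "2021-04-20"
    }
  ],
  "author": {
    "@id": "...",
    "@type": "Person",
    "name": "Boris Johnson",
    "sameAs": "https://www.wikidata.org/wiki/Q180589"
  }
}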

Whether two claims can be considered to be the same will always be an editorial decision.

When multiple sightings begin to accumulate for a Claim, then it is reasonable to expect these to be included in the fact checking articles that mention them.

How these later revisions might be expressed in the web pages is an editorial and design decision. But one approach might be to list these additional appearances in a clearly marked section of the article, much as Full Fact already clearly identifies other corrections and revisions.

The same claim made by different people

If a Claim is always associated with an author then it’s clear that if we find different people making the same claim, then we need to express that as a new Claim.

While those two claims might share some attributes, e.g. the basic text of the claim (“Voter fraud is a growing issue”), they will vary based on who made them and where they appear. There might also be nuances in how the Claim is expressed (“Voter fraud is a growing issue”, “Voter fraud in the UK is on the rise”) that don’t impact the overall meaning, or the results of the fact checking, but which are important to capture and attribute correctly.

It also clearly wouldn’t be appropriate to just add the second person making a Claim as an additional author of an existing Claim. Firstly, this would make it unclear which appearances (sightings) relate to which person. And secondly those people aren’t necessarily acting in concert: they may be independently making a claim at different times and in different contexts.

(If multiple politicians are repeatedly making a Claim on behalf of their party, then it may be more appropriate to express this as the Organization making the claim, rather than the individuals. Again, this is an editorial decision.)
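
A rough sketch of that variation follows, using a placeholder party name and an elided Wikidata link rather than a real organisation:

{
  "@type": "Claim",
  "@id": "...",
  "description": "Voter fraud is a significant problem",
  "author": {
    "@type": "Organization",
    "name": "Example Party",
    "sameAs": "https://www.wikidata.org/wiki/..."
  }
}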

We’ll return to how to relate together similar Claims later in these notes.

A ClaimReview with multiple claims by different people

The most obvious way to update a ClaimReview to incorporate the same claim being made by a different person, is to provide multiple values for the itemReviewed property:

{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  ...
  "itemReviewed": [
    {
      "@type": "Claim",
      "@id": "...",
      "description": "Voter fraud is a significant problem",
      "appearance": [{
          "@type": "CreativeWork",
          "url": "https://www.telegraph.co.uk/...",
          "author": {
            "@type": "Person",
            "name": "Boris Johnson",
            "sameAs": "https://www.wikidata.org/wiki/Q180589"
          }
        }
      ],
      "author": {
        "@id": "...",
        "@type": "Person",
        "name": "Boris Johnson",
        "sameAs": "https://www.wikidata.org/wiki/Q180589"
      }
    },
    {
      "@type": "Claim",
      "@id": "...",
      "description": "Voter fraud is a problem in the UK",
      "appearance": [{
          "@type": "CreativeWork",
          "url": "https://www.telegraph.co.uk/...",
          "author": {
            "@type": "Person",
            "name": "Jacob Rees-Mogg",
            "sameAs": "https://www.wikidata.org/wiki/Q574980"
          }
        }
      ],
      "author": {
        "@id": "...",
        "@type": "Person",
        "name": "Jacob Rees-Mogg",
        "sameAs": "https://www.wikidata.org/wiki/Q574980"
      }
    }
  ]
}

This approach fits within the existing design. Providing multiple values for a property seems to be generally accepted in how Schema.org is used in practice.

While the itemReviewed property does state that it refers to “the item that is being reviewed/rated”, there is an open issue that suggests that itemReviewed could be redefined to support reviews that cover multiple items. This was motivated by the above requirements. I would suggest pushing for that minor change in the core model.

As always there is a caveat about how well Google might support this in practice. Providing an array of Claims as the value of itemReviewed does not generate any warnings or errors in the Rich Results Tester at the time of writing. It also doesn’t appear that Google currently relies on the itemReviewed property in its search results, so issues seem unlikely.

Having discounted using multiple values for author, the only other alternative would be to generate a separate ClaimReview for every Claim that is checked, regardless of their similarity, who made them, etc. But that is unnecessarily verbose.

If we have the same Claim made by different people at different times, then how might we indicate that they are related to one another?

One answer to that question is that this association is covered by the fact check itself:

  • the article either contains multiple ClaimReview objects, each one for different (but likely related) claims. E.g. as seen in this article on plastic straws
  • or, as we have explored above, the article checks the same claim made by different people, with each Claim being related to a single ClaimReview object

But there are other ways in which we can consider claims to be related:

  • By source (People/Organisation) - Claims with the same author, via their names and (more reliably) Wikidata identifiers
  • By time - Claims appearing over time, based on the datePublished of a sighting (a CreativeWork) or the ClaimReview
  • By where they are sighted - claims appearing in the same publication (or platform) based on the url and author associated with CreativeWork

Some of these may be less interesting to Full Fact but might be useful ways to query and explore the data. E.g. to compile a list of fact checks during a specific moment in time, or a history of claims by an individual.

Using existing Schema.org markup, it is possible to extend this further to also add topics to a Claim.

As a CreativeWork, a Claim may have an about property, that refers to an entity or topic which it covers.

This property could be used to indicate the topic for a Claim by referring to a DefinedTerm as follows:

{
  "@type": "Claim",
  "@id": "...",
  "description": "Voter fraud is a significant problem",
  "appearance": [{
      "@type": "CreativeWork",
      "url": "https://gov.uk/...",
      "author": {
        "@type": "Person",
        "name": "Boris Johnson",
        "sameAs": "https://www.wikidata.org/wiki/Q180589"
      }
  }],
  "author": {
    "@id": "...",
    "@type": "Person",
    "name": "Boris Johnson",
    "sameAs": "https://www.wikidata.org/wiki/Q180589"
  },
  "about": [
    {
      "@type": "DefinedTerm",
      "name": "Voter fraud",
      "sameAs": "https://www.wikidata.org/wiki/Q692209"
    }
  ]
}

The above example tags the Claim as being about the topic “Voter fraud”, and links that to the relevant Wikidata entity. Multiple topics could be added as required.

Using the same DefinedTerm across claims will make it easier to identify and find claims that are about the same topic.

The keywords property has a somewhat similar use to about. This is already being used on a few ClaimReview instances by Full Fact to tag third-party fact checks. I would suggest using keywords for more free-form tagging, and about for linking to more controlled terms, e.g. from Wikidata.
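
To illustrate that split, a Claim might carry both properties as sketched below. The keyword values are invented examples of free-form tags, not terms taken from Full Fact’s data:

{
  "@type": "Claim",
  "@id": "...",
  "description": "Voter fraud is a significant problem",
  "keywords": ["elections", "voting"],
  "about": [
    {
      "@type": "DefinedTerm",
      "name": "Voter fraud",
      "sameAs": "https://www.wikidata.org/wiki/Q692209"
    }
  ]
}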

Additional linking between Claim, appearances and ClaimReview for API responses

The current design of ClaimReview and Claim is largely focused on annotating fact checks. So the model and documentation emphasise ClaimReview as the primary entity in the model.

But, if we were building a RESTful API that allows users to look up or search for different types of resources, then a ClaimReview isn’t necessarily the starting point. The API might provide data about a person or Claim and the consumer will want to find related items.

From this perspective there are some missing reciprocal relationships in the model:

  • a link from a CreativeWork to the Claim that it contains, i.e. the inverse of appearance
  • a link from a Claim to the ClaimReview in which it is checked, i.e. the inverse of itemReviewed

Depending on the scope and design of the API, similar relationships might be required to associate instances of Person, Organization, and DefinedTerm with works, claims and reviews.
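
One possible way to sketch the second of these links without minting a new property is the JSON-LD @reverse keyword, which states that another node points at the current one via the named property. This is an illustration only: the identifiers and the review URL are elided, and whether consumers of the data (including Google) would interpret @reverse has not been tested here.

{
  "@context": "https://schema.org",
  "@type": "Claim",
  "@id": "...",
  "description": "Voter fraud is a significant problem",
  "@reverse": {
    "itemReviewed": {
      "@type": "ClaimReview",
      "@id": "...",
      "url": "https://fullfact.org/..."
    }
  }
}

Read as a graph, this says that the ClaimReview has the Claim as its itemReviewed, which allows an API response to be centred on the Claim. The same pattern could be applied to appearance, to link from a CreativeWork to the Claims it contains.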

I would suggest designing the API around an extended profile of Claim Review, using JSON-LD to give some consistency across how data is published, adding extra properties as needed.

Google’s notes to fact checkers

As usual we need to be mindful of Google’s requirements around fact check metadata and whether any of these relate to how repeated claims might be handled.

In the eligibility requirements they note that:

There must not be any mismatch between the structured data and page content … Your readers can easily identify the claims and checks in the body of the article … You must clearly attribute the specific claim that you’re assessing to a distinct origin (separate from your website), whether it’s another website, public statement, social media, or other traceable source.

And in the technical guidelines they note that:

A single page can host multiple ClaimReview elements, each for a separate claim.

They also provide some additional guidance around multiple ClaimReview elements.

I don’t believe any of this guidance is at odds with the above conclusions and recommendations.

However, as noted earlier, where a Claim is being associated with an existing ClaimReview, I think it will be important to ensure that the original page content is updated, both when additional sightings are added and when a new Claim by a different author is added to the page.

This will help to build confidence and full transparency around which appearances and whose claims are being assessed.

Recommendations

  • Give unique identifiers to each Claim, via an @id and possibly an identifier property
  • Use the author property of Claim to refer to the source of the Claim and the author property of a CreativeWork (appearance) to refer to the author of a post or article. These may not be the same for a single Claim
  • Revisit handling of social media claims and sightings to ensure they’re consistent with the above
  • Add repeated sightings of the same Claim by the same person or organisation as new values of appearance
  • Make the itemReviewed property multi-valued
  • When a different person makes a Claim that is covered by an existing fact check, then add that new Claim to the itemReviewed property for the ClaimReview
  • Consider adding tags/topics for claims to provide another way to query or navigate the data, using the about property and linking to Wikidata
  • Explore how to expose the full richness of the model via the API, providing mechanisms to look up Claims by person, organisation, time or topic
  • Add additional custom properties to the data, to allow you to build a well-structured RESTful API around a consistent JSON-LD model