Basic Usage of Claim Review

A core requirement for this piece of work is to ensure that any use of the Claim Review markup remains valid with respect to Google’s expectations. It is essential that Full Fact’s work remains visible in Google (and other search engine) results.

With that in mind, it's useful to first confirm what requirements Google have documented before looking at Full Fact’s current usage of the Claim Review model and any potential extensions.

Google’s Requirements

There has always been a separation between Schema.org and Google’s use of structured data. Schema.org, as a standard data model, covers a wide range of different areas, not all of which Google are interested in using in their products.

Where areas of the model do intersect with Google’s uses, they often use them in a limited way and in limited contexts.

What are Google’s requirements around Claim Review?

According to the Fact Check documentation there are a number of required and recommended properties:

Entity       Property         Status       Value
ClaimReview  author           RECOMMENDED  Organization
ClaimReview  claimReviewed    REQUIRED     Text
ClaimReview  datePublished    RECOMMENDED  DateTime
ClaimReview  itemReviewed     RECOMMENDED  Claim
ClaimReview  reviewRating     REQUIRED     Rating
ClaimReview  url              REQUIRED     URL
Claim        appearance       RECOMMENDED  URL or CreativeWork
Claim        author           RECOMMENDED  Organization or Person
Claim        datePublished    RECOMMENDED  DateTime
Claim        firstAppearance  RECOMMENDED  URL or CreativeWork
Rating       alternateName    REQUIRED     Text
Rating       bestRating       RECOMMENDED  Number
Rating       name             RECOMMENDED  Text
Rating       ratingValue      RECOMMENDED  Number
Rating       worstRating      RECOMMENDED  Number

No other properties are documented as being REQUIRED or RECOMMENDED. A number of these properties have extra recommendations or constraints that are described in the Google documentation, e.g. the claimReviewed text should ideally be fewer than 75 characters.

Some aspects are under-specified. For example, the cardinality of values is not stated: in some cases both single values and arrays of values appear to be legal (e.g. for an appearance). Varying degrees of precision for dates and times also appear to be accepted.

When publishing data using JSON-LD there are also some other constraints. Specifically, Google only processes JSON-LD that declares that it uses the Schema.org JSON-LD @context (and only that context, even though multiple contexts are legal JSON-LD).

The “root” item in the JSON-LD document must also have a @type property that matches one of the types in the Google Structured Data Documentation, e.g. ClaimReview.
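The documented requirements above can be turned into a rough pre-flight check. The following Python sketch is illustrative only: the function name check_claim_review is hypothetical, the property lists are copied from the table above, and accepting both the http and https forms of the Schema.org context is an assumption. It is not a substitute for Google's own tooling.

```python
# Property lists taken from the table of Google's documented requirements.
REQUIRED = {
    "ClaimReview": ["claimReviewed", "reviewRating", "url"],
    "Rating": ["alternateName"],
}
RECOMMENDED = {
    "ClaimReview": ["author", "datePublished", "itemReviewed"],
    "Claim": ["appearance", "author", "datePublished", "firstAppearance"],
    "Rating": ["bestRating", "name", "ratingValue", "worstRating"],
}
# Assumption: both forms of the Schema.org context URL are acceptable.
SCHEMA_ORG_CONTEXTS = {"http://schema.org", "https://schema.org"}


def check_claim_review(doc):
    """Return (errors, warnings) for a parsed JSON-LD document (a dict)."""
    errors, warnings = [], []

    # Only a single context is allowed, and it must be Schema.org.
    context = doc.get("@context")
    if not isinstance(context, str) or context.rstrip("/") not in SCHEMA_ORG_CONTEXTS:
        errors.append("@context must be the Schema.org context only")
    if doc.get("@type") != "ClaimReview":
        errors.append("root @type must be ClaimReview")

    # The root item plus the two nested entities that have documented properties.
    entities = {
        "ClaimReview": doc,
        "Claim": doc.get("itemReviewed"),
        "Rating": doc.get("reviewRating"),
    }
    for type_name, entity in entities.items():
        if not isinstance(entity, dict):
            continue  # a missing entity is already reported via its parent property
        for prop in REQUIRED.get(type_name, []):
            if prop not in entity:
                errors.append(f"{type_name}.{prop} is REQUIRED")
        for prop in RECOMMENDED.get(type_name, []):
            if prop not in entity:
                warnings.append(f"{type_name}.{prop} is RECOMMENDED")

    # Extra constraint mentioned in the Google documentation.
    claim_text = doc.get("claimReviewed")
    if isinstance(claim_text, str) and len(claim_text) >= 75:
        warnings.append("claimReviewed should be fewer than 75 characters")
    return errors, warnings
```

A document that produces no errors here may still be treated differently by Google, so a check like this is best used alongside the Rich Results Tester rather than instead of it.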

Based on the above, we can create and test some JSON-LD documents:

  • minimal-valid.json is valid according to the Rich Results Tester, but the Tester warns about the missing RECOMMENDED properties of ClaimReview.
  • minimal-recommended.jsonld conforms to the documented REQUIRED and RECOMMENDED properties and is considered valid, with no warnings by the Rich Results Tester.
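For illustration, a document containing only the properties documented as REQUIRED could be generated along the following lines. This is a sketch and not the contents of the minimal-valid.json file mentioned above: the helper name minimal_claim_review, the example.org URL and the verdict text are all placeholders, and the output has not been run through the Rich Results Tester.

```python
import json


def minimal_claim_review(claim_text, verdict, review_url):
    """Build a ClaimReview dict with only the documented REQUIRED properties."""
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "claimReviewed": claim_text,
        "url": review_url,
        "reviewRating": {
            "@type": "Rating",
            # alternateName is the only REQUIRED property of Rating
            "alternateName": verdict,
        },
    }


if __name__ == "__main__":
    doc = minimal_claim_review(
        "Nearly 5 billion plastic straws are currently used in the UK each year.",
        "Placeholder verdict text",          # hypothetical value
        "https://example.org/fact-check",    # hypothetical URL
    )
    print(json.dumps(doc, indent=2))
```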

However, the Tester is also tolerant of some changes to the minimal-recommended.jsonld document without generating additional warnings. For example, some quick testing of variations illustrated that:

  • the Tester accepts arrays of appearance as well as single items. While this property might have multiple values, it does not enforce that the value is always an array
  • the Tester is happy with Date for datePublished, which Schema.org says is permitted, but which the Google documentation says should be DateTime
  • the Tester does not warn that either appearance or firstAppearance are RECOMMENDED, if either or both are missing
  • the Tester does not warn about the RECOMMENDED properties of Review, if they are missing
  • the Tester does, however, flag up when some recommended properties are missing, e.g. datePublished
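Because both single values and arrays, and both Date and DateTime values, may turn up in practice, anyone consuming this data will probably want to normalise it before use. A minimal Python sketch of that idea follows. The helper names are hypothetical, and it assumes dates are ISO 8601 strings such as "2019-05-22" or "2019-05-22T10:30:00Z".

```python
from datetime import date, datetime


def as_list(value):
    """Treat a missing value as [], a single value as [value], and a list as-is."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def parse_date_published(value):
    """Parse an ISO 8601 Date or DateTime string and return a date object."""
    if len(value) == 10:  # plain Date, e.g. "2019-05-22"
        return date.fromisoformat(value)
    # DateTime: a trailing "Z" is rewritten so older Python versions accept it
    return datetime.fromisoformat(value.replace("Z", "+00:00")).date()
```

With these helpers, as_list(claim.get("appearance")) can be iterated regardless of whether the publisher supplied one appearance or several.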

This highlights that the Rich Results Tester applies a slightly different set of criteria from those documented, which is not ideal. Nothing in the Tester contradicts the documentation; it just seems that in practice it is more tolerant of variations.

The Full Fact Profile: Full Fact’s use of Claim Review

Full Fact are using their own “profile” of the public Schema.org specification. It conforms to Google’s recommended usage of the Schema.org Claim Review model but includes extra attributes, relationships and types.

Note: the following may not be exhaustive and is based on inspecting a small number of examples. REQUIRED here means “present in all of the examples inspected”. So the data might reflect how templates/data generators work rather than what is stored in the database.

Entity       Property         Status    Value         Notes
ClaimReview  author           REQUIRED  Organization  A richer model than recommended by Google, includes sameAs links and Full Fact logo
ClaimReview  claimReviewed    REQUIRED  Text
ClaimReview  dateModified     REQUIRED  Date          Always provided, by default has same value as datePublished?
ClaimReview  datePublished    REQUIRED  Date
ClaimReview  description      REQUIRED  Text          A short description of the Claim Review. Same text is used for social media cards. Different to claimReviewed
ClaimReview  identifier       REQUIRED  Text          A UUID
ClaimReview  itemReviewed     REQUIRED  Claim
ClaimReview  reviewBody       REQUIRED  Text
ClaimReview  reviewRating     REQUIRED  Rating
ClaimReview  url              REQUIRED  URL
Claim        appearance       REQUIRED  CreativeWork  Always an array. CreativeWork always has a url, datePublished and an author
Claim        author           REQUIRED  Organization  Always an Organization. This is a known issue.
Claim        firstAppearance  REQUIRED  CreativeWork  Always same value as in appearance?
Claim        datePublished    REQUIRED  Date
Rating       alternateName    REQUIRED  Text          In FF's case this is always the same as the reviewBody (a summary of the fact check) as they don’t use a tiered rating.
Rating       bestRating       REQUIRED  Number        Always null
Rating       ratingValue      REQUIRED  Number        Always null
Rating       worstRating      REQUIRED  Number        Always null

The key differences with Google’s profile are:

  • Inclusion of the reviewBody property from Schema.org, which better reflects FF needs
  • Use of alternateName rather than name, as Full Fact do not assign ratings
  • Inclusion of additional properties from other parts of Schema.org, e.g. description, use of sameAs, logo, etc
  • All properties are REQUIRED (or are included as standard)
  • Restricted values for some types, e.g. appearance is always an array, and always contains CreativeWork and not URL. Use of Date rather than DateTime

Questions relating to the current Claim Review model

What is the relationship between appearance and firstAppearance?

Google recommend the use of both appearance and firstAppearance but don’t explain the benefits of doing so. Neither is REQUIRED and the Rich Results Tester does not seem to warn about either property if missing.

The Schema.org documentation does not provide much in the way of additional context, just that firstAppearance is the “first known” occurrence of a Claim in some CreativeWork.

This could be interpreted in two ways: either as “the first time this Claim was encountered” by the fact checker, or as “the earliest occurrence known to the fact checker”. In the first case the value would always be fixed, whereas in the latter the value might be changed as new sightings are found.

In general, the former interpretation seems to have little value in terms of wider reuse.

My assumption is that the two properties are provided to allow organisations to just indicate a single specific sighting for this review (appearance) alongside the earliest sighting known to them, without requiring that they also list all other known sightings and their dates.

Full Fact’s profile of ClaimReview is richer. All sightings are always listed within the appearance property. And all sightings have a datePublished. This means that the firstAppearance property is simply the earliest sighting, ordering by datePublished. So in this case it is not adding much value.

My suggestion would be to:

  • order the appearance array by datePublished
  • omit the firstAppearance property as it's unnecessary and is currently unused by Google
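The two suggestions above could be applied as a small transformation step when the JSON-LD is generated. The sketch below is one possible way of doing that, not a description of Full Fact's actual templates. The function name tidy_appearances is hypothetical, and it assumes, as in the Full Fact Profile, that every appearance has a datePublished and that all dates use the same ISO 8601 precision, so they sort correctly as strings.

```python
import copy


def tidy_appearances(claim_review):
    """Return a copy with appearance sorted by datePublished and firstAppearance removed."""
    result = copy.deepcopy(claim_review)  # leave the caller's data untouched
    claim = result.get("itemReviewed")
    if not isinstance(claim, dict):
        return result  # nothing to tidy

    appearances = claim.get("appearance", [])
    if isinstance(appearances, dict):
        appearances = [appearances]  # tolerate a single, non-array value

    # Earliest sighting first, so the first array item plays the role of firstAppearance.
    claim["appearance"] = sorted(appearances, key=lambda a: a["datePublished"])
    claim.pop("firstAppearance", None)
    return result
```

After this step the earliest known sighting is simply the first item of the appearance array.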

How much detail to include in the claimReviewed property?

According to the Google documentation, the claimReviewed property is intended to be

“a short summary of the claim being evaluated”

Schema.org defines it very slightly differently:

“A short summary of the specific claims reviewed in a ClaimReview.”.

One question that came up in discussion is how much text to include?

Google currently recommends that it should contain fewer than 75 characters to display well on mobile devices. Full Fact don’t seem to specify a character limit; the value is just a single sentence expressing the claim.

This usage seems to fit well with how Full Fact publish their fact checks, which clearly display the claims and verdicts alongside the more detailed review. The claim might be a direct quote from the article or a summary of the key claim being made if a suitable quote is not available.

Inclusion of too much content, e.g. larger quotes or extracts from the article, will make the property less useful for data users who are likely to want a reasonably concise version of a claim for display purposes.

The editorial choices that FF make in choosing the quote or creating the summary of the claim add value to the data.

Longer extracts might also run the risk of encountering issues of copyright. While “fair use” will likely apply in most contexts, there may be issues with also distributing the content via an API for reuse in other contexts.

If there was a need to include more detail about the claim, then an alternate approach to adding more content to the claimReviewed property is to add a text property to the Claim:

{
	"@context": "http://schema.org",
	"@type": "ClaimReview",
	"claimReviewed": "Nearly 5 billion plastic straws are currently used in the UK each year.",
	"itemReviewed": {
		"@type": "Claim",
		"text": "The Guardian published a claim that nearly 5 billion plastic straws are currently used in the UK each year.",
		"appearance": [{
			"@type": "CreativeWork",
			"url": "https://www.theguardian.com/environment/2019/may/22/england-plastic-straws-ban",
			"datePublished": "2019-05-22",
			"author": {
				"@type": "Organization",
				"name": "Guardian"
			}
		}],
		"datePublished": "2019-05-22",
		"author": {
			"@type": "Organization",
			"name": "Guardian"
		}
	}
}

The Schema.org documentation for Claim suggests that the text property can be used to provide a summary of the content, providing “enough contextual information to minimize the risk of ambiguity or inclarity”.

In more general use across Schema.org the text property may be used to include the full text of a CreativeWork.

Recommendations:

  • Continue to keep claimReviewed concise to support display of Claims by reusers. The current approach adds value to the data. Reusers wanting more of the source text can find and index this via the provided url of the CreativeWork
  • Consider using the text property to add a longer description of a Claim or excerpt from an article, if required

Extending the Full Fact Profile

Having looked at the current profile of Claim Review used by Google and Full Fact, we can move on to think about how to extend those profiles.

There are several different ways that this can happen:

  • using a broader subset of Schema.org – Schema.org defines other attributes and relationships that could be added to the Full Fact Profile. For example ClaimReview and Claim both extend other classes (Review, CreativeWork, Thing) whose properties could be used. Richer descriptions of sources (CreativeWork) and publishers (Organization, Person) of claims might also be included by adding additional properties
  • using an extended subset of Schema.org – new properties and relationships could be defined by Full Fact as a means of describing information not already covered by Schema.org. These might draw on other aspects of Schema.org, as required.

While in both cases it will be helpful for Full Fact to document the profile they are using, it will be essential in the latter case to ensure that any new vocabulary is properly defined.

An extended subset would focus on decorating the existing Claim Review types with more properties, helping to ensure that the data remains acceptable to Google. It would not change their relationships or basic properties. Any new types would be limited to custom properties.

Full Fact also have the option to continue to expose only the core Claim Review metadata on their web pages (i.e. the existing Full Fact Profile), while offering a richer view of that information via the API. Targeting the data exposed to Google would give further confidence that search indexing would not be impacted by any extensions.
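One way to keep the two views in step is to generate the web page metadata from the richer API record by dropping anything outside the core profile. The following Python sketch illustrates the idea under some assumptions: the function name to_core_profile is hypothetical, the allowed property lists are copied from the Full Fact Profile table earlier in this document, and nodes of any other type (e.g. CreativeWork, Organization) are passed through unchanged.

```python
# Properties allowed on each type in the core (web page) profile,
# taken from the Full Fact Profile table.
CORE_PROFILE = {
    "ClaimReview": {
        "@context", "@type", "author", "claimReviewed", "dateModified",
        "datePublished", "description", "identifier", "itemReviewed",
        "reviewBody", "reviewRating", "url",
    },
    "Claim": {"@type", "appearance", "author", "firstAppearance", "datePublished"},
    "Rating": {"@type", "alternateName", "bestRating", "ratingValue", "worstRating"},
}


def to_core_profile(node):
    """Recursively drop properties that are not part of the core profile."""
    if isinstance(node, list):
        return [to_core_profile(item) for item in node]
    if not isinstance(node, dict):
        return node  # plain values are kept as they are

    type_name = node.get("@type")
    # Types without an entry (or with a non-string @type) are left unfiltered.
    allowed = CORE_PROFILE.get(type_name) if isinstance(type_name, str) else None
    return {
        key: to_core_profile(value)
        for key, value in node.items()
        if allowed is None or key in allowed
    }
```

The API would serve the full record, while the page template would embed to_core_profile(record), so any custom properties never reach the markup that Google indexes.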

Embedded metadata in web pages is largely there to support search indexing and browser-based tooling (e.g. bookmarklets, plugins, etc). Consumers looking to harvest and use FF data in bulk would be directed to use the API rather than crawling and indexing the website content.