sameAs schema doesn't report dateModified if article has only one edit
Open, Low, Public

Description

When you visit a page that has only one edit, the sameAs schema doesn't include dateModified, because the article was only created and never modified.
The Google testing tool says that dateModified is a recommended field, but:

  • it also says that the dateModified should be more recent than datePublished
  • there is no mention of what to do if there is no modification date (only a creation date).

Please verify what to do in this scenario, and either:

  • add dateModified with the same value as datePublished, or
  • leave a note that in this case dateModified is not necessary
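
For illustration, the first option could look roughly like the sketch below. This is not the extension's actual code: buildSchemaDates() and toIso8601() are hypothetical helpers, and the timestamps are assumed to be 14-digit MediaWiki-style strings (YmdHis, UTC) or null when unknown.

```php
<?php
/**
 * Hypothetical helper: convert a 14-digit YmdHis timestamp (assumed UTC)
 * to an ISO 8601 string such as 2018-11-15T22:37:00Z.
 */
function toIso8601( string $ts ): string {
	$dt = DateTimeImmutable::createFromFormat( 'YmdHis', $ts, new DateTimeZone( 'UTC' ) );
	if ( $dt === false ) {
		throw new InvalidArgumentException( "Unparseable timestamp: $ts" );
	}
	return $dt->format( 'Y-m-d\TH:i:s\Z' );
}

/**
 * Hypothetical helper: build only the date fields of the schema.
 * When no modification timestamp is available, dateModified falls back
 * to the creation timestamp, so both fields carry the same value.
 */
function buildSchemaDates( ?string $firstRevTs, ?string $lastRevTs ): array {
	$schema = [];
	if ( $firstRevTs !== null ) {
		$schema['datePublished'] = toIso8601( $firstRevTs );
	}
	// Fallback: a page with a single edit was "last modified" when it was created.
	$modifiedTs = $lastRevTs ?? $firstRevTs;
	if ( $modifiedTs !== null ) {
		$schema['dateModified'] = toIso8601( $modifiedTs );
	}
	return $schema;
}
```

With this approach dateModified is never older than datePublished, but it can be equal to it, so whether that satisfies the "more recent than" wording still needs to be verified.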

Also, for getEarliestRevTime(), do we need to check that it exists? E.g.:

$firstRevisionTimestamp = $title->getEarliestRevTime();
if ( $firstRevisionTimestamp ) {
  $schema['datePublished'] = wfTimestamp( TS_ISO_8601, $firstRevisionTimestamp );
}

Event Timeline

ovasileva triaged this task as Medium priority. Nov 15 2018, 10:37 PM

Here's an odd example: https://pl.wikipedia.org/wiki/Shin_Seung-hun. When I'm logged in, I get dateModified. When I'm anonymous, I don't.

I believe this issue is occurring for all pages on Polish Wikipedia. /cc @pmiazga @ovasileva @Tbayer

I did some digging on this task during today's chores. Here's what I think I know:

  • We create the dateModified property using the server-local timestamp of the last revision (herein the "timestamp"; see here)
  • We might not have that timestamp?!
  • We get the timestamp from OutputPage#getRevisionTimestamp (see here)
  • The timestamp returned by that method is set by OutputPage#setRevisionTimestamp but there's no fallback if the value is never set
  • OutputPage#setRevisionTimestamp is invoked (in this context):
    • When there's a parser cache hit and the cached parser output has a timestamp (see here)
    • When there's a parser cache miss (see here)
      • The comment on that line doesn't make sense as WikiPage#getTimestamp might hit the DB to hydrate itself

AFAICT we're seeing the result of a parser cache hit where the cached parser output doesn't have a timestamp.

My recommendation would be to change this line to:

$out->getWikiPage()->getTimestamp()

Or reach out on wikitech about why the above circumstance may occur.
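
A more defensive variant of that recommendation would keep the value from OutputPage#getRevisionTimestamp when it is set and only fall back to WikiPage#getTimestamp otherwise. The sketch below models this in plain PHP; resolveModifiedTimestamp() is a hypothetical helper, the first argument stands in for the OutputPage value (assumed to be null when never set), and the callable stands in for the WikiPage lookup.

```php
<?php
/**
 * Hypothetical helper: pick the timestamp to use for dateModified.
 *
 * @param string|null $fromOutputPage Stand-in for OutputPage#getRevisionTimestamp;
 *   assumed to be null (or empty) when setRevisionTimestamp was never called.
 * @param callable $fromWikiPage Stand-in for WikiPage#getTimestamp. It is only
 *   invoked when needed, because the real method might hit the DB.
 * @return string|null The resolved timestamp, or null if neither source has one.
 */
function resolveModifiedTimestamp( ?string $fromOutputPage, callable $fromWikiPage ): ?string {
	if ( $fromOutputPage !== null && $fromOutputPage !== '' ) {
		return $fromOutputPage;
	}
	$ts = $fromWikiPage();
	// Normalise empty or false results to null so callers can skip the field.
	return $ts ?: null;
}
```

In the real code the call would presumably be something like passing $out->getRevisionTimestamp() and a closure around $out->getWikiPage()->getTimestamp(), but that wiring is an assumption and untested here.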

Jdlrobson lowered the priority of this task from Medium to Low. Jul 31 2019, 8:02 PM
ovasileva raised the priority of this task from Low to Medium. Oct 15 2019, 4:32 PM
ovasileva lowered the priority of this task from Medium to Low. Oct 22 2020, 3:34 PM