QA page schemas
Closed, Resolved · Public

Description

We've made a significant change to add article schemas to many pages across wikis. Given the scope of this change, the possible impact on third parties, and other factors, it is worth investing explicitly in additional team QA. This task tracks that work.

Per T208763, schemas have been enabled for 25% of pages. Verify:

  • Approximately 75% of main namespace pages are unaffected and do not have the schema.
  • Schemas are enabled for bucketed main namespace pages.
  • Schemas are enabled even on the main page (if bucketed).
  • Schemas are not enabled for other namespaces such as talk and user.
  • Schemas are understood by https://search.google.com/structured-data/testing-tool/u/0/. When all data is available for a given page, there should be no errors reported. When data is unavailable, such as short description or page image, the linkage should still be presented but with missing data omitted.
  • The linked schema data appears correct for pages in the new treatment. E.g., the headline should match the associated Q item's short description, and the last-modified and first-published timestamps should match the page.
  • Messages from T207790 are used in the data.
  • No new logstash or client errors occur.
  • If an article image is referenced, it matches the page image (shown under page info and in the og:image meta HTML tag property).
  • The data is valid JSON-LD as verified by https://json-ld.org/playground-dev/.
  • The data is linked properly as verified by http://linter.structured-data.org. E.g. http://linter.structured-data.org/?url=https:%2F%2Fde.wikipedia.org%2Fapi%2Frest_v1%2Fpage%2Fhtml%2FDouglas_Adams.
  • At least 5 examples of the new treatment from the beta cluster are recorded as comments on this task (see examples below for format expected).
  • The new HTML script tag appears towards the bottom of the page, not the top, and otherwise doesn't alter the HTML.
  • Try to think of other things to check or try to break it locally.
  • All of the above are tested on both en and non-en beta clusters. E.g., https://simple.wikipedia.beta.wmflabs.org/wiki/Main_Page or https://de.wikipedia.beta.wmflabs.org/wiki/Main_Page.
  • All of the above are tested on both mobile and non-mobile sites.
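
Several of the checks above (presence of the schema, valid JSON, placement near the bottom of the page) can be spot-checked from the raw page source. The following is a minimal sketch, not the team's tooling: `extractJsonLd` and `sampleHtml` are hypothetical names, the sample page is made up, and a regular expression is only a rough stand-in for a real HTML parser.

```javascript
// Hypothetical helper (not part of the code under test): collect every
// JSON-LD block from a raw HTML string along with where it sits in the page.
function extractJsonLd(html) {
  const pattern =
    /<script\b[^>]*type="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/gi;
  const found = [];
  for (const match of html.matchAll(pattern)) {
    found.push({
      // JSON.parse throws on malformed data, which is itself a QA failure.
      data: JSON.parse(match[1]),
      // Relative offset of the tag: 0 is the start of the page, 1 the end.
      position: match.index / html.length,
    });
  }
  return found;
}

// Made-up page for illustration; real input would be the fetched page source.
const sampleHtml =
  "<html><head><title>Kitten</title></head><body><p>" +
  "x".repeat(500) +
  '</p><script type="application/ld+json">' +
  '{"@context":"https:\\/\\/schema.org","@type":"Article","name":"Kitten"}' +
  "</script></body></html>";

for (const { data, position } of extractJsonLd(sampleHtml)) {
  const where = position > 0.5 ? "near bottom" : "near top";
  console.log(data["@type"], data.name, where);
}
```

An empty result for a page in a non-main namespace, or for a page outside the bucket, would be the expected outcome under the criteria above.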

Examples of the new treatment are:

<script type="application/ld+json">{"@context":"https:\/\/schema.org","@type":"Article","name":"Kitten","url":"https:\/\/en.wikipedia.org\/wiki\/Kitten","sameAs":"https:\/\/www.wikidata.org\/entity\/Q147","mainEntity":"https:\/\/www.wikidata.org\/entity\/Q147","author":{"@type":"Organization","name":"Contributors to Wikimedia projects"},"publisher":{"@type":"Organization","name":"Wikimedia Foundation, Inc.","logo":{"@type":"ImageObject","url":"https:\/\/www.wikimedia.org\/static\/images\/wmf-hor-googpub.png"}},"datePublished":"2002-07-31T13:37:08Z","dateModified":"2018-10-08T14:18:31Z","image":"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/0\/06\/Kitten_in_Rizal_Park%2C_Manila.jpg\/1200px-Kitten_in_Rizal_Park%2C_Manila.jpg","headline":"young of a cat"}</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "name": "Douglas Adams",
  "url": "https://de.wikipedia.org/wiki/Douglas_Adams",
  "sameAs": "https://www.wikidata.org/entity/Q42",
  "mainEntity": "https://www.wikidata.org/entity/Q42",
  "author": {
    "@type": "Organization",
    "name": "Contributors to Wikimedia projects"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Wikimedia Foundation, Inc.",
    "logo": {
      "@type": "ImageObject",
      "url": "https://www.wikimedia.org/static/images/wmf-hor-googpub.png"
    }
  },
  "datePublished": "2002-05-27T18:26:23Z",
  "dateModified": "2018-09-28T20:16:12Z",
  "image": "https://upload.wikimedia.org/wikipedia/commons/c/c0/Douglas_adams_portrait_cropped.jpg",
  "headline": "British author and humorist (1952–2001)"
}
</script>
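
As a rough aid for reviewing examples like the two above, the sketch below checks a parsed schema object for the properties both examples share. The rules are assumptions inferred from those examples (for instance, that `sameAs` and `mainEntity` hold a Wikidata entity URL when present, and that `image` and `headline` may be omitted); `checkArticleSchema` is a hypothetical name and does not reflect the actual implementation.

```javascript
// Hypothetical checker: the rules below are inferred from the two examples
// above, not taken from the implementation.
const WIKIDATA_ENTITY = /^https:\/\/www\.wikidata\.org\/entity\/Q\d+$/;

function checkArticleSchema(schema) {
  const problems = [];
  if (schema["@context"] !== "https://schema.org") problems.push("unexpected @context");
  if (schema["@type"] !== "Article") problems.push("unexpected @type");

  // Assumed to be always present; image and headline may be omitted.
  for (const key of ["name", "url", "datePublished", "dateModified"]) {
    if (typeof schema[key] !== "string" || schema[key] === "") {
      problems.push(`missing ${key}`);
    }
  }

  // Timestamps must parse, and a page cannot be modified before it exists.
  const published = Date.parse(schema.datePublished);
  const modified = Date.parse(schema.dateModified);
  if (Number.isNaN(published) || Number.isNaN(modified)) {
    problems.push("unparseable timestamp");
  } else if (modified < published) {
    problems.push("dateModified precedes datePublished");
  }

  // Assumption: when a Wikidata item is linked, both properties carry it.
  for (const key of ["sameAs", "mainEntity"]) {
    if (key in schema && !WIKIDATA_ENTITY.test(schema[key])) {
      problems.push(`${key} is not a Wikidata entity URL`);
    }
  }
  return problems;
}

// Trimmed copy of the Kitten example above.
const kitten = {
  "@context": "https://schema.org",
  "@type": "Article",
  name: "Kitten",
  url: "https://en.wikipedia.org/wiki/Kitten",
  sameAs: "https://www.wikidata.org/entity/Q147",
  mainEntity: "https://www.wikidata.org/entity/Q147",
  datePublished: "2002-07-31T13:37:08Z",
  dateModified: "2018-10-08T14:18:31Z",
  headline: "young of a cat",
};

console.log(checkArticleSchema(kitten)); // → []
```

This does not replace the external validators listed above; it only catches obvious structural slips before a page is pasted into them.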

Acceptance criteria

  • A point person / hero / steward has taken ownership of the overall testing
  • The point person has ensured that several people (other than themselves) have looked at the task
  • The point person should tick off all the test steps above that have been adequately tested
  • All test steps related to the schema being enabled have been performed
  • We have confirmed that no errors are occurring in logstash
  • We have confirmed that the A/B sampling is working correctly.

Event Timeline

Restricted Application changed the subtype of this task from "Deadline" to "Task". · Nov 5 2018, 7:49 PM
Restricted Application added a subscriber: Aklapper.

Guessing this should be a "spike" type card with a timebox. How long do we want to spend QAing this, @ovasileva (given it's likely going to be developers doing this)?

ovasileva raised the priority of this task from Medium to High. · Nov 5 2018, 10:33 PM

@Jdlrobson - Would this be a rough estimate of checking the things on the list? Or do you mean that we don't necessarily need to look at all of them?

The more testing we do the better, but we should set a timebox to make sure we don't spend too long on QA. The timebox would be how much time we want to spend testing this before we feel ready to ship it.

The linter for World article (http://linter.structured-data.org/?url=https:%2F%2Fen.wikipedia.beta.wmflabs.org%2Fwiki%2FWorld) returns an error message:

property og:image: Object "https://upload.wikimedia.org/wikipedia/commons/thumb/9/97/The_Earth_seen_from_Apollo_17.jpg/1200px-The_Earth_seen_from_Apollo_17.jpg"@en not compatible with range (ogc:url)

The error is related to the og:image tag, not to the schema itself, but I found that the og:image tag points to the image thumbnail (https://upload.wikimedia.org/wikipedia/commons/thumb/9/97/The_Earth_seen_from_Apollo_17.jpg/1200px-The_Earth_seen_from_Apollo_17.jpg), whereas the schema points to the image itself (https://upload.wikimedia.org/wikipedia/commons/9/97/The_Earth_seen_from_Apollo_17.jpg).

I think this is correct behavior; I just wanted to highlight it.
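
The two URLs quoted above differ only in the `thumb` path segment and the trailing size-prefixed file name, so a thumbnail and an original can still be compared as "the same page image". A minimal sketch, assuming that URL layout holds in general; `originalFromThumb` and `sameImage` are hypothetical helper names:

```javascript
// Hypothetical helper: map a thumbnail URL of the form
//   .../commons/thumb/9/97/File.jpg/1200px-File.jpg
// to the original file URL
//   .../commons/9/97/File.jpg
// The layout is assumed from the two URLs quoted in the comment above.
function originalFromThumb(url) {
  const parsed = new URL(url);
  const parts = parsed.pathname.split("/");
  const thumbIndex = parts.indexOf("thumb");
  if (thumbIndex === -1) return url; // already an original
  parts.splice(thumbIndex, 1); // drop the "thumb" segment
  parts.pop(); // drop the trailing "1200px-File.jpg" segment
  parsed.pathname = parts.join("/");
  return parsed.toString();
}

// Two image URLs refer to the same file if their originals match.
function sameImage(a, b) {
  return originalFromThumb(a) === originalFromThumb(b);
}

const ogImage =
  "https://upload.wikimedia.org/wikipedia/commons/thumb/9/97/The_Earth_seen_from_Apollo_17.jpg/1200px-The_Earth_seen_from_Apollo_17.jpg";
const schemaImage =
  "https://upload.wikimedia.org/wikipedia/commons/9/97/The_Earth_seen_from_Apollo_17.jpg";

console.log(sameImage(ogImage, schemaImage)); // → true
```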

An example article for dewiki on betacluster - https://de.wikipedia.beta.wmflabs.org/wiki/Erde
I couldn't find any article on simplewiki that has the schema.

Just to record some of my own testing, these are some pages I'm checking out.

Not sampled or control (document.querySelector('[type*="ld+json"]')):

https://en.m.wikipedia.beta.wmflabs.org/wiki/Main_Page
https://en.m.wikipedia.beta.wmflabs.org/wiki/Barack_Obama
https://en.m.wikipedia.beta.wmflabs.org/wiki/Vice_President_of_the_United_States
https://en.m.wikipedia.beta.wmflabs.org/wiki/President_of_the_United_States
https://en.m.wikipedia.beta.wmflabs.org/wiki/Jimmy_Jam_and_Terry_Lewis
https://en.m.wikipedia.beta.wmflabs.org/wiki/Ronnie_and_Donnie_Galyon
https://en.m.wikipedia.beta.wmflabs.org/wiki/Genova_%26_Dimitrov
https://en.m.wikipedia.beta.wmflabs.org/wiki/Bruce_Fogle
https://en.m.wikipedia.beta.wmflabs.org/wiki/Giannini_sextuplets
https://en.m.wikipedia.beta.wmflabs.org/wiki/Anthony_Field
https://en.m.wikipedia.beta.wmflabs.org/wiki/Prince_Felix_of_Denmark
https://en.m.wikipedia.beta.wmflabs.org/wiki/Gundecha_Brothers
https://en.m.wikipedia.beta.wmflabs.org/wiki/John_Duffy_and_David_Mulcahy
https://en.m.wikipedia.beta.wmflabs.org/wiki/Brian_Draper_and_Torey_Adamcik
https://en.m.wikipedia.beta.wmflabs.org/wiki/Abigail_and_Brittany_Hensel
https://en.m.wikipedia.beta.wmflabs.org/wiki/Doggie_(artist)
https://en.m.wikipedia.beta.wmflabs.org/wiki/Timeline_of_the_presidency_of_Barack_Obama_(2010)
https://en.m.wikipedia.beta.wmflabs.org/wiki/The_Wachowskis
https://en.m.wikipedia.beta.wmflabs.org/wiki/The_Usos
https://en.m.wikipedia.beta.wmflabs.org/wiki/Cryme_Tyme
https://en.m.wikipedia.beta.wmflabs.org/wiki/Carl_Sturken_and_Evan_Rogers
https://en.m.wikipedia.beta.wmflabs.org/wiki/Cleoparta_Stratan
https://en.m.wikipedia.beta.wmflabs.org/wiki/Conny_and_Johanna_Strandberg
https://en.m.wikipedia.beta.wmflabs.org/wiki/Ringo_Starr
https://en.m.wikipedia.beta.wmflabs.org/wiki/Sherman_Brothers
https://en.m.wikipedia.beta.wmflabs.org/wiki/Kristina_and_Karissa_Shannon
https://en.m.wikipedia.beta.wmflabs.org/wiki/Lori_and_George_Schappell
https://en.m.wikipedia.beta.wmflabs.org/wiki/Jay_Allen_Sanford
https://en.m.wikipedia.beta.wmflabs.org/wiki/Fernando_and_Nefty_Sallaberry
https://en.m.wikipedia.beta.wmflabs.org/wiki/Rosenkowitz_sextuplets
https://en.m.wikipedia.beta.wmflabs.org/wiki/United_States_House_of_Representatives

In the new treatment (document.querySelector('body > [type*="ld+json"]')):

https://en.m.wikipedia.beta.wmflabs.org/wiki/List_of_Presidents_of_the_United_States
https://en.m.wikipedia.beta.wmflabs.org/wiki/Genain_quadruplets
https://en.m.wikipedia.beta.wmflabs.org/wiki/Hanselman_sextuplets
https://en.m.wikipedia.beta.wmflabs.org/wiki/Madeline_Harper
https://en.m.wikipedia.beta.wmflabs.org/wiki/Joe_Berlinger_and_Bruce_Sinofsky
https://en.m.wikipedia.beta.wmflabs.org/wiki/Skai_Jackson
https://en.m.wikipedia.beta.wmflabs.org/wiki/Gwendolyn_Graham_and_Cathy_Wood
https://en.m.wikipedia.beta.wmflabs.org/wiki/Chad_Griffin
https://en.m.wikipedia.beta.wmflabs.org/wiki/Jeff_Fatt
https://en.m.wikipedia.beta.wmflabs.org/wiki/Neil_young
https://en.m.wikipedia.beta.wmflabs.org/wiki/Maddie_Ziegler
https://en.m.wikipedia.beta.wmflabs.org/wiki/Michael_Yarmush
https://en.m.wikipedia.beta.wmflabs.org/wiki/Phillip_Wilcher
https://en.m.wikipedia.beta.wmflabs.org/wiki/Duo_Tal_%26_Groethuysen
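
The two lists above contain 31 pages without the schema and 14 pages with it. As a rough sanity check of the 25% bucketing, the observed share can be compared with the expected rate using a normal approximation to the binomial. This is a sketch, not the method used on this task: `sampleRateZScore` is a hypothetical name, the sample is small, and the pages were picked by hand rather than at random, so the result is only indicative.

```javascript
// Hypothetical helper: z-score of an observed treatment share against an
// expected sampling rate, using the normal approximation to the binomial.
function sampleRateZScore(treated, total, expectedRate) {
  const observed = treated / total;
  const standardError = Math.sqrt((expectedRate * (1 - expectedRate)) / total);
  return { observed, z: (observed - expectedRate) / standardError };
}

// Counts taken from the two lists above.
const control = 31;
const treatment = 14;

const result = sampleRateZScore(treatment, control + treatment, 0.25);
console.log(result.observed.toFixed(3), result.z.toFixed(2)); // → 0.311 0.95
```

A |z| below roughly 2 means the 14 of 45 split is consistent with a 25% sampling rate; it does not prove the rate is exactly 25%.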

The only remaining question is where the URL property should point on the mobile domain. Most probably it should be the canonical URL (that is, it should point to the desktop version), but I'm not 100% sure. I asked a question on Stack Overflow, as I couldn't find the answer in the schema.org documentation.

The URL property in the Article schema should point to the canonical URL, so our implementation is correct. This task is done.
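
To check this on mobile pages, the schema's `url` can be compared with the desktop form of the page address. The sketch below assumes the mobile host differs from the desktop host only by an `m` label after the language code, as in the beta cluster URLs listed earlier; `toDesktopUrl` and `urlIsCanonical` are hypothetical names, and the real canonical URL should be read from the page rather than derived this way.

```javascript
// Hypothetical helper: derive the desktop URL from a mobile one by dropping
// the "m" host label (en.m.wikipedia... -> en.wikipedia...). Assumes the
// label is always second, as in the URLs recorded on this task.
function toDesktopUrl(url) {
  const parsed = new URL(url);
  const labels = parsed.hostname.split(".");
  if (labels[1] === "m") labels.splice(1, 1);
  parsed.hostname = labels.join(".");
  return parsed.toString();
}

// True when the schema's url property equals the desktop form of the page URL.
function urlIsCanonical(schema, pageUrl) {
  return schema.url === toDesktopUrl(pageUrl);
}

const mobilePage = "https://en.m.wikipedia.beta.wmflabs.org/wiki/Neil_young";
const schemaFromPage = {
  url: "https://en.wikipedia.beta.wmflabs.org/wiki/Neil_young", // illustrative value
};

console.log(urlIsCanonical(schemaFromPage, mobilePage)); // → true
```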

pmiazga removed pmiazga as the assignee of this task.
pmiazga subscribed.