[SPIKE] Investigate effects of schema.org section metadata on Google search results
Closed, ResolvedPublic

Description

The goal here is to determine if adding section-level metadata about topics, etc. will have any effect on how our pages are rendered on Google search results.

Rough suggestion of how to proceed:

Note that this spike will help us determine if any information related to sections will help with Google SEO. It will not help us determine if the information we actually obtain from section topics is enough to help with SEO. If this spike is successful, we will need a follow-on spike to use sample blue-link based section topics for a similar experiment.

For reference, here are some links from the Web team's last experiment with schema.org:

Event Timeline

SimoneThisDot subscribed.

Spike

To complete the above spike, I ran tests on both simple and more complex pages, and analysed the schema.org documentation and the suggestions provided by the Google search tool, to identify properties that are simple to add and would increase our SEO visibility.

The aim of this spike was to see whether these properties are worth the work needed to apply them. It does not take into consideration more complex properties that may require markup changes and/or data extraction.

Result

After reading the documentation for a few schema types and properties, I have concluded that adding further metadata to the page would be quite simple and worthwhile. I have created a simple example report (linked in the notes and resources section) that shows the metadata produced by making the following additions to the page.

In the attached report I have made the following changes:

  • Defined a main article
  • Defined sub-headings using "hasPart"
  • Defined thumbnails and images using "image" or "thumbnailUrl"
  • Defined reference links using "isBasedOnUrl"
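
As a rough illustration of the four changes listed above, the following Python sketch assembles the corresponding JSON-LD. The helper name `build_article_jsonld` and its parameters are hypothetical, and the reference URL in the usage is a placeholder rather than one taken from the report; this is not the code that produced the report.

```python
import json


def build_article_jsonld(name, url, section_names, image_url, reference_urls):
    """Assemble a schema.org Article with sections, an image and references.

    Hypothetical helper for illustration only.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Article",  # the main article
        "name": name,
        "url": url,
        # one nested Article per sub-heading
        "hasPart": [
            {"@type": "Article", "name": f"{name} - {section}"}
            for section in section_names
        ],
        "image": image_url,
        # one entry per reference link
        "isBasedOnUrl": list(reference_urls),
    }


doc = build_article_jsonld(
    name="Glastonbury Festival",
    url="https://en.wikipedia.org/wiki/Glastonbury_Festival",
    section_names=["History"],
    image_url="https://upload.wikimedia.org/wikipedia/commons/c/cc/Glastonbury_Tribute.jpg",
    reference_urls=["https://example.org/some-reference"],  # placeholder URL
)
print(json.dumps(doc, indent=4))
```

The printed JSON can be pasted into the Google rich results test to see which of the properties are picked up.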

Conclusion

Due to the scope of the spike, I have not analysed the possible uses of the metadata any further, but the above should be more than enough to show that the investment is worth it (even if we just add "isBasedOnUrl" for all reference links). There are further properties that could be useful in our case but would require more work: for example, setting "inLanguage" to define the languages into which the wiki is translated, adding item-specific metadata (audio, video), or adding even more specific "mediaType" metadata, such as describing an "event" like a concert with its capacity, ticket price, date, or even participants, all of which are available within schema.org.

  1. List of schema.org properties and types used in the research.

Notes and resources

  • The "required" fields from Google did not always match the schema.org object (e.g. for a movie Google requires an image, while schema.org does not)
  • Adding further details may require markup changes (e.g. each individual section may need to be wrapped in a span element to be able to define scope)
    1. Reports

report run on this wiki page: https://en.wikipedia.org/wiki/Glastonbury_Festival

Second Spike Update

The following update is meant to provide some clarity on how a Wikimedia article can be split into multiple sections using the schema.org data structure, and hopefully contribute to the SEO rating of the pages.

For the following example, I have used the same page as the first example: https://en.wikipedia.org/wiki/Glastonbury_Festival
The initial report on the page (includes wikidata schema.org info): https://search.google.com/test/rich-results/result/r%2Farticles?id=m1UlNKeMzHcftxRyDukVAg

What was achieved
I have been able to create a small POC in which the schema.org metadata provides further information on sections and on other items within each section. This is just an example, and the JSON could be expanded to include all the links and sub-sections.

In my example, I have added a section (history) and defined 2 articles (Michael Eavis and National Jazz and Blues Festival) as part of the new section.

From the JSON created, SEOs will now be aware of the "history section" and of the articles/images that are linked within this individual section.

How it was achieved
The schema.org JSON used to generate the attached report is shown below. Part of this schema is already available within the original page; most of the additions are within "hasPart".

{
    "@context": "https:\/\/schema.org",
    "@type": "Article",
    "name": "Glastonbury Festival",
    "identifier": "Glastonbury_Festival",
    "url": "https:\/\/en.wikipedia.org\/wiki\/Glastonbury_Festival",
    "sameAs": "http:\/\/www.wikidata.org\/entity\/Q309066",
    "mainEntity": "http:\/\/www.wikidata.org\/entity\/Q309066",
    "author": {
        "@type": "Organization",
        "name": "Contributors to Wikimedia projects"
    },
    "publisher": {
        "@type": "Organization",
        "name": "Wikimedia Foundation, Inc.",
        "logo": {
            "@type": "ImageObject",
            "url": "https:\/\/www.wikimedia.org\/static\/images\/wmf-hor-googpub.png"
        }
    },
    "hasPart": [
        {
            "@type": "Article",
            "name": "Glastonbury Festival - History",
            "headline": "Glastonbury Festival - History section",
            "identifier": "Glastonbury_Festival--history",
            "mentions": [
                {
                    "@type": "Article",
                    "name": "Michael Eavis",
                    "headline": "Athelstan Joseph Michael Eavis",
                    "url": "https:\/\/en.wikipedia.org\/wiki\/Michael_Eavis",
                    "abstract": " is an English dairy farmer and the co-creator of the Glastonbury Festival, which takes place at his farm in Pilton, Somerset.",  
                    "author": {
                        "@type": "Organization",
                        "name": "Contributors to Wikimedia projects"
                    },
                    "image":"https://upload.wikimedia.org/wikipedia/commons/thu…ael_Eavis_04_-_Glastonbury_Festival_2019_crop.jpg"
                },
                {
                    "@type": "Article",
                    "name": "National Jazz and Blues Festival",
                    "headline": "National Jazz and Blues Festival",
                    "url": "https:\/\/en.wikipedia.org\/wiki\/National_Jazz_and_Blues_Festival",
                    "abstract": "The National Jazz and Blues Festival was the precursor to the Reading Rock Festival and was the brainchild of Harold Pendleton, the founder of the prestigious Marquee Club in Soho.",  
                    "author": {
                        "@type": "Organization",
                        "name": "Contributors to Wikimedia projects"
                    },
                    "image":"https://upload.wikimedia.org/wikipedia/commons/thumb/9/91/National_Jazz_and_Blues_Festival_1975_%28Reading%29_stage.jpg/480px-National_Jazz_and_Blues_Festival_1975_%28Reading%29_stage.jpg"
                }
            ],
            "author": {
                "@type": "Organization",
                "name": "Contributors to Wikimedia projects"
            },
            "image": "\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/d\/d6\/Michael_Eavis_%284742590455%29.jpg\/330px-Michael_Eavis_%284742590455%29.jpg"
        }
    ],
    "datePublished": "2002-09-01T15:04:23Z",
    "dateModified": "2022-03-16T09:30:14Z",
    "image": "https:\/\/upload.wikimedia.org\/wikipedia\/commons\/c\/cc\/Glastonbury_Tribute.jpg",
    "headline": "performing arts festival in Somerset, England"
}

What "issues" were encountered
While developing the POC I encountered the following difficulties or possible blockers:

  • The main "wikidata" schema.org had to be modified to include the new information
  • I tried to create different "schemas" but was not able to use the "identifier" to connect them (further investigation needed to solve this)
  • We need to decide how to define the "links" between the sections. These could be:
    • isPartOf: Indicates an item or CreativeWork that this item, or CreativeWork (in some sense), is part of.
    • isBasedOn: A resource from which this work is derived or from which it is a modification or adaption.
    • mentions: Indicates that the CreativeWork contains a reference to, but is not necessarily about a concept.
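
To compare the three candidates, one option is to generate the same section once per property and run each variant through the rich results test. The sketch below does that in Python; `LINK_PROPERTIES` and `link_article` are hypothetical names, and the sketch makes no claim about which property Google treats best.

```python
# The three candidate properties listed above.
LINK_PROPERTIES = ("isPartOf", "isBasedOn", "mentions")


def link_article(section, prop, name, url):
    """Return a copy of `section` with an Article reference added under `prop`."""
    if prop not in LINK_PROPERTIES:
        raise ValueError(f"unsupported link property: {prop}")
    linked = dict(section)  # shallow copy so the input stays unchanged
    target = {"@type": "Article", "name": name, "url": url}
    linked[prop] = list(section.get(prop, [])) + [target]
    return linked


history = {"@type": "Article", "name": "Glastonbury Festival - History"}

# One variant of the History section per candidate property.
variants = {
    prop: link_article(
        history,
        prop,
        "Michael Eavis",
        "https://en.wikipedia.org/wiki/Michael_Eavis",
    )
    for prop in LINK_PROPERTIES
}
```

Note that the three properties do not mean the same thing (a section "mentions" another article, but "isPartOf" its parent page), so the variants are only useful for comparing how the tool reports each one.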

Reports link

New report: https://search.google.com/test/rich-results/result/r%2Farticles?id=81wcVsQ2UNEd4VMaj1xZFQ

SimoneThisDot added a subscriber: CBogen.

@MarkTraceur @CBogen I have updated the ticket with all the information that you requested.

Please let me know if there is any further information or investigation that you want me to work on before moving this task further.

CBogen added a subscriber: SWakiyama.

Thank you @SimoneThisDot! This is very promising and @SWakiyama is going to build further work on it into the requirements for section topics work we have coming up next FY. For now this spike can be considered complete.

SEOs will now be aware of the "history section" and of the article/images that are linked within this individual sections.

New link since the one above expired: https://search.google.com/test/rich-results/result/r%2Farticles?id=H5aqq5fJKvmAshH7IMQiSg

A couple of screenshots in case the above expires again:

[Four screenshots attached]

Just wanted to mention that Google seems to support dynamically-generated JSON-LD metadata, see https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data#structured-data-format .
This can be either client-side or server-side, see https://developers.google.com/search/docs/appearance/structured-data/generate-structured-data-with-javascript#custom-javascript

As a result, a MediaWiki gadget might be a viable solution to test at scale. I guess this would require an HTTP endpoint to serve section topics data.
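
For the server-side option, a minimal sketch of turning section topics data into an embeddable JSON-LD element could look like the following. The function name `render_jsonld_script` is hypothetical, and how MediaWiki (or a gadget) would actually inject the element into the page is not covered here.

```python
import json


def render_jsonld_script(data):
    """Serialise `data` as a JSON-LD <script> element for embedding in HTML."""
    payload = json.dumps(data, ensure_ascii=False)
    # Escape "</" so a string value cannot close the surrounding <script>
    # element early; "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


tag = render_jsonld_script(
    {
        "@context": "https://schema.org",
        "@type": "Article",
        "name": "Glastonbury Festival",
    }
)
print(tag)
```

A client-side gadget would do the equivalent in JavaScript after fetching the section topics from the HTTP endpoint.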

A big caveat raised by @Cparle is that we need to get the gadget approved by the community: for Google to pick up a page with schema.org markup, it has to be the public version.

Can we clarify exactly what we're going to use to specify section topics?

We'll be using hasPart (or isPartOf) to indicate sections, and about to indicate what the section is about ... but how are we going to identify/describe the topics? Just wikidata ids are probably not going to be useful for search results. I expect that at the very least we'll need the title and the url of the article on the current wiki that corresponds to the wikidata id. Is that what you expect @CBogen ?

So Glastonbury might have this

"hasPart": [
        {
            "@type": "Article",
            "name": "Glastonbury Festival - History",
            "url": "https://en.wikipedia.org/wiki/Glastonbury_Festival#History",
            "about": [
                {
                    "@type": "Article",
                    "name": "National Jazz and Blues Festival",
                    "url": "https:\/\/en.wikipedia.org\/wiki\/National_Jazz_and_Blues_Festival",
                    "mainEntity": "https:\/\/www.wikidata.org\/wiki\/Q3336901"
                },
                { ... }
            ]
        }
    ]
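
A sketch of how an entry like the one above could be generated from section topics data, assuming each topic arrives as a (title, article URL, Wikidata ID) tuple. The function name `section_with_topics` and that input shape are assumptions, and the anchor handling (spaces to underscores) is a simplification of how MediaWiki builds section anchors.

```python
def section_with_topics(page_title, page_url, section, topics):
    """Build one hasPart entry whose `about` lists the section's topics.

    `topics` is an iterable of (title, article_url, wikidata_id) tuples
    (an assumed input shape, not an existing API).
    """
    return {
        "@type": "Article",
        "name": f"{page_title} - {section}",
        # Simplified anchor: spaces become underscores.
        "url": f"{page_url}#{section.replace(' ', '_')}",
        "about": [
            {
                "@type": "Article",
                "name": title,
                "url": article_url,
                "mainEntity": f"https://www.wikidata.org/wiki/{qid}",
            }
            for title, article_url, qid in topics
        ],
    }


entry = section_with_topics(
    "Glastonbury Festival",
    "https://en.wikipedia.org/wiki/Glastonbury_Festival",
    "History",
    [
        (
            "National Jazz and Blues Festival",
            "https://en.wikipedia.org/wiki/National_Jazz_and_Blues_Festival",
            "Q3336901",
        ),
    ],
)
```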

Yes, that looks right to me! @AUgolnikova-WMF do you agree?

BTW - I think we should create a new ticket to account for this work instead of continuing this discussion here. I've created a stub at T318722; I'll let @AUgolnikova-WMF fill it in.