[L] Experiment with adding section information to schema.org metadata
Closed, Resolved (Public)

Description

The goal of this experiment is to test a hypothesis that adding section topics to schema.org metadata on English wiki improves SEO. This work builds upon https://phabricator.wikimedia.org/T302735 where we experimented with manually adding "hasPart" to the schema.org JSON.

This ticket is to create a small fake wiki (on Toolforge, or possibly on its own domain) with a selection of static HTML pages copied from enwiki. The pages have section topic data added to their schema.org data, and their links are modified so that they link to one another. We want 10 or 20 pages with a selection of section topics. The extra data will be hasPart (or isPartOf) to indicate sections, and about to indicate what each section is about. At the very least we'll need the title and the URL of the article on the current wiki that corresponds to the Wikidata ID.

So the Glastonbury Festival page might have this:

"hasPart": [
        {
            "@type": "Article",
            "name": "Glastonbury Festival - History",
            "url": "https://en.wikipedia.org/wiki/Glastonbury_Festival#History",
            "about": [
                {
                    "@type": "Article",
                    "name": "National Jazz and Blues Festival",
                    "url": "https://en.wikipedia.org/wiki/National_Jazz_and_Blues_Festival",
                    "mainEntity": "https://www.wikidata.org/wiki/Q3336901"
                },
                { ... }
            ]
        }
    ]
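
A minimal sketch of how such a hasPart entry could be generated for each section, assuming the section name, article URL and topic list are already available. The function name `section_part` and the `(name, url, qid)` tuple layout are hypothetical and only serve the illustration; the property names follow the example above.

```python
import json

def section_part(article_title, article_url, section, topics):
    """Build one hasPart entry for a section.

    topics is a list of (name, url, qid) tuples (hypothetical layout).
    """
    return {
        "@type": "Article",
        "name": f"{article_title} - {section}",
        # Section anchors on the wiki use underscores instead of spaces.
        "url": f"{article_url}#{section.replace(' ', '_')}",
        "about": [
            {
                "@type": "Article",
                "name": name,
                "url": url,
                "mainEntity": f"https://www.wikidata.org/wiki/{qid}",
            }
            for name, url, qid in topics
        ],
    }

part = section_part(
    "Glastonbury Festival",
    "https://en.wikipedia.org/wiki/Glastonbury_Festival",
    "History",
    [(
        "National Jazz and Blues Festival",
        "https://en.wikipedia.org/wiki/National_Jazz_and_Blues_Festival",
        "Q3336901",
    )],
)
print(json.dumps({"hasPart": [part]}, indent=4))
```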

Ultimately, what we're trying to do is see whether Google uses the schema.org data to enhance search results. Once the fake wiki is up, submit it to Google for indexing, and once it's indexed, try searching it with site: and see what the search results look like.

We hope to see some of the following changes for a sample of modified articles/sections:

  • We can link sections to articles so that when people type in a section related topic, they see the article with a link to that section come up in the results.
  • Searchers can see visual representations of headings of sections when searching for a section related topic

Also:

  • Consider experimenting with defining different schema.org variables to understand how Google is using them and where it helps to show blocks of information related to articles/sections when searching for them.

If the search results are affected positively, then we'll consider doing this for real on a live wiki.

See also T319417: Top article-level topics with schema.org markup

Update

The following table shows the experimental wiki pages together with their Google rich snippet test results:

Pasted from T318722#8534255 for more visibility.

Event Timeline

Cparle updated the task description. (Show Details)
Cparle added a subscriber: AUgolnikova-WMF.
Cparle subscribed.

I think we should model this task and T319417: Top article-level topics with schema.org markup in the same way, i.e. pick the same property, either about or keywords.
I'd opt for the latter for the following reasons:

  1. it's quite clear it accepts a list of values, while about is vague and looks to me more intended for a single value
  2. it's more concise and easier to implement, we just need to add the Wikidata QID URL

Still, nothing seems to prevent us from using about with a list, and we don't know how the results will be rendered anyway.
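
To make the two options concrete, here is a rough sketch of what each variant could look like for one section. The helper names are hypothetical, and the shape of the about objects (a `Thing` with a `url`) is an assumption for illustration, not something agreed in this ticket.

```python
import json

WIKIDATA = "https://www.wikidata.org/wiki/"

def with_keywords(section, qids):
    """Variant A: keywords as a flat list of Wikidata QID URLs."""
    return {**section, "keywords": [WIKIDATA + qid for qid in qids]}

def with_about(section, qids):
    """Variant B: about as a list of objects, one per topic (assumed shape)."""
    return {
        **section,
        "about": [{"@type": "Thing", "url": WIKIDATA + qid} for qid in qids],
    }

section = {
    "@type": "Article",
    "name": "Glastonbury Festival - History",
    "url": "https://en.wikipedia.org/wiki/Glastonbury_Festival#History",
}
qids = ["Q3336901"]

print(json.dumps(with_keywords(section, qids), indent=4))
print(json.dumps(with_about(section, qids), indent=4))
```

The keywords variant only needs the QID, while the about variant leaves room for a name and article URL per topic if we want them later.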

MarkTraceur renamed this task from Experiment with using section topics pipeline to add section information to schema.org metadata to [L] Experiment with using section topics pipeline to add section information to schema.org metadata. Nov 16 2022, 5:42 PM
Cparle renamed this task from [L] Experiment with using section topics pipeline to add section information to schema.org metadata to [L] Experiment with adding section information to schema.org metadata. Nov 17 2022, 4:09 PM

https://section-topics-schema.toolforge.org/

All pages have schema.org information, and the following pages have additional schema.org information about some of their sections:

https://section-topics-schema.toolforge.org/wiki/Republican_Party_(United_States).htm
https://section-topics-schema.toolforge.org/wiki/United_States_House_of_Representatives.htm
https://section-topics-schema.toolforge.org/wiki/2022_United_States_House_of_Representatives_elections.htm
https://section-topics-schema.toolforge.org/wiki/NASA.htm
https://section-topics-schema.toolforge.org/wiki/Artemis_program.htm
https://section-topics-schema.toolforge.org/wiki/Moon.htm

I tried to test the first page URL via https://search.google.com/test/rich-results > URL tab: I think that robots.txt needs to be fixed before submission for indexing, see here.
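
For a quick local check before resubmitting, something like the following could tell whether a given robots.txt lets a crawler fetch a page. This is only a sketch using Python's standard `urllib.robotparser`; the two robots.txt strings are hypothetical examples, since the actual file content is not shown in this ticket.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text allows agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Hypothetical robots.txt contents, for illustration only.
blocking = "User-agent: *\nDisallow: /\n"
allowing = "User-agent: *\nDisallow:\n"

page = "https://section-topics-schema.toolforge.org/wiki/NASA.htm"
print(can_crawl(blocking, page))  # False
print(can_crawl(allowing, page))  # True
```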

Then, I tested the JSON-LD <script> HTML tag of every page. They all look good except NASA, which has NASA Facilities linked to https://cormacparle.org/wiki/NASA_facilities.htm instead of https://section-topics-schema.toolforge.org/wiki/NASA_facilities.htm.

Other minor comments:

  • no need for escaping slashes
  • add headline properties to every section and section topic
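
The kind of check that caught the NASA link could be automated roughly as below: extract every JSON-LD `<script>` block from a page and list the `url` values that do not point at the expected host. This is a sketch with hypothetical helper names and a made-up sample page; it flags any foreign host, so URLs that are meant to point elsewhere would need to be allowed explicitly. The sample also shows that escaped slashes parse to the same value as unescaped ones.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlparse

class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buf)))
            self._in_jsonld = False

def iter_urls(node):
    """Yield every string stored under a "url" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                yield value
            else:
                yield from iter_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_urls(item)

def foreign_urls(html, expected_host):
    """Return the JSON-LD url values whose host differs from expected_host."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [
        url
        for block in parser.blocks
        for url in iter_urls(block)
        if urlparse(url).netloc != expected_host
    ]

# Made-up sample page; the first URL uses escaped slashes on purpose.
sample = r'''<html><head><script type="application/ld+json">
{"@type": "Article",
 "url": "https:\/\/section-topics-schema.toolforge.org\/wiki\/NASA.htm",
 "hasPart": [{"@type": "Article",
              "url": "https://cormacparle.org/wiki/NASA_facilities.htm"}]}
</script></head><body></body></html>'''

print(foreign_urls(sample, "section-topics-schema.toolforge.org"))
# ['https://cormacparle.org/wiki/NASA_facilities.htm']
```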

Links:

NOTE: I suggest also experimenting with the keywords property (see T318722#8320332) after the about one.

Thanks for integrating my feedback, @Cparle:

  • all pages can now be crawled, so robots.txt looks fixed
  • in 4 out of 6 pages, other microformats are embedded into tables. See a valid hcard here and an invalid vevent here. Not sure whether the invalid ones could affect the final rendering

Here's the updated test:

Thanks @Cparle and @mfossati! Is the next step to wait (a week? two weeks?) and then look for these pages on Google to see if they've been indexed?

@CBogen the next step is to trigger explicit indexing through https://search.google.com/search-console/welcome?action=inspect, see also https://developers.google.com/search/docs/crawling-indexing/ask-google-to-recrawl.
I don't know how long this will take: I think we'll have to regularly run searches on Google with the site:section-topics-schema.toolforge.org prefix to check that.

Got it, thanks. Can we trigger the explicit indexing today? I'm happy to take on running searches regularly to check.

Yep, done

CBogen added a subscriber: Etonkovidova.

Excellent, thanks! I'm moving this to Needs QA, but @Etonkovidova don't worry about checking this one. I'll check daily to see if it's been indexed and update this ticket if I find anything.

Moving back to Ready for Development to submit this for indexing to additional search engines (Bing, DuckDuckGo).

mfossati changed the task status from Open to In Progress. Apr 5 2023, 8:52 AM
mfossati claimed this task.

After a quick investigation, here are a few findings:

  • While Google didn't, DuckDuckGo and Bing did index section-topics-schema.toolforge.org, but only one page, see DuckDuckGo query and Bing query
  • I couldn't retrieve any documentation about schema.org at DuckDuckGo
  • Bing provides support for schema.org markup, although I couldn't find our specific use case
  • moreover, Bing is already offering a super rich result for the regular English Wikipedia, see query

Thanks @mfossati! Thanks for reminding me that Bing already shows rich results with sections; this was something we discovered a while back. I had talked to Partnerships about trying to set up a meeting with Bing to discuss how they are creating those rich snippets, but we weren't able to make it happen unfortunately.

I think this work is sufficient for the grant so I am going to go ahead and close this task. I have what I need from these results and discussion in the ticket to report on the results to Sloan.