Maniphest T198970

Epic: Implement SEO improvements suggested by Go Fish Digital
Closed, Resolved · Public

Assigned To

Authored By

	• Deskana
	Jul 6 2018, 1:25 PM

Description

I've been working with Go Fish Digital to figure out how the search engine optimisation of the Wikimedia wikis could be improved. The outcome of the project included a long list of recommendations which, if implemented, would likely improve the search result rankings of our sites.

This task is an epic that collects all the recommendations provided by Go Fish Digital; implementing them should improve our rankings.

Related Objects

Status	Subtype	Assigned	Task
Resolved		Krinkle	T198970 Epic: Implement SEO improvements suggested by Go Fish Digital
Invalid		None	T198969 Ensure links on the mobile version of pages are not to the desktop version
Invalid		None	T198965 Create XML sitemaps so search engine crawlers can crawl more effectively
Resolved		mpopov	T202643 Determine if creation of Italian Wikipedia sitemaps increased traffic from search engines
Resolved		• Imarlier	T205495 Enable $wgMFNoindexPages for beta
Resolved		• Imarlier	T206496 Create sitemaps for Indonesian, Portuguese, Punjabi, Dutch, and Korean Wikipedias
Resolved		mpopov	T209720 Determine impact of sitemaps on search traffic to Indonesian, Portuguese, Punjabi, Dutch, and Korean Wikipedias
Declined		None	T198963 Add "Did you know..." and "On this day" sections from desktop main page to mobile main page
Invalid		None	T198949 Add navbox links to mobile page HTML
Resolved		ovasileva	T198947 There should not be multiple h1 tags on mobile page HTML: Restructure mobile web header for SEO and accessibility
Resolved		ovasileva	T209306 [Epic] [SEO] Enable Schema.org Article linked data for all main namespace pages
Resolved		pmiazga	T209352 [Spike 2hrs] BetaCluster incorrectly points SameAS to production Wikidata
Open		None	T209410 sameAs schema doesn't report dateModified if article has only one edit
Resolved		ovasileva	T209377 Remove A/B test and launch to 100%
Resolved		mpopov	T209891 Analyze results of sameAs A/B test
Resolved		ovasileva	T208755 Launch A/B test for sameAs property
Resolved		ovasileva	T206868 [Spike 24hrs] How do we measure the effects of the sameAs property on pageviews using an A/B test
Resolved		None	T198946 Add Schema property 'sameAs' pointing to Wikidata entries
Resolved		Jdlrobson	T204070 [Spike, 8hrs] Where should Schema property 'sameAs' live?
Resolved		None	T207790 Add Wikibase page schema system messages
Resolved		None	T208772 QA page schemas
Resolved		None	T208763 Enable page schemas on the beta cluster
Resolved		Jdforrester-WMF	T208809 All pages on Beta Cluster Wikidata and Commons do not load, "Error: invalid magic word 'translation'"
Resolved		• Tbayer	T208789 Identify pages to be bucketed in page schema linked data A/B test
Resolved		• Tbayer	T208909 [Bug] Update old nonuniformly distributed page_random values
Resolved		None	T208796 Use wikibase-debug Logstash channel to log unexpected page_random values
Resolved		ovasileva	T209309 [Spike, 1hrs] Evaluate approximate page size increase
Invalid		• Tbayer	T209315 Enable Google Developer Access for SEO deployers
Resolved	Jan 13 2019	mpopov	T211191 Check in sameAs A/B test results
Resolved		mpopov	T211190 SameAs A/B test preliminary analysis

Event Timeline

• Deskana created this task.Jul 6 2018, 1:25 PM

Restricted Application added a subscriber: Aklapper.Jul 6 2018, 1:25 PM

• Deskana triaged this task as High priority.Jul 6 2018, 1:25 PM

• Deskana moved this task from Tag to 2018 SEO project outcomes on the SEO board.

• Deskana added subtasks: T198969: Ensure links on the mobile version of pages are not to the desktop version, T198965: Create XML sitemaps so search engine crawlers can crawl more effectively, T198963: Add "Did you know..." and "On this day" sections from desktop main page to mobile main page, T198949: Add navbox links to mobile page HTML, T198947: There should not be multiple h1 tags on mobile page HTML: Restructure mobile web header for SEO and accessibility, T198946: Add Schema property 'sameAs' pointing to Wikidata entries.

• Deskana added a subtask: T93213: Improve access to local language wikis by fixing bug in generation of hreflang tags in <head> of article pages.Jul 6 2018, 3:13 PM

• Tbayer subscribed.Jul 6 2018, 3:24 PM

• JKatzWMF added a subtask: T198976: Make it easier for search engines to index anchors on mobile.Jul 6 2018, 3:47 PM

Have we actually engaged as a vendor a company with this on their website? https://gofishdigital.com/online-reputation-management/

• Deskana reopened subtask T198949: Add navbox links to mobile page HTML as Open.Jul 9 2018, 9:39 AM

• Jhernandez subscribed.Jul 9 2018, 10:15 AM

ovasileva subscribed.Jul 9 2018, 11:28 AM

In T198970#4404058, @Krenair wrote:

Have we actually engaged as a vendor a company with this on their website? https://gofishdigital.com/online-reputation-management/

Yes. If you have a specific complaint, it might be helpful if you stated it clearly.

In T198970#4408635, @Deskana wrote:

In T198970#4404058, @Krenair wrote:

Have we actually engaged as a vendor a company with this on their website? https://gofishdigital.com/online-reputation-management/

Yes. If you have a specific complaint, it might be helpful if you stated it clearly.

It looks to me like WMF is working with (and from the looks of T192893 and T193052, have provided with access to private information) a company that might be in the business of whitewashing Wikipedia articles.

Bawolff subscribed.Jul 9 2018, 7:20 PM

In T198970#4409320, @Krenair wrote:

It looks to me like WMF is working with (and from the looks of T192893 and T193052, have provided with access to private information) a company that might be in the business of whitewashing Wikipedia articles.

If you have evidence that they're violating the Terms of Use, then I suggest contacting Legal. Potentially libellous accusations like this do not belong on Phabricator.

AfroThundr3007730 subscribed.Jul 16 2018, 1:25 AM

More technically: I don't believe that Google is actually touching our front end pages at all. They don't spider us any more, AIUI --- we give them a direct feed of the (Parsoid format HTML) content of our pages (from RESTBase) and notify them directly whenever page content changes.

So a bunch of these tweaks to PHP-generated UX HTML would have exactly zero effect for Google results, since Google never sees the front-end HTML. Some might be useful for WMF results in other search engines (Bing?) but I bet our search traffic from these non-Google sites is not very high. Some tweaks might also be useful for 3rd party wikis where google is not using their WMF-focused pipeline -- but 3rd party wikis don't typically use wikidata or language links, for instance.

Similarly, Google does seem to rewrite search results to the mobile site when you search on mobile, but this seems to be a Google-internal optimization. It doesn't (AFAIK) have anything to do with the HTML we give them. We should probably have a conversation with our contacts at Google about how exactly their search/spider pipeline works before expending effort on any of these changes. Some may be useful. Others may be more efficiently implemented with changes on Google's side.

EDIT: softened wording, added discussion of impact on 3rd party wikis.

In T198970#4438029, @cscott wrote:

More technically: has anyone informed Go Fish digital that Google isn't actually touching our front end pages at all? They don't spider us any more, AIUI we give them a direct feed of the (Parsoid format HTML) content of our pages and notify them directly whenever page content changes.

So a bunch of these tweaks to PHP-generated UX HTML would have exactly zero effect for Google results, since Google never sees the front-end HTML. It might be useful for other search engines (Bing?) but I bet our search traffic from non-Google sites is not very high.

Are you sure? There was some analysis of our page view logs, and there were lots of hits from different crawlers from different search engines, including Google. I don't know the details myself, but they were definitely accessing our sites.

I believe they still hit our front end for zh.wikipedia.org, because I haven't finished implementing LanguageConverter yet for the Parsoid output (T43716, T190689). Finishing LanguageConverter parity is a priority at the moment so that Google can stop using their legacy crawler for zhwiki. There might be other corner cases where they still use their spider. You can actually test this directly by searching google for content which appears only in Parsoid format HTML (or in the UX, or in the mobile front end). This was easier to do when Parsoid had more bugs/differences when compared to the PHP parser, so it was easier then to find corner cases that were searchable. But I used to be able to easily verify in this way that the non-Parsoid content was not indexed.

We should be able to look at page view logs and the RESTBase logs to identify the google crawler by User-Agent. Verifying details with google is a good idea regardless, as they could run multiple search pipelines or do other tricks. They also hit our API directly. I believe Aaron was on the most recent call w/ our google contacts, as they needed us to raise the ORES limits for their use.
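As a rough illustration of the User-Agent check described above, here is a minimal sketch. It assumes the log records have already been reduced to `(source, user_agent)` pairs, which is a hypothetical shape and not the actual page view or RESTBase log format; the function names are made up for this example. It relies only on the fact that Google's and Bing's crawlers identify themselves with the tokens `Googlebot` and `bingbot`. A User-Agent can be spoofed, so counts like these would only be a first approximation.

```python
import re
from collections import Counter

# Tokens that the crawlers put in their User-Agent strings.
CRAWLER_PATTERNS = {
    "google": re.compile(r"Googlebot", re.IGNORECASE),
    "bing": re.compile(r"bingbot", re.IGNORECASE),
}


def classify_user_agent(user_agent):
    """Return a crawler name for a User-Agent string, or "other"."""
    for name, pattern in CRAWLER_PATTERNS.items():
        if pattern.search(user_agent):
            return name
    return "other"


def count_crawler_hits(records):
    """Count hits per (source, crawler).

    `records` is an iterable of (source, user_agent) tuples. This shape is
    an assumption for the sketch, not the real log schema.
    """
    counts = Counter()
    for source, user_agent in records:
        counts[(source, classify_user_agent(user_agent))] += 1
    return counts


# Hand-written sample records, not real log data.
sample = [
    ("frontend", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("restbase", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("frontend", "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"),
    ("frontend", "Mozilla/5.0 (X11; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0"),
]
hits = count_crawler_hits(sample)
```

Comparing the `("frontend", "google")` and `("restbase", "google")` counts per wiki would give a first hint of which pipeline is being used where.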

Z. Z. from Google is at Wikimania. He confirmed they still spider the site at a low rate, but only to check errors (i.e. sanity check their internal representation against what the site actually displays to keep us honest/validate our parsing/validate their internal pipeline). They use a variety of sources to build their representation, including ORES, Wikidata, RESTBase, the recentchanges feed, and direct queries to the action API.

• MZMcBride subscribed.Jul 20 2018, 12:48 PM

I'd really be interested to know what's potentially libelous about labeling activity such as this as whitewashing (from https://gofishdigital.com/online-reputation-management/):

The primary platforms that define your online reputation include:
[...]
Wikipedia
[...]

With Online Reputation Management, we work hard to make all of the positive information easy to find. At the same time, we use many different strategies and tactics to diminish the visibility of negative content, or in some cases, remove it from the web altogether. The end result is a positive online reputation because when people search your name or brand, they immediately find positive content.

Why is Wikimedia Foundation Inc. engaging with a company that engages in Wikipedia whitewashing? If you'd prefer, I can also ask on a mailing list, though Phabricator Maniphest seems like a reasonable enough venue.

This task is about "implementing SEO improvements suggested by Go Fish Digital" (emphasis by me) so mailing list sounds more appropriate for your question when it comes to task scope.

Okay, posted: https://lists.wikimedia.org/pipermail/wikimedia-l/2018-July/090737.html.

Paladox subscribed.Jul 21 2018, 10:29 PM

Chicocvenancio subscribed.Jul 22 2018, 5:48 PM

MusikAnimal subscribed.Jul 22 2018, 6:15 PM

Addshore subscribed.Jul 22 2018, 8:00 PM

• Marostegui subscribed.Jul 22 2018, 8:39 PM

Niharika subscribed.Jul 23 2018, 4:22 PM

TheresNoTime subscribed.Jul 25 2018, 3:48 PM

SQL subscribed.Jul 28 2018, 4:32 AM

Ixocactus subscribed.Aug 1 2018, 6:29 AM

• Tbayer mentioned this in T198947: There should not be multiple h1 tags on mobile page HTML: Restructure mobile web header for SEO and accessibility.Aug 7 2018, 4:58 PM

ovasileva closed subtask T198947: There should not be multiple h1 tags on mobile page HTML: Restructure mobile web header for SEO and accessibility as Resolved.Aug 17 2018, 9:53 AM

• Imarlier subscribed.Aug 29 2018, 2:14 PM

Jdlrobson changed the status of subtask T198946: Add Schema property 'sameAs' pointing to Wikidata entries from Open to Stalled.Sep 18 2018, 11:54 PM

Jdlrobson changed the status of subtask T198946: Add Schema property 'sameAs' pointing to Wikidata entries from Stalled to Open.Oct 1 2018, 10:55 PM

ovasileva added a subtask: T206868: [Spike 24hrs] How do we measure the effects of the sameAs property on pageviews using an A/B test.Oct 12 2018, 5:25 PM

ovasileva added a project: Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2).Oct 18 2018, 4:05 PM

ovasileva moved this task from To Do to Quarterly Goals on the Web-Team-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2) board.

SEO came up in the Audiences 1 QCI presentation, and one open question mentioned there was whether Google uses the same ingestion pipeline for all languages and wikis, or whether certain things would work differently on (say) English Wikipedia vs. Spanish Wikisource.

I couldn't find a better place to discuss this (is there a phab task for the research to answer this question?), so I'll put it here.

As far as I know, Google currently uses the same "special Wikipedia" ingestion pipeline for all wikis *except for those using LanguageConverter*, which (pending resolution of T43716: [EPIC] Support language variant conversion in Parsoid) use a different, more generic pipeline. I assume this applies to Wikipedias; I'm not certain they use it for Wikisource, Wikivoyage, etc.

But this suggests in particular that we can use zhwiki/srwiki/etc as a good control case if we do this research, since we "know" that these wikis are using the "old" pipeline, and we "know" that enwiki is using the "new" pipeline. So we can come up with some experimental questions, then see which wikis cluster with zhwiki and which cluster with enwiki.

We also have some contacts at google, so we could probably just ask them directly. But it's worth coming up with our own metrics and monitoring here, to sanity check the info we get from our direct contact and so we have some sort of dashboard notification in case the pipeline changes in the future, whether intentionally or unintentionally.
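The control-case idea above could be sketched roughly as follows. This assumes some per-wiki metric has already been computed (for example, the share of front-end requests attributed to the Google crawler); the metric, the function name, and the numbers are all placeholders invented for illustration, not measurements. Each wiki is simply assigned to whichever control wiki its value is closest to.

```python
def cluster_with_controls(metric_by_wiki, controls=("enwiki", "zhwiki")):
    """Assign each non-control wiki to the control with the closest metric value.

    `metric_by_wiki` maps wiki name -> a single number. What that number
    measures is an assumption of this sketch.
    """
    assignments = {}
    for wiki, value in metric_by_wiki.items():
        if wiki in controls:
            continue
        # Pick the control whose metric is nearest to this wiki's metric.
        assignments[wiki] = min(
            controls, key=lambda control: abs(metric_by_wiki[control] - value)
        )
    return assignments


# Placeholder values only, chosen to show both outcomes. Not real data.
placeholder_metric = {
    "enwiki": 0.02,
    "zhwiki": 0.40,
    "srwiki": 0.35,
    "eswiki": 0.03,
}
assignments = cluster_with_controls(placeholder_metric)
```

A wiki that moves from one group to the other over time would be the kind of change a dashboard notification could flag.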

• Mholloway subscribed.Oct 30 2018, 3:54 PM

ovasileva added a subtask: T208755: Launch A/B test for sameAs property.Nov 5 2018, 5:59 PM

• Niedzielski removed a subtask: T198946: Add Schema property 'sameAs' pointing to Wikidata entries.Nov 12 2018, 7:37 PM

• Niedzielski mentioned this in T198946: Add Schema property 'sameAs' pointing to Wikidata entries.

• Niedzielski removed a subtask: T206868: [Spike 24hrs] How do we measure the effects of the sameAs property on pageviews using an A/B test.Nov 12 2018, 7:40 PM

• Niedzielski mentioned this in T206868: [Spike 24hrs] How do we measure the effects of the sameAs property on pageviews using an A/B test.

• Niedzielski added a subtask: T209306: [Epic] [SEO] Enable Schema.org Article linked data for all main namespace pages.Nov 12 2018, 8:08 PM

• Niedzielski removed a subtask: T208755: Launch A/B test for sameAs property.

• Niedzielski mentioned this in T208755: Launch A/B test for sameAs property.

Replacing this epic of epics in the Readers Web quarterly goals with the targeted task, T209306. All Readers Web SEO work that has Phabricator tasking should now appear under T209306.

Legoktm mentioned this in T206497: Enable $wgMFNoindexPages for: Italian, Dutch, Korean, Arabic, Chinese, and Hindi Wikipedias.Nov 28 2018, 6:40 AM

Krinkle closed subtask T198963: Add "Did you know..." and "On this day" sections from desktop main page to mobile main page as Declined.Dec 3 2018, 3:11 AM

"Optimizing" a website for search engines appears to me to be the wrong approach to website building. If a search engine's algorithm fails to correctly balance the relevance of a website to its users, then the problem is in the algorithm, not the website.

I object to any kind of "SEO" measures. If there are accessibility-related issues to fix, then please describe and fix them as accessibility issues, not "SEO" issues.

ZZ from Google is at Wikimania 2019 and will be on the panel at https://wikimania.wikimedia.org/wiki/2019%3AQuality/Idea_jam_on_quality on Sunday.

If we have any remaining SEO questions we should try to meet up and get them addressed.

• Jhernandez unsubscribed.Apr 2 2020, 6:46 PM

Aklapper added a project: WMF-General-or-Unknown.Oct 24 2020, 1:29 PM

Aklapper removed subscribers: • Imarlier, • Tbayer.

ovasileva closed subtask T209306: [Epic] [SEO] Enable Schema.org Article linked data for all main namespace pages as Resolved.Mar 16 2021, 12:22 PM

Krinkle closed subtask T198965: Create XML sitemaps so search engine crawlers can crawl more effectively as Invalid.Dec 17 2021, 12:29 AM

Krinkle closed subtask T198969: Ensure links on the mobile version of pages are not to the desktop version as Declined.

Krinkle removed a subtask: T93213: Improve access to local language wikis by fixing bug in generation of hreflang tags in <head> of article pages.Dec 17 2021, 12:31 AM

Krinkle removed a subtask: T198976: Make it easier for search engines to index anchors on mobile.

Krinkle changed the status of subtask T198969: Ensure links on the mobile version of pages are not to the desktop version from Declined to Invalid.

Krinkle closed this task as Resolved.Dec 17 2021, 12:34 AM

Krinkle claimed this task.

Krinkle changed the status of subtask T198949: Add navbox links to mobile page HTML from Duplicate to Invalid.

cjming mentioned this in T299215: [SPIKE] Determine instrumentation requirements for A/B test section snippets.Feb 3 2022, 12:27 AM
