
Some articles don't show up in Google search results at all
Open, Needs Triage, Public

Description

For example, if you search Google for "Firefly", the Wikipedia article https://en.wikipedia.org/wiki/Firefly doesn't show up in the first 100 search results (which is all that I checked). The Wikipedia article for "Firefly (TV series)" does show up, as does the Wiktionary page for Firefly. The Firefly Wikipedia article isn't noindexed or redirected or anything else that would obviously cause issues with Google, nor are most of the Google search results about the TV show. Figuring out why this page doesn't show up in Google might help our overall SEO efforts, as there are likely to be other pages with similar issues.
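For anyone who wants to double-check the "isn't noindexed" part on other affected pages, here is a minimal sketch of such a check. It assumes you already have the page HTML and, optionally, the value of the `X-Robots-Tag` response header; the names `RobotsMetaParser` and `is_noindexed` are made up for this example. It is a simplification: it does not handle user-agent-prefixed header values such as `googlebot: noindex`, and it does not look at robots.txt or canonical links.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() not in ("robots", "googlebot"):
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def is_noindexed(html, x_robots_tag=""):
    """Return True if the HTML or the X-Robots-Tag value asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = {d.strip().lower() for d in x_robots_tag.split(",") if d.strip()}
    # "none" is shorthand for "noindex, nofollow".
    return bool({"noindex", "none"} & (parser.directives | header))
```

A page that passes this check can still be excluded or down-ranked for other reasons, so a `False` result only rules out the most obvious cause.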

Steps to reproduce: Open up a new incognito tab, go to Google, and search for "firefly".
Expected results: The Wikipedia article "Firefly" should show up somewhere in the search results.
Actual results: The article is nowhere to be found.

Event Timeline

I can reproduce this: searching for firefly in Google and DuckDuckGo, the Wikipedia article is not on the first page of results. However, I observed two things that I think make this less urgent than it might seem.

Firstly, both DuckDuckGo and Google do actually provide a neutral path towards the article. For DuckDuckGo, the second scroll gesture (infinite scroll) produces the Wikipedia article as the 11th result ("second page"). For Google, the sidebar contains "See results about", where the second suggestion is "Firefly (Insect)", showing the image and description from Wikipedia; when clicked, it reloads the search results with the Wikipedia article as the first result.

Secondly, the following search queries all directly return the Wikipedia article "Firefly" as top result:

  • lightning bug (1st result)
  • fireflies (1st result in Google, 2nd result in DuckDuckGo)
  • firefly insect (1st result)
  • firefly bug (1st result)
  • firefly animal (1st result)
  • firefly beetle (1st result)

This might be more a matter of natural language analysis and inferred intent than an indexing problem. In other words, it might be that the article isn't missing from the index; rather, the machine query that the word "firefly" translates into is being interpreted as being about something else.

Short of making the article seem about something it isn't, I don't think we can improve this. It's clearly already well indexed and highly ranked.

I can see how this is counter-intuitive from Wikipedia's perspective, with "Firefly" seeming more important by being granted an "un-parenthesised" title. Perhaps we would be less surprised if there were a hundred other articles named "Firefly (<subject space>)" where most would not show in a plain query for "firefly". But I think it's mostly a good thing that search engines weigh these equally. And per the above, it seems both search engines still provide a clear path to the article, based solely on interest in the subject. That is, the path they provide does not rely on users wanting Wikipedia; it relies on them wanting the insect as a subject, and it still gets them Wikipedia as the first result.

So according to Google's AI wisdom, the word "firefly" is a TV show, and you have to explicitly tell it that you actually mean the insect instead. We are all doomed.

I see the insect as the 4th result in Google in a private tab on Firefox.

So according to Google's AI wisdom, the word "firefly" is a TV show, and you have to explicitly tell it that you actually mean the insect instead. We are all doomed.

I too worry whether our educational canon will survive the singularity. But to be fair, humans appear to be interested in the TV series article almost three times as often as in the insect article.[1] So arguably Google does a better job here than Wikipedia's own search + hatnote system in getting the majority of people to the information they want as quickly as possible.

[1] https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=desktop&agent=user&range=latest-20&pages=Firefly_(TV_series)|Firefly
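For anyone who wants to repeat this comparison without the web tool, here is a rough sketch using the Wikimedia Pageviews REST API (per-article endpoint). The helper names (`pageviews_url`, `total_views`, `view_ratio`) are made up for this example, and the default `project`, `access` and `agent` values are assumptions chosen to roughly mirror the parameters in the link above. The sketch only builds the request URL and sums an already-decoded JSON response; fetching is left out.

```python
from urllib.parse import quote

# Base path of the per-article endpoint of the Wikimedia Pageviews REST API.
API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title, start, end, project="en.wikipedia",
                  access="desktop", agent="user"):
    """Build the daily per-article pageviews URL; start/end are YYYYMMDD strings."""
    # Titles use underscores instead of spaces and must be percent-encoded.
    article = quote(title.replace(" ", "_"), safe="")
    return f"{API}/{project}/{access}/{agent}/{article}/daily/{start}/{end}"


def total_views(response):
    """Sum the 'views' field over all items of a decoded API response."""
    return sum(item["views"] for item in response.get("items", []))


def view_ratio(response_a, response_b):
    """Return how many times more views article A received than article B."""
    views_b = total_views(response_b)
    if views_b == 0:
        raise ValueError("second article has no recorded views")
    return total_views(response_a) / views_b
```

To use it, fetch `pageviews_url("Firefly (TV series)", start, end)` and `pageviews_url("Firefly", start, end)` with any HTTP client (the API asks clients to send a descriptive User-Agent header), decode the JSON, and pass both results to `view_ratio`.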