Some of the search results sometimes don't show up
Closed, Invalid · Public

Description

Consider the following Python script:

import requests


def get_all_search_result():
    url = (
        'https://fa.wikipedia.org/w/api.php?'
        'gsrsearch=mouse&'
        'gsrwhat=text&'
        'generator=search&'
        'action=query&'
        'indexpageids=&'
        'continue=&'
        'gsrnamespace=0&'
        'gsrlimit=500&'
        'format=json'
    )

    json_response = requests.get(url).json()

    # Titles from the first batch of results.
    titles = set(
        json_response['query']['pages'][page_id]['title']
        for page_id in json_response['query']['pageids']
    )

    # Keep requesting batches for as long as the API returns a
    # 'continue' block, passing its gsroffset back each time.
    while 'continue' in json_response:
        gsroffset = json_response['continue']['gsroffset']
        new_url = url + f'&gsroffset={gsroffset}'
        json_response = requests.get(new_url).json()
        titles |= set(
            json_response['query']['pages'][page_id]['title']
            for page_id in json_response['query']['pageids']
        )

    return titles

count = 0
while True:
    count += 1
    print(count)
    result_set_1 = get_all_search_result()
    result_set_2 = get_all_search_result()
    try:
        assert result_set_1 == result_set_2
    except AssertionError:
        print(result_set_1 ^ result_set_2)
        break

The script searches for the word "mouse" on fawiki twice and compares the two result sets.
If the results are the same, it tries again; otherwise it prints the symmetric difference and quits.

Here are the results of running the script a few times:

1
2
3
4
5
{'الکتروپوراسیون'}
1
2
3
4
5
6
7
{'الکتروپوراسیون'}
1
{'فرانسوا آراگو'}
1
2
3
4
5
6
7
{'الکتروپوراسیون', 'فرانسوا آراگو'}

Those pages did not change at all during the search operation. The question is: why is this happening?

A page either will match the query or won't and assuming the page has not changed, the result should not change either. Is this assumption incorrect?

Event Timeline

Dalba renamed this task from "Some of the search result sometimes don't show up" to "Some of the search results sometimes don't show up". Jan 5 2017, 2:59 PM
Anomie subscribed.

A page either will match the query or won't and assuming the page has not changed, the result should not change either. Is this assumption incorrect?

Yes, that assumption is incorrect. The search engine provided by the CirrusSearch extension is not entirely deterministic: repeated searches for the same thing might produce slightly different results. In particular, it's very common to see adjacent results swapped (e.g. results #3 and #4 might be switched around the second time). You can see the same thing using the web UI (Special:Search).
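
One way a swap of adjacent results could turn into a missing title is sketched below. This is only an illustration under stated assumptions, not a confirmed diagnosis: it assumes the swap happens between two paged requests and lands exactly on a page boundary. The helper paged_titles, the six one-letter titles, and the page size of 3 are all made up for the example, and no real API call is involved.

```python
def paged_titles(ranking_for_request, page_size):
    """Collect titles by offset paging.

    ranking_for_request(i) is a hypothetical stand-in for the search
    backend: it returns the full ranking as seen by the i-th request.
    """
    seen = set()
    offset = 0
    request_no = 0
    while True:
        ranking = ranking_for_request(request_no)
        page = ranking[offset:offset + page_size]
        if not page:
            break
        seen.update(page)
        offset += page_size
        request_no += 1
    return seen


stable = ["A", "B", "C", "D", "E", "F"]
# Same six titles, but the two results on either side of the first
# page boundary (positions 3 and 4) have traded places.
swapped = ["A", "B", "D", "C", "E", "F"]

# Run 1: every request sees the same ordering.
run_1 = paged_titles(lambda i: stable, page_size=3)

# Run 2: the first request sees the stable ordering, later requests
# see the swapped one, so "C" is returned twice and "D" never is.
run_2 = paged_titles(lambda i: stable if i == 0 else swapped, page_size=3)

print(run_1 ^ run_2)  # {'D'}
```

If this is what happens, the total number of matching pages stays the same between runs, but a set built from offset-based batches can still differ by the title that slipped across the boundary.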

Change 330677 had a related patch set uploaded (by Dalba):
site_tests.py: Remove test_search_where_text and test_search_where_nearmatch

https://gerrit.wikimedia.org/r/330677

Restricted Application added a subscriber: Huji. · Oct 16 2020, 5:28 PM