
Semantic MediaWiki unparsed queries and empty or incomplete query results possibly being caused by CirrusSearch
Open, Medium · Public · BUG REPORT

Description

Problem Description

Following our recent upgrades of MediaWiki (1.36.4 to 1.38.2) and Semantic MediaWiki (3.2.3 to 4.0.2), pages often show unparsed SMW inline queries as raw wikitext in the HTML output that is ultimately shown to users, and SMW queries sometimes return empty or incomplete results. These SMW queries are typically defined in templates.

Extensive further details can be found here:

Troubleshooting

Initial troubleshooting included removing most extensions by commenting out their loading from LocalSettings.php. Removing CirrusSearch and Elastica, which seemed related to timing issues, caused the unparsed query issue to disappear. However, through further testing we found a way to mitigate that problem without having to remove CirrusSearch and Elastica.

Our mitigation involved making a change to how our MediaWiki job runners worked. We have four SMW-enabled wikis, each with its own job runner systemd service, where the service is a shell script that runs runJobs.php with --wait --maxjobs=1000 --procs=3 to parallelize the processing and limit the script's runtime to avoid memory leaks, etc. This is basically as recommended by https://www.mediawiki.org/wiki/Manual:Job_queue. The changes made involved removing the use of --procs to cause the jobs to be run completely sequentially, as well as grouping together jobs in the queue by type for each iteration of the infinite loop.
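For comparison, the runner before this change was roughly equivalent to the sketch below. It is reconstructed from the flags described above and is not the actual script; the `run_batch` function name and the `PHP` override variable are illustrative assumptions.

```shell
# Sketch of the previous runner (reconstructed from the description, not the real script).
# PHP can be overridden for a dry run, e.g. PHP=echo.
PHP="${PHP:-/usr/bin/php}"

# Run one batch: wait for jobs, process up to 1000 of any type,
# spread over 3 child processes.
run_batch() {
    "${PHP}" ./maintenance/runJobs.php --wait --maxjobs=1000 --procs=3
}
```

The service would then call this in an infinite loop (`while true; do run_batch; done`), so jobs of different types could be processed concurrently. The changed runner loop follows.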

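# Job types to drain first, one type at a time and in this order.
job_types="
enotifNotify
cirrusSearchLinksUpdatePrioritized
cirrusSearchElasticaWrite
cirrusSearchLinksUpdate
cirrusSearchIncomingLinkCount
recentChangesUpdate
htmlCacheUpdate
refreshLinks
"

maxjobs=1000
memlimit="512M"

while true; do
    # Intentionally unquoted: word splitting yields one job type per iteration.
    for job_type in ${job_types}; do
        # Keep running this type until showJobs.php lists no more jobs for it.
        while [[ $(/usr/bin/php ./maintenance/showJobs.php --type "${job_type}" --list | wc -l) -gt 0 ]]; do
            /usr/bin/php ./maintenance/runJobs.php --maxjobs="${maxjobs}" --memory-limit="${memlimit}" --type "${job_type}"
        done
    done
    # Then run jobs of any remaining type, waiting for new jobs if the queue is empty.
    /usr/bin/php ./maintenance/runJobs.php --maxjobs="${maxjobs}" --memory-limit="${memlimit}" --wait
done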

In our testing, this change only "fixed" the display of unparsed queries; it did not fix the empty or incomplete query results.

I have deployed the above runner logic change to our production wiki environment and continue to work with our editors to monitor the wikis for these problems. However, it remains unclear why these problems suddenly started after the upgrades and why the runner logic change seems to fix part of the problem.
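One way to support that monitoring from the job-queue side would be to log how many jobs of each type are pending at a given moment, so that reports of empty results can be compared against queue backlogs. The helper below is only a sketch: `queue_snapshot` and the `PHP` override variable are hypothetical names, and it counts jobs the same way the runner loop does (`showJobs.php --type ... --list | wc -l`). Whether a backlog actually correlates with the empty or incomplete results is untested.

```shell
# Hypothetical monitoring helper; not part of the deployed runner.
# PHP can be overridden (e.g. with a stub) for testing.
PHP="${PHP:-/usr/bin/php}"

# Print "<job type> <pending count>" for each job type given as an argument.
# Must be run from the wiki root, like the runner script.
queue_snapshot() {
    local job_type count
    for job_type in "$@"; do
        # Arithmetic expansion strips any padding that wc -l may add.
        count=$(( $("${PHP}" ./maintenance/showJobs.php --type "${job_type}" --list | wc -l) ))
        printf '%s %s\n' "${job_type}" "${count}"
    done
}
```

For example, `queue_snapshot refreshLinks htmlCacheUpdate cirrusSearchLinksUpdate` prints one line per type; running it periodically and keeping the output with a timestamp would give a rough picture of how the queues drain.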

Event Timeline

Restricted Application added a subscriber: Aklapper.
Umherirrender renamed this task from Semantic MediaWiki unparsed queries and empty or incomplete query results possibly being caused by CIrrusSearch to Semantic MediaWiki unparsed queries and empty or incomplete query results possibly being caused by CirrusSearch. · Nov 29 2022, 7:00 PM
MPhamWMF triaged this task as Medium priority. · Dec 5 2022, 4:22 PM
MPhamWMF moved this task from needs triage to making others happy on the Discovery-Search board.

I would like to add that, while we have been able to work around the unparsed queries issue, the empty and incomplete query results are a significant problem for us. It would help to at least know whether this is more likely a software bug (regardless of which component) or a configuration or content-related issue that we would need to address ourselves. Right now we have no idea and are completely stuck.

I've opened an SMW GitHub issue in case that is a more appropriate avenue, given that these issues appear to be at the intersection of extensions managed here and there.

https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/5392