
ElasticaWrite.php: Unsupported operand types error with CirrusSearch in runJobs.php
Open, Normal, Public

Description

Under MW 1.28 with the matching 1.28 CirrusSearch and Elastica extensions (REL1_28-0959e38), running:

 php /var/www/mediawiki/maintenance/runJobs.php -q

PHP Notice:  unserialize(): Error at offset 18862 of 65535 bytes in /var/www/mediawiki/includes/jobqueue/JobQueueDB.php on line 802
[dde4ea3abab8bf5fa6ab94ff] [no req]   Error from line 40 of /var/www/mediawiki/extensions/CirrusSearch/includes/Job/ElasticaWrite.php: Unsupported operand types
Backtrace:
#0 /var/www/mediawiki/includes/jobqueue/Job.php(74): CirrusSearch\Job\ElasticaWrite->__construct(Title, boolean)
#1 /var/www/mediawiki/includes/jobqueue/JobQueueDB.php(292): Job::factory(string, Title, boolean, string)
#2 /var/www/mediawiki/includes/jobqueue/JobQueue.php(372): JobQueueDB->doPop()
#3 /var/www/mediawiki/includes/jobqueue/JobQueueGroup.php(240): JobQueue->pop()
#4 /var/www/mediawiki/includes/jobqueue/JobRunner.php(157): JobQueueGroup->pop(integer, integer, array)
#5 /var/www/mediawiki/maintenance/runJobs.php(86): JobRunner->run(array)
#6 /var/www/mediawiki/maintenance/doMaintenance.php(111): RunJobs->execute()
#7 /var/www/mediawiki/maintenance/runJobs.php(119): require_once(string)
#8 {main}

Event Timeline

Zoglun created this task. Feb 10 2017, 1:55 AM
Restricted Application added projects: Discovery, Discovery-Search. Feb 10 2017, 1:55 AM
Restricted Application added a subscriber: Aklapper.
Zoglun updated the task description. Feb 10 2017, 2:09 AM

Almost all jobs of type cirrusSearchElasticaWrite fail with this error.

Perhaps related: T124196. It suggests that MySQL blobs aren't big enough for the retry jobs.

Anyway, ElasticaWrite jobs only get enqueued when there was a problem writing to Elasticsearch and we want to try again later. Could you look back in the logs and see what the original error was?

I found nothing in the MySQL error.log.

I tried running:
php /var/www/wiki/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip
php /var/www/wiki/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse
This reindexes all records, but new pages still generate the same error.

I tried to raise the upper limit of the blob but can't get it over 65k.
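One way to check whether queued jobs are actually hitting that limit is to look for rows whose serialized parameters are exactly at the 65535-byte ceiling reported in the unserialize() notice. The following is only a sketch: it assumes the standard MediaWiki `job` table with no table prefix, and `wikidb` is a placeholder database name, not something taken from this report.

```shell
#!/bin/bash
# Sketch: list queued jobs whose serialized parameters reach the 65535-byte BLOB limit.
# DB_NAME is a placeholder; if a table prefix is configured, add it to "job" below.
DB_NAME="${DB_NAME:-wikidb}"

# Print the diagnostic query. LENGTH() returns the byte length in MySQL/MariaDB.
truncated_jobs_sql() {
    cat <<'SQL'
SELECT job_id, job_cmd, LENGTH(job_params) AS params_bytes
FROM job
WHERE LENGTH(job_params) >= 65535;
SQL
}

# Run the query with the mysql client (credentials come from ~/.my.cnf or a prompt).
find_truncated_jobs() {
    truncated_jobs_sql | mysql "$DB_NAME"
}
```

Calling `find_truncated_jobs` after sourcing the script should list the affected jobs; any row returned was most likely cut off when it was stored, which would explain why unserialize() fails on it.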

All right, it is T124196. As a temporary fix, I now store all jobs in Redis instead of MySQL.

debt triaged this task as Low priority. Mar 9 2017, 11:10 PM
debt moved this task from needs triage to later on... on the Discovery-Search board.
debt added a subscriber: debt.

This is a real bug but not that big as the workaround is working - to use Redis and not MySQL.

I'm seeing the same in my logs:

Notice: unserialize(): Error at offset 64870 of 65535 bytes in /opt/htdocs/mediawiki/includes/jobqueue/JobQueueDB.php on line 802

Fatal error: Unsupported operand types in /opt/htdocs/mediawiki/extensions/CirrusSearch/includes/Job/ElasticaWrite.php on line 44

Product	Version
MediaWiki	1.28.2 (438c3d6)
PHP	5.6.31 (apache2handler)
MariaDB	5.5.52-MariaDB
ICU	50.1.2
Elasticsearch	2.4.6

Thankfully @GFXDude2010 supplied details on how he got Redis installed as a replacement over on issue T124196. However, Redis is not a good workaround for me (no time to install, test, deploy).

Rudloff added a subscriber: Rudloff. Oct 1 2017, 9:41 PM

I can still reproduce with 1.29+7005f38:

PHP Notice:  unserialize(): Error at offset 20792 of 65535 bytes in /home/vhosts/fabien/archi-mediawiki/vendor/mediawiki/core/includes/jobqueue/JobQueueDB.php on line 807
PHP Fatal error:  Unsupported operand types in /home/vhosts/fabien/archi-mediawiki/extensions/CirrusSearch/includes/Job/ElasticaWrite.php on line 44
Kifit added a subscriber: Kifit. Mar 14 2018, 5:34 PM

We had the same bug on my company's mediawiki (1.30) with the same error messages as the ones quoted by @Rudloff.

I realized today that this bug affected page indexing by CirrusSearch: new pages were no longer being indexed.

I managed to get rid of the error messages and have CirrusSearch working fine again by following these steps:

However, before applying those guidelines (i.e. adapting the script and the systemd service unit), I first checked the state of the job queue by visiting this URL on our MediaWiki:

http://our_mediawiki_domain/api.php?action=query&meta=siteinfo&siprop=statistics&format=json

That showed more than 3000 pending jobs, so I wrote a small bash script to force job execution:

#!/bin/bash
# Run pending jobs repeatedly; stop manually with Ctrl+C.
while true
do
    php /path_to_your_mediawiki/maintenance/runJobs.php
done

It runs as an infinite loop, so when it stops doing anything (no more output) you'll have to terminate its execution by hitting Ctrl+C.
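A bounded variant of the loop above might look like the following sketch, which stops by itself once the queue is empty. It assumes that `maintenance/showJobs.php` prints the pending job count as a single integer and that `runJobs.php` accepts `--maxjobs`; the function names, the batch size of 100, and the default of 50 rounds are illustrative choices only.

```shell
#!/bin/bash
# Sketch: run jobs in batches until the queue is empty, with an upper bound on rounds.
# MW_ROOT reuses the placeholder path from the script above.
MW_ROOT="${MW_ROOT:-/path_to_your_mediawiki}"

# Print the number of pending jobs.
count_jobs() {
    php "$MW_ROOT/maintenance/showJobs.php"
}

# Run one batch of at most 100 jobs.
run_batch() {
    php "$MW_ROOT/maintenance/runJobs.php" --maxjobs 100
}

# Returns 0 when the queue is empty, 1 if the count cannot be read,
# 2 if jobs are still pending after max_rounds batches.
drain_jobs() {
    local max_rounds="${1:-50}" round=0 pending
    while [ "$round" -lt "$max_rounds" ]; do
        pending="$(count_jobs)" || return 1
        if [ "$pending" -eq 0 ]; then
            return 0
        fi
        # A failing batch (e.g. the fatal error in this task) does not abort the loop.
        run_batch || echo "batch $round failed, continuing" >&2
        round=$((round + 1))
    done
    return 2
}
```

After sourcing the script, `drain_jobs 50` runs at most 50 batches. A return code of 2 means jobs are still pending, which may indicate that some of them keep failing rather than that the queue is merely long.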

You can then revisit the URL above to check the number of jobs on your MediaWiki. It should have decreased dramatically.
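The same check can be scripted instead of opening the URL in a browser. This is a rough sketch that assumes `curl` is installed and that the JSON response carries the count in a `jobs` field under `query.statistics`; `extract_jobs` and `pending_jobs` are hypothetical helper names, and the domain is the same placeholder as above.

```shell
#!/bin/bash
# Sketch: read the pending job count from the siteinfo statistics API.
# API_URL uses the placeholder domain from this comment; replace it with your own.
API_URL="${API_URL:-http://our_mediawiki_domain/api.php}"

# Extract the integer value of the "jobs" field from JSON on stdin.
extract_jobs() {
    grep -o '"jobs":[[:space:]]*[0-9][0-9]*' | grep -o '[0-9][0-9]*$'
}

# Fetch the statistics and print the job count.
pending_jobs() {
    curl -s "${API_URL}?action=query&meta=siteinfo&siprop=statistics&format=json" | extract_jobs
}
```

Running `pending_jobs` before and after the job loop gives a quick way to confirm that the backlog is shrinking. The text-based extraction is deliberately simple; a JSON-aware tool would be more robust if one is available.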

Then you can follow the guidelines I mentioned above and you're done.

Hope it will help.

Dan.mulholland raised the priority of this task from Low to Normal. Nov 18 2018, 4:41 AM
Dan.mulholland added a subscriber: Dan.mulholland.

This is a real bug but not that big as the workaround is working - to use Redis and not MySQL.

  1. I'd like to request increased priority for this.
  2. This has caused me some difficulty in deploying REL1_31.
  3. Many MediaWiki users use MySQL for the job queue and probably would like the advanced and effective searching that CirrusSearch provides.
  4. I accept that a workaround is available and am happy with the lowered priority if it is confirmed that CirrusSearch does not function correctly with MySQL.
  5. Should this be so, we should not rely on a user's first encountering this error and then (eventually) finding this bug report. Instead it should be documented as a requirement at https://www.mediawiki.org/wiki/Extension:CirrusSearch
  6. Nonetheless, resolution is important because adding Redis is not without a performance impact and non-trivially increases the complexity of the MediaWiki stack.
Reedy updated the task description. Nov 18 2018, 4:43 AM
EBernhardson added a comment. Edited Nov 20 2018, 5:22 PM

I don't know if I covered this adequately above, but:

  1. ElasticaWrite is an error recovery mechanism
  2. In normal operations an ElasticaWrite job is never enqueued to the job service
  3. If ElasticaWrite jobs are being enqueued, that means you've hit the error recovery code path.
  4. That means something *else* is wrong, and needs to be resolved.

Getting ElasticaWrite jobs into the MySQL job service might be helpful, but in the end this is an error recovery mechanism and not part of normal operations. If this doesn't work, general operations should still be fine. Whatever ElasticaWrite was trying to recover from needs to be dealt with directly.

This would still be true if ElasticaWrite was working. In my experience ElasticaWrite is not very reliable at recovering from failures, but it at least tries. This job was implemented not to handle failure recovery, but to help handle the multi-cluster installation at WMF and allow us to put clusters into a maintenance mode where writes back up into the job queue. It also attempts to do some failure recovery because it's available, but it's not the primary purpose.

Most of the things that end up in ElasticaWrite outside of explicit maintenance actions are typically still errors after a few retries and have to be dealt with manually. If you are not running multiple mirrored Elasticsearch clusters, the primary purpose of ElasticaWrite is null and void.

@EBernhardson, in fact ElasticaWrite jobs are quite common: a simple server reboot can produce the last few lines of this error almost every time, especially when using an Elasticsearch cluster.

Yes, ElasticaWrite may not work, but this error stops runJobs from completing, and that causes more trouble than a few missing Elasticsearch entries.

Moving back into needs triage to review recent comments and suggestions.