Page MenuHomePhabricator

internal_api_error_LocalFileLockError while uploading file via API to commons.wikimedia.beta.wmflabs.org
Closed, ResolvedPublic

Description

I am using commons_upload to upload files using the API.

It works fine when targeting MediaWiki-Vagrant or production Commons, but it fails when targeting beta cluster.

$ ls screenshots/
VisualEditor_toolbar-ta.png

vagrant (works fine)

$ export MEDIAWIKI_API_UPLOAD_URL=http://127.0.0.1:8080/w/api.php
$ export MEDIAWIKI_USER=Admin
$ export MEDIAWIKI_PASSWORD=vagrant
$ bundle exec upload
Uploading ./screenshots/VisualEditor_toolbar-ta.png

production (works fine)

$ export MEDIAWIKI_API_UPLOAD_URL=https://commons.wikimedia.org/w/api.php
$ export MEDIAWIKI_USER=LanguageScreenshotBot
$ export MEDIAWIKI_PASSWORD=not-the-real-one
$ bundle exec upload
Uploading ./screenshots/VisualEditor_toolbar-ta.png

beta (fails)

$ export MEDIAWIKI_API_UPLOAD_URL=https://commons.wikimedia.beta.wmflabs.org/w/api.php
$ export MEDIAWIKI_USER=LanguageScreenshotBot
$ export MEDIAWIKI_PASSWORD=not-the-real-one
$ bundle exec upload
Uploading ./screenshots/VisualEditor_toolbar-ta.png
/usr/local/lib/ruby/gems/2.3.0/gems/mediawiki_api-0.7.0/lib/mediawiki_api/client.rb:211:in `send_request': [V7wzDQpEEH8AAHb1ERgAAAAD] Exception Caught: Could not acquire lock for "Array". (internal_api_error_LocalFileLockError) (MediawikiApi::ApiError)
	from /usr/local/lib/ruby/gems/2.3.0/gems/mediawiki_api-0.7.0/lib/mediawiki_api/client.rb:232:in `raw_action'
	from /usr/local/lib/ruby/gems/2.3.0/gems/mediawiki_api-0.7.0/lib/mediawiki_api/client.rb:36:in `action'
	from /usr/local/lib/ruby/gems/2.3.0/gems/mediawiki_api-0.7.0/lib/mediawiki_api/client.rb:166:in `upload_image'
	from /Users/z/Documents/github/amire80/commons_upload/lib/commons_upload.rb:34:in `image'
	from /Users/z/Documents/github/amire80/commons_upload/lib/commons_upload.rb:45:in `block in images'
	from /Users/z/Documents/github/amire80/commons_upload/lib/commons_upload.rb:43:in `each'
	from /Users/z/Documents/github/amire80/commons_upload/lib/commons_upload.rb:43:in `images'
	from /Users/z/Documents/github/amire80/commons_upload/bin/upload:4:in `<top (required)>'
	from /usr/local/lib/ruby/gems/2.3.0/bin/upload:23:in `load'
	from /usr/local/lib/ruby/gems/2.3.0/bin/upload:23:in `<main>'

I have found V7wzDQpEEH8AAHb1ERgAAAAD in the logs.

{
  "_index": "logstash-2016.08.23",
  "_type": "mediawiki",
  "_id": "AVa3J5W_hLHGnzHlEuTQ",
  "_score": 1,
  "_source": {
    "message": "Failed to lock 'VisualEditor_toolbar-ta.png'",
    "@version": 1,
    "@timestamp": "2016-08-23T11:27:19.000Z",
    "type": "mediawiki",
    "host": "deployment-mediawiki02",
    "level": "WARNING",
    "tags": [
      "syslog",
      "es",
      "es"
    ],
    "channel": "LocalFile",
    "normalized_message": "Failed to lock '{file}'",
    "url": "/w/api.php",
    "ip": "10.68.18.103",
    "http_method": "POST",
    "server": "commons.wikimedia.beta.wmflabs.org",
    "referrer": null,
    "unique_id": "V7wzDQpEEH8AAHb1ERgAAAAD",
    "wiki": "commonswiki",
    "mwversion": "1.28.0-alpha",
    "reqId": "V7wzDQpEEH8AAHb1ERgAAAAD",
    "file": "VisualEditor_toolbar-ta.png"
  },
  "fields": {
    "@timestamp": [
      1471951639000
    ]
  }
}

Event Timeline

Looking at other log messages in that request (https://logstash-beta.wmflabs.org/goto/03d19a17b4404c7708e2782fb44cbe59), I see a few things that could use checking:

  • A lot of "Lua script error on server "10.68.16.231": LOADING Redis is loading the dataset in memory", which if it's using redis for locking might mean the locking failed because redis happened to be restarting at that moment. You might just try again.
  • An error message "Could not acquire lock for "Array".", which could mean something somewhere in there is getting an array where it expected a string. If it's trying to lock "Array" for everything rather than the correct file paths, that could be trouble.
  • A lot of "Lua script error on server "10.68.16.231": LOADING Redis is loading the dataset in memory", which if it's using redis for locking might mean the locking failed because redis happened to be restarting at that moment. You might just try again.

It did not happen just once, it is reproducible every time. Sorry, forgot to make that explicit.

Do the reproductions lack the "Redis is loading" messages during the request?

Sorry, I am not familiar with logstash. How did I get something like https://logstash-beta.wmflabs.org/goto/03d19a17b4404c7708e2782fb44cbe59 for another request?

In the search bar near the top of the page, change the V7wzDQpEEH8AAHb1ERgAAAAD in "reqId:V7wzDQpEEH8AAHb1ERgAAAAD" to whatever the reqId is for your retry request.

Tried again.

$ ls screenshots/
VisualEditor_category_item-hr.png

$ export MEDIAWIKI_API_UPLOAD_URL=https://commons.wikimedia.beta.wmflabs.org/w/api.php
$ export MEDIAWIKI_USER=LanguageScreenshotBot
$ export MEDIAWIKI_PASSWORD=not-the-real-one

$ bundle exec upload
Uploading ./screenshots/VisualEditor_category_item-hr.png
/usr/local/lib/ruby/gems/2.3.0/gems/mediawiki_api-0.7.0/lib/mediawiki_api/client.rb:211:in `send_request': [V71kiwpEEaoAAEaiEmoAAAAF] Exception Caught: Could not acquire lock for "mwstore&#58;//local-multiwrite/local-public/d/de/VisualEditor_category_item-hr.png". (internal_api_error_LocalFileLockError) (MediawikiApi::ApiError)
	from /usr/local/lib/ruby/gems/2.3.0/gems/mediawiki_api-0.7.0/lib/mediawiki_api/client.rb:232:in `raw_action'
	from /usr/local/lib/ruby/gems/2.3.0/gems/mediawiki_api-0.7.0/lib/mediawiki_api/client.rb:36:in `action'
	from /usr/local/lib/ruby/gems/2.3.0/gems/mediawiki_api-0.7.0/lib/mediawiki_api/client.rb:166:in `upload_image'
	from /Users/z/Documents/github/amire80/commons_upload/lib/commons_upload.rb:34:in `image'
	from /Users/z/Documents/github/amire80/commons_upload/lib/commons_upload.rb:45:in `block in images'
	from /Users/z/Documents/github/amire80/commons_upload/lib/commons_upload.rb:43:in `each'
	from /Users/z/Documents/github/amire80/commons_upload/lib/commons_upload.rb:43:in `images'
	from /Users/z/Documents/github/amire80/commons_upload/bin/upload:4:in `<top (required)>'
	from /usr/local/lib/ruby/gems/2.3.0/bin/upload:23:in `load'
	from /usr/local/lib/ruby/gems/2.3.0/bin/upload:23:in `<main>'

https://logstash-beta.wmflabs.org/goto/0c416dc6114efee609ed469aa4ded6c3

Date filter prevent that reqId from being shown. Change it to "Last 7 days" and it finds it: https://logstash-beta.wmflabs.org/goto/b2d8adc1f8c4ec2b673a5a1fdc30e2c6 :-}

On beta in wmf-config/fileback-labs.php the lock manager is configured line 157 via $wgLockManagers[] which in turns point to:

wmf-config/LabsServices.php
'redis_lock' => [
    'rdb1' => '10.68.16.177', // deployment-redis01.deployment-prep.eqiad.wmflabs
    'rdb2' => '10.68.16.231', // deployment-redis02.deployment-prep.eqiad.wmflabs
],

The reqId query in logstash shows a bunch of:

Lua script error on server "10.68.16.231": LOADING Redis is loading the dataset in memory

Maybe that is just a warning. Then looking at 10.68.16.231 which is deployment-redis02.deployment-prep.eqiad.wmflabs , looks like redis has some issue:

[675072.671326] init: redis-instance-tcp_6379 main process (31402) terminated with status 1
[675076.660993] init: redis-instance-tcp_6379 main process ended, respawning
[675076.660975] init: redis-instance-tcp_6379 main process (31407) terminated with status 1
[675080.677410] init: redis-instance-tcp_6379 main process ended, respawning
....

Mentioned in SAL [2016-08-24T09:38:10Z] <hashar> deployment-redis02 initctl stop redis-instance-tcp_6379 && initctl start redis-instance-tcp_6379 | That did not fix it magically though T143655

On deployment-redis02:

/var/log/redis/tcp_6379.log
[1627] 24 Aug 09:41:18.216 # Unable to set the max number of files limit to 10032 (Operation not permitted), setting the max clients configuration to 3984.
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 2.8.4 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in stand alone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1627
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

[1627] 24 Aug 09:41:18.218 # Server started, Redis version 2.8.4
[1627] 24 Aug 09:41:22.102 # Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>

I have no idea what AOF stands for nor which AOF file to fix up :(

Mentioned in SAL [2016-08-24T09:43:51Z] <hashar> T143655 stopping redis 6379 on deployment-redis02 : initctl stop redis-instance-tcp_6379

Found the config in /etc/redis/tcp_6379.conf which has the AOF set to /srv/redis/deployment-redis02-6379.aof

# sudo su - redis -s /bin/bash
$ redis-check-aof /srv/redis/deployment-redis02-6379.aof
0x        1d4509a4: Expected prefix '
AOF analyzed: size=491065344, ok_up_to=491063716, diff=1628
AOF is not valid
$ redis-check-aof --fix /srv/redis/deployment-redis02-6379.aof
0x        1d4509a4: Expected prefix '
AOF analyzed: size=491065344, ok_up_to=491063716, diff=1628
This will shrink the AOF from 491065344 bytes, with 1628 bytes, to 491063716 bytes
Continue? [y/N]: y
Successfully truncated AOF
$ redis-check-aof /srv/redis/deployment-redis02-6379.aof
AOF analyzed: size=491063716, ok_up_to=491063716, diff=0
AOF is valid

Restarted it with initctl start redis-instance-tcp_6379 and the log file shows:

[2850] 24 Aug 09:48:01.076 # Unable to set the max number of files limit to 10032 (Operation not permitted), setting the max clients configuration to 3984.
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 2.8.4 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in stand alone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 2850
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

[2850] 24 Aug 09:48:01.077 # Server started, Redis version 2.8.4
[2850] 24 Aug 09:48:05.050 * DB loaded from append only file: 3.972 seconds
[2850] 24 Aug 09:48:05.051 * The server is now ready to accept connections on port 6379

Mentioned in SAL [2016-08-24T09:50:33Z] <hashar> deployment-redis02 fixed AOF file /srv/redis/deployment-redis02-6379.aof and restarted the redis instance should fix T143655 and might help T142600

zeljkofilipin claimed this task.

Thanks @hashar, the problem seems to be fixed! :)

This instance probably crashed or something recently - I caught it failing puppet due to apt-get failures, because something hadn't been able to fully install.