Migrate Parsoid cache from Trusty to Jessie
Closed, Resolved, Public

Description

Running puppet on deployment-parsoidcache02 yields:

Error: /Stage[main]/Role::Cache::Statsd/Varnish::Logging::Statsd[default]/Base::Service_unit[varnishstatsd-default]/Service[varnishstatsd-default]:
Could not evaluate: Could not find init script or upstart conf file for 'varnishstatsd-default'

The reason is that the instance uses Trusty, which is no longer supported.

We need a Jessie instance to be created with enough disk space for the extended disk. An m1.small will not have enough disk.

Event Timeline

hashar created this task. Jun 24 2015, 8:00 AM
hashar raised the priority of this task from to Needs Triage.
hashar updated the task description.
hashar added subscribers: hashar, BBlack, thcipriani.
Restricted Application added a subscriber: Aklapper. Jun 24 2015, 8:00 AM

This is also a problem on deployment-cache-text02 and likely others as well.

thcipriani triaged this task as Normal priority. Jul 6 2015, 7:34 PM
Restricted Application added a subscriber: Luke081515. Jul 6 2015, 7:34 PM
Krenair added a subscriber: Krenair. Sep 4 2015, 4:56 PM

Looks like it's just parsoidcache02 now.

The other caches have been migrated to Jessie, which comes with systemd (T98758). deployment-parsoidcache02 still uses Trusty, hence the failure.

Turns out we had an instance created using Jessie:

deployment-cache-parsoid04
debian-8.1-jessie	
10.68.19.197

parsoid-beta.wmflabs.org points to the public IP 208.80.155.156 which is the labs shared proxy. That is entirely wrong: beta-cluster should use a dedicated cache.

hashar renamed this task from deployment-parsoidcache02 fails puppet: Role::Cache::Statsd/Varnish::Logging::Statsd[default]/Base::Service_unit[varnishstatsd-default]/Service[varnishstatsd-default]: Could not evaluate: Could not find init script or upstart conf file for 'varnishstatsd-default' to Migrate Parsoid cache from Trusty to Jessie. Oct 28 2015, 9:51 AM
hashar claimed this task.
hashar updated the task description.
hashar set Security to None.

Deleted deployment-cache-parsoid04, which is too small.

Created deployment-cache-parsoid05, an m1.medium with 40GB of disk.

We should delete:

deployment-parsoidcache02  Trusty  10.68.16.145

To be replaced with:

deployment-cache-parsoid05  Jessie  10.68.20.102

Change 249366 had a related patch set uploaded (by Mobrovac):
Labs: Parsoid Cache: Use new IP address for deployment-parsoidcache02

https://gerrit.wikimedia.org/r/249366

Change 249367 had a related patch set uploaded (by Mobrovac):
BetaCluster: Use deployment-cache-parsoid05

https://gerrit.wikimedia.org/r/249367

Change 249366 merged by jenkins-bot:
beta: use new IP for Parsoid Cache

https://gerrit.wikimedia.org/r/249366

Change 249367 merged by jenkins-bot:
BetaCluster: Use deployment-cache-parsoid05

https://gerrit.wikimedia.org/r/249367

Doing a VE edit on beta I get:

Error loading data from server: HTTP 504.

In the browser console there is:

http://en.wikipedia.beta.wmflabs.org/api/rest_v1/page/html/User%3AHashar
Failed to load resource: the server responded with a status of 504 (Gateway Timeout)

And:

$ curl http://en.wikipedia.beta.wmflabs.org/api/rest_v1/page/html/User%3AHashar|python -m json.tool
{
    "detail": "Error: connect ECONNREFUSED",
    "method": "get",
    "type": "https://restbase.org/errors/internal_http_error",
    "uri": "http://deployment-parsoid05.deployment-prep.eqiad.wmflabs:8000/v2/en.wikipedia.beta.wmflabs.org/pagebundle/User%3AHashar/85930"
}
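
The `uri` field of that error names the backend that refused the connection. A minimal sketch of pulling the host and port out of such a body, so the failing service can be checked directly. The helper name `backend_from_error` is made up for illustration, and the sketch assumes the error body always carries a `uri` key as in the sample above:

```python
import json
from urllib.parse import urlsplit


def backend_from_error(body: str):
    """Return (host, port) of the backend named in the error's "uri" field.

    Hypothetical helper: assumes the body is JSON shaped like the
    error shown above, with an absolute URL under "uri".
    """
    error = json.loads(body)
    parts = urlsplit(error["uri"])
    return parts.hostname, parts.port


# Error body copied from the curl output above.
sample = """{
    "detail": "Error: connect ECONNREFUSED",
    "method": "get",
    "type": "https://restbase.org/errors/internal_http_error",
    "uri": "http://deployment-parsoid05.deployment-prep.eqiad.wmflabs:8000/v2/en.wikipedia.beta.wmflabs.org/pagebundle/User%3AHashar/85930"
}"""

host, port = backend_from_error(sample)
print(host, port)
```

Here that points at port 8000 on deployment-parsoid05, which is the Parsoid service.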

On the Parsoid instance:

deployment-parsoid05:~$ curl http://127.0.0.1:8000/v2/en.wikipedia.beta.wmflabs.org/pagebundle/User%3AHashar/85930
curl: (7) Failed to connect to 127.0.0.1 port 8000: Connection refused

And indeed parsoid is not running there...
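
The same "is anything listening on that port" check can be scripted instead of reading curl's error message. A rough sketch using only the Python standard library. The function name `port_is_open` and the 2 second timeout are arbitrary choices for illustration, not something from the beta cluster tooling:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    A refused connection, a timeout, or a name resolution failure
    all surface as OSError subclasses, so they all count as closed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Port 8000 is the Parsoid port seen in the curl command above.
    print(port_is_open("127.0.0.1", 8000))
```

This only tells us whether the port accepts connections. It says nothing about whether the process behind it answers requests properly.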

The merge of https://gerrit.wikimedia.org/r/#/c/249367/ did trigger the Jenkins job beta-parsoid-update-eqiad with:

+ sudo /etc/init.d/parsoid restart
00:02:46.782 parsoid start/running, process 27941

Parsoid fails with:

module.js:340
    throw err;
          ^
Error: Cannot find module '/srv/deployment/parsoid/parsoid/api/server.js'
    at Function.Module._resolveFilename (module.js:338:15)
    at Function.Module._load (module.js:280:25)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)
    at node.js:902:3
hashar changed the task status from Open to Stalled. Oct 28 2015, 11:02 AM

Stalled, pending a fix to Parsoid. Should file another task for it.

A lot of files have been renamed in the Parsoid source repo, but the settings files used to start the service have not been updated yet. So this is pending T116901: Parsoid refuses to start on beta cluster

So Parsoid is running fine:

root@deployment-parsoid05:~# curl http://localhost:8000/_version|python -m json.tool
{
    "name": "parsoid",
    "sha": "47b8013ae7b15428122579bfff4943191c1866db",
    "version": "0.4.1-git"
}

Gotta verify whether the Varnish Parsoid cache is properly working. No idea how to assert that though :-(

Could use help from Parsoid people to verify whether the new Parsoid Varnish cache is properly working.

The instance uses role::cache::parsoid and now runs a Jessie distribution:

deployment-cache-parsoid05  Jessie  10.68.20.102

If it is found to work fine, I guess we can finally resolve this task.

So there is still a web proxy entry.

Moved it to the new instance.

hashar closed this task as Resolved. Oct 28 2015, 3:24 PM

curl 'http://parsoid-beta.wmflabs.org/' gives me:

< X-Cache: deployment-cache-parsoid05 hit (2), deployment-cache-parsoid05 frontend miss (0)
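
One way to assert this rather than eyeball it is to parse the `X-Cache` value and check for a hit. A minimal sketch follows. The entry layout (host, optional layer word such as `frontend`, status, counter in parentheses) is inferred from the single header above and may not cover every value Varnish emits. `parse_x_cache` and `served_from_cache` are made-up names:

```python
import re

# One comma-separated entry: "<host> [<layer>] <status> (<count>)".
# Layout inferred from the sample header only; treat it as an assumption.
ENTRY = re.compile(
    r"^(?P<host>\S+)\s+(?:(?P<layer>\w+)\s+)?(?P<status>\w+)\s+\((?P<count>\d+)\)$"
)


def parse_x_cache(value: str):
    """Split an X-Cache header value into one dict per cache entry."""
    entries = []
    for raw in value.split(","):
        match = ENTRY.match(raw.strip())
        if match is None:
            raise ValueError(f"unrecognised X-Cache entry: {raw!r}")
        entries.append(
            {
                "host": match.group("host"),
                "layer": match.group("layer"),  # None when no layer word is present
                "status": match.group("status"),
                "count": int(match.group("count")),
            }
        )
    return entries


def served_from_cache(value: str) -> bool:
    """True if any entry in the header reports a hit."""
    return any(entry["status"] == "hit" for entry in parse_x_cache(value))


header = "deployment-cache-parsoid05 hit (2), deployment-cache-parsoid05 frontend miss (0)"
print(served_from_cache(header))
```

For the header above this reports a hit on deployment-cache-parsoid05, so the response did come out of the new cache.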

So we have finally migrated to Jessie!

Restricted Application added a project: Operations. May 4 2016, 9:14 AM