Decom parsoid-lb.eqiad.wikimedia.org entrypoint
Closed, Resolved | Public | 0 Estimated Story Points

Description

In theory, we're already done with this in practical terms, but it remains to verify that there's no real traffic here any more, and then actually deconfigure it at various levels:

  • Remove DNS for parsoid-lb.eqiad.wikimedia.org
  • Remove DNS for parsoidcache.svc.eqiad.wmnet
  • De-configure all puppetization of these listeners (LVS/nginx/varnish)
  • Remove parsoid-specific VCL code from the cache_parsoid cluster (which leaves other services: graphoid, citoid, cxserver, restbase)

Note that "parsoid.svc.eqiad.wmnet" in LVS is completely separate from this, and not being removed.

Will post some traffic logs later

Event Timeline

Just FYI that parsoid-lb.eqiad.wm.org is used for RESTBase testing (both from localhost and Travis), but this can be easily dealt with.

So, I took a 1 hour log of all traffic on the 2x varnish frontends for parsoidcache with any Host header matching parsoid.*.

There were 59 total requests, all but one using the hostname parsoid-lb.eqiad.wikimedia.org.

One req per minute? I'd say that's extremely low. Either you took it in the wrong hour, or we are in better shape than we thought we were :) Would you mind running a 24h or even a 48h trace, to be sure all of the clients counting on small latencies have indeed switched to RESTBase?

I think it's typically that low, but yeah we can run some longer checks. Note that's filtered only for the parsoid hostnames on the parsoid cache: it doesn't cover use of the correct distinct hostnames for the other services there (cxserver, citoid, graphoid, restbase), and obviously doesn't include the (unrelated) direct, uncached LVS entrypoint to parsoid itself at parsoid.svc.eqiad.wmnet.

Why are we decommissioning this? This is very useful as a public parsoid endpoint. We announced it as such:
http://osdir.com/ml/general/2013-11/msg33063.html
and it's prominent in our documentation:
https://www.mediawiki.org/wiki/Parsoid/API
and the default configuration of the OCG toolset points to this as well:
https://github.com/cscott/mw-ocg-bundler/blob/master/bin/mw-ocg-bundler#L27 (& etc)
as well as kiwix's zim building tools, most likely.

Scott, at this point, kiwix, and everyone else can probably use the restbase api to access content.

Sure, I'm just pointing out that we've been telling people to use parsoid-lb.eqiad.wikimedia.org for a long time. We should at least deprecate it, announce its deprecation to the same mailing lists we announced the service to, and update all the (many) google hits for parsoid-lb.eqiad.wikimedia.org before turning it off.

And the VE guys are in the habit of using the private rt and wikitext conversion forms on parsoid-lb.eqiad.wikimedia.org for their testing as well. I taught James Forrester about the RESTBase API just last week, but the RESTBase API has only a single-line input for wikitext, which isn't nearly as convenient as the parsoid form. There are a *lot* of things still depending on having a publicly-available parsoid instance.

For sure, announcements, etc. should happen. But, one thought is that this is a potential DOS vector -- since we are hitting the cluster directly for full parse requests.

Really a name like parsoid-lb.eqiad.wikimedia.org should never have been announced, but it is what it is. My understanding is that public parsoid functionality should all be through the text-lb entrypoint from here forward (which is through the wiki's own domain name), minus whatever historical / corner cases we're tracking down here.

Also, one request per minute, the bulk of which are internal service-checks of the /_version URL, are not a *lot* of things at first glance. We can't be dedicating the kind of complexity that a separate cache cluster entails to just that.

> Scott, at this point, kiwix, and everyone else can probably use the restbase api to access content.

Also, the thing to note is that response times are not an important factor for clients like Kiwix, so even if they keep hitting Parsoid directly we can still remove the cache from the equation.

> We can't be dedicating the kind of complexity that a separate cache cluster entails to just that.

+1e3 to that

@Krenair, @ssastry: Both VE and rt testing should be able to use http://parsoid.svc.eqiad.wmnet:8000/, which removes the dependency on the cache.

However @BBlack intends to remove that as well (see description)

> However @BBlack intends to remove that as well (see description)

I don't think so:

> Note that "parsoid.svc.eqiad.wmnet" in LVS is completely separate from this, and not being removed.

parsoid vs. parsoidcache, right, sorry.

I'm totally happy for the Parsoid endpoint to be replaced when RESTbase provides the same functionality, but it'd be a shame to lose it (in particular the dump-wikitext-and-get-rendered-HTML-back feature).

(RESTBase does have a wt2html endpoint but the input is a single line text input widget, which is inconvenient for manual testing of the sort @Jdforrester-WMF wants. The output from the interactive form is also newline-deficient IIRC; it certainly isn't going through a pretty-printer or even being displayed as HTML.)

In any case (and this point has been confusing throughout the life of parsoidcache): the parsoid service itself would still exist. This is just about the cache layer sitting in front of it and outside-world access to it.

@Jdforrester-WMF, @cscott: It sounds like the only missing bit to make the API easier to test for your use case would be a small HTML form that sends wikitext to the transform end point (possibly parametrized using JS). The return value from RB is just the HTML, so will render naturally.

Another option would be to add an option to switch to the bare return value in swagger-ui, along with a way to scale the text input field.

@BBlack it's the "outside-world access to it" which is the primary issue, since that's used as a convenience for a variety of things: (a) the RESTBase test suite (T110711), (b) manual VE testing (T110712), (c) Parsoid round-trip testing (T110715), (d) external clients like Kiwix and OCG (the latter can use restbase, but it's not the default: T110713). I've filed T110714 for "public announcement of decommissioning" as well.

I've created tasks for these and added them as blockers.

Change 242157 had a related patch set uploaded (by Cscott):
Remove most references to parsoid-lb.eqiad.wikimedia.org.

https://gerrit.wikimedia.org/r/242157

And from puppet:
manifests/role/iegreview.pp: parsoid_url => 'http://parsoid-lb.eqiad.wikimedia.org/enwiki/',

The usage from Wikimedia-IEG-grant-review is easy to change if someone can clue me in on the new proper URL to a parsoid service. The usage in this app is just using Parsoid to expand administrator supplied wikitext to HTML. The HTML output is cached locally by the Grant Review app and only re-queries Parsoid if an admin changes the wikitext in the application database.
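The caching behaviour described above (render once, re-query only when the stored wikitext changes) can be sketched roughly as follows. This is an illustration, not the Grant Review app's code: `RenderCache`, `html_for` and `fake_render` are hypothetical names, and the renderer is a stand-in for whatever Parsoid or RESTBase call the app ends up using.

```python
import hashlib
from typing import Callable, Dict, Tuple


class RenderCache:
    """Keep rendered HTML per record and re-render only when the wikitext changes."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render  # hypothetical: any function turning wikitext into HTML
        self._cache: Dict[str, Tuple[str, str]] = {}  # key -> (wikitext digest, html)
        self.render_calls = 0  # counts how often the renderer was actually invoked

    def html_for(self, key: str, wikitext: str) -> str:
        digest = hashlib.sha256(wikitext.encode("utf-8")).hexdigest()
        cached = self._cache.get(key)
        if cached is not None and cached[0] == digest:
            return cached[1]  # wikitext unchanged: serve the local copy
        self.render_calls += 1
        html = self._render(wikitext)
        self._cache[key] = (digest, html)
        return html


def fake_render(wikitext: str) -> str:
    # Stand-in for a request to a parsing service; no network access here.
    return "<p>" + wikitext + "</p>"
```

With a cache like this, the parsing service only sees a request when an admin edits the text, which is consistent with the very low request rates reported in this task.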

Change 242157 merged by jenkins-bot:
Remove most references to parsoid-lb.eqiad.wikimedia.org.

https://gerrit.wikimedia.org/r/242157

@bd808 the new hotness is to use the REST API's /transform/wikitext/to/html endpoint to do this.
http://rest.wikimedia.org/en.wikipedia.org/v1/?doc#!/Transforms/post_transform_wikitext_to_html_title_revision

eg:

curl -X POST --header "Content-Type: application/x-www-form-urlencoded" --header 'Accept: text/html; profile="mediawiki.org/specs/html/1.1.0"' -d "wikitext=Hey%20'''there'''&body_only=true" "http://rest.wikimedia.org/en.wikipedia.org/v1/transform/wikitext/to/html"
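For callers that are not shell scripts, the same request can be assembled with Python's standard library. This is a minimal sketch: the URL, form fields and Accept profile are copied from the curl example, the host may no longer be current, and the helper names `build_transform_request` and `wikitext_to_html` are made up for illustration.

```python
import urllib.parse
import urllib.request

# URL and Accept profile are taken from the curl example in this thread.
TRANSFORM_URL = "http://rest.wikimedia.org/en.wikipedia.org/v1/transform/wikitext/to/html"
ACCEPT = 'text/html; profile="mediawiki.org/specs/html/1.1.0"'


def build_transform_request(wikitext: str, body_only: bool = True) -> urllib.request.Request:
    """Build (but do not send) the form-encoded POST request."""
    fields = {"wikitext": wikitext}
    if body_only:
        fields["body_only"] = "true"
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        TRANSFORM_URL,
        data=data,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": ACCEPT,
        },
        method="POST",
    )


def wikitext_to_html(wikitext: str) -> str:
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_transform_request(wikitext)) as response:
        return response.read().decode("utf-8")
```

Keeping request construction separate from sending makes it possible to inspect the encoded body and headers without touching the service.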

Status for parsoid-lb.eqiad.wikimedia.org: only a small handful of requests are still flowing that aren't internal healthchecks. It can be minutes between seeing legitimate requests arrive.

Before we can pull this hostname from DNS, however, it would be a good idea (in addition to notifying the community) to clean up refs to it in our own repositories:

https://github.com/search?p=1&q=%40wikimedia+parsoid-lb.eqiad.wikimedia.org&type=Code&utf8=%E2%9C%93

I'm imagining most of those configuration entries are now-unused defaults and such, but they should be confirmed and cleared out before moving on this.

Change 267269 had a related patch set uploaded (by Subramanya Sastry):
T110474: Point iegreview to internal parsoid url

https://gerrit.wikimedia.org/r/267269

Change 267270 had a related patch set uploaded (by Subramanya Sastry):
T110474: Point restbase to internal parsoid url

https://gerrit.wikimedia.org/r/267270

Change 267270 merged by BBlack:
T110474: Point restbase to internal parsoid url

https://gerrit.wikimedia.org/r/267270

Change 267862 had a related patch set uploaded (by BBlack):
switch default parsoid URL to internal service

https://gerrit.wikimedia.org/r/267862

Change 267863 had a related patch set uploaded (by BBlack):
default parsoid URL to use internal service name

https://gerrit.wikimedia.org/r/267863

Change 267863 abandoned by BBlack:
default parsoid URL to use internal service name

https://gerrit.wikimedia.org/r/267863

BBlack changed the task status from Open to Stalled. Feb 3 2016, 3:25 AM

Re: timing - per https://lists.wikimedia.org/pipermail/wikitech-l/2016-January/084649.html and considering weekend stuff, we can't really remove the public hostname until at least Feb 22nd.

Took another log of all traffic today, for ~1 hour. Excluding our own healthcheck/monitoring requests, there were 14 total requests to the parsoidcache (so ~1 every 4-5 minutes on average). The requests were:

8x of this exact request, which seems to be from a 3rd party using a defaulted configuration for ocg-bundler, making requests to us for a wiki we don't support?

29 RxURL        c /neuroweb.pic.es/v3/page/html/Main%20Page/1
29 RxHeader     c User-Agent: mw-ocg-bundler/1.3.0-git/root
29 RxHeader     c host: parsoid-lb.eqiad.wikimedia.org

5x requests similar to this (but for various different Talk: pages), which seem to be related to some kind of CI/testing on labs?

17 RxURL        c /enwiki/Talk%3ARichard_Nixon?oldid=703383873
17 RxHeader     c Host: parsoid-prod.wmflabs.org
17 RxHeader     c user-agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:44.0) Gecko/20100101 Firefox/44.

And one additional request to parsoid-prod.wmflabs.org for /favicon.ico.
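Log excerpts in the shape shown above can be grouped into one record per request with a few lines of Python. This is a rough sketch that assumes each line has the form "id, tag, side, value" as in the excerpts, and that the leading number can be treated as a request identifier within a short capture (in a real log such ids may be reused). `group_requests` is a hypothetical helper, not part of any Varnish tooling.

```python
from collections import defaultdict
from typing import Dict, Iterable


def group_requests(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Group log lines by their leading id into one dict per request."""
    requests: Dict[str, Dict[str, str]] = defaultdict(dict)
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) != 4:
            continue  # skip blank or unexpected lines
        req_id, tag, _side, value = parts
        if tag == "RxURL":
            requests[req_id]["url"] = value
        elif tag == "RxHeader" and ":" in value:
            name, _, header_value = value.partition(":")
            # Header names are case-insensitive; the excerpts mix "Host" and "host".
            requests[req_id][name.strip().lower()] = header_value.strip()
    return dict(requests)


# Lines copied from the excerpts in this comment.
SAMPLE = """\
29 RxURL        c /neuroweb.pic.es/v3/page/html/Main%20Page/1
29 RxHeader     c User-Agent: mw-ocg-bundler/1.3.0-git/root
29 RxHeader     c host: parsoid-lb.eqiad.wikimedia.org
17 RxURL        c /enwiki/Talk%3ARichard_Nixon?oldid=703383873
17 RxHeader     c Host: parsoid-prod.wmflabs.org
""".splitlines()
```

Calling `group_requests(SAMPLE)` yields one dict per id with `url`, `host` and, where present, `user-agent` keys, which is enough to filter out healthchecks or split traffic by hostname.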

While breaking parsoid-prod.wmflabs.org's favicon is a heavy price to pay, all those sacrifices will be worth it in the end. The eagle has landed.

Sometime between now and the 22nd, I'll try to get a full capture for a period of multiple days so we can feel more confident there aren't still intermittent spikes from some important source or other.

Change 267862 merged by jenkins-bot:
switch default parsoid URL to internal service

https://gerrit.wikimedia.org/r/267862

Status updates:

  1. Remaining code refs:
    1. https://github.com/wikimedia/mediawiki-services-cxserver-deploy/blob/master/debian/config.js
    2. https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-bundler/blob/master/lib/metabook.js#L162
    3. https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-bundler/blob/master/bin/mw-ocg-bundler#L29
    4. https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator/blob/master/scripts/loadtest.js#L34
    5. https://github.com/wikimedia/mediawiki-services-restbase/search?utf8=%E2%9C%93&q=parsoid-lb
    6. https://github.com/wikimedia/restbase/search?utf8=%E2%9C%93&q=parsoid-lb
  2. Traffic
    • Took an 18 hour log from 19:00 Feb 9 -> 13:00 Feb 10 UTC
    • filtered out internal healthchecks and obvious trash (e.g. robots.txt, favicon, random exploit-spam traffic against non-existent URLs, etc)
    • 243 total requests (avg 1 req per 4.44 mins)
    • P2581 Raw list with UA and/or Referer if available
    • Broadly categorized by host and first segment of URL path:
      • 180 parsoid-prod.wmflabs.org:/enwiki
      • 35 parsoid-lb.eqiad.wikimedia.org:/_rt
      • 15 parsoid-lb.eqiad.wikimedia.org:/skwiki
      • 8 parsoid-lb.eqiad.wikimedia.org:/v2
      • 3 parsoid-lb.eqiad.wikimedia.org:/cswiki
      • 2 parsoid-lb.eqiad.wikimedia.org:/enwiki
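The "host and first segment of URL path" grouping used in the traffic summary above, and the average interval quoted with it, can be reproduced along these lines. This is a sketch only: the function names are hypothetical and the input is assumed to be (host, url) pairs already extracted from the log.

```python
from collections import Counter
from typing import Iterable, Tuple


def category(host: str, url: str) -> str:
    """Return 'host:/first-path-segment', ignoring any query string."""
    first_segment = url.split("?", 1)[0].strip("/").split("/", 1)[0]
    return f"{host}:/{first_segment}"


def categorize(requests: Iterable[Tuple[str, str]]) -> Counter:
    """Count requests per host and first path segment."""
    return Counter(category(host, url) for host, url in requests)


def minutes_per_request(hours: float, total: int) -> float:
    """Average gap between requests, in minutes."""
    return hours * 60 / total
```

For the 18 hour log, `minutes_per_request(18, 243)` comes to about 4.44, matching the figure in the summary, and the six category counts (180, 35, 15, 8, 3, 2) add up to the 243 total.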
Amire80 added a subscriber: Amire80.

Is there anything left to do in ContentTranslation or cxserver code?

> Is there anything left to do in ContentTranslation or cxserver code?

@Amire80: Yes, fix this: https://github.com/wikimedia/mediawiki-services-cxserver-deploy/blob/master/debian/config.js

Thanks. I added T127308 and removed CX tags from this task to clean up our board ;)

24 hour log run, with pre-filtering for internal monitoring requests and definite random crawler/junk/noise traffic:

  • 206x (avg 8.6/hour) "Host: parsoid-prod.wmflabs.org" with client IP 10.68.21.68 (which is a labs proxy)
  • 55x (avg 2.3/hour) relatively-legit requests from elsewhere, which can be broken down to:
    • 18x requests for various /_(rt|rtform|html)/ sorts of URLs, which are the stuff linked from the index page at https://parsoid-lb.eqiad.wikimedia.org/ under "There are also some convenient tools for experiments. These are not part of the public API."
      • 15x from a single external IP address somewhere in Russia (which may just be crawling? not sure)
      • 3x from various google-proxy IPs
    • 37x requests to normal parsoid urls like /XXwiki/.... or /v2/...., which break down to:
      • 14x from the same singular external Russian IP referenced earlier
      • 12x from the labs IP 10.68.18.48 which is tools-webgrid-lighttpd-1205 (and all for /cswiki/...)
      • 8x from various tools-exec nodes (2x requests each from tools-exec-12(02|14|15|21), all related to enwiki Template%3ADid_you_know)
      • 3x other requests from the outside world, all for articles named foo or Foobar from 2 distinct random cloud-hosting-service user IPs

Does anyone have a handle on what the random low-traffic labs usages are at the bottom of the list above?

As for parsoid-prod.wmflabs.org - this is apparently a public IP allocated in labs (208.80.155.156) which appears to just proxy traffic into the production parsoid-lb.eqiad.wikimedia.org IP, which is kind of horrible...

I should note, my inclination is to just shut this down today so that we can move on with other related/blocked work. We're past our 3 weeks' public notice, and any way you slice the POV about which requests are legitimate or not, the rate of them is extremely tiny...

Another way to think of the stats: ignoring parsoid-prod.wmflabs.org crazy proxy thing, and ignoring this one oddball Russian IP, there were 6 legit requests from the outside world and 20 legit requests from labs over a 24 hour period, for a combined avg rate of 1.1/hour
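The arithmetic behind that combined rate can be checked with a short tally. The counts are copied from the 24 hour breakdown above; the dictionary labels and the `per_hour` helper are just illustrative names.

```python
LOG_HOURS = 24

# Counts copied from the 24 hour breakdown in this task.
tool_urls = {"single Russian IP": 15, "google-proxy IPs": 3}
normal_urls = {
    "single Russian IP": 14,
    "labs 10.68.18.48": 12,
    "tools-exec nodes": 8,
    "other outside world": 3,
}


def per_hour(count: int, hours: int = LOG_HOURS) -> float:
    """Average requests per hour, rounded to one decimal place."""
    return round(count / hours, 1)


# All requests not coming through parsoid-prod.wmflabs.org.
total_elsewhere = sum(tool_urls.values()) + sum(normal_urls.values())

# Leave out the single Russian IP, then split what remains by origin.
outside = tool_urls["google-proxy IPs"] + normal_urls["other outside world"]
labs = normal_urls["labs 10.68.18.48"] + normal_urls["tools-exec nodes"]
combined_rate = per_hour(outside + labs)
```

Here `outside` is 6, `labs` is 20, and `combined_rate` rounds to 1.1 requests per hour, in line with the figures quoted above.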

Change 272484 had a related patch set uploaded (by BBlack):
cache_parsoid: remove public DNS

https://gerrit.wikimedia.org/r/272484

Change 272484 merged by BBlack:
cache_parsoid: remove from DNS

https://gerrit.wikimedia.org/r/272484

FYI, I didn't realize that this was taking parsoid-prod.wmflabs.org down until it happened, so a lot of the requests from it were probably from my https://en.wikipedia.org/wiki/WP:EPH script.

> FYI, I didn't realize that this was taking parsoid-prod.wmflabs.org down until it happened, so a lot of the requests from it were probably from my https://en.wikipedia.org/wiki/WP:EPH script.

Sorry about that!

You should be able to redirect your requests to RESTBase and have your script functional again. But, the url for the RESTBase requests will be a bit different from the Parsoid API.

No problem, I already have it taken care of.