
Convert media handling code (PdfHandler, PagedTiffHandler) to use Shellbox
Open, Needs Triage · Public

Description

These two extensions shell out to pdfinfo, pdftotext, tiffinfo, etc. to extract metadata from uploads and will need to be converted to use Shellbox.
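
To make the kind of metadata extraction described above concrete, here is a minimal sketch of turning pdfinfo-style "Key: value" output into a dictionary. It is illustrative only and is not the extensions' code (they are written in PHP); the function name `parse_pdfinfo` and the sample text are assumptions made for this example.

```python
def parse_pdfinfo(output: str) -> dict:
    """Parse pdfinfo-style "Key: value" lines into a dict (hypothetical helper)."""
    metadata = {}
    for line in output.splitlines():
        # Split on the first colon only, so values containing colons stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that do not have a "Key: value" shape
        metadata[key.strip()] = value.strip()
    return metadata


# Sample text made up for this example; real output has more fields.
sample = """Title:          Example
Pages:          3
Page size:      612 x 792 pts (letter)
"""
info = parse_pdfinfo(sample)
print(info["Pages"])
```

In a Shellbox setup, the command producing this output would run inside the service's container rather than on the application server; the parsing step stays the same either way.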


  • Review charts
  • shellbox-media namespaces in k8s
  • shellbox-media accounts in k8s
  • shellbox-media puppet private tokens
  • Generate TLS certificates
  • Review helmfile.d files
  • LVS setup
  • DNS for LVS records
  • Discovery DNS
  • Monitoring dashboard
  • Integration and Acceptance tests

Details

Project                                 Branch      Lines +/-
mediawiki/core                          master      +71 -12
operations/mediawiki-config             master      +4 -1
operations/deployment-charts            master      +8 -1
operations/mediawiki-config             master      +4 -1
operations/mediawiki-config             master      +13 -2
operations/mediawiki-config             master      +12 -2
operations/mediawiki-config             master      +6 -0
operations/puppet                       production  +18 -0
operations/dns                          master      +6 -0
operations/puppet                       production  +3 -3
operations/puppet                       production  +3 -3
operations/puppet                       production  +3 -3
operations/puppet                       production  +120 -0
operations/dns                          master      +15 -2
mediawiki/extensions/PdfHandler         master      +87 -43
mediawiki/extensions/PagedTiffHandler   master      +117 -53
operations/deployment-charts            master      +65 -0
operations/deployment-charts            master      +1 -0
operations/puppet                       production  +10 -0
labs/private                            master      +12 -0
mediawiki/libs/Shellbox                 master      +16 -0
mediawiki/extensions/PdfHandler         master      +0 -15

Event Timeline

Change 717124 had a related patch set uploaded (by Legoktm; author: Legoktm):

[mediawiki/extensions/PdfHandler@master] Remove questionable PdfHandler::isEnabled() implementation

https://gerrit.wikimedia.org/r/717124

Change 717154 had a related patch set uploaded (by Legoktm; author: Legoktm):

[mediawiki/extensions/PdfHandler@master] [WIP] Port retrieveMetaData to BoxedCommand

https://gerrit.wikimedia.org/r/717154

Change 717124 merged by jenkins-bot:

[mediawiki/extensions/PdfHandler@master] Remove questionable PdfHandler::isEnabled() implementation

https://gerrit.wikimedia.org/r/717124

Change 719651 had a related patch set uploaded (by Legoktm; author: Legoktm):

[mediawiki/extensions/PagedTiffHandler@master] [WIP] Port retrieveMetaData to BoxedCommand

https://gerrit.wikimedia.org/r/719651

Change 720143 had a related patch set uploaded (by Legoktm; author: Legoktm):

[mediawiki/core@master] [WIP] media: Port DjVuImage::retrieveMetaData() to use BoxedCommand

https://gerrit.wikimedia.org/r/720143

Change 721603 had a related patch set uploaded (by Legoktm; author: Legoktm):

[mediawiki/libs/Shellbox@master] pipeline: Build image for media handling

https://gerrit.wikimedia.org/r/721603

Change 721603 merged by jenkins-bot:

[mediawiki/libs/Shellbox@master] pipeline: Build image for media handling

https://gerrit.wikimedia.org/r/721603

Change 721633 had a related patch set uploaded (by Legoktm; author: Legoktm):

[labs/private@master] Add k8s users/tokens for shellbox-media

https://gerrit.wikimedia.org/r/721633

Change 721633 merged by Legoktm:

[labs/private@master] Add k8s users/tokens for shellbox-media

https://gerrit.wikimedia.org/r/721633

Change 721634 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/puppet@production] Add tokens and users for shellbox-media service

https://gerrit.wikimedia.org/r/721634

Change 721635 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/deployment-charts@master] Add namespace for shellbox-media service

https://gerrit.wikimedia.org/r/721635

Change 721634 merged by Legoktm:

[operations/puppet@production] Add tokens and users for shellbox-media service

https://gerrit.wikimedia.org/r/721634

Change 721635 merged by jenkins-bot:

[operations/deployment-charts@master] Add namespace for shellbox-media service

https://gerrit.wikimedia.org/r/721635

Change 721637 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/deployment-charts@master] helmfile.d: Add shellbox-media

https://gerrit.wikimedia.org/r/721637

Change 721637 merged by jenkins-bot:

[operations/deployment-charts@master] helmfile.d: Add shellbox-media

https://gerrit.wikimedia.org/r/721637

Change 721904 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/puppet@production] Add LVS for new Shellboxes: media, syntaxhighlight & timeline

https://gerrit.wikimedia.org/r/721904

Change 721905 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/puppet@production] service: Switch new Shellboxes to lvs_setup

https://gerrit.wikimedia.org/r/721905

Change 721906 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/puppet@production] service: Switch new Shellboxes to monitoring_setup

https://gerrit.wikimedia.org/r/721906

Change 721908 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/dns@master] Add *.svc.{codfw,eqiad}.wmnet entries for new Shellboxes

https://gerrit.wikimedia.org/r/721908

Change 721907 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/puppet@production] service: Switch new Shellboxes to production

https://gerrit.wikimedia.org/r/721907

Change 721909 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/dns@master] Add new Shellboxes to discovery

https://gerrit.wikimedia.org/r/721909

Change 719651 merged by jenkins-bot:

[mediawiki/extensions/PagedTiffHandler@master] Port retrieveMetaData to BoxedCommand

https://gerrit.wikimedia.org/r/719651

Change 717154 merged by jenkins-bot:

[mediawiki/extensions/PdfHandler@master] Port retrieveMetaData to BoxedCommand

https://gerrit.wikimedia.org/r/717154

Change 721908 merged by Legoktm:

[operations/dns@master] Add *.svc.{codfw,eqiad}.wmnet entries for new Shellboxes

https://gerrit.wikimedia.org/r/721908

Change 721904 merged by Legoktm:

[operations/puppet@production] Add LVS for new Shellboxes: media, syntaxhighlight & timeline

https://gerrit.wikimedia.org/r/721904

Change 721905 merged by Legoktm:

[operations/puppet@production] service: Switch new Shellboxes to lvs_setup

https://gerrit.wikimedia.org/r/721905

Change 721906 merged by Legoktm:

[operations/puppet@production] service: Switch new Shellboxes to monitoring_setup

https://gerrit.wikimedia.org/r/721906

Change 721907 merged by Legoktm:

[operations/puppet@production] service: Switch new Shellboxes to production

https://gerrit.wikimedia.org/r/721907

Change 721909 merged by Legoktm:

[operations/dns@master] Add new Shellboxes to discovery

https://gerrit.wikimedia.org/r/721909

Change 722736 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/puppet@production] services_proxy: Add envoy proxies for new Shellboxes

https://gerrit.wikimedia.org/r/722736

Change 722737 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/mediawiki-config@master] ProductionServices: Add new Shellboxes

https://gerrit.wikimedia.org/r/722737

Change 722736 merged by Legoktm:

[operations/puppet@production] services_proxy: Add envoy proxies for new Shellboxes

https://gerrit.wikimedia.org/r/722736

Change 722737 merged by jenkins-bot:

[operations/mediawiki-config@master] ProductionServices: Add new Shellboxes

https://gerrit.wikimedia.org/r/722737

Change 723050 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/mediawiki-config@master] Have PdfHandler use Shellbox service on group0 wikis

https://gerrit.wikimedia.org/r/723050

Change 723052 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/mediawiki-config@master] Have PagedTiffHandler use Shellbox service on group0 wikis

https://gerrit.wikimedia.org/r/723052

Change 723052 merged by jenkins-bot:

[operations/mediawiki-config@master] Have PagedTiffHandler use Shellbox service on group0 wikis

https://gerrit.wikimedia.org/r/723052

Mentioned in SAL (#wikimedia-operations) [2021-09-27T22:13:35Z] <legoktm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Have PagedTiffHandler use Shellbox service on group0 wikis (T289228) (1/2) (duration: 00m 57s)

Mentioned in SAL (#wikimedia-operations) [2021-09-27T22:14:48Z] <legoktm@deploy1002> Synchronized wmf-config/CommonSettings.php: Have PagedTiffHandler use Shellbox service on group0 wikis (T289228) (2/2) (duration: 00m 58s)

Change 723050 merged by jenkins-bot:

[operations/mediawiki-config@master] Have PdfHandler use Shellbox service on group0 wikis

https://gerrit.wikimedia.org/r/723050

Mentioned in SAL (#wikimedia-operations) [2021-09-27T22:25:07Z] <legoktm@deploy1002> sync-file aborted: Have PdfHandler use Shellbox service on group0 wikis (T289228) (duration: 00m 00s)

Mentioned in SAL (#wikimedia-operations) [2021-09-27T22:26:08Z] <legoktm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Have PdfHandler use Shellbox service on group0 wikis (T289228) (1/2) (duration: 00m 57s)

Mentioned in SAL (#wikimedia-operations) [2021-09-27T22:27:25Z] <legoktm@deploy1002> Synchronized wmf-config/CommonSettings.php: Have PdfHandler use Shellbox service on group0 wikis (T289228) (2/2) (duration: 00m 56s)

So far we're not really getting any traffic from group0 wikis, which is mostly expected. When we move to group1, we should exclude Commons until the final step, since that's actually going to be our biggest source of requests.

I forgot to tag https://gerrit.wikimedia.org/r/724572 with this bug, but PdfHandler and PagedTiffHandler now use Shellbox on all non-Commons wikis. Even that doesn't produce meaningful traffic, so we'll have to do a percentage-based rollout on Commons.
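
A percentage-based rollout of this kind can be sketched as below. This is only an illustration of the idea, not the actual mediawiki-config change (which is PHP); the function name `should_use_shellbox`, the `"commonswiki"` identifier used for Commons, and the injectable `rng` parameter are assumptions made for this example.

```python
import random


def should_use_shellbox(wiki: str, commons_percent: float, rng=random.random) -> bool:
    """Decide whether a request goes through Shellbox (hypothetical helper)."""
    if wiki != "commonswiki":  # assumed identifier for Commons
        return True  # non-Commons wikis are fully switched over
    # rng() returns a float in [0.0, 1.0); scale it and compare with the share.
    return rng() * 100 < commons_percent


# Roughly 10% of Commons requests would take the Shellbox path.
hits = sum(should_use_shellbox("commonswiki", 10) for _ in range(10_000))
print(f"{hits} of 10000 sampled requests would use Shellbox")
```

Sampling per request keeps the configuration to a single number that can be raised in steps (for example 10, then 50, then 100), at the cost that a given file is not guaranteed to take the same path on every request.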

Change 724576 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/mediawiki-config@master] Have PdfHandler use Shellbox on 10% of requests

https://gerrit.wikimedia.org/r/724576

Change 724577 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/mediawiki-config@master] Have PagedTiffHandler use Shellbox on 10% of requests

https://gerrit.wikimedia.org/r/724577

Change 725121 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/deployment-charts@master] Scale up shellbox-media

https://gerrit.wikimedia.org/r/725121

Change 725121 merged by jenkins-bot:

[operations/deployment-charts@master] Scale up shellbox-media

https://gerrit.wikimedia.org/r/725121

Change 724576 merged by jenkins-bot:

[operations/mediawiki-config@master] Have PdfHandler use Shellbox on Commons for 10% of requests

https://gerrit.wikimedia.org/r/724576

Mentioned in SAL (#wikimedia-operations) [2021-10-01T04:00:31Z] <legoktm@deploy1002> Synchronized wmf-config/CommonSettings.php: Have PdfHandler use Shellbox on Commons for 10% of requests (T289228) (duration: 00m 59s)

I've spot-checked newly uploaded PDFs on Commons and their metadata seems to be extracted fine. I'll check again in my morning before bumping it up again, probably to 50%.

I forgot to add statsd metrics to PagedTiffHandler, so I'm waiting for https://gerrit.wikimedia.org/r/725183 before enabling that.

Reviewing Special:NewFiles on Commons, I realized Shellbox probably won't be happy if we try posting a 1GB+ TIFF file to it. At the very least we need to increase the max sizes that envoy, apache and php-fpm allow. Joe and I had previously discussed having the container download the file from Swift for videoscalers, and we should probably do something like that for large files too. I'll file a separate task for that.
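
The size-based split discussed above might look roughly like the following sketch. Nothing here reflects an agreed design: the 100 MiB threshold, the `TransferPlan` type, the `plan_transfer` function and the example URL are all hypothetical and only show the shape of the decision.

```python
from dataclasses import dataclass

# Hypothetical limit; the real value would depend on what envoy, apache
# and php-fpm are configured to accept.
MAX_POST_BYTES = 100 * 1024 * 1024


@dataclass
class TransferPlan:
    method: str  # "post": send the file in the request body; "fetch": container downloads it
    detail: str


def plan_transfer(size_bytes: int, storage_url: str,
                  max_post_bytes: int = MAX_POST_BYTES) -> TransferPlan:
    """Pick how a file reaches the Shellbox container (hypothetical helper)."""
    if size_bytes <= max_post_bytes:
        return TransferPlan("post", "upload the file in the request body")
    # Too large to post: hand over a URL so the container fetches it itself.
    return TransferPlan("fetch", storage_url)


plan = plan_transfer(2 * 1024**3, "https://swift.example/container/file.tiff")
print(plan.method)
```

Keeping small files on the direct-post path avoids an extra storage round trip for the common case, while large files never pass through the request-size limits of the proxies in front of the service.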