Page MenuHomePhabricator

Performance review BounceHandler extension for Deployment
Closed, ResolvedPublic

Description

Sub part of Bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=69019

Quoting: https://bugzilla.wikimedia.org/show_bug.cgi?id=69019#c0

The BounceHandler extension generates a unique VERP address on every sent email, and has the bouncehandler API that can handle incoming email bounces once the bounce is HTTP POSTed to it via curl from exim. The extension was built as part of the VERP project to properly handle email bounces.

Pipe incoming bounces to the bouncehandler API:-
You need to add the following to your incoming mails to *@wikimedia.com receving PIPE transport:
command = /usr/bin/curl "action=bouncehandler" --data-urlencode "email@-" http://$IP/api.php

The genrated VERP address is of the form
$wikiId-base36( $UserID )-base36( $Timestamp )-hash( $algorithm,$key, $prefix )@$email_domain

where
$wikiId - The wiki database name, to support multiple wikis
$userID - The user_id from table 'user' - to uniquely identify a recipient.
$Timestamp - The unix timestamp.
$prefix = $wikiId. '-'. base_convert( $uid, 10, 36). '-'. base_convert( $timeNow, 10, 36);

It use the Plancake mail parser external library to extract the headers as an addition to the optional inbuilt regex functions.

Gerrit: https://github.com/wikimedia/mediawiki-extensions-BounceHandler


Version: master
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=69101

Details

Reference
bz69100

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 3:34 AM
bzimport set Reference to bz69100.

The extension is currently in beta, and in the testing phase. Can we give this a bit speed-up ? You can test a sample mail at deployment.wikimedia.beta.wmflabs.org/wiki/Special:EmailUser.

Ori/Aaron have decided to stop doing pre-deployment performance reviews.

I've unblocked this from the deployment bug and moved it to the BounceHandler component (feel free to close or leave open if you ever do want a perf review).

Some bits:

  • getOriginalEmail() doesn't call reuseConnection() and the return value is not well defined
  • processBounceHeaders()/handleFailingRecipient() doesn't call reuseConnection()
  • The query in handleFailingRecipient() is unindexed, how large is the table expected to get?
  • Also, it would be nice of the table could go in extension-1 (e.g. a custom cluster). Echo and Flow do this.

Where is the code for the broker that takes in the bounces and does a POST to the API?

(In reply to Aaron Schulz from comment #3)

Where is the code for the broker that takes in the bounces and does a POST
to the API?

Here is the code that would go to polonium, that would HTTP POST to the API https://github.com/wikimedia/operations-puppet/blob/production/templates/exim/exim4.conf.SMTP_IMAP_MM.erb#L670

We will have it point to en-wiki after this change - https://gerrit.wikimedia.org/r/#/c/168622/

Change 169654 had a related patch set uploaded by 01tonythomas:
Handle the return value of getOriginalEmail efficiently

https://gerrit.wikimedia.org/r/169654

Change 169654 merged by jenkins-bot:
Various performance fixed for the BounceHandler extension

https://gerrit.wikimedia.org/r/169654

(In reply to Aaron Schulz from comment #3)

  • The query in handleFailingRecipient() is unindexed, how large is the table

expected to get?

I was about to file a bug about that

I guess it should probably be a partial index on the email field, and then the timestamp.

Something like:

CREATE INDEX /*i*/br_mail_timestamp ON /*_*/user (br_user_email(50), br_timestamp);

In core we only index the first 50 chars of the email on the user table

I'd second having an index as above.

(In reply to Sam Reed (reedy) from comment #7)

CREATE INDEX /*i*/br_mail_timestamp ON /*_*/user (br_user_email(50),
br_timestamp);

Using the right table would help :)

CREATE INDEX /*i*/br_mail_timestamp ON /*_*/bounce_records(br_user_email(50), br_timestamp);

Change 170759 had a related patch set uploaded by 01tonythomas:
Create table index for 'bounce_records' table

https://gerrit.wikimedia.org/r/170759

Change 170759 merged by jenkins-bot:
Create table index for 'bounce_records' table

https://gerrit.wikimedia.org/r/170759

Change 170835 had a related patch set uploaded by Aaron Schulz:
Added wgBounceHandlerCluster setting

https://gerrit.wikimedia.org/r/170835

Change 170835 merged by jenkins-bot:
Added wgBounceHandlerCluster setting

https://gerrit.wikimedia.org/r/170835