Page MenuHomePhabricator

Security review of Newsletter extension
Closed, ResolvedPublic

Description

Please proceed with the security review of this new extension, aimed to be deployed in Wikimedia (T110170)

Repository: rENLT extension-Newsletter
Development status: T110642: Implement all the features required for running the Newsletter extension in Wikimedia & https://phabricator.wikimedia.org/tag/mediawiki-extensions-newsletter/
Test instance: http://newsletter-test.wmflabs.org/wiki/Special:Newsletters

Review of 631882e3779 of Newsletter extension (Jan 29, 2017)
The security review can be considered passed once the the minor issues noted below are addressed. If anything is unclear, don't hesitate to ask.

Minor security issues

  • "NewsletterTablePager.php" line 96 - Run htmlspecialchars after you truncate, not before. In this particular case, the worst that could happen would be for an entity reference to get cut off in the middle, but nonetheless escaping should always be the last thing you do. - T159075
  • For javascript taking stuff from data attributes - if the data attributes are not user controlled, use names starting with data-mw (e.g. instead of data-newsletter-id use data-mw-newsletter-id) unless you have a good reason not to. This prevents users from forging data attributes. In the context of this extension, it doesn't appear that user forging is an actual issue, however using the data-mw convention helps future proof things against any later changes where such an issue might be relavent. - T159085
  • "NewsletterContent.php" line 253 - Double escaping in the OOUI button labels. - T159075
  • NewsletterDiffEngine - The messages for the h4 headers are not escaped. - T159075
  • "SpecialNewsletter.php" line 142 - newsletter-subtitlelinks-foo message is not escaped when it is the active link. - T159075
  • "SpecialNewsletter.php" line 207 - Double escaping newsletter-do-unsubscribe. - T159075
  • "NewsletterTablePager.php" line 33 - getFieldNames() has double escaping. - T159075
  • Anyone with newsletter-create rights can edit a Newsletter namespace page (via api). While these changes aren't reflected in the relavent tables, they can make things inconsistent and confusing (e.g. Have the page display a different list of publishers than the actual list). Newsletter-manage permissions should apply to editing Newsletter pages via the api like it does during normal editing. The most obvious way to do this is with a getUserPermissionsErrors (or similar) hook. However, it may also be better to make the content handler page canonical and use it instead of the separate db tables where possible to eliminate the possibility of inconsistency in the system. - T160854

Related Objects

StatusSubtypeAssignedTask
DuplicateQgil
ResolvedQgil
ResolvedQgil
DeclinedNone
ResolvedAddshore
DuplicateNone
Resolvedori
ResolvedBawolff
ResolvedGlaisher
ResolvedTinaj1234
DeclinedNone
ResolvedGlaisher
Resolved01tonythomas
Resolved01tonythomas
Resolvednikitavbv
InvalidNone
ResolvedFilip
DuplicateNone
Resolved01tonythomas
ResolvedFilip
ResolvedMtDu
ResolvedMtDu
ResolvedGeorggi199
ResolvedFilip
ResolvedPppery
ResolvedFilip
ResolvedPppery
ResolvedPppery
ResolvedGlaisher
Resolved01tonythomas
ResolvedPppery
ResolvedDaimona
ResolvedDaimona
Resolved01tonythomas
ResolvedQgil
ResolvedFilip
ResolvedxSavitar
ResolvedNone
Resolved01tonythomas
Resolved01tonythomas

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@ori : how should we follow up with this one, as we got almost 16 patchsets merged right today ( after your security review ). https://gerrit.wikimedia.org/r/#/q/project:mediawiki/extensions/Newsletter,n,z

I'll review those changes, too. @csteipp, @dpatrick: would my review be sufficient, or would you still want to block this until one of you can review it as well?

Hi all, sorry for the delay in this. I've looked through the extension in depth, and just a few things are needed. Overall code safety looks good!

./includes/specials/SpecialNewsletterCreate.php

  • No spam filter integration. Even without wikitext, users can still put abusive stuff in the names/descriptions.
  • No CheckUser integration

./includes/specials/SpecialNewsletter.php

  • Should announcements be throttled too? That could flood the logs pretty quickly if someone decides to announce every page on enwiki at once
  • Since changes to the name/description aren't logged (nor have spam filter integration). If someone modifies their newsletter to have an abusive name / description, it would be hard to monitor.
  • If a newsletter with an abusive name is created, and then deleted, the echo notification still contains the original name. So there is no effective way for admins to clean up if someone starts putting bad stuff in names.

./includes/specials/pagers/NewsletterTablePager.php

  • (not security, but..) the sub-selects don't work on wikis with table prefixes

./modules/ext.newsletter.newsletters.js

  • You should use 'data-mw-whatever' for attributes that are generated server-side, and shouldn't contain user data (like the newsletter id). You're properly checking that it's an integer only, but in the event that someone later allows wikitext in the newsletter description, and they do more processing here in javascript, it seems like a recipe for dom xss.

In Production

  • Will deleteInactiveNewsletters.php be run as a cron? How frequently are you thinking this will run in production? Keeping "deleted" stuff around in the DB makes cleanup hard when we need to suppress stuff.
  • Related to that, you'll need to work with ops to make sure the tables are either not replicated to labs, or that rows for deleted newsletters aren't replicated. We don't want stuff that needs to be suppressed replicated to labs.

I am going to make separate tasks for some of the feedback.

./includes/specials/SpecialNewsletterCreate.php

  • No spam filter integration. Even without wikitext, users can still put abusive stuff in the names/descriptions.

I presume this involves AbuseFilter integration? This can probably be non-blocking (maybe I'm wrong), because I imagine on WMF wikis the newsletter-create privilege will not be given out to just anybody, similar to gadgets.

Thank you @csteipp ! This provides a good ToDo list to move from the Beta cluster to production in mediawiki.org.

I presume this involves AbuseFilter integration? This can probably be non-blocking (maybe I'm wrong), because I imagine on WMF wikis the newsletter-create privilege will not be given out to just anybody, similar to gadgets.

Replied at T132022#2186130, let's continue this conversation there.

./includes/specials/SpecialNewsletterCreate.php

  • No spam filter integration. Even without wikitext, users can still put abusive stuff in the names/descriptions.

I presume this involves AbuseFilter integration? This can probably be non-blocking (maybe I'm wrong), because I imagine on WMF wikis the newsletter-create privilege will not be given out to just anybody, similar to gadgets.

Can we make T132022: Add AbuseFilter integration to Extension:Newsletter not a blocker for deployment indeed, please? AbuseFilter integration is undocumented (T135668) and despite of this fact we have been trying to implement it, looking at how other projects are doing it and asking to developers. However, those who could know seem to be very busy and we still haven't got this to work.

As @Parent5446 points out, we could start by granting the permission to create newsletters to a more restricted user group (currently is autoconfirmed), and then relax it again when we succeed implementing AbuseFilter integration.

./includes/specials/SpecialNewsletterCreate.php

  • No spam filter integration. Even without wikitext, users can still put abusive stuff in the names/descriptions.

I presume this involves AbuseFilter integration? This can probably be non-blocking (maybe I'm wrong), because I imagine on WMF wikis the newsletter-create privilege will not be given out to just anybody, similar to gadgets.

Can we make T132022: Add AbuseFilter integration to Extension:Newsletter not a blocker for deployment indeed, please? AbuseFilter integration is undocumented (T135668) and despite of this fact we have been trying to implement it, looking at how other projects are doing it and asking to developers. However, those who could know seem to be very busy and we still haven't got this to work.

As @Parent5446 points out, we could start by granting the permission to create newsletters to a more restricted user group (currently is autoconfirmed), and then relax it again when we succeed implementing AbuseFilter integration.

@Qgil I think it's fine to go ahead and remove AbuseFilter integration as a blocker, as long as initial permission is restricted to a set of trusted users. I also saw T132022#2367659 giving guidance on integration of AbuseFilter. Hopefully that's helpful.

Has the suppression (oversight) of newsletters being considered? If not, the option should be added to avoid leaks of private information.

The only content generated by users through this extension is the info page for each newsletter (title, description, URL of main page). This content is handled by ContentHandler. @01tonythomas and I believe that, therefore, this content and their logs can be suppressed in the same way that regular wiki content. I guess testing this would require the deployment in the Wikimedia cluster? I am not familiar with Oversight actions.

The rewrite to use ContentHandler may be significant enough to require a new security review. I guess that would be up to @dpatrick or @Bawolff.

@dpatrick / @Bawolff, @01tonythomas and myself believe that the extension is now ready for deployment in Wikimedia, starting with mediawiki.org.

Could you advise about next steps, please?

As for the spam questions above, it requires SpamBlacklist and
TitleBlacklist integration. AbuseFilter would be a plus of course, but
SBL/TBL should be supported as basic antiabuse features.

The only content generated by users through this extension is the info page for each newsletter (title, description, URL of main page). This content is handled by ContentHandler. @01tonythomas and I believe that, therefore, this content and their logs can be suppressed in the same way that regular wiki content. I guess testing this would require the deployment in the Wikimedia cluster? I am not familiar with Oversight actions.

I think I can test suppression in the beta cluster as steward there as well. Let me know if you wish to. Regards. Nope, I can't, since it seems they ain't the same as, ie, deployment.beta.wmflabs.org, etc.

AbuseFilter integration was discussed above:

@Qgil I think it's fine to go ahead and remove AbuseFilter integration as a blocker, as long as initial permission is restricted to a set of trusted users. I also saw T132022#2367659 giving guidance on integration of AbuseFilter. Hopefully that's helpful.

The initial permission is being assigned to Administrators only. I think it is better to proceed with the deployment in mediawiki.org in order to keep polishing the extension based on real use. We can agree that AbuseFilter integration would be a requirement to bring the extension to additional Wikimedia wikis.

To clarify, what type of scale is this extension expected to handle? Tech News has about 650 subscribers - is that the scale that is expected for this extension? [FWIW, it would be good to document scaling assumptions in the extension itself]

Is 1000-10.000 a useful enough assumption? Of course, there will be many (most?) smaller newsletters, and it will take a while to have some Wikimedia newsletters breaking the 1000 subscribers barrier, but I can see it happening at some point.

Is 1000-10.000 a useful enough assumption? Of course, there will be many (most?) smaller newsletters, and it will take a while to have some Wikimedia newsletters breaking the 1000 subscribers barrier, but I can see it happening at some point.

Yep that's useful. I just wanted a very rough range when sanity checking the queries

Review of 631882e3779 of Newsletter extension (Jan 29, 2017)

Overall impression: The extension has some minor issues revolving around escaping i18n messages, but by and large follows appropriate security standards (yay!). While not the primary purpose of this review, I have also left comments about non-security concerns in the extension. To summarize, I have some serious (but for the most part easily fixable) concerns about some of the SQL queries used, and strongly suggest that this extension undergo a DBA review (or at the very least an informal conversation with a DBA) before being deployed. Parts of the extension seem more complex than they need be, and could perhaps be significantly simplified by using content handler to handle state where possible instead of directly managing additional database tables. There were a couple spots in the code where code seemed to be copied and pasted, but with bugs fixed in one spot but not the other, which could be improved by refactoring to share the code in both places.

The security review can be considered passed once the the minor issues noted below are addressed. If anything is unclear, don't hesitate to ask.

Minor security issues

  • "NewsletterTablePager.php" line 96 - Run htmlspecialchars after you truncate, not before. In this particular case, the worst that could happen would be for an entity reference to get cut off in the middle, but nonetheless escaping should always be the last thing you do.
  • For javascript taking stuff from data attributes - if the data attributes are not user controlled, use names starting with data-mw (e.g. instead of data-newsletter-id use data-mw-newsletter-id) unless you have a good reason not to. This prevents users from forging data attributes. In the context of this extension, it doesn't appear that user forging is an actual issue, however using the data-mw convention helps future proof things against any later changes where such an issue might be relavent.
  • "NewsletterContent.php" line 253 - Double escaping in the OOUI button labels.
  • NewsletterDiffEngine - The messages for the h4 headers are not escaped.
  • "SpecialNewsletter.php" line 142 - newsletter-subtitlelinks-foo message is not escaped when it is the active link.
  • "SpecialNewsletter.php" line 207 - Double escaping newsletter-do-unsubscribe.
  • "NewsletterTablePager.php" line 33 - getFieldNames() has double escaping.
  • Anyone with newsletter-create rights can edit a Newsletter namespace page (via api). While these changes aren't reflected in the relavent tables, they can make things inconsistent and confusing (e.g. Have the page display a different list of publishers than the actual list). Newsletter-manage permissions should apply to editing Newsletter pages via the api like it does during normal editing. The most obvious way to do this is with a getUserPermissionsErrors (or similar) hook. However, it may also be better to make the content handler page canonical and use it instead of the separate db tables where possible to eliminate the possibility of inconsistency in the system.

Non-security

As this is a security review, everything after this point should be considered a suggestion as they are not security issues.

General:

  • "NewsletterDb.php" line 388 - getNewslettersUserIsPublisherOf() - Why is that a left join instead of an inner join?
  • NewsletterDb.php - getAllNewsletters(). This function seems to not be used, so I'm kind of confused why its here, seems to just be adding complexity. More importantly, if it was used, this would involve a full table scan and has no paging which seems bad. I would suggest removing it to prevent the temptation of someone in the future using it.
  • "NewsletterEditPage.php" line 215 - The where condition in the select is wrong. It will return all active newsletters.
  • (minor) The behaviour for a failed csrf check is confusing to the user.
  • The subscriber count on Special:newsletters should be put throught $lang->formatNum(). (i.e. If user language is arabic, it should output arabic numerals).
  • In Special:Newsletters - Sorting by if the current user is subscribed doesn't make sense when filtering by if the current user is subscribed.
  • "SpecialNewsletterCreate.php" line 52 - The mainpage field should specify the exists option so oojs does not suggest non-existent pages that would only error later. Ditto for the page title of issue field in the announce page.
  • newsletter-subscribe-text is shown on the unsubscribe page. It should be a message about unsubscribing not subscribing.
  • The deletion/undeletion model doesn't seem quite right. If someone deletes a newsletter, its impossible to create a new newsletter with that name. If a newsletter exists at a title, but previous revisions are deleted, you cannot undelete those old revisions. The most obvious solution to this would be store all the subscribers (and possibly the list of issues. However as it stands the nl_issue table seems unused?) in the Newsletter json page. Then all the tables would functionally depend on the page, and you could just use MediaWiki's deletion handling. This would also simplify things as you could use MediaWiki's LinksUpdate / Content::getSecondaryDataUpdates() to manage all the db stuff for you. The downside is that every subscribe action would be an edit, and there could be some scalability issues if a newsletter had > 10,000 subscribers. Alternatively, you could disassociate the nl_ tables from the page name, and instead store nl_id in the json page. That way you could have multiple newsletters with the same page name (provided only one is nl_active at one time).
  • Missing message action-newsletter-restore, right-newsletter-restore. Also not in AvailableRights or GroupPermissions.
  • The newsletter namespace should probably be localizable.
  • When creating a newsletter with a "main page" not in NS_MAIN, the NS_NEWSLETTER pages links to the main page in the wrong namespace.
  • The interaction between NS_NEWSLETTER and Special:Newsletter is kind of confusing. It might be clearer to use the page title instead of the newsletter id (So to the user, it looks like the special page is operating on the NS_NEWSLETTER page, instead of on some separate thing, similar to how Special:movepage works). On the other hand, it probably doesn't matter.
  • "BaseNewsletterPresentationModel.php" line 18 - If you're using makeTitleSafe, you should check if it returns null and throw an exception or something (If you want it to always return a title, you could use makeTitle, which skips validation, but will always return a title object).
  • "NewsletterContent.php" line 241 - It doesn't matter much since parser cache is disabled, but nonetheless, $wgOut should not be used in getNewsletterActionButtons(). You should use the ParserOutput object instead.
  • The fact that the Newsletter pages are totally uncached and load the entire subscriber list from the DB on every view seems non-ideal. I guess its ok as long as the subscriber list stays small.
  • "NewsletterEditPage.php" line 38, line 131 - using getEscapedName() in a rawParams doesn't really make sense. Why not use the normal name on a normal param.
  • There's a lot of duplicated code between Special:CreateNewsletter and NewsletterEdit. I think it might be better for Special:Createnewsletter to simply ask for a name and then redirect to the EditPage. If not that, the code should definitely be generalized to avoid the repetition.
  • There seems to be quite a bit of duplication between NewsletterContent::getNavigationLinks() and SpecialNewsletter::getNavigationLinks(). This should be refactored to share code where appropriate.
  • The edit page seems to be missing a copyright notice. It should probably have one.

Documentation concerns:

  • "NewsletterDb.php" line 439 - newsletterExistsForMainPage() says it could return boolean? How?
  • onAlternateEdit() - There's no documentation here despite it being very non-obvious what's going on with the interplay between onAlternateEdit() and onCustomEditor(). In fact, I can't tell. Does the onAlternateEdit hook actually do anything?
  • It should be documented why the onContentModelCanBeUsed on hook is being used instead of overriding canBeUsedOn().
  • "NewsletterStore.php" line 22 - @var static is wrong.

Database concerns:

  • includes/NewsletterDb.php" line 503 - SELECT FOR UPDATE without a transaction. From what I understand this would cause an implicit transaction to be opened, with the locks being held for the duration of the request, which is not ideal. Should probably have an explicit transaction (as the code comment notes).
  • Interaction with content handler?
  • Is deleteInactiveNewsletters actually expected to be run in Wikimedia? There's no batching, no waiting for slaves, etc.
    • Should also just use an associative array for IN (1,2,3,..) statements.
  • Not sure about having nl_description as a very large blob of text just floating there in the table (I know a long time ago it was me who suggested making it bigger. Sorry for the reversal). Should perhaps get DBA's opinion on this. Also, the table isn't really the canonical source (the wikipage is), so its kind of unnecessary to store the whole thing when the special page cuts it off after 644 bytes - thus I would perhaps suggest using a VARCHAR(644) instead. In any case, double check with a DBA about this. Perhaps its all fine since the newsletter table will be relatively small.
  • "NewsletterTablePager.php" line 61 - There is a unique constraint on nl_id, so I don't understand the use of DISTINCT nl_id here. Additionally, I don't believe that's the correct way of setting the DISTINCT option (the field name is not included when using as an "option" in the abstraction layer), so I don't think it would do anything anyways.
  • NewsletterTablePager.php line 59 - The whole subscriber count thing is of questionable scalability. Maybe ok for now if things are small. I don't know. Maybe should have caching or something. Maybe should ask DBA
  • NewsletterTablePager.php line 69 (The query) - `SELECT nl_name,nl_desc,nl_id,( SELECT COUNT(*) FROM nl_subscriptions WHERE nls_newsletter_id = nl_id ) AS subscribers,'1' IN (SELECT nls_subscriber_id FROM nl_subscriptions WHERE nls_newsletter_id = nl_id ) AS current_user_subscribed FROM nl_newsletters WHERE (nl_active = 1) ORDER BY current_user_subscribed DESC LIMIT 51 ;`
    • Using a subquery of the form <user_id> in (SELECT all subscribers to current newsletter) seems like a really icky way to check if the current user is subscribed. Fortunately MySQL seems to be able to optiize this correctly. Nonetheless for human sanity, I would suggest rewriting this as a left join. This also applies to the filter which could be implemented as a condition on the current_user_subscribed column instead of an identical subquery.
    • The "subscribers" subquery is probably not ok, although check with a DBA (Maybe you could get away with it on the assumption the table is fairly small).
    • sorting by the subscribers column causes this to be a full table scan filesort, which is probably not ok.
    • For the last two points, probably what needs to happen is there to be a nl_subscribers column that keeps count of how many subscribers, and is properly indexed for sorting.
  • "NewsletterTablePager.php" line 86 - Calling Newsletter::newFromID( (int)$id ); in formatValue() results in 4 db queries per every newsletter returned (For a total of 200 db queries). This seems a little excessive. At the very least it (probably the Newsletter class) should cache things so it doesn't query the same newsletter multiple times. The ideal scenario would be to reuse the results of the main special page query so that no additional queries had to be done (e.g. Some sort of non-private version of NewsletterDb::getNewslettersFromResult()).
  • "NewsletterContent.php" line 115 - It doesn't make sense to call User::newFromName(...)->getId() on all the publishers and then call UserArray::newFromIDs. This would most likely cause a db query to be issued for each user individually, and then a db query for all the users together. Instad just user UserArray::newFromNames which should only use a single db query.

tony can i have the php file for this to try to fix such errors and run few debugging on newsletter extension

tony can i have the php file for this to try to fix such errors and run few debugging on newsletter extension

Hello @mvmohitverma54, once you configure MediaWiki locally and install the Newsletter Extension, then you will have all the files that you need. Hope this helps? :)

@Bawolff, Thanks a lot for your careful review. To avoid the noise in this task (too many people CC'ed), I'm thinking of splitting the non-security tasks completely to General/Documentation and DB concerns (as per the original classification).

NOTE: If you are possible volunteer interested in fixing one of those, please use the Bug: TaskNo accordingly in the patch, so that they are not linked to this parent task, but the subtasks. Thank you!

So here you go:

Change 340150 had a related patch set uploaded (by Shivamgera; owner: Shivamgera):
NewsletterDb.php- Removed getAllNewsletters() which is not used.

https://gerrit.wikimedia.org/r/340150

Change 340150 merged by jenkins-bot:
NewsletterDb.php- Removed getAllNewsletters() which is not used.

https://gerrit.wikimedia.org/r/340150

It looks like as of the hackathon all of this is done (Yay!). I'm leaving this open because I'm going to do a quick double check of the fixes later this week, but this bug is essentially done :D

I cannot believe this! Big Thank You and congratulations to everyone who contributed during the Wikimedia Hackathon.

This week's train is a complicated one anyway, but do you think that next week...?

A month later... this security review is still open and the extension hasn't been deployed yet. @01tonythomas, can you explain what is the status, please?

A month later... this security review is still open and the extension hasn't been deployed yet. @01tonythomas, can you explain what is the status, please?

Sad, yes. We have the one and only blocker now - https://gerrit.wikimedia.org/r/#/c/355226/

Badly missing @Bawolff for a review :-(

Bawolff claimed this task.

Security review passed.