Page MenuHomePhabricator

List of Newsletters should have a column showing the number of issues
Open, Needs TriagePublic

Description

Summary
When viewing the list of all Newsletters on a wiki I should be able to see a count of how many issues an individual newsletter has published.

Description
Adding a column that provides a count of the number of issues (incremented when a new issue is announced) would help give readers a sense of a Newsletter frequency, age, and popularity.

Mockup

Event Timeline

Ckoerner created this task.Nov 20 2017, 6:20 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptNov 20 2017, 6:20 PM
QuimGil added a subscriber: QuimGil.

Very good idea. This task is a good candidate for Google-Code-in-2017

Who would/could mentor this for Google-Code-in-2017 ?

Could help mentor this but not solely. @01tonythomas or @Florian or @Bawolff are any of you interested?

Imported to GCI, will be made available later in the contest when all mentors are available.

I'm happy to mentor this task, too! :)

@Ckoerner Just a quick note personally from my side: I realy really like the tasks I saw from you so far :) They're clearly written, describe small, self-contained changes and mostly have a mockup of what you expect, that's great, thanks for your effort you put into creating these tasks! :)

Change 399096 had a related patch set uploaded (by Pppery; owner: Pppery):
[mediawiki/extensions/Newsletter@master] Add issue count column to Special:Newsletters

https://gerrit.wikimedia.org/r/399096

Pppery claimed this task.Dec 18 2017, 10:16 PM
Florian added a subscriber: jcrespo.

Ok, during implementation and after chatting to @Bawolff about that, we came to the conclusion, that see some things we need to discuss further before merging anything :] The main things we're worried of is the amount of data the database needs to go through for each request of the Special:Newsletters page, which can be a performance killer. I'll add @jcrespo hoping that he has some ideas how we can improve on the solutions we already have or maybe he has a completely different idea :)

The current solution, simplified, would look something like that:

  • Request all required information from the nl_newsletters table
  • afterwards: Iterating over the returned data in php:
    • For each returned row, request the count of newsletter issues from the nl_issues tables by doing a select with a count(*) in it

As this is would mean, that we've COUNT(nl_id) +1 queries against the database (which could mean, if some wiki has 100 newsletters, that there will be 101 selects against the database with 1 count()s for 100) whenever someone opens Special:Newsletters, that might not be the best option.


The second idea would be:

  • Request all required information from the nl_newsletters table and left join the data from nl_issues and counting nl_issues.nli_issue_id
  • afterwards: Iterating over the returned data in php

This might reduce the number of queries started by php, however, the database still has to go through the same amount of data as with separate select statements, so this might also not be the best option.


Another idea would be to store a pre-computed issue_count in the nl_newsletters table, which is updated accordingly in the application code. That would mean: Whenever a newsletter is issued, the issue_count would be increased by 1 and the new number would be saved in the database.

This would require that we need to make sure, that there will never ever be inconsistent data in the issue_count column, as well we would need a method (probably a maintenance script?) to populate the issue_count of already existing newsletters.


fyi: The current tables involved looks like:
nl_newsletters

+---------------------+------------------+------+-----+---------+----------------+
| Field               | Type             | Null | Key | Default | Extra          |
+---------------------+------------------+------+-----+---------+----------------+
| nl_id               | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| nl_name             | varbinary(64)    | NO   | UNI | NULL    |                |
| nl_desc             | blob             | YES  |     | NULL    |                |
| nl_main_page_id     | int(10) unsigned | NO   | MUL | NULL    |                |
| nl_active           | tinyint(1)       | NO   | MUL | 1       |                |
| nl_subscriber_count | int(11)          | NO   |     | 0       |                |
+---------------------+------------------+------+-----+---------+----------------+

nl_issues

+-------------------+------------------+------+-----+---------+-------+
| Field             | Type             | Null | Key | Default | Extra |
+-------------------+------------------+------+-----+---------+-------+
| nli_issue_id      | int(10) unsigned | NO   | PRI | NULL    |       |
| nli_page_id       | int(11)          | NO   |     | NULL    |       |
| nli_newsletter_id | int(11)          | NO   | PRI | NULL    |       |
| nli_publisher_id  | int(11)          | NO   |     | NULL    |       |
+-------------------+------------------+------+-----+---------+-------+

You lack one key metric to give you an idea- how many rows are there on nl_newsletters and nl_issues (current and previous growth on worst case)? How many reads do the Special:Newsletters page have vs. how ofte a new issue is created for all newsletters (again, current and worst case in the future)? This will give you the metrics you need to decide what to do. Regarding the summary field, think how often a new issue is created per newsletter and if that could create row congestion.

Performance knowledgeable people would have a better advice, but the idea is that every mysql request should take less than 1 second, and every http request should take less than 1 second (for heavy pages) to be reasonable for production.

Its very unclear what sort of growth nl_issues/nl_newsletters will have, imo. Current usage is quite low, but unclear what the growth would be if ext became popular.

Reads to special:newsletters will definitely outpace new issue creation. Contention over creating new issues is expected to be extremely low - only a small group can do that for any newsletter and it is unlikely they will do it at the same time or near the same time.

Pppery removed Pppery as the assignee of this task.Jul 18 2018, 3:35 PM
Pppery added a subscriber: Pppery.

<removing myself as asignee from former GCI task>

Change 399096 abandoned by Pppery:
Add issue count column to Special:Newsletters

Reason:
The required steps to make this suitably performant seem to me to be beyond my MediaWiki coding skills.

https://gerrit.wikimedia.org/r/399096