Add a limit to the number of sites a user can watch
Closed, Resolved · Public · 2 Estimated Story Points

Description

One of the grant requirements:

The extension should only be deployed on a limited number of wikis (not all wikis). Ten wikis has been suggested as the limit, but this can be finalized through further consultation with Wikimedia Foundation staff.

I believe that this refers to the number of sites a user can show at once, given one of the other requirements is:

The extension should only be available on Meta.

However, I'll wait for the assigned technical advisor to clarify before proceeding.

Event Timeline

Restricted Application added a subscriber: Aklapper.
DannyS712 changed the task status from Open to Stalled. Jul 27 2020, 11:52 AM
DannyS712 triaged this task as Medium priority.
DannyS712 set the point value for this task to 2.
DannyS712 moved this task from Backlog to Later on the MediaWiki-extensions-GlobalWatchlist board.
DannyS712 moved this task from Unsorted to Later on the User-DannyS712 board.

Stalling pending confirmation of the intended meaning

The idea here is that since the global watchlist contacts the individual Wikipedias' APIs, we want to limit how many concurrent requests are sent, because sending requests to 900+ APIs is unreasonable.

This can and should be iterated on, though, to see what is reasonable performance-wise.

There are a couple of options on what to do here:

  • Option 1 -- Check which wikis a user is active on, and then request watched items from those. If you go this route, you still need some limits, so that you don't end up with users who are active on dozens of wikis. This also means two rounds of requests -- one to figure out which wikis to contact, and then one for each of those wikis.
  • Option 2 -- Give the user the ability to choose which wikis they want their watchlist pulled from, limited to some x wikis (I'd start with 5, and iterate when there's a clearer idea of performance). The bonus here is that you don't need the extra API call *and* you give users control over which wikis they consider important (see the sketch below).
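To make option 2 concrete, here is a rough client-side sketch of querying a capped, user-chosen list of sites in parallel. It assumes the mediawiki.ForeignApi module is available; the site list, the limit of 5, and the variable names are illustrative placeholders rather than the extension's actual code.

```javascript
// Illustrative sketch only - the limit and the site list are placeholders.
var MAX_SITES = 5;

// Sites the user chose in their settings, capped at MAX_SITES.
var userChosenSites = [ 'en.wikipedia.org', 'commons.wikimedia.org', 'www.wikidata.org' ];
var sitesToQuery = userChosenSites.slice( 0, MAX_SITES );

// One watchlist request per chosen site, all sent in parallel.
var requests = sitesToQuery.map( function ( site ) {
	var api = new mw.ForeignApi( 'https://' + site + '/w/api.php' );
	return api.get( { action: 'query', list: 'watchlist' } );
} );

// Render only once every chosen site has responded.
Promise.all( requests ).then( function ( results ) {
	results.forEach( function ( result, i ) {
		console.log( sitesToQuery[ i ], result );
	} );
} );
```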

I obviously think option 2 is best, but it depends on how you view the product and what users need. The idea of this limit, however, is to handle the (very serious) performance concern where we flood hundreds of API endpoints with requests and wait for results per user. That would be unsustainable.

Does this clarify the request? I'd be happy to hash it out a bit more.

Okay, so the limit applies to the number of sites that are queried, and the extension is only deployed on Meta. Makes sense.
The current code, the user script, and my assumption were all based on users choosing the sites themselves.
Currently, the requests are all made in parallel - should they be made in series instead?

DannyS712 changed the task status from Stalled to Open. Sep 30 2020, 6:42 PM
DannyS712 moved this task from Later to Next on the User-DannyS712 board.

No, no, parallel is brilliant. The problem was more about unexpected loads on both the API endpoints *and* the user waiting for dozens and dozens of wikis to respond -- the requests can be parallel, just limited in number.

I'd also be mindful of how often you update the watched item list (how often you send the requests). I'm not familiar with how the current script does it, but if updates are triggered on request from the user, you might want at least some sort of "cooldown" so you don't get users clicking "update" every 15 seconds and flooding the APIs with requests.
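For what it's worth, a cooldown like that could be as simple as the sketch below; the 60-second value and refreshAllSites() are hypothetical, not something the extension actually does.

```javascript
// Illustrative cooldown gate for the manual "update" button.
var COOLDOWN_MS = 60 * 1000; // hypothetical minimum gap between manual refreshes
var lastRefresh = 0;

function onRefreshClick() {
	var now = Date.now();
	if ( now - lastRefresh < COOLDOWN_MS ) {
		// Too soon since the last refresh; ignore the click.
		return;
	}
	lastRefresh = now;
	refreshAllSites(); // hypothetical function that re-queries the chosen wikis
}
```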

Anyway, I think the point here is to be very mindful of the number of requests you send per user over time, and to iterate on this, since this extension will also allow a lot more people to use this functionality.

Hi, Moriel, what's up. Here are my five cents: I really don't think the first option should be implemented. I don't use the global watchlist for my primary wiki, because I prefer the regular watchlist there, which has much more functionality (remember WLM (-: ?). I do use the global watchlist on the rest, dozens of times every day, instead of opening all the wikis that I don't visit often.
Also, I use the global watchlist for 19 wikis, and I don't think the limit should be only 5. It works very fast anyway. Moreover, the new extension won't be helpful for me if it doesn't show at least 6 wikis - meta, mediawiki, commons, wikidata, enwiki and ruwiki.

Change 631470 had a related patch set uploaded (by DannyS712; owner: DannyS712):
[mediawiki/extensions/GlobalWatchlist@master] Add configuration for limiting the number of sites

https://gerrit.wikimedia.org/r/631470

DannyS712 moved this task from Later to In progress on the MediaWiki-extensions-GlobalWatchlist board.

Sure. Updates are either triggered manually via the refresh button (which is supposed to be disabled while a refresh is in progress), or, when live updates are enabled, the display is refreshed automatically.
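Roughly, those two paths look like the sketch below; the element id, the interval, and refreshAllSites() are placeholders for illustration, not the extension's actual implementation.

```javascript
// Illustrative sketch of the two refresh triggers described above.
var refreshButton = document.getElementById( 'globalwatchlist-refresh' ); // placeholder id

function runRefresh() {
	// Disable the button while a refresh is in flight so clicks cannot stack up.
	refreshButton.disabled = true;
	// refreshAllSites() is assumed to return a Promise.
	return refreshAllSites().finally( function () {
		refreshButton.disabled = false;
	} );
}

// Manual refresh via the button.
refreshButton.addEventListener( 'click', runRefresh );

// Live updates: refresh automatically on a fixed interval.
var LIVE_UPDATE_INTERVAL_MS = 30 * 1000; // placeholder value
setInterval( runRefresh, LIVE_UPDATE_INTERVAL_MS );
```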

The 5 can be configured and changed. As @Mooeypoo noted, "I'd start with 5, and iterate when there's a clearer idea of performance."
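Just to illustrate what "configured" could mean on the client side - the config key and the fallback of 5 below are guesses for the sake of the example, not the extension's actual names.

```javascript
// Hypothetical: validate the user's chosen sites against a server-configured limit.
var siteLimit = mw.config.get( 'wgGlobalWatchlistSiteLimit' ) || 5; // config key is a guess

function validateChosenSites( sites ) {
	if ( sites.length > siteLimit ) {
		return 'Too many sites: choose at most ' + siteLimit + '.';
	}
	return null; // null means the selection is within the limit
}
```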

Yeah, that's interesting to note. I also suspected that giving the user control over which wikis to pull from is better than making that decision programmatically.

As for the limit and performance, I just want to point out that by making this a deployed extension, the load could go significantly higher than it was with the gadget, because many more people will be using it.

Further, since the gadget was client-side and the extension runs a lot of that logic on the server side, more of the load also shifts to the server.

That's why I recommended starting with a low number of wikis and being careful about how often refreshing is allowed, at least at first, and then iterating upward if possible once you have a better sense of the load and performance.

Regarding the performance impact compared to the user script, on a per-user level I would expect the load to go down.

User script is at https://meta.wikimedia.org/wiki/User:DannyS712/Global_watchlist.js

Some of the things the extension does better / the script does poorly:

  • i18n - core messages are retrieved via the API, while custom messages are all stored as JSON within the script, and every language is always loaded
  • Everything is in one script, so all of the Special:GlobalWatchlistSettings handling is loaded on Special:GlobalWatchlist, and vice versa
  • The settings page is rendered via client-side JavaScript rather than server-side with PHP
  • The code is not minified (ResourceLoader should do this automatically for the extension)
  • The code is loaded on all page views globally and then only runs on the correct pages (the extension only loads on the special pages)
  • The script needs to include development overhead like translation unit numbers, qqq documentation, and a way to mock the API in tests
  • The script saves options by editing the user's global.js, while the extension uses proper user options (see the sketch after this comment)

The extension, on the other hand, is split up so that only the code for the relevant special page is loaded, only the right language's messages are loaded, etc.
While there may be more users than before, I believe each user's impact will be lower.
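As a concrete contrast for the last point in the list above, here is a rough sketch of the two approaches to saving settings. The option name, page title, and settings object are illustrative; the API calls themselves (action=edit and the user options endpoint wrapped by mw.Api) are real.

```javascript
var api = new mw.Api();
var settings = { sites: [ 'meta.wikimedia.org', 'commons.wikimedia.org' ] }; // example settings

// Script approach: write the settings into the user's global.js wiki page.
api.postWithEditToken( {
	action: 'edit',
	title: 'User:Example/global.js', // placeholder page title
	text: 'window.GlobalWatchlistSettings = ' + JSON.stringify( settings ) + ';',
	summary: 'Update global watchlist settings'
} );

// Extension approach: store the settings as a proper user option.
api.saveOption( 'global-watchlist-settings', JSON.stringify( settings ) ); // option name is a guess
```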

Change 631470 merged by jenkins-bot:
[mediawiki/extensions/GlobalWatchlist@master] Add configuration for limiting the number of sites

https://gerrit.wikimedia.org/r/631470

DannyS712 claimed this task.
DannyS712 removed a project: Patch-For-Review.