DPL extension has been disabled on Russian Wikinews
Closed, ResolvedPublic

Description

The Wikimedia Foundation disabled Dynamic Page List (DPL) on Russian Wikinews today.
But news agencies cannot exist without news feeds. News feeds are the only way to read news and without them the news project is completely dead. Wikinews needs news feeds for every category.
This is an extremely critical error. It requires an ASAP solution.

See details: https://ru.wikinews.org/wiki/Викиновости:Форум/Общий#DynamicPageList .

Event Timeline

You can use JavaScript to generate the feed on the client side.

Kind people at Russian wiki-chats provided me with this link: https://www.mediawiki.org/wiki/Extension:DynamicPageList_(third-party) I don't know if it's helpful, just passing

You can use JavaScript to generate the feed on the client side.

  1. This solution will only increase performance problems on the Wikimedia servers. See @Base comments before.
  2. The result of such request will not be visible to search engines.
  3. We are ready to accept any solution that solves these problems from the developers of the Wikimedia Foundation.

@Krassotkin I've proposed an edit that temporarily hides the broken DPL sidebar from news articles. https://ru.wikinews.org/w/index.php?diff=8334384&oldid=7596568. Once accepted by a reviewer, this should solve the issue with previews in social media.

Thank you, but I'll probably return it as it was. We have to test solutions with this block. Otherwise, the server may experience performance problems again when we re-enable this template. Display in social media is the smallest problem we face. This is only a symptom, but we must treat the disease.

You can use JavaScript to generate the feed on the client side.

  1. This solution will only increase performance problems on the Wikimedia servers. See @Base comments before.

That's simply not true. Running a bot on Wikimedia Cloud either makes it use the Wikimedia Cloud replicas (which reduces the burden on the main servers), at the cost of being slightly outdated (an hour at most), or hits APIs that (I hope) are cached. Do you know how to write a bot to replace DPL's work? I can give it a try later.

You can use JavaScript to generate the feed on the client side.

  1. This solution will only increase performance problems on the Wikimedia servers. See @Base comments before.
  2. The result of such request will not be visible to search engines.
  3. We are ready to accept any solution that solves these problems from the developers of the Wikimedia Foundation.

Client-side JavaScript can call CirrusSearch queries, and they are only loaded as needed (It is likely that there are more articles than page views). Eventually, DPL should be rewritten using ElasticSearch.
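As a rough illustration of the kind of CirrusSearch request meant here, the sketch below only builds the API URL; the same parameters could be sent from browser JavaScript. The endpoint path, the function name and the chosen sort order are assumptions made for the example, not something specified in this thread.

```python
from urllib.parse import urlencode

# Assumed endpoint: the usual MediaWiki API path on the wiki discussed here.
API_URL = "https://ru.wikinews.org/w/api.php"


def build_category_feed_url(category: str, limit: int = 10) -> str:
    """Build a search-API URL listing the newest pages in one category.

    Hypothetical helper: it relies on the CirrusSearch `incategory:` keyword
    and the `srsort` parameter of `list=search`.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f'incategory:"{category}"',
        "srsort": "create_timestamp_desc",  # newest pages first
        "srlimit": limit,
        "format": "json",
        "origin": "*",  # allows anonymous cross-origin requests from a browser
    }
    return f"{API_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_category_feed_url("Wikimedia", limit=5))
```

Because the request goes to the search backend rather than to the main database replicas, a feed built this way would not run DPL's SQL at all; whether that load is acceptable is a separate question for the search team.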

You say you are ready to accept any solution. There is at least one right now: an easy solution is a bot generating the lists. The bot can use API queries (which are much more carefully written than DPL, so it is much harder to slow down the site with them than with DPL) or cloud replicas (you run your own queries there, and even though they're not DBA-vetted, they never affect production). It is relatively easy to write one, and I'm happy to provide a bot that would update a list of articles somewhere. Tell me the criteria, and I can do that relatively soon.
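A minimal sketch of the bot idea, assuming the list is built from the `list=categorymembers` API module and written out as a plain bulleted wikitext list. The function names and the output format are hypothetical; fetching the response and saving the page are deliberately left out.

```python
def categorymembers_params(category_title: str, limit: int = 20) -> dict:
    """Parameters for a MediaWiki API request listing a category's newest members."""
    return {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category_title,  # full title including the namespace prefix
        "cmsort": "timestamp",      # order by when the page was added to the category
        "cmdir": "desc",            # newest first
        "cmnamespace": 0,           # articles only
        "cmlimit": limit,
        "format": "json",
    }


def render_feed_wikitext(api_response: dict) -> str:
    """Turn a categorymembers response into a bulleted wikitext list."""
    members = api_response.get("query", {}).get("categorymembers", [])
    return "\n".join(f"* [[{member['title']}]]" for member in members)


if __name__ == "__main__":
    # Hand-written sample in the shape the API returns; not real data.
    sample = {"query": {"categorymembers": [
        {"pageid": 1, "ns": 0, "title": "Example article A"},
        {"pageid": 2, "ns": 0, "title": "Example article B"},
    ]}}
    print(render_feed_wikitext(sample))
```

A bot along these lines would run each query once per update cycle and save the result to a list page, so the cost no longer depends on how many articles display the list.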

I understand you would like DPL to be back, but fixing all the problems in DPL would mean rewriting it from scratch. Sadly, it is not being actively developed by any full-time team. Any volunteer developer is more than welcome to submit a patch.

Kind people at Russian wiki-chats provided me with this link: https://www.mediawiki.org/wiki/Extension:DynamicPageList_(third-party) I don't know if it's helpful, just passing

Thanks. Note this variant of the extension was never audited by anyone, and getting an extension audited and deployed takes months to years (without a full time team backing the effort). A bot is definitely a faster solution.

What about using this one? This variant should be checked. I know that this variant isn't deployed on Wikimedia's servers, but...

It is a fully reworked object oriented code base, significant code and database speed improvements, and is fully backwards compatible with previous versions.

So maybe we should give it a try.

@Urbanecm, have you read the previous discussion? We have hundreds of thousands of categories. In order to update their feeds using an external bot in real time, we must launch a cluster of servers. Do you think this is a realistic solution?

And secondly. We have fully supported this project for many years. We advertise the project, we attract users, write and review news, create templates, modules and JavaScript programs, we write external bots. We attract sponsors to donate to the Wikimedia Foundation. We do this for many hours a day for many years and all for free.

And this is not our project. This is still a Wikimedia Foundation project.

The only thing we want from the Wikimedia Foundation is the sustainable operation of the servers and the server-side software required for the project.

I want the Wikimedia Foundation to solve a server-side problem in their project.

@Urbanecm, have you read the previous discussion?

Yes.

We have hundreds of thousands of categories. In order to update their feeds using an external bot in real time, we must launch a cluster of servers.

a) you will never have the update done in real time; even DPL does some caching, and there will always be a waiting period; it may be short, but there will be one

b) no, you don't; there is Wikimedia Toolforge, which can be used to run community-maintained projects like this one. Running a simple query or two once a day is doable.

Do you think this is a realistic solution?

No comment.

And secondly. We have fully supported this project for many years. We advertise the project, we attract users, write and review news, create templates, modules and JavaScript programs, we write external bots. We attract sponsors to donate to the Wikimedia Foundation. We do this for many hours a day for many years and all for free.

Yes, and I fully appreciate what you do for the movement. That doesn't change anything, though.

And this is not our project. This is still a Wikimedia Foundation project.

No. The projects we all help to run are called _Wikimedia projects_ (without the word "Foundation"). We all, no matter if WMF employees, chapter employees or volunteers, help the Wikimedia movement and contribute to its mission. WMF is an important player, they provide hosting services for the projects, but they do not own the projects (i.e. copyright is retained by individual contributors) - we all do.

The only thing we want from the Wikimedia Foundation is the sustainable operation of the servers and the server-side software required for the project.

Yes. In this case, keeping the load at a bearable level won over keeping one project intact. I can help you mitigate the change, but I do not have time to write DPL from scratch.

I want the Wikimedia Foundation to solve a server-side problem in their project.

As I said above, nothing makes WMF more responsible than any other part of the community. I understand this is an issue, I acknowledge it has not been communicated in advance (it was an ongoing incident), I can help you to find an alternative solution, I can offer my expertise, but the lead has to be from your community - you know what the lists should be like, after all.

@Urbanecm Sorry, these are emotions. Now I have the dead project to which I have dedicated many years. I'll try to write to Jimbo and Katherine. If the Wikimedia Foundation staff developers don't want to tackle this problem, I'll think about what to do next. Thanks all for the suggested solutions.

offtop: I use my own servers for my bots.

@Krassotkin Folks from the Wikimedia Foundation are going to work on this. However, please understand that nobody here is familiar with the DynamicPageList extension, and it only gets any attention in emergencies like this. The original authors of the extension (who apparently have been inactive for years now) clearly haven't anticipated it being used on a project with 100,000+ pages, and this isn't something that can be fixed in an hour. I would not expect a speedy solution.

It is used in English Wiktionary, which is a project with even more articles than English Wikipedia.

In T262391#6448525, @Zoranzoki21 wrote:

What about using this one?

See T262391#6448387 and the requirements in https://www.mediawiki.org/wiki/Writing_an_extension_for_deployment before deploying a new codebase.

Kind people at Russian wiki-chats provided me with this link: https://www.mediawiki.org/wiki/Extension:DynamicPageList_(third-party) I don't know if it's helpful, just passing

Thanks. Note this variant of the extension was never audited by anyone, and getting an extension audited and deployed takes months to years (without a full time team backing the effort). A bot is definitely a faster solution.

It was declined many times: T5533, T6468, T8758, T10672

It is used in English Wiktionary, which is a project with even more articles than English Wikipedia.

That means we have another time bomb on our hands; it doesn't imply we should have more time bombs in more places.

Where is the technical description of what happened to trigger the removal? This doesn't seem orderly, so I assume something happened here [Edit: I know now]

That means we have another time bomb on our hands; it doesn't imply we should have more time bombs in more places.

Previously the risk was at least partially mitigated via the query killer. Was that insufficient? (Although I agree enwikt is probably too large and it shouldn't have been installed there.) Could we potentially reduce the risk sufficiently in the short term, so as not to break critical infrastructure while people think of alternatives, by directing queries to the vslow slaves?

@Krassotkin Folks from the Wikimedia Foundation are going to work on this. However, please understand that nobody here is familiar with the DynamicPageList extension, and it only gets any attention in emergencies like this. The original authors of the extension (who apparently have been inactive for years now) clearly haven't anticipated it being used on a project with 100,000+ pages, and this isn't something that can be fixed in an hour. I would not expect a speedy solution.

Well, I kind of took over maintaining it from Amgine in a maintenance-only way, years back; I like to think I'm the official maintainer as far as anyone is. I'm not super active anymore since switching jobs, but I'm still "around". [But yes, we all know the extension has scalability flaws as written]

So it seems like what happened, is that NewsBot imported a lot (~100k) articles over a very short time frame, that all had the template {{тема|Материалы PanARMENIAN.Net}} on them. This had a DPL on it that did the category intersection of two categories both with ~100k articles. (To oversimplify) DPL gets slower the bigger the smallest category specified is. Thus this particular DPL was a somewhat slow one. Being somewhat slow by itself was ok, but having 100k articles edited at roughly the same time which all had the same slow DPL on it, was too much for the servers.
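The scaling argument above can be illustrated with a toy model. This is not how the database actually executes DPL's SQL; it only assumes that an intersection walks the smaller category and probes the larger one, so the work per query grows with the size of the smaller category. The function name and the probe counter are made up for the example.

```python
def intersect_categories(cat_a: set, cat_b: set) -> tuple[list, int]:
    """Intersect two categories by scanning the smaller one.

    Returns the matching pages and the number of membership probes performed,
    which stands in for the cost of one DPL-style query in this toy model.
    """
    smaller, larger = (cat_a, cat_b) if len(cat_a) <= len(cat_b) else (cat_b, cat_a)
    probes = 0
    matches = []
    for page in smaller:
        probes += 1
        if page in larger:
            matches.append(page)
    return matches, probes


if __name__ == "__main__":
    # Scaled-down stand-ins for two large, heavily overlapping categories.
    cat_a = set(range(1_000))
    cat_b = set(range(500, 2_000))
    matches, probes_per_query = intersect_categories(cat_a, cat_b)
    pages_rendered = 1_000  # every freshly edited page re-runs the same query
    print(len(matches), probes_per_query, probes_per_query * pages_rendered)
```

Under this model, a query costing about 100k probes is tolerable once, but repeating it for each of roughly 100k pages edited in a short window multiplies the work to the order of 10^10 probes, which matches the description of individually acceptable queries overwhelming the servers in aggregate.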

Perhaps as a near-term compromise, ruwikinews could agree to temporarily make the тема template output nothing if the first arg is Материалы PanARMENIAN.Net (leaving other uses of the template as is), and in turn devs could agree to turn DPL back on, with the understanding that it might get turned off again if it negatively affects the stability of other wikis. This would be a short-term compromise to keep things running until a longer-term plan can be decided on.

Edit: just to be clear - i am proposing this to both sides (ruwikinews and wmf operations). I'm not really authorized to "offer" it as a deal

Change 626268 had a related patch set uploaded (by Brian Wolff; owner: Brian Wolff):
[mediawiki/extensions/intersection@master] Attempt to add a query cache to DPL

https://gerrit.wikimedia.org/r/626268
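The patch itself is not reproduced in this thread, so the following is only a generic sketch of what a query cache does, not a description of change 626268. It assumes results are keyed by the query's parameters and reused for a fixed time-to-live; the class name, the key format and the TTL value are all hypothetical.

```python
import time


class QueryCache:
    """Reuse the result of an expensive query for `ttl_seconds`."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the behaviour can be tested
        self._entries = {}          # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]         # fresh enough: skip the expensive query
        value = compute()
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    calls = []

    def slow_query():
        calls.append(1)
        return ["Article 1", "Article 2"]

    cache = QueryCache(ttl_seconds=60)
    key = ("Category A", "Category B", 10)  # hypothetical cache key
    for _ in range(1000):                   # many pages using the same list
        cache.get_or_compute(key, slow_query)
    print(len(calls))
```

With a cache like this, many pages that embed the same list within the TTL window trigger the underlying query once instead of once per page, at the price of the list being up to one TTL out of date.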

@Bawolff What do you mean?
Do we need to remove DPL from PanARMENIAN categories https://ru.wikinews.org/?curid=19312, https://ru.wikinews.org/?curid=1543119 and https://ru.wikinews.org/?curid=6829679?
Or PanARMENIAN request from "Other themes" template https://ru.wikinews.org/?curid=19721, example (right): https://ru.wikinews.org/?curid=7348599?
Or both? Or somewhere else?

How many items in the category don't load the server?

Because:

It surprises me that we were not warned of the growing load. Together we could have solved the problem early. A sudden project collapse is the worst possible outcome.

And anyway we need a high-quality server-side implementation for news feeds in the very near future.

I am not a technician, just wondering: isn't it possible that DPL work is just done at a separate outer server and original server just redirects to it?

DPL is an extension that controls the content of pages. You may do so using client-side JavaScript.

@Bawolff What do you mean?
Do we need to remove DPL from PanARMENIAN categories https://ru.wikinews.org/?curid=19312, https://ru.wikinews.org/?curid=1543119 and https://ru.wikinews.org/?curid=6829679?
Or PanARMENIAN request from "Other themes" template https://ru.wikinews.org/?curid=19721, example (right): https://ru.wikinews.org/?curid=7348599?
Or both? Or somewhere else?

I mean the other themes template. Individually the query seemed to be fine; the issue seemed to be too many running all at once. So it was really putting it on all the pages within a short time frame that overloaded things.

How many items in the category don't load the server?

Because:

Not really clear.
(I think - numbers pulled out of a hat) Anything < 5000, is super easy for the server. 5000 to 30,000 is fairly easy, 30,000-100,000 is mild difficulty, 100,000+ is moderate, 1 million+ is high difficulty (these numbers are very made up and might be wrong. It is just my gut feeling.).

It is important to emphasize: it's not just that the DPL query was moderate, it's that so many of them happened at one time (I suspect from the bot adding so many pages with the query on them so quickly).

I would suggest that future imports that happen after DPL is re-enabled import things a bit more slowly, at least at first. Maybe that won't be necessary after the patch, but we should be cautious. Whenever they start again, it would probably be good if someone could give a heads-up to the folks in the #wikimedia-operations IRC channel that the import is restarting, and reference this task, so that if something bad happens ops knows what might have happened.
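One simple way to "import more slowly" is to space out page creations at a fixed rate. The sketch below is a hedged illustration: the function name, the rate and the placeholder titles are assumptions, and it does not reflect how NewsBot is actually implemented.

```python
import time


def throttled(items, max_per_minute: float, sleep=time.sleep):
    """Yield items no faster than `max_per_minute`, pausing between them."""
    interval = 60.0 / max_per_minute
    for index, item in enumerate(items):
        if index:                # no pause before the very first item
            sleep(interval)
        yield item


if __name__ == "__main__":
    articles = ["Article 1", "Article 2", "Article 3"]  # placeholder titles
    # Record the pauses instead of sleeping so the demo finishes instantly.
    pauses = []
    for title in throttled(articles, max_per_minute=6, sleep=pauses.append):
        pass  # a real bot would create the page here
    print(pauses)
```

Spreading the edits out spreads the resulting re-parses (and therefore the DPL queries) over time, instead of letting them pile up on the database at once.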

It surprises me that we were not warned of the growing load. Together we could have solved the problem early. A sudden project collapse is the worst possible outcome.

I don't think it was obvious to the devs what was going on at the time, they just saw the servers were becoming overloaded, and that the servers seemed to be stuck processing dpls from ruwikinews

And anyway we need a high-quality server-side implementation for news feeds in the very near future.

I definitely agree that would be nice - an Elasticsearch-based version of DPL would be ideal. I made a patch above that should reduce this type of issue, but it's more of a band-aid solution.

I am not a technician, just wondering: isn't it possible that DPL work is just done at a separate outer server and original server just redirects to it?

There are a number of possibilities there:

  • bot updated lists (im not really a fan because you're going to have a massive number of lists to keep up to date in a wikinews context, and it seems inelegant to me). En wikipedia does this a lot, so it is a popular solution, and there are good things to say about it: mediawiki can scale very easily to high edit rates generally.
  • have separate db servers [a special replica group] just for dpl, so they can't affect non-dpl processes (expensive $$$).
  • use some other backend for dpl. It is my belief that elasticSearch would be ideal and scale much better (as well as being more independent of the main db servers). This requires quite a bit of work to basically redo dpl.

@Bawolff: We discussed your suggestion on the Russian Wikinews and our community has a consensus about this. I made the appropriate changes: https://ru.wikinews.org/?diff=8357270

When can we expect news feeds to turn on?

But I still have doubts that these changes will solve the problem. Please see my comments below when possible.

@Bawolff What do you mean?
Do we need to remove DPL from PanARMENIAN categories https://ru.wikinews.org/?curid=19312, https://ru.wikinews.org/?curid=1543119 and https://ru.wikinews.org/?curid=6829679?
Or PanARMENIAN request from "Other themes" template https://ru.wikinews.org/?curid=19721, example (right): https://ru.wikinews.org/?curid=7348599?
Or both? Or somewhere else?

I mean the other themes template. Individually the query seemed to be fine; the issue seemed to be too many running all at once. So it was really putting it on all the pages within a short time frame that overloaded things.

The category template requests 200 items, but the "other news" template requests only 3 to 18. It is strange that the first one loads the server less than the second. Additionally, there could be some way to optimize the select (or the table index) so that the second, smaller request doesn't do unnecessary work.

How many items in the category don't load the server?

Because:

Not really clear.
(I think - numbers pulled out of a hat) Anything < 5000, is super easy for the server. 5000 to 30,000 is fairly easy, 30,000-100,000 is mild difficulty, 100,000+ is moderate, 1 million+ is high difficulty (these numbers are very made up and might be wrong. It is just my gut feeling.).

We have hundreds of categories with over 5000 items and main category (published) with all project articles. Now there are 149,010 articles.

It is important to emphasize: it's not just that the DPL query was moderate, it's that so many of them happened at one time (I suspect from the bot adding so many pages with the query on them so quickly).

This bot does not request pages, it creates them. Maybe search engines send these requests?

I would suggest that future imports that happen after DPL is re-enabled import things a bit more slowly, at least at first. Maybe that won't be necessary after the patch, but we should be cautious. Whenever they start again, it would probably be good if someone could give a heads-up to the folks in the #wikimedia-operations IRC channel that the import is restarting, and reference this task, so that if something bad happens ops knows what might have happened.

This bot has not experienced any problems or delays. The server handled the creation of the pages with no apparent difficulty. Nobody made any complaints about bot's work. No one even noticed that bot was still working. I had stopped it myself recently just in case. I am ready to turn bot on at any time as soon as the go-ahead is given.

I don't use IRC, but I go through all edits on Russian Wikinews. I will read and immediately react to any message on the project forum: https://ru.wikinews.org/?curid=6768 (https://ru.wikinews.org/wiki/Викиновости:Форум/Общий). If it is necessary, administrators and stewards can block the bot.

It surprises me that we were not warned of the growing load. Together we could have solved the problem early. A sudden project collapse is the worst possible outcome.

I don't think it was obvious to the devs what was going on at the time, they just saw the servers were becoming overloaded, and that the servers seemed to be stuck processing dpls from ruwikinews

And anyway we need a high-quality server-side implementation for news feeds in the very near future.

I definitely agree that would be nice - an elastic search based version of DPL would be ideal.

I also think this is the best solution at the moment.
How can we speed up such a rework? How can we communicate to the Foundation that this is a critical task?
I wrote to Jimbo and Katherine, but there is no reaction https://en.wikipedia.org/wiki/User_talk:Jimbo_Wales#Russian_Wikinews_is_down. Who can resolve this issue?

I made a patch above that should reduce this type of issue, but its more a band-aid solution.

Thank you from all of us.

Jimbo Wales does not directly control the WMF servers. The only thing that Jimbo (as a member of the Board) can do is to prioritize a rewrite of DPL, but this is unlikely to be an action from the Board.

My last comment on this, I feel I have already said everything. In addition to fixing the main issue, building a workaround (a bot that updates the list) would be probably easier and faster.

@Bawolff: We discussed your suggestion on the Russian Wikinews and our community has a consensus about this. I made the appropriate changes: https://ru.wikinews.org/?diff=8357270

When can we expect news feeds to turn on?

As soon as someone rewrites DPL in such a way that it is no longer a performance issue on ruwikinews.

[...]

How many items in the category don't load the server?

Because:

We don't know. There is no research that answers this question.

[...]
This bot does not request pages, it creates them. Maybe search engines send these requests?

It's not _requesting_ the page that is expensive. The parsed version of wikitext is cached, and is rebuilt only once per some time. It is also rebuilt on any edit, including page creation.

I would suggest that future imports that happen after DPL is re-enabled import things a bit more slowly, at least at first. Maybe that won't be necessary after the patch, but we should be cautious. Whenever they start again, it would probably be good if someone could give a heads-up to the folks in the #wikimedia-operations IRC channel that the import is restarting, and reference this task, so that if something bad happens ops knows what might have happened.

This bot has not experienced any problems or delays. The server handled the creation of the pages with no apparent difficulty. Nobody made any complaints about bot's work. No one even noticed that bot was still working. I had stopped it myself recently just in case. I am ready to turn bot on at any time as soon as the go-ahead is given.

Yes, but if the pages are created by the bot at roughly the same time, they also got rebuilt at roughly the same time. If you run the query once, it's not an issue. If you run it 100k times, it is a very big issue.

[...]

It surprises me that we were not warned of the growing load. Together we could have solved the problem early. A sudden project collapse is the worst possible outcome.

I don't think it was obvious to the devs what was going on at the time, they just saw the servers were becoming overloaded, and that the servers seemed to be stuck processing dpls from ruwikinews

And anyway we need a high-quality server-side implementation for news feeds in the very near future.

I understand that. The fastest way to get it is to write code that is acceptable from the performance viewpoint. DPL does not meet that requirement - it is a legacy solution.

I definitely agree that would be nice - an elastic search based version of DPL would be ideal.

I also think this is the best solution at the moment.
How can we speed up such a rework? How can we communicate to the Foundation that this is a critical task?

The fastest way to get some work done is to do it yourself. The Foundation develops a lot of things, but such things are pre-defined by the annual plan. The Foundation doesn't employ as many people as other top-10 sites, and as such doesn't have as many human resources as they do. A lot of development work is done by volunteers - and any tech-savvy member of the ruwikinews community can help.

I wrote to Jimbo and Katherine, but there is no reaction https://en.wikipedia.org/wiki/User_talk:Jimbo_Wales#Russian_Wikinews_is_down. Who can resolve this issue?

Literally anyone can solve the main issue with DPL - by proposing a patch just as Bawolff did.

[...]

When talking about consensus, Krassotkin referred to the fact that we at our forum have agreed to enable DPL on a limited set of our everyday pages, and we don't need it now on those tens of thousands of "archive" articles made by his bots. So, please consider re-enabling DPL for this limited number of most important pages (I don't know how to technically distinguish them, though; the one who knows is Krassotkin). This is vitally needed for us, regular authors not involved in technologies and bot uploads.

[...]

I wrote to Jimbo and Katherine, but there is no reaction https://en.wikipedia.org/wiki/User_talk:Jimbo_Wales#Russian_Wikinews_is_down. Who can resolve this issue?

Literally anyone can solve the main issue with DPL - by proposing a patch just as Bawolff did.

Thank you for your comment, but I have one question.

Should I hire this "anyone" with my own money, or will the Wikimedia Foundation do it?

Grants for software projects are available from the Wikimedia Foundation. See https://meta.wikimedia.org/wiki/Grants:Project However, the process is not quick, and software grants often have other requirements that would make it even slower.

A project grant only means you can get money for doing a project, and you can already participate in development without one. It does not mean that someone other than yourself will be working on this.

For context, I can't speak for ruwikinews, but at enwikinews, DPL is considered a business-critical aspect of the software. Removing it would probably be the equivalent to removing say Watchlist from enwiki (a feature which has similar non-ideal performance characteristics)


@Bawolff: We discussed your suggestion on the Russian Wikinews and our community has a consensus about this. I made the appropriate changes: https://ru.wikinews.org/?diff=8357270

When can we expect news feeds to turn on?

To clarify, I'm proposing that as a solution to both sides, and both sides have to find it acceptable. I think, if my patch is merged, it will be considered acceptable to DBA's at least on a trial basis to see how it goes. The absolute earliest any movement will possibly happen on this will be Monday (No guarantees that movement will actually happen on monday). [Sorry]

To clarify, I'm proposing that as a solution to both sides, and both sides have to find it acceptable. I think, if my patch is merged, it will be considered acceptable to DBA's at least on a trial basis to see how it goes.

Exactly, we are happy to give it a go, monitor it for a few days and see if it solves the performance issues.
We, DBAs, have no interest at all in having DPL disabled. We just need the performance problems to be resolved :-)

ruwikinews shares its storage with around 900 other projects, and unfortunately, if one has performance issues and overloads our servers, it affects the other projects too, as they are physically hosted in the same place.
We are in the process of trying to split the projects across more servers, but that requires lots of time and is a very long-term solution.

Change 626796 had a related patch set uploaded (by Brian Wolff; owner: Brian Wolff):
[mediawiki/extensions/intersection@master] Add an option to limit DPL concurrency with pool counter

https://gerrit.wikimedia.org/r/626796
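The idea of limiting concurrency can be sketched generically with a semaphore. This is not the patch from change 626796 and not Wikimedia's PoolCounter service; it only assumes that at most N expensive queries may run at once and that callers who cannot get a slot receive a cheap fallback. All names here are hypothetical.

```python
import threading


class ConcurrencyLimiter:
    """Allow at most `max_workers` expensive operations to run at once."""

    def __init__(self, max_workers: int):
        self._slots = threading.BoundedSemaphore(max_workers)

    def run(self, work, fallback):
        # Non-blocking acquire: if every slot is busy, do not queue up behind
        # the database; return the cheap fallback immediately instead.
        if not self._slots.acquire(blocking=False):
            return fallback()
        try:
            return work()
        finally:
            self._slots.release()


if __name__ == "__main__":
    limiter = ConcurrencyLimiter(max_workers=1)

    def expensive_list():
        # While this "query" holds the only slot, a second caller is refused.
        nested = limiter.run(lambda: "fresh list", lambda: "list unavailable")
        return f"outer ran; nested caller got: {nested}"

    print(limiter.run(expensive_list, lambda: "list unavailable"))
```

The design choice is to shed load rather than queue it: during a burst of re-parses, most pages would render without the list (or with a stale one) instead of all of them hitting the database simultaneously.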

Exactly, we are happy to give it a go, monitor it for a few days and see if it solves the performance issues.
We, DBAs, have no interest at all in having DPL disabled. We just need the performance problems to be resolved :-)

Thank you.

I suggest not to close this ticket until we test everything in regular mode, and then with the bot turned on.

ruwikinews shares its storage with around 900 other projects, and unfortunately, if one has performance issues and overloads our servers, it affects the other projects too, as they are physically hosted in the same place.
We are in the process of trying to split the projects across more servers, but that requires lots of time and is a very long-term solution.

It looks like this will need to be done anyway because Russian Wikinews is in the TOP 100 Wikimedia projects now and we plan to triple the number of articles in the very near future.
Who decides on this issue? What do we need to do?

It looks like this will need to be done anyway because Russian Wikinews is in the TOP 100 Wikimedia projects now and we plan to triple the number of articles in the very near future.
Who decides on this issue? What do we need to do?

As a wiki, nothing. Based on the one time a wiki has moved shards before, all you'll notice is a read-only window.

It would be decided as needed by the DBA team, of which @Marostegui is a member.

It looks like this will need to be done anyway because Russian Wikinews is in the TOP 100 Wikimedia projects now and we plan to triple the number of articles in the very near future.

Who decides on this issue? What do we need to do?

Note this is more about how busy the site is than storage space. Only ~50 wikis are not in s3 right now (although perhaps DBAs want to change that), so being top 100 would still put ruwikinews in the s3 group of 900 wikis. Anyway, DBAs decide which section each wiki is in based on internal statistics and whatnot. Moving wikis around is a non-trivial effort afaik.

It looks like this will need to be done anyway because Russian Wikinews is in the TOP 100 Wikimedia projects now and we plan to triple the number of articles in the very near future.
Who decides on this issue? What do we need to do?

As a wiki, nothing. Based on the one time a wiki has moved shards before, all you'll notice is a read-only window.

It would be decided as needed by the DBA team, of which @Marostegui is a member.

We have T226950: Move more wikis from s3 to s5 for this job. I believe that wiki should be moved from s3 to s5/s6, so we can reduce possible overload and problems in the future.

[...] Only ~50 wikis [...]

Well, we will work harder to get into the TOP 50 :--)

BTW, Russian Wikinews is in the TOP 10 by total pages now.

When talking about consensus, Krassotkin referred to the fact that we at our forum have agreed to enable DPL on a limited set of our everyday pages, and we don't need it now on those tens of thousands of "archive" articles made by his bots. So, please consider re-enabling DPL for this limited number of most important pages (I don't know how to technically distinguish them, though; the one who knows is Krassotkin). This is vitally needed for us, regular authors not involved in technologies and bot uploads.

Just remove DPL tags from other pages. We can probably test if that helped next week, but definitely not today (the test can't run on weekend to ensure someone is around to monitor, and we'll need more than a day to be fairly certain). Please let us know once the tags are removed from as many pages as possible, and I'll coordinate with others on the rest :-)

In T262391#6456202, @Zoranzoki21 wrote:

It looks like this will need to be done anyway because Russian Wikinews is in the TOP 100 Wikimedia projects now and we plan to triple the number of articles in the very near future.
Who decides on this issue? What do we need to do?

As a wiki, nothing. Based on the one time a wiki has moved shards before, all you'll notice is a read-only window.

It would be decided as needed by the DBA team, of which @Marostegui is a member.

We have T226950: Move more wikis from s3 to s5 for this job. I believe that wiki should be moved from s3 to s5/s6, so we can reduce possible overload and problems in the future.

That's not because of overload on s3; actually s3 is generally pretty quiet in terms of load or storage. The problem is that, due to the sheer number of tables, the inode tables are getting really big.

Just remove DPL tags from other pages. [...]

As I wrote above, we have removed DPL from over 100,000 pages: https://ru.wikinews.org/?diff=8357270 (replaced the link)

DPL on other pages is required by the logic of the project.

Please note that while T226950 is something we'd like to work on, it is not a simple thing and it cannot be done from one day to the next, as @Bawolff correctly points out.
This is a very long-term effort and will not address the current issues with DPL on ruwikinews.

The weekend is over but Wikinews still doesn't work.

This sounds very strange: the database admins have disabled a project because a running project is preventing them from working.

Just compare the news about Wikimedia now:
https://ru.wikinews.org/?curid=6773
and usually:
https://web.archive.org/web/20200607181405/https://ru.wikinews.org/wiki/Категория:Викимедиа

Or try to find the latest news in our main published news feed now:
https://ru.wikinews.org/?curid=1827
That is how it usually looks:
https://web.archive.org/web/20200607054829/https://ru.wikinews.org/wiki/Категория:Опубликовано

We cannot publish current news, because no one can see it.

This demotivates our community and encourages readers and users to leave the project. This irresponsible attitude destroys the Wikimedia Foundation.

We have been given a ready-made solution by Bawolff, and the Russian Wikinews community has done everything to keep the server load down.

Just turn it on and let's start testing.

The weekend is over but Wikinews still doesn't work.

This sounds very strange: the database admins have disabled a project because a running project is preventing them from working.

This statement isn't entirely correct.
Database admins raised an alert about sudden spikes in load on the servers that host ruwikinews along with 900 other projects. Those spikes made around 900 projects unavailable for a few seconds at a time throughout the day.
With lots of help from lots of people, it was found that those spikes follow a pattern: they always involved the DPL extension's queries on ruwikinews. What is believed to be the root cause was found later (details at T262391#6449590).

To ensure the availability of our servers and of the other projects hosted on them, it was decided to turn off DPL for ruwikinews to see whether it was indeed the root cause. No more spikes have happened since it was disabled. It was communicated on this task, several times, that this was only an action to mitigate the impact while we work out a better solution.

We are trying to work out a better solution that would allow DPL extension to be turned back on this project.

@Marostegui I understand you. But if you had reported the issue to our community, we could have taken the necessary measures on the Wikinews side and kept working without turning it off. But I don't want to go back to that.

Now we have a solution and have taken measures to avoid excess load. Everyone above agreed that this would work for a while without risks. Therefore we can turn this on until we find a better solution.

Change 627479 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[operations/mediawiki-config@master] Revert "Disable DynamicPageList on ruwikinews"

https://gerrit.wikimedia.org/r/627479

Change 627479 merged by jenkins-bot:
[operations/mediawiki-config@master] Revert "Disable DynamicPageList on ruwikinews"

https://gerrit.wikimedia.org/r/627479

Mentioned in SAL (#wikimedia-operations) [2020-09-15T11:44:16Z] <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: 294931fc6eb9e365894ec0cf94c155d55ecae549: Revert "Disable DynamicPageList on ruwikinews" (T262240; T262391) (duration: 00m 58s)

Thank you. I'm here online and ready to respond quickly.

@Krassotkin et al.: As you saw, I just re-enabled DPL at ruwikinews. The database administrators will now monitor the load of s3 (the section that hosts ruwikinews and more than 900 other projects). It is possible that we will have to disable it again if the load becomes unbearable. Thanks for your patience.

But if you had reported the issue to our community, we could have taken the necessary measures on the Wikinews side and kept working without turning it off.

It has been written a few times here that ~900 other projects were affected, that quick action was needed, and that it looked like DPL usage on Russian Wikinews was the culprit, and that quick action was needed, because ~900 other projects were affected. I don't see how repeating this conversation helps anyone. :)

We could have disabled DPL ourselves on minor pages, which is what we have now done. This takes a minute, and the load would have dropped immediately. It didn't need to be disabled on the server for this.

@Krassotkin: Let me rephrase: Immediate (!) action was needed. 900 other projects were affected by being temporarily unavailable. Immediate (!) action does not imply "let's contact the community first and discuss for a while how to best proceed here" while 900 other projects are in danger. Thanks for your understanding.

@Aklapper: I don't blame anyone and I don't want to return to this question. I'm just saying that maybe together we could solve the problem better. I am grateful for the participation of everyone who helped us.

eprodromou added a subscriber: eprodromou.

We're watching this ticket, ready to give code review when needed. Please ping us as necessary.

I'm sure https://gerrit.wikimedia.org/r/c/626796 and https://gerrit.wikimedia.org/r/c/mediawiki/extensions/intersection/+/626268 need some eyes - thanks!

Generally, I see now that RWN seems to be working normally for a regular editor, so thank you all for your work!

Urbanecm claimed this task.

Tentatively closing, as DPL was re-enabled.

Does the closure of this task mean that we can work as usual? For example, can upload bot be enabled? No more tests required?

The closure means its objective was accomplished, i.e. DPL is no longer disabled. It does NOT mean any changes were made to optimize DPL itself, or that DPL is now safer to call. It is the same ticking bomb as it used to be, and if it starts to be an issue again, it may need to be disabled again. As mentioned above, any volunteers are welcome to join the efforts, as well as to review Brian's patches.

Perhaps we should hand over such critical software to Foundation developers? Not only this. Software is also an important part of our platform as are servers. Living on a bomb and doing nothing is a bad idea. How are these issues usually resolved?

See https://www.mediawiki.org/wiki/Code_stewardship_reviews in general. Though https://www.mediawiki.org/wiki/Developers/Maintainers already lists stewards for DPL (which does not imply that someone actively maintains a codebase).

@Aklapper Sorry, but I didn't get it. Doesn't the Foundation have its own developers?

Not all deployed code is maintained by WMF staff. A lot of work is done by volunteers.

@RhinosF1 And why can't we identify and transfer critical code to WMF staff support?

They have hundreds of other components to work on. I have no idea how well maintained DPL has been up to now, but until now it may never have needed any work beyond what was already being done.

As written above, it turned out that we have unsupported bomb-level code. I don't know how to solve this problem. But Katherine told me that the Foundation's management monitors this task. We have drawn their attention to this.

I don't know how to solve this problem.

You already have @Bawolff, @Urbanecm, and @Marostegui, who have worked together to try to get it back on for you and to propose long-term fixes.

But Katherine told me that the Foundation's management monitors this task. We have drawn their attention to this.

I'm sure they can discuss it as a board but that's not going to magic up developer time and a fix. People have looked into it and @Bawolff has proposed patches to try and fix it.

Not sure the bomb metaphor is helpful. There are other features in MediaWiki that would also cause problems if subjected to the level of traffic that DPL was subject to in this situation. However, there is no doubt that DPL does not meet the performance standards that a new feature would be required to meet today.

My patches would avoid some duplicated work in common situations (a template used on many pages that all have to be rerendered), and ensure that a single wiki cannot use more than its fair share of resources (or at least not all the resources). This would prevent the situation that happened here, where DPL starved other wikis of resources.
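To make the two ideas concrete, here is a minimal Python sketch. It is not the code of the actual patches (those are changes to the extension itself); the class name `LimitedCachedQuery`, the `run_query` callback and the per-wiki limit of 2 are all made up for illustration, and the cache here never expires, which a real one would need to do.

```python
import threading

class LimitedCachedQuery:
    """Hypothetical wrapper: cache results per query, cap concurrent runs per wiki."""

    def __init__(self, run_query, max_concurrent_per_wiki=2):
        self._run_query = run_query          # the expensive call, e.g. a DB query
        self._max = max_concurrent_per_wiki  # assumed "fair share" per wiki
        self._cache = {}                     # (wiki, categories) -> result
        self._slots = {}                     # wiki -> semaphore
        self._lock = threading.Lock()

    def _slot_for(self, wiki):
        # One bounded semaphore per wiki, created on first use.
        with self._lock:
            if wiki not in self._slots:
                self._slots[wiki] = threading.BoundedSemaphore(self._max)
            return self._slots[wiki]

    def fetch(self, wiki, categories):
        # Sort the categories so the same intersection always maps to the same key.
        key = (wiki, tuple(sorted(categories)))
        with self._lock:
            if key in self._cache:
                return self._cache[key]      # duplicated work avoided
        slot = self._slot_for(wiki)
        if not slot.acquire(blocking=False):
            return None                      # wiki is over its share: skip the query
        try:
            result = self._run_query(wiki, key[1])
            with self._lock:
                self._cache[key] = result
            return result
        finally:
            slot.release()
```

With this shape, many pages that render the same list trigger one query instead of many, and a wiki that already has `max_concurrent_per_wiki` queries running gets no result back rather than piling more work onto the shared database.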

It does not fundamentally change the scalability of DPL. It is still going to scale, at best, log-linearly in the size of the smallest category of the intersection. And it is still going to be sharing DBs that are expected to primarily serve fast queries that look at a small number of rows, where we don't really want those replicas to be handling slow queries because response time for normal queries is business-critical (in theory there could be dedicated hardware for DPL, but I strongly suspect that it's considered not worth it for this obscure a feature).
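A rough illustration of where that log-linear cost comes from, in Python. The assumption here is that each category is a sorted list of page IDs standing in for a B-tree index; the function names are invented for this sketch and nothing in it is taken from DPL or MySQL.

```python
from bisect import bisect_left

def _contains(sorted_ids, page_id):
    # Binary search: O(log m), a stand-in for one B-tree index lookup.
    i = bisect_left(sorted_ids, page_id)
    return i < len(sorted_ids) and sorted_ids[i] == page_id

def intersect_by_probing(categories):
    """Return the page IDs present in every category (each a sorted list of IDs)."""
    if not categories:
        return []
    ordered = sorted(categories, key=len)
    smallest, others = ordered[0], ordered[1:]
    # Every member of the smallest category is probed in each other category,
    # so the work is roughly len(smallest) * log(size of the others) at best.
    return [pid for pid in smallest if all(_contains(o, pid) for o in others)]
```

Even in this best case, every member of the smallest category has to be visited, so a category with hundreds of thousands of pages is never going to be a "small number of rows" query.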

The proposed alternative is to rewrite DPL to use Elasticsearch as a backend. Inverted indices are much better suited for set intersection than B-trees are, and would probably be a more scalable long-term solution. CirrusSearch already supports some category intersection, so there already exists some similar work. However, it should be clear that that is not simple maintenance but a full rewrite of DPL.
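For readers unfamiliar with the term, a toy inverted index in Python looks roughly like the following. This is only a sketch of the general technique, not how Elasticsearch or CirrusSearch implement it; the function names and the input shape (a dict of page ID to category names) are assumptions made for the example.

```python
from collections import defaultdict

def build_inverted_index(page_categories):
    """Map each category to a sorted list of the page IDs in it (a posting list)."""
    index = defaultdict(list)
    for page_id in sorted(page_categories):        # sorted IDs keep postings sorted
        for category in set(page_categories[page_id]):
            index[category].append(page_id)
    return dict(index)

def intersect_postings(a, b):
    """Walk two sorted posting lists in step and keep the IDs present in both."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def pages_in_all(index, categories):
    """Intersect the posting lists of the given categories, shortest first."""
    postings = sorted((index.get(c, []) for c in categories), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect_postings(result, p)
        if not result:
            break                                  # nothing left to intersect
    return result
```

For example, with pages `{1: ["Published", "Wikimedia"], 2: ["Published"], 3: ["Wikimedia", "Published"]}`, asking for pages in both "Published" and "Wikimedia" walks two short sorted lists once each.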

But Katherine told me that the Foundation's management monitors this task. We have drawn their attention to this.

I'm sure some managers in the Foundation are aware of this task, but I highly doubt C-levels are going to be directly directing developer effort (it kind of defeats the point of having C-levels if they are directly managing every little thing that comes up) [just IMHO. I have no special knowledge of WMF priorities]

[...]
The proposed alternative is to rewrite DPL to use Elasticsearch as a backend. Inverted indices are much better suited for set intersection than B-trees are, and would probably be a more scalable long-term solution. CirrusSearch already supports some category intersection, so there already exists some similar work. However, it should be clear that that is not simple maintenance but a full rewrite of DPL.

I agree that this is the better suggestion now. But we still don't know who will write this code, when, or where to find this person.

But Katherine told me that the Foundation's management monitors this task. We have drawn their attention to this.

I'm sure some managers in the Foundation are aware of this task, but I highly doubt C-levels are going to be directly directing developer effort (it kind of defeats the point of having C-levels if they are directly managing every little thing that comes up) [just IMHO. I have no special knowledge of WMF priorities]

This looks like a systemic problem that goes beyond this one case. It is management's job to fix business processes.

Change 626268 merged by jenkins-bot:
[mediawiki/extensions/intersection@master] Attempt to add a query cache to DPL

https://gerrit.wikimedia.org/r/626268

Change 637059 had a related patch set uploaded (by Urbanecm; owner: Brian Wolff):
[mediawiki/extensions/intersection@wmf/1.36.0-wmf.14] Attempt to add a query cache to DPL

https://gerrit.wikimedia.org/r/637059

Change 637059 merged by jenkins-bot:
[mediawiki/extensions/intersection@wmf/1.36.0-wmf.14] Attempt to add a query cache to DPL

https://gerrit.wikimedia.org/r/637059

Change 626796 merged by jenkins-bot:
[mediawiki/extensions/intersection@master] Add an option to limit DPL concurrency with pool counter

https://gerrit.wikimedia.org/r/626796