Address performance needs for Wikimedia from DynamicPageList extension so that it can be deployed to further wikis
Open, Lowest, Public

Description

There is a request to enable DPL on fr.wiktionary. DPL has "known performance problems" [[citation needed]].

The purpose of this task is to either:

  1. enumerate enough performance problems that we decide it is not worth the effort to fix and thus decide to not deploy further on Wikimedia wikis, or
  2. enumerate performance issues that can be addressed in a reasonable timeframe, and start doing that (by whom?)

Problem summary

Based on @Bawolff's summary

The performance issue is full table scan and filesorting on large tables (such as page and categorylinks).

There are ways to make DPL suck less without totally rewriting it:

  • disable sorting by unindexed columns would be a minimal step
  • if that doesn't give satisfaction, only allow category intersection if at least one of the categories is smallish

Ultimately, the most correct approach is to rewrite it using Cirrus (or something like Cirrus) as a backend.
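
The two mitigations could be sketched roughly as follows. This is an illustration only, not DynamicPageList's code: `check_query`, `UNINDEXED_SORT_KEYS`, `SMALL_CATEGORY_LIMIT` and the threshold value are all hypothetical, and the only sort key listed as unindexed is `lastedit`, which the discussion on this task says sorts by `page_touched`.

```python
from typing import List, Mapping, Sequence

# Assumption: sort keys without a usable index. Only "lastedit" is taken from
# the discussion on this task; a real list would need to be verified.
UNINDEXED_SORT_KEYS = frozenset({"lastedit"})

# Assumption: illustrative threshold for "smallish", not an agreed value.
SMALL_CATEGORY_LIMIT = 5000


def check_query(
    categories: Sequence[str],
    sort_key: str,
    category_sizes: Mapping[str, int],
    limit: int = SMALL_CATEGORY_LIMIT,
) -> List[str]:
    """Return the reasons to reject a query; an empty list means it is allowed."""
    problems = []

    # Mitigation 1: refuse sort orders that cannot use an index.
    if sort_key in UNINDEXED_SORT_KEYS:
        problems.append(f"sorting by '{sort_key}' is disabled (unindexed column)")

    # Mitigation 2: an intersection needs at least one smallish category.
    # Unknown categories count as empty, which makes the intersection empty.
    if len(categories) > 1:
        smallest = min(category_sizes.get(name, 0) for name in categories)
        if smallest > limit:
            problems.append(
                f"smallest category has {smallest} members, limit is {limit}"
            )

    return problems


sizes = {"Small": 10, "Huge": 1_000_000, "AlsoHuge": 2_000_000}
print(check_query(["Small", "Huge"], "categoryadd", sizes))
print(check_query(["Huge", "AlsoHuge"], "lastedit", sizes))
```

The first call returns no problems because one category is small; the second is rejected on both counts.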

Event Timeline

greg raised the priority of this task from to Needs Triage.
greg updated the task description. (Show Details)
greg added subscribers: Aklapper, TheDaveRoss, greg and 6 others.

The performance issue is full table scan and filesorting on large tables (such as page and categorylinks).

There are ways to make DPL suck less without totally rewriting it (e.g. no sorting by unindexed columns; if you want to go further, only allow category intersection if at least one of the categories is smallish).

Ultimately, the most correct approach is to rewrite it using Cirrus (or something like Cirrus) as a backend.

ori changed the task status from Open to Stalled. Feb 29 2016, 7:54 PM

@Bawolff's comment describes the performance problems.

Dereckson removed a subscriber: wikibugs-l-list.
Dereckson subscribed.

Any update for this?

So the next step is to edit the extension code to disable sorting by unindexed columns.

Adding Editing-team per https://www.mediawiki.org/wiki/Developers/Maintainers.

See https://phabricator.wikimedia.org/T124841#1968724 with @Bawolff's summary of what should be done.

See also https://phabricator.wikimedia.org/T171293#3475766 for my assessment of this blocking future rollouts of this extension in Wikimedia production.

Jdforrester-WMF renamed this task from Performance review of DynamicPageList to Address performance needs for Wikimedia from DynamicPageList extension so that it can be deployed to further wikis. Jul 26 2017, 7:14 PM
Jdforrester-WMF changed the task status from Stalled to Open. Jul 26 2017, 7:18 PM
Jdforrester-WMF triaged this task as Lowest priority.
Jdforrester-WMF subscribed.

Thanks, Greg. I'm marking this as Open as it's no longer stalled waiting for information, but also as Lowest priority, as it's not something we in the Contributors team at large are going to work on any time soon. I'd be happy to see a volunteer take this on, though I note that they'd need some support from us to get the code reviewed and a follow-up performance review undertaken to see whether it would be OK; if someone does want to take this on, please understand that that might take a while, and do talk to us before spending too much time!

(Also, if in the future we do end up sharding the page and revision DB tables, this extension might go from being hard to being impossible to deploy for our big wikis, so people should keep that in mind.)

trwiktionary is asking for it. Given the size of the trwiktionary database, I guess we can afford some full table scans there. Or do we block entirely, regardless of the database size?

For now, no, not until work is done on this. I don't want us to get into a "but why not us?!" debate.

I'd be happy to see a volunteer take this on, though I note that they'd need some support from us to get the code reviewed and a follow-up performance review undertaken to see whether it would be OK

If anyone is actually interested in working on this, I offer to review their patches (DynamicPageList was the first MediaWiki thing I ever submitted a patch to, so it has a soft spot in my heart)

Change 383694 had a related patch set uploaded (by Dereckson; owner: Dereckson):
[operations/mediawiki-config@master] Revert "Revert "Limit thanks for new users at pl.wikipedia to 3 per day""

https://gerrit.wikimedia.org/r/383694

Wrong task :)

Hi all

I'm sorry to see that this extension is blocked and that there doesn't seem to have been any progress in the past two years. We are currently running an RFC on creating a better import process for data on Wikidata.

https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Mapping_and_improving_the_data_import_process

We have outlined the tools and resources we need, and getting a centralised data import register working and getting this extension installed on Wikidata is the only blocker left.

Is there anything I can do to help to get this solved without being a programmer?

@SandraF_WMF @NavinoEvans (for information)

Thanks very much

In T124841#4009566, @Mrjohncummings wrote:

Is there anything I can do to help to get this solved without being a programmer?

Directly? Probably not. However, if you clearly define your needs, it's possible an alternative tool could be suggested or developed.

@Bawolff

So I'm working on a process for centralising records of which datasets have been imported into Wikidata to help people understand what data is in Wikidata and allow people to work together on data imports.

https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Mapping_and_improving_the_data_import_process

We need to create separate pages for each entry: the current Data Import Hub has started to become successful and the page is getting reeeaalll loooonnnggg.

https://www.wikidata.org/wiki/Wikidata:Data_Import_Hub

I've looked at transcluding pages, but it's difficult to do, especially for new users, and it doesn't solve the long page problem. It also doesn't allow you to categorise things in multiple ways, which is very important for process and recording. You can use categories for:

  • Subjects (some datasets span multiple subjects)
  • Dataset formats, e.g. tables trapped in PDFs
  • What stage the dataset import is at: not started, in progress, completed, needing manual matching, technical issue (send help), etc.

So categories are the way to go, but we need some way to collate and display dataset links in categories in multiple ways on the same page. This extension is perfect for this, although categories are used commonly enough that there could be something else?

Thanks very much

Currently, it can sort by page_touched (for lastedit) which has no index. All of the other sort options appear to be able to use one or more indexes. Maybe the option to sort by last edit should be removed. @Bawolff thoughts?

Lastedit isn't commonly used; I mean, we should check, but it's probably OK. However, for the other fields, even if they are in an index, that index often cannot be used effectively.

Or I guess it's not used on enwikinews; other languages might be different.
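
To make the cost of sorting on an unindexed column concrete, here is a small sketch using SQLite from Python's standard library. It is a stand-in only: Wikimedia runs MariaDB, the `page` table below is reduced to two columns, and `page_touched_idx` is a hypothetical index created purely for contrast. SQLite's "USE TEMP B-TREE FOR ORDER BY" step plays roughly the role that "Using filesort" plays in MySQL/MariaDB `EXPLAIN` output.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a list of strings."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Simplified stand-in for MediaWiki's page table; only two columns are kept.
conn.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_touched TEXT)")

query = "SELECT page_id FROM page ORDER BY page_touched DESC LIMIT 10"

# Without an index, the whole table is scanned and then sorted.
before = plan(conn, query)

# Hypothetical index, created only to show the contrast.
conn.execute("CREATE INDEX page_touched_idx ON page (page_touched)")
after = plan(conn, query)

needs_sort_before = any("TEMP B-TREE" in step for step in before)
needs_sort_after = any("TEMP B-TREE" in step for step in after)

print("without index:", before, "-> explicit sort:", needs_sort_before)
print("with index:   ", after, "-> explicit sort:", needs_sort_after)
```

This covers only the single-table case. Once the query also filters through categorylinks, an index on the sort column may no longer help, which is the caveat about the other sort fields raised above.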