
Address performance needs for Wikimedia from DynamicPageList extension so that it can be deployed to further wikis
Open, LowestPublic

Description

There is a request to enable DPL on fr.wiktionary. DPL has "known performance problems" [[citation needed]].

The purpose of this task is to either:

  1. enumerate enough performance problems that we decide it is not worth the effort to fix and thus decide to not deploy further on Wikimedia wikis, or
  2. enumerate performance issues that can be addressed in a reasonable timeframe, and start doing that (by whom?)

Problem summary

Based on @Bawolff's summary

The performance issue is full table scan and filesorting on large tables (such as page and categorylinks).

There are ways to make DPL suck less without totally rewriting it:

  • disabling sorting by unindexed columns would be a minimal step
  • if that is not sufficient, only allow category intersection if at least one of the categories is smallish

Ultimately, the most correct approach is to rewrite using Cirrus (or something like Cirrus) as a backend.
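
The two mitigations listed above could be sketched as a pre-flight check that runs before any SQL is built. This is only a minimal illustration in Python, not the extension's actual code: the names `DplQuery`, `check_query`, `INDEXED_SORT_KEYS` and `SMALL_CATEGORY_LIMIT`, the sort key names, and the threshold value are all assumptions made for the example. Which sort keys are really backed by an index would have to be checked against the database schema.

```python
import math
from dataclasses import dataclass
from typing import List, Mapping, Optional, Tuple

# Assumption: placeholder names for sort keys that are backed by an index.
INDEXED_SORT_KEYS = frozenset({"categoryadd", "categorysortkey"})

# Assumption: arbitrary cut-off for what counts as a "smallish" category.
SMALL_CATEGORY_LIMIT = 5000


@dataclass(frozen=True)
class DplQuery:
    """Hypothetical, simplified view of a parsed DPL request."""
    categories: Tuple[str, ...]
    order_by: Optional[str] = None


def check_query(query: DplQuery, category_sizes: Mapping[str, int]) -> List[str]:
    """Return the reasons to reject the query; an empty list means it may run."""
    problems = []

    # Mitigation 1: refuse sorting on keys without an index (avoids a filesort).
    if query.order_by is not None and query.order_by not in INDEXED_SORT_KEYS:
        problems.append(f"sorting by '{query.order_by}' is not backed by an index")

    # Mitigation 2: an intersection needs at least one small category.
    # Categories of unknown size are treated as large.
    if len(query.categories) > 1:
        smallest = min(category_sizes.get(name, math.inf) for name in query.categories)
        if smallest > SMALL_CATEGORY_LIMIT:
            problems.append("category intersection needs at least one small category")

    return problems


# Made-up category sizes for illustration.
sizes = {"Nouns": 2_000_000, "Verbs": 900_000, "Stubs": 300}
print(check_query(DplQuery(("Nouns", "Stubs"), "categoryadd"), sizes))  # prints []
print(check_query(DplQuery(("Nouns", "Verbs"), "lastedit"), sizes))     # two reasons
```

Rejecting the request up front, rather than running it with a timeout, keeps the expensive scan from ever reaching the database; the trade-off is that the category sizes have to be available cheaply, which this sketch simply assumes.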

Event Timeline

greg created this task. Jan 26 2016, 10:52 PM
greg raised the priority of this task from to Needs Triage.
greg updated the task description.
greg added subscribers: Aklapper, TheDaveRoss, greg and 6 others.
Restricted Application added a subscriber: StudiesWorld. Jan 26 2016, 10:52 PM

The performance issue is full table scan and filesorting on large tables (such as page and categorylinks).

There are ways to make DPL suck less without totally rewriting it (e.g. no sorting by unindexed columns; if you want to go further, only allow category intersection if at least one of the categories is smallish).

Ultimately, the most correct approach is to rewrite using Cirrus (or something like Cirrus) as a backend.

ori changed the task status from Open to Stalled. Feb 29 2016, 7:54 PM

@Bawolff's comment describes the performance problems.

ori edited projects, added Performance; removed Performance-Team. Feb 29 2016, 7:55 PM
ori set Security to None.
Dereckson updated the task description. Apr 27 2016, 5:24 PM
Dereckson removed a subscriber: wikibugs-l-list.
Dereckson added a subscriber: Dereckson.

Any update for this?

So the next step is to edit the extension code to disable sorting by unindexed columns.

Adding Editing-team per https://www.mediawiki.org/wiki/Developers/Maintainers.

See https://phabricator.wikimedia.org/T124841#1968724 with @Bawolff's summary of what should be done.

See also https://phabricator.wikimedia.org/T171293#3475766 for my assessment of this blocking future rollouts of this extension in Wikimedia production.

Jdforrester-WMF renamed this task from Performance review of DynamicPageList to Address performance needs for Wikimedia from DynamicPageList extension so that it can be deployed to further wikis. Jul 26 2017, 7:14 PM
Jdforrester-WMF changed the task status from Stalled to Open. Jul 26 2017, 7:18 PM
Jdforrester-WMF triaged this task as Lowest priority.
Jdforrester-WMF added a subscriber: Jdforrester-WMF.

Thanks, Greg. I'm marking this as Open as it's no longer stalled waiting for information, but also as Lowest priority, as it's not something we in the Contributors team at large are going to work on any time soon. I'd be happy to see a volunteer take this on, though I note that they'd need some support from us to get the code reviewed and a follow-up performance review undertaken to see whether it would be OK; if someone does want to take this on, please understand that that might take a while, and do talk to us before spending too much time!

(Also, if in the future we do end up sharding the page and revision DB tables, this extension might go from being hard to being impossible to deploy for our big wikis, so people should keep that in mind.)

hashar added a subscriber: hashar. Oct 11 2017, 1:48 PM

trwiktionary is asking for it. Given the size of the trwiktionary database, I guess we can afford some full table scans there, or do we block entirely regardless of the database size?

greg added a comment. Oct 11 2017, 7:30 PM

For now, no, not until work is done on this. I don't want us to get into a "but why not us?!" debate.

I'd be happy to see a volunteer take this on, though I note that they'd need some support from us to get the code reviewed and a follow-up performance review undertaken to see whether it would be OK

If anyone is actually interested in working on this, I offer to review their patches (DynamicPageList was the first MediaWiki thing I ever submitted a patch to, so it has a soft spot in my heart)

Change 383694 had a related patch set uploaded (by Dereckson; owner: Dereckson):
[operations/mediawiki-config@master] Revert "Revert "Limit thanks for new users at pl.wikipedia to 3 per day""

https://gerrit.wikimedia.org/r/383694

Wrong task :)

Mrjohncummings added a comment. Edited Feb 28 2018, 10:31 AM

Hi all

I'm sorry to see that this extension is blocked and that there doesn't seem to have been any progress in the past two years. We are currently running an RFC on creating a better import process for data on Wikidata.

https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Mapping_and_improving_the_data_import_process

We have outlined the tools and resources we need, and getting a centralised data import register working and getting this extension installed on Wikidata is the only blocker left.

Is there anything I can do to help to get this solved without being a programmer?

@SandraF_WMF @NavinoEvans (for information)

Thanks very much

Directly? Probably not. However, if you clearly define your needs, it's possible an alternative tool could be suggested or developed.

@Bawolff

So I'm working on a process for centralising records of which datasets have been imported into Wikidata to help people understand what data is in Wikidata and allow people to work together on data imports.

https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Mapping_and_improving_the_data_import_process

We need to create separate pages for each entry: the current Data Import Hub has started to become successful and the page is getting reeeaalll loooonnnggg.

https://www.wikidata.org/wiki/Wikidata:Data_Import_Hub

I've looked at transcluding pages, but it's a difficult thing to do, especially for new users, and it doesn't get over the long page problem. It also doesn't allow you to categorise things in multiple ways, which is very important for process and recording. You can use categories for:

  • Subjects (some datasets span multiple subjects)
  • Dataset formats, e.g. tables trapped in PDFs
  • What stage the dataset import is at: not started, in progress, completed, needing manual matching, technical issue (send help) etc.

So categories are the way to go, but we need some way to collate and display dataset links in categories in multiple ways on the same page. This extension is perfect for this, although categories are used commonly enough that there could be something else?

Thanks very much

mxn added a subscriber: mxn. Jun 27 2018, 6:20 PM
hashar removed a subscriber: hashar. Jul 31 2018, 11:08 PM