consider generating an empty abstract file for wikidata
Closed, ResolvedPublic

Assigned To

Authored By

	ArielGlenn
	Oct 21 2019, 5:34 AM

Description

We only produce an abstract for articles in the main namespace that are not redirects and that have a content model of text or wikitext. For Wikidata, all items in the main namespace are Q-items with content model wikibase-item.
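The eligibility rule described above can be sketched as a simple predicate. This is a minimal illustration only, not the dumps code: the `Page` record and its field names are hypothetical, and the only facts taken from the description are the three conditions (main namespace, not a redirect, content model `text` or `wikitext`).

```python
from dataclasses import dataclass

MAIN_NAMESPACE = 0  # the main (article) namespace in MediaWiki
ABSTRACT_CONTENT_MODELS = {"text", "wikitext"}


@dataclass
class Page:
    # Hypothetical minimal page record; field names are illustrative only.
    title: str
    namespace: int
    is_redirect: bool
    content_model: str


def gets_abstract(page: Page) -> bool:
    """Return True if the page would get a real abstract under the rule above."""
    return (
        page.namespace == MAIN_NAMESPACE
        and not page.is_redirect
        and page.content_model in ABSTRACT_CONTENT_MODELS
    )
```

Under this rule every Wikidata Q-item, e.g. `Page("Q42", 0, False, "wikibase-item")`, fails the content-model check, which is why each entry ends up as a not-applicable placeholder.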

If Wikidata's main namespace is not expected to change in this respect, then there's no point in spending 36 hours generating a set of files in which every entry is just <abstract not-applicable="" />. We should simply generate an empty abstract file and be done with it. I guess we'd generate the 27 empty partial files and one complete 'recombined' file, each containing only the mediawiki and siteinfo header and the mediawiki footer.
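A rough sketch of what "empty" means here: each of the 27 partial files and the recombined file would hold only the opening mediawiki element with its siteinfo block, followed by the closing footer, and no entries at all. The header text, the `abstract*.xml` file names, and the lack of compression are assumptions for illustration; the real dumps take these from the wiki and the dumps tooling.

```python
import os

# Hypothetical stand-ins for the real dump header and footer; the actual
# <mediawiki> attributes and <siteinfo> contents come from the wiki itself.
HEADER = (
    '<mediawiki xml:lang="en">\n'
    "  <siteinfo>\n"
    "    <sitename>Wikidata</sitename>\n"
    "    <dbname>wikidatawiki</dbname>\n"
    "  </siteinfo>\n"
)
FOOTER = "</mediawiki>\n"


def write_empty_abstract(path: str) -> None:
    """Write a file holding only the header/siteinfo and the footer, no entries."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(HEADER)
        out.write(FOOTER)


def write_empty_abstract_set(outdir: str, parts: int = 27) -> list[str]:
    """Write `parts` empty partial files plus one empty 'recombined' file.

    The file-name pattern is hypothetical, not the production naming scheme.
    """
    paths = [os.path.join(outdir, f"abstract{i}.xml") for i in range(1, parts + 1)]
    paths.append(os.path.join(outdir, "abstract.xml"))
    for path in paths:
        write_empty_abstract(path)
    return paths
```

Writing a few hundred bytes per file this way takes no database queries at all, which is the saving the task is after compared with walking every Q-item.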

I'd like to get the input of folks on the xmldatadumps mailing list as well as @hoo to see what people think.

Details

	Subject	Repo	Branch	Lines +/-
	configure wikidata dumps to generate empty abstracts files	operations/puppet	production	+1 -0
	ability to configure a wiki to produce empty abstract files	operations/dumps	master	+34 -12


Event Timeline

ArielGlenn triaged this task as Medium priority.Oct 21 2019, 5:34 AM

ArielGlenn created this task.

@hoo if you're not the right person to ping for this, can you point me to the right person? Basically I'm interested in knowing if the configuration can reasonably ever change so that anything besides a Q-item can be in the main namespace, and in particular anything with a content model that is text or wikitext. If not, as the task description states, I'm seriously considering generating 'empty' abstract files and saving wear and tear on the db servers. What do you think?

ArielGlenn moved this task from Backlog to Active on the Dumps-Generation board.Oct 21 2019, 5:39 AM

Adding @WMDE-leszek for comments too, in case you are more active on Wikidata; if you're the wrong person to answer about the contents of the Wikidata main namespace, please redirect me and/or remove yourself.

@ArielGlenn: for Wikidata, it is not expected that Q-items or P-properties would ever end up in a namespace with a text or wikitext content model, so optimizing the process in this area should be absolutely fine.
Note that Commons/MediaInfo is a bit different, as it stores structured data (M-entities) in a separate slot of pages in the File namespace, whose main slot, I believe, has the wikitext content model? Not sure if this is relevant for the question here.

@WMDE-leszek Thanks for the answer; is it expected that at some time in the future other things might go into the main namespace for Wikidata, that might have a text or wikitext content model?

For MediaInfo items, those are in a secondary slot so they will never show up for abstracts for Commons, and we don't have to worry about them at all :-)

In T236006#5590851, @ArielGlenn wrote:

@WMDE-leszek Thanks for the answer; is it expected that at some time in the future other things might go into the main namespace for Wikidata, that might have a text or wikitext content model?

Negative.

Thanks! I will send an email to the xmldatadumps list and see what people think, though I do not expect any objections.

(Updated) Message sent.

Lydia_Pintscher added a project: Wikidata.Oct 21 2019, 10:08 AM

hoo awarded a token.Oct 24 2019, 9:38 AM

Change 547197 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/dumps@master] ability to configure a wiki to produce empty abstract files

https://gerrit.wikimedia.org/r/547197

gerritbot added a project: Patch-For-Review.Oct 30 2019, 1:26 PM

Addshore moved this task from incoming to in progress on the Wikidata board.Oct 30 2019, 1:52 PM

A week has passed and no one has commented. Silence equals consent, and the above patch has been tested with the config setting enabled and disabled, so it's ready to go.

This will be merged shortly before the Nov 20th run unless something else derails things.

Change 547197 merged by ArielGlenn:
[operations/dumps@master] ability to configure a wiki to produce empty abstract files

https://gerrit.wikimedia.org/r/547197

Maintenance_bot removed a project: Patch-For-Review.Nov 15 2019, 12:10 PM

Change 551172 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] configure wikidata dumps to generate empty abstracts files

https://gerrit.wikimedia.org/r/551172

gerritbot added a project: Patch-For-Review.Nov 15 2019, 1:41 PM

Change 551172 merged by ArielGlenn:
[operations/puppet@production] configure wikidata dumps to generate empty abstracts files

https://gerrit.wikimedia.org/r/551172

Maintenance_bot removed a project: Patch-For-Review.Nov 15 2019, 2:10 PM

This is now complete. The Nov 20th Wikidata abstract files are nice little empty files, as expected.

ArielGlenn moved this task from Active to Done on the Dumps-Generation board.Dec 20 2019, 11:43 AM
