
SUPPORT: wikibase instance space consumption
Closed, ResolvedPublic

Description

Hello,

we have a Wikibase instance with 780,752 items and 2,865,041 page edits since setup. Currently the space occupied is 150 GB, while the full RDF dump is 15 GB. It seems to me that more space is occupied than necessary. Can you confirm this? We have never run the pruneChanges script. Is this the reason? Should this be done?

Thank you for some insights
D063520

Event Timeline

Restricted Application added a subscriber: Aklapper.

It would be good to check how big your database tables are.
pruneChanges could indeed help.
Also, there are various things that can be done around the storage and compression of the main text "content" of pages.
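
A minimal way to check the table sizes, assuming MySQL/MariaDB and a database named my_wiki (adjust names and credentials to your setup):

# List table sizes (data + indexes) in MB, biggest first.
mysql -e "SELECT table_name,
                 ROUND((data_length + index_length) / 1024 / 1024, 2) AS size_mb
          FROM information_schema.tables
          WHERE table_schema = 'my_wiki'
          ORDER BY size_mb DESC;"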

Hi,

here is the size of the biggest tables in MB:

my_wiki | page                 |   231.84
my_wiki | page_props           |   365.30
my_wiki | revision_actor_temp  |   430.94
my_wiki | logging              |   437.27
my_wiki | revision             |   818.05
my_wiki | recentchanges        |  1172.27
my_wiki | pagelinks            |  1228.88
my_wiki | wb_terms             |  1477.80
my_wiki | wb_changes           |  1808.19
my_wiki | objectcache          |  2237.58
my_wiki | job                  |  5244.39
my_wiki | text                 | 53325.00

So the problem is really the text content.

  1. How could I compress them?
  2. pruneChanges will not delete any data or history? It just means we cannot see very far back in recent changes?

Thanks
D063520

You can move the text off the main SQL server, see https://www.mediawiki.org/wiki/Manual:External_Storage
It might not be worth it though, depending on your exact situation.

It is interesting that the job table is so large; it should be getting pruned...
From https://www.mediawiki.org/wiki/Manual:Job_table
"The job table holds a list of pending jobs"
Which wiki is this for? You can see the number of pending jobs in the API for example at https://wiki.personaldata.io/w/api.php?action=query&meta=siteinfo&siprop=statistics
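
For example, a rough command-line check (a sketch; swap in your own wiki's api.php URL):

# The "jobs" field in the returned statistics block is the pending job count.
curl -s 'https://wiki.personaldata.io/w/api.php?action=query&meta=siteinfo&siprop=statistics&format=json' \
  | python3 -m json.tool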

It also looks like your object cache is currently backed by SQL.
You should see some good performance gains, as well as reduced latency, if you switch this to something like memcached or Redis.
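
A minimal sketch of that switch, assuming memcached is already running on localhost:11211 (adapt the host/port to your environment, or simply edit LocalSettings.php by hand):

# Append the object cache settings to LocalSettings.php.
cat >> LocalSettings.php <<'PHP'
$wgMainCacheType = CACHE_MEMCACHED;
$wgMemCachedServers = [ '127.0.0.1:11211' ];
PHP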

wb_changes will be helped by running the pruneChanges script.
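
A sketch of running it, assuming Wikibase is installed under extensions/Wikibase and you are in the wiki's root directory:

# Prune old rows from wb_changes; this can also be run periodically from cron.
php extensions/Wikibase/repo/maintenance/pruneChanges.php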

wb_terms is just naturally large, but it will be going away in one of the next Wikibase releases (T208425).

pagelinks is currently going through some normalization and will eventually get smaller in a future version of MediaWiki (T222224).

recentchanges looks a bit big; this again should be getting pruned, see https://www.mediawiki.org/wiki/Manual:$wgRCMaxAge
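
A sketch of lowering the retention, assuming you only need 30 days of recent changes (the value is in seconds):

# Append the recent changes retention setting to LocalSettings.php.
cat >> LocalSettings.php <<'PHP'
$wgRCMaxAge = 30 * 24 * 3600; // keep 30 days of recent changes
PHP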

pruneChanges will not delete any data or history? It just means we cannot see very far back in recent changes?

The pruneChanges script in Wikibase does not relate to RecentChanges at all.
It only relates to the changes table (wb_changes) used for dispatching changes to client wikis.

Regarding jobs: that table is basically the job queue for your wiki. Make sure to add a cron job to process the queue by running:

php maintenance/runJobs.php

(Also, once you've run this for the first time, optimize the job table to reclaim the space it took.)
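
Something along these lines, where the install path and the --maxjobs limit are assumptions to adjust:

# crontab entry (crontab -e): process up to 1000 jobs every ten minutes.
*/10 * * * * php /var/www/mediawiki/maintenance/runJobs.php --maxjobs 1000 > /dev/null 2>&1

# One-off, once the backlog has been worked through, to reclaim the disk space (MySQL/MariaDB):
# mysql my_wiki -e "OPTIMIZE TABLE job;"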

Thank you very much for these insights!

One more question about the job table: currently there are 4,549,247 jobs. In fact, when we import data in batches via a bot, more jobs are created than php maintenance/runJobs.php can work through. Do you have any clue why so many jobs are created? Is this normal?

Is this normal?

Yes.
Some of these jobs will be MediaWiki jobs, and some will be Wikibase related.
The jobs do things like updating links tables (which power features like Special:WhatLinksHere), purging caches, updating other secondary stores, and running other deferred updates.
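
If you want to see which kinds of jobs are piling up, a quick sketch (assuming MySQL/MariaDB and the my_wiki database from above) is to group the job table by job type:

# Count pending jobs per type (job_cmd), biggest first.
mysql my_wiki -e "SELECT job_cmd, COUNT(*) AS pending
                  FROM job
                  GROUP BY job_cmd
                  ORDER BY pending DESC;"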