Page MenuHomePhabricator

Help for reduce the size from table _text
Closed, InvalidPublic

Description

Hello,
Generally, I used maintenance commands to reduce the size of the database.

/usr/bin/php /var/www/wiki.visionduweb.fr/maintenance/deleteOldRevisions.php --delete
the same for maintenance/deleteArchivedRevisions.php --delete

It seems that recently, it does not work, finally, maybe yes, but, the size of the table increases abnormally!
In general, the table was about 11 MB but it is 52 MB!
The change in a few weeks is enormous, and since I can not reduce it, it's because something has happened.
I use PHP 7.3, mariad DB, on Debian, with Apache2.4.

I use MediaWiki since version 1.27 and I have been able to regularly clean the database with maintenance commands, and that is inefficient.

I went to version 1.33.1 but I'm not sure that it can come from, maybe yes.

I have also recently changed the configuration of mariaDB, to change some of my basics to utfmb4, for other sites, adding this in the configuration of MariaDB:

sudo nano /etc/mysql/mariadb.conf.d/50-server.cnf

# This group is only read by MariaDB-10.3 servers.
[MariaDB-10.3]
innodb_file_format = Barracuda
innodb_file_per_table = 1
innodb_default_row_format = dynamic
innodb_large_prefix = 1

If anyone could help me, I can make available the copy of the database, which is about 100 MB globally.
If anyone can help me test my mediawiki database to see if we could clean this _text table because something is to be done, that's for sure. Thanks in advance.

Event Timeline

Hi @Zer00CooL. So far this does not sound like something is wrong in the code base (a so-called "software bug"), but instead like a support request (how to change settings, questions how to do something, etc.).
As Wikimedia Phabricator is for bug reports and enhancement requests, please ask on the MediaWiki Support Desk. Thanks for your understanding!

(Removing unrelated project tags as this task is not related to packaging MediaWiki for Debian or to the mariadb optimizer.)

Zer00CooL updated the task description. (Show Details)

This is actually a request for assistance, but, that does not mean that there is not a problem in the code base, otherwise, why the maintenance script would have no effect on the weight of the code. the _text table?

Indeed, it works properly from mediawiki 1.28, and since my update to 1.33.1 / AND / OR / since the configuration of MariaDB to use UTF8MB4 without breaking my other existing databases, it seems that the _text table of MediaWiki does not be more properly cleaned.

That's why I would like to offer the download of my database, which is 100 MB, and, if someone observes if he manages to clean the table _text because for me, 50 MB is a real problem!

The problem is not to store 50 MB, but, not to arrive at the optimized 11/20 MB, which is the normal size of my database.

Having a database that, overall, is 100 MB now, makes me say that yes, there is a problem, out, I did not change anything particular, except the version of mediawiki and, the configuration lines for mariadb previously mentioned.

If you have had many deletes, your tables might have lots of blank spaces which eat up space. You can try to run an optimize table $table to get those removed so the table can get compacted. If that is the issue, you'll see a reduction in your disk space for that particular table.

No, the commande OPTIMIZE change nothing.

This is really a problem either from Mediawiki, or, because of the encoding of mariadb, but, it is a very recent problem.

I remain available, by chat if necessary, to investigate this problem. Contact me in this way. Thank you.

I inciste to share my database, 100 MB, it is quickly charged, to try to optimize it by your care, if you happen to do something more quickly than by discussing, to pass of 100Mo to 25Mo, what should be the weight of my project being optimized.

I tried to reinstall mediawiki 1.33.1 just to see, but it does not change anything.
So I'm wondering to what extent adding MariaDB's general configuration lines to allow me to convert other databases to UTF8MB4 without loss of data, as indicated above, could affect MediaWiki's behavior on this _text table.

Constat on the table _recentchanges :
id 62394 and more exist, but, 1 to 62393 not exit.
Then, why this content was not purge when i use the maintenance script to optimize the size from my database ?

It was done well "before" apparently, with the official maintenance scripts, but, in my opinion, this is not done anymore, even if here I do not speak of the _text table we can see that the cleaning is stopped recently, if I'm not mistaken in this interpretation of the data in this table _recentchanges. ( I'm not very sure about the operation for this table so it's just a guess. )

INSERT INTO `prefixe_recentchanges` (`rc_id`, `rc_timestamp`, `rc_user`, `rc_user_text`, `rc_actor`, `rc_namespace`, `rc_title`, `rc_comment_id`, `rc_minor`, `rc_bot`, `rc_new`, `rc_cur_id`, `rc_this_oldid`, `rc_last_oldid`, `rc_type`, `rc_source`, `rc_patrolled`, `rc_ip`, `rc_old_len`, `rc_new_len`, `rc_deleted`, `rc_logid`, `rc_log_type`, `rc_log_action`, `rc_params`) VALUES
(62394, 0x3230313931313235303233363033, 0, '', 1, 0, 0x44696374696f6e6e61697265, 1038, 1, 0, 1, 512, 27165, 0, 1, 0x6d772e6e6577, 1, '', 0, 30192, 0, 0, NULL, NULL, NULL),
(62395, 0x3230313931313235303233313431, 0, '', 1, 6, 0x41656f6e2e706e67, 5, 1, 0, 1, 912, 27164, 0, 1, 0x6d772e6e6577, 1, '', 0, 32, 0, 0, NULL, NULL, NULL),
(62396, 0x3230313931313235303233313337, 0, '', 1, 6, 0x444153482e706e67, 5, 1, 0, 1, 435, 27163, 0, 1, 0x6d772e6e6577, 1, '', 0, 43, 0, 0, NULL, NULL, NULL),
(62397, 0x3230313931313235303233313333, 0, '', 1, 6, 0x444f47452e706e67, 5, 1, 0, 1, 453, 27162, 0, 1, 0x6d772e6e6577, 1, '', 0, 43, 0, 0, NULL, NULL, NULL),
(62398, 0x3230313931313235303233313330, 0, '', 1, 6, 0x434c414d2e706e67, 5, 1, 0, 1, 464, 27161, 0, 1, 0x6d772e6e6577, 1, '', 0, 43, 0, 0, NULL, NULL, NULL),
(62399, 0x3230313931313235303233313235, 0, '', 1, 6, 0x43414e4e2e706e67, 5, 1, 0, 1, 517, 27160, 0, 1, 0x6d772e6e6577, 1, '', 0, 47, 0, 0, NULL, NULL, NULL),
(62400, 0x3230313931313235303233313233, 0, '', 1, 6, 0x426974636f72652e706e67, 5, 1, 0, 1, 633, 27159, 0, 1, 0x6d772e6e6577, 1, '', 0, 41, 0, 0, NULL, NULL, NULL),
(62401, 0x3230313931313235303233313230, 0, '', 1, 6, 0x426974636f696e2e706e67, 5, 1, 0, 1, 349, 27158, 0, 1, 0x6d772e6e6577, 1, '', 0, 35, 0, 0, NULL, NULL, NULL),
(62402, 0x3230313931313235303233313136, 0, '', 1, 6, 0x4254432e706e67, 5, 1, 0, 1, 436, 27157, 0, 1, 0x6d772e6e6577, 1, '', 0, 41, 0, 0, NULL, NULL, NULL),
(62403, 0x3230313931313235303232393438, 0, '', 1, 6, 0x424c4b2e706e67, 5, 1, 0, 1, 437, 27156, 0, 1, 0x6d772e6e6577, 1, '', 0, 43, 0, 0, NULL, NULL, NULL),
(62404, 0x3230313931313235303232373539, 0, '', 1, 14, 0x426974636f696e, 5, 1, 0, 1, 352, 27155, 0, 1, 0x6d772e6e6577, 1, '', 0, 59, 0, 0, NULL, NULL, NULL),
(62405, 0x3230313931313235303132363139, 0, '', 3, 2, 0x4869746879736869, 139, 1, 0, 1, 1032, 27154, 0, 1, 0x6d772e6e6577, 0, '', 0, 0, 0, 0, NULL, NULL, NULL),
(62407, 0x3230313931313234313434353337, 0, '', 3, 0, 0x496e7374616c6c65725f44656269616e, 1035, 1, 0, 1, 1007, 27141, 0, 1, 0x6d772e6e6577, 0, '', 0, 25623, 0, 0, NULL, NULL, NULL),
(62408, 0x3230313931313232313435323034, 0, '', 3, 0, 0x496e7374616c6c65725f4d7953514c5f7375725f44656269616e, 1024, 1, 0, 1, 660, 27101, 0, 1, 0x6d772e6e6577, 0, '', 0, 49392, 0, 0, NULL, NULL, NULL),
(62409, 0x3230313931313232303032373431, 0, '', 3, 0, 0x496e7374616c6c65725f5265646d696e655f7375725f44656269616e, 1023, 1, 0, 1, 32, 27098, 0, 1, 0x6d772e6e6577, 0, '', 0, 82871, 0, 0, NULL, NULL, NULL),
(62410, 0x3230313931313231323335323237, 0, '', 3, 0, 0x536f6d6d616972655f6465735f7475746f7269656c735f706f75725f6f626a6574735f6e756d65726971756573, 814, 1, 0, 1, 696, 27097, 0, 1, 0x6d772e6e6577, 0, '', 0, 9479, 0, 0, NULL, NULL, NULL),

In fact, I have a doubt, I still think I made an automatic conversion to utf8mb4_unicode_ci.
I used this script: https://github.com/ZerooCool/charset-utf8mb4-collation-utf8mb4_unicode_ci-mysql-mariadb/blob/master/utf8-utf8mb4.sh

The default Mediawiki snacks are, I believe, utf8mb4_general_ci and I would have passed them in utf8mb4_unicode_ci.
Can that play on the overall size of the base, certainly, but, the problem remains open, I think, as I show with the other table previously, values do not seem to be cleaned, whereas according to me, they should be.

It seems after research in my table that I would rather have a problem of collation on my tables and that I actually had to, wanting to modify the collation. I found this in the _logging table which no longer traces the name of the users who connect or who create pages, for about a month, during my experiments on table collation, when I modified the snacks of other tables, for joomla, or, for gitea.

So I am now trying to repair the damage, a script might exist to put the default collation of Mediawiki?

I continue to share my experience with Mediawiki, because, it is not impossible that a funny bug can exist anyway.

I finally reinstalled Mediawiki 1.33.1 completely on a test database.

I imported the articles, in short, I left on a completely new cms, and on a database with clean content.

I notice that the _text table is now 6.5 MB and not the 55 MB that I had then.

On the other hand, I notice that the last problem encountered, is always present, ie in the _logging table the log_user field is no longer filled with the user name that log (creates a page) but it is the value 0 which is present.

So here I am, on a database MariaDB installed on Debian Buster, in version 10.3, so I added 4 lines in the configuration of MariaDB (It can play, or no, it should check)

Anyway, with an installation of the latest version of Mediawiki, at nine, so I noticed this problem, which does not seem fictitious.

So, yes, it is possible that its 4 lines are problematic, but if it is not the case, it is possible that there is still a problem with the core, that should continue to work (at in its way to log users and their actions) despite this (global) configuration of MariaDB, with 4 more lines.

This group is only read by MariaDB-10.3 servers.

[MariaDB-10.3]
innodb_file_format = Barracuda
innodb_file_per_table = 1
innodb_default_row_format = dynamic
innodb_large_prefix = 1

The mistake may come from me, with its four lines, or, here, maybe not.

If I can, I would do the installation on a virtual machine, shortly, with a default mariadb without changing the configuration, and, a standard installation of Mediawiki 1.33.1, to confirm whether or not, there actually has a problem.

Concerning the volume of the _text table the problem seems fixed, or, attenuated and not visible at the moment, following a total reinstallation of version 1.33.1 but it forced me to reinstall everything and re-import everything.

On this second point which I noticed, I would like a return from you, and, once the return obtained, this subject could be regarded totally as closed.

By doing further research, I could see that the table structure for _logging and the log_user field has evolved to 1.33.1 and will be modified again for the next version.

The default value of this log_user field has been changed to 0 in the new 1.33.1 out structure, which is now entered in the field each time.

I no longer have the username that created a page in the log_user field but 0 instead.

I would like a confirmation from you, to know whether or not you still have the name of the user that appears in your log_user field of the _logging table

It could be a mediawiki problem.
It is also possible that it is voluntary for the development of Mediawiki but then I do not understand because it was convenient to be able to see directly who made the modification. (The user ID is displayed in another field, of course, but it's less readable than a nickname that should be in the log_user field)

Conclusion:
1- After reinstalling Mediawiki 1.33.1 the _text table has a normal weight.

2- I would like a return to the _logging table and the log_user field to know whether or not the username that created a page is still registered.

Thank you for your reading, patience, and, of your return.

I cloned and closed this research on my side, after having verified what I discovered.

My initial question has been derived to a question about the _logging table and its log_user_text field because I discovered that the log_user_text data was no longer registered, since the update to version 1.33.1

In the _logging table, the log_user_text field no longer displays the user name, since version 1.33.1, and this, with a default installation of MariaDB and Apache2.

So I tested this well, without being in a particular configuration for UTF8MB4_unicode_ci but in UTF8MB4_general_ci proposed by default by MariaDB, and, without adding lines of configuration for MariaDB which allowed me to migrate to utf8mb4_unicode_ci.

So I guess this is a Mediawiki bug, or a willingness of the technical team to do that, but, I do not understand why this recording is no longer done.

I read correctly that the version 1.33.1 modified the structure of the database, concerning this table, by assigning a default value for the log_user field (it seems to me) to 0, then, to NULL for the version 1.34.1 I have not tested.

I do not know if this problem has been fixed in the future versions of Mediawiki.

Since reinstalling and importing my pages by XML, and images, I no longer have the size problem on the _text table

Only this problem of the _logging table seems really interesting to me.

Thank you for considering it, or, to complete, to properly close this topic, explaining why the developers have made changes to the structure of the table, but, that the field log_user_text is no longer functional.

I updated to 1.33.3 but, for information, now my _text table is 270 MB.

So there is a problem!
I have not created new content for 260MB.

If I use special page, to export the content of my pages, I am always less than 10 MB.

So I have more than 250 MB extra, on my database, in the _text table.

https://www.mediawiki.org/wiki/Manual_talk:Text_table

Does old_flags contain gzip?

If I use special page, to export the content of my pages, I am always less than 10 MB.

I presume you're only exporting the current revision, not the whole history

I presume you're only exporting the current revision, not the whole history

Yes ! Just !

I use all the scripts allowing me to optimize the weight of the database.
Here :
https://wiki.visionduweb.fr/index.php?title=Scripts_de_maintenance_et_scripts_SQL_pour_Mediawiki

The volume of the _text table increases abnormally.

Despite everything, the volume of the _text table increases abnormally.

It would be absolutely necessary to propose a solution, to clean the table _text properly, and, to allow to keep only the last global version of the page.

If i look the maintenance script :

php maintenance/purgeOldText.php --purge
Searching for active text records via contents table...done.
done.
Searching for inactive text records...done.
0 inactive items found.

sudo nano maintenance/purgeOldText.php

<?php
require_once __DIR__ . '/Maintenance.php';

class PurgeOldText extends Maintenance {
        public function __construct() {
                parent::__construct();
                $this->addDescription( 'Purge old text records from the database' );
                $this->addOption( 'purge', 'Performs the deletion' );
        }

        public function execute() {
                $this->purgeRedundantText( $this->hasOption( 'purge' ) );
        }
}

$maintClass = PurgeOldText::class;
require_once RUN_MAINTENANCE_IF_MAIN;

Note : I don't use this ! It's only for find a issue.

If i read the Discussion : https://www.mediawiki.org/wiki/Manual_talk:PurgeOldText.php

->

Category:

Pages using deprecated source tags

I do not know if the scripts I use for maintenance saw the branch on which I sit, but, in my opinion, there is a real work to follow, to propose a script capable of cleaning properly _text

In my case, indeed, only the latest version interests me.
So I'm in a special case, since, a wiki must be used to keep the history.
Despite everything, this methodology is very heavy, for the database, and, the system resources.

In my case, I wish to be able to use the latest version, since it is a personal summary for the pages I write.
The versioning is practical, during the writing, but, I wish to be able to launch every day, a script which cleans the base, to only offer an optimized version, and, I then delete the revisions.

Currently, I do so:

maintenance/deleteOldRevisions.php --delete
maintenance/rebuildrecentchanges.php
maintenance/deleteArchivedRevisions.php --delete
maintenance/deleteAutoPatrolLogs.php
maintenance/rebuildall.php

Reedy renamed this task from Help for reducte the size from table _text to Help for reduce the size from table _text.Jun 11 2020, 5:44 PM

This has now also been posted at https://www.mediawiki.org/wiki/Topic:Vo3botxgubpfnea7 so discussion will be fragmented in several places.