
Remove rows about file metadata from text table
Open, Needs Triage, Public

Description

Since T275268: Address "image" table capacity problems by storing pdf/djvu text outside file metadata, part of the file metadata can be stored in the text table; in Wikimedia production, what is stored in the text table is just a pointer to external storage. After T362566: Stop growth of text table by storing ES addresses in content table, img_metadata can refer to external storage directly, so no new text table rows are added.

By querying img_metadata for DjVu files on Commons, we find that a number of rows still refer to the text table. They should refer to external storage directly; the text table rows can then be deleted.
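As a rough illustration of how such rows could be identified once img_metadata values have been fetched, the sketch below scans a metadata value for blob addresses that point at the text table. It assumes the metadata is a JSON envelope with a "blobs" map and that text table references use a "tt:<id>" address while direct external storage references use an "es:" address; the function name text_table_refs and the sample values are hypothetical, and this is not the migration script from the patch.

```python
import json


def text_table_refs(img_metadata):
    """Return text table row IDs referenced by one img_metadata value.

    Assumption: the value is a JSON object with a "blobs" map whose
    values are blob addresses, and "tt:<id>" denotes a text table row.
    Anything that does not match (legacy formats, NULL) yields [].
    """
    try:
        envelope = json.loads(img_metadata)
    except (TypeError, ValueError):
        # Not JSON (for example an older serialized format) or not a string.
        return []
    if not isinstance(envelope, dict):
        return []
    blobs = envelope.get("blobs")
    if not isinstance(blobs, dict):
        return []

    ids = []
    for address in blobs.values():
        # Keep only addresses of the assumed form "tt:<decimal id>".
        if isinstance(address, str) and address.startswith("tt:"):
            suffix = address[3:]
            if suffix.isdigit():
                ids.append(int(suffix))
    return ids


if __name__ == "__main__":
    # Hypothetical sample values, not real production rows.
    still_in_text_table = '{"data": {}, "blobs": {"text": "tt:12345"}}'
    already_direct = '{"data": {}, "blobs": {"text": "es:DB://cluster1/678"}}'
    print(text_table_refs(still_in_text_table))
    print(text_table_refs(already_direct))
```

Rows for which this returns a non-empty list are the ones that would need their references rewritten before the corresponding text table rows can be deleted.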

Note that after this there may still be rows for file metadata in the text table: when file metadata is refreshed (rare for existing files), new blobs are added but old ones are not deleted. Some rows in the text table may therefore be orphaned, but there is no obvious way to find them.

See also: T183490: MCR schema migration stage 4: Migrate External Store URLs (wmf production)

Event Timeline

Change #1120207 had a related patch set uploaded (by Zabe; author: Zabe):

[mediawiki/extensions/WikimediaMaintenance@master] Add migration script for migrating references of es to file revision table

https://gerrit.wikimedia.org/r/1120207

Zabe subscribed.

We can do this once we are reading from filerevision.

Note that on most wikis other than Commons there are few files (potentially none), so T381599: Migrate current references of text table rows from afl_var_dump would be more important there.

Note that the filearchive table also has a column, fa_metadata, which is comparable to img_metadata, oi_metadata and fr_metadata and thus may contain references to the text table. This table is not yet touched in T28741, and its fate may be decided in the future in T20493.