There will be three new tables:
- file
- filerevision
- deleted_files
- (more?)
The details of the schema need to be hashed out, added, and merged, preferably with a POC so you can try reading and writing locally and see what it looks like.
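As a starting point for such a POC, the sketch below uses Python's built-in `sqlite3` with an in-memory database to write two revisions of one file and read back the current one. It is not the proposed schema: every table column here (`file_name`, `file_latest`, `fr_file`, `fr_size`, `fr_sha1`, `fr_timestamp`) is a hypothetical placeholder, and the helpers `upload` and `latest_revision` are illustrative names only.

```python
import sqlite3

# Hypothetical POC schema: every column name below is an illustrative
# assumption, not the agreed design for the new tables.
SCHEMA = """
CREATE TABLE file (
    file_id INTEGER PRIMARY KEY,
    file_name TEXT NOT NULL UNIQUE,
    file_latest INTEGER  -- points at the current filerevision row
);
CREATE TABLE filerevision (
    fr_id INTEGER PRIMARY KEY,
    fr_file INTEGER NOT NULL REFERENCES file(file_id),
    fr_size INTEGER NOT NULL,
    fr_sha1 TEXT NOT NULL,
    fr_timestamp TEXT NOT NULL
);
"""


def upload(conn, name, size, sha1, timestamp):
    """Record a new revision of `name`, creating the file row if needed."""
    conn.execute("INSERT OR IGNORE INTO file (file_name) VALUES (?)", (name,))
    (file_id,) = conn.execute(
        "SELECT file_id FROM file WHERE file_name = ?", (name,)
    ).fetchone()
    cur = conn.execute(
        "INSERT INTO filerevision (fr_file, fr_size, fr_sha1, fr_timestamp) "
        "VALUES (?, ?, ?, ?)",
        (file_id, size, sha1, timestamp),
    )
    # Move the file's "current revision" pointer to the row just inserted.
    conn.execute(
        "UPDATE file SET file_latest = ? WHERE file_id = ?",
        (cur.lastrowid, file_id),
    )
    return file_id


def latest_revision(conn, name):
    """Return (size, sha1, timestamp) of the current revision of `name`."""
    return conn.execute(
        "SELECT fr_size, fr_sha1, fr_timestamp FROM file "
        "JOIN filerevision ON fr_id = file_latest WHERE file_name = ?",
        (name,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
upload(conn, "Example.jpg", 1000, "aaa", "20240101000000")
upload(conn, "Example.jpg", 2000, "bbb", "20240201000000")
print(latest_revision(conn, "Example.jpg"))  # (2000, 'bbb', '20240201000000')
```

Swapping the placeholder columns for the real ones once they are agreed makes it easy to eyeball how reads and writes look before anything is merged.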
Status | Assigned | Task
---|---|---
Open | Ladsgroup | T28741 Migrate file tables to a modern layout (image/oldimage; file/file_revision; add primary keys)
Open | Ladsgroup | T368113 Design and merge the new tables of file tables
deleted_files
Note: currently we do not use a table to store deleted pages. One of the solutions in T20493: RFC: Unify the various deletion systems represents deleted pages using a single bit field, so there is no need for a deleted-pages table (or an archive/deleted-revisions table). Similarly, we can use a bit field to indicate whether a file is deleted. This also has the benefit of keeping the (upcoming) file ID stable across deletion and undeletion.
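To illustrate the bit-field approach, here is a minimal `sqlite3` sketch, assuming a hypothetical `file_deleted` integer column and a hypothetical `DELETED_FILE` flag value (the real names and bit layout would come out of T20493 and the schema review). Deleting or undeleting only flips a bit, so the row and its `file_id` never change.

```python
import sqlite3

# Hypothetical flag value; real names/bits would be decided in review.
DELETED_FILE = 1  # bit 0 set means the file is deleted (hidden from readers)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file ("
    " file_id INTEGER PRIMARY KEY,"
    " file_name TEXT NOT NULL UNIQUE,"
    " file_deleted INTEGER NOT NULL DEFAULT 0)"  # bit field, 0 = visible
)


def set_deleted(conn, name, deleted):
    """Delete or undelete by setting/clearing a bit; the row stays in place."""
    if deleted:
        sql = "UPDATE file SET file_deleted = file_deleted | ? WHERE file_name = ?"
    else:
        sql = "UPDATE file SET file_deleted = file_deleted & ~? WHERE file_name = ?"
    conn.execute(sql, (DELETED_FILE, name))


def visible_file_id(conn, name):
    """Return file_id if the file exists and is not deleted, else None."""
    row = conn.execute(
        "SELECT file_id FROM file WHERE file_name = ? AND (file_deleted & ?) = 0",
        (name, DELETED_FILE),
    ).fetchone()
    return row[0] if row else None


original_id = conn.execute(
    "INSERT INTO file (file_name) VALUES ('Example.jpg')"
).lastrowid
set_deleted(conn, "Example.jpg", True)
print(visible_file_id(conn, "Example.jpg"))  # None
set_deleted(conn, "Example.jpg", False)
print(visible_file_id(conn, "Example.jpg") == original_id)  # True
```

Because deletion is a flag rather than a move to another table, every other row that references `file_id` stays valid through a delete/undelete cycle; the trade-off is that every reader query must remember to filter on the bit.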
Per T28741#9912401, we may want a new table to store the normalized img_media_type, img_major_mime and img_minor_mime values.
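A normalized lookup table would store each distinct (media type, major MIME, minor MIME) triple once and let revision rows reference it by ID. The sketch below shows a get-or-create pattern in `sqlite3`; the `filetypes` table, its `ft_*` columns, and the `get_or_create_type` helper are hypothetical names chosen for illustration, while the sample values (`BITMAP`/`image`/`jpeg`, `DRAWING`/`image`/`svg+xml`) are the kind currently stored in those three image columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical lookup table; the table and column names are assumptions.
conn.execute(
    "CREATE TABLE filetypes ("
    " ft_id INTEGER PRIMARY KEY,"
    " ft_media_type TEXT NOT NULL,"
    " ft_major_mime TEXT NOT NULL,"
    " ft_minor_mime TEXT NOT NULL,"
    " UNIQUE (ft_media_type, ft_major_mime, ft_minor_mime))"
)


def get_or_create_type(conn, media_type, major_mime, minor_mime):
    """Return the id for a (media type, major MIME, minor MIME) triple,
    inserting it on first use so each distinct triple is stored once."""
    key = (media_type, major_mime, minor_mime)
    conn.execute(
        "INSERT OR IGNORE INTO filetypes "
        "(ft_media_type, ft_major_mime, ft_minor_mime) VALUES (?, ?, ?)",
        key,
    )
    (type_id,) = conn.execute(
        "SELECT ft_id FROM filetypes WHERE ft_media_type = ? "
        "AND ft_major_mime = ? AND ft_minor_mime = ?",
        key,
    ).fetchone()
    return type_id


jpeg = get_or_create_type(conn, "BITMAP", "image", "jpeg")
svg = get_or_create_type(conn, "DRAWING", "image", "svg+xml")
print(jpeg == get_or_create_type(conn, "BITMAP", "image", "jpeg"))  # True
print(jpeg != svg)  # True
```

Under this design a revision row would carry a single small integer instead of three repeated strings, at the cost of one extra join when the MIME parts are needed.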
Per the parent task, the needed columns are:
file table:
filerevision:
This might affect some of the data we sqoop into HDFS and how we compute Commons impact metrics or similar future metrics. We will not know for sure until a schema change is proposed.