Page MenuHomePhabricator

refactor imagehash.phash and imagehash.dhash
Closed, ResolvedPublic

Description

s55462__imagehash tables phash and and dhash have approx 20M rows each and it takes 4GB+4GB on disk. Target is to refactor it so that tables would be smaller

Changes

  • url field is dropped.
  • DROPPED FROM DB: source is source_id and source is another table
  • DROPPED FROM DB: type is type_id and type is another table
  • thumb_width = int
  • width = orig_width()
  • height = orig_height()

Current db

MariaDB [s55462__imagehash]> describe dhash;
+---------+---------------------+------+-----+---------------------+----------------+
| Field   | Type                | Null | Key | Default             | Extra          |
+---------+---------------------+------+-----+---------------------+----------------+
| id      | int(11)             | NO   | PRI | NULL                | auto_increment |
| commons | int(11)             | YES  | MUL | NULL                |                |
| hash    | bigint(20) unsigned | YES  | MUL | NULL                |                |
| type    | varchar(32)         | YES  |     | NULL                |                |
| source  | varchar(32)         | YES  |     | NULL                |                |
| width   | int(11)             | YES  |     | NULL                |                |
| height  | int(11)             | YES  |     | NULL                |                |
| created | timestamp           | NO   |     | current_timestamp() |                |
| url     | varchar(1024)       | YES  |     | NULL                |                |
+---------+---------------------+------+-----+---------------------+----------------+

Current size

MariaDB [s55462__imagehash]> SELECT (data_length+index_length)/power(1024,3) tablesize FROM information_schema.tables WHERE table_schema='s55462__imagehash' and table_name='phash';
+-------------------+
| tablesize         |
+-------------------+
| 4.044891357421875 |
+-------------------+
1 row in set (0.001 sec)

MariaDB [s55462__imagehash]> SELECT (data_length+index_length)/power(1024,3) tablesize FROM information_schema.tables WHERE table_schema='s55462__imagehash' and table_name='dhash';
+-------------------+
| tablesize         |
+-------------------+
| 4.066375732421875 |
+-------------------+
1 row in set (0.001 sec)

New db table

MariaDB [s55462__imagehash]> describe phash;
+-------------+--------------------------------------------------------------+------+-----+---------------------+----------------+
| Field       | Type                                                         | Null | Key | Default             | Extra          |
+-------------+--------------------------------------------------------------+------+-----+---------------------+----------------+
| id          | int(11)                                                      | NO   | PRI | NULL                | auto_increment |
| commons     | int(11)                                                      | YES  | MUL | NULL                |                |
| hash        | bigint(20) unsigned                                          | YES  | MUL | NULL                |                |
| width       | int(11)                                                      | YES  |     | NULL                |                |
| height      | int(11)                                                      | YES  |     | NULL                |                |
| created     | timestamp                                                    | NO   |     | current_timestamp() |                |
| thumb_width | int(11)                                                      | YES  |     | NULL                |                |
| thumb_type  | enum('default','lossy-page1','lossy-page2','lossless-page1') | YES  |     | NULL                |                |
+-------------+--------------------------------------------------------------+------+-----+---------------------+----------------+

Related Objects

StatusSubtypeAssignedTask
OpenZache
ResolvedZache

Event Timeline

Zache changed the task status from Open to In Progress.Mon, Apr 22, 5:48 AM
Zache moved this task from Blocked to Doing on the Fiwiki-Wikidata-Commons board.
Zache updated the task description. (Show Details)

partially done. String fields source and type are dropped from db, url is still populated but it can be dropped when frontend code has been changed.

Zache updated the task description. (Show Details)

url string field for thumbnails is now replaced with thumb_width (int) and thumb_type (enum) fields and url is dropped from db.