Description
The implementation should be done using the same steps as for the wb_terms table; see T114903.
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Addshore | T114902 Remove numeric entity IDs from database schema
Declined | | None | T114904 Migrate wb_items_per_site to using prefixed entity IDs instead of numeric IDs
Event Timeline
So wb_entity_per_page was killed:
MariaDB [wikidatawiki_p]> SELECT * FROM wb_entity_per_page LIMIT 1;
ERROR 1356 (HY000): View 'wikidatawiki_p.wb_entity_per_page' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them
The wb_terms table offers a good alternative to change queries to:
MariaDB [wikidatawiki_p]> SELECT * FROM wb_terms LIMIT 1;
+-------------+----------------+---------------------+------------------+---------------+-----------+------------+-----------------+-------------+
| term_row_id | term_entity_id | term_full_entity_id | term_entity_type | term_language | term_type | term_text  | term_search_key | term_weight |
+-------------+----------------+---------------------+------------------+---------------+-----------+------------+-----------------+-------------+
|      363474 |          50445 | Q50445              | item             | en-gb         | label     | Jungwon-gu | jungwon-gu      | 0           |
+-------------+----------------+---------------------+------------------+---------------+-----------+------------+-----------------+-------------+
1 row in set (0.00 sec)
The wb_items_per_site table offers no such option:
MariaDB [wikidatawiki_p]> SELECT * FROM wb_items_per_site LIMIT 1;
+------------+-------------+-------------+------------------+
| ips_row_id | ips_item_id | ips_site_id | ips_site_page    |
+------------+-------------+-------------+------------------+
|         55 |     3596065 | abwiki      | Џьгьарда         |
+------------+-------------+-------------+------------------+
1 row in set (0.01 sec)
The killing of wb_entity_per_page broke all sorts of tools with no alternative available.
@Multichill wb_items_per_site is *always* items. So to get the full item ID, just use concat('Q', ips_item_id).
That's not very nice, but viable I think. We could even make a virtual column ips_full_entity_id on the labs view...
I'm dropping the prio back to normal, since the issue seems easy enough to resolve. Please let me know if there are big performance issues with using concat('Q', ips_item_id) in joins, though.
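For illustration, a join using that workaround might look like the following. This is only a sketch: the join against wb_terms and the label/language filters are an invented example, not a query from this task.

```sql
-- Hypothetical example: resolve full item IDs on the fly via CONCAT.
-- ips_item_id is numeric, so the 'Q' prefix has to be added in the query.
SELECT CONCAT('Q', ips.ips_item_id) AS full_entity_id,
       ips.ips_site_id,
       ips.ips_site_page,
       t.term_text
FROM wb_items_per_site AS ips
JOIN wb_terms AS t
  ON t.term_full_entity_id = CONCAT('Q', ips.ips_item_id)
WHERE t.term_type = 'label'
  AND t.term_language = 'en'
LIMIT 10;
```

Computing CONCAT per wb_items_per_site row can still probe an index on term_full_entity_id, but the reverse direction (deriving ips_item_id from a prefixed ID) cannot use the index on ips_item_id without stripping the prefix first.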
I guess you missed T178459? It's easy to read over. So yes, we are currently experiencing performance problems: queries that use the concat hack don't complete in time, or at all.
Given the size of the table, changing this shouldn't be overly horrible. It's a fair bit of migration work… but I assume doing this for maintenance queries and consistency is worth it.
Note: I also just found T179793: Consider dropping the "wb_items_per_site.wb_ips_site_page" index while looking at this… maybe that can be done at the same time?
We should also consider a solution that only puts the full IDs on labs; we don't actually need them in production. It could be a separate table, kept up to date by a trigger.
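A minimal sketch of what such a labs-only mapping table and its triggers could look like. All names here (wb_ips_full_ids and the trigger names) are made up for illustration, and this assumes triggers can actually be attached on the labs side:

```sql
-- Hypothetical labs-only table mapping row IDs to prefixed entity IDs.
CREATE TABLE wb_ips_full_ids (
  ips_row_id BIGINT UNSIGNED NOT NULL PRIMARY KEY,
  ips_full_entity_id VARBINARY(32) NOT NULL,
  KEY ips_full_entity_id (ips_full_entity_id)
);

-- Keep the mapping in sync with wb_items_per_site.
CREATE TRIGGER wb_ips_full_ids_insert
AFTER INSERT ON wb_items_per_site
FOR EACH ROW
  INSERT INTO wb_ips_full_ids (ips_row_id, ips_full_entity_id)
  VALUES (NEW.ips_row_id, CONCAT('Q', NEW.ips_item_id));

CREATE TRIGGER wb_ips_full_ids_delete
AFTER DELETE ON wb_items_per_site
FOR EACH ROW
  DELETE FROM wb_ips_full_ids
  WHERE ips_row_id = OLD.ips_row_id;
```

Tools could then join on wb_ips_full_ids.ips_full_entity_id with full index support, without any schema change in production.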
The query @Multichill is running is complex on its own, and even if we resolve this task it will still be a very slow query. What I would recommend is to make a temporary table on labs for exactly this tool (which happens all the time; Magnus does it too, for example) and query directly from there.
Look at the query that is used to get missing articles for "List of articles every Wikipedia should have": https://quarry.wmflabs.org/query/26700 There are two joins:
- For the iwlinks/wb_items_per_site join I have to do either CONCAT('Q', ips_item_id) or TRIM('Q' FROM iwl_title)
- For the wb_items_per_site/page join I have to do REPLACE(ips_site_page, ' ', '_') (or vice versa) and potentially take care of non-default namespaces
Isn't it ironic that wikidatawiki_p.wb_items_per_site has two columns that are potentially joinable with Wikipedia tables, and both of them require a format transformation?
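The two transformations above can be sketched together in one query. This is a hypothetical, heavily simplified reconstruction of the pattern, not the actual query behind https://quarry.wmflabs.org/query/26700:

```sql
-- Hypothetical example of the two problematic joins described above.
SELECT ips.ips_item_id, p.page_title
FROM iwlinks AS iwl
-- Join 1: iwl_title holds a prefixed ID ('Q…'), ips_item_id is numeric,
-- so one side must be transformed.
JOIN wb_items_per_site AS ips
  ON CONCAT('Q', ips.ips_item_id) = iwl.iwl_title
-- Join 2: ips_site_page uses spaces, page.page_title uses underscores.
JOIN page AS p
  ON p.page_title = REPLACE(ips.ips_site_page, ' ', '_')
 AND p.page_namespace = 0;
```

Either transformation on the joined column defeats the index on that side, which is exactly the performance problem reported in T178459.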