The badges of sitelinks should be stored in another table to make them queryable. This table should contain a badge id and the sitelinks site and wiki. Alternativly, we can add the badges to the wb_items_per_site table but this might become tricky because there is a primary key on the siteid/page columns so we cannot create a new row for each badge but have to put them all into one value which then makes querying them harder and not really efficient.
|Resolved||Amire80||T70071 [Compact links] Prioritise interwikis to featured pages|
|Declined||None||T70067 Tell in sitelink whether target is the "preferred" language for the topic/place of the article|
|Resolved||• Addshore||T42810 Wikibase badges (tracking)|
|Resolved||Lydia_Pintscher||T110838 [Story] As a user I want to see all pages on a wiki having a specific badge assigned|
|Open||None||T96040 Wikibase special pages (tracking)|
|Resolved||Bene||T72209 [Task] Create a special page to query for badges|
|Resolved||Bene||T72229 [Task] Store the badges of sitelinks in a queryable format|
I think this should get higher priority because it blocks removal of the badge templates from some Wikipedias. I want some discussion in the development team first how the database table should look like before we proceed with implementing this part of the badges.
I'm not sure I agree that we need such a table (but I can be convinced).
In any case, the basic structure should be:
( badge, siteId, page )
with the primary key covering all three fields, in this order. This allows us to efficiently find all pages that have a given badge, on all sites or a given site.
If we also want to be able to find all the badges associated with a given page (on a given site), we'd need another index covering ( siteId, page ). This would be useful if we wanted to show badges in listings of pages (e.g. add icons for special pages in the Special:Allpages listing.
For convenience, we may want to have a column for the associated item id. This would remove the need to join against wb_item_per_site in cases where we need to know the item ID. However, we generally don't care about the associated item in this context, do we?
I agree that badges should be stored in another table. By the way, I was told that the current way of accessing the badges is through JSON blob but I am not sure how to do that. It would be a good idea to make these items live on their own, considering the fact that tool developers are mostly using direct SQL queries to obtain what they need.
I don't think there is good solution to this other than have the badges in a database table.
We have two options:
- schema change on the wb_items_per_site table to add a badge column (null by default).
- add a new table that can be joined to the ips_row_id column of wb_items_per_site, containing ips_row_id + badge id
with option 1, in most cases, the value would be null and maybe we want to go with option 2 to keep wb_items_per_site smaller, or maybe the impact isn't too excessive and option 1 would be ok.
Option 1 will be a bit tricky because there may be several badges for one sitelink and thus one column could contain several badges (?) This obviously cannot work so perhaps we need option two.
I'm not sure if it is better to include ips_row_id or siteId, page pairs. I rather tend to Daniel's suggestion because we will query for badges on a specific site which requires a join otherwise. Including the item id may also be helpful at some points.
The wording in this statement is not correct. Of course option 1 can work but we will face some anomalies (I only found the German Wikipedia article on that topic) because we'd have to extend the unique key of site_id and page to site_id, page and badge_id. This has to be handled carefully for insert and delete operations. However, this solution is way more efficient than option 2 because it avoids joins between the two tables. Furthermore, as we only have a very small percentage of good or featured articles the redundancy caused by this solution is negligible.
To sum up, both options are arguable. Option 1 is more efficient but sensitive in terms of insert/delete operations. Option 2 requires joins but is free of anomalies.
I'll try to summarize:
- adding a badge_id column to wb_items_per_site won't work because there may be more than one badge per page.
- referencing ips_row_id from another table would work, but would be brittle, since row ids are not stable, and we have no triggers or foreign key constraints to keep this in sync.
- an extra table with ( badge, siteId, page ), and a unique key covering these three fields, as well as an index on ( siteId, page ) for update, seems the best option.
The SiteLinkTable class should be extended to maintain the new table. Care should be taken to avoid orphan entries in the badge table, which do not have a corresponding entry in the pagelinks (items_per_site) table.
Thinking about it, we could use page_props on the client wikis to store badges (one key per badge, the value would be irrelevant). That would not need any new table, or any schema change. But it would not allow cross-wiki queries. Looking at the proposed implementation though, efficient queries across clients would not be supported anyway.
Is it possible to change the page props when the site doesn't get updated? How would you synchronize the page props with the actual badges on Wikidata. Also, is it possible to store multiple values in the page props? There can be pages with several badges.
We sync the page_props with the badges on wikidata the same way we sync sitelinks: we update the parser cache on change notifications. The relevant code is already in SidebarHookHandlers. It even puts badges into page_props, but currently all badges into one property, instead of using separate property keys for the different badge ids.