Page MenuHomePhabricator

Create new DB table for storing wikis of event
Closed, ResolvedPublic5 Estimated Story Points

Description

Acceptance Criteria:

  • Define the updated database schema for storing wikis of event
  • Implement a schema change for it
  • Existing events must be treated as if "no wikis" was chosen
  • Create the schema in production and beta

Checklist

  • Design the schema
  • Make schema change patch for it
  • Get initial reviews (+1s)
  • Tag this task with DBA
  • Request DBA signoff
  • Get approval
Deployment
  • Merge the schema patch
    • This will automatically create the new table in beta
  • Wait until the patch reaches production
  • Schedule a dedicated deployment window, note the time here
  • Create the table everywhere:
    • mwscript sql.php --wiki=testwiki --cluster=extension1 /srv/mediawiki/php-1.44.0-wmf.1/extensions/CampaignEvents/db_patches/mysql/patch-add-ce_event_wikis.sql
    • mwscript sql.php --wiki=test2wiki --cluster=extension1 /srv/mediawiki/php-1.44.0-wmf.11/extensions/CampaignEvents/db_patches/mysql/patch-add-ce_event_wikis.sql
    • mwscript sql.php --wiki=officewiki --cluster=extension1 /srv/mediawiki/php-1.44.0-wmf.11/extensions/CampaignEvents/db_patches/mysql/patch-add-ce_event_wikis.sql
    • mwscript sql.php --wiki=metawiki --cluster=extension1 --wikidb wikishared /srv/mediawiki/php-1.44.0-wmf.11/extensions/CampaignEvents/db_patches/mysql/patch-add-ce_event_wikis.sql

Event Timeline

cmelo changed the task status from Open to In Progress.Nov 8 2024, 12:36 AM
cmelo claimed this task.
cmelo set the point value for this task to 2.

Change #1088392 had a related patch set uploaded (by Cmelo; author: Cmelo):

[mediawiki/extensions/CampaignEvents@master] Create new DB table for storing wikis of event

https://gerrit.wikimedia.org/r/1088392

cmelo changed the point value for this task from 2 to 5.Nov 8 2024, 2:10 PM

Hello there! We would like to request a DBA review for a new table.

Context

This functionality will allow event organizers to specify whether an event is targeting specific wikis (and what these would be). We're currently considering a limit of 100 wikis per event, although in practice I believe the numbers would be much slower. We're also letting organizers pick "All wikis" instead of selecting specific ones, for global campaigns and the like.

Schema

See r1088392:

CREATE TABLE /*_*/ce_event_wikis (
  ceew_id BIGINT UNSIGNED AUTO_INCREMENT NOT NULL,
  ceew_event_id BIGINT UNSIGNED NOT NULL,
  ceew_wiki VARBINARY(64) NOT NULL,
  UNIQUE INDEX ce_event_wikis_event_id_wiki (ceew_event_id, ceew_wiki),
  PRIMARY KEY(ceew_id)
) /*$wgDBTableOptions*/;

The structure is straightforward: we have a surrogate primary key (ceew_id), a foreign key to campaign_events.event_id (ceew_event_id), and a column (ceew_wiki) to store a wiki ID, or a special value that identifies the "all wikis" selection.

In particular, note that this is an ancillary table, meant to be used for joins while querying the main campaign_events table. More on this below.

Information requested

Should this table be replicated to wiki replicas (does it not contain private data)?

The table will be in x1, so it won't get replicated anyway. Nonetheless: the data itself can be considered public, except maybe for deleted events; but since we have no views for the CampaignEvents tables in x1, that's not doable for the time being.

Will you be doing cross-joins with the wiki metadata?

Only joins with other CampaignEvents table, particularly campaign_events.

Size of the table (number of rows expected).
Expected growth per year (number of rows).

I'm not sure if we have estimates for the average number of wikis that each event will be associated with. I think 10 would be quite generous. In any case, we would expect that many rows for each row in campaign_events. There have been ~800 new events created in the last year in wikishared, so an average of 10 wikis per event would mean ~8000 rows/year.

Expected amount of queries, both writes and reads (per minute, per hour...per day, any of those are ok).

Should be roughly the same as the campaign_events table, and you probably know better than me where to find statistics on that ;)

Examples of queries that will be using the table.

The table will mostly be used in conjunction with the campaign_events table. In "controlled" queries (like loading a specific event when on the event page), we've decomposed the joins and therefore we just have a query on the main campaign_events table (usually, but not necessarily, by primary key), and then separate queries on the ancillary tables; in this case, that could be:

SELECT * FROM `ce_event_wikis` WHERE ce_event_wikis = 123;

However, there are also places (mostly Pager classes) where we're going to use a single query with multiple joins. For example, here's what the current query for Special:AllEvents looks like (Special:MyEvents is very similar):

SELECT
  event_id, event_name, event_page_namespace, event_page_title, event_page_prefixedtext, event_page_wiki, event_status, event_start_utc, event_end_utc, event_meeting_type, num_participants
FROM
  (
    SELECT
      event_id, event_name, event_page_namespace, event_page_title, event_page_prefixedtext, event_page_wiki, event_status, event_start_utc, event_end_utc, event_meeting_type, COUNT(cep_id) AS `'num_participants`
    FROM
      `campaign_events`
      LEFT JOIN `ce_participants` ON (
        (event_id = cep_event_id)
        AND cep_unregistered_at IS NULL
      )
      JOIN `ce_organizers` ON (
        (event_id = ceo_event_id)
        AND ceo_deleted_at IS NULL
      )
    WHERE
      event_deleted_at IS NULL
    GROUP BY
      cep_event_id, event_id, event_name, event_page_namespace, event_page_title, event_page_prefixedtext, event_page_wiki, event_status, event_start_utc, event_end_utc, event_meeting_type
  ) `tmp`
WHERE
  (event_end_utc >= '20241111000000')
ORDER BY
  event_start_utc,
  event_id
LIMIT
  51

We are gonna let users filter events by wiki, so we would likely add a join like:

JOIN `ce_event_wikis` ON (
    event_id = ceew_event_id
    AND ceew_wiki IN ( 'awiki', 'bwiki', ... )
)

The release plan for the feature (are there specific wikis you'd like to test first etc).

Our usual plan is to go beta first, then testwiki + test2wiki + officewiki, then all production wikis (which use the same table in wikishared).

Generally speaking it looks good. I have a couple of comments:

Otherwise it seems straightforward enough.

Generally speaking it looks good.

Thank you!

  • I considered suggesting normalizing the wiki id but the given the size of wiki ids (fives bytes usually), the gain will be minimal and the extra overhead is not worth it.

I had the same thought for a brief moment but immediately discarded it for that exact reason. Happy to see that I was right in doing so :D

I'll look into this 👀👀👀

Change #1090851 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[operations/puppet@production] tables-catalog: Add CampaignEvents and WikimediaCampaignEvents

https://gerrit.wikimedia.org/r/1090851

Change #1088392 merged by jenkins-bot:

[mediawiki/extensions/CampaignEvents@master] Create new DB table for storing wikis of event

https://gerrit.wikimedia.org/r/1088392

Change #1090863 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/extensions/CampaignEvents@master] Wire up ce_event_wikis table creation to the updater

https://gerrit.wikimedia.org/r/1090863

Change #1090851 merged by Ladsgroup:

[operations/puppet@production] tables-catalog: Add CampaignEvents and WikimediaCampaignEvents

https://gerrit.wikimedia.org/r/1090851

Change #1090863 merged by jenkins-bot:

[mediawiki/extensions/CampaignEvents@master] Wire up ce_event_wikis table creation to the updater

https://gerrit.wikimedia.org/r/1090863

After deploying 1.44.0-wmf.4 to group0 in T375663, Logstash lists

Error 1146: Table 'testwiki.ce_event_wikis' doesn't exist
Function: MediaWiki\Extension\CampaignEvents\Event\Store\EventWikisStore::getEventWikisMulti
Query: SELECT  ceew_event_id,ceew_wiki  FROM `ce_event_wikis`    WHERE ceew_event_id = 163

for the URI https://test.wikipedia.org/wiki/Event:T347608.

Is that an intentional and ignorable side effect as we proceed with the train (as the task description has "Create the table everywhere" not yet checked)?

Edit: Sorry, I just realized this output also happened already with previous 1.44.0-wmf.3 on testwiki.

Is that an intentional and ignorable side effect as we proceed with the train (as the task description has "Create the table everywhere" not yet checked)?

Oh my. No it isn't intentional, it seems that some of the queries are not behind the feature flag. Creating a separate task now.

Mentioned in SAL (#wikimedia-operations) [2025-01-13T13:10:33Z] <Daimona> Creating new DB tables for the CampaignEvents extension in x1.testwiki, x1.test2wiki, x1.officewiki, and x1.wikishared # T379294 T381424