Reconsider global vs local implementation of the event DB
Closed, Resolved (Public)

Description

DECISION for v1:
  • Go ahead with option 1: leave it as it is (meaning we use a global DB), which means we will need to:
    • Always redirect the users to the wiki where the event registration was created, for any action the organizer can perform:
      • Special:EventDetails
        • Send message to participants
        • Remove participants
        • Search participants
      • Special:EditEventRegistration
    • Since Special:MyEvents needs to list all of a user's events, we still need to store the global user ID
  • In parallel, we will get more information from the DBAs to confirm that our decision and the logic behind it are sound
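
As a rough illustration of the redirect rule in the decision above, here is a minimal Python sketch (the extension itself is not written in Python). Everything in it is an assumption made for illustration: the `WIKI_BASE_URLS` mapping, the placeholder `example.org` hosts, the `EventRegistration` type, and the `Special:<Page>/<event ID>` URL shape.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

# Hypothetical mapping from wiki ID to article base URL (placeholder hosts).
# A real deployment would read this from site configuration.
WIKI_BASE_URLS = {
    "metawiki": "https://meta.example.org/wiki/",
    "test2wiki": "https://test2.example.org/wiki/",
}


@dataclass(frozen=True)
class EventRegistration:
    event_id: int
    home_wiki: str  # ID of the wiki where the registration was created


def organizer_redirect(
    event: EventRegistration, current_wiki: str, special_page: str
) -> Optional[str]:
    """Return the URL to redirect to, or None when already on the home wiki."""
    if current_wiki == event.home_wiki:
        return None
    # Assumed URL shape: Special:<Page>/<event ID> on the event's home wiki.
    title = f"Special:{special_page}/{event.event_id}"
    return WIKI_BASE_URLS[event.home_wiki] + quote(title, safe="/:")
```

The same check would apply to every organizer action listed above, which is why it is sketched as one function rather than per-page logic.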

The database of events can be configured to be central (i.e., shared by multiple wikis), and we implemented it this way because we thought it would bring some advantages to users (like a single event calendar). Another reason was (I believe) the fact that we were also considering implementing it outside MediaWiki.

As I already explained to @ifried, I'd like to reconsider this decision. While it's definitely true that a central DB has some advantages, it also means that we're often hit by the many limitations stemming from a lack of proper support for "global user accounts". Here are some examples of things that would be much easier to do with a local database, and that caused us problems in the past and in the present:

  • More complicated schema: we need to store the wiki ID of event pages, as well as the full title of the event page (including the formatted namespace) - T307358 and all the related tasks like T311582 and T307108, as well as the related core changes that you can find in the "mentions" section of T307358
  • Cannot easily implement talk page and email messaging features - T318378
  • Filtering participants by username is impossible to do in pure SQL - no task for it
  • Probably more that I can't think of right now...
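
To make the username-filtering point concrete, here is a small Python/sqlite3 sketch contrasting the two setups. It is only a sketch under assumptions: the table and column names (`participants`, `local_user`, `global_user`) are hypothetical, and the central case assumes that the participant rows and the global account names live in databases that cannot be joined with each other.

```python
import sqlite3
from typing import List


def search_local(db: sqlite3.Connection, event_id: int, prefix: str) -> List[str]:
    """Local DB: participants and user names are in one database, so one JOIN is enough."""
    rows = db.execute(
        "SELECT u.user_name FROM participants p "
        "JOIN local_user u ON u.user_id = p.user_id "
        "WHERE p.event_id = ? AND u.user_name LIKE ? "
        "ORDER BY u.user_name",
        (event_id, prefix + "%"),
    )
    return [name for (name,) in rows]


def search_central(
    events_db: sqlite3.Connection,
    accounts_db: sqlite3.Connection,
    event_id: int,
    prefix: str,
) -> List[str]:
    """Central DB: only global user IDs are stored, so names need a second lookup."""
    # Step 1: fetch every participant ID for the event from the events database.
    ids = [
        uid
        for (uid,) in events_db.execute(
            "SELECT global_user_id FROM participants WHERE event_id = ?",
            (event_id,),
        )
    ]
    if not ids:
        return []
    # Step 2: resolve and filter names in the separate accounts database.
    placeholders = ",".join("?" * len(ids))
    rows = accounts_db.execute(
        f"SELECT name FROM global_user WHERE id IN ({placeholders}) "
        "AND name LIKE ? ORDER BY name",
        (*ids, prefix + "%"),
    )
    return [name for (name,) in rows]
```

In the central variant, the full participant ID list has to be pulled into application code before any name filter can be applied, which is the extra complexity the bullet describes.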

Even in contexts where having a central DB would be useful, like the event calendar, there would be some things to figure out; for instance, you'd need a way to filter events that are relevant to you.

All in all, at this point I'm no longer convinced that a central DB is a good idea, and would like to discuss/brainstorm this with the team. I've been thinking about this for a while and firmly believe that we should talk about this before it's too late. Speaking for myself, I have the feeling that we've already entered the territory of sunk cost fallacy, and want to GTFO of it ASAP.

Just to be clear, I'm not saying we shouldn't leave it central. Again, I think it has its pros. But I do want to be convinced that it's still a good idea. I want to be convinced that we can make it work, and that things will be better in the future. I want to be convinced that we're not just leaving it central because of the sunk cost fallacy.


Brainstorming on meta: https://meta.wikimedia.org/wiki/Campaigns/Foundation_Product_Team/Central_vs_local_database

Event Timeline

Thanks for bringing this up, @Daimona. I was also thinking about it after our last engineering meeting, when I asked what the behavior would be if an organizer creates an event on, let's say, "test2wiki", then goes to the event details page on "officewiki" and tries to send a message to participants. (I know that "officewiki" and "test2wiki" will use the local DB; I'm using them only to explain what I mean.) The participants may exist on "test2wiki" but not on "officewiki", or may have different settings there, and so on. So I also have the feeling that we need to talk about this ASAP. We already have restrictions like only allowing organizers to edit an event registration when they are on the wiki where they created it, and I can see this becoming common behavior for other features as well, i.e., blocking users from doing something unless they are on the wiki where the registration was created. Sending messages to participants may be another example.

If I recall correctly, we want the DB to be global so that we can have the calendar and display the list of all events on any wiki. So I was wondering if we could do this without a global DB: for example, we could list the wikis where our extension is deployed in a configuration file and, when listing the events (or, in the future, rendering the calendar), call the API endpoint that returns the list of events on each wiki to get the list of all events. I am not sure if this is possible or even a good idea, but we can discuss it later, once the meeting is scheduled. Thank you!

@ldelench_wmf could you please schedule this meeting for us?
cc: @ifried, @vyuen

If I recall correctly, we want the DB to be global so that we can have the calendar and display the list of all events on any wiki.

Correct, but as I noted in the task description, I am not 100% convinced that it would be a good idea to include results from all wikis. It probably is, as long as users can select the wikis to show events from, but still, I'd like to talk about it.

So I was wondering if we could do this without a global DB: for example, we could list the wikis where our extension is deployed in a configuration file and, when listing the events (or, in the future, rendering the calendar), call the API endpoint that returns the list of events on each wiki to get the list of all events. I am not sure if this is possible or even a good idea, but we can discuss it later, once the meeting is scheduled. Thank you!

I believe it would be super inefficient without a global DB, since we'd have 900 wikis to pull data from. Maybe it could be a combination of local + global, but yeah, definitely something worth discussing...
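
For reference, the per-wiki fan-out being discussed could look roughly like the Python sketch below. It is illustrative only: `fetch_events` is a hypothetical callable standing in for an HTTP request to each wiki's event-list endpoint, and the `start` and `wiki` keys are assumed field names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Event = Dict[str, str]
# Hypothetical: takes a wiki ID and returns that wiki's events.
Fetcher = Callable[[str], List[Event]]


def list_all_events(
    wikis: List[str], fetch_events: Fetcher, max_workers: int = 8
) -> List[Event]:
    """Query every configured wiki and merge the results, sorted by start time."""

    def fetch_one(wiki: str) -> List[Event]:
        try:
            # Tag each event with its source wiki so the caller can filter later.
            return [dict(event, wiki=wiki) for event in fetch_events(wiki)]
        except Exception:
            # One unreachable wiki should not break the whole listing.
            return []

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_wiki = list(pool.map(fetch_one, wikis))

    merged = [event for events in per_wiki for event in events]
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    merged.sort(key=lambda event: event["start"])
    return merged
```

Even with parallel requests, every listing costs one call per configured wiki, so with 900 wikis the cost grows with the number of wikis rather than with the number of events.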

@ldelench_wmf could you please schedule this meeting for us?
cc: @ifried, @vyuen

(Just noting that Lauren is already aware and should schedule something in the upcoming days)

Noting that whether it goes in a central or a per-wiki database, it should still go to x1. This cluster has one database per wiki (so 900 DBs) and one database called wikishared for central tables. You can also have tables in both. Moving these tables to the core DBs won't give you much benefit (except the ability to join with core tables like revision), but it would put pressure on core DBs that are already under quite a lot of stress.

Moving these tables to the core DBs won't give you much benefit (except the ability to join with core tables like revision), but it would put pressure on core DBs that are already under quite a lot of stress.

Joining with other core tables (actor/user in particular) is actually one of the things we wanted to do. OTOH, if the extension tables are per-wiki, we can alternatively store the usernames in our table, and handle the user rename hooks as appropriate to update them. This is much easier with local users than it is for global accounts.
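
A minimal Python/sqlite3 sketch of the "store usernames and update them on rename" idea follows. The `participants` table, its `user_name` column, and the `on_user_renamed` function are hypothetical; the actual MediaWiki rename hook that would call such a handler is not shown.

```python
import sqlite3


def on_user_renamed(db: sqlite3.Connection, old_name: str, new_name: str) -> int:
    """Update stored usernames after a local rename; return the number of rows changed."""
    # "with db" wraps the update in a transaction and commits it on success.
    with db:
        cursor = db.execute(
            "UPDATE participants SET user_name = ? WHERE user_name = ?",
            (new_name, old_name),
        )
    return cursor.rowcount
```

With per-wiki tables this is a single update against one database; with global accounts, the same rename would have to be reflected for every wiki that shares the central tables.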

Having the schema in per-wiki databases on x1 also seems fine for now, although I would like to know what others (@cmelo, @MHorsey-WMF, @vyuen) think.

I absolutely agree: a per-wiki DB simplifies a whole lot of things, and for the rare cases where we need a global DB, I think there are alternatives.

DECISION for v1:

  • Go ahead with option 1: leave it as it is (meaning we use a global DB), which means we will need to:
    • Always redirect the users to the wiki where the event registration was created, for any action the organizer can perform:
      • Special:EventDetails
        • Send message to participants
        • Remove participants
        • Search participants
      • Special:EditEventRegistration
    • Since Special:MyEvents needs to list all of a user's events, we still need to store the global user ID
  • In parallel, we will get more information from the DBAs to confirm that our decision and the logic behind it are sound

Pushing to product sign-off; will create followup task re: Hash out details for v1 release (e.g., decide on what database we will use for v1)

The decision has been reached by the engineers, and I have no major concerns with it. I'm marking this work as Done.