
Investigation: How can we store worklists that will be shared by organizers who use Event Invitations?
Closed, ResolvedPublic

Description

As an organizer, I want the worklist associated with my Invitation List saved, so that I can easily refer to it later and understand why certain names came up in my Invitation List.

As a product manager, I want the worklist associated with the Invitation Lists to be saved, so we can later potentially use the basis of worklist storage for a worklist and/or event page creation project.

Background: We will be asking organizers to share worklists of Wikipedia articles with us as part of Event Invitations. These worklists can be useful to save for Event Invitations, if organizers want to refer to old Invitation Lists in the future and remember why names came up in the list. However, it is especially useful for other projects we may do in the future to make it easier for organizers of events or WikiProjects to create, manage, and curate worklists.

Relevant links:

Common tools to build worklists:

Acceptance Criteria:

  • Investigate how we can store worklists that are shared by organizers who use Event Invitations so that they are always available (i.e., not deleted)
    • Note that we will start the MVP for Event Invitations by only allowing it as a feature for those with the Event Organizer right on Wikipedia wikis that have the CampaignEvents extension enabled
    • Note that an organizer can use Event Invitations without using Event Registration
  • Provide recommendation on how this data can be stored
  • Provide recommendations on what limitations or questions we can consider
    • For example: We probably want a limit on the number of articles in the worklist.
  • Flag any concerns or dependencies we should consider

Event Timeline

ifried added a subscriber: Sadads.

@Sadads Do you have a recommendation on what could be a good maximum number of articles to be allowed in a worklist per event that uses Event Invitations? For example, 300 articles? 500 articles?

ifried updated the task description. (Show Details)
cmelo changed the task status from Open to In Progress.Apr 25 2024, 12:18 PM
cmelo claimed this task.

Recommendation on how this data can be stored:

We could have 3 tables:

  1. One for the worklists.
  2. Another for the articles of the worklist.
  3. Another one for the users per article returned when the event invitation is run.

The Tables:

  • ce_worklists:
    • cew_id: ID auto increment
    • cew_name: varchar(100) worklist name
    • cew_event_id: ID (nullable), if null: it means the organizer created it without an event registration on our tool.
    • cew_created_at: timestamp
    • cew_edited_at: timestamp
    • cew_deleted_at: timestamp
  • ce_worklists_articles:
    • cewa_id: ID auto increment
    • cewa_article_url: varchar(100) worklist name
    • cew_id: ID that links the article with the worklist
    • cewa_created_at: timestamp
    • cewa_edited_at: timestamp
    • cewa_deleted_at: timestamp
  • ce_worklists_user_by_article (don't worry, we will choose a better name for this table later):
    • cewuba_id: ID auto increment
    • cewuba_user_id: int (central user id)
    • cewuba_username: varchar(100) worklist name
    • cew_id: ID link to ce_worklists table
    • cewa_id: ID links the user to the article of the worklist
    • cewuba_score: int
    • cewuba_edits_amount: int (extra, see the last question on this comment)
    • cewuba_reverts_amount: int (extra, see the last question on this comment)
    • cewa_created_at: timestamp
    • cewa_edited_at: timestamp
    • cewa_deleted_at: timestamp

I think the DB structure proposed above allows us to have a good balance between performance when storing/retrieving this data, and scalability.
We can start allowing only one worklist per event, but the DB will be ready to implement multiple worklists per event if needed.

Provide recommendations on what limitations or questions we can consider:

  • Number of worklists per event.
  • Since organizers can create worklists without an event registration, we need to set a limit of worklists per day, or per hour.
  • Limit of articles per worklist.
  • Limit of returned users

Other questions and concerns:

  • Do we want organizers to be able to edit a worklist?
    • If yes, what can they edit?
      • Worklist name?
      • Attach an event ID to it in case they created it without the event registration enabled and then decide to add a registration?
      • Add/update/remove articles?
        • If we want to allow this, I think it should only be allowed if the organizer did not click on the button to generate the invitation list. Editing it after the invitation list is generated would be really complicated and maybe not doable. I mean, "doable" is a tricky word; there is always a way to do it, but it would add a lot of complexity, like:
          • If the list has been generated and an article is added, we will need to run the list again but this time just run the new added article.
          • If an article is changed or removed and the invitation list has been generated or was already generated, we will need to delete the users that were related to the changed/removed article(s), but the users may also be related to other article(s) so we would need to store more data, and check if the users are not in other articles before removing them or even delete all and run the invitation list job again.
  • Do we want organizers to be able to delete a worklist?
    • Deleting is easier than editing. Just a note: I would allow deletion only if the organizer has not yet asked to generate the invitation list, or if the invitation list has already been generated. This avoids cases where the job is still building the invitation list, the organizer deletes the worklist, and the job then tries to store data for that worklist, which means it will either fail or store data that is no longer needed, because the worklist does not exist in the DB anymore.
  • Can we use hard delete if we allow deleting worklist or do we want it to be a soft delete, so the organizer can restore it, or even if they will not be able to restore it, we want this
  • What other data would we like to store about the users by article, I mean for now we have the score, but do we want more data, like:
    • cewuba_edits_amount: int
    • cewuba_reverts_amount: int
    • last_edit_date_in_the_article: timestamp
    • last_edit_date: timestamp
    • any other data about the user/article
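To make the removal case above concrete (an article is removed after the invitation list was generated), here is a minimal sketch of the check it would need. The function name and the in-memory mapping are hypothetical; a real implementation would query the user-by-article table instead.

```python
def users_to_drop(article_users, removed_article):
    """Return users who were linked only to the removed article.

    article_users maps an article key to the set of central user IDs
    that the invitation list job returned for that article.
    """
    removed = article_users.get(removed_article, set())
    still_linked = set()
    for article, users in article_users.items():
        if article != removed_article:
            still_linked |= users
    # A user stays on the invitation list if any remaining article returned them.
    return removed - still_linked


# Hypothetical example data.
links = {
    "Article A": {1, 2, 3},
    "Article B": {3, 4},
}
dropped = users_to_drop(links, "Article A")
```

Here user 3 is kept because "Article B" also returned them. This is the extra bookkeeping the comment refers to, and it is why re-running the whole job may end up being the simpler option.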

Final notices

  • The table and columns names may change
  • DB Indexes will be added only after a final decision
  • Columns may be added or removed depending on the answers for the questions above
  • If the need is just to store and display the worklist data, the better way to do it would be just store a json blob, it would be fast to implement as well


Daimona subscribed.

(We discussed this a couple days ago. Another option to explore is using the main stash to store JSON blobs instead of having a dedicated schema. Semi-persistence should be fine, as the invitation lists are not critical, and are only relevant for a short period of time anyway. There might be a few technical details to hash out though, such as any size limits, and whether the main stash would be suitable in general.)

Hi @Daimona, @ifried, @MHorsey-WMF, @VPuffetMichel

Based on what was shared about aggregate analytics, it sounds like a more structured database is needed in order to be able to create reports like:

  • Last year, you organized three events and 10% of the people you invited ended up joining your events
  • This year, you organized three events and 20% of the people you invited ended up joining your events

It sounds like the best way to go is a dedicated schema. We also mentioned that we would like this data to be available for at least 1.5 years; since this is not considered PII, we can store it forever if we want.

That said, I would like to check if you all agree to go with a dedicated schema, and if the one provided in the first comment of this task has all we need.

Hi @cmelo! Your summary of expectations is correct, so a dedicated schema makes sense to me. I look forward to hearing what others have to say!

I've reviewed the schema again given all the conversations that happened recently, but I still feel like there's a lot I don't know and/or we haven't talked about and that makes it difficult to tell whether something would work.

First of all, I'm still not entirely certain that JSON blobs aren't an option for either the worklists or the user lists. There's no doubt that not using blobs allows for more flexibility going into future iterations, but it also adds complexity. Given the current requirements, it feels like the only reason not to use blobs is data analysis (T362897#9808688), but is this something we're sure of? As in, have we confirmed that JSON blobs would not be compatible with the tooling used to generate these reports (or at least would make it more cumbersome)?

Second, there were conversations about invitation lists being global vs local, but I'm not finding anything on phab and I forgot what the decision was on that. This would need to be documented somewhere, and then the schema may have to be updated accordingly. I'll highlight (some of?) these changes below.

  • ce_worklists:
    • cew_name: varchar(100) worklist name

Might be worth making this longer, in case of events with long names. A standard tinyblob (255 B) would do.

  • cew_edited_at: timestamp
  • cew_deleted_at: timestamp

I agree with you that we need answers to whether these features will ever exist before making a decision on these fields.

Also, if this table will be in the shared database, I assume we'll need to store the wiki where this worklist was created.

  • ce_worklists_articles:
    • cewa_article_url: varchar(100) worklist name

s/worklist/article/? Also, we never store URLs into the database. This should either be a page ID, or two fields for namespace and title; maybe not the namespace if we'll ever only have pages in the mainspace. Additionally, we will need to store the wiki of the article, assuming we will at some point allow external pages. This would be similar to the campaign_events table.

  • cewa_created_at: timestamp
  • cewa_edited_at: timestamp

What would "create" and "edit" be in this context?

  • cewa_deleted_at: timestamp

I think hard deletion should be fine here?

  • ce_worklists_user_by_article
    • cewuba_username: varchar(100) worklist name

Storing usernames is problematic. Firstly because it adds a lot of redundancy, and MW has been moving in the opposite direction for a few years now with the introduction of the actor table and related fields. Secondarily because usernames are not stable identifiers--people get renamed all the time, and tracking that is possible but cumbersome. We should only store the central user ID, like in other places.

  • cewa_id: ID links the user to the article of the worklist

Would this be useful? Without an indication of how strong the link is (i.e., how much someone has contributed to a given article), we probably wouldn't want to use this for analytics purposes or anything.

  • cewa_created_at: timestamp
  • cewa_edited_at: timestamp
  • cewa_deleted_at: timestamp

How would these fields be used?

  • Do we want organizers to be able to edit a worklist?
  • Do we want organizers to be able to delete a worklist?
  • What other data would we like to store about the users by article, I mean for now we have the score, but do we want more data [...]

I agree that we need answers to these questions before moving forward.

Here are some of my early responses/ideas, but I would also love feedback from @Astinson, @Udehb-WMF, and @EUwandu-WMF on these questions regarding how we handle the worklists that are shared by organizers in order to generate their invitation lists:

Do we want organizers to be able to edit a worklist?

Yes. I think we can expect that some organizers will create a worklist, submit it to generate an invitation list, and then look at the invitation list and think, "This list is too short. I should add more articles to my worklist and try again" or "This list has lots of people who I would not invite. Maybe I should pick articles on more specific topics and try again." However, I think the most important thing is that they should be able to try again, whether it's using a new, updated worklist or editing an existing one. I think it's preferable to simply edit an existing worklist (rather than creating a new one), since a) we're not storing multiple worklists that are very similar, and b) we're not encouraging the organizer to generate a ton of different invitation lists per event (but, rather, to edit and improve one main invitation list).

Do we want organizers to be able to delete a worklist?

Maybe? I want more feedback on this from other folks. Basically, I think it could be useful to organizers to have the ability to delete worklists, in case they decided to radically redo or rebuild their worklist (or if there are use cases that I just can't think of, off-hand)... but I don't think it would be very common or very useful to allow deletion. Curious to hear what other people have to say, since I don't have a strong opinion on this one.

What other data would we like to store about the users by article, I mean for now we have the score, but do we want more data?

Some things we may consider (but not sure if they would all be doable from a Trust & Safety or technical perspective)

  • How to contact them
    • If they can be emailed via wikimail
  • Their editing activity
    • Wikis they have an account on
    • Edit count on the wiki of the event
    • Global edit count
    • Link to contributions page on the wiki of the event
  • Public demographic information
    • How they prefer to be described (they/he/she)
  • Their interest in events and/or WikiProjects
    • Number of events they have attended/will attend (via event registration data)
    • Event pages for events they have attended/will attend (via event registration data)
    • Number of events they have organized/are organizing (via event registration data)
    • Event pages for events they have organized/are organizing (via event registration data)
    • If they're a member of any WikiProjects (perhaps by looking for a WikiProject userbox on their user page... which I understand may be hard to do since there is no standardized template for WikiProject userboxes, I think, but just sharing it as a potential idea...)

I've reviewed the schema again given all the conversations that happened recently, but I still feel like there's a lot I don't know and/or we haven't talked about and that makes it difficult to tell whether something would work.

First of all, I'm still not entirely certain that JSON blobs aren't an option for either the worklists or the user lists. There's no doubt that not using blobs allows for more flexibility going into future iterations, but it also adds complexity. Given the current requirements, it feels like the only reason not to use blobs is data analysis (T362897#9808688), but is this something we're sure of? As in, have we confirmed that JSON blobs would not be compatible with the tooling used to generate these reports (or at least would make it more cumbersome)?

Second, there were conversations about invitation lists being global vs local, but I'm not finding anything on phab and I forgot what the decision was on that. This would need to be documented somewhere, and then the schema may have to be updated accordingly. I'll highlight (some of?) these changes below.

  • ce_worklists:
    • cew_name: varchar(100) worklist name

Might be worth making this longer, in case of events with long names. A standard tinyblob (255 B) would do.

Yes, I agree

  • cew_edited_at: timestamp
  • cew_deleted_at: timestamp

I agree with you that we need answers to whether these features will ever exist before making a decision on these fields.

Also, if this table will be in the shared database, I assume we'll need to store the wiki where this worklist was created.

  • ce_worklists_articles:
    • cewa_article_url: varchar(100) worklist name

s/worklist/article/? Also, we never store URLs into the database. This should either be a page ID, or two fields for namespace and title; maybe not the namespace if we'll ever only have pages in the mainspace. Additionally, we will need to store the wiki of the article, assuming we will at some point allow external pages. This would be similar to the campaign_events table.

Yes, indeed, I will include them

  • cewa_created_at: timestamp
  • cewa_edited_at: timestamp

What would "create" and "edit" be in this context?

These I added by mistake, they are not needed, I will remove them thanks

  • cewa_deleted_at: timestamp

I think hard deletion should be fine here?

Yes, I agree. I don't think we need to restore it or even generate reports on it, so I will remove it.

  • ce_worklists_user_by_article
    • cewuba_username: varchar(100) worklist name

Storing usernames is problematic. Firstly because it adds a lot of redundancy, and MW has been moving in the opposite direction for a few years now with the introduction of the actor table and related fields. Secondarily because usernames are not stable identifiers--people get renamed all the time, and tracking that is possible but cumbersome. We should only store the central user ID, like in other places.

Yes, thanks I added it by mistake, will remove it

  • cewa_id: ID links the user to the article of the worklist

Would this be useful? Without an indication of how strong the link is (i.e., how much someone has contributed to a given article), we probably wouldn't want to use this for analytics purposes or anything.

I think this may be useful to get data like:

  • Which articles returned this user
  • This user is present in how many articles on my worklist

  • cewa_created_at: timestamp
  • cewa_edited_at: timestamp
  • cewa_deleted_at: timestamp

How would these fields be used?

These are not needed; I will remove them.

  • Do we want organizers to be able to edit a worklist?
  • Do we want organizers to be able to delete a worklist?
  • What other data would we like to store about the users by article, I mean for now we have the score, but do we want more data [...]

I agree that we need answers to these questions before moving forward.

@Daimona, @MHorsey-WMF The revised db schema would look like below:

  • ce_worklists:
    • cew_id: ID auto increment
    • cew_name: tinyblob (255 B) worklist name
    • cew_event_id: ID (nullable), if null: it means the organizer created it without an event registration on our tool.
    • cew_created_at: timestamp
    • cew_edited_at: timestamp
    • cew_deleted_at: timestamp
  • ce_worklists_articles:
    • cewa_id: ID auto increment
    • cewa_page_namespace: int
    • cewa_page_title: tinyblob (255 B)
    • cewa_page_wiki: tinyblob (64 B)
    • cew_id: ID that links the article with the worklist
  • ce_worklists_user_by_article (don't worry, we will choose a better name for this table later):
    • cewuba_id: ID auto increment
    • cewuba_user_id: int (central user id)
    • cew_id: ID link to ce_worklists table
    • cewa_id: ID links the user to the article of the worklist
    • cewuba_score: int
    • cewuba_edits_amount: int (extra, see the last question on this comment)
    • cewuba_reverts_amount: int (extra, see the last question on this comment)

  • cewa_id: ID links the user to the article of the worklist

Would this be useful? Without an indication of how strong the link is (i.e., how much someone has contributed to a given article), we probably wouldn't want to use this for analytics purposes or anything.

I think this may be useful to get data like:

  • Which articles returned this user
  • This user is present in how many articles on my worklist

I get that, but as I was hinting above, I don't think it would be very useful without also storing how strong the user <-> page link is. The simple example is someone who contributed to lots of articles but only made really tiny edits (e.g., copyedits) to all of them. Saying "this person appeared in 123 articles" wouldn't really be significant for statistical purposes I assume.

@Daimona, @MHorsey-WMF The revised db schema would look like below: [...]

Seems OK at first glance, but I would also want to be reassured about the first two things I mentioned in my previous comment (need for structured storage, global vs local) before calling this done.
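The concern above about article counts can be illustrated with a small sketch. The rows and the meaning of the score are hypothetical; the thread does not define how the score is computed.

```python
from collections import defaultdict

# Hypothetical rows: (central_user_id, article_id, score)
rows = [
    (101, 1, 1), (101, 2, 1), (101, 3, 1),  # many articles, tiny edits
    (202, 1, 40),                            # one article, substantial work
]


def summarize(rows):
    """Return {user_id: (article_count, total_score)}."""
    articles = defaultdict(set)
    totals = defaultdict(int)
    for user_id, article_id, score in rows:
        articles[user_id].add(article_id)
        totals[user_id] += score
    return {u: (len(articles[u]), totals[u]) for u in articles}


summary = summarize(rows)
```

Ranked by article count alone, user 101 looks more relevant than user 202. Ranked by total score, the order is reversed, which is why a per-article strength value matters if the link is stored at all.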

Regarding the use of JSON blobs or structured storage, I think data analysis is something we want, based on what was shared about aggregate analytics, in order to be able to create reports like the ones below that will also be seen by organizers, if I understood correctly:

  • Last year, you organized three events, and 10% of the people you invited ended up joining your events.
  • This year, you organized three events, and 20% of the people you invited ended up joining your events.

Cc: @ifried

For the Global vs. Local topic, I also did not find where we added notes about it, but I remember we said it would be global. We talked about this at our weekly engineers meeting on May 14. Some related points I remember about the reasons why include:

  • Organizers being able to see all the worklists they have created on the wikis and at any wiki that has the CE extension enabled.
  • Creating a worklist with articles from different wikis, like creating an event invitation on meta adding articles from any wiki.
  • More flexibility to add other related global features.
  • We have all data global, so it makes sense to keep it global.
  • Easier to get reports from the DB since all data is in the same DB.

@MHorsey-WMF @VPuffetMichel Do you remember anything else?

Here are some of my early responses/ideas, but I would also love feedback from @Astinson, @Udehb-WMF, and @EUwandu-WMF on these questions regarding how we handle the worklists that are shared by organizers in order to generate their invitation lists:

Do we want organizers to be able to edit a worklist?

Yes. I think we can expect that some organizers will create a worklist, submit it to generate an invitation list, and then look at the invitation list and think, "This list is too short. I should add more articles to my worklist and try again" or "This list has lots of people who I would not invite. Maybe I should pick articles on more specific topics and try again." However, I think the most important thing is that they should be able to try again, whether it's using a new, updated worklist or editing an existing one. I think it's preferable to simply edit an existing worklist (rather than creating a new one), since a) we're not storing multiple worklists that are very similar, and b) we're not encouraging the organizer to generate a ton of different invitation lists per event (but, rather, to edit and improve one main invitation list).

I think trying again is the important part here -- how we represent that is really contingent on what is convenient from a software design perspective. @PWaigi-WMF FYI -- this might affect how we think about lists eventually.

Do we want organizers to be able to delete a worklist?

Maybe? I want more feedback on this from other folks. Basically, I think it could be useful to organizers to have the ability to delete worklists, in case they decided to radically redo or rebuild their worklist (or if there are use cases that I just can't think of, off-hand)... but I don't think it would be very common or very useful to allow deletion. Curious to hear what other people have to say, since I don't have a strong opinion on this one.

I don't think this needs to be a priority feature, and shouldn't be in the first priority of work.

What other data would we like to store about the users by article, I mean for now we have the score, but do we want more data?

Some things we may consider (but not sure if they would all be doable from a Trust & Safety or technical perspective)

  • How to contact them
    • If they can be emailed via wikimail
  • Their editing activity
    • Wikis they have an account on
    • Edit count on the wiki of the event
    • Global edit count
    • Link to contributions page on the wiki of the event
  • Public demographic information
    • How they prefer to be described (they/he/she)

^I wouldn't include this -- it's only enabled in languages that are heavily determined by gender. I would assume gender agnostic.

  • Their interest in events and/or WikiProjects
    • Number of events they have attended/will attend (via event registration data)
    • Event pages for events they have attended/will attend (via event registration data)
    • Number of events they have organized/are organizing (via event registration data)
    • Event pages for events they have organized/are organizing (via event registration data)
    • If they're a member of any WikiProjects (perhaps by looking for a WikiProject userbox on their user page... which I understand may be hard to do since there is no standardized template for WikiProject userboxes, I think, but just sharing it as a potential idea...)

+ 1 to prior participation being important -- also another argument for expanding event registration for WikiProject registration (gives a sense of how engaged the person is in collective actions beyond events).

Regarding the use of JSON blobs or structured storage, I think data analysis is something we want, based on what was shared about aggregate analytics, in order to be able to create reports like the ones below that will also be seen by organizers, if I understood correctly:

It's not whether we want these reports or not, but whether JSON makes it more difficult to generate them. To give a concrete example: if the data is pulled from the database and then processed in say a python script, it shouldn't make any difference whether the input data is structured or not, as you can easily decode the JSON in the script itself.

Clearly there are other pros to using a structured format (e.g. scalability), but I think it's also important to weed out things that look like they may be pros/cons but aren't.

For the Global vs. Local topic, I also did not find where we add notes about it, but I remember we said it would be global.

Thanks, that's also what I remember. Also the fact that if the data is stored centrally, it's easier to provide a global overview (i.e. remove the local-only restriction) if we want to.

On local vs. global, we want the invitation lists to be global, unless it is quite difficult from a technical perspective. In such a case, we would first implement local and then go for global later. However, it seems that global would not be much extra effort, if I understand correctly (good to confirm here though!). We discussed this in team meetings and in Slack.

Regarding the use of JSON blobs or structured storage, I think data analysis is something we want, based on what was shared about aggregate analytics, in order to be able to create reports like the ones below that will also be seen by organizers, if I understood correctly:

It's not whether we want these reports or not, but whether JSON makes it more difficult to generate them. To give a concrete example: if the data is pulled from the database and then processed in say a python script, it shouldn't make any difference whether the input data is structured or not, as you can easily decode the JSON in the script itself.

Yes, indeed, it is about whether JSON makes it more difficult to generate them or not.
Given the two scenarios I think we may need, I believe JSON will make it more difficult.

  1. Generating reports by pulling this data from the database: If it is JSON, pulling it from the database and processing the data in a script is doable, but it has its costs. If it is a structured database, querying it using SQL is easier. SQL queries are efficient and can be optimized with indexes for faster data retrieval.
  2. Generating reports to show to organizers on our extension: We may have an interface to show reports to organizers using this data. If it is JSON, we will need to get it from the database and process it, which is also doable, but as above, performing a SQL query is easier and more scalable and efficient.
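To compare the two scenarios side by side, here is a minimal sketch that computes the same "share of invited people who joined" figure both ways. The table names, column names, and data are hypothetical, and SQLite stands in for the production database.

```python
import json
import sqlite3

# Hypothetical data: users invited via one worklist, and users who registered.
invited = [101, 202, 303, 404, 505]
registered = {202, 909}

# Option 1: the invitation list is stored as a JSON blob and decoded in a script.
blob = json.dumps({"worklist_id": 1, "invited_user_ids": invited})
decoded = json.loads(blob)["invited_user_ids"]
rate_from_json = len(set(decoded) & registered) / len(decoded)

# Option 2: one row per invited user, joined against registrations in SQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invited (worklist_id INTEGER, user_id INTEGER)")
db.execute("CREATE TABLE registered (user_id INTEGER)")
db.executemany("INSERT INTO invited VALUES (1, ?)", [(u,) for u in invited])
db.executemany("INSERT INTO registered VALUES (?)", [(u,) for u in registered])
joined, total = db.execute(
    """
    SELECT COUNT(r.user_id), COUNT(*)
    FROM invited i LEFT JOIN registered r ON r.user_id = i.user_id
    WHERE i.worklist_id = 1
    """
).fetchone()
rate_from_sql = joined / total
```

Both options give the same result for a single list. The difference shows up when aggregating across many worklists or events: the blob approach has to load and decode every blob in application code, while the structured approach can filter, join, and use indexes inside the database.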

So to summarize: We agreed that Global was fine.

(discussion with engineers)
As far as global vs local list of invitation lists, it is not difficult one way or another from an engineering point of view.
Those are design decisions, so that the experience makes sense from a user's point of view and is consistent.

ex:
Display only the invitation lists that Val created on fr.wiki while she is on fr.wiki.
Display invitation lists from all wikis on metawiki only, by default.
Let Val decide whether she wants to see invitation lists from all wikis or not.

It seems to me that we have spent enough time discussing the value of JSON vs SQL to take an informed decision. I would leave it to @cmelo to decide and proceed with his task as he owns it and has probably spent the most time thinking about this.

Thanks @VPuffetMichel.

This is my suggestion for the new DB schema to store the worklists data cc: @Iflorez.

I have created the implementation task as a draft T366354.

ce_worklists:

  • cew_id: ID auto increment
  • cew_name: tinyblob (255 B) worklist name
  • cew_event_id: ID (nullable), if null: it means the organizer created it without an event registration on our tool.
  • cew_status: int (1: processing, 2: done)
  • cew_created_at: timestamp

ce_worklists_articles:

  • cewa_id: ID auto increment
  • cewa_page_namespace: int
  • cewa_page_title: tinyblob (255 B)
  • cewa_page_wiki: tinyblob (64 B)
  • cew_id: ID that links the article with the worklist

ce_worklists_user_by_article:

  • cewuba_id: ID auto increment
  • cewuba_user_id: int (central user id)
  • cew_id: ID link to ce_worklists table
  • cewa_id: ID links the user to the article of the worklist
  • cewuba_score: int

I think we should start with this and add more data later if needed.
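For reference, the three tables above could be sketched as follows. This uses SQLite as a stand-in, so the column types are approximations (tinyblob becomes BLOB, timestamps become TEXT). The NOT NULL constraints and foreign keys are assumptions that the proposal does not spell out, and indexes are left out since they are still to be decided.

```python
import sqlite3

# SQLite approximation of the proposed tables; not the final MediaWiki schema.
SCHEMA = """
CREATE TABLE ce_worklists (
    cew_id INTEGER PRIMARY KEY AUTOINCREMENT,
    cew_name BLOB NOT NULL,
    cew_event_id INTEGER,              -- NULL: created without event registration
    cew_status INTEGER NOT NULL,       -- 1: processing, 2: done
    cew_created_at TEXT NOT NULL
);
CREATE TABLE ce_worklists_articles (
    cewa_id INTEGER PRIMARY KEY AUTOINCREMENT,
    cewa_page_namespace INTEGER NOT NULL,
    cewa_page_title BLOB NOT NULL,
    cewa_page_wiki BLOB NOT NULL,
    cew_id INTEGER NOT NULL REFERENCES ce_worklists (cew_id)
);
CREATE TABLE ce_worklists_user_by_article (
    cewuba_id INTEGER PRIMARY KEY AUTOINCREMENT,
    cewuba_user_id INTEGER NOT NULL,   -- central user ID
    cew_id INTEGER NOT NULL REFERENCES ce_worklists (cew_id),
    cewa_id INTEGER NOT NULL REFERENCES ce_worklists_articles (cewa_id),
    cewuba_score INTEGER NOT NULL
);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)

# Hypothetical example rows: one worklist, one article, one returned user.
cur = db.execute(
    "INSERT INTO ce_worklists (cew_name, cew_event_id, cew_status, cew_created_at)"
    " VALUES (?, NULL, 1, '20240601000000')",
    (b"Example worklist",),
)
worklist_id = cur.lastrowid
cur = db.execute(
    "INSERT INTO ce_worklists_articles"
    " (cewa_page_namespace, cewa_page_title, cewa_page_wiki, cew_id)"
    " VALUES (0, ?, ?, ?)",
    (b"Example_article", b"enwiki", worklist_id),
)
article_id = cur.lastrowid
db.execute(
    "INSERT INTO ce_worklists_user_by_article"
    " (cewuba_user_id, cew_id, cewa_id, cewuba_score) VALUES (?, ?, ?, ?)",
    (12345, worklist_id, article_id, 7),
)
rows = db.execute(
    "SELECT cewuba_user_id, cewuba_score FROM ce_worklists_user_by_article"
    " WHERE cew_id = ?",
    (worklist_id,),
).fetchall()
```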

Regarding the other data below we may want (Thanks @ifried):

  1. How to contact them
    • 1.1 If they can be emailed via wikimail
  2. Their editing activity
    • 2.1 Wikis they have an account on
    • 2.2 Edit count on the wiki of the event
    • 2.3 Global edit count
    • 2.4 Link to contributions page on the wiki of the event
  3. Public demographic information
    • 3.1 How they prefer to be described (they/he/she)
  4. Their interest in events and/or WikiProjects
    • 4.1 Number of events they have attended/will attend (via event registration data)
    • 4.2 Event pages for events they have attended/will attend (via event registration data)
    • 4.3 Number of events they have organized/are organizing (via event registration data)
    • 4.4 Event pages for events they have organized/are organizing (via event registration data)
    • 4.5 If they're a member of any WikiProjects (perhaps by looking for a WikiProject userbox on their user page... which I understand may be hard to do since there is no standardized template for WikiProject userboxes, I think, but just sharing it as a potential idea...)

I think data:

  • 1.*
  • 2.*
  • 4.1, 4.2, 4.3, 4.4

are doable, but I am not sure about 4.5; I would need to check.

I think we can fetch all of this data in real time when building the list of results to show, that is, when the organizer opens the invitation list page. This can be done without changing the new schema, because the data already exists in the DB, so we can implement it later.

Now that we have an implementation task and are focused on the implementation work, I am closing this task as Done.