
Investigation: Allow organizers to set group goals for events
Closed, ResolvedPublic

Description

NOTE: This task is just for the storing and handling of goal data. Progress bar will be a separate investigation: T407786.
User stories:

As an organizer, I want to be able to set group goals for my events, so that I can motivate participants to actively participate in the event as editors and so that I can have a sense of when my event is "done" and how I can report on the impact of my event.

Background:

We want to allow organizers to set goals for events, so that:

  • Participants can be motivated to join and participate with a concrete sense of what they are aiming to accomplish.
  • Organizers can report on event goals and outcomes to grant officers, organizing partners, and affiliated institutions.
  • All editors can have a better sense of the goals and impact of events and organized activities on the wikis.

To do this, we are imagining that organizers will first be able to set very basic goals, which means:

  • One goal per event
  • One progress bar for the goal

However, over time, this can be expanded and be made more complex, such as:

  • Multiple goals per event
  • Goals that can be made for groups or individuals
Acceptance Criteria:
  • Investigate how we can allow organizers to set goals for an event, which means:
    • Step 1 (short-term): Organizers can set 1 goal per event in the following format: [number] of [data point that we already collect in the Contributions tab, like new articles created].
      • Goals will be collectively shared by the whole group. So, any edits made for the event count toward the goal.
      • If a group exceeds the goal, we should still track how much they exceeded the goal (i.e., we shouldn't stop tracking when the goal is reached).
    • Step 2 (longer-term): Editors can set individual goals, like a personal challenge.
      • Editors will use the basic infrastructure of event registration to set a goal with a start/end date.
      • Perhaps they can choose if goal is open (so people can register and join) or closed (and therefore just for them). Closed goals will still be public.
    • Step 3: Organizers/editors can set more than 1 goal per event.
      • There will be one goal per event at first, but maybe later multiple: At first, we will probably only allow them to set 1 goal per event, for the sake of releasing a simplified first version. However, over time, I can imagine us allowing organizers to set a few goals (for example, a maximum of 3 goals), which can be tracked. For example, they could have a goal of creating 10 articles with at least 30 references added (if we later collect data on references).
      • We don't know how multiple goals will be stored (i.e., one goal with sub-goals or entirely separate goals). We will need to be flexible in how we think about it at this stage.
Design examples:

Organizer can set up a goal:

Screenshot 2025-10-29 at 1.54.59 PM.png (618×1 px, 91 KB)

Users can see progress against goal on various pages, such as the event page:

Screenshot 2025-10-29 at 1.55.31 PM.png (724×888 px, 309 KB)

Event Timeline

ifried updated the task description.

What kind of report data would we like to have, now or in the future? For example:

  • Number of events that reach their goals
  • Most used goal types
  • What types of goals people set for each type of event
  • What the characteristics are of the events that reach their goals

Are the event goals only numbers, or can we have goals that will not be a number?

In this example: have a goal of creating 10 articles with at least 30 references added
Are these 2 goals or one goal with 2 metrics?

cc: @ifried

These are great questions, and I'm also pinging @AJayadi-WMF, @SEgt-WMF, @Udehb-WMF to see what opinions they have.

Reports that I could see in the future could be:

  • Goal outcomes report (which events: reached goals vs. not reached goals)
  • Most common goals for events, separated by event types & outcomes (i.e., are the goals reached?)
  • Top performing organizers (i.e., which organizers have events that most often reach the goals) - but this is a maybe because it could provide too much incentive to do 'easy wins' rather than harder things
  • Goals reached by topic(s) (is this possible? maybe mapping article topics, as defined in LiftWing?)
  • Goals reached by wiki

Are the event goals only numbers, or can we have goals that will not be a number?: I can only think of numeric goals right now, and I think it may be best to just have numeric goals if we want a progress bar. The only "maybe" for not a number is the quality score of an article, but if we use the LiftWing quality score, that is also numeric (between 0 and 1).

In this example: have a goal of creating 10 articles with at least 30 references added. Are these 2 goals or one goal with 2 metrics?:

I am leaning toward having it as one goal because:

  1. I think it is better to have 1 progress bar than, for example, 5 progress bars. It is more visually simple and compelling. More of a sense of what people are trying to accomplish together.
  2. This would allow us to let organizers set more sub-goals. If we had multiple goals with separate progress bars, this would be so visually overwhelming that we would probably want to encourage people to only set a maximum of 3 goals. However, if it is 1 main goal with 1 progress bar, there is more flexibility.
  3. I think the sub-goals complement each other, such as: "I want to create 10 articles with 30 references, since the articles will be of better quality if they have references."

As for how we could visually represent the different progress between the sub-goals, @JFernandez-WMF and I have briefly talked about how they can perhaps be different colors and/or have a tooltip to show progress for each sub-goal.

Maybe we could have an optional goal setting especially for non-numerical goals?

I learned recently some organizers conduct surveys post-events to examine participants' experience of an event. So, if we could have an optional goal setting, organizers could add qualitative goals like "During my event, participants feel they are taught sufficient knowledge about Wikipedia and feel empowered to edit in the near future", then organizers can insert the summary of their survey results (if they do survey).

@AJayadi-WMF, this is a great insight about organizer goals related to survey satisfaction! My question is: Would we represent such data/goals the same way? Our idea of goal-setting is to have a collective sense of goals for a group, which we can track and measure with generalized and reliable metrics (like number of edits, number of bytes, quality score, etc). For surveys, the questions and answers can really vary, so it can be hard to set repeatable and measurable goals, and these are more internal goals for organizers rather than for the event overall. So, I think this is a great general insight, but it may be better suited for a project around survey support (which I'm not sure we would do). The alternative is for the organizers to manually input the survey results, like you shared, but I think that could be prone to user error and less useful to add onto another interface.

But maybe there is something else that I am not thinking of OR other non-numeric goals that could be of use. If something comes up in your mind, I would love to hear!

@ifried thank you for the further clarification (also, very helpful to see the design examples). Indeed, surveys and their results vary a lot.

Side note: I chatted with Euphemia about this. She likes the "choose the type of goal" and "set a number" because, in the case of grant application, there are questions about target goals if the applicant(s) say they aim to contribute to X articles, etc. Also, she says the bar is interesting as it can be helpful to organizers to see their progress immediately.

Random notes from old team discussions:

  • Showing goals on event page can be problematic for caching (might consider hiding progress for logged-out users / cached page views)
  • Maybe consider storing unstructured targets (eg JSON) to support different future formats

General notes from looking into this:

  • It seems pretty clear that the best option for storage is a separate table, as that will let us implement multiple goals quite easily, and avoid adding to the main campaign_events table. In code, goals can still be a property of EventRegistration. This would be similar to wikis, topics, etc.
  • The table should only store the goal itself, not the progress towards it. This is because progress can be changed by multiple factors: for example, for a goal of number of edits: someone may add an edit, or remove an edit, or the page might be deleted (which will also remove the association), or the goal target itself might be changed by the organizer. So, progress would always be computed on-demand.
    • This should be relatively fast, at least in the initial version; not too dissimilar from the code that builds the output in Special:EventDetails: a single query using COUNT or SUM aggregates.
    • If needed for performance, maybe in a later version, we could add a simple layer of caching. However, this would only be useful if done aggressively, with unconditional caching for a short period of time (a few minutes) and no explicit purges. This might also be made explicit in the UI, saying that the progress shown might be outdated and will update in a few minutes. However, it would only protect against users rapidly viewing progress many times in a short time period. If we were to implement longer and less aggressive caching, we would need proper cache invalidation. But that, in turn, would require reacting to all the things that can change progress, which is exactly what I was trying to avoid by not storing the current progress in the database.
    • Not storing current progress also makes it easier to reason about continuing to track progress once the goal has been reached. Whether a goal has been reached is something we would only find out after pulling the metrics, so it shouldn't make any difference.
  • Where and how to show progress exactly is outside the scope of the investigation and will be looked into in T407786. I made sure that the task has a note about risks of CDN cache.
  • The option to show progress on the event page would also require a schema change to the campaign_events table. I'm not sure if that falls under the scope of this task though, and also, last I heard we weren't sure whether to include that option. I'm also unsure how it would scale to multiple goals (i.e., choose which ones to show); if we choose to leave it out, there'd be no issues.
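To make the "compute progress on demand" idea concrete, here is a minimal sketch in Python using an in-memory SQLite database. The `contributions` table, its columns, and the metric names are hypothetical stand-ins for illustration; they are not the extension's actual schema. The only point is that a single query with COUNT and SUM aggregates returns the current values, so nothing has to be stored or invalidated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified stand-in for the stored event contributions.
conn.execute("""
    CREATE TABLE contributions (
        event_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        is_page_creation INTEGER NOT NULL,  -- 1 if the edit created a page
        bytes_delta INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO contributions VALUES (?, ?, ?, ?)",
    [(1, 10, 1, 500), (1, 11, 0, 120), (1, 10, 1, 300), (2, 12, 1, 50)],
)


def current_progress(conn, event_id):
    """Compute all metric values for one event with a single aggregate query."""
    row = conn.execute(
        """
        SELECT COUNT(*),
               COALESCE(SUM(is_page_creation), 0),
               COALESCE(SUM(bytes_delta), 0)
        FROM contributions
        WHERE event_id = ?
        """,
        (event_id,),
    ).fetchone()
    return {"edits": row[0], "articles_created": row[1], "bytes_added": row[2]}


progress = current_progress(conn, 1)
```

If an edit is removed or a page is deleted, the corresponding row disappears and the next call simply returns the new totals, which is why no separate progress column is needed in this sketch.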

  • Step 2 (longer-term): Editors can set individual goals, like a personal challenge.
    • Editors will use the basic infrastructure of event registration to set a goal with a start/end date.
    • Perhaps they can choose if goal is open (so people can register and join) or closed (and therefore just for them). Closed goals will still be public.

I don't fully understand how this would work. Are these meant to be stand-alone goals unrelated to events? The "infrastructure" of event registration assumes that there is an event registration in the first place. Having individual goals that are not tied to any event, with their own start/end dates and other properties, and custom display rules (e.g., there is no "event page" where to show them) seems like it would be better accomplished by an entirely separate system. I don't think we should try to fit this into what we'll be developing for event goals, as there appear to be too many incompatibilities.


  • Step 3: Organizers/editors can set more than 1 goal per event.
    • There will be one goal per event at first, but maybe later multiple: At first, we will probably only allow them to set 1 goal per event, for the sake of releasing a simplified first version. However, over time, I can imagine us allowing organizers to set a few goals (for example, a maximum of 3 goals), which can be tracked. For example, they could have a goal of creating 10 articles with at least 30 references added (if we later collect data on references).
    • We don't know how multiple goals will be stored (i.e., one goal with sub-goals or entirely separate goals). We will need to be flexible in how we think about it at this stage.

Because this is for the long term, I am not focusing too much on it. However, there is one thing I would like to clarify straight away: when we say "10 articles, each with at least 30 references added", this is a single goal. It's not even a sub-goal, it's just an individual goal that tracks two separate metrics. We could talk of "multi-metric goal", as opposed to the "single-metric goals" we have now. Separately from that, we could also allow multiple goals, where each goal can track one or more metrics. Again, this is for the future, but this distinction will be very important if we start talking about it more seriously.

As for the initial implementation: if we store goals in a separate table as proposed above, it will be very easy to support multiple goals, so that won't be an issue. It gets a bit more interesting if we think about supporting multi-metric goals. If we only needed to support the initial format of a single-metric goal, we could model the table as storing tuples of (surrogate primary key, event ID, metric, target number). However, this has little flexibility, and it won't allow us to store multi-metric goals. It will generally not let us do anything more complex than "metric X is at least Y (number)".
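For illustration, the tuple-based table could be sketched as follows, using Python's built-in sqlite3 module. The table name `event_goals_simple` and its column names are hypothetical and only meant to make the (surrogate primary key, event ID, metric, target number) shape concrete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per single-metric goal.
conn.execute("""
    CREATE TABLE event_goals_simple (
        goal_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate primary key
        event_id INTEGER NOT NULL,
        metric TEXT NOT NULL,
        target INTEGER NOT NULL
    )
""")

# Several independent goals for the same event are just several rows.
conn.executemany(
    "INSERT INTO event_goals_simple (event_id, metric, target) VALUES (?, ?, ?)",
    [(1, "articles_created", 10), (1, "references_added", 30)],
)

rows = conn.execute(
    "SELECT metric, target FROM event_goals_simple WHERE event_id = ? ORDER BY goal_id",
    (1,),
).fetchall()
```

Note that the two rows above are two separate goals. Nothing in this shape can say that they are two conditions of one multi-metric goal, which is the limitation described above.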

Instead, one approach I've been thinking about is to store individual goals as an unstructured blob (using for example JSON). This would give us a lot of flexibility, letting us represent multi-metric goals with lists of conditions joined by logical operators. It would also potentially support more metric types, as well as other metadata that we may want to add in the future (like, for example, a version number if we decide to change the JSON schema; for now we could omit it, and any blob without a version number would be assumed to be v1). So, the DB schema could be (surrogate primary key, event ID, goal blob), and the blob might look something like this:

{
  "operator": "AND",
  "metrics": [
    {
      "metric": "metric_ID_1",
      "target": "target_num_1"
    },
    {
      "metric": "metric_ID_2",
      "target": "target_num_2"
    },
    {
      "operator": "OR",
      "metrics": [
        {
          "metric": "metric_ID_3",
          "target": "target_num_3"
        },
        {
          "metric": "metric_ID_4",
          "target": "target_num_4"
        }
      ]
    }
  ]
}

This would represent a goal of: "metric 1 reaches target 1, AND metric 2 reaches target 2, AND (metric 3 reaches target 3 OR metric 4 reaches target 4)". The exact syntax is an implementation detail and can be determined later. Also, while this gives us a lot of flexibility, we can also impose restraints, at least initially, such as: "no nested conditions", or "maximum 3 conditions", and things like that.
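As a rough sketch of how such a blob might be interpreted, the Python below evaluates a goal recursively and checks the two example restrictions mentioned above. It assumes numeric targets, assumes "reaches" means "greater than or equal to", and assumes a single-metric goal is stored as a bare `{"metric", "target"}` object; the function names and metric names are made up for illustration.

```python
import json


def is_reached(node, progress):
    """Return True if the goal (or condition) in `node` is met by `progress`."""
    if "operator" in node:
        results = [is_reached(child, progress) for child in node["metrics"]]
        if node["operator"] == "AND":
            return all(results)
        if node["operator"] == "OR":
            return any(results)
        raise ValueError(f"Unknown operator: {node['operator']}")
    # Leaf condition: "metric X is at least Y".
    return progress.get(node["metric"], 0) >= node["target"]


def validate_initial_restrictions(node, max_conditions=3):
    """Reject nested conditions and goals with more than `max_conditions` conditions."""
    if "operator" not in node:
        return  # A single-metric goal is always acceptable here.
    children = node["metrics"]
    if len(children) > max_conditions:
        raise ValueError("Too many conditions")
    if any("operator" in child for child in children):
        raise ValueError("Nested conditions are not allowed")


blob = json.loads("""
{
  "operator": "AND",
  "metrics": [
    {"metric": "articles_created", "target": 10},
    {"metric": "references_added", "target": 30},
    {
      "operator": "OR",
      "metrics": [
        {"metric": "images_added", "target": 5},
        {"metric": "links_added", "target": 50}
      ]
    }
  ]
}
""")

progress = {"articles_created": 12, "references_added": 31, "images_added": 0, "links_added": 64}
reached = is_reached(blob, progress)
```

With the restrictions enabled, the blob above would be rejected by `validate_initial_restrictions` because of its nested OR group, while a flat list of up to three conditions would pass.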

I think this approach would work best for the future and it's the one I recommend, however it's also worth pointing out its shortcomings:

  • Data stored inside a blob cannot be queried easily. So, for example, we would not be able to (easily) implement filtering at the goal level, as in "find all goals targeting metric X", or "find all goals whose target number is greater than Y". Basically, the blob should be treated as a black box, and anything inside it cannot be queried/filtered if not by first reading all blobs, then parsing and interpreting them. On the other hand, I don't see any mentions of implementing such filtering, so I don't think this is a concern. Besides, even if we did want such filtering, we would still be fine as long as we only need it per-event. In that case, it would be easy enough to retrieve all goals for the event and parse them, especially if there's a (small) limit on the number of goals that an event can have; and such a limit will be in place as far as I can tell.
  • Analytics will also have a harder time accessing the data, for the same reason (it needs to be parsed first). I don't know what our plans are w.r.t. analytics, and I'm not familiar with the software we use; however, this shouldn't be a problem as long as data collection can make use of an intermediate processing step (in basically any programming language), which I hope is not an unreasonable assumption. Even if this weren't the case though, I would still favour the blob approach, on the grounds that the needs of the application come before the needs of analytics.
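The intermediate processing step mentioned above could be as small as the following Python sketch, which flattens goal blobs into plain (goal ID, event ID, metric, target) rows that can then be filtered or loaded elsewhere. The row layout and function names are assumptions for illustration, and the flattening deliberately drops the AND/OR structure, which may or may not be acceptable for a given report.

```python
import json


def iter_conditions(node):
    """Yield (metric, target) for every leaf condition in a goal blob."""
    if "operator" in node:
        for child in node["metrics"]:
            yield from iter_conditions(child)
    else:
        yield node["metric"], node["target"]


def flatten_goals(rows):
    """Turn (goal_id, event_id, blob) rows into flat (goal_id, event_id, metric, target) rows."""
    flat = []
    for goal_id, event_id, blob in rows:
        for metric, target in iter_conditions(json.loads(blob)):
            flat.append((goal_id, event_id, metric, target))
    return flat


# Hypothetical rows as they might be read from the (key, event ID, blob) table.
stored = [
    (1, 100, '{"metric": "articles_created", "target": 10}'),
    (2, 100, '{"operator": "AND", "metrics": ['
             '{"metric": "articles_created", "target": 5},'
             '{"metric": "references_added", "target": 30}]}'),
]

flat_rows = flatten_goals(stored)

# "Find all goals targeting metric X", done after parsing rather than in SQL.
goals_targeting_articles = sorted({g for g, _, metric, _ in flat_rows if metric == "articles_created"})
```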

Reports that I could see in the future could be:

  • Goal outcomes report (which events: reached goals vs. not reached goals)
  • Most common goals for events, separated by event types & outcomes (i.e., are the goals reached?)
  • Top performing organizers (i.e., which organizers have events that most often reach the goals) - but this is a maybe because it could provide too much incentive to do 'easy wins' rather than harder things
  • Goals reached by topic(s) (is this possible? maybe mapping article topics, as defined in LiftWing?)
  • Goals reached by wiki

None of these would be available with pure SQL in the schema above. For that to be the case, we would need to store current progress (and possibly also store structured goals, so no JSON). I recommended against both things above. Because there are no current plans to make these reports, I stand by my previous recommendations of doing what seems like it would work best for the application. If we want these reports at a later date, we can think then about the implementation. For example, we could expose an API endpoint or, possibly even better, a maintenance script that runs on a cronjob and generates statistics that are then read and imported elsewhere. Once again, I'm not familiar with the analytics tooling and therefore I don't know what would work best. But still, this is for a hypothetical future, so I would not worry about it for now.


Brief note on non-numeric goals: it's already implied above, but the JSON format would support that seamlessly. However, I don't think this is something we should worry about at this time. If we want to display progress towards a goal, we need numbers that we can compare; and all the current proposed goals have those. Non-numeric goals would be an entirely different family of goals that would most probably need changes elsewhere anyway (like to how progress is tracked and displayed), so we can perhaps discuss them later if the need arises.


Assuming that the above is agreed upon, we will need to create tasks for the following things. Note that new class names are just examples, and in general, this is an ordered list (things that come first are needed by things that come later); there's some leeway but it's beside the point for this task and can be discussed separately if needed. Also, if we need a flag for "show progress on event page", it will need to be handled in all the steps below except for progress tracking and deployment stuff, but for simplicity I'm not mentioning it every time.

  • New database table (similar to e.g. T381424; and T402816 and subtask if we also need the flag)
  • Entity layer, add goals as a property of events (EventRegistration, a new EventGoal class)
  • Storage layer / DB code (EventStore; a new EventGoalsStore which also handles the JSON parsing)
  • Behaviour layer, goal validation (EventFactory)
  • Behaviour layer, progress tracking (a new GoalProgressTracker, probably outputs just a floating point percentage)
  • API layer (event registration POST, PUT, and GET endpoints)
  • UI layer for adding goals (SpecialEnableEventRegistration)
    • We'll also need tasks for the UI layer for showing goals, but those will come out of T407786 instead
  • Beta deployment
  • Prod deployment
  • Feature flag removal
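As a small illustration of the progress-tracking step, the sketch below shows one way a tracker could return a floating point percentage that keeps growing past 100 once the goal is exceeded, in line with the acceptance criteria. The function names are hypothetical, and capping the value for display is an assumption about how a UI might use it, not something decided in this task.

```python
def progress_percentage(current, target):
    """Return progress as a percentage; values above 100 mean the goal was exceeded."""
    if target <= 0:
        raise ValueError("Target must be a positive number")
    return 100.0 * current / target


def bar_fill(percentage):
    """Cap the value used to draw a progress bar, without losing the real percentage."""
    return min(percentage, 100.0)


exceeded = progress_percentage(15, 10)  # Goal of 10 articles, 15 created so far.
halfway = progress_percentage(5, 10)
```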

Thanks for the detailed analysis — this all sounds good. I just have one concern regarding the JSON blob approach. Because of the limitations you mentioned, I would prefer *not* to store goals as JSON blob data.

For the reports mentioned, I believe those would be primarily internal and consumed through new Superset dashboards. That means having the data in a structured database format would make querying and visualization much easier. With structured tables, we can build dashboards quickly, and analytics will not need an intermediate processing layer just to parse blobs.

A structured approach also allows us to answer questions such as:

  • “Which goals target metric X?”
  • “Which goals have a target greater than Y?”
  • “How many events reached their goals per organizer/topic/wiki?”

All of these are significantly harder or not feasible when using JSON blobs that cannot be queried directly by SQL.

My proposal would be:

  • A main table for goals: ce_event_goals
  • A mapping table for metrics associated with goals: ce_event_goal_metrics

This structure allows:

  • Each goal to track one or multiple metrics (future multi-metric support)
  • Efficient reporting and aggregation for dashboards
  • Better maintainability and schema evolution
  • Easier progress tracking and analytics integration
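To illustrate what this could look like, here is a minimal sketch using Python's sqlite3 module. Only the two table names come from the proposal; every column name, the column prefixes, and the sample data are hypothetical. It shows how the first two questions above become plain SQL queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns; only the table names come from the proposal.
    CREATE TABLE ce_event_goals (
        ceeg_id INTEGER PRIMARY KEY,
        ceeg_event_id INTEGER NOT NULL
    );
    CREATE TABLE ce_event_goal_metrics (
        ceegm_id INTEGER PRIMARY KEY,
        ceegm_goal_id INTEGER NOT NULL REFERENCES ce_event_goals (ceeg_id),
        ceegm_metric TEXT NOT NULL,
        ceegm_target INTEGER NOT NULL
    );
    INSERT INTO ce_event_goals VALUES (1, 100), (2, 200);
    INSERT INTO ce_event_goal_metrics VALUES
        (1, 1, 'articles_created', 10),
        (2, 1, 'references_added', 30),
        (3, 2, 'articles_created', 25);
""")

# "Which goals target metric X?"
goals_targeting_metric = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT ceegm_goal_id FROM ce_event_goal_metrics "
        "WHERE ceegm_metric = ? ORDER BY ceegm_goal_id",
        ("articles_created",),
    )
]

# "Which goals have a target greater than Y?" (here: more than 20 articles created)
goals_above_target = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT ceegm_goal_id FROM ce_event_goal_metrics "
        "WHERE ceegm_metric = ? AND ceegm_target > ? ORDER BY ceegm_goal_id",
        ("articles_created", 20),
    )
]
```

The third question (events that reached their goals) would still need current progress values joined in from wherever contributions are stored, so it is not covered by this sketch.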

Since the intention is to eventually present aggregated results (internally or externally), having structured data from the beginning will reduce future rework.

So for these reasons, I would strongly favor a structured relational schema over JSON blobs.

cc: @Daimona, @ifried, @MHorsey-WMF, @VPuffetMichel

For the reports mentioned, I believe those would be primarily internal and consumed through new Superset dashboards.

The problem I have with this is that it's not clear if or when such reports will ever exist, whereas we have pretty concrete plans for goal-setting and its upcoming iterations. As I mentioned above, I would consider analytics to be of secondary importance, and no different than any other client. If a choice needs to be made between something that works best for the application but not analytics, and another approach that works best for analytics but not for the application, I would choose the former. A compromise would be better of course, but that's not always possible or easy.

and analytics will not need an intermediate processing layer just to parse blobs.

I know, but I would consider this a limitation on the analytics side to be addressed there. I'm not keen on making the application worse just because our analytics software can't do JSON parsing.

My proposal would be:

  • A main table for goals: ce_event_goals
  • A mapping table for metrics associated with goals: ce_event_goal_metrics

How would that work? For example, how would it represent the JSON blob I put as an example in T407028#11402480?

Since the intention is to eventually present aggregated results (internally or externally), having structured data from the beginning will reduce future rework.

My argument is that having unstructured data will also reduce future rework, and I see it as a matter of where the rework needs to happen. And of the two, I favour the option in which it's the analytics side that needs to be reworked, rather than the application side.

Of course we could maybe look for compromises. The structured storage option could be particularly suitable if we impose some limits that we're reasonably sure we won't want to change later (things like: only allow AND, not OR, within a goal; have a maximum of X joined conditions; etc.).

One note on the reports, since I see this is becoming a big topic of the conversation: We have no immediate plans to generate reports, but if we do, I think some of them would be public (but probably not all of them). Ideally, the public reports would be available to all editors (not just organizers), so they can see the impact of the events (but we do not yet know the restrictions, since we have not developed any proposals for review by legal/t&s/security). As for the likelihood that we'll want to generate these reports: I would say medium likelihood. No sense of timeline yet - maybe next fiscal year, or later.

Also, regarding the topic of multiple goals: I have thought about this a bit more, and I think we should just have separate goals rather than sub-goals. In other words, if an event organizer wants to create 30 articles with 100 references and 20 images, then that should be 3 separate goals rather than 1 goal with 3 parts. This is because it will be easier to have a sense of accomplishment per each goal, and it will be less messy to understand the outcomes overall. I can imagine some organizers requesting sub-goals, so I'm not totally ruling it out, but at this point, it seems unnecessarily complex and confusing.

How would that work? For example, how would it represent the JSON blob I put as an example in T407028#11402480?

To represent what you added in the JSON, we would need a more complex structure like the one below.

NOTE: I am not saying that we should do it this way; JSON also sounds good to me if the product side is ok with its limitations.

Main Metrics:

  • Can be combined with each other using configurable operators (AND or OR)
  • Each main metric can have associated sub-metrics
  • The operator between main metrics is defined by the ceegmm_operator field in each row, following the order determined by ceegmm_position

Sub Metrics:

  • Always related to a specific main metric
  • Can be combined with the main metric using configurable operators (AND or OR)
  • Can have multiple sub-metrics per main metric
  • The operator between sub-metrics is configurable

Data Example

The operators are defined in each row of the tables, following the order determined by the position field:

main_metrics (ordered by position):
    position: 1
    metric id: 1 // articles_created
    target: 10
    operator: OR  // combines with next metric (position 2)

    position: 2
    metric id: 2 // articles_edited
    target: 6
    operator: OR  // combines with next metric (position 3)

    position: 3
    metric id: 3 // words_added
    target: 7
    operator: NULL  // last metric, no next one

sub_metrics (ordered by position within each main_metric):
    position: 1
    sub_id: 1
    metric id: 4 // links_added
    main_metric_id: 1
    target: 8
    operator: NULL  // last sub-metric for this main metric

    position: 1
    sub_id: 2
    metric id: 5 // images_added
    main_metric_id: 2
    target: 9
    operator: NULL  // last sub-metric for this main metric

Resulting expression (following position order):

(10 articles_created AND 8 links_added) OR
(6 articles_edited AND 9 images_added) OR
(7 words_added)

Note: The combination follows the sequence: metric[position=1] [operator] metric[position=2] [operator] metric[position=3]. Each row's operator defines how it combines with the next item in the sequence.
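The rules above can be sketched in Python as follows, using the rows from the data example. Two things are assumptions on my part: that `main_metric_id` refers to the main row it belongs to (in the example, position and metric id coincide), and that sub-metrics are joined to their main metric with AND, as in the resulting expression. The per-row operator of sub-metrics is ignored here because each main metric has at most one sub-metric in the example.

```python
main_metrics = [
    {"id": 1, "position": 1, "metric": "articles_created", "target": 10, "operator": "OR"},
    {"id": 2, "position": 2, "metric": "articles_edited", "target": 6, "operator": "OR"},
    {"id": 3, "position": 3, "metric": "words_added", "target": 7, "operator": None},
]
sub_metrics = [
    {"main_metric_id": 1, "position": 1, "metric": "links_added", "target": 8, "operator": None},
    {"main_metric_id": 2, "position": 1, "metric": "images_added", "target": 9, "operator": None},
]


def build_expression(main_rows, sub_rows, main_sub_operator="AND"):
    """Rebuild the readable goal expression from the ordered table rows."""
    parts = []
    for main in sorted(main_rows, key=lambda row: row["position"]):
        terms = [f"{main['target']} {main['metric']}"]
        subs = sorted(
            (sub for sub in sub_rows if sub["main_metric_id"] == main["id"]),
            key=lambda row: row["position"],
        )
        terms.extend(f"{sub['target']} {sub['metric']}" for sub in subs)
        parts.append("(" + f" {main_sub_operator} ".join(terms) + ")")
        # Each main row's operator joins it to the next main row, if any.
        if main["operator"] is not None:
            parts.append(main["operator"])
    return " ".join(parts)


expression = build_expression(main_metrics, sub_metrics)
```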

I know, but I would consider this a limitation on the analytics side to be addressed there. I'm not keen on making the application worse just because our analytics software can't do JSON parsing.

Yes, I know it would add extra complexity. As I said, I am fine with the JSON approach if the product side is ok with its limitations.

One note on the reports, since I see this is becoming a big topic of the conversation: We have no immediate plans to generate reports, but if we do, I think some of them would be public (but probably not all of them). Ideally, the public reports would be available to all editors (not just organizers), so they can see the impact of the events (but we do not yet know the restrictions, since we have not developed any proposals for review by legal/t&s/security). As for the likelihood that we'll want to generate these reports: I would say medium likelihood. No sense of timeline yet - maybe next fiscal year, or later.

Just to make sure I understand this correctly: I imagine public reports would mean no Superset, and therefore more flexibility as to the software used for reporting, right?

Also, regarding the topic of multiple goals: I have thought about this a bit more, and I think we should just have separate goals rather than sub-goals. In other words, if an event organizer wants to create 30 articles with 100 references and 20 images, then that should be 3 separate goals rather than 1 goal with 3 parts. This is because it will be easier to have a sense of accomplishment per each goal, and it will be less messy to understand the outcomes overall. I can imagine some organizers requesting sub-goals, so I'm not totally ruling it out, but at this point, it seems unnecessarily complex and confusing.

Well, if multi-metric goals (see terminology in my previous comment) are not needed, surely we can get away with a much simpler structure in pure SQL. However, it's important to note that the two seem to require fundamentally different data structures, so if we wanted to add support later, it would require a live data migration. Even if we choose not to expose it now, we can make it so that the database structure already supports multi-metric goals. (Also: FWIW, I agree with your point re: the complexity of understanding the outcome, but I could also see multi-metric goals being used for things like contests, where there might be rules such as "edits with at least 1 citation added" etc.).

To represent what you added in the JSON we would need a more complex structure like below

I see, I suppose something like that would work, although it looks very complex. Maybe it would be possible to simplify by putting all goals in disjunctive normal form (for example), so we don't have to store operators; it still looks quite complex, but maybe that's just me not having a clear enough picture.

I see, I suppose something like that would work, although it looks very complex. Maybe it would be possible to simplify by putting all goals in disjunctive normal form (for example), so we don't have to store operators; it still looks quite complex, but maybe that's just me not having a clear enough picture.

Yeah, it is a little bit complex, but if we decide to use structured tables, I think it will be worth it. That said, if we go in this direction, we will need to refine the structure — the one I sent was just to demonstrate the feasibility.

If we do not plan to have Superset dashboards for this data, I am totally fine with using a JSON blob, which would be easier.

Update: following discussion with @cmelo, it seems possible to preprocess the JSON before building the Superset graph. Therefore, since that was the only reason against JSON, we decided to proceed with the JSON approach. Implementation tasks are being created following the rough list at the end of T407028#11402480.

Thank you to everyone who took part in the discussion for this investigation! We are now building out the epic based on its findings, so I am marking this ticket as done.