
Rewrite Wishlist Survey bot
Open, High, Public

Description

Background

The WishlistSurvey task that has powered the Community Wishlist Surveys from 2016 to 2023 badly needs a rewrite. It is essentially 7 years of hacks built on top of each other and has a number of unnecessary dependencies, limitations, and other issues including but not limited to:

  • Written in Ruby (not widely used by Community Tech)
  • Must live on the musikbot tool on Toolforge
  • Requires manual runs to take advantage of various scripts
  • etc.

The bot should be rewritten so that people other than MusikAnimal can maintain it.

Acceptance criteria

  • It should be written in Node.js, sharing some logic with the intake form, such as parsing wish content
  • It should do all the things the old bot did, but match the new designs

More requirements and/or subtasks TBD as we get further into the project.

Details

Title | Reference | Author | Source Branch | Dest Branch
Allow the bot to run continuously | repos/commtech/wishlist-intake!78 | tstarling | work/tstarling/continuous-bot | main
Have the bot make a table of recent wishes for the dashboard | repos/commtech/wishlist-intake!72 | tstarling | work/tstarling/recent-wishes-bot | main
update-indexes: RC and caching | repos/commtech/wishlist-intake!48 | tstarling | work/tstarling/update-indexes | main
OOP CLI commands | repos/commtech/wishlist-intake!35 | tstarling | work/tstarling/cli-refactor | main
Add more wishlist pages, templates and basic bot run | repos/commtech/wishlist-intake!22 | musikanimal | templates | main
Add basic entrypoint for the bot | repos/commtech/wishlist-intake!20 | musikanimal | bot | main
Use wiki page when editing wishes, add centralized config and Util class | repos/commtech/wishlist-intake!17 | musikanimal | util | main

Event Timeline

musikanimal merged https://gitlab.wikimedia.org/repos/commtech/wishlist-intake/-/merge_requests/17

Use wiki page when editing wishes, add centralized config and Util class

Proposed implementation plan:

  • Use sqlite3 for data persistence, using the filesystem to avoid the need for an SSH tunnel to a Cloud Services database (or local MySQL-like dependency)
  • Bot runs every 10 minutes, and we keep track of the last run in the db (because sometimes Toolforge jobs fail)
  • On each run, it loops through all wishes, finding those that have been edited since the last run (see the sketch after this list)
  • Parse each of the new proposals, updating the db
  • Update each of the index pages by running the appropriate query and building the output
    • Maybe have a placeholder, e.g. <div id="wishlist-intake-wish-table">, so the bot knows where it should do its editing.
    • Utilize on-wiki templates as much as possible. E.g. the "Focus area" cards or each row in the "Wishes" table would be a template.
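
To make that loop concrete, here's a rough sketch assuming better-sqlite3 for the persistence piece; fetchWishesEditedSince(), parseWish(), upsertWish() and rebuildIndexPages() are hypothetical helpers, not existing code:

```js
// Sketch only; assumes the better-sqlite3 package. Helper functions are hypothetical.
const Database = require('better-sqlite3');

const db = new Database('wishlist.db');
db.exec("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)");

async function runOnce(api) {
  // Capture the start time first, so edits made during the run aren't skipped next time.
  const thisRun = new Date().toISOString();
  const row = db.prepare("SELECT value FROM meta WHERE key = 'last_run'").get();
  const lastRun = row ? row.value : '1970-01-01T00:00:00Z';

  // Re-parse every wish edited since the last successful run, updating the db.
  for (const wish of await fetchWishesEditedSince(api, lastRun)) {
    upsertWish(db, parseWish(wish));
  }

  // Regenerate the index pages from the db.
  await rebuildIndexPages(api, db);

  // Only advance the marker once everything above has succeeded,
  // so a failed Toolforge job is retried from the same point.
  db.prepare(
    "INSERT INTO meta (key, value) VALUES ('last_run', ?) " +
    "ON CONFLICT(key) DO UPDATE SET value = excluded.value"
  ).run(thisRun);
}
```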

My original plan was to save the database work for later, but given how the index pages are laid out, it's almost easier to put everything in a db than to do all the sorting and whatnot in JavaScript.

Going a step further, maybe the gadget should run the queries and update the index pages? This would require us to have a Vite server for the backend with a simple CORS-compliant API. That's more work, but moves presentation logic to the frontend where it belongs (as Vue templates).

Thoughts? cc @tstarling

> Proposed implementation plan:
>
>   • Use sqlite3 for data persistence, using the filesystem to avoid the need for an SSH tunnel to a Cloud Services database (or local MySQL-like dependency)

I would prefer MySQL over SQLite+NFS. I agree that it's good to avoid the need for an SSH tunnel, but a local database with a few cache tables and no wiki views should be easy enough to set up. The SQLite FAQ warns against running it over NFS, since if two copies of the job are somehow run, the database file could be permanently corrupted, requiring human intervention.

> Going a step further, maybe the gadget should run the queries and update the index pages? This would require us to have a Vite server for the backend with a simple CORS-compliant API. That's more work, but moves presentation logic to the frontend where it belongs (as Vue templates).

I thought we were going to allow changes to the proposal pages that were not done by the gadget. Like API and no-JS edits.

> I would prefer MySQL over SQLite+NFS. I agree that it's good to avoid the need for an SSH tunnel, but a local database with a few cache tables and no wiki views should be easy enough to set up. The SQLite FAQ warns against running it over NFS, since if two copies of the job are somehow run, the database file could be permanently corrupted, requiring human intervention.

Great point about NFS! That makes SQLite pretty much a no-go. Unless we jump the gun and go for VPS? There's zero need for that with a simple Node.js bot, but if we start offering an API for the gadget to read from (see below), then we might eventually stretch our limits on Toolforge and have to move to VPS – as is the story of many CommTech tools.

I'm fine with MySQL/MariaDB too if that makes more sense, and that's VPS future-proof as well. In credentials.json, each entry could have its own db credentials, such that you could always use tools-db if you wanted, etc. Sounds nice!
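
For illustration, an entry might look something like this; every key and value here is a hypothetical placeholder rather than an existing schema (JSON allows no comments, so the hedging has to live here):

```json
{
  "metawiki": {
    "apiUrl": "https://meta.wikimedia.org/w/api.php",
    "username": "WishlistBot",
    "password": "********",
    "db": {
      "host": "tools.db.svc.wikimedia.cloud",
      "database": "s00000__wishlist",
      "username": "s00000",
      "password": "********"
    }
  }
}
```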

>> Going a step further, maybe the gadget should run the queries and update the index pages? This would require us to have a Vite server for the backend with a simple CORS-compliant API. That's more work, but moves presentation logic to the frontend where it belongs (as Vue templates).

> I thought we were going to allow changes to the proposal pages that were not done by the gadget. Like API and no-JS edits.

I meant more for the "index" pages (i.e. the dashboard), not individual proposals. Manual edits to proposals are exactly why we need a bot! What I'm questioning, I guess, is the time investment in making an API that the gadget can read from when it builds the index pages. We can certainly save that effort for later, at least.

The index pages are going to be one per focus area, aren't they? And the focus areas will be dynamically definable by e.g. creating a page with a {{Community Wishlist/Focus area}} template (with other metadata such as descriptions)? All of which is pretty much like the old categories system.

And that all sounds great for focus areas, because I assume we'll aim to never have more than a dozen proposals in each or whatever, but will we at some point end up with index pages (e.g. the list of focus areas) that are too large for a single page? In that situation, do we build a manual pagination system (akin to archive pages) that the bot can manage?

We had talked about having a dynamic index page, that looks up the data live from a toolforge API, but I'm guessing that's out of scope now?

(Sorry, maybe that's all just premature optimisation for when there are a thousand proposals!)

Using MariaDB sounds good, with the connection config living alongside the wiki API details in credentials.json. Another advantage of having the data in a toolsdb is that (after T348407) it'll be queryable in Quarry (or maybe Superset soon; I thought there was a Community Wishlist Survey task for that but I can't find it now).

> The index pages are going to be one per focus area, aren't they? And the focus areas will be dynamically definable by e.g. creating a page with a {{Community Wishlist/Focus area}} template (with other metadata such as descriptions)? All of which is pretty much like the old categories system.

That sounds correct, but FYI by "index page" I also refer to the "Wishes" page that lists all wishes.

> And that all sounds great for focus areas, because I assume we'll aim to never have more than a dozen proposals in each or whatever, but will we at some point end up with index pages (e.g. the list of focus areas) that are too large for a single page? In that situation, do we build a manual pagination system (akin to archive pages) that the bot can manage?

TBD. For now I say we just throw it all on one page; it will be a while before it becomes a problem.

> We had talked about having a dynamic index page, that looks up the data live from a toolforge API, but I'm guessing that's out of scope now?

If we're storing the data in MariaDB, then it's not too far off to go ahead and make a lightweight server. I am looking into using Vite for this, which would also replace Rollup and possibly the need for a dedicated Node server. If that all works out, it should be relatively simple to add an Express API app to serve the data to wishlist intake.
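
If that pans out, the API app could stay tiny. A sketch assuming the express, cors, and mysql2 packages, with made-up route and column names:

```js
// Sketch only; the route and column names are illustrative, not a final API.
const express = require('express');
const cors = require('cors');
const mysql = require('mysql2/promise');

const app = express();
// Only the wikis the gadget runs on should be allowed origins.
app.use(cors({ origin: 'https://meta.wikimedia.org' }));

// Connection settings would come from credentials.json, as discussed above.
const pool = mysql.createPool({ host: 'localhost', database: 'wishlist' });

app.get('/wishes', async (req, res) => {
  const [rows] = await pool.query(
    'SELECT wish_page, wish_title, wish_area FROM wishes ORDER BY wish_updated DESC'
  );
  res.json(rows);
});

app.listen(process.env.PORT || 8000);
```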

I suggest renaming bot.js to update-indexes.js.

Also, if it were up to me, it would all be OOP; I wouldn't put the bulk of the logic in the file scope. I'd just have a couple of lines of boilerplate in bin/ and reorganise src/ so that the bot-related classes can go there. Reorganising is going to be painful when we have 4 people editing the same files every day, but maybe it's not too bad if we aggressively merge all outstanding work into the main branch. Keep in mind that the main branch doesn't have to be production quality at this stage.
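
To illustrate the kind of split I mean (file and class names are placeholders):

```js
// bin/update-indexes.js -- just boilerplate; the real logic lives in a src/ class.
const UpdateIndexes = require('../src/bot/UpdateIndexes.js');

new UpdateIndexes().run().catch((err) => {
  console.error(err);
  process.exit(1);
});
```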

Regarding the main query: currently the script uses generator=categorymembers feeding into prop=revisions. It would be possible instead to use generator=recentchanges feeding into prop=categories with clcategories=Category:Community_wishlist. Meta gets about 7 RC rows per minute in total, so if you run the bot once per minute, generator=recentchanges becomes the more efficient query strategy once you have more than 7 wishes.

However, generator=recentchanges requires some extra work to process removal from a category, page moves and deletion.

Either way, you need to follow up with a prop=revisions query to get the changed page text.
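
Roughly, assuming an API client with a get(params) method and a lastRunTimestamp from the cache DB, the two requests would look like this (parameter values are illustrative):

```js
// Step 1: pages changed since the last run, with their membership in the wishlist
// category (recentchanges lists newest first, so the older bound goes in grcend).
const changed = await api.get({
  action: 'query',
  generator: 'recentchanges',
  grcend: lastRunTimestamp,
  grclimit: 'max',
  prop: 'categories',
  clcategories: 'Category:Community_wishlist',
});

// Keep only pages still in the category; 'categories' is absent otherwise.
const pageIds = Object.values(changed.query.pages)
  .filter((p) => p.categories)
  .map((p) => p.pageid);

// Step 2: follow up with prop=revisions for the changed page text.
const content = await api.get({
  action: 'query',
  pageids: pageIds.join('|'),
  prop: 'revisions',
  rvprop: 'content|timestamp',
  rvslots: 'main',
});
```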

I would lean towards regenerating the whole index page each time the metadata suggests there will be a change in it, rather than trying to identify and preserve unchanged wikitext.

In the cache DB I imagine there will be the broken-down template parameters from WishlistTemplate.getData(), and also any necessary metadata such as the latest timestamp.
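
For instance, something along these lines; apart from wish_area, the column names are guesses rather than a settled schema:

```sql
-- Sketch: one row per wish with the broken-down template parameters,
-- plus a small metadata table for things like the latest timestamp.
CREATE TABLE wish (
  wish_page    VARCHAR(255) PRIMARY KEY,  -- permanent subpage name
  wish_title   VARCHAR(255),
  wish_area    VARCHAR(255),
  wish_updated VARBINARY(14)              -- MW-style timestamp
);

CREATE TABLE meta (
  meta_key   VARCHAR(32) PRIMARY KEY,     -- e.g. 'last_run'
  meta_value BLOB
);
```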

With !59 I'm proposing the following for bot-automated pages:

  • Template:Community Requests/Wishes/All – Contains rows for all wishes (bot already does this; see the example after this list)
  • Template:Community Requests/Wishes/Recent – last N proposals sorted by date
  • Template:Community Requests/Focus areas – Contains overview cards of all focus areas, populated with templates similar to what we do for Wishes/All
  • Template:Community Requests/Focus areas/Num_votes – Contains the number of votes across all focus areas. This is needed in the i18n messages, as it seems they're likely to live on-wiki at least initially (T365471)
  • Template:Community Requests/Focus areas/<focus-area-slug> – Contains rows for wishes in a focus area.
  • Template:Community Requests/Focus areas/<focus-area-slug>/Num_votes – Contains the number of votes in a focus area, again for use in i18n messages.
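
For example, Wishes/All could end up as one template call per row, along these lines (the row template and its parameters are illustrative only):

```
{{Community Requests/Wish row |page=... |title=... |area=... |votes=...}}
{{Community Requests/Wish row |page=... |title=... |area=... |votes=...}}
```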

Random tech debt thing -- we have a few terms for whatever the subpage name for a wish/focus area is. Is slug a good name and if so maybe the db and backend things could be renamed? Or vice versa :)

> With !59 I'm proposing the following for bot-automated pages:

I should mention, the other option of course is to build a wrapper API and let the gadget talk to it. But between CSP warnings and whatever else, I'm assuming the old-fashioned way will do fine for now.

> Random tech debt thing -- we have a few terms for whatever the subpage name for a wish/focus area is. Is slug a good name and if so maybe the db and backend things could be renamed? Or vice versa :)

Is this WordPress jargon? To be honest I had to look it up. In the template we have "area", which is in the database as wish_area. I would just call it the area name. The URL form of the area name would have underscores and percent-encoding per normal MW conventions. "Focus" seems undecided since the renaming page calls them "problem areas" -- that's the same thing, right?

>> Random tech debt thing -- we have a few terms for whatever the subpage name for a wish/focus area is. Is slug a good name and if so maybe the db and backend things could be renamed? Or vice versa :)
>
> Is this WordPress jargon? To be honest I had to look it up. In the template we have "area", which is in the database as wish_area. I would just call it the area name. The URL form of the area name would have underscores and percent-encoding per normal MW conventions. "Focus" seems undecided since the renaming page calls them "problem areas" -- that's the same thing, right?

Eh I don't know, surprised you hadn't heard of it! I've used the term extensively over the years, and I've never worked on anything WordPress.

Anyway, by "slug" I just mean the unique identifier in the URL path. One thing we're doing from previous surveys is we're not moving pages when wishes (or focus areas) are renamed. Instead, there's a permanent page title. The subpage name of that is what I'm referring to by "slug". So the "area" name in that case would need to be the subpage name, and not the "area" title. I just figured calling it "area_slug" or something (and likewise for the wish table) would keep things consistent, but it doesn't really matter much.

"Focus" seems undecided since the renaming page calls them "problem areas" -- that's the same thing right?

Yes, focus area, topic area, and problem area are all the same thing. Naming won't be finalized until the rename vote has finished.