
Positive Reinforcement: Technical Research Spike
Closed, Resolved · Public

Description

As a Product Manager, I want to know if there are any technical roadblocks in the Positive Reinforcement project, because I want the Growth team to get started on this project soon.

The Growth team started a team discussion to review Positive Reinforcement ideas and to identify and secure dependencies on other teams and on Technology. One idea that needs further investigation and planning is:

> We are going to store a bunch of user-specific data that is normally public, because we don’t want to calculate this stuff on the fly. We could further utilize user_properties, but it would make sense as a generic capability: user storage in MediaWiki.
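
For context, user_properties is the table behind MediaWiki's user options system, so "further utilize user_properties" would look roughly like the sketch below. The option name, data shape, and values are hypothetical, and $user is assumed to be a UserIdentity already in scope:

```php
use MediaWiki\MediaWikiServices;

// Hypothetical option name for the precomputed per-user aggregates.
$optionName = 'growthexperiments-pr-impact-data';

$userOptionsManager = MediaWikiServices::getInstance()->getUserOptionsManager();

// Read the cached aggregates (stored as a JSON blob in user_properties).
$json = $userOptionsManager->getOption( $user, $optionName );
$data = $json !== null ? json_decode( $json, true ) : null;

// Write recomputed aggregates back (the field names are illustrative).
$userOptionsManager->setOption( $user, $optionName, json_encode( [
	'editCount' => 42,
	'thanksReceived' => 3,
	'computedAt' => wfTimestampNow(),
] ) );
$userOptionsManager->saveOptions( $user );
```

A caveat with this route: user_properties rows are loaded together with the rest of a user's options, so large blobs there are discouraged, which is part of why a dedicated table or a generic "user storage" capability is being floated.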

Acceptance Criteria:
Recommend next steps and answer questions:

  • Do we have any dependencies on other teams for Positive Reinforcement work? (No)
  • Will we build something temporary? (We will use a MySQL table for storage; I am not sure whether that is temporary or not.)
    • If we build a temporary solution, should we involve Platform in this early planning phase? (The Platform team was informed via T310253; we may switch to that at some point in the future.)

Event Timeline

Restricted Application added a subscriber: Aklapper.
kostajh triaged this task as High priority. · Jun 2 2022, 7:37 AM
kostajh edited projects, added Growth-Team (Current Sprint); removed Growth-Team.

I started a document for discussion.

I think the main open question is whether the additional database table, its proposed columns, and its usage are acceptable to the DBAs, so I'm tagging you all to ask for comments specifically on the "Storage" section of the document.

Generally I don't see an issue with the proposal; what has been bothering me about it is the daily cron and maintenance script, and the general plan of migrating to the data gateway while using it directly inside MediaWiki.

Let me explain further. What we currently have is a link recommendation system that seems to be decoupled from MediaWiki's database and looks nice and "modern", but it actually depends on the core database. We have a cron job that updates those values daily from the service and puts them into MediaWiki's database, which basically defeats the whole point of having the service in the first place and has been causing a lot of maintenance headaches for DBAs maintaining MediaWiki databases: T299021: Reduce running time of refreshLinkRecommendations.php to a maximum of 60 minutes (including prolonging several schema changes because the first run failed, etc.).
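
To make the criticized pattern concrete, a daily maintenance script of roughly this shape is what is being described. This is an illustrative sketch, not the actual refreshLinkRecommendations.php; the service URL and table/column names are placeholders:

```php
<?php
// Sketch of the "mirror a service's output into a core table" pattern.

use MediaWiki\MediaWikiServices;

$IP = getenv( 'MW_INSTALL_PATH' ) ?: __DIR__ . '/../../..';
require_once "$IP/maintenance/Maintenance.php";

class RefreshRecommendationsSketch extends Maintenance {
	public function execute() {
		// Fetch fresh values from the external recommendation service
		// (placeholder URL, not the real endpoint).
		$json = MediaWikiServices::getInstance()->getHttpRequestFactory()
			->get( 'https://recommendations.example.internal/batch' );
		$rows = json_decode( $json ?? '[]', true );

		$dbw = $this->getDB( DB_PRIMARY );
		foreach ( array_chunk( $rows, 500 ) as $batch ) {
			foreach ( $batch as $row ) {
				// Mirror the service output into a core table: this second
				// copy of the data is the part criticized above.
				$dbw->upsert(
					'link_recommendations_sketch',
					[ 'lr_page' => $row['pageId'], 'lr_data' => json_encode( $row ) ],
					[ [ 'lr_page' ] ],
					[ 'lr_data' => json_encode( $row ) ],
					__METHOD__
				);
			}
			// A long-running daily write loop like this is what collides
			// with DBA maintenance windows and schema changes (T299021).
			$this->waitForReplication();
		}
	}
}

$maintClass = RefreshRecommendationsSketch::class;
require_once RUN_MAINTENANCE_IF_MAIN;
```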

I was told that this Frankenstein system is temporary, but it has been temporary for way too long, and I don't think it's wise to add even more cases to it. We should either decide on it and get it delivered before adding more, or design this without keeping the data gateway in mind.

> Generally I don't see an issue with the proposal; what has been bothering me about it is the daily cron and maintenance script, and the general plan of migrating to the data gateway while using it directly inside MediaWiki.

> Let me explain further. What we currently have is a link recommendation system that seems to be decoupled from MediaWiki's database and looks nice and "modern", but it actually depends on the core database. We have a cron job that updates those values daily from the service and puts them into MediaWiki's database, which basically defeats the whole point of having the service in the first place and has been causing a lot of maintenance headaches for DBAs maintaining MediaWiki databases: T299021: Reduce running time of refreshLinkRecommendations.php to a maximum of 60 minutes (including prolonging several schema changes because the first run failed, etc.).

FWIW, we consulted with the SRE and DBA teams while figuring out the architecture for the Add Link project (see Add_Link#Updates). Moving the caching layer into the Link Recommendation service (as opposed to MW tables) is perfectly fine, but that's a separate discussion. If you'd like us to consider it, let's make a task for it.

> I was told that this Frankenstein system is temporary, but it has been temporary for way too long, and I don't think it's wise to add even more cases to it. We should either decide on it and get it delivered before adding more, or design this without keeping the data gateway in mind.

This sounds like a separate concern, although there are some similarities in the setup proposed. The main difference is that if we use the data gateway, we will use that exclusively for storage and not make any writes to MediaWiki database tables.
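
One way to keep "exclusively one backend" true at the code level is a single storage interface with exactly one implementation wired up at a time, so call sites can never write to both. All class, table, and column names below are hypothetical, not existing GrowthExperiments code:

```php
use MediaWiki\User\UserIdentity;
use Wikimedia\Rdbms\ILoadBalancer;

/**
 * Hypothetical abstraction: callers depend only on this interface,
 * so exactly one backend is ever written to.
 */
interface UserImpactStore {
	public function getImpact( UserIdentity $user ): ?array;
	public function setImpact( UserIdentity $user, array $data ): void;
}

/** Phase 1: the proposed MySQL table as the sole backend. */
class DatabaseUserImpactStore implements UserImpactStore {
	private ILoadBalancer $lb;

	public function __construct( ILoadBalancer $lb ) {
		$this->lb = $lb;
	}

	public function getImpact( UserIdentity $user ): ?array {
		$json = $this->lb->getConnection( DB_REPLICA )->selectField(
			'growth_user_impact', 'gui_data', // placeholder table/columns
			[ 'gui_user' => $user->getId() ],
			__METHOD__
		);
		return $json === false ? null : json_decode( $json, true );
	}

	public function setImpact( UserIdentity $user, array $data ): void {
		$this->lb->getConnection( DB_PRIMARY )->upsert(
			'growth_user_impact',
			[ 'gui_user' => $user->getId(), 'gui_data' => json_encode( $data ) ],
			[ [ 'gui_user' ] ],
			[ 'gui_data' => json_encode( $data ) ],
			__METHOD__
		);
	}
}

// A later DataGatewayUserImpactStore would implement the same interface,
// so switching backends (per T310253) becomes a service-wiring change
// rather than a second write path alongside the MediaWiki tables.
```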

>> Generally I don't see an issue with the proposal; what has been bothering me about it is the daily cron and maintenance script, and the general plan of migrating to the data gateway while using it directly inside MediaWiki.

>> Let me explain further. What we currently have is a link recommendation system that seems to be decoupled from MediaWiki's database and looks nice and "modern", but it actually depends on the core database. We have a cron job that updates those values daily from the service and puts them into MediaWiki's database, which basically defeats the whole point of having the service in the first place and has been causing a lot of maintenance headaches for DBAs maintaining MediaWiki databases: T299021: Reduce running time of refreshLinkRecommendations.php to a maximum of 60 minutes (including prolonging several schema changes because the first run failed, etc.).

> FWIW, we consulted with the SRE and DBA teams while figuring out the architecture for the Add Link project (see Add_Link#Updates). Moving the caching layer into the Link Recommendation service (as opposed to MW tables) is perfectly fine, but that's a separate discussion. If you'd like us to consider it, let's make a task for it.

There are no issues with the storage capacity for that data; it's the update pattern that seems problematic here. I don't mind whether we fix the current setup or throw it away completely in favor of memcached. Whatever works for you.
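
For reference, the memcached option maps naturally onto MediaWiki's WANObjectCache, computing values on demand rather than via a daily cron. A minimal sketch; the cache key components, TTL, and helper function are illustrative, and $user is assumed to be in scope:

```php
use MediaWiki\MediaWikiServices;

$cache = MediaWikiServices::getInstance()->getMainWANObjectCache();

$impact = $cache->getWithSetCallback(
	// Per-user cache key.
	$cache->makeKey( 'growthexperiments-user-impact', $user->getId() ),
	$cache::TTL_DAY,
	static function () use ( $user ) {
		// On a cache miss, compute the aggregates on demand instead of
		// mirroring them into a MediaWiki table from a daily cron.
		return computeUserImpact( $user ); // hypothetical helper
	}
);
```

The trade-off is that evicted values are recomputed at request time, so the worst-case latency moves to the read path instead of the cron.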

>> I was told that this Frankenstein system is temporary, but it has been temporary for way too long, and I don't think it's wise to add even more cases to it. We should either decide on it and get it delivered before adding more, or design this without keeping the data gateway in mind.

> This sounds like a separate concern, although there are some similarities in the setup proposed. The main difference is that if we use the data gateway, we will use that exclusively for storage and not make any writes to MediaWiki database tables.

Yes. My thinking is that we shouldn't end up in the same situation again. As long as you think we won't, I'm happy with it.

Just a quick note that I will update this task (or new tasks) as well as the architecture doc sometime tomorrow.