Sunset ApiFeatureUsage
Closed, Declined · Public

Description

As a search dev, I want to stop supporting (and remove) ApiFeatureUsage, as it is an infrequently used service that requires disproportionate work to maintain, so that I can upgrade Elasticsearch to 7.10 more cleanly.

AC:

  • ApiFeatureUsage is removed

Event Timeline

@MPhamWMF: Assuming this is about the ApiFeatureUsage extension at https://www.mediawiki.org/wiki/Extension:ApiFeatureUsage currently deployed on Wikimedia wikis. Is this about undeploying and sunsetting? (cf T294329)

@Aklapper I'm not sure I understand the differentiation, if any, between undeploying and sunsetting (vs decommissioning vs killing)? My team spends a lot of time and effort maintaining this extension every time we need to update our software. It seemed to be barely used the last time we checked, and no other product or tech teams have responded to say that this extension is useful or valuable to them.

Oops, my bad. I think I misread something, and there is no difference between undeploying and sunsetting (too many terms to keep up with sometimes).

MPhamWMF renamed this task from Decommission ApiFeatureUsage to Sunset ApiFeatureUsage. Feb 15 2022, 11:37 PM
MPhamWMF updated the task description.

I was curious:

My question is: Is this worth it? Which of these is critical and would hurt us if we couldn't access it any more via Special:ApiFeatureUsage or the featureusage API? I mean, the data is still there, even without the Elastic store and the extension, as far as I got it.

The special page lets normal API users find deprecation errors related to their User-Agent.

@Reedy , is there a reason this ticket was declined?

Yeah. T302638: Sunset ApiFeatureUsage (TDMP) was declined because it was "Withdrawn from Process"; in this case the Tech Decision Forum.

If it doesn't go through the right processes, how can it be expected to be approved?

The ticket was declined because @SWakiyama cleared my team to move forward with this work without pushing it through the rest of the Tech Decision process.
My understanding was that this unblocked the work for the search team.

There was no comment left to that effect on either this or that ticket.

How would anyone else necessarily know this?

There's also various unanswered questions such as those from Cole in T302638#7811894.

Nor the fact of what we do about the current people who actually use this.

Sorry for all the confusion. This process is new to me, and I don't yet fully understand how it works and what communication practices/norms are.

There was no comment left to that effect on either this or that ticket.
How would anyone else necessarily know this?

I was in communication with Linh via email about the process and let him know that we were moving ahead with this work. I wasn't personally aware that this was being tracked in the Phabricator ticket in a certain way (I'm not on Phabricator that much for my day-to-day PM work, and don't really use it to track all my personal work). I incorrectly assumed that the ticket was closed in an appropriate manner.

There's also various unanswered questions such as those from Cole in T302638#7811894.

I do not have the technical answers to these implementation questions. Perhaps @Gehel or @dcausse know more than I do.

Nor the fact of what we do about the current people who actually use this.

Current people using this feature will be notified in advance that this extension will no longer be usable for the foreseeable future until it is able to be rebuilt/redeployed without the Elasticsearch dependency. Search is not providing any workarounds in the meantime, and will not own this feature.

Reposting my questions that were ignored on T302638: Sunset ApiFeatureUsage (TDMP).

To summarize, there aren't major blocking issues around ApiFeatureUsage, but there is an ongoing level of small annoyances and additional work required to support it (and with no clear ownership, this falls to the Search Platform & Observability teams by default).

My immediate goal is to streamline the Search team's product portfolio and focus, and this feature does not fall within that scope.

I appreciate the desire by both of you to protect the Search team from toil and ownership of abandonware. I also think that this ticket was a successful tactic for finding out if anyone at all cares about Extension:ApiFeatureUsage.

Some additional guidance is needed to help all of us who might want to try and keep this functionality from disappearing even if we cannot convince the Foundation that the Action API is part of their project mandate. Specifically, we need to know what conditions, if any, must be met to continue to use Elasticsearch as the backing store for the (timestamp, feature, agent) tuples that are collected. Would breaking the (admittedly hacky and my fault as the original implementor) inter-dependency between the ELK cluster and the indices be enough? If not, what other technical and social corrections, if any, would be needed to continue to use consolidated full-text search infrastructure for this project?

This might not fully answer your question but I'll try :)
This is based on the assumption that the feature is vital not to bot owners but to engineers working on the WMF infrastructure who want to help bot owners change their tools because of performance issues or parameter deprecations:
Would using the Hive table event.mediawiki_api_request be sufficient for this purpose?

-- Top 10 users calling list=allpages on 2022-05-19
SELECT performer.user_text AS user_text, count(*) AS cnt
FROM event.mediawiki_api_request
WHERE params["list"] = "allpages" AND year = 2022 AND month = 5 AND day = 19
GROUP BY performer.user_text
ORDER BY cnt DESC
LIMIT 10;

Obviously, users without access to Hive cannot run this, but it might help in some circumstances by producing a list of bots that need to be updated.

Stepping back, I wonder what this feature would look like if we had to design it today:

  • First, I'm not entirely convinced that a search backend is the right option for this. None of the fields seem to be tokenized and, from a quick look at the code, the only features it requires are a range query on a date field and an aggregation on a string field.
  • It also seems to bypass the modern event platform, which enforces a schema, by using data from the logging platform, which currently allows schema-less data ingestion.
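
To make the first point concrete, here is a rough sketch of the kind of request body that combines a date range filter with a terms aggregation on an untokenized string field. The field names (`timestamp`, `feature`, `agent`) and the helper `build_usage_query` are assumptions for illustration, not taken from the extension's code; only the `bool`/`filter`, `range`, `term` and `terms` constructs are standard Elasticsearch query DSL.

```python
from datetime import date


def build_usage_query(agent: str, start: date, end: date, max_features: int = 50) -> dict:
    """Build a search body: filter by agent and date range, bucket by feature.

    Field names are hypothetical; adjust them to the real index mapping.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    # exact match on an untokenized (keyword) field
                    {"term": {"agent": agent}},
                    # range query on the date field
                    {"range": {"timestamp": {"gte": start.isoformat(),
                                             "lte": end.isoformat()}}},
                ]
            }
        },
        "aggs": {
            # aggregation on a string field: one bucket per feature name
            "by_feature": {"terms": {"field": "feature", "size": max_features}}
        },
    }


body = build_usage_query("ExampleBot/1.0", date(2022, 5, 1), date(2022, 5, 19))
```

Nothing here depends on full-text analysis, which is why a plain store with range scans and grouped counts could probably serve the same queries.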

So perhaps I'd consider using the existing mediawiki_api_request data and having a job that transforms it and ingests it into a Cassandra database with a small serving layer on top.
That said, the ultimate goal here is to simplify things, not to create new work :)
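
As a very rough sketch of what the transform step of such a job might do, the snippet below collapses individual request events into daily (day, feature, agent) counts that a small serving layer could then look up. The event shape (`timestamp`, `feature`, `agent` keys) and the names `UsageRow` and `summarize` are hypothetical; the real mediawiki_api_request schema and the Cassandra write path are not shown.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, NamedTuple


class UsageRow(NamedTuple):
    day: str      # ISO date, e.g. "2022-05-19"
    feature: str  # deprecated feature name
    agent: str    # User-Agent of the caller
    count: int    # number of matching requests that day


def summarize(events: Iterable[dict]) -> List[UsageRow]:
    """Collapse raw events into one row per (day, feature, agent)."""
    counts: Counter = Counter()
    for event in events:
        # Hypothetical event shape: ISO-8601 timestamp without timezone suffix
        day = datetime.fromisoformat(event["timestamp"]).date().isoformat()
        counts[(day, event["feature"], event["agent"])] += 1
    # Sorted output keeps the result deterministic for downstream writes
    return [UsageRow(d, f, a, n) for (d, f, a), n in sorted(counts.items())]


rows = summarize([
    {"timestamp": "2022-05-19T10:00:00", "feature": "f1", "agent": "ExampleBot/1.0"},
    {"timestamp": "2022-05-19T11:30:00", "feature": "f1", "agent": "ExampleBot/1.0"},
    {"timestamp": "2022-05-20T00:05:00", "feature": "f1", "agent": "ExampleBot/1.0"},
])
```

Pre-aggregating by day keeps the stored data small and means the serving layer only needs key lookups and date range scans.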

If we break the inter-dependency between the ELK cluster and the search cluster, we would already be in better shape, but this might not be enough.
Constraints on the search side come from version upgrades, maintenance operations, and the sensitivity of the data:

  • all writes must be compatible with both versions (as we upgrade one DC at a time)
  • the MW extension code must be maintained when we upgrade the Elastica library
  • the indices must be monitored to make sure that the tools that purge them are properly running
  • size of the indices must be monitored to ensure they do not grow out of control
  • we should probably switch to non-dynamic templates to avoid mapping explosion
  • having sensitive traffic data like the user-agent hosted on the search cluster without going through data-engineering checks is not something I'm very comfortable with

These are the kinds of points we would like to stop worrying about.

On the social side, we already manage non-search indices on the cluster. We generally file a task addressed to the index owner before important maintenance operations, but we obviously can't accept too many different kinds of indices. As of today we have:

  • search indices owned by the search team
  • translation memory indices owned by the language team
  • toolhub owned by the technical engagement team
  • api-feature-usage without clear maintainers

So I guess that if a team agreed to own this extension and its data, that would be a step in the right direction?