Sunset ApiFeatureUsage
Closed, DeclinedPublic

Assigned To

None

Authored By

	MPhamWMF
	Feb 14 2022, 8:33 PM

Description

As a search dev, I want to stop supporting (and remove) ApiFeatureUsage, as it is an infrequently used service that requires disproportionate work to maintain, so that I can upgrade Elasticsearch to 7.10 more cleanly.

AC:

ApiFeatureUsage is removed

Related Objects

Status	Assigned	Task
Resolved	None	T248925 Make MediaWiki release tarball compatible with PHP 8.0
Resolved	Jdforrester-WMF	T300463 Make PHP 8.0 voting on MW master
Resolved	None	T283275 Make MW master tests pass on PHP 8.0
Resolved	Reedy	T268861 CirrusSearch uses Elastica's Match class
Resolved	Reedy	T268863 Translate uses Elastica's Match class
Resolved	matthiasmullie	T268866 WikibaseMediaInfo uses Elastica's Match class
Invalid	None	T268864 WikibaseCirrusSearch uses Elastica's Match class
Resolved	Reedy	T268865 WikibaseLexemeCirrusSearch uses Elastica's Match class
Resolved	EBernhardson	T271777 Bump rufin/elastica (and related libraries) to versions that support PHP 8.0
Resolved	Gehel	T263142 [EPIC] Upgrade Elasticsearch to version 7.10
Declined	None	T301724 Sunset ApiFeatureUsage
Declined	None	T302638 Sunset ApiFeatureUsage (TDMP)

Event Timeline

MPhamWMF created this task. · Feb 14 2022, 8:33 PM

Restricted Application added a subscriber: Aklapper. · Feb 14 2022, 8:33 PM

@MPhamWMF: Assuming this is about the ApiFeatureUsage extension at https://www.mediawiki.org/wiki/Extension:ApiFeatureUsage currently deployed on Wikimedia wikis. Is this about undeploying and sunsetting? (cf T294329)

https://www.mediawiki.org/wiki/Developers/Maintainers lists "Core Platform Team" as code stewards.

MPhamWMF added a parent task: T263142: [EPIC] Upgrade Elasticsearch to version 7.10. · Feb 14 2022, 11:33 PM

@Aklapper I'm not sure I understand the differentiation, if any, between undeploying and sunsetting (vs decommissioning vs killing)? My team spends a lot of time and effort maintaining this extension every time we need to update our software. It seemed to be barely used the last time we checked, and no other product or tech teams have responded to say that this extension is useful or valuable to them.

Oops, my bad. I think I misread something, and there is no difference between undeploying and sunsetting (too many terms to keep up with sometimes).

MPhamWMF renamed this task from Decommission ApiFeatureUsage to Sunset ApiFeatureUsage. · Feb 15 2022, 11:37 PM

MPhamWMF updated the task description.

MPhamWMF moved this task from needs triage to test column on the Discovery-Search board. · Feb 18 2022, 6:30 PM

MPhamWMF moved this task from test column to needs triage on the Discovery-Search board. · Feb 18 2022, 7:01 PM

MPhamWMF mentioned this in T302638: Sunset ApiFeatureUsage (TDMP). · Feb 25 2022, 10:20 PM

Aklapper added a subtask: T302638: Sunset ApiFeatureUsage (TDMP). · Feb 25 2022, 10:35 PM

MPhamWMF moved this task from needs triage to watching / waiting on the Discovery-Search board. · Mar 7 2022, 3:50 PM

colewhite subscribed. · Mar 23 2022, 10:53 PM

I was curious:

There is effectively only a single (relevant) call to ApiBase::logFeatureUsage(): https://codesearch.wmcloud.org/search/?q=>logFeatureUsage&files=\.php$
It's in ApiBase::addDeprecation(), which is obviously called way more often. However, it turns out we care only about ApiBase::addDeprecation() calls with at least 2 parameters.
Fast search for single-line calls (warning, this is not everything): https://codesearch.wmcloud.org/search/?q=>addDeprecation\([^(,)]*,&files=\.php$. There are 5 in core and 2 in Echo.
Unfortunately the query that would cover everything doesn't work for me: https://codesearch.wmcloud.org/search/?q=(?s)>addDeprecation\([^(,)]*,&files=\.php$
I did the same regex search in my (limited) dev environment and found 8 relevant calls in core, 3 in Echo, and 1 in TemplateData.
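
The gap between the single-line search and the full search can be reproduced locally. The following is only a minimal sketch, not what codesearch runs internally: it applies the same pattern to a PHP snippet, once line by line and once across the whole text. The sample snippet, including its message keys and feature names, is made up for illustration.

```python
import re

# Same pattern as the codesearch query above: a call whose first argument
# contains no parentheses or commas, followed by a comma (so a second
# argument exists). In Python, the negated class [^(,)] also matches
# newlines, so applying the pattern to a whole file finds multi-line calls.
TWO_ARG_CALL = re.compile(r">addDeprecation\([^(,)]*,")

# Hypothetical PHP snippet; the message keys and feature names are invented.
SAMPLE_PHP = """
$this->addDeprecation( 'apiwarn-a', 'feature-a' );
$this->addDeprecation(
    'apiwarn-b',
    'feature-b'
);
$this->addDeprecation( 'apiwarn-c' );
"""


def count_two_arg_calls(php_source: str, multiline: bool = True) -> int:
    """Count addDeprecation() calls that pass at least two arguments."""
    if multiline:
        # Search the whole text, so calls split over several lines count.
        return len(TWO_ARG_CALL.findall(php_source))
    # Search each line separately, like a line-based code search.
    return sum(len(TWO_ARG_CALL.findall(line)) for line in php_source.splitlines())


if __name__ == "__main__":
    print("line by line:", count_two_arg_calls(SAMPLE_PHP, multiline=False))
    print("whole text:  ", count_two_arg_calls(SAMPLE_PHP, multiline=True))
```

Note that the pattern is a heuristic: a single array argument that contains a comma would also match, so the hits still need a manual check.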

My question is: Is this worth it? Which of these is critical and would hurt us if we couldn't access it any more via Special:ApiFeatureUsage or the featureusage API? I mean, the data is still there, even without the Elastic store and the extension, as far as I got it.

In T301724#7804246, @thiemowmde wrote:

My question is: Is this worth it? Which of these is critical and would hurt us if we couldn't access it any more via Special:ApiFeatureUsage or the featureusage API? I mean, the data is still there, even without the Elastic store and the extension, as far as I got it.

The special page lets normal API users find deprecation errors related to their User-Agent.

matmarex subscribed. · Apr 7 2022, 1:39 AM

• LNguyen closed subtask T302638: Sunset ApiFeatureUsage (TDMP) as Resolved. · May 12 2022, 6:45 PM

bd808 changed the status of subtask T302638: Sunset ApiFeatureUsage (TDMP) from Resolved to Declined. · May 13 2022, 12:52 AM

dcausse mentioned this in T308676: Elasticsearch 7.10.2 rollout plan. · May 18 2022, 2:17 PM

Reedy closed this task as Declined. · May 18 2022, 7:23 PM

@Reedy , is there a reason this ticket was declined?

In T301724#7942806, @MPhamWMF wrote:

@Reedy , is there a reason this ticket was declined?

Yeah. T302638: Sunset ApiFeatureUsage (TDMP) was declined because it was "Withdrawn from Process"; in this case the Tech Decision Forum.

If it doesn't go through the right processes, how can it be expected to be approved?

The ticket was declined because @SWakiyama cleared my team to move forward with this work without pushing it through the rest of the Tech Decision process.
My understanding was that this unblocked this work for the Search team.

In T301724#7942826, @MPhamWMF wrote:

The ticket was declined because @SWakiyama cleared my team to move forward with this work without pushing it through the rest of the Tech Decision process.

There was no comment left to that effect on either this or that ticket.

How would anyone else necessarily know this?

There's also various unanswered questions such as those from Cole in T302638#7811894.

Nor the fact of what we do about the current people who actually use this.

Sorry for all the confusion. This process is new to me, and I don't yet fully understand how it works and what communication practices/norms are.

There was no comment left to that effect on either this or that ticket.
How would anyone else necessarily know this?

I was in communication with Linh via email about the process and let him know that we were moving ahead with this work. I wasn't personally aware that this was being tracked in the Phab ticket in a certain way (I'm not on Phabricator that much for my day-to-day PM work, and don't really use it to track all my personal work). I incorrectly assumed that the ticket was closed in an appropriate manner.

There's also various unanswered questions such as those from Cole in T302638#7811894.

I do not have the technical answers to these implementation questions. Perhaps @Gehel or @dcausse know more than I do.

Nor the fact of what we do about the current people who actually use this.

Current people using this feature will be notified in advance that this extension will no longer be usable for the foreseeable future until it is able to be rebuilt/redeployed without the Elasticsearch dependency. Search is not providing any workarounds in the meantime, and will not own this feature.

Reposting my questions that were ignored on T302638: Sunset ApiFeatureUsage (TDMP).

In T302638#7754445, @bd808 wrote:

In T302638#7753360, @Gehel wrote:

To summarize, there aren't major blocking issues around ApiFeatureUsage, but there is an ongoing level of small annoyances and additional work required to support it (and with no clear ownership, this falls to the Search Platform & Observability teams by default).

In T302638#7754092, @MPhamWMF wrote:

My immediate goal is to streamline the Search team's product portfolio and focus, and this feature does not fall within that scope.

I appreciate the desire by both of you to protect the Search team from toil and ownership of abandonware. I also think that this ticket was a successful tactic for finding out if anyone at all cares about Extension:ApiFeatureUsage.

Some additional guidance is needed to help all of us who might want to try and keep this functionality from disappearing even if we cannot convince the Foundation that the Action API is part of their project mandate. Specifically we need to know what if any conditions must be met to continue to use Elasticsearch as the backing store for the (timestamp, feature, agent) tuples that are collected. Would breaking the (admittedly hacky and my fault as the original implementor) inter-dependency between the ELK cluster and the indices be enough? If not, what if any other technical and social corrections would be needed to continue to use consolidated full-text search infrastructure for this project?

In T301724#7942920, @bd808 wrote:

Some additional guidance is needed to help all of us who might want to try and keep this functionality from disappearing even if we cannot convince the Foundation that the Action API is part of their project mandate. Specifically we need to know what if any conditions must be met to continue to use Elasticsearch as the backing store for the (timestamp, feature, agent) tuples that are collected. Would breaking the (admittedly hacky and my fault as the original implementor) inter-dependency between the ELK cluster and the indices be enough? If not, what if any other technical and social corrections would be needed to continue to use consolidated full-text search infrastructure for this project?

This might not fully answer your question but I'll try :)
Based on the assumption that this feature is not vital to bot owners, but rather to engineers working on the WMF infra who want to help bot owners change their tools because of performance issues or parameter deprecations:
Would using the Hive table event.mediawiki_api_request be sufficient for this purpose?

SELECT performer.user_text AS user_text, COUNT(*) AS cnt
FROM event.mediawiki_api_request
WHERE params["list"] = "allpages" AND year = 2022 AND month = 5 AND day = 19
GROUP BY performer.user_text
ORDER BY cnt DESC
LIMIT 10;

Obviously this does not allow users without access to Hive to run it, but it might help in some circumstances by producing a list of bots that need to be updated.

Stepping back, I wonder what this feature would look like if we had to design it today:

First, I'm not entirely convinced that a search backend is the right option for this: none of the fields seem to be tokenized and, from a quick look at the code, the only features it requires are a range query on a date field and an aggregation on a string field.
It also seems to bypass the modern event platform, which enforces a schema, by using data from the logging platform, which currently allows schema-less data ingestion.
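
To illustrate the first point, here is a minimal sketch of those two operations (a range filter on a date field and an aggregation on a string field) over (timestamp, feature, agent) tuples, using an in-memory SQLite table instead of a search backend. The table name, column names, feature names, and sample rows are all assumptions made for this example; none of them come from the extension's code.

```python
import sqlite3

# Hypothetical sample data: (timestamp, feature, agent) tuples.
SAMPLE_ROWS = [
    ("2022-05-18T23:59:00Z", "example-feature", "ExampleBot/1.0"),
    ("2022-05-19T00:10:00Z", "example-feature", "ExampleBot/1.0"),
    ("2022-05-19T08:00:00Z", "example-feature", "ExampleBot/1.0"),
    ("2022-05-19T09:30:00Z", "example-feature", "OtherTool/2.3"),
    ("2022-05-19T10:00:00Z", "other-feature", "OtherTool/2.3"),
]


def usage_by_agent(rows, feature, start, end):
    """Count uses of `feature` per agent for timestamps in [start, end)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE feature_usage (ts TEXT, feature TEXT, agent TEXT)")
    db.executemany("INSERT INTO feature_usage VALUES (?, ?, ?)", rows)
    # ISO 8601 timestamps in the same timezone sort lexicographically, so a
    # plain string comparison is enough for the date range filter.
    result = db.execute(
        "SELECT agent, COUNT(*) AS cnt FROM feature_usage "
        "WHERE feature = ? AND ts >= ? AND ts < ? "
        "GROUP BY agent ORDER BY cnt DESC, agent ASC",
        (feature, start, end),
    ).fetchall()
    db.close()
    return result


if __name__ == "__main__":
    for agent, cnt in usage_by_agent(
        SAMPLE_ROWS, "example-feature", "2022-05-19", "2022-05-20"
    ):
        print(agent, cnt)
```

Nothing here needs tokenization or relevance scoring, which is the reason for doubting that a full-text search backend is required.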

So perhaps I'd consider using the existing mediawiki_api_request data and having a job that transforms it and ingests it into a Cassandra DB, with a small serving layer on top.
That said the ultimate goal here is to simplify things not to create new work :)
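
As a rough illustration of the transform step only, the sketch below reduces raw events to daily counts per (day, feature, agent), which is the shape a serving layer would read. It assumes each event is a dict with "timestamp", "feature", and "agent" keys; those names are hypothetical and do not reflect the actual mediawiki_api_request schema. Writing the result to a database is left out.

```python
from collections import Counter
from datetime import datetime

# Hypothetical input events; the key names are assumptions for this sketch.
SAMPLE_EVENTS = [
    {"timestamp": "2022-05-19T00:10:00+00:00", "feature": "example-feature", "agent": "ExampleBot/1.0"},
    {"timestamp": "2022-05-19T08:00:00+00:00", "feature": "example-feature", "agent": "ExampleBot/1.0"},
    {"timestamp": "2022-05-19T09:30:00+00:00", "feature": "example-feature", "agent": "OtherTool/2.3"},
    {"timestamp": "2022-05-20T01:00:00+00:00", "feature": "example-feature", "agent": "ExampleBot/1.0"},
]


def daily_feature_counts(events):
    """Reduce raw events to a Counter keyed by (day, feature, agent)."""
    counts = Counter()
    for event in events:
        # Keep only the calendar day; finer resolution is dropped on purpose.
        day = datetime.fromisoformat(event["timestamp"]).date().isoformat()
        counts[(day, event["feature"], event["agent"])] += 1
    return counts


if __name__ == "__main__":
    for (day, feature, agent), cnt in sorted(daily_feature_counts(SAMPLE_EVENTS).items()):
        print(day, feature, agent, cnt)
```

Pre-aggregating by day keeps the stored data small and means raw request events never need to leave the existing pipeline.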

If we break the inter-dependency between the ELK cluster and the search cluster, we would already be in better shape, but this might not be enough.
Constraints on the search side arise during version upgrades and maintenance operations, and from the sensitivity of the data:

all writes must be compatible with both versions (as we upgrade one DC at a time)
the MW extension code must be maintained when we upgrade the Elastica library
the indices must be monitored to make sure that the tools that purge them are properly running
size of the indices must be monitored to ensure they do not grow out of control
we should probably switch to non-dynamic templates to avoid mapping explosion
having sensitive traffic data like the user-agent hosted on the search cluster without going through data-engineering checks is not something I'm very comfortable with

These are some of the points we would like to stop worrying about.

For social interaction, we already manage non-search indices on the cluster. We generally file a task addressed to the index owner prior to important maintenance operations, but we obviously can't accept too many different kinds of indices. As of today we have:

search indices owned by the search team
translation memory indices owned by the language team
toolhub owned by the technical engagement team
api-feature-usage without clear maintainers

So I guess that if a team agreed to own this extension and its data, this would be a step in the right direction?

Reedy mentioned this in T313248: Undeploy ApiFeatureUsage extension from WMF production infrastructure. · Jul 18 2022, 6:18 PM

Gehel mentioned this in T313731: Long term plan for reducing maintenance workload on the Search Platform team of supporting APIFeatureUsage. · Jul 25 2022, 2:25 PM

Peachey88 mentioned this in T325880: Figure out who owns apifeatureusage[12]001 servers. · Dec 27 2022, 2:41 AM
