
Clarify recommendations around using FauxRequest
Closed, Resolved · Public

Description

In T109836 @Anomie said:

FauxRequests to action API modules are an antipattern that should be avoided. Instead, the functionality should normally be exposed by some "business logic" class that is used by both the action API and by other code.

Full conversation is documented here:
https://phabricator.wikimedia.org/T109836#2430649

Various projects use FauxRequests and various projects use PHP classes directly.

According to @daniel in that same task

We can have this as an ArchCom session, but there was quite a strong consensus when we discussed this in Lyon: don't use the Web API interface and FauxRequests for internal requests. Reasons:

  • It's a "stringly typed" interface, it lacks type safety
  • the encoding/decoding is a performance issue.
  • it makes things very hard to test - it's an antithesis to the idea of injecting narrow, easy-to-mock interfaces.

Best practice IMHO: make business logic that does not know or care about UI or web API, write a thin layer of UI or API on top of it.

I'd like us to discuss this again and document the decision/reasoning on wiki and in wikitech.

My questions

  • Is use of FauxRequest discouraged in Special pages?
  • When should FauxRequest be used?

Output

Event Timeline

I copied the rest of @daniel's comment into the description, since it's directly relevant to the question at hand.

T111074 improve watchlist query continuation handling is a good example of where this is actually quite useful as it allows sharing of logic on front-end and backend. Using FauxRequest would make an infinitely scrollable watchlist much more achievable.

It seems to me that T111074 is about using the API directly instead of doing your own thing, not about any advantage to using FauxRequest.

The real solution if you're doing watchlist stuff server-side is probably to finish the WatchedItemQueryService stuff WMDE started, i.e. solution 2 in T109277. ApiQueryWatchlist already uses it.

T111074 improve watchlist query continuation handling is a good example of where this is actually quite useful as it allows sharing of logic on front-end and backend. Using FauxRequest would make an infinitely scrollable watchlist much more achievable.

Code sharing would indeed be good!

The point is to not do it the lazy (and inefficient and unsafe and unstable) way, but to do proper refactoring: The special page and the API module should be thin wrappers around a controller object (or "interactor", since this is not a MVC controller).

Using FauxRequest to achieve this kind of thing means adding technical debt. It should be a short term solution at best, with resources committed to fix it in the future. Even in a possible future microservice centric environment, application logic should not handle HTTP requests and responses directly, but use service interfaces that are modeled after the actual need of the callers, not based on the concepts of some protocol or other.
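
To illustrate what I mean by "thin wrappers", here is a minimal sketch. All class names are invented for the example, not existing MediaWiki classes:

```
<?php
// Invented names, purely for illustration: one "interactor" class holds the
// query/business logic and knows nothing about HTTP, skins, or the action API;
// the API module and the special page are thin adapters on top of it.

class ExampleWatchlistFetcher {
	/** Returns plain data rows; no WebRequest, no ApiResult, no OutputPage. */
	public function fetchForUser( User $user, int $limit ): array {
		// ...the actual query logic lives here, shared by every caller...
		return [];
	}
}

class ApiQueryExampleWatchlist extends ApiQueryBase {
	public function execute() {
		$params = $this->extractRequestParams();
		$fetcher = new ExampleWatchlistFetcher();
		$rows = $fetcher->fetchForUser( $this->getUser(), $params['limit'] );
		$this->getResult()->addValue( 'query', $this->getModuleName(), $rows );
	}
}

class SpecialExampleWatchlist extends SpecialPage {
	public function execute( $subPage ) {
		$fetcher = new ExampleWatchlistFetcher();
		$rows = $fetcher->fetchForUser( $this->getUser(), 50 );
		// ...render $rows as HTML via $this->getOutput()...
	}
}
```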

Quick and dirty is nice for now, but bad for later. We have way too much quick and dirty crap in MediaWiki already. It's slowly getting better. Let's not go back to the bad old ways.

Just a thought...
An interesting approach might be to programmatically ensure it is used appropriately. In React, for instance, they have a prop called dangerouslySetInnerHTML (https://facebook.github.io/react/docs/dom-elements.html#dangerouslysetinnerhtml) - it doesn't stop you from using it, but every time you do, it makes you mindful about whether you should be using it.

Maybe FauxRequest should be updated to include a similar kind of parameter, e.g. shamefullyUseInternalRequest?
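
Something like this, purely hypothetical (FauxRequest has no such constructor parameter today):

```
<?php
// Hypothetical sketch only: the last constructor argument does not exist in
// FauxRequest today. The point is just to make call sites self-documenting.
$request = new FauxRequest(
	[ 'action' => 'query', 'list' => 'allpages' ],
	/* $wasPosted */ false,
	/* $session */ null,
	/* $protocol */ 'http',
	/* $shamefullyUseInternalRequest */ true // explicit opt-in, visible in code review
);
```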

The reasoning seems sound to me, but I think we can do better to communicate this.

Constructing ApiMain with a WebRequest object is how api.php requests are processed. This is a public signature as part of an OOP and testable design. (Similar to MediaWiki for index.php, and ResourceLoader for load.php).

FauxRequest was not introduced for making internal API requests. It was introduced to construct a WebRequest object not based on global state. (For unrelated purposes, predating api.php). As such, it probably isn't the appropriate place for adding documentation or utility methods relating to "internal API requests".

While passing a FauxRequest object has always worked (mainly used in testing), these "requests" started to be special-cased in rSVN17005 / b56d23ed4691 when the first "internal request" was introduced in the code, and actually within the API codebase, from the OpenSearch action to the AllPages query action. And in order to avoid duplicate layers of error handling, that layer was disabled if the request is internal (to keep only the outer layer of error handling).

This use was eventually removed again in rSVN30275 / 7a3227596e71, in favour of both code paths calling an internal class directly.

A few uses have crept back in (mostly in extensions, not in core) as a quick way to get at something that doesn't have a shared class. Each use is, without question, technical debt.

As for "how" – plain constructing of ApiMain with a derived RequestContext is the norm, and seems fairly straight-forward.

As for "when" – I would avoid this and recommend to take the opportunity to abstract the API in question into a re-usable PHP class. PrefixIndex was a good example of this.

This RFC has been scheduled for public discussion on IRC on Wednesday, October 25.

As always, the discussion will take place in the IRC channel #wikimedia-office on Wednesday 21:00 UTC (2pm PDT, 23:00 CEST).

@Jdlrobson @Anomie will you be available for the discussion?

Ok, I've added it to my schedule.

@MaxSem @EBernhardson: Jon indicated you may have an opinion on the use of FauxRequest. Can you take part in the IRC discussion on Wednesday 2pm PDT? If not, can you explain your position here, in a comment?

For the record:
I personally maintain the position that application logic should not be bound to an external API (circular dependency), and that the interfaces used by application logic should be as idiomatic as possible (no stringly typed interfaces, no amorphous interfaces).

However, I think there is an interesting case to be made for using (an emulation of) the web facing public API in the rendering (UI/skin) layer: whether that code runs on the server or the client, and whether it's written in PHP or JS (or XSLT or whatever) should be an implementation detail, and easy to change. If it would hit the web API when running on the client, it should hit the same API when running on the server.

But that requires the "surface" layer to be properly separated from the application logic. This is currently not the case in MediaWiki, especially not in SpecialPages. Such a surface layer should be entirely detachable from MW core, and may not even be written in PHP.

@Yurik Do you feel like having this discussion again? On Wednesday, 2pm PDT?

@Jdlrobson gave me his perspective on the matter as follows:

  • Client side and server side code should use the same APIs.
  • FauxRequest is a way to allow server side code to use the same APIs as the client.
  • If FauxRequest was the preferred way of calling things on the backend, this would force everything to have a well-defined public API that can also be used from the client.

Why are you asking me? :P Yes, of course the abuse of FauxRequest should be prohibited with fire and brimstone, however practically speaking there are huge obstacles around this. MediaWiki should be an example for extensions of how to share a common model between API and web interface, yet we have that for how many core APIs, zero? Furthermore, the API actively encourages its modules to do things directly, for example ApiQueryBase has functionality to build direct DB queries, instead of maybe some code to integrate its pagesets with models outside of the API. Until this gets addressed, people will continue sticking FauxRequest where they shouldn't.

@MaxSem

Why are you asking me?

Because Jon said he'd be interested to hear your opinion. And because I know you as a friend of the quick&dirty ;) But I'm glad to see you on my side on this one.

Until this gets addressed, people will continue sticking FauxRequest where they shouldn't.

Well, my hope is that prohibiting that will increase pressure to fix core API modules :)

@daniel, I think we shouldn't use FauxRequest objects at all (for the reasons outlined elsewhere, such as no type safety, etc.), but I do believe the existing API is better suited as a common entry point for the whole business logic layer - as outlined in BL architecture on budget - instead of using hundreds of objects, each of which could go all the way to the database. So ideally we should partition existing API into the internal API and an extremely thin, no logic layer to convert Request into it.

@Yurik Hey, good to hear from you!

So ideally we should partition existing API into the internal API and an extremely thin, no logic layer to convert Request into it.

Well, I call those "services" (like the ones in MediaWikiServices) and the thin layer "API modules". But yea, I agree :)
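
Roughly what I have in mind, with invented names (ExampleThingLookup is a hypothetical service, assumed to be registered via ServiceWiring.php and to hold all of the actual logic):

```
<?php
use MediaWiki\MediaWikiServices;

// Invented example: the API module stays an "extremely thin" layer.
// It parses parameters, calls the service, and formats the output; the
// service contains the business logic and knows nothing about the web API.
class ApiQueryExampleThings extends ApiQueryBase {
	public function execute() {
		$params = $this->extractRequestParams();

		/** @var ExampleThingLookup $lookup */
		$lookup = MediaWikiServices::getInstance()->getService( 'ExampleThingLookup' );

		$this->getResult()->addValue(
			'query',
			$this->getModuleName(),
			$lookup->getThings( $this->getUser(), $params['limit'] )
		);
	}
}
```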

  • Client side and server side code should use the same APIs.

Considering that client-side is making requests over the network and needs to be protected from attackers while server-side is typically running in-process with little to worry about attacks such as CSRF, that seems like a statement that needs to be proven.

  • If FauxRequest was the preferred way of calling things on the backend, this would force everything to have a well-defined public API that can also be used from the client.

In my experience, it would instead result in the creation of a plethora of poorly-specified endpoints, probably taking undocumented JSON blobs in lieu of input parameters, for extremely client-specific purposes with extremely client-specific output.

MediaWiki should be an example for extensions of how to share a common model between API and web interface, yet we have that for how many core APIs, zero?

Many more than zero.

ApiAMCreateAccount, ApiChangeAuthenticationData, ApiCheckToken, ApiClearHasMsg, ApiClientLogin, ApiComparePages, ApiFileRevert, ApiImport, ApiLinkAccount, ApiLogout, ApiManageTags, ApiMergeHistory, ApiPatrol, ApiQueryAuthManagerInfo, ApiQueryFileRepoInfo, ApiQueryPageProps, ApiQueryPrefixSearch, ApiQueryTags, ApiQueryTokens, ApiQueryWatchlist (although the web UI code hasn't been updated to use the same backend yet), ApiQueryWatchlistRaw, ApiRemoveAuthenticationData, ApiResetPassword, ApiRevisionDelete, ApiRollback, ApiSetNotificationTimestamp, ApiUndelete, ApiValidatePassword, and ApiWatch all seem to parse and minimally validate parameters, call a backend method to do the real work, and then format the result for output to the client. Some approach the ideal more closely than others, of course.

ApiBlock, ApiEmailUser, ApiQueryQueryPage, ApiSetPageLanguage, and ApiUnblock do the same somewhat backwards by calling into halfway-UI methods in Special pages. Some have to do a few too many checks, while others are almost in the first group except the "backend" happens to be a static method on a SpecialPage class.

ApiDelete, ApiImageRotate, ApiLogin, ApiOpenSearch, ApiProtect, ApiQueryDuplicateFiles, ApiQuerySearch, ApiTag, ApiUpload, and ApiUserrights delegate most of their real work to a backend, although they seem a bit heavy on various checks and/or loops. In several cases this just comes down to the fact that the backend works one item at a time while the API wants to allow multiple items to be specified in one request, and in others there's a lot of output reformatting going on.

By this criterion, ApiQuerySiteinfo is actually surprisingly good: it calls backend methods to get all of the information it returns. What makes it huge is that it's the central interface to publicly expose a lot of different bits of configuration, whereas web interface code just accesses the individual configuration bits directly as it needs them. ApiQueryUserInfo is in a pretty similar situation.

Furthermore, the API actively encourages its modules to do things directly, for example ApiQueryBase has functionality to build direct DB queries, instead of maybe some code to integrate its pagesets with models outside of the API.

True, many of the query modules could be merged with corresponding Pager classes. Although it's not a straightforward change to create something as configurable as the API modules that also serves the very web-UI-oriented style of Pager, and then adapt everything to use it. WatchedItemQueryService is one attempt along these lines, with somewhat mixed results, and SpecialWatchlist hasn't actually been converted to use it.

So ideally we should partition existing API into the internal API and an extremely thin, no logic layer to convert Request into it.

Well, I call those "services" (like the ones in MediaWikiServices) and the thin layer "API modules". But yea, I agree :)

I doubt we can really get to "no logic", since incoming requests from the world will always have to do some validation that a backend "service" is going to expect callers to handle as needed, and responses can probably be formatted more usefully for clients than just dumping a Status object and calling it done.

Well, my hope is that prohibiting that will increase pressure to fix core API modules :)

So you're pro-stick-before-carrot? :)

Highlights of the IRC meeting:

  • T169266 Clarify recommendations around using FauxRequest
    • <DanielK_WMDE> main question today is: if and when is it ok to use FauxRequest to make an internal call to the web API in production code.
    • <anomie> Running into a situation where it seems like a FauxRequest into the API is a good idea is usually really a situation where you have an opportunity to refactor things to reduce tech debt.
    • <ebernhardson> there is lots of functionality you can only get from the api unless you want to tack an extra 200 lines of code onto your feature
    • <TimStarling> do we need to refactor the API to provide a better internal interface for continue parameters?
    • <anomie> Conceptually, many of the API query modules and many of the Special pages do basically the same thing. But API query modules aren't abstracted, and Pager is extremely UI-oriented. So unifying the two would be a rather big design project, and a bigger amount of work converting things.
    • <TimStarling> with infinite scrolling, is the goal to have PHP deliver the first page of results, and then JS will construct subsequent pages using API calls?
    • <anomie> Ideally, Code Monkey would take the existing logic in the API module and turn it into a class that takes input data and produces output. The API module would parse its parameters into the format needed by the class (possibly constructing Title objects, etc) [..]. The special page would do basically the same thing, except its "standard data structure" would be HTML
    • <brion> using fauxrequest also requires you to already have an api module ;) if you're starting from scratch on a new feature, is there a different calculus?
    • <SMalyshev> I don't like the idea of SQL logic in API. That way if you need the same logic somewhere else (and eventually you do), it's very hard to reuse it
    • <DanielK_WMDE> i'm opposed to VirtualRest for the same reason I'm opposed to FauxRequest. <Krinkle> Using VirtualRest has a similar characteristic as FauxRequest, in that all parameters must be strings.
    • <DanielK_WMDE> so, what about acceptable uses of FauxRequest? <MaxSem> when you need shit done but there's no clean PHP interface to do what you need and it can't be produced in a reasonable amount of time
    • What deadlines do WMF have that would pass that criteria? <DanielK_WMDE> in the Wikimedia case, it's never external pressure, just personal impatience, I think...
    • <MaxSem> if e.g. refactoring an API takes order of magnitude as long as using FauxRequest, lots of people would ask if it's worth it. <ebernhardson> i think there are a number of things where if it takes a day, it gets prioritized. If it takes a week it gets punted into the pile of tickets to do "someday"
    • <bd808> its a dirty hack and everyone should see it as such. It is at best a testing harness for poorly factored code.
    • AGREED: Use of FauxRequest in production code is considered technical debt.

Full log: https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-10-25-21.03.log.html

I had a read through the IRC log. While there are no real surprises there, one thing that didn't come up in the discussion, and that I think is important to call out and think about, is that usually the team working on a feature is not the right team to do refactoring.

I don't think any developer has not wanted to do the right thing and refactor, but I think we are overlooking the fact that not everyone is empowered to refactor, for a variety of reasons, and that when they do, refactors take time, possibly spanning years.

For instance, rendering a watchlist and making it infinitely scrollable in JavaScript is very different from refactoring the PHP watchlist code to support that. Generally the former is being done by a product team of front-end devs and/or new hires who are still learning the MediaWiki stack. Refactors in MediaWiki, on the other hand, are complicated and require knowledge, and more importantly experience, of the entire stack and our processes. Typically what I see happening now is that good ideas are dismissed if they take time, as we are accountable to our donors.

As a member of a product team, I've seen good ideas that should be simple get deprioritised because they're not worth the refactoring effort (as there is other lower-hanging fruit requiring less effort).

How can we get better at making the platform more malleable in preparation for these use cases? To use @bd808's analogy, how can we plan the city better to meet the future needs of its residents...?

Hi Jon!

I understand where you are coming from, and I do feel your pain. I think this is one of the fundamental challenges that engineering at Wikimedia is currently facing. I have written a little essay to reflect my thoughts on this: https://www.mediawiki.org/wiki/User:Daniel_Kinzler_(WMDE)/Avoiding_the_Tech_Debt_Trap

I have tried to respond to some of your points below. If my replies seem harsh, please remember that I do understand the pain. I just see no alternative to actually investing into fixing the underlying problem. And yes, that will cost resources that could otherwise be put into feature development. It's an investment into the future.

I had a read through the IRC log. While there are no real surprises there, one thing that didn't come up in the discussion, and that I think is important to call out and think about, is that usually the team working on a feature is not the right team to do refactoring.

That depends on what needs refactoring. If your team is working on the Watchlist, and the watchlist API code needs refactoring, then I do believe it to be your team's responsibility to get that code refactored. If you take ownership of a feature, you should take ownership of it across the stack - until you hit a "horizontal" like the database abstraction layer. If your team does not have the necessary expertise, then it should gain it - by learning, or by recruiting people who have the knowledge needed. In practice, I could imagine a hybrid approach: find someone who could do the refactoring, and pair up with them to do it, so you (or someone in your team) gains the skills.

If there is refactoring needed in a broad, cross-cutting component, it's time to declare a dependency on whatever team owns that component. And if there is no clear owner, I consider it the responsibility of higher-up management to fix that.

I don't think any developer has not wanted to do the right thing and refactor, but I think we are overlooking the fact that not everyone is empowered to refactor, for a variety of reasons, and that when they do, refactors take time, possibly spanning years.

Yes - but pushing back the refactoring and adding workarounds instead is exactly what got us into this mess. If we keep doing it, our code base will keep getting worse.

We accumulated debt, and we are paying more and more interest on it. We will have to start actually paying it back, even if it's painful.

For instance, rendering a watchlist and making it infinitely scrollable in JavaScript is very different from refactoring the PHP watchlist code to support that. Generally the former is being done by a product team of front-end devs and/or new hires who are still learning the MediaWiki stack. Refactors in MediaWiki, on the other hand, are complicated and require knowledge, and more importantly experience, of the entire stack and our processes. Typically what I see happening now is that good ideas are dismissed if they take time, as we are accountable to our donors.

See above. Sure, green fields development is more fun, and seeing features deployed is gratifying. But letting others deal with the mess is not fair, nor sustainable.

As a member of a product team, I've seen good ideas that should be simple get deprioritised because they're not worth the refactoring effort (as there is other lower-hanging fruit requiring less effort).

That I see as a massive problem in WMF's management culture. To get out of the mess we are currently in, all teams have to pledge a good portion of their resources towards addressing tech debt. I recommend talking to the tech debt SIG: https://www.mediawiki.org/wiki/Special_Interest_Groups

How can we get better at making the platform more malleable in preparation for these use cases? To use @bd808's analogy, how can we plan the city better to meet the future needs of its residents...?

By not pushing back refactoring, but tackling it bit by bit. And by assigning ownership for all components, and dedicating time to improving the platform "horizontals".

daniel closed this task as Resolved. Edited Nov 3 2017, 12:55 PM
daniel claimed this task.

Clarified, as per T169266#3711464.

To summarize: FauxRequest should only be used for testing and prototyping. Use of FauxRequest in production code should be considered technical debt.

I wrote a little rant about this a while ago: https://www.mediawiki.org/wiki/User:Daniel_Kinzler_(WMDE)/FauxRequest