I don't see any way to do that; I guess you'd have to be a phab admin?
Sun, Apr 21
Sat, Apr 20
OAuth is currently not resourced, but there's a team working on blocking tools, so realistically you might have better chances of improving things from that direction. Autoblocks of Toolforge IPs probably should not happen; after all, it's sufficiently important and trusted infrastructure that it should only be blocked by someone who is conscious of what they are doing.
The only point of connection might be adding $contentType="wikitext/1.0" / $resulttype="html/2.1.0" parameters to some top-level getParserService call, to keep the door open for backwards-incompatible changes to wikitext etc. in the future.
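Roughly like this (the factory method is hypothetical; only the version strings above are from the actual suggestion):

```php
// Hypothetical sketch: pin the wikitext dialect and output format at the
// single point where the parser service is obtained, so future
// backwards-incompatible changes can be negotiated here.
// getParserService() is an assumed factory method, not an existing interface.
$parser = $services->getParserService(
	'wikitext/1.0', // $contentType: input wikitext version
	'html/2.1.0'    // $resulttype: output HTML spec version
);
```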
Fri, Apr 19
Individual endpoints, yeah. Session setup takes a nontrivial amount of time and most GET endpoints won't need it (the action API tends to mix public and private data, e.g. show redacted usernames if you have the permission to see them; in the REST API we probably don't want that, as it makes responses uncacheable). I don't think it's hard, just an extension point in Setup.php (pretty much everything else is loaded by that point), but something to think about.
The deeper problem here is what versioning even means for an API that lives inside MediaWiki. Is it reliable if it doesn't include the MediaWiki core, library, and extension versions? Is it still useful if it does include them?
The current API's sort-of equivalent of Special:Version is action=query&meta=siteinfo. That's probably something we want to port to REST; not sure whether it's the right place for finding out the version of specific API endpoints, as it will be rather large. Something like an OPTIONS request to the endpoint in question, or a /version/<normal endpoint URL> route, might be easier on clients. Something to consider for the routing handler interface.
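For example (an entirely hypothetical handler shape, since the routing interface doesn't exist yet):

```php
// Sketch of a per-endpoint version resource (GET /version/<endpoint URL>),
// so clients don't have to fetch a large siteinfo-style document just to
// learn one endpoint's version. All names here are invented.
class EndpointVersionHandler {
	private $router; // assumed registry that knows about all REST routes

	public function run( string $route ): array {
		return [
			'route' => $route,
			'mediawiki' => SpecialVersion::getVersion(), // core version
			'endpoint' => $this->router->getHandlerVersion( $route ), // assumed method
		];
	}
}
```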
Some related tasks:
MediaWiki rate limiting is based on user authentication, and often happens deep in the application (for example, there are different rate limits for rendering a thumbnail in a standard size and a nonstandard one; the API does not know what the standard sizes are) and is only communicated to the controller in the form of error messages.
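For illustration, roughly how the thumbnail case works today (simplified, not verbatim code; the two limiter keys are real $wgRateLimits buckets):

```php
// Deep inside the media code: the limiter bucket depends on whether the
// requested thumbnail size is one of the wiki's standard sizes, which only
// this layer knows about.
if ( $user->pingLimiter( $isStandardSize ? 'renderfile' : 'renderfile-nonstandard' ) ) {
	// All the controller / API layer ever sees is a generic throttling error.
	return Status::newFatal( 'actionthrottledtext' );
}
```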
There was a push for API keys as a prerequisite for some vague monetization ideas when the previous ED was trying to reshuffle things; that was opposed vigorously and eventually got dropped. I don't remember the details, I think @bd808 was asked to work on it since he was doing API analytics at the time, so maybe he has pointers to the old discussions.
I imagine we'll still call Setup.php and that takes care of authentication. OTOH there should probably be a way for API endpoints to declare themselves sessionless.
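One possible shape for that (hypothetical; nothing like this exists yet):

```php
// Let a REST handler opt out of session setup; Setup.php (or whatever
// bootstraps the request) would check this before doing the expensive
// session initialization. Interface and method name are invented.
interface SessionAwareHandler {
	/** Whether this endpoint needs an authenticated session. */
	public function needsSession(): bool;
}
```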
All MediaWiki appservers contain the code and configuration for all wikis and select which one to apply based on the Host header. In theory that logic could be amended to handle Parsoid REST URLs with domains in them, but it would be confusing, would add extra complexity for no good reason, and would not work for third parties (wikifarm management is not standardized; every farm handles multi-wiki configuration in its own way). It would also make cookie-based authentication complicated, if not impossible. Changing which URLs RESTBase and the MediaWiki virtual REST service send requests to seems far simpler.
If, however, Parsoid is a library used by core, this should not even be needed. Code that follows a DI design generally has no need for, and in fact should not have, any knowledge of a DI framework.
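In code terms (the class names are made up, the principle is the standard one):

```php
// Library code receives its collaborators via the constructor and never
// references MediaWikiServices (or any other service locator) itself.
class WikitextTransformer {
	/** @var DataAccess */
	private $dataAccess;

	public function __construct( DataAccess $dataAccess ) {
		$this->dataAccess = $dataAccess;
	}
}

// Only the wiring code in core knows about the DI framework, e.g. a
// ServiceWiring.php entry that constructs the transformer from services.
```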
Thu, Apr 18
Wed, Apr 17
Any objection to creating a MediaWiki-REST-API Phabricator project for this?
Tue, Apr 16
What about extensions bundled with MediaWiki in the release tarballs, but not deployed on Wikimedia servers? Usually we require the same standards for those, as that's what users will expect.
Mon, Apr 15
Duh, not sure how I missed that. Thanks.
Related: document proper use of dependency injection in MediaWiki libraries (e.g. T221041: Convert Parsoid to dependency injection).
- exists() seems reasonable for a FileIdentity class (remote repos make it more complex, but in some sense that issue exists with Title/PageIdentity as well).
- getUrl() (and getDescriptionUrl(), which is fairly different in terms of the systems involved) should live in the Repo, as that's the class that deals with the local / shared DB / API split (although caching the results in a value object like FileIdentity does not seem unreasonable).
- allowInlineDisplay() (aka canRender()), mustRender() and isVectorized() should be in MediaHandler (they mostly are, we just need to deprecate handlerless files).
- getWidth() should probably be moved to a FileMetadata value object (where FileIdentity represents DB state and FileMetadata represents the file on disk, although in practice metadata is also cached in the DB).
- transform() probably merits its own service.
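Put as a rough interface sketch (all names hypothetical, just to make the split concrete):

```php
// DB-state identity of a file; existence checks belong here (remote repos
// complicate this, much like they would for Title/PageIdentity).
interface FileIdentity {
	public function exists(): bool;
}

// Describes the file on disk (in practice also cached in the DB).
interface FileMetadata {
	public function getWidth(): int;
	public function getHeight(): int;
}

// URL construction stays with the repo, which is what knows about the
// local / shared-DB / API split.
interface FileUrlProvider {
	public function getUrl( FileIdentity $file ): string;
	public function getDescriptionUrl( FileIdentity $file ): string;
}
```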
I would keep tools mostly single-purpose: have the coverage tool expose the data as a JSON API (where API probably just means writing it to a JSON file and making it accessible) and have the extjsonuploader tool deal with pushing it on-wiki along with the other extension data.
statsd typically does not allow you to slice the data per wiki. Logstash says there were 60 logins in the last 7 days, which is trivial (unless you are worried about the potential DoS angle).
Some ideas I have been playing with: T220939: Display extension test coverage in infobox, T220938: Display per-line test coverage during code review.
Is there / will there be an API to access this data?
One potential use case would be T220939: Display extension test coverage in infobox.
Sun, Apr 14
Theoretically you could do it with a hand-crafted import file...
Sat, Apr 13
See also T220893: API for listing authors of an article, which would accomplish this task as stated in the description (though not its real purpose of determining the principal author).
I imagine this would require a (page, actor) index on the revision table. action=credits is disabled on Wikimedia projects because without such an index WikiPage::getContributors() is too expensive.
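For illustration, the kind of query involved (simplified; the real WikiPage::getContributors() does a bit more filtering):

```php
// Without an index covering (rev_page, rev_actor) this has to scan every
// revision of the page, which is prohibitive on large wikis.
$res = $dbr->select(
	'revision',
	[ 'rev_actor', 'cnt' => 'COUNT(*)' ],
	[ 'rev_page' => $pageId ],
	__METHOD__,
	[ 'GROUP BY' => 'rev_actor' ]
);
```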
It would be useful to add something about who is expected to know / enforce the policy. This is a pretty heavy technical document; throwing it at new contributors is a good way of making them not contributors anymore. IMO the point by which one is expected to read and understand the document is getting +2; reviewers should be expected to act as a human interface to the policy and explain specific problems to patch authors without making them read four pages of dense technical jargon. That might be obvious, but IMO it's worth spelling out.
I feel this needs to be a "SHOULD". There are certainly situations where it is not possible to "avoid" wikitext entirely.
This task is a bit unfocused. Most of the discussion is about blame maps so maybe merge into T2639: [Epic] Add feature annotate/blame command, to indicate who last changed each line / word?
Is this a duplicate of T9240: Usernames in history of imported pages should refer to original wiki (which is now fixed)?
Actions are a fuzzy concept in MediaWiki (see e.g. T212345: Authorization checks have $action parameter, but accept a user right). In one sense, an action is something that has an Action subclass and can be triggered via the action= URL parameter, and those are a small subset of rights. (Well, not technically a subset; there are actions like history which do not have an associated user right.) But the action- messages are used in permission error messages, and in the future we'll probably want to move towards always having error messages (T180888: All permission checks should be able to return a custom error message), as it is hard for the extension author to predict when they will be necessary, so having these messages seems like a step in the right direction. There will be exceptions, but it is always better to err on the harmless side.
The Commons templates on the other hand are not shown, so it seems as if the local description page was fetched when the software tried to get the Commons one.
Fri, Apr 12
@phuedx can you comment? You probably have a better grasp of the current status.
I think the more interesting question is when anonymous user accounts should be created. We cannot create them for visits which don't result in a page save attempt or similar, for obvious scaling reasons. If we create them on write (i.e. when an actor ID needs to be inserted somewhere), the user will be detached from their contribution if the user agent does not persist the session (e.g. browsing with cookies disabled), without us being able to detect it beforehand and warn them. If we create them just before write (e.g. whenever a CSRF token is obtained, like the user opening the edit form), that means doing stateful work on GET.
A very minimal error handler has been added to core: mediawiki.errorLogger.js. It doesn't do anything beyond pushing exceptions on an mw.track channel.
The rest was written as Extension:Sentry, which uses the Sentry client library to report those exceptions. This was done four years ago, so the frontend code is very outdated. We might or might not want to use the Sentry library anyway - it can canonicalize stack traces, which is nice, but initially we don't need stack traces (and can't have useful traces anyway due to the lack of source map support in ResourceLoader), and it is fairly large. @phuedx or the Reading Web team are probably better people to coordinate with on this.
Quick question: what's the plan for this from a product point of view? Is it slated for implementation soon? Knowing this will help us plan how/when to discuss.
PageIdentity refers to a page that exists; protecting non-existing pages is entirely meaningful. Also, you need to deal with counterfactual state for permission checks: e.g. a permission check for moving a page into the MediaWiki namespace (which has a magic restriction) needs to check against the future state of the page. (This comes up elsewhere as well, e.g. permission checks for a content model change. Probably worth thinking about as a general issue.) So something like TitleValue seems more appropriate here.
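In signature terms, something like this (the interface is invented; LinkTarget and UserIdentity are real core interfaces):

```php
interface PermissionChecker {
	// LinkTarget (which TitleValue implements) can refer to a page that does
	// not exist yet - exactly what a 'create' check, or a move into the
	// MediaWiki namespace, needs to reason about.
	public function userCan( string $action, UserIdentity $user, LinkTarget $target ): bool;
}
```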
- include_path in PHP's ini is still set to the value it had back before we migrated to HHVM. I think it's irrelevant nowadays for all applications and can be removed; it's actually potentially harmful.
Thu, Apr 11
No date yet, some work is still pending. Probably about a month from now, unless someone has different plans.
The main complication is that we pin the entire dependency tree so most requires do not come from core or an extension directly but are a requirement for a requirement. And since those requirements change with every upgrade, and a library might be required by multiple things, it's quite hard to keep track even for just core why something is needed or whether it is still needed at all. This also makes it hard to verify that version requirements are met (T178137: MediaWiki-Vendor creates a scenario in which incompatible versions of dependencies can be present).
Thanks, I didn't realize firejail wraps the whole service, not just the render processes. Let's go with JS-only then.
Can someone clarify for me whether this is why ParserOptions::setWrapOutputClass() no longer takes 'false' as an option (this being the only sane way I could find to disable the entire wrapper when running $parser->parse() on a thing)?
To recap what was said at the meeting and in some earlier conversations, there are three independent work streams:
- Capturing errors on the client side.
- Extension:Sentry does this (it loads Raven.js, the old Sentry client, on demand when an error happens) but the code is four years old now, so it should probably be updated to the current Sentry client, @sentry/browser (which might or might not be simple). The old code contained some awkward workarounds, largely because Raven.js was a wrapper around TraceKit, which was a valuable library with lots of hard-won compatibility logic for various browser quirks but very inflexible in how it expected to be used (e.g. it really did not expect to be loaded only after the error was thrown). Back then they were on the verge of removing TraceKit and rewriting Raven.js, so that has probably happened since.
- We could also write our own client. The benefit of using the official one is support for lots of browser quirks, including parsing of different stack trace renderings in different browsers (which could be reimplemented but it takes time). The benefit of a custom client is that it will probably end up being much smaller (@sentry/browser is 50K minified). Also stack traces are only useful if there are source maps, plus ResourceLoader client side caching would also have to be dealt with (probably also doable with source maps - see T90524 for details).
- A pipeline for getting error logs to their destination.
- In theory this could be a Sentry server. When I showed our traffic numbers to Sentry devs a few years ago they said a single beefy bare-metal server could probably take it (but their own traffic numbers were a bit below our worst case predictions so it was just a guess). But there's no guarantee and anyway we'd probably want decoupling and a standard pipeline for all events.
- Using an EventLogging-ish process with a beacon endpoint being read into Kafka by varnishkafka, as described in this task (that was the old plan as well: T501). Problematic due to size limits (we'd have to drop stack traces at the minimum).
- Using EventGate to send reports to Kafka, which seems the clear winner. With both this and the previous option, we'd want some kind of rate limiting in Kafka (Kafka itself scales well).
- Storing/displaying the errors.
- Sentry would be the nice option here (it has lots of features for supporting browsing, searching, displaying and managing errors) but it's a fair bit of work to package (unless we decide running third-party docker containers is acceptable now). Bonus is that it has clients for just about any programming language and we have plenty of systems (both WMF and community-maintained) where scaling is not a concern so we can just stick in a client and get error management for free. So probably a good long-term goal.
- Just dumping the errors into Logstash would probably get us decent searching and trend monitoring capabilities for free, so that seems the reasonable thing to do initially. (This was the old plan - T502.) Even if we use Sentry eventually, we might want to keep Logstash in the pipeline for deduplication.
Wed, Apr 10
How would the switch itself look? Do we want some kind of staggered rollout (or rollover, I guess) with only a fraction of the users switched at first? Or not worth the effort?
- RESTBase adds optional parameters supported by Proton and not Electron
To achieve point 1, we need to
This is currently a subtask of T210651: Switch all PDF render traffic to new Proton service - does that mean it is seen as a blocker? If not, what's the expected impact / timeline?
@ovasileva since you are the product owner for PDF rendering (which is really what the communication would be about), any thoughts on this?