Also, while the UI and code quality problems with Extension:Collection don't seem relevant to me (it's not as if they would go away if we did the concatenation elsewhere; the amount of interaction with its existing codebase will be the same either way), using the ElectronPdfService extension is always an option. We are talking about functionality that needs to be wrapped around Electron to make it support multi-page documents, so it would make conceptual sense to put it into the extension dedicated to Electron.
Re: what framework to use for concatenation and/or post-processing:
- Currently we have MediaWiki, Node services and Python services on our cluster. Adding a new kind of component (a standalone PHP service) shouldn't be taken lightly IMO. Following the well-trodden path reduces maintenance overhead. (Plus, PDF post-processing can't be done in PHP anyway, so what's the point in writing a PHP microservice to run non-PHP core logic?)
- Given that this project is under severe time pressure, doing the work inside the PHP extension seems by far the best immediate approach to me:
- The code for concatenation in PHP exists already (for a rough idea of what that step involves, see the sketch after this list)
- No overhead of setting up and managing a service
- Unblocks Ops to get rid of the ocg boxes / RelEng to get rid of trebuchet / Services to update Node. The concatenation and/or PDF modification logic can always be moved into a service later.
- Can still be exposed as an API if we want clients to be able to build their own UI for it. On one hand it's slightly more convenient because the API has access to the session (where collection data is currently stored), on the other hand using the action API to deliver files will be awkward. Neither of those are big issues though.
- I agree the ability to output alternative formats (EPUB, Zim, whatever) is valuable. IMO that's a (weak) argument for doing as much in the extension as possible. We'll probably want a "RESTBase -> concatenated HTML + metadata -> format of choice" pipeline for those as well, but the metadata (and possibly the HTML) will probably need to be subtly different, and it's not easy to predict how, so having it travel through more services makes life more difficult. Services are great when there is a simple, stable interface that can serve as a boundary between the different systems. That is the case with Electron (send HTML, get PDF back) but would be much less true for an HTML concatenation or PDF TOC generation service.
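To give a rough idea of what the in-extension concatenation step involves, here is a minimal sketch in plain PHP (function and attribute names are hypothetical, not the actual Collection code), assuming the rendered HTML of each page has already been fetched from RESTBase:

```php
<?php
// Hypothetical sketch: merge the <body> contents of several rendered pages
// into a single document, wrapping each page in a <section> so that per-page
// styling and TOC generation remain possible. Not the actual Collection code.
function concatenatePages( array $htmlByTitle ): string {
	$doc = new DOMDocument();
	$doc->loadHTML( '<html><head><meta charset="utf-8"></head><body></body></html>' );
	$body = $doc->getElementsByTagName( 'body' )->item( 0 );

	foreach ( $htmlByTitle as $title => $html ) {
		$pageDoc = new DOMDocument();
		// Suppress warnings about HTML5 elements libxml does not know about.
		@$pageDoc->loadHTML( $html );
		$section = $doc->createElement( 'section' );
		$section->setAttribute( 'data-mw-section-title', $title );
		$pageBody = $pageDoc->getElementsByTagName( 'body' )->item( 0 );
		foreach ( $pageBody->childNodes as $node ) {
			$section->appendChild( $doc->importNode( $node, true ) );
		}
		$body->appendChild( $section );
	}
	return $doc->saveHTML();
}
```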
You probably forgot to use --raw-output.
@GWicke @mobrovac @Pchelolo where would you prefer this to live? It would be a thin wrapper that makes one or two action API calls (and in one case some calls to the page summary service), reformats the results and returns them. (The draft routes are in T164990.) It would not cache data since all the data involved is private. Would it make sense to include it in RESTBase itself, since there is very little processing involved? If so, could you maybe point me to a similar task from the past as a starting point?
@Tgr mentioned that it's probably OK to let a provider opt out of CSRF protection.
The code generating the feed is https://github.com/huwiki/featured-feeds (it predates FeaturedFeeds) and it is running from tron.wmm.hu IIRC. I'll look at it when I find the time.
You can do this now with external tools such as hypothes.is.
After some changes based on code review feedback, the API syntax is as follows (a rough example request follows the list):
- meta=readinglists to get all lists of the current user
- rlchangedsince to get lists which changed recently
- rlproject and rltitle to get which lists a page belongs to
- list=readinglistentries to get all entries of the given lists
- rlechangedsince to get entries which changed recently
- can be used as a generator
- list=readinglistorder to get the order of lists/list entries (not really useful, since it's also returned by the other modules, but the REST API wanted such an endpoint, so...)
- action=readinglists&command=[setup|teardown|create|update|delete|createentry|deleteentry|order|orderentry] for all the write operations on lists.
- commands are implemented as submodules
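To make the shape of the read modules concrete, a client call could look roughly like this (plain PHP; the timestamp and the rlelists list-selection parameter are assumptions, and a real request needs an authenticated session since the data is private):

```php
<?php
// Hypothetical client-side sketch of the read modules described above.
$endpoint = 'https://en.wikipedia.org/w/api.php'; // example wiki

// All lists of the current user that changed since a given timestamp.
$listQuery = http_build_query( [
	'action' => 'query',
	'meta' => 'readinglists',
	'rlchangedsince' => '2017-08-01T00:00:00Z',
	'format' => 'json',
] );
$lists = json_decode( file_get_contents( "$endpoint?$listQuery" ), true );

// Entries of a given list; the list-selection parameter name is an assumption.
$entryQuery = http_build_query( [
	'action' => 'query',
	'list' => 'readinglistentries',
	'rlelists' => 42,
	'format' => 'json',
] );
$entries = json_decode( file_get_contents( "$endpoint?$entryQuery" ), true );
```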
Done in https://gerrit.wikimedia.org/r/#/c/366980/; use the meta=readinglists and list=readinglistentries modules with the changedsince parameter.
Done in https://gerrit.wikimedia.org/r/#/c/366980/; use the meta=readinglists module with the project and title parameters.
https://gerrit.wikimedia.org/r/#/c/366980/ adds a maintenance script to do this. Can be configured via $wgReadingListsDeletedRetentionDays (or overridden in the script parameters).
Done in https://gerrit.wikimedia.org/r/#/c/366980/, although the API does hide deleted items unless asked for sync data.
Mon, Aug 21
Tried to look up the settings but something is misconfigured on deployment-mediawiki04/05, badly enough that eval.php can't even load:
tgr@deployment-mediawiki05:~$ mwscript eval.php --wiki=enwiki
PHP Fatal error: Class 'Memcached' not found in /srv/mediawiki/php-master/includes/libs/objectcache/MemcachedPeclBagOStuff.php on line 61
On deployment-tin (where that does not happen), Redis uses a single proxy, /var/run/nutcracker/redis_eqiad.sock, and using that socket to fetch the captcha data works fine (on mw04/05 as well). Presumably Nutcracker is configured to use consistent hashing and shard the data between the Redis instances.
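For reference, checking a key through the proxy only takes a few lines with phpredis (the key name below is a placeholder, not the actual captcha key format):

```php
<?php
// Hedged sketch: read a value through the local nutcracker proxy socket.
// Requires the phpredis extension; the key is a made-up placeholder.
$redis = new Redis();
$redis->connect( '/var/run/nutcracker/redis_eqiad.sock' );
var_dump( $redis->get( 'captcha:example-key' ) );
```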
Fri, Aug 11
The text of a hyperlink usually describes where the link takes you / what happens when you click on the link. I'd expect this to be well understood by users.
@zeljkofilipin can you maybe advise? https://gerrit.wikimedia.org/r/#/c/339584/ fails because one of the unit tests contains something that's only valid in PHP7. The patch changes the composer test command to exclude test directories, but that has no effect, presumably because linting happens directly and not through composer test. What would be the best way to fix this?
This is an upstream issue that we probably can't do much about. I mostly just created the task to have a description of the issue that can be referenced from on-wiki documentation.
@Osnard I don't think there is an established convention, and most extensions actually just use <extensionname>, but IMO using MediaWiki\Extensions\... follows from the PSR-4 recommendation that namespaces should start with a vendor prefix. Vendor prefixes are reasonably unique; extension names are not (i.e. there is no guarantee that someone out there isn't writing a library with the same name as your extension, and once another extension tries to use that library as a dependency, you are in trouble).
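For a hypothetical extension called Foo, that would look something like this (names made up for illustration):

```php
<?php
// extensions/Foo/src/PageRenderer.php — hypothetical example of the
// vendor-prefixed, PSR-4-style namespace convention described above.
namespace MediaWiki\Extensions\Foo;

class PageRenderer {
	// ...
}
```

Combined with a matching autoloader entry, the file path then mirrors the namespace, and name clashes with third-party libraries become a non-issue.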
Opening the edit URL works, it's just the VE in-place loading thing that fails.
Thu, Aug 10
paizaQA seems to lack most of the features of a decent Q&A system (reputation, badges etc). Apparently it was written in a very short amount of time to demo a RAD method, and then abandoned.
error: could not load image from http://tatteredwiki.org/images/6/6f/Lycus_Moss.png
Would be cool to have a Toolforge tool which can check some of the criteria and provide you with a TODO list.
- chapter/section numbering is much easier to do via CSS counters than by rewriting the HTML.
- I don't see the potential benefit in using unix tools. You would have to use something pretty complex like awk, which would result in code most developers can't easily read, and the interaction with the usual development ecosystem (unit tests, logging, debugging tools etc) would be awkward. Not to mention that it would limit the extension to Linux installs (and only specific distributions, unless you pay a lot of attention to portability).
- I doubt performance is a big deal: PDF rendering will probably take much more time than the simple HTML changes proposed here, and there is little value in optimizing parts of the system that are already relatively fast. That said, RemexHTML has better asymptotic performance than the alternatives (see Tim's comment in T163272#3272877), and probably less overhead too: the delays inherent in communicating over HTTP are very likely larger than any speed advantage a Node implementation might have over a PHP one in object instantiation time and the like.
- Document why you do things, not what you do. In long blocks of code, adding comments stating what each paragraph does is nice for easy parsing, but generally comments should explain the reasoning rather than restate what the code already says.
- Use PSR-4: one class per file, file name/path reflects class name. Classes should preferably be in the MediaWiki\Extensions\<extension name> namespace.
- Use dependency injection; avoid static calls except for utility methods and hook entry points (see the sketch after this list)
- Don't overuse private visibility in services
- Use structured logging, with meaningful levels
- Create a Vagrant role for your extension.
- Document the hooks used in the extension infobox; it's a nice way of exposing examples so that other developers can learn from them.
- Store your extension on gerrit so that others can update it for core deprecations
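To tie a few of these together (PSR-4 layout, constructor injection instead of static calls, structured logging), a service class might look roughly like this; all names are made up, not taken from the reviewed extension:

```php
<?php
// Hypothetical example of the recommendations above: PSR-4 layout,
// constructor-injected dependencies, structured logging with context data.
namespace MediaWiki\Extensions\Foo;

use Psr\Log\LoggerInterface;

class ArchiveNotifier {
	/** @var LoggerInterface */
	private $logger;
	/** @var int */
	private $maxRetries;

	public function __construct( LoggerInterface $logger, int $maxRetries = 3 ) {
		$this->logger = $logger;
		$this->maxRetries = $maxRetries;
	}

	public function notify( string $title, int $attempt ): bool {
		if ( $attempt > $this->maxRetries ) {
			// Structured logging: interesting values go into the context array,
			// and the level reflects how actionable the event is.
			$this->logger->warning( 'Giving up on notifying about {title}', [
				'title' => $title,
				'attempt' => $attempt,
			] );
			return false;
		}
		$this->logger->debug( 'Notifying about {title}', [ 'title' => $title ] );
		return true;
	}
}

// Wiring (e.g. in ServiceWiring.php) would construct it with real dependencies,
// e.g. new ArchiveNotifier( LoggerFactory::getInstance( 'Foo' ), 5 );
```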
Wed, Aug 9
Uhh, it must have been mindless copy-pasting of some HTML boilerplate. It uses neither jQuery nor GA (and GA still used the sample site ID UA-XXXXX-X).
Sun, Aug 6
Is there a summary of what exactly would need to be changed to make PageTriage work with other wikis?
Fri, Aug 4
Is there any benefit in using a prefixed IP as the username, as opposed to using a session ID (possibly something easier to remember, such as a diceware string) and exposing the IP address separately? Then, a wiki could be configured so that "anonymous" edits are truly anonymous (unlikely to be interesting for Wikimedia projects but might be useful for others, e.g. wikis operating in jurisdictions with stronger privacy laws), and it would be possible to apply judgement in edge cases (e.g. hide IP addresses for edits originating from oppressive regimes).
Changing status to declined which better reflects the outcome.
How many ORES users have ever visited Meta and set their preferred user language there? A tiny fraction, I'd guess.
Tue, Aug 1
Just use Tor etc?
Sat, Jul 29
Options that come to mind:
- Just deal with it. Add a token endpoint to the REST API; clients are required to call it first and fetch a token. Any write endpoint can return a token error (Wikimedia has MediaWiki configured with short session lifespans) in which case the client is required to fetch a new token and resubmit the request. This seems inconvenient but most clients do it already since they interface with the Action API directly, so not that much of a change.
- Use some kind of CSRF-safe authentication or request signing, and relax CSRF requirements when not actually needed (cf. T126257: The API should not require CSRF tokens for an OAuth request).
- Use OAuth. The problem here is that unlike session cookies, OAuth cannot be transparently proxied as the signature is based on the request URL. So the REST service would have to be able to verify OAuth signatures (not hard since there are libraries for it, but the data is stored in MediaWiki so the service would have to access it somehow) and authenticate to the action API in some alternative way.
- Use API tokens. This would require a new authentication module for MediaWiki, plus T126257, but unlike OAuth it can be proxied transparently. Stealing such a token could allow impersonating the user, but that does not seem any more insecure than the long-lived token cookies already used by MediaWiki. Also, it could be bound to a single REST service (Reading Lists, for example, is not super sensitive).
- Use double-submit CSRF, i.e. instead of storing the CSRF token in the session, just have the client store it in a cookie and submit it as both the cookie and POST data, and compare the two on the server (see the sketch after this list). This is less secure than session-stored CSRF, though. (Slightly more secure if the cookie is initially obtained from MediaWiki and signed in some way.)
- Force requests to trigger a CORS preflight (e.g. by requiring Content-Type: application/json), so web clients can only send requests from trusted domains. (Non-web clients are not affected by CSRF anyway.) Then find some way to exempt requests proxied by the REST service from having to include a CSRF token.
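To make the double-submit option concrete, the server-side check amounts to something like this (plain PHP sketch; cookie and field names are made up):

```php
<?php
// Hypothetical sketch of the double-submit check described above: the client
// generates a random token, stores it in a cookie and repeats it in the POST
// body; the server just compares the two, so no session storage is needed.
function checkDoubleSubmitToken(): bool {
	$cookieToken = $_COOKIE['csrfToken'] ?? '';
	$postToken = $_POST['csrfToken'] ?? '';
	// hash_equals avoids leaking information through timing differences.
	return $cookieToken !== '' && hash_equals( $cookieToken, $postToken );
}

if ( !checkDoubleSubmitToken() ) {
	http_response_code( 403 );
	echo json_encode( [ 'error' => 'badtoken' ] );
	exit;
}
```

The "slightly more secure" variant above would have MediaWiki issue and sign the cookie value (e.g. with an HMAC over a server-side secret), so an attacker who can set cookies for the domain cannot mint a valid token.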
Thu, Jul 27
I don't see anything odd in replacing static hook handlers with methods of a hook handler service. It's a step towards having proper per-extension event listener services. (It was also sort of possible pre-extension-registration, via the $wgHooks['SomeHook'][] = [ $hookHandler, 'method' ]; notation; it'd be nice to have that back, in a more performant form.)
Tue, Jul 25
- the @ notation is too cryptic. I don't think Symfony is popular enough that it would make sense to free-ride on its conventions. I'd rather see something like [ 'service' => 'ServiceName', 'method' => 'hookMethod' ], which is self-documenting even if it's uglier in config files (a resolution sketch follows at the end of this comment).
- there are quite a few places which implement some kind of object instantiation notation (ObjectFactory, ApiModuleFactory::addModules, ResourceLoader::getModule, ObjectCache::newFromParams...) which is a pretty similar problem. It would be nice to come up with a shared codebase for doing those things.
Other than that, seems like a good idea to me. IMO we should discard the hook system eventually and end up with something more like event listeners/dispatchers, but that's a long term thing and this looks like a reasonable short-term fix.
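For illustration, resolving the kind of spec suggested in the first bullet could be as simple as this (hypothetical names, not an actual patch):

```php
<?php
// Hypothetical sketch: turn a [ 'service' => ..., 'method' => ... ] hook
// handler spec into a plain PHP callable backed by the service container.
use MediaWiki\MediaWikiServices;

function resolveHookHandler( array $spec ): callable {
	$services = MediaWikiServices::getInstance();
	return [ $services->getService( $spec['service'] ), $spec['method'] ];
}

// A spec like [ 'service' => 'FooHookHandler', 'method' => 'onSomeHook' ]
// would resolve to a callable on the FooHookHandler service instance.
```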