Sat, Oct 10
Not exactly, even though this is also possible. There are cases in my app where I could be using multiple Banana contexts in the same place, e.g. showing text/links in different languages, and being able to call createMessage( banana, messageKey, ...placeholders ) with my own custom-configured banana instance would be very useful. Another case is that sometimes I need to change the way banana itself functions by overriding the default implementation -- an example is the well-known MW qqx language code, which lets MW show debug messages instead of the real i18n resolution -- and for that I would also need to be able to provide a custom banana handler. Thus, separating <Message> from the actual code that implements react substitutions would be highly useful to me.
Fri, Oct 9
Mon, Oct 5
@bd808 thanks for the explanation. This surfaced because i was doing an rsync directly from my laptop via ssh, and since i cannot login as dibabel directly, i will need to do some extra steps. Would it be possible for take to link to some documentation in its errors about this? Thx!
Sun, Oct 4
Sep 1 2020
Should the terms page be updated, possibly with a link to this discussion?
Aug 29 2020
Just caught this while comparing sitematrix output against the list of allowed sites values. Is there a reason why these are not coming from the same source? Is there a way to get the list that wikibase is using via api?
Aug 28 2020
Aug 27 2020
Closing as it is clearly complete and works great. Thanks everyone!
Aug 26 2020
Aug 25 2020
@Candalua there are a number of bugs and features I would like to fix/add before advertising it further. For example, the service does not refresh page state often enough, and it does not show dependency status (e.g. if one module depends on another which is stale, it doesn't warn about it). Everyone is welcome to use it though - shouldn't cause any major issues :)
@abi_ thanks, done! Let me know when you can push the updates to the repo :)
@abi_ would it be possible to do a massive rename to get rid of the dibabel- prefix in all strings? I just realized that the tool ended up with two of them in the translatewiki, and since this is a fully standalone tool, there is no point in having a prefix everywhere in code either.
There is now a tool for this... https://dibabel.toolforge.org/
Technically this is done. See https://dibabel.toolforge.org/
Aug 24 2020
Thank you everyone for getting it solved! @abi_ could you make it more often than twice a week? I'm pretty sure most contributors would love to see their changes going live asap (the instant gratification is important in FOSS :) ) -- and if there are no changes, it would simply be a noop, i don't think running a bot is very taxing on the system :) Thx!
Aug 20 2020
@abi_ thanks, I reviewed them, made a few minor corrections to the wiki pages, and added translatewiki as the contributor. Pushing to master is fine. If we make some changes to the master's i18n dir (either en.json or some other one), will you automatically pick that up before pushing?
@Jack_who_built_the_house well, if GitHub has your ssh key, in theory any software that can get access to it could establish any kind of connection, including a regular session to run arbitrary commands. But if you establish a tunnel and unset the key right away, any software run afterwards can only create socket connections, not run commands (at least that's my understanding).
@bd808 thanks for the good example. It does provide a workaround, but it also highlights some issues:
Aug 19 2020
@bd808 I am not sure if the attack surface is significant, and also I don't believe this is relevant
To automate the build & publish process, making it as simple and stable as possible. Desired workflow:
- User publishes a new release of their tool
- System automatically builds, tests, and publishes the result to the wiki
Aug 8 2020
Jul 24 2020
@Tgr forwarding an api call from a complex js client - if the response is a non-200, the client may decide to handle it differently (I assume non-200 responses are still returned in some form as content, but they won't be JSON).
Jul 22 2020
Sure, Vega allows you to load from more than one data source, as long as you explicitly list them all. Afterwards, you can create a new data source that joins all the other ones (i think it was available in Vega 2.0 that Wikipedia is using)
Jul 21 2020
May 20 2020
May 19 2020
May 12 2020
I think the right way to integrate spreadsheets and .tab editor is the proper copy/paste handling. It should be possible to paste a spreadsheet directly into the editor, and handle when the paste data is a table from spreadsheet, or has considerable size, or has multiple lines, or has tabs (or commas?), and treat it as a table of text, and ask the user how to deal with it: replace data or append to it. Other cases to handle would be shape mismatch (different number of columns), and type mismatch (e.g. paste data has text in the numeric column). I do not think the special cases like "replace just one column while keeping others intact" should be handled -- for that people can just copy/paste the whole table.
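A rough sketch of the paste checks described above, in Python for illustration only (the real editor would do this client-side). The function names, the "string"/"number" type labels, and the rule for deciding that a paste is a table are all assumptions, not the editor's actual behavior.

```python
import csv
import io


def parse_paste(text):
    """Return a list of rows if the paste looks like a table, else None."""
    body = text.strip("\n")
    lines = body.split("\n")
    # Assumed heuristic: a table has tabs or spans more than one line.
    if "\t" not in body and len(lines) < 2:
        return None
    delimiter = "\t" if "\t" in body else ","
    return list(csv.reader(io.StringIO(body), delimiter=delimiter))


def check_paste(rows, column_types):
    """Report shape and type mismatches against the existing table.

    column_types is a list such as ["string", "number"] (hypothetical labels).
    """
    problems = []
    for r, row in enumerate(rows):
        if len(row) != len(column_types):
            problems.append(
                f"row {r}: expected {len(column_types)} columns, got {len(row)}"
            )
            continue
        for c, (cell, kind) in enumerate(zip(row, column_types)):
            if kind == "number" and cell != "":
                try:
                    float(cell)
                except ValueError:
                    problems.append(f"row {r}, column {c}: {cell!r} is not a number")
    return problems
```

If check_paste returns an empty list, the editor could go straight to asking whether to replace the data or append to it; otherwise it would show the problems first.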
May 11 2020
I just tried it at https://commons.wikimedia.beta.wmflabs.org/wiki/Data:ISO15924/trans.tab -- looks great, and offers an amazing base for further features! Did you remove the "delete row" button?
@Tgr i strongly oppose storing wiki markup inside columns because it makes the system far less portable and less stable. Wiki markup only works in the context of a specific wiki, and would render either differently or simply break -- templates, localization settings, and modules are wiki specific.
May 10 2020
@AlexisJazz per my above comments -- it seems the system pretty-prints JSON, checks the size, and only then it stores it in the compact format. To make it work properly, the system should only validate json size after serializing it in compact form.
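To illustrate the point, here is a small Python sketch (JsonConfig itself is PHP, so this is not its code). The 2MB figure is the page limit mentioned elsewhere in this thread; the function names and sample data are made up.

```python
import json

MAX_BYTES = 2 * 1024 * 1024  # assumed page limit (2MB), per the discussion


def compact_size(data):
    """Size in bytes of the compact serialization (what gets stored)."""
    compact = json.dumps(data, separators=(",", ":"), ensure_ascii=False)
    return len(compact.encode("utf-8"))


def pretty_size(data):
    """Size in bytes of the pretty-printed serialization."""
    pretty = json.dumps(data, indent=4, ensure_ascii=False)
    return len(pretty.encode("utf-8"))


def fits_limit(data, limit=MAX_BYTES):
    """Validate against the limit using the compact form only."""
    return compact_size(data) <= limit


# Many small arrays, similar in spirit to GeoJSON coordinates.
sample = {"data": [[i, i + 0.5] for i in range(1000)]}
print(compact_size(sample), pretty_size(sample))
```

The pretty-printed size is noticeably larger for data made of many small arrays, so checking it instead of the compact size rejects pages that would fit once stored.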
May 9 2020
@NavinoEvans I agree - feel free to take my implementation (which was already working for any CSV-style inputs), and extend/adapt it. Ideally, it should be merged upstream to the Blazegraph, so it should support any kind of CSVs. It may make sense to have either some sort of a wrapper for the tabular datasets as an extension to Blazegraph, or alternatively to extend the jsonconfig's API to be able to get CSV directly (which might be a better solution, as it would allow other, non-blazegraph usages)
May 6 2020
Good details, thx. For the localized strings (both in the data and in metadata), I think the better way would be to have just a single "global" language selector somewhere at the top of the dialog, set to the user's language by default. Changing its value would only change what is being shown, but it won't do any data modifications.
May 5 2020
@Tchanders this looks awesome!!! My understanding is that for MVP it should just allow editing of the existing tables, not change the table structure, right? Also, the multi-lingual column might be somewhat difficult to represent visually - how do you think it should be done? Thanks again!
Apr 28 2020
Most wikis do want to protect highly used templates/modules. E.g. the Module:TNT would be used by most pages - you never want to make it editable by novice users. Thus, the bot would need to have the rights to edit that page.
@Tgr I just ran it a bit more, but the issue is that the bot would need a bot flag with admin rights (not possible globally, so one would have to apply to every wiki... painful). The bot source code is in https://github.com/nyurik/dibabel
If the job queue can call external services and wait for the rendered result, it would simplify the architecture a bit, allow for easier testing, and I am all for it. This assumes job queue itself can store large data blobs, rather than just short strings.
Apr 22 2020
At this point additional annotations could only be done as extra columns. This would work for many cases, but probably not all. Could you give some examples of where columns won't be enough, and the dedicated annotation system would be required?
Apr 17 2020
@Milimetric thx for working on this! A few points that I would like further clarification on:
Apr 15 2020
One more thing: the current custom-protocol:///?someparams=url-encoded for the data sources was a workaround for an older Vega limitation. In Kibana, we used a much more successful approach:
- In addition to data blobs (WDQS, data pages, API calls, etc), graphs could contain images (i.e. Commons or local wiki), and map image snapshots (generated by the maps snapshot service). See examples. If data is "prepackaged", some system would have to call all those services to assemble the needed data.
- Newer Vega allows data loading as a result of a user action, or as a result of other data loading (e.g. if datasource A returns X, get datasource B)
- MediaWiki PHP could try to parse the graph spec to get all data sources, and we could say that for the preview image, data must not be dynamic, but that still leaves images -- e.g. if the data has country codes, a graph could get the corresponding country flag by its name, e.g. File:Icons-flag-<countrycode>.png.
- Vega is not allowed to get any data from outside of the WMF network (uses a custom data/image loader for that).
Apr 4 2020
Feb 26 2020
@Gehel lets define this amount of data, just for clarity. My back-of-the-envelope calculations:
Feb 24 2020
hi @Jcross, can't recall what this is about, can close I guess
Feb 13 2020
Feb 11 2020
@Fae Python treats null as None when doing dict <-> json
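A quick check with the standard json module (the sample keys are made up):

```python
import json

# JSON null becomes Python None when decoding...
record = json.loads('{"name": "example", "value": null}')
print(record["value"] is None)

# ...and None becomes null again when encoding.
text = json.dumps({"value": None})
print(text)
```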
Jan 13 2020
@Lucas_Werkmeister_WMDE thank you for all the hard work on this task! Do you have any approximate timeline of the getEntity() returning all lexeme forms, or is that already implemented? How significant of a challenge is it? I have been spending considerable time updating Lexicator bot to parse multiple Wiktionary languages, and handle multiple linguistic types, but all that work is mostly pointless until Wiktionaries can access that data.
This would be solved with T155290
Jan 4 2020
I would guess this is mostly a devops task - orchestrate execution of an updating script. Here's the working implementation - https://github.com/Sophox/sophox/blob/master/osm2rdf/updatePageViewStats.py
@Tagishsimon this proposal would not edit wikidata. Instead, as part of the WDQS import process, it would upload pageviews in bulk from the pageview dump files directly into the Blazegraph index. It could do it every hour, and computation-wise it will be relatively inexpensive (i ran it as part of Sophox a few times).
Dec 9 2019
Dec 6 2019
The fix seems sensible, thx!
Nov 25 2019
@MSantos there will be an OpenMapTiles community sync up this Thursday (10:30a ET), let me know if you would like to join in - we will be discussing how to move OMT forward, and possibly accommodate for Wikipedia needs. Email me YuriAstrakhan@gmail.com with your email addr.
Nov 18 2019
sure, sounds good, so how about this - if you create a page/ticket/... with some basic info and goals, I will add implementation details to it. Would that work?
@MSantos I am all for WMF to start using the OMT project rather than our first implementation, but I am not sure how valuable it will be to write an RFC -- so far WMF has not been too eager to support proper map serving efforts, relying mostly on semi-volunteer efforts of different enthusiasts to keep it around. Do you think writing an RFC will help in changing that? Or will it be just another dusty page on Phabricator?
Note that the openmaptiles project is rapidly improving, with the goal of generating tiles "on the fly" -- without the tile pregeneration step, and without mapnik. In other words, a vector tile (MVT) is generated by a single giant PostgreSQL query, and sent to the user on request (with some caching to speed up frequently-viewed regions). Adopting this approach will greatly simplify the current Wikipedia setup - no more Mapnik, no more Cassandra, easily scalable architecture (the more postgres replicas, the bigger the capacity).
P.S. And yes, OpenMapTiles is using Imposm3, together with a number of other good data sources like Natural Earth for low zooms.
Nov 10 2019
Nov 5 2019
See my above comment, and @Lucas_Werkmeister_WMDE response -- while it stores things in the compact JSON form, the length is checked while it is in the "pretty-printed" format. A way to work around it might be to upload it to the server in the compact form via API, in which case it might get accepted.
Nov 2 2019
Oct 25 2019
@dr0ptp4kt not just JS -- data sources could be a far larger component of the graphs - e.g. one graph could mix together multiple data sources, including some tabular data pages (up to 2MB each), queries to Wikidata (currently broken btw -- lots of users are complaining because millions of population graphs are broken), a few images from commons, and even some mediawiki API calls. A full download could be in the tens of megabytes, and some could be slow.
Oct 13 2019
Oct 7 2019
@Lucas_Werkmeister_WMDE thanks, but this is very surprising, I was 99.99% certain it was storing it pretty-printed... Either that, or it did the size limit check on the pretty-printed version before storing. Would it be possible to do a direct SQL query for that data, and also to run a MAX( LEN( data )) to see the largest page in the Data namespace on Commons? Thanks for checking!
Correct, this is the tabular data hitting the 2MB page limit. One relatively simple solution would be to fix the JsonConfig base class to store data as "compact" rather than pretty-printed JSON (there shouldn't be any externally visible consequences because JSON is always reformatted before saving). That would immediately increase max storage by a significant percentage, especially for .map (geojson tends to have a lot of small arrays, so when they are broken up across lines and prefixed with tons of spaces, the size grows to several times the original). I suspect Wikibase has had to solve a similar problem storing their items in the MW engine.
Oct 6 2019
Sep 27 2019
@Fnielsen i am not sure I understand what that query does, could you elaborate? Especially I am confused why you look at the forms -- from the perspective of Wiktionary, you request a single Lexeme, not individual forms. (btw, the query times out for me).
P.S. @Fnielsen does bring up a valid point about various linked lexemes, and that might be useful -- for example if a lexeme lists another lexeme as being a synonym, it would be good to show it as a word rather than an L-number.
@Lydia_Pintscher most of the Wiktionary pages have just one corresponding lexeme - and that's all I would expect to load.
Sep 23 2019
@RexxS you do bring up a valid point about watchlist. The minor difference here is that a lexeme is tied to a specific language, so it is less likely to have content not relevant to that one language / wiktionary. The only exception might be the description of senses in other languages. TBH, I am not sure that adding sense descriptions in a non-native language is a scalable solution -- we are repeating the issue of sitelinks, where every wiki page referenced all other wiki pages on the same subject. But this is a separate discussion, unrelated to this ticket.
Sep 14 2019
Sep 13 2019
P.S. to sum up -- Wiktionary needs just a single Lua function for the minimum viable product: getEntity('L100000') that simply returns the whole Lexeme JSON. Everything else is optional.
I have imported some Russian nouns (~20,000 so far, but will be more soon), plus added links from Wiktionary's pages to the corresponding Lexemes. I think the simplest use case for Lexemes would be to allow Wiktionary Lua script to be able to load Lexeme by its ID. This will instantly make Lexemes useful to Wiktionary because the Lua script will be able to:
- generate table of the word forms
- generate etymology and pronunciation sections
- do the above for every lexeme if more than one is used on the page.
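As a rough sketch of the "table of word forms" use case, here it is in Python rather than Lua, working on an already-loaded Lexeme JSON. The key names ("forms", "representations", "grammaticalFeatures") are assumed to follow the Wikibase Lexeme JSON layout; the sample lexeme, its form IDs, and the feature placeholders are invented for illustration.

```python
def forms_table(lexeme, language):
    """Return (form id, representation, grammatical features) rows."""
    rows = []
    for form in lexeme.get("forms", []):
        rep = form.get("representations", {}).get(language)
        if rep is None:
            continue  # no spelling recorded for this language
        rows.append((form["id"], rep["value"], form.get("grammaticalFeatures", [])))
    return rows


# Hypothetical sample; the feature values stand in for real item IDs.
sample = {
    "id": "L100000",
    "forms": [
        {
            "id": "L100000-F1",
            "representations": {"ru": {"language": "ru", "value": "слово"}},
            "grammaticalFeatures": ["Q_NOMINATIVE", "Q_SINGULAR"],
        },
        {
            "id": "L100000-F2",
            "representations": {"ru": {"language": "ru", "value": "слова"}},
            "grammaticalFeatures": ["Q_GENITIVE", "Q_SINGULAR"],
        },
    ],
}

for form_id, value, features in forms_table(sample, "ru"):
    print(form_id, value, ", ".join(features))
```

A Wiktionary module would do the equivalent in Lua once something like getEntity() returns the whole Lexeme JSON.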
Sep 11 2019
@Anomie thx for the explanation. Several weeks ago my bot was banned for a short time because it didn't have the maxlag param. Are you saying that it was a mistake because WMF MW doesn't actually pay any attention to it? Also, would it be possible to update the documentation to indicate what a proper bot should do when running on WMF servers? Thanks!
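For context, this is roughly what "having the maxlag param" means for a bot, as a minimal Python sketch. The send callable is a hypothetical stand-in for whatever HTTP client the bot uses, and the retry count and wait time are arbitrary assumptions; the sketch relies on the API reporting lag as an error with code "maxlag". It says nothing about whether WMF servers act on the parameter, which is the open question here.

```python
import time


def with_maxlag(params, maxlag=5):
    """Return a copy of the API parameters with maxlag added."""
    merged = dict(params)
    merged["maxlag"] = maxlag
    return merged


def is_maxlag_error(response):
    """True if the decoded API response is a maxlag error."""
    return response.get("error", {}).get("code") == "maxlag"


def call_with_retry(send, params, retries=3, wait_seconds=5, sleep=time.sleep):
    """Call send(params) and back off while the server reports lag.

    send is a hypothetical callable: it takes the parameter dict and
    returns the decoded JSON response as a dict.
    """
    request = with_maxlag(params)
    for attempt in range(retries + 1):
        response = send(request)
        if not is_maxlag_error(response):
            return response
        if attempt < retries:
            sleep(wait_seconds)
    raise RuntimeError("server still lagged after retries")
```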
Sep 3 2019
In theory it should be fairly straightforward to create a <graph> that outputs a single number, but that would still be an image, not text (and it might look slightly off - e.g. fuzzier or in different font)
Aug 21 2019
Thanks, closing for now, waiting for the Vega team and the students.
Aug 18 2019
@Catrope thanks for tackling it! I always thought parser cache is non-persisted, so if a page does not get any edits in 2 months, the relevant data might not be there?
Aug 16 2019
This is awesome, thank you @TheDJ and @JeanFred ! One kinda important issue -- it breaks on localized columns, e.g. Data:I18n/No_globals.tab -- CSV outputs empty values, and Excel shows English (I think).