Mon, Jan 13
@Lucas_Werkmeister_WMDE thank you for all the hard work on this task! Do you have any approximate timeline of the getEntity() returning all lexeme forms, or is that already implemented? How significant of a challenge is it? I have been spending considerable time updating Lexicator bot to parse multiple Wiktionary languages, and handle multiple linguistic types, but all that work is mostly pointless until Wiktionaries can access that data.
This would be solved with T155290
Sat, Jan 4
I would guess this is mostly a devops task - orchestrate execution of an updating script. Here's the working implementation - https://github.com/Sophox/sophox/blob/master/osm2rdf/updatePageViewStats.py
@Tagishsimon this proposal would not edit wikidata. Instead, as part of the WDQS import process, it would upload pageviews in bulk from the pageview dump files directly into the Blazegraph index. It could do it every hour, and computation-wise it will be relatively inexpensive (i ran it as part of Sophox a few times).
Dec 9 2019
Dec 6 2019
The fix seems sensible, thx!
Nov 25 2019
@MSantos there will be an OpenMapTiles community sync up this Thursday (10:30a ET), let me know if you would like to join in - we will be discussing how to move OMT forward, and possibly accommodate for Wikipedia needs. Email me YuriAstrakhan@gmail.com with your email addr.
Nov 18 2019
sure, sounds good, so how about this - if you create a page/ticket/... with some basic info and goals, I will add implementation details to it. Would that work?
@MSantos I am all for WMF to start using the OMT project rather than our first implementation, but I am not sure how valuable it will be to write an RFC -- so far WMF has not been too eager to support a proper map serving efforts, relying mostly on semi-volunteer efforts of different enthusiasts to keep it around. Do you think writing RFC will help in changing that? Or will it be just another dusty page on Phabricator?
Note that the openmaptiles project is rapidly improving, with the goal of generating tiles "on the fly" -- without the tile pregeneration step, and without mapnik. In other words, a vector tile (MVT) is generated by a single giant PostgreSQL query, and send to the user on request (with some caching to speed up frequently-viewed regions). Adapting this approach will greatly simplify current Wikipedia setup - no more Mapnik, no more Cassandra, easily scalable architecture (the more postgres replicas, the bigger the capacity).
P.S. And yes, OpenMapTiles is using Imposm3, together with a number of other good data sources like Natural Earth for low zooms.
Nov 10 2019
Nov 5 2019
See my above comment, and @Lucas_Werkmeister_WMDE response -- while it stores things in the compact JSON form, the length is checked while it is in the "pretty-printed" format. A way to work around it might be to upload it to the server in the compact form via API, in which case it might get accepted.
Nov 2 2019
Oct 25 2019
@dr0ptp4kt not just JS -- data sources could be far larger component to the graphs - e.g. one graph could mix together multiple data sources, including some tabular data pages (up to 2MB each), queries to Wikidata (currently broken btw -- lots of users are complaining because millions of population graphs are broken), a few images from commons, and even some mediawiki API calls. A full download could be in tens of megabytes, and some could be slow.
Oct 13 2019
Oct 7 2019
@Lucas_Werkmeister_WMDE thanks, but this is very surprising, I was 99.99% certain it was storing it pretty-printed... Either that, or it did the size limit check in the pretty-printed version before storing. Would it be possible to do a direct SQL query for that data, and also to run a MAX( LEN( data ))to see the largest page in the Data namespace on Commons? Thanks for checking!
Correct, this is the tabular data hitting the 2MB page limit. One relatively simple solution would be to fix JsonConfig base class to store data as "compact", rather than pretty-printed JSON (there shouldn't be any externally visible consequences because JSON is always reformatted before saving). That would immediately increase max storage by a significant percentage, especially for .map (geojson tends to have a lot of small arrays, so when they break up between lines and prefixed with tons of spaces, the size increases several times the original). I suspect Wikibase has had to solve a similar problem storing their items in the MW engine.
Oct 6 2019
Sep 27 2019
@Fnielsen i am not sure I understand what that query does, could you elaborate? Especially I am confused why you look at the forms -- from the perspective of Wiktionary, you request a single Lexeme, not individual forms. (btw, the query times out for me).
P.S. @Fnielsen does bring a valid point about various linked lexemes , and that might be useful -- for example if lexeme lists another lexeme as being a synonym, it would be good to show it as a word rather than an L-number.
@Lydia_Pintscher most of the Wiktionary pages have just one corresponding lexeme - and that's all I would expect to load.
Sep 23 2019
@RexxS you do bring up a valid point about watchlist. The minor difference here is that lexeme is tied to a specific language, so it is less likely to have content not relevant to that one language / wiktionary. The only exception might be the description of sensese in other languages. TBH, I am not sure that adding sense description in a non-native language is a scalable solution -- we are repeating the issue of sitelinks, where every wiki page referenced all other wiki pages on the same subject. But this is a separate discussion, unrelated to this ticket.
Sep 14 2019
Sep 13 2019
P.S. to sum up -- Wiktionary needs just a single Lua function for the minimum viable product: getEntity('L100000') that simply returns the whole Lexeme JSON. Everything else is optional.
I have imported some Russian nouns (~20,000 so far, but will be more soon), plus added links from Wiktionary's pages to the corresponding Lexemes. I think the simplest use case for Lexemes would be to allow Wiktionary Lua script to be able to load Lexeme by its ID. This will instantly make Lexemes useful to Wiktionary because the Lua script will be able to:
- generate table of the word forms
- generate etymology and pronunciation sections
- do the above for every lexeme if more than one is used on the page.
Sep 11 2019
@Anomie thx for the explanation. Several weeks ago by bot was banned for a short time because it didn't have the maxlag param. Are you saying that it was a mistake because WMF MW doesn't actually pay any attention to it? Also, would it be possible to update the documentation to indicate what the proper bot should do when running on WMF servers? Thanks!
Sep 3 2019
In theory it should be fairly straightforward to create a <graph> that outputs a single number, but that would still be an image, not text (and it might look slightly off - e.g. fuzzier or in different font)
Aug 21 2019
Thanks, closing for now, waiting for the Vega team and the students.
Aug 18 2019
@Catrope thanks for tackling it! I always thought parser cache is non-persisted, so if a page does not get any edits in 2 months, the relevant data might not be there?
Aug 16 2019
This is awesome, thank you @TheDJ and @JeanFred ! One kinda important issue -- it breaks on localized columns, e.g. Data:I18n/No_globals.tab -- CSV outputs empty values, and Excel shows English (I think).
Aug 7 2019
Aug 4 2019
Recently someone also raised this as an issue with graphs. Has anything changed to destabilize mw.ext.data.get() method?
Aug 1 2019
Vega1 doesn't need it - it doesn't support external URLs. Vega2 approach sounds correct. Thx for digging into it!
@Pchelolo Graphoid first calls the action=graph to get the data, but then it should also call to the WDQS directly using that query. Also, you can see what that request looks like if you go to the wiki page with a graph, click edit source, and do a page preview -- your browser should make very similar request to WDQS, except that unlike Graphoid, browser forces a few headers like user agent (IIRC)
Jul 29 2019
@Lea_Lacroix_WMDE not from my side - I'm a bit overbooked at the moment with my main job (elastic.co) and family. It will take me some effort to get the system running again on my laptop to see what Graphoid sends to the servers. It might be easier to track it from the server side logs if anyone has that access.
Jul 24 2019
vegaErr: Error: Load failed with response code 403. -- Vega attempts to call Wikidata API to get the needed data, and I suspect that API returns 403. I would look at the HTTP request Vega makes (it should be very similar to the query stored in the graph on the wiki page), and try to find it in WDQS logs. Perhaps WDQS now blocks some HTTP requests that do not appear to originate from the browser (i.e. have fewer headers than expected)?
Jul 15 2019
Rest in Peace Zero, it was fun... 😢
Jul 2 2019
Thanks, it makes sense. I was only suggesting to search the logs for the same query as being ran from the in-browser's page preview.
@Smalyshev the query should be identical as being issues from the browser when you do a "page preview" with a graph that uses WDQS query. Graphoid does exactly the same steps as the browser - essentially making all the requests and putting together a resulting image.
@Smalyshev i would love to help, but it is a bit hard to say without access to the logs or the servers
Jun 25 2019
Jun 19 2019
Jun 15 2019
@stjn I don't think there is any reliable way to compute the image height/width ahead of time without going through the full graph rendering. PHP could call out to Graphoid during the post-save, but that may require even more work than simply updating to Vega 3+ and implementing the above logic (defaulting to autosize=fit model).
@stjn this is a fairly complex issue. I have done a lot of work with the Vega lib since then, integrating it as part of Kibana (Elasticsearch), so I can elaborate on a "better" way to address it. I will try to be brief :) Also, take a look here on how I did it in Kibana.
Jun 11 2019
sounds good, thanks @sbassett !
@sbassett sorry, unsure what you mean. Some code has already been completed -- https://github.com/nyurik/mw-graph-shared/commit/b97fd309897f701bd0db6b1d60635d0786d84887 -- making it possible to integrate the future v3+. The next step would be to create a patch for graphoid & graph ext using that shared code.
May 24 2019
May 21 2019
May 17 2019
May 12 2019
@Bawolff I was re-reading our notes in light of the resumed interest in this project. Assuming we are not yet rolling out the sandbox approach per our discussion, I have began looking closely at the issues you raised:
@Xiaoyanghaitao if you want to target Vega 5.4.0 initially, you might as well put that version in the subject (or you could just say "latest production version")
May 9 2019
May 8 2019
Agree this would be great to have. I think this should be an API-level rather than an extension-level feature, because in many cases an extension would need to bridge the communication between inside and outside the frame. The system should allow:
- set up a separate initial configuration blob
- set up a separate resource loading for the iframe
- designate some resources as only available via the iframe loading (to restrict accidental loading by the primary site)
- establish a clear usage pattern for extensions to follow (this is more of a documentation than coding)
Do we want a separate ticket for Vega-Lite? Vega-Lite can be thought of as an "add-on" converter library that simply converts one JSON into a different JSON, without any other functionality (e.g. no XHR calls, no UI, etc). This way users can use a much simpler language VegaLite, and it will be dynamically converted to a full Vega.
@Bawolff is there anything in this ticket that is sensitive? There is a discussion about GSOC student tackling the Vega 3 upgrade, and they would need access to this discussion.
@Milimetric has any team at WMF decided to adapt it yet? From what I was told, there is already a GSOC student about to get started on this, and we could ask them to do some restructuring work as well. Which restructuring steps did you have in mind? BTW, it would be great if you could co-mentor this person as well (we already have @domoritz and myself signed up as mentors)
May 4 2019
Apr 19 2019
Apr 12 2019
A bot has been implemented and documented for this functionality. Needs a whole bunch of bot approvals, or a global bot flag. For now running it by hand for a few pages. Anyone interested in this functionality, we can now start building cross-site templates and lua modules...
Apr 10 2019
@Milimetric I think the removal of Graphoid will be far more difficult than just keeping it. If you remove it, you will have to support multiple Vega versions on the client - not as trivial to do as it is for the Graphoid service (which was built with that support).
It's not about "allowing", it's just a matter of me (or someone else) to sit down and implement it :) Afterwards, I will have to document it in detail, and apply for a bot flag for my YurikBot (it has been dormant for a while now), preferably to be allowed on all wikis at once.
Apr 9 2019
Another, not yet mentioned consideration: There was a significant syntax change between Vega 1.5, Vega 2, and 3+. Graph extension & Graphoid support both 1.5 and 2, but Graph ext does it in an imperfect way: browser only loads the latest version required, so if a page has both 1.5 and 2, it would just load 2. This only affects dynamic graphs (v2 only), but it really breaks during page preview -- only v2 graphs are previewed if both are present. When used as images from Graphoid, there are no issues (regular page viewing).
@mobrovac @dr0ptp4kt there is a much larger issue -- performance, and this is identical to maps: when a page loads, do you want to show a map / graph right away, e.g. if it is at the top, or do you want to load a large library, load all the data for it, and only then show the page? Or to show an empty box that eventually gets loaded? Live map requires leaflet, tile images, could get data from commons, could run sparql query (potentially overloading the server), etc. The exact same issues are for the graph - load Vega lib, load data, run sparql qeuries, etc. So if map needs it, graph needs it, and if servers can take both - sure. I would much rather have interactive content from the start.
Apr 8 2019
@Anomie you are right that documentation does not say it will handle the entire ISO 8601, but I do think MW should handle at least a very common 2019-04-08T04:12:45+00:00-style format (only +00:00 / -00:00, not other timezones). It is, for better or worse, fairly common with Python, and supporting it shouldn't be a significant undertaking.
Apr 5 2019
Mar 21 2019
Mar 20 2019
Mar 16 2019
Here's a working implementation using the browser's prompt text box, based on the original idea by @Ricordisamoa. Copy it into your https://wiki.openstreetmap.org/wiki/User:___username___/common.js page. Note that the save will happen even if the user clicks Cancel because there is no good way to abort saving without crashing. This code can be directly used as a gadget.
Mar 12 2019
Mar 8 2019
This feature has been requested several times when adding Wikibase to OpenStreetMap wiki. I think the very first step we can already do is to make it possible for gadgets to add edit summary string during the "save" command. This way we can experiment with the UI implementation.
Mar 3 2019
@A2093064 I think a much more intuitive behavior would be to use the "root" page, e.g. MediaWiki:talkpageheader as a fallback when a language-specific page does not exist, and use /en to override just English -- thus treating /en the same as any other language. Yes, it is possible to run a bot to generate all such pages, but ... ew :)
@A2093064 The talkpageheader message is set to a "-" by default - see code. This seems like a very strange behavior, especially because it makes this message useless for non-WP installations, or for any multi-lingual sites like Commons.
Mar 1 2019
Feb 28 2019
Feb 21 2019
Note that Graphoid already supports POST approach (see v2 Graphoid api). You can post a graph spec to it, and it will return an image. But v2 was never actually used by any services, even though it is in production, just not exposed to end users. The issue with v2 is that we don't want to make MW parser (HTML rendering) dependent on an external service because of potential (unverified) speed concern - we need to generate some graph image URL without waiting for Graphoid to actually generate that image.
Feb 13 2019
@Umherirrender that's an interesting idea, thx. It is not exactly the same because this wouldn't get me the redirect to redirect - e.g. A->B->C would only get me B->C without A->C.
Feb 11 2019
@Pchelolo @Gehel I agree that Node 8 upgrade is urgent and must happen. This ticket was discussing Node 10, and mentioned that it was blocked on some dependencies, so I was proposing how to help with the difficulties of dependency upgrades, and k8s might be a good solution to that. I will be happy to chat on IRC. Thanks!!!
@Gehel is there a doc/procedure describing limits to docker deployment? E.g. would WMF have a docker repository and a well established build process that uploads to it, does it limit to the base images like "must use base Debian but not base Alpine", etc.
@MSantos and @Jhernandez , would it perhaps make things far simpler to migrate to use Kubernetes/Docker? Each maps service will be able to use any version of Node, or any native package, or even a different version/distribution of Linux as needed, and it will not affect any other services even if they are running on the same physical hardware. Even migration will be easier, because you will be able to use different version of Node on the same machine by different instances of Tilerator/Kartotherian.