Fri, Nov 3
@Smalyshev I suspect it will be relatively easy to do with the standard API - and if so, why not reuse the existing functionality? POST is a very small price to pay for this (think of how often this feature is used - it's not worth creating a special parser just to save a few CPU cycles).
@Slaporte thanks for a thorough post. WRT attribution, graphs can already have attributions - they just need to be added by the developer of the graph template. Would a social contract (e.g. "all graphs that use external data must include the licensing terms of that data") be enough, or is it a requirement to have a technical means to enforce this? So far the wiki movement has mostly relied on social contracts for rule enforcement, so I am a bit reluctant to introduce a complex system that automatically adds licensing terms when it is easy enough for the template/graph authors to include that at the bottom of their template, while keeping full control of the placement and styling of that text. If an author forgets to add it, another editor can easily modify the template to fix the issue.
Thu, Nov 2
@Lucas_Werkmeister_WMDE I don't think it needs Parsoid - the OSM wiki doesn't have it as far as I can see, and this approach works there. @Addshore correct, this approach does require the VisualEditor extension. I wonder if it would be possible to use action=parse instead.
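To illustrate, a minimal sketch of what an action=parse request could look like (the page name here is a placeholder, not the actual template in question):

```python
from urllib.parse import urlencode

# Sketch: building a MediaWiki action=parse request.
# The page name below is hypothetical - substitute whichever page needs parsing.
params = {
    "action": "parse",
    "page": "Template:Graph:Lines",  # placeholder page
    "prop": "text",                  # return the parsed HTML
    "format": "json",
    "formatversion": "2",
}
url = "https://www.mediawiki.org/w/api.php?" + urlencode(params)
```

Fetching `url` with any HTTP client would then return the parsed HTML without needing Parsoid or VisualEditor.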
Wed, Nov 1
@Zache, but in that case the historical boundary is not coming from OSM (OSM doesn't support it AFAIK), so why would OSM license matter?
Tue, Oct 31
@Zache, you can already use some of the data directly from OSM, without first copying it from OSM to the data namespace. See https://www.mediawiki.org/wiki/Help:Extension:Kartographer#External_data
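As a sketch of what that Help page describes, the map content can reference external data instead of embedding it; the Wikidata id below is just a placeholder:

```python
import json

# Sketch of Kartographer "ExternalData" content that pulls a shape directly
# from OSM (via the geoshape service) instead of copying it to a .map page.
# The id is hypothetical - use the Wikidata item your OSM relation is tagged with.
external = {
    "type": "ExternalData",
    "service": "geoshape",
    "ids": "Q64",  # placeholder Wikidata id
}
mapframe_content = json.dumps(external)
```

This JSON would go inside a `<mapframe>` tag; the shape is then fetched at render time rather than stored in the wiki.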
@Looniverse I am sure Wikipedia wants to store the data, it just doesn't have good means to work with it at this point. The .map pages are meant to be used as a single "chunk" - all of the data is used at once. Larger datasets, like OSM, are meant to be pre-processed for display, e.g. broken up into tiles and simplified for the zoom level. I would recommend simplifying the geometry to make it storable in .map.
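As a very crude sketch of what "simplifying the geometry" means (a real tool would use something like Douglas-Peucker; this just drops points that are closer than a tolerance to the last kept point, with an illustrative tolerance value):

```python
import math

def simplify(points, tolerance):
    """Drop consecutive points closer than `tolerance` to the last kept point.
    A crude stand-in for proper simplification (e.g. Douglas-Peucker)."""
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:-1]:
        if math.dist(p, kept[-1]) >= tolerance:
            kept.append(p)
    if len(points) > 1:
        kept.append(points[-1])  # always keep the final endpoint
    return kept

# Near-duplicate points collapse away; widely spaced ones survive.
line = [(0, 0), (0.001, 0), (0.002, 0), (1, 0), (2, 0)]
small = simplify(line, tolerance=0.5)  # -> [(0, 0), (1, 0), (2, 0)]
```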
I have already implemented this as part of my own override:
Tue, Oct 24
I also disagree :) The real monitoring should not look at the running process at all. It should only look at the last timestamp - i.e. how far behind WDQS is. If it gets further behind than X, send the alert - that would be a very stable indicator that something is wrong, no matter whether the process hung, crashed, or simply cannot cope with the amount of data. On the other hand, the updater service itself should be resilient to any kind of problem - if there is an intermittent issue like a temporary DNS outage (like I had), the service should keep trying and self-recover the moment the network is back up. This is the same logic as in any router or replication service - they always keep trying until they succeed.
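The lag check above is tiny to implement - a sketch, with a hypothetical 15-minute threshold (the real X would be tuned to normal update latency):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: alert when the data is more than 15 minutes behind.
MAX_LAG = timedelta(minutes=15)

def should_alert(last_update: datetime, now: datetime,
                 max_lag: timedelta = MAX_LAG) -> bool:
    """True if the newest processed timestamp is too far behind wall-clock time.
    Works regardless of whether the updater hung, crashed, or is just slow."""
    return (now - last_update) > max_lag

now = datetime(2017, 10, 24, 12, 0, tzinfo=timezone.utc)
ok = should_alert(datetime(2017, 10, 24, 11, 55, tzinfo=timezone.utc), now)   # 5 min behind
bad = should_alert(datetime(2017, 10, 24, 11, 30, tzinfo=timezone.utc), now)  # 30 min behind
```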
@Smalyshev I don't think updater should ever crash - regardless of the error. Even if the network goes down, it should go into a retry mode.
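A minimal sketch of that retry mode - exponential backoff with a cap, so transient failures (like a DNS outage) never kill the service; the delay values are illustrative:

```python
import time

def run_with_retry(fetch, is_transient=lambda e: True,
                   base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Keep calling fetch() until it succeeds, backing off exponentially.
    Transient failures never crash the service; it self-recovers when
    the network comes back."""
    delay = base_delay
    while True:
        try:
            return fetch()
        except Exception as e:
            if not is_transient(e):
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)  # back off, but cap the wait

# Demo: a fetch that fails twice (e.g. DNS down), then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("DNS temporarily down")
    return "data"

result = run_with_retry(flaky, sleep=lambda _: None)  # no real sleeping in the demo
```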
Oct 23 2017
@daniel, I think we shouldn't use FauxRequest objects at all (for the reasons outlined elsewhere, such as no type safety), but I do believe the existing API is better suited as a common entry point for the business logic layer - as outlined in BL architecture on budget - instead of using hundreds of objects, each of which could go all the way to the database. So ideally we should partition the existing API into an internal API plus an extremely thin, no-logic layer that converts a Request into it.
Oct 20 2017
@hashar, this is not exactly a duplicate. It's fine to stay with npm and allow either tool to be used by the dev. I would like an option for just some of the projects to specify that a specific tool must be used, due to limitations of the other.
Oct 18 2017
@Evad37 you might be interested in looking at the TNT module -- it's available on many wikis already. It allows you to keep the descriptions for all template parameters in a single .tab page. This way, when you copy the template to another language and later add a new parameter, template users won't need to update their doc pages - it will show up for them right away in English. Later they can click the "edit translations" button underneath and fix it for each new parameter.
Oct 17 2017
@Gehel, not exactly. The new wiki header field would apply to all data stores, both .tab & .map, because it would be implemented in their base class (they share one parent). It would use the current main page parser - thus parsing in the context of the whole page, rather than creating a new parser instance and discarding the "side effects" such as categories, link tracking, etc. The reason I mentioned the .map title & description fields is that they use a very similar approach, thus showing that it is doable. They just use a new parser instance IIRC, without tracking things.
P.S. I fixed the template
@Evad37 your geo position is wrong. The first value is the longitude, the second is the latitude. Inside the geometry.
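For reference, a minimal sketch of the correct ordering in a GeoJSON feature (the coordinates here are illustrative, roughly Berlin):

```python
# GeoJSON stores positions as [longitude, latitude] - easy to get backwards,
# since people usually say "lat/lon" out loud.
lon, lat = 13.4050, 52.5200  # illustrative coordinates

feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [lon, lat],  # longitude first, latitude second
    },
    "properties": {},
}
```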
Oct 16 2017
BTW, IIRC, this fix would actually be just a few lines of code.
Clarification - I think (I need to check in the code) the title and description use "limited" wiki syntax, similar to what is used in edit comments. Full wiki markup parsing would be needed to track categories, etc.
Oct 13 2017
@Pnorman actually we already use wiki markup in these pages - the .map pages treat the title and description fields as wiki markup. I haven't heard of any problems -- the values are sanitized by the regular MW parser and get consumed by the mapframe/maplink/Lua code. Adding this field wouldn't be much of a challenge from the tech perspective.
Oct 9 2017
Well, according to Apache's own site we can - anything submitted under the Apache license is usable under the GPL. It's like if someone submitted a patch under MIT and I am building GPL software - I can simply copy-paste the MIT code without asking for permission. But legal may know better on this topic. BTW, I suspect this file was added when the Vega Graphs capability was introduced about a year ago.
Sep 28 2017
I have been actively using versions in all of the submodules. In theory, every commit changing the version number should be a tag, but I haven't been doing that.
I saw the issue again, and filed a bug with the stacktraces - https://jira.blazegraph.com/browse/BLZG-9058
Sep 25 2017
I think Wikipedias have high enough traffic to always use the static service. Wikivoyage and private wikis should never use it. For all others, I think either way is fine.
I think only wikivoyage and private wikis were set up that way. Private wikis would have security issues and are much harder to set up for the service, so it was not worth the trouble.
I don't think this would be a very good idea for Wikivoyage - their usage is highly map-oriented, so adding a static page with an additional "click to load" step might not be palatable to the community.
Sep 21 2017
I think both of these are from the service template. CC @services group.
Sep 20 2017
@Gehel I have been running it for a day with 16GB of space, and got this GC report - apparently sys time is usually high. Thread dump report. Any thoughts?
P.S. I tried running it on a smaller machine (8GB, I think), and I got tons of slowdowns in the update scripts.
@Lydia_Pintscher I found that I can specify the list of fallbacks, but can I specify a list + "anything", which doesn't even have to be deterministic? Without it, one would have to generate a full list of all sites with every link - just like we currently have in the WDQS examples page. You don't want that :)
This is awesome - sorry I didn't know about it! Is the fallback documented anywhere? (I did try it, and it does work with comma-separated site values.)
Sep 19 2017
So it seems the sources & variables file specified in /etc/tilerator/config.yaml specifies the username/password incorrectly, most likely for the Postgres DB. I remember @Gehel was doing some cleanup to get the various test and prod boxes in sync for that - double-check with him.
@Pnorman which sources config file are you using?
I think all 4 are good, but I would like @MaxSem to sign off on it too :)
@Gehel, they are unused at this point - I used data and vem to test some data stuff. I think we can delete them at this point. If I need to, I will recreate a test instance.
Should we also show this warning for embeds? I have raised the timeout to 3 minutes, so it would be a bad user experience to wait for the query even though the reason it takes so long might be that the server is hanging. Something like lastModified is a very quick query, so if that fails too, it would be a good indicator that the server is down.
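A sketch of that probe-first idea - the cheap query and the timeout values below are illustrative, and `run_query` stands in for whatever HTTP call the client makes to the endpoint:

```python
# Sketch: run a cheap "last modified" probe before a potentially slow query,
# so a dead server is detected in seconds instead of after a 3-minute timeout.
# PROBE is a hypothetical cheap query; the real lastModified query may differ.
PROBE = "SELECT ?t WHERE { <https://www.wikidata.org/> schema:dateModified ?t }"

def query_with_probe(run_query, slow_query, probe=PROBE):
    try:
        run_query(probe, timeout=5)  # fast: fails quickly if the server is down
    except Exception as e:
        raise RuntimeError("server appears to be down") from e
    return run_query(slow_query, timeout=180)  # only now pay the long timeout

# Demo with a stub endpoint that records its calls.
calls = []
def stub(q, timeout):
    calls.append((q, timeout))
    return "ok"

out = query_with_probe(stub, "SELECT * WHERE { ?s ?p ?o }")
```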
Agree, it should not prevent editing at all. I think you meant T=10s, N=3. And yes, the message should be configurable for each site.
Sep 17 2017
@Smalyshev, I force-killed it, and it dumped this (I couldn't copy all of it - just some parts):
@Smalyshev & @Gehel I'm not sure if this is the same error or a different one. Today the service froze in a peculiar way: all queries would time out (both from clients and the updater), and Blazegraph wouldn't quit on Ctrl+C. htop shows a single Blazegraph process using 100% of a single CPU, but about once every few minutes almost all CPUs would jump to 100% for about 5-10 seconds and then go back to 0, except for that single process. The last errors in the log, which might be unrelated:
Sep 6 2017
On the same topic, the language code for schema:inLanguage "en" has the same issue - there are also 60 million of them, with about 10 million being English - so that's another 0.5 GB, plus some unknown perf benefit. I wonder if it would make sense to pre-declare only the top 10 languages and top 10 wikis - I suspect it would be almost as beneficial, without having to maintain an ever-changing list.
There are 60 million isPartOf statements, and I assume that all of them have their object as the root of a wiki. The space saving would be 9-2 = 7 bytes each, so ~0.5 GB total - not very significant, but it also eliminates a lookup for each value. I wonder how we can measure the performance benefit. I couldn't run the count(distinct ?obj) query due to a timeout.
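For concreteness, a sketch of that count query plus the back-of-envelope arithmetic (the query is my reconstruction of what was attempted, not a verbatim copy):

```python
# Sketch of the count query that timed out.
COUNT_QUERY = """
PREFIX schema: <http://schema.org/>
SELECT (COUNT(DISTINCT ?obj) AS ?n) WHERE { ?s schema:isPartOf ?obj }
"""

# Back-of-envelope saving: ~60M statements, (9 - 2) = 7 bytes saved per value.
statements = 60_000_000
saved_bytes = statements * (9 - 2)
saved_gb = saved_bytes / 1e9  # ~0.42 GB, i.e. roughly the "~0.5 GB" above
```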
I would like to solicit more community feedback on how useful this would be. Perhaps it is not needed at all, or not worth the hassle. As an already-working example on a test server, here is a query that lists Wikidata items without French labels but with French articles, ordered by the popularity of the French articles.
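This is not the exact query from the test server, but the core pattern (items with a French sitelink and no French label) might look roughly like this; the popularity ordering is omitted here since it depends on data not in the standard model:

```python
# Sketch of the "French article but no French label" pattern (not the linked query).
QUERY = """
SELECT ?item ?article WHERE {
  ?article schema:about ?item ;
           schema:isPartOf <https://fr.wikipedia.org/> .
  FILTER NOT EXISTS {
    ?item rdfs:label ?label .
    FILTER(LANG(?label) = "fr")
  }
}
LIMIT 100
"""
```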
Sep 5 2017
@Mike_Peel <mapframe> is currently not enabled on enwp. You need to create a new Phabricator ticket to request it and gather community consensus that it is needed - WMF has been enabling it on a per-request basis.
@Lydia_Pintscher, having a built-in ranking system is awesome, but that's a search optimization problem - just like the other ticket suggests, it will be part of the search drop-down.