Mon, Oct 12
Change 633560 in gerrit has this adjustment - I'm not sure how to link it to this ticket though?
Thu, Oct 8
Wed, Oct 7
From stand-up today I'll look at modifying the Vue components to allow a "view" mode as opposed to edit mode.
I think this should be dealt with by having the default value for Z2K2 be null, not '' (an empty string). Alternatively, just remove the Z2K2 from the default Persistent ZObject.
Hi Lucas - the problem is a change in the data model so that all stored objects (with a ZID, i.e. a page in the namespace) are of type Z2 - Persistent ZObject. This is assumed by the editing UI, so it was very confused to see a Z6 instead of a Z2 as the stored object. A Z6 (string) should be stored as a Z2 with key Z2K2 being the value (the string itself). So yes, probably best to just delete those two for now.
Wed, Sep 30
Just a thought here - I would guess the main difference from the generic ZObject editor for the "Create" process is that there is no page ID yet. Presumably we want to generate the IDs automatically - Z3834 followed by Z3835, Z3836, etc. Is there a standard approach to do that reliably? I know Wikidata has had issues with ID values for its items (Qxxx...) getting skipped, but at least there doesn't seem to have been any issue with the same ID being generated twice during create. Anyway, whether or not that specific solution is adopted, it sounds like that's an extra piece that's different from how normal wiki pages generally work...
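A minimal sketch of one way the auto-generation could go, assuming new IDs are derived from the highest existing ZID (the function name and approach here are purely illustrative; a real implementation would need to allocate the ID atomically, e.g. inside a database transaction, so two concurrent creates can't receive the same ZID):

```javascript
// Hypothetical sketch: pick the next ZID as (highest existing numeric
// part) + 1. Not race-safe on its own - concurrent creates would need
// the allocation wrapped in a transaction or lock.
function nextZid( existingZids ) {
	var max = 0;
	existingZids.forEach( function ( zid ) {
		var n = parseInt( zid.slice( 1 ), 10 );
		if ( n > max ) {
			max = n;
		}
	} );
	return 'Z' + ( max + 1 );
}
```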
Wed, Sep 23
@santhosh there are a few places where English words are used that should be i18n messages - do you know how this should work with Vue in Mediawiki? Is there a good example out there now?
Latest patchset (19) removes the Object.entries and destructuring bit; really this wasn't needed and if we're sticking with ES5 then it should go.
Sep 23 2020
and another patchset to fix eslint complaints...
Patchset 15 fixes the issue with creating new ZObjects! However, it also unintentionally altered the package-lock.json file; I'm not quite sure what happened to change that...
Sep 22 2020
Patchset 11: call makeEmptyContent if ZObject is new
And patchset 10 - this was using an API to fetch the labels of keys in the old implementation; obviously that's not available (yet) here, and we might prefer a different solution anyway. So I stripped that out.
Re-pull from gerrit worked (I think). I've uploaded patchset 9 - basically I've abandoned the special handling for labels as too much has changed there from the old implementation, and it all sort of works (more crudely) with the OtherKeys implementation anyway. No language drop-down any more though... I would have had to completely rewrite those two components.
However, I'm thinking the longer term solution here is to write special components like those for each ZObject type, which will allow special things like the language handling. I sort of have special handling now for lists (Z10) and strings (Z6), though only because those are built in to the representation.
Hmm, thanks for fixing those things. However, now I'm not sure how to submit my further changes - I get the following message after doing a merge with your changes and then trying 'git review -R':
Sep 18 2020
And just added another patch to fix the types and labels list (temporarily - these should really come from the data itself somehow).
Not sure what's going on with the language stuff, but it sounds like it might be a local problem on my end. I'll probably reload everything at some point and see if it goes away.
Sep 17 2020
Sep 16 2020
very rough version (it displays stuff but I don't think any edit functionality is working yet) in gerrit - in progress! https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/627887
For a reference on the (exceptionally large) number of ways that Wikidata is currently handling languages, see Lea's draft table here:
Sep 15 2020
This is moving along, but I'm noticing some significant differences - in particular in language handling - and I'm not sure how to proceed. The old 'abstracttext' had a list of languages that were themselves ZObjects - of type Z180 (language). This allowed having both the short code ('en') and a full label ('English', 'Anglais', etc.) in whatever the chosen current language was. I guess for now I'll just use and display the short codes and allow entry of whatever code they want, rather than having a drop-down of languages, but maybe longer term we would want a different choice here?
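For the interim approach, the display could still fall back gracefully where a label happens to be known - a rough sketch (the label table here is purely illustrative, not taken from any real data source):

```javascript
// Interim approach: show a full label if we happen to have one for
// the code, otherwise just show the raw code the user entered.
// The table below is illustrative only.
var languageLabels = { en: 'English', fr: 'français' };
function displayLanguage( code ) {
	return languageLabels[ code ] || code;
}
```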
Sep 9 2020
From the standup meeting today, I'm going to try working on this over the next week or two. However, I had a basic question on how to proceed. I'll do it on its own branch for now, so merging shouldn't be an issue right away. The question: in google/abstracttext I implemented this as a separate "new edit" button, but I think it would be better to just replace the regular "edit" page for the ZObject namespace. Any strong opinions on this? This is going to be pretty experimental to start with, so of course we can change things drastically here!
Sep 8 2020
But that value is kind of critical as an identifier within and between ZObjects - the prefix for related keys for types at least, and of course the value of any references. It means they don't make sense without the MW metadata.
Sep 7 2020
I was thinking about this - if we want a single file, why not make it a ZObject itself, i.e. a JSON formatted list (Z10) of these Persistent ZObjects? It will need some sort of script like the main MediaWiki maintenance/importTextFiles.php to run I assume...
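The main work for such a script would be walking the Z10 and writing each element out as its own page. A hedged sketch of the flattening step, assuming the conventional head/tail keys Z10K1/Z10K2 for Z10 lists:

```javascript
// Flatten a Z10 (linked-list style ZObject) into a JS array.
// Assumes Z10K1 = head and Z10K2 = tail; an empty Z10 (no Z10K1)
// terminates the list. An import script would then write each
// element out as a separate wiki page.
function z10ToArray( z10 ) {
	var items = [];
	while ( z10 && z10.Z10K1 !== undefined ) {
		items.push( z10.Z10K1 );
		z10 = z10.Z10K2;
	}
	return items;
}
```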
Sep 4 2020
This sounds like protection levels? I.e. some things editable by anyone, some only by confirmed users, some only by admins, or other levels. However, you want it granular so that labels can be at a different protection level from other keys. Wikidata has desired something similar, to allow granular protection of some statements while allowing others within the same item to be editable by anyone. Anyway, would using standard Mediawiki protection procedures work for this for now?
Is this the right design page: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Early_mockups#Create_an_object ?
I guess my question is whether the create page should be separated from the edit page (Special:CreateZObject as outlined in the approach here) or just go to a generic edit page which can do everything... ??
I notice this is (so far) being implemented as a plain form with ooui - I think this is fine, though it limits what you can initially create. But Wikidata's create item is similarly limited to just the label, description and aliases in a given language. So I would suggest this should also be limited to the Z1 keys (type, label and description in a single language), with the id (Z1K2) auto-generated. Maybe I should check out the "designs" - I'm assuming the metawiki pages? (Edit: I just realized that the current design has Z2 - persistent Zobject - replacing most of the Z1 keys, so the id and labels come from Z2 now - anyway the same point stands that the initial creation of the object maybe could only use these required keys for a persistent ZObject.)
Jul 30 2020
Jul 21 2020
Something seems to be going on very recently that's a different pattern - did something change on the infrastructure side, or is there a change in usage pattern for the last few hours? Basically maxlag (WDQS lag specifically) has NOT gone below 5 (5 minutes for WDQS) for more than 1 hour. This hasn't happened, as far as I can tell, for many days, perhaps weeks or months. Typically maxlag recovers when bots stop editing after about 20-30 minutes, sometimes it takes almost an hour, but this is the longest delay in a long time. Specifically around 2020-07-21 14:04 the lag went over 5, and as of 15:18 it's grown to over 16.
(editing) finally at 15:35 it's coming down again (down to about 12 now).
Jul 20 2020
The purpose of checking maxlag is to slow the rate of EDITS to Wikidata. I don't understand why Pywikibot is using it as a reason not to READ data. There are surely a vast number of other applications out there that read from Wikidata (and query WDQS) without checking maxlag!
Jun 23 2020
Ah, I just noted on T253334 - I don't think RemexHtml is the right solution either - Vue templates also are not really html, as they include "elements" not in the HTML standard, and parsers may not handle them correctly. I ran into this just now using wmf/1.35.0-wmf.38 which has the RemexHtml parser, where I have an html table that has some of its rows provided by another Vue component:
I don't think this problem was resolved correctly. What looks like HTML in templates is ALSO not really HTML. In particular, the current ResourceLoader does not handle <table> elements correctly when there is an internal component in the table, something like: <table><tbody> <tr><th>header...</th></tr> <internal-tr-component ...></internal-tr-component> </tbody></table>
The current parsing is pulling out the "internal-tr-component" as a separate element outside of the table. This is wrong - templates should be left alone! I think an XML parser that doesn't understand HTML at all might be best for this?
Jun 2 2020
Apr 8 2020
Thanks for creating this! I'm not sure what the standard citation reference for an external ID is, but what I've been using is:
- stated in (P248) the value of "subject item of this property" (P1629) for that external ID property, if any
- external ID property with value from the item
- retrieved (P813) on the current date.
So it would be nice if this gadget could add these three (or 2 if no P1629 value) statements as a reference with a simple interaction...
Mar 18 2020
Unassigning, I'm not working on this any more!
Wow, was that really almost 3 years ago? There doesn't seem to really be a need for this, so I'm closing the request as declined.
Feb 14 2020
Feb 12 2020
I think increasing the factor will not make things better; it will only increase the oscillation period.
Feb 11 2020
Possibly relevant comment here: I believe there is a plan also to move to incremental updates (updating only the statements/triples that have changed) so it is probably important that any parallelism in updating be coordinated so that updates for the same item (Q value) be grouped together and done in the same process, so they don't clobber one another. Updates for separate items (different Q values) can be handled in parallel as the associated RDF triples are independent (the subject of a triple is always the item, a statement on the item, or a further node derived from the item). Even without that incremental update process, grouping updates on the same item together could be a significant speed boost, as 5 updates for Q9999 can be collapsed into just the last update under the current procedure of completely rewriting the triples.
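The collapsing step could be as simple as keeping only the last queued update per item - a sketch under the assumption that a full rewrite of an item's triples makes earlier queued updates for that item redundant (the field names are illustrative, not from the actual updater):

```javascript
// Collapse a queue of updates so each item (Q-id) keeps only its
// latest entry; under the full-rewrite model, earlier updates for
// the same item are superseded anyway.
function collapseUpdates( updates ) {
	var latest = {};
	updates.forEach( function ( u ) {
		latest[ u.item ] = u; // later entries overwrite earlier ones
	} );
	return Object.keys( latest ).map( function ( k ) {
		return latest[ k ];
	} );
}
```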
Feb 7 2020
Feb 4 2020
@Addshore and others - the problem has deteriorated since Saturday - see this discussion on Wikidata: https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team/Query_Service_and_search#WDQS_lag
Jan 19 2020
Jan 18 2020
@Bugreporter well something must have changed early today - was it previously "mean" and is now "median"? I'm not sure which is better, but having WDQS hours out of date (we're over 4 hours now) is NOT a good situation, and what this whole task was intended to avoid! @Pintoch any thoughts on this?
Jan 17 2020
Just saw this - I'm wondering technically how you would implement it? You could generate a random number between 2.5 and 5, and if maxlag is greater than your random number deny the edit?
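For concreteness, the randomized-threshold idea might look like this (purely a sketch of the proposal above, not any existing maxlag behavior): each edit draws a uniform threshold in [2.5, 5], so the probability of denial rises linearly as lag climbs through that band.

```javascript
// Randomized maxlag threshold: deny the edit if the current lag
// exceeds a uniform draw between 2.5 and 5. Below 2.5 nothing is
// denied; above 5 everything is. The rng parameter exists so the
// draw can be made deterministic for testing.
function shouldDenyEdit( maxlag, rng ) {
	rng = rng || Math.random;
	var threshold = 2.5 + rng() * 2.5;
	return maxlag > threshold;
}
```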
Am I misreading this graph? https://grafana.wikimedia.org/d/000000489/wikidata-query-service?panelId=8&fullscreen&orgId=1&from=now-12h&to=now&refresh=10s It looks like the query service lag for 3 of the servers has been growing steadily for the past roughly 8 hours. However, edits are going through. Did something change in the maxlag logic somewhere earlier today?
Dec 11 2019
Marking as resolved...
I increased the default number of retries to 12, so it will now retry for up to an hour. I think we're good here?
(A) Pintoch's patch has been applied, and (B) I also increased the retry time from 5 seconds to 5 minutes - that still means an edit will fail after 25 minutes if maxlag doesn't drop, with only 5 retries. Is there a consensus to retry for an hour? Or if there's a better standard for handling retries let me know!
Oct 22 2019
Here's a draft of slides for our workshop. Please feel free to edit this. Also I think you wanted to cover a bit more basics of how lexemes are put together - maybe that should go first? This was mostly just gathering statistics and data and then a page of questions at the end...
Oct 21 2019
I am uploading files with data on the counts of forms and senses by date for the last year (also totals in last column). There may have been some issues with this - it comes from the Lexicographical statistics pages that are generated from WDQS queries, so there were a few periods where I think the numbers were off. Anyway, it should be close to correct for most of the time period. So we can plot this along a bit of a timeline for the last year I think?
Oct 18 2019
I just did some exploring but I don't think Quarry will help with forms and senses - at least they're not "pages" in themselves with their own namespace. Actually I couldn't figure out where they were in the database schema at all... Anyway, I think I can get some rough numbers from looking at the stats page as it has changed over time, I will work on this.
Oct 11 2019
I was thinking some graphics on the growth of lexemes, forms, and senses would be good - do we already have that somewhere?
Sep 26 2019
If you go to the search page and select "Lexeme" as the only namespace you get the same error with "thanks" in the search box, but "thank" alone works fine - the two lexemes that match are L3798 (verb) and L28468 (noun).
Sep 12 2019
The Basque collection is even more complete now!
I do think some customization may be needed for Lexemes due to the different structure - the forms and senses etc. Perhaps the most useful link for a wiktionary may be from words to senses to wikidata items via the "item for this sense" property. That in principle allows translations to be provided, grouped by sense.
Aug 1 2019
I see the problem also (Safari browser). When you talk about it affecting lexemes, where do you see that? I experimented with adding a form and that seemed fine.
Feb 18 2019
Jan 28 2019
Can you add a test to the statement ID generation code that ensures it has an RDF compatible format (except for the 1 character that's a problem now), and a note that this is required for RDF support?
promise it will always be one-to-one, no matter what happens with internal IDs
Jan 26 2019
Another thought - even better would be if the API could be adjusted so it accepts the WDQS statement ID format as it is (all -'s).
Thanks for creating this ticket! Actually, my use case is the opposite of Lucas's - I want to be able to go from the results of a WDQS query to fetch the full statement via the API, which requires the statement ID. So I would like to see the id conversion documented in BOTH directions - and in particular the arbitrary regex replace listed above (preg_replace( '/[^\w-]/', '-', $statementID )) would NOT work for that purpose. Rather can we just settle that the first $ or - is switched, and that's it? Or is there something else that's an issue here?
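To illustrate the "just switch the first separator" convention I'm suggesting (this is my proposal, not documented behavior - the preg_replace above is lossy in the reverse direction precisely because the GUID part legitimately contains '-'):

```javascript
// Symmetric conversion assuming ONLY the first separator differs:
// '$' in the API's statement id, '-' in the WDQS form. Everything
// after that first separator is left untouched, so the mapping
// round-trips in both directions.
function apiToWdqs( statementId ) {
	return statementId.replace( '$', '-' ); // replaces first '$' only
}
function wdqsToApi( wdqsId ) {
	// Swap back only the first '-' after the entity id prefix.
	var m = /^(Q\d+)-(.*)$/.exec( wdqsId );
	return m ? m[ 1 ] + '$' + m[ 2 ] : wdqsId;
}
```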
Jan 7 2019
I didn't know about the "award token" option!
Nov 28 2018
Just a note - a WDQS query gives different results bouncing up and down - sometimes 3004 (for English lexeme senses) and sometimes 2872, over about the last 10 minutes.
@Smalyshev I'd forgotten there was a phabricator ticket for this - anyway, this is what I was referring to... Last night's update bumped the number down again to 2718; however when I run the query directly on WDQS I get 3004 right now. Something's not right!
Nov 27 2018
I ran a manual update and the total for English bumped up to 2819 - so it doesn't look as if we've actually lost lexeme senses, just that some of the query servers don't know about all of them?
I wouldn't be surprised if it's a WDQS problem, this is definitely generated from an RDF query.
Oct 16 2018
According to https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/RDF_mapping a lexeme should be "a wikibase:Lexeme " as well as "a ontolex:LexicalEntry", but in the query service I can only find things via the latter relation. Similarly for forms and "wikibase:Form". Something left out of the dump?
Jun 29 2018
WDQS works for me! I'm not sure where that is of course - I guess I could check Phabricator!
Jun 19 2018
Does "alphabetical" ordering even make sense for words in a collection of vastly different writing systems? If this is done I would recommend it be accompanied by some filtering - for language, part of speech, grammatical features, certain properties perhaps.
Jun 1 2018
I am generally in favor of Micru's proposal, and perhaps Pamputt's elaboration of it above: using wikidata items directly allows representing the lemma language naturally in the user's own script/language, for one, plus the other automatic bonuses of using items given the structured data ethos, etc. However I'm a little confused about the details of how this would work - specifically, the most commonly used lexemes would usually have the same spelling, use, etc. across all variants of a language; do we give that a more general language ("en" = Q1860, say) and only use the specific items mentioned ("en-US" = Q7976, "en-GB" = Q7979, "en-CA" = Q44676, etc.) where there really are variations? Or would it be possible to attach multiple language items to a single lexeme, to indicate it applies to several specific variants?
May 29 2018
Here's a specific question that might be detailed enough in description: suppose we have a collection of facts (say the names, countries, inception dates, and official websites for a collection of organizations) that has been extracted from multiple sources, including various language wikipedias, a CC-0 data source (for example https://grid.ac/) and a non-CC-0 non-wikipedia data source - these sources would be indicated in wikidata by the reference/source section on each statement. This extraction has been done by users either manually or running bots with the understanding that they are adding facts to a CC-0 database (wikidata). Reconciling the facts - for example merging duplicates with slightly different names, dates, or URL's - has been done by users manually or semi-automatically, again with the understanding they are contributing to a CC-0 database. Are there any copyright or other rights constraints that apply to this collection, or can it be fully considered to legally be CC-0?
Hmm, I'm not sure this is all that useful at least as it stands. Most external IDs can already be found just as easily via the Wikidata Resolver tool - https://tools.wmflabs.org/wikidata-todo/resolver.php - However, what I would find useful would be a way to locate, for example, partial street addresses - this (P969) is often entered as a qualifier on headquarters location (P159). Searching for 'haswbstatement:P969=Main' now finds something, but only because that oddly has just 'Main' as the value for P969, and making the string lowercase ("main") finds nothing, which is definitely not what I would expect here... I don't think treating string values as if they were identifiers is the right approach; the usefulness of a search engine is in normalizing string values so you can find them without having the exact matching string. And qualifiers should be folded in somehow!
May 28 2018
Hi - my most recent response was following MisterSynergy's comment on Denny's proposed questions, and specifically the meaning of "processes that in bulk extract facts from Wikipedia articles," - it sounds like from subsequent discussion that we are not talking solely of automated "processes", so I think I echo MisterSynergy's comment that the question needs to be better defined to "describe how these processes look like". On the one hand there's overall averages, with less than one "fact" per wikipedia article; on the other hand the distribution is probably quite wide, with some articles having dozens of "facts" extracted from them. Since CC-BY-SA applies to each article individually, does extraction of too much factual data from one article potentially violate its copyright?
May 26 2018
based on the fact that we have ~42M “imported from” references and ~64M sitelinks in Wikidata
May 25 2018
Some references on why CC0 is essential for a free public database:
"Databases may contain facts that, in and of themselves, are not protected by copyright law. However, the copyright laws of many jurisdictions cover creatively selected or arranged compilations of facts and creative database design and structure, and some jurisdictions like those in the European Union have enacted additional sui generis laws that restrict uses of databases without regard for applicable copyright law. CC0 is intended to cover all copyright and database rights, so that however data and databases are restricted (under copyright or otherwise), those rights are all surrendered"
May 23 2018
FYI I agree with VIGNERON on what it should look like - but at least something more than the id!
May 22 2018
It has been asserted here several times that OSM data has been wholesale imported into Wikidata - do we know that has happened? Wikidata has two properties related to OSM, one that relates wikidata items to OSM tags like "lighthouse", and one that is essentially deprecated (see T145284), so I assume those are not the issue. According to https://www.wikidata.org/wiki/Wikidata:OpenStreetMap (text which has been there since at least last September) "it is not possible to import coordinates from OpenStreetMap to Wikidata". If the issue is coordinates imported via wikipedia infoboxes that originated with OSM, I can see there might be an issue there, and maybe that should be added to Denny's suggested question in some fashion. But as far as actual importing of OSM data, the only specific cases that I noticed explicitly cited above are (A) a bot request that has been rejected, and (B) a discussion from 2013 where the copyright issue was explicitly raised right away.
Oct 11 2017
Jul 21 2017
Of course, now these examples I gave are working - probably because I updated them recently. However, I found more that are not now, or only partially - for example Q2256713:
Jul 19 2017
Jul 14 2017
I don't understand why Multichill can unilaterally alter the priority on this request in the face of an active wikidata RFC where the voting has been 2:1 in support of this change. It would also be nice to get some actual feedback from developers - is this really "against the core data model of Wikidata"? I don't see it - particularly as the workarounds in place now prove it can be easily supported.
Jul 13 2017
Thanks! I did search through the open tasks first and didn't find anything on this....
Jun 6 2017
The dummy user solution sounds good to me. Magnus Manske is doing something like this with his QuickStatementsBot so maybe a special purpose Bot account on wikidata for this?
Mar 23 2017
I believe a way this could be done would be to allow the attachment of regular expressions to the formatter URL, and have the external id URL conversion code understand them. That is, if there was a qualifier property that specified "regex substitution" for example, the ISNI problem (of additional spaces within the id that must be removed for the formatter URL) would be handled by a value something like "s/\s+//g" (remove all spaces). Some of the others might need a "regex match" on the id that allows specifying a $1, $2, $3 grouping pattern, and the formatter URL then looks something like http://...../$1/$2/$3 (or that could also possibly be handled by a substitution as in the ISNI case). The IMDB case is more difficult because it's essentially 4 different formatter URLs based on the first characters of the id, so it might need a "regex filter" that limits the scope of each formatter URL based on the id; wikibase would then need to look through the filter regexes to find a matching formatter URL and use that.
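A rough sketch of the "regex substitution" qualifier idea applied to the ISNI case (the qualifier value syntax and both function names are hypothetical - nothing like this exists in Wikibase today):

```javascript
// Apply a sed-style 's/pattern/replacement/flags' substitution to an
// external id before splicing it into the formatter URL. With the
// value 's/\s+//g' this strips ISNI's internal spaces. The parser is
// deliberately minimal - it does not handle '/' inside the pattern.
function applySubstitution( id, subst ) {
	var m = /^s\/(.*)\/(.*)\/([a-z]*)$/.exec( subst );
	if ( !m ) {
		return id; // no (or malformed) substitution: use the id as-is
	}
	return id.replace( new RegExp( m[ 1 ], m[ 3 ] ), m[ 2 ] );
}
function formatUrl( formatter, id, subst ) {
	return formatter.replace( '$1', applySubstitution( id, subst ) );
}
```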
Mar 22 2017
As background, I'm seeing about 2000 "hits" per day on this service right now, with about a dozen properties linking through it to their databases.
Mar 21 2017
Hmm, Ok, I read through the discussion you linked with @coren - I certainly see there can be a privacy violation regarding expectations in cases as were discussed there. I think this is a quite different case though (for example, the links are exclusively to third-party sites, not anything I or any other WMF person controls) and would like to hear directly from somebody with WMF (and some voices from wikidata) on this. If there is a clearly posted policy somewhere that would be great too. The policy linked by @coren focused on the Labs user collecting personal information, which is not at all happening here, and said nothing specifically about redirects per se.
(claiming task - if this really needs to be done I can certainly take care of it)
Hmm, I think the big issue may be point 3. Do you have an example where this might have come up? I could certainly make it an interstitial easily enough, but that makes these links a bit less convenient for people (extra click); if the links are being included with or without a warning elsewhere based on the wmflabs URL then I can see how it may be important to address this somehow. Also is there boilerplate text we should use if we really do need to put this in?