mkroetzsch (Markus Krötzsch)
Researcher at TU Dresden; external advisor to Wikidata

Projects

User does not belong to any projects.

User Details

User Since
Oct 27 2014, 10:12 AM (490 w, 16 h)
Availability
Available
LDAP User
Markus Kroetzsch
MediaWiki User
Markus Krötzsch

See my private homepage for details about myself.

Recent Activity

Oct 2 2023

mkroetzsch added a comment to T270764: Wikidata Truthy dump is missing important metadata triples.

@Lydia_Pintscher Are you asking about the discrepancy in the counts, or about the general idea of this issue report?

Oct 2 2023, 3:04 PM · Wikidata

Feb 8 2020

mkroetzsch added a comment to T244341: Stop using blank nodes for encoding SomeValue and OWL constraints in WDQS.

Please don't think of or refer to the blank nodes as "unknown values".

Feb 8 2020, 10:09 PM · Community-consensus-needed, Wikidata-Query-Service, Wikidata

Feb 7 2020

mkroetzsch added a comment to T244341: Stop using blank nodes for encoding SomeValue and OWL constraints in WDQS.

Using the same value for "unknown" is a very bad idea and should not be considered. You already found out why. This highlights another general design principle: the RDF data should encode meaning in structure in a direct way. If two triples have the same RDF term as object, then they should represent relationships to the same thing, without any further conditions on the shape of that term. Otherwise, SPARQL does not work well. For example, the property paths you can write with * have no way of performing extra tests on the nodes you traverse, so the meaning of a chain must not be influenced by the shape of the terms on a property chain, if you want to use * in queries in a meaningful way.
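A minimal sketch of this point, not taken from the thread: it emulates a `parent*` property path over a toy triple list. The names `reachable`, `alice`, `carol`, and the term `UNKNOWN` are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def reachable(triples, start, prop):
    """Return all nodes reachable from start via zero or more prop edges (like prop*)."""
    edges = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            edges[s].add(o)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Hypothetical data: both people have an unknown parent.
# Variant 1 reuses one shared term; variant 2 uses a distinct node per statement.
shared = [("alice", "parent", "UNKNOWN"), ("carol", "parent", "UNKNOWN")]
distinct = [("alice", "parent", "_:b1"), ("carol", "parent", "_:b2")]


def common_ancestors(triples):
    """Nodes that a parent* path reaches from both alice and carol."""
    return reachable(triples, "alice", "parent") & reachable(triples, "carol", "parent")


print(common_ancestors(shared))    # the shared term shows up as a common ancestor
print(common_ancestors(distinct))  # no common ancestor
```

With the shared term, the path traversal reports a common ancestor that the data never asserted, and the traversal itself has no place to test for the special term.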

Feb 7 2020, 3:25 PM · Community-consensus-needed, Wikidata-Query-Service, Wikidata

Feb 24 2019

mkroetzsch added a comment to T216842: Specify license of mediawiki/Wikibase/WikibaseLexeme ontology.

CC0 seems to be fine. Using the same license as for the rest seems to be the easiest choice for everybody.

Feb 24 2019, 9:58 AM · MW-1.33-notes (1.33.0-wmf.21; 2019-03-12), Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)), Wikidata

Oct 17 2018

mkroetzsch added a comment to T112127: [Story] Move RDF ontology from beta to release status.

Well, for classes and properties, one would use owl:equivalentClass and owl:equivalentProperty rather than sameAs to encode this point. But I agree that this will hardly be considered by any consumer.

Oct 17 2018, 12:22 PM · MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), Wikidata-Campsite, User-Smalyshev, [DEPRECATED] wdwb-tech, Patch-For-Review, Story, Wikidata-Query-Service, Wikidata, Discovery-ARCHIVED

Sep 14 2018

mkroetzsch added a comment to T200822: Remove webrequest misc analytics related jobs and code after cache misc -> text merge is complete.

Yes, it is fine to stop the extraction for now. Many thanks!

Sep 14 2018, 2:19 PM · Analytics-Kanban, Patch-For-Review, Analytics

Jul 2 2018

mkroetzsch added a comment to T190875: Security review for Wikidata queries data release proposal.

This is good news -- thanks for the careful review! The lack of specific threat models for this data was also a challenge for us, for similar reasons, but it is also a good sign that many years after the first SPARQL data releases, there is still no realistic danger to user anonymity known. The footer is still a good idea for general community awareness. People who do have concerns about their anonymity could be encouraged to come forward with scenarios that we should take into account.

Jul 2 2018, 2:28 PM · secscrum, Application Security Reviews, Privacy, User-Smalyshev, Wikidata, Research

Apr 3 2018

mkroetzsch added a comment to T190875: Security review for Wikidata queries data release proposal.

The code is here: https://github.com/Wikidata/QueryAnalysis
It was not written for general re-use, so it might be a bit messy in places. The code includes the public Wikidata example queries as test data that can be used without accessing any confidential information.

Apr 3 2018, 7:37 AM · secscrum, Application Security Reviews, Privacy, User-Smalyshev, Wikidata, Research

Dec 16 2017

mkroetzsch added a comment to T183020: Investigate the possibility to release Wikidata queries.

I agree with Stas: regular data releases are desirable, but need further thought. The task is easier for our current case since we already know what is in the data. For a regular process, one has to be very careful to monitor potential future issues. By releasing historic data, we avoid exploits that could be theoretically possible based on detailed knowledge of the methodology.

Dec 16 2017, 10:28 PM · Data-release, User-Smalyshev, Wikidata, Research

Sep 10 2016

mkroetzsch added a comment to T143819: Data request for logs from SparQL interface at query.wikidata.org.

@AndrewSu As I just replied to Benjamin Good in this matter, it is a bit too early for this, since we only have the basic technical access as of very recently. We have not had a chance to extract any community shareable data sets yet, and it is clear that it will require some time to get clearance for such data even after we believe it is ready.

Sep 10 2016, 6:47 AM · Analytics-Radar, Discovery-ARCHIVED, Wikidata-Query-Service, Wikidata

Aug 24 2016

mkroetzsch added a comment to T142780: Request access to data for WDQS research.

Regarding my remaining todos:

  • I have signed the L3 doc
  • Here is my dedicated production-only SSH key:
  • My preferred login name is "mkroetzsch", same as my user name on labs. My wikitech user name is "Markus Kroetzsch".
Aug 24 2016, 7:40 AM · Patch-For-Review, SRE, SRE-Access-Requests, Research-collaborations, Research-management, Research

Aug 12 2016

mkroetzsch added a comment to T142780: Request access to data for WDQS research.

P.S. Alex is on vacation and possibly disconnected. His reply might therefore be delayed. He is officially back in two weeks.

Aug 12 2016, 2:01 PM · Patch-For-Review, SRE, SRE-Access-Requests, Research-collaborations, Research-management, Research
mkroetzsch added a comment to T142780: Request access to data for WDQS research.

I have signed the Acknowledgement of Wikimedia Server Access Responsibilities.

Aug 12 2016, 1:59 PM · Patch-For-Review, SRE, SRE-Access-Requests, Research-collaborations, Research-management, Research

Jul 5 2016

mkroetzsch renamed T135083: Create a formal collaboration for WDQS research from Create a formal collaboration w/ Markus Krotzsch to Create a formal collaboration w/ Markus Kroetzsch.
Jul 5 2016, 5:58 PM · Research-Archive, Research-collaborations, Research-management

May 25 2016

mkroetzsch added a comment to T136194: wikidata-exports is using 256G in Tools.

We have two kinds of large data files: biweekly Wikidata json entity dumps and RDF exports that we generate from them. The RDF exports are what we offer through our website http://tools.wmflabs.org/wikidata-exports/rdf/index.php?content=exports.php

May 25 2016, 3:05 PM · cloud-services-team (Kanban), Tools, Wikidata

Feb 25 2016

mkroetzsch added a comment to T126862: Datatype for chemical formulae on Wikidata.

Re parsing strings: You are skipping the first step here. The question is not which format is better for advanced interpretation, but which format is specified at all. Whatever your proposal is, I have not seen any syntactic description of it yet. If -- in addition to having a specified syntax -- it can also be parsed for more complex features, that's a nice capability. But let's maybe start by saying what the proposed "structured data" format actually is.

Feb 25 2016, 9:51 PM · Patch-For-Review, Wikidata, Math

Feb 24 2016

mkroetzsch added a comment to T127929: [Story] Add a new datatype for linking to creators of artwork and more (smart URI).

+1 sounds like a workable design

Feb 24 2016, 8:59 AM · SDC General, User-Ladsgroup, Story, Commons, MediaWiki-extensions-WikibaseRepository, Wikidata

Feb 16 2016

mkroetzsch added a comment to T126862: Datatype for chemical formulae on Wikidata.

Re chemical markup for semantics: this is true for Wikitext, where you cannot otherwise know that "C" is carbon. It does not apply to Wikidata, where you already get the same information from the property used. Think of P274 as a way of putting text into "semantic markup" on Wikipedia.

Feb 16 2016, 6:54 AM · Patch-For-Review, Wikidata, Math

Feb 15 2016

mkroetzsch added a comment to T126862: Datatype for chemical formulae on Wikidata.

I really wonder if the introduction of all kinds of specific markup languages in Wikidata is the right way to go. We could just have a Wikitext datatype, since it seems that Wikitext became the gold standard for all these special data types recently. Mark-up over semantics. By this I mean that the choice of format is focussed on presentation, not on data exchange. I am not an expert in chemical modelling (but then again, is anyone in this thread?), but it seems that this mark-up centric approach is fairly insufficient and naive.

Feb 15 2016, 9:30 PM · Patch-For-Review, Wikidata, Math

Feb 10 2016

mkroetzsch added a comment to T126349: RDF export for the math data type should not export input texvc string but its MathML representation.

The MathML expression includes the TeX representation, which can be used in LaTeX documents and also to create new statements.

Feb 10 2016, 2:53 PM · MW-1.27-release (WMF-deploy-2016-03-01_(1.27.0-wmf.15)), Patch-For-Review, Wikidata, Math
mkroetzsch added a comment to T126349: RDF export for the math data type should not export input texvc string but its MathML representation.

The format should be the same as in JSON. If MathML is preferred there, then this is fine with me. If LaTeX is preferred, we can also use this. It seems that MathML would be a more reasonable data exchange format, but Moritz was suggesting in his emails that he does not think it to be usable enough today, so there might be practical reasons to avoid it.

Feb 10 2016, 2:34 PM · MW-1.27-release (WMF-deploy-2016-03-01_(1.27.0-wmf.15)), Patch-For-Review, Wikidata, Math

Nov 23 2015

mkroetzsch added a comment to T99820: [Task] Add reference to ontology.owl to the RDF output.

Looking at the link, it seems to me we'd (trivially) meet these requirements.

Nov 23 2015, 12:26 PM · Patch-For-Review, Discovery-Wikidata-Query-Service-Sprint, Discovery-ARCHIVED, Wikidata-Query-Service, Wikidata

Nov 19 2015

mkroetzsch added a comment to T99820: [Task] Add reference to ontology.owl to the RDF output.

...and if we consider our data dump to be an ontology, then what isn't an ontology?

Nov 19 2015, 8:41 PM · Patch-For-Review, Discovery-Wikidata-Query-Service-Sprint, Discovery-ARCHIVED, Wikidata-Query-Service, Wikidata

Nov 18 2015

mkroetzsch added a comment to T118860: [RFC] Use Role Object Pattern to represent derived data in the data model.

I don't want to detail every bit here, but it should be clear that one can easily eliminate the dependency to $db in the formatter code. The Sites object I mentioned is an example. It is *not* static in our implementation. You can make it an interface. You can inject Sites (or a mocked version of it) for testing -- this is what we do. The only dependency you will retain is that the formatting code, or some code that calls the formatting code, must know where to get the URL from.
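As a rough sketch of the injection described here (written in Python rather than PHP, with every class and method name hypothetical), the formatter depends only on a small `Sites` interface, so a mocked version can be passed in for tests:

```python
from html import escape
from typing import Protocol


class Sites(Protocol):
    """Hypothetical interface: the only thing the formatter needs to know."""

    def page_url(self, site_key: str, title: str) -> str: ...


class FakeSites:
    """Mocked implementation for tests; no database or configuration involved."""

    def page_url(self, site_key: str, title: str) -> str:
        return f"http://test.invalid/{site_key}/{title}"


class SiteLinkFormatter:
    """Formats a site link; where URLs come from is decided by the injected Sites object."""

    def __init__(self, sites: Sites) -> None:
        self._sites = sites

    def format(self, site_key: str, title: str) -> str:
        url = self._sites.page_url(site_key, title)
        return f'<a href="{escape(url, quote=True)}">{escape(title)}</a>'


formatter = SiteLinkFormatter(FakeSites())
print(formatter.format("enwiki", "Berlin"))
```

A production implementation of `Sites` could read from a database, but the formatter would not need to know that.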

Nov 18 2015, 8:53 PM · Wikidata-Sprint-2016-07-05, Wikidata-Sprint-2016-05-24, Wikidata-Sprint-2016-04-12, Wikidata-Sprint-2016-02-16, Wikidata-Sprint-2015-12-01, Proposal, Patch-For-Review, Wikidata-Sprint-2015-11-17, Wikidata
mkroetzsch added a comment to T118860: [RFC] Use Role Object Pattern to represent derived data in the data model.

@daniel As long as it works for you, this is all fine by me, but in my experience with PHP this could cost a lot of memory, which could be a problem for the long item pages that already caused problems in the past.

Nov 18 2015, 12:57 PM · Wikidata-Sprint-2016-07-05, Wikidata-Sprint-2016-05-24, Wikidata-Sprint-2016-04-12, Wikidata-Sprint-2016-02-16, Wikidata-Sprint-2015-12-01, Proposal, Patch-For-Review, Wikidata-Sprint-2015-11-17, Wikidata
mkroetzsch added a comment to T118860: [RFC] Use Role Object Pattern to represent derived data in the data model.

Structurally, this would work, but it seems like a very general solution with a lot of overhead. Not sure that this pattern works well on PHP, where the cost of creating additional objects is huge. I also wonder whether it really is good to manage all those (very different!) types of "derived" information in a uniform way. The examples given belong to very different objects and are based on very different inputs (some things requiring external data sources, some not). I find it a bit unmotivated to architecturally unify things that are conceptually and technically so very different. The motivation given for choosing this solution starts from the premise that one has to find a single solution that works for all cases, including some "edge cases". Without this assumption, one would be free to solve the different problems individually, using what is best for each, instead of being forced to go for some least common denominator.

Nov 18 2015, 11:18 AM · Wikidata-Sprint-2016-07-05, Wikidata-Sprint-2016-05-24, Wikidata-Sprint-2016-04-12, Wikidata-Sprint-2016-02-16, Wikidata-Sprint-2015-12-01, Proposal, Patch-For-Review, Wikidata-Sprint-2015-11-17, Wikidata

Sep 22 2015

mkroetzsch added a comment to T113168: [Story] Make it possible to alter only Statements with a certain property.

This was a suggestion we came up with when discussing during WikiCon. People are asking for a way to edit the data they pull into infobox templates. Clearly, doing this in place will be a long-term effort that needs a complicated solution and many more design discussions. Until this is in place, people can only link to Wikidata. Unfortunately, people often feel intimidated by what they see there, because they get a very long page that takes a long time to load and contains all kinds of data that they have not seen in the infobox.

Sep 22 2015, 12:20 PM · Story, MediaWiki-extensions-WikibaseRepository, MediaWiki-extensions-WikibaseView, Wikidata

Sep 11 2015

mkroetzsch added a comment to T111770: [Story] Decide how to represent quantities with units in the "truthy" RDF mapping.

I think the discussion now lists all main ideas on how to handle this in RDF, but most of them are not feasible because of the very general way in which Wikibase implements unit support now. Given that there is no special RDF datatype for units and given that we have neither conversion support nor any way to specify that a property must/must not have units, only one of the options is actually possible now: export as string (no range queries, but minimally more informative than just using a blank node).

Sep 11 2015, 9:09 AM · Story, Wikidata-Query-Service, Discovery-ARCHIVED, Wikidata
mkroetzsch added a comment to T111770: [Story] Decide how to represent quantities with units in the "truthy" RDF mapping.

If we could distinguish quantity-type properties that require a unit from those that do not allow units, there would be another option. Then we could use a compound value as the "simple" value for all properties with a unit to simulate the missing datatype. On the query level, this would be fully equivalent to having a custom datatype, since one can specify the unit and the (ranged) number individually. (The P1234inCm properties, by contrast, support only the number, but no queries that refer to the unit.)
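To illustrate what querying unit and number individually could look like, here is a small sketch over in-memory data. The `Quantity` class, the `select` helper, the item ids, and the `example.org` unit URIs are all assumptions made for the example, not part of any Wikibase API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Hypothetical compound value: a number plus a unit URI."""
    amount: float
    unit: str


# Placeholder unit URIs, chosen only for this example.
CM = "http://example.org/unit/cm"
M = "http://example.org/unit/m"

statements = [
    ("Q1", Quantity(180.0, CM)),
    ("Q2", Quantity(1.8, M)),
    ("Q3", Quantity(95.0, CM)),
]


def select(statements, unit, low, high):
    """Subjects whose value has the given unit and an amount within [low, high]."""
    return [s for s, q in statements if q.unit == unit and low <= q.amount <= high]


print(select(statements, CM, 100, 200))
```

Since no unit conversion is assumed, the value given in metres is not matched by a query in centimetres.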

Sep 11 2015, 8:44 AM · Story, Wikidata-Query-Service, Discovery-ARCHIVED, Wikidata
mkroetzsch added a comment to T111770: [Story] Decide how to represent quantities with units in the "truthy" RDF mapping.

Note that this discussion is no longer just about the wdt property values (called "truthy" above). Simple values are now used on several levels in the RDF encoding.

Sep 11 2015, 8:32 AM · Story, Wikidata-Query-Service, Discovery-ARCHIVED, Wikidata

Sep 10 2015

mkroetzsch added a comment to T101837: [Story] switch default rdf format to full (include statements).

Including more data (within reason) will not be a problem (other than a performance/bandwidth problem for your servers).

Sep 10 2015, 2:59 PM · Wikidata-Sprint-2015-09-29, Story, MediaWiki-extensions-WikibaseRepository, Wikidata

Sep 9 2015

mkroetzsch added a comment to T101837: [Story] switch default rdf format to full (include statements).

Data on the referenced entities does not have to be included as long as one can get this data by resolving these entities' URIs. However, some basic data (ontology header, license information) should be in each single entity export.

Sep 9 2015, 8:27 AM · Wikidata-Sprint-2015-09-29, Story, MediaWiki-extensions-WikibaseRepository, Wikidata
mkroetzsch added a comment to T101837: [Story] switch default rdf format to full (include statements).

On the mailing list, Stas brought up the question "which RDF" should be delivered by the linked data URIs by default. Our dumps contain data in multiple encodings (simple and complex), and the PHP code can create several variants of RDF based on parameters now.

Sep 9 2015, 8:20 AM · Wikidata-Sprint-2015-09-29, Story, MediaWiki-extensions-WikibaseRepository, Wikidata

Sep 8 2015

mkroetzsch added a comment to T85444: [Story] get Wikidata added to LOD cloud.

As another useful feature, this will also allow us to have our SPARQL endpoint monitored at http://sparqles.ai.wu.ac.at/ Basic registration should not be too much work; please look into it (I don't want to create an account for Wikimedia ;-).

Sep 8 2015, 6:55 PM · Story, Wikidata.org, Wikidata

Aug 24 2015

mkroetzsch added a comment to T73349: [Bug] Fix empty map serialization behaviour.

It seems that the Web API for wbeditentities is also returning empty lists when creating new items (at least on test.wikidata.org). Is this the same bug or a different component?

Aug 24 2015, 11:33 AM · Wikibase-DataModel-Serialization, Wikidata, MediaWiki-extensions-WikibaseRepository

Aug 5 2015

mkroetzsch added a comment to T105432: Make wikibase:quantityUnit an URI .

If not dropped, then it should be fixed. The value of "1" (a string literal) is not correct. Units should be represented by URIs, not by literals.

Aug 5 2015, 3:49 PM · Wikidata-Sprint-2015-06-30, Patch-For-Review, Discovery-Wikidata-Query-Service-Sprint, Discovery-ARCHIVED, Wikidata-Query-Service, Wikidata

Jun 23 2015

mkroetzsch added a comment to T102717: https switch changed wdata prefix to https:.

While I did say that pretty much all URIs I know use http, I do not have any reason to believe that https would cause problems. It is not so extensively tested maybe, but in most contexts it should work fine.

Jun 23 2015, 6:42 PM · Wikidata-Sprint-2015-06-16, Patch-For-Review, Wikidata, Wikidata-Query-Service, Discovery-ARCHIVED

Jun 17 2015

mkroetzsch added a comment to T95316: Comparison of the existing Wikidata RDF dumps.

Are there any differences we're missing? Are we ok with these differences?

Jun 17 2015, 4:54 PM · Wikidata-Sprint-2015-06-16, Wikidata-Sprint-2015-06-02, Wikidata-Sprint-2015-05-05, Wikidata-Sprint-2015-04-21, Wikidata-Sprint-2015-04-07, Wikidata

Jun 12 2015

mkroetzsch added a comment to T102155: [Task] find a way to surface rdf/json representation in item UI.

we once planned a popup box with links to the various formats. It would be shown when you click on the Q-id in the title.

Jun 12 2015, 7:11 AM · MediaWiki-extensions-WikibaseRepository, Wikidata

Jun 10 2015

mkroetzsch added a comment to T101752: [RFC] Introduce ExternalEntityId.

I think this is a useful change if you want Wikibase sites to be able to refer to other Wikibase sites. In WDTK, all of our EntityId objects are "external", of course. A lesson learned for us was that it is not enough to know the base URI in all cases. You sometimes need URLs for API, file path, and page path in addition to the plain URI. MediaWiki already has a solution for this in form of the sites table. I would suggest to use this and to store pairs <sitekey,localEntityId> and to have the URI prefix stored in the sites table. It's cleaner than storing the actual URI string (which might change if an external site is reconfigured!) in the actual values on the page.
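A minimal sketch of the suggested storage scheme, assuming a simplified in-memory stand-in for the sites table. The field names, the `wikidatawiki` key, and the helper functions are illustrative only and do not reflect MediaWiki's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical row of a sites table."""
    uri_prefix: str
    api_url: str
    page_path: str  # "$1" stands for the page title


@dataclass(frozen=True)
class ExternalEntityId:
    """What gets stored in values: a site key and a local id, never a full URI."""
    site_key: str
    local_id: str


sites = {
    "wikidatawiki": Site(
        uri_prefix="http://www.wikidata.org/entity/",
        api_url="https://www.wikidata.org/w/api.php",
        page_path="https://www.wikidata.org/wiki/$1",
    )
}


def entity_uri(eid: ExternalEntityId, sites: dict) -> str:
    return sites[eid.site_key].uri_prefix + eid.local_id


def page_url(eid: ExternalEntityId, sites: dict) -> str:
    return sites[eid.site_key].page_path.replace("$1", eid.local_id)


stored = ExternalEntityId("wikidatawiki", "Q42")
print(entity_uri(stored, sites))
print(page_url(stored, sites))
```

If the external site is reconfigured, only the table entry changes; the stored pair stays valid.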

Jun 10 2015, 8:37 AM · Wikibase-DataModel, Wikidata

May 21 2015

mkroetzsch added a comment to T99907: [RFC] Human-readable serialization of TimeValue precisions in RDF.

@thiemowmde I don't know what you mean with the multiple tickets you refer to. I am not aware of other tickets related to readability. I was just saying that the requirement you are trying to address will never be addressed even halfway. It's still nice to improve readability a bit if it is possible without much pain and without any other disadvantages, but I don't think that this is the case here.

May 21 2015, 5:04 PM · Proposal, MediaWiki-extensions-WikibaseRepository, Wikidata
mkroetzsch added a comment to T99907: [RFC] Human-readable serialization of TimeValue precisions in RDF.

@thiemowmde One could have documentation as a text that is added as a description of the property used for precision. However, most users would more likely read a web page than look up the description stored in an OWL file. In the end, when you type in a SPARQL query, there is not much documentation directly available to you, even if it is stored in the RDF database somewhere.

May 21 2015, 4:08 PM · Proposal, MediaWiki-extensions-WikibaseRepository, Wikidata
mkroetzsch added a comment to T99907: [RFC] Human-readable serialization of TimeValue precisions in RDF.

A big advantage of the numbers is that you can search for values where the precision is at least a certain value (e.g., dates with precision day or above). This would be lost when using URIs.
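For illustration, a short sketch of such a threshold filter. It relies on the Wikibase precision codes 9 (year), 10 (month), and 11 (day); the `TimeValue` class, the helper name, and the sample values are made up for the example.

```python
from dataclasses import dataclass

# Wikibase precision codes: larger numbers mean finer precision.
PRECISION_YEAR, PRECISION_MONTH, PRECISION_DAY = 9, 10, 11


@dataclass(frozen=True)
class TimeValue:
    time: str
    precision: int


# Sample values, invented for this example.
values = [
    TimeValue("+1879-03-14T00:00:00Z", PRECISION_DAY),
    TimeValue("+1905-06-00T00:00:00Z", PRECISION_MONTH),
    TimeValue("+1600-00-00T00:00:00Z", PRECISION_YEAR),
]


def at_least(values, min_precision):
    """Keep values whose precision is min_precision or finer."""
    return [v for v in values if v.precision >= min_precision]


print([v.time for v in at_least(values, PRECISION_DAY)])
print([v.time for v in at_least(values, PRECISION_MONTH)])
```

With URIs in place of the numbers, the single comparison would have to become an explicit list of all acceptable precision URIs.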

May 21 2015, 3:29 PM · Proposal, MediaWiki-extensions-WikibaseRepository, Wikidata

May 19 2015

mkroetzsch added a comment to T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph.

@Jc3s5h You are right that date conversion only makes sense in a certain range. I think the software should disallow day-precision dates in prehistoric eras (certainly everything before -10000). There are no records that could possibly justify this precision, and the question of calendar conversion becomes moot. Do you think 4713BCE would be enough already, or do you think there could be a reason to find more complex algorithms to get calendar support that extends further to the past?

May 19 2015, 4:03 PM · Wikidata

May 11 2015

mkroetzsch added a comment to T97195: [Story] Create real URLs for wikidata ontology.

Sounds good.

May 11 2015, 3:02 PM · Wikidata-Sprint-2015-09-15, Patch-For-Review, Wikidata

Apr 3 2015

mkroetzsch added a comment to T94747: Make decision on RDF ontology prefix.

@daniel Changing the base URIs is not working as a way to communicate breaking changes to users of RDF. You can change them, but there is no way to make users notice this change, and it will just break a few more queries. It's just not how RDF works. Most of our test queries do not even mention any wikibase ontology URI, yet they are likely to be broken by changes to come. If you think that we need a way to warn users of such changes, you need to think of another way of doing this.

Apr 3 2015, 9:42 PM · Wikidata-Query-Service, Wikidata
mkroetzsch added a comment to T94747: Make decision on RDF ontology prefix.

I agree with the proposal of @Smalyshev.

Apr 3 2015, 7:24 AM · Wikidata-Query-Service, Wikidata

Mar 31 2015

mkroetzsch added a comment to T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph.

@Smalyshev You comment on my Item 1 by referring to BlazeGraph and Virtuoso. However, my Item 1 is about reading Wikidata, not about exporting to RDF. Your concerns about BlazeGraph compatibility are addressed by my item 2. I hope this clarifies this part.

Mar 31 2015, 7:25 AM · Wikidata

Mar 30 2015

mkroetzsch added a comment to T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph.

@Smalyshev P.S. Your finding of "0000" years in our Virtuoso instance is quite peculiar given that this endpoint is based on RDF 1.0 dumps as they are currently generated in WDTK using this code: https://github.com/Wikidata/Wikidata-Toolkit/blob/a9f676bfbc2df545d386bfa72e5130fa280521a9/wdtk-rdf/src/main/java/org/wikidata/wdtk/rdf/values/TimeValueConverter.java#L112-L117

Mar 30 2015, 9:25 PM · Wikidata
mkroetzsch added a comment to T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph.

@Smalyshev We really want the same thing: move on with minimal disturbance as quickly as possible. As you rightly say, the data we generate right now is not meant for production use but for testing. We must make sure that our production environment will understand dates properly, but it's still some time before that. Here is my proposal summed up:

Mar 30 2015, 9:02 PM · Wikidata
mkroetzsch added a comment to T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph.

@mkroetzsch I already listed a few of the tools that implement XSD 1.0 style BCE years and I read your answer as to say that you know of no tools that implement XSD 1.1 style BCE years.

Mar 30 2015, 1:42 PM · Wikidata
mkroetzsch added a comment to T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph.

Feel free to post a list of the RDF tools that you found to implement RDF 1.0 rather than RDF 1.1 in terms of dates.

Mar 30 2015, 9:36 AM · Wikidata
mkroetzsch added a comment to T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph.

Re "halting the work on the query engine"/"produce code now": The WDTK RDF exports are generated based on the original specification. There is no technical issue with this and it does not block development to do just this. The reason we are in a blocker situation is that you want to move forward with an implementation that is different from the RDF model we proposed and that goes against our original specification, so that Denny and I are fundamentally disagreeing with your design. If you want to return to the original plan, please do it and move on. If not, then better wait until Lydia has a conclusion for what to do with dates, rather than implementing your point of view without consensus. For me, this is a benchmark of whether or not our current discussion setup is working.

Mar 30 2015, 9:29 AM · Wikidata

Mar 29 2015

mkroetzsch added a comment to T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph.

@mkroetzsch Do you know of some widely used software that implements XSD 1.1 handling of BCE dates?

Mar 29 2015, 9:59 PM · Wikidata
mkroetzsch added a comment to T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph.

@Smalyshev @Lydia_Pintscher Dates without years should not be allowed by the time datatype. They are impossible to order, almost impossible to query, and they do not have any meaning whatsoever in combination with a preferred calendar model. All the arguments @Denny has already given elsewhere for why we should unify dates to Proleptic Gregorian internally apply here too. My suspicion is that the existing dates of this form are simply a glitch in the UI, where users got the impression that dates without years are recognized and pressing "save" silently set the year to zero without them seeing the change in meaning. If this is an important use case, then we should develop a day-of-year datatype that supports this, or suggest that the community use dedicated properties/qualifiers to encode this. However, other datatype extensions would be much more important than this rare case (e.g., units of measurement).

Mar 29 2015, 11:19 AM · Wikidata

Mar 27 2015

mkroetzsch added a comment to T93451: Data format updates for RDF export.

All RDF tools should be able to handle resources without labels (no matter if used as subject, predicate, or object). But data browsers or other UIs will simply show the URL (or an automatically created abbreviated version of it) to the user. So instead of "instance of" it would read something like "http://www.wikidata.org/entity/P31c". Nevertheless, we can accept this for now. AFAIK there are no widely used generic RDF data browsers anyway, and it's much more likely that people will first create Wikidata-aware interfaces.

Mar 27 2015, 10:04 PM · Patch-For-Review, Wikidata-Query-Service, Wikidata
mkroetzsch added a comment to T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph.

"we don't know what year it was but it was July 4th"

Mar 27 2015, 9:57 PM · Wikidata
mkroetzsch updated subscribers of T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph.

Yes, the discussion on SPARQL has converged surprisingly quickly to the view that XSD 1.1 is both normative and intended in SPARQL 1.1 (by the way, I can only recommend this list if you have SPARQL questions, or the analogous list for RDF -- people are usually very quick and helpful in answering queries, esp. if you say why you need it ;-).
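For readers following the XSD 1.0 versus XSD 1.1 thread, here is a small sketch of how the two conventions number BCE years. In XSD 1.0 the year 0000 is not allowed and -0001 denotes 1 BCE; in XSD 1.1 the year 0000 denotes 1 BCE and -0001 denotes 2 BCE. The function name is hypothetical, and this is not the WDTK conversion code linked in this thread.

```python
def bce_year_to_xsd(bce_year: int, version: str = "1.1") -> str:
    """Return the lexical year part of an XSD date for a historical BCE year."""
    if bce_year < 1:
        raise ValueError("bce_year must be 1 or greater")
    if version == "1.0":
        number = bce_year        # no year zero: 1 BCE is -0001
    elif version == "1.1":
        number = bce_year - 1    # year zero exists: 1 BCE is 0000
    else:
        raise ValueError("version must be '1.0' or '1.1'")
    return f"-{number:04d}" if number > 0 else "0000"


for year in (1, 2, 4713):
    print(year, "BCE:", bce_year_to_xsd(year, "1.0"), "vs", bce_year_to_xsd(year, "1.1"))
```

The same lexical year therefore refers to different historical years depending on which version a tool implements, which is why the choice has to be made consistently across export and query engine.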

Mar 27 2015, 9:53 PM · Wikidata
mkroetzsch added a comment to T94064: Date of +0000-01-01 is allowed but undefined in wikibase but is not allowed in xsd:dateTime as implemented by blazegraph.

Note that all current data representation formats assume that "0000-01-01T00:00:00" is a valid representation:

Mar 27 2015, 10:05 AM · Wikidata
mkroetzsch added a comment to T93451: Data format updates for RDF export.

Don't see why it would be this many. It'd be like 4 additional rows per property:

Mar 27 2015, 8:40 AM · Patch-For-Review, Wikidata-Query-Service, Wikidata

Mar 26 2015

mkroetzsch added a comment to T72385: Wikidata JSON dump: file directory location should follow standard patterns.

@Smalyshev Yes, this is what I was saying. @hoo was proposing to create a special directory for "truthy" based on offline discussion in the office.

Mar 26 2015, 10:02 AM · Patch-For-Review, § Wikidata-Sprint-2015-03-24, Wikidata, Datasets-General-or-Unknown
mkroetzsch added a comment to T93451: Data format updates for RDF export.

@Smalyshev Yes, using lower-case local names for properties is a widely used convention and we should definitely follow that for our ontology. However, I would rather not change case of our P1234 property ids when they occur in property URIs, since Wikibase ids might be case sensitive in the future (Commons files will have their filename as id, and even if standard MW is first-letter case-insensitive in articles, it can be configured to be otherwise). It would also create some confusion if one would have to write "p1234" in some interfaces and "P1234" in others (maybe even both would occur in RDF since we have a P1234 entity and several related properties).

Mar 26 2015, 9:47 AM · Patch-For-Review, Wikidata-Query-Service, Wikidata

Mar 25 2015

mkroetzsch added a comment to T72385: Wikidata JSON dump: file directory location should follow standard patterns.

Re "what does consistent mean": to be based on the same input data. All dumps are based on Wikidata content. If they are based on the same content, they are consistent, otherwise they are not.

Mar 25 2015, 7:54 PM · Patch-For-Review, Wikidata-Sprint-2015-03-24, Wikidata, Datasets-General-or-Unknown
mkroetzsch added a comment to T72385: Wikidata JSON dump: file directory location should follow standard patterns.

Re using the same code: That's not essential here. All we want is that the dumps are the same. It's also not necessary to develop the code twice, since it already exists twice anyway. The only question is whether we want to use a slow method that keeps people waiting for the dumps for days (as they already do now with many other dumps), or a fast one that you can run anywhere (even without DB access; on a laptop if you like). The fact that we must have the code in PHP too makes it possible to go back to the slow system if it should ever be needed, so there is no lock-in. Dump file generation is also not operation-critical for Wikidata (the internal SPARQL query will likely be based on a live feed, not on dumps). What's not to like?

Mar 25 2015, 7:48 PM · Patch-For-Review, Wikidata-Sprint-2015-03-24, Wikidata, Datasets-General-or-Unknown
mkroetzsch added a comment to T72385: Wikidata JSON dump: file directory location should follow standard patterns.

All of these dumps will be generated by exporting from the DB.

Mar 25 2015, 4:46 PM · Patch-For-Review, Wikidata-Sprint-2015-03-24, Wikidata, Datasets-General-or-Unknown
mkroetzsch added a comment to T72385: Wikidata JSON dump: file directory location should follow standard patterns.

@Lydia_Pintscher I understand this problem, but if you put different dumps for different times all in one directory, won't this become quite big over time and hard to use? Maybe one should group dumps by how often they are created (and have date-directories only below that). For some cases, there does not seem to be any problem. For example, creating all RDF dumps from the JSON dump takes about 3-6h in total (on labs). So this is easily doable on the same day as the JSON dump generation. I am sure that we could also generate alternative JSON dumps in comparable time (maybe add an hour to the RDF if you do it in one batch). The slow part seems to be the DB export that leads to the first JSON dump -- once you have this the other formats should be relatively quick to do.
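
The grouping suggested above (by creation frequency first, with date directories only below that) could look roughly like the following sketch. The root directory name, the frequency labels, the date format, and the function `dump_path` are all assumptions for illustration; none of them describe the actual layout of the dump server.

```python
from datetime import date
from pathlib import PurePosixPath


def dump_path(frequency: str, day: date, filename: str) -> PurePosixPath:
    """Place a dump file under <root>/<frequency>/<YYYYMMDD>/<filename>.

    "dumps" is a hypothetical root directory; frequency might be
    "daily" or "weekly", depending on how often the dump is created.
    """
    return PurePosixPath("dumps") / frequency / day.strftime("%Y%m%d") / filename


print(dump_path("weekly", date(2015, 3, 25), "wikidata-all.json.gz"))
```

Dumps derived from the same JSON export would then share one date directory, which also makes it easy to see which files are based on the same content.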

Mar 25 2015, 3:41 PM · Patch-For-Review, Wikidata-Sprint-2015-03-24, Wikidata, Datasets-General-or-Unknown
mkroetzsch added a comment to T93207: Decide which domain to use for the wikibase ontology URI.

@daniel Changing URIs of the ontology vocabulary is "silently producing wrong results" as well. I understand the problems you are trying to solve. I am just saying that changing the URIs does not actually solve them.

Mar 25 2015, 1:38 PM · Wikidata-Sprint-2015-04-21, Domains, Wikidata
mkroetzsch added a comment to T72385: Wikidata JSON dump: file directory location should follow standard patterns.

@hoo Thanks for the heads up! I do have comments.

Mar 25 2015, 1:27 PM · Patch-For-Review, Wikidata-Sprint-2015-03-24, Wikidata, Datasets-General-or-Unknown

Mar 22 2015

mkroetzsch added a comment to T93451: Data format updates for RDF export.

Is there any existing ontology we may want to use to create such links between entity:P1234 and v:P1234 or q:P1234? Or should we just invent our own?

Mar 22 2015, 1:56 PM · Patch-For-Review, Wikidata-Query-Service, Wikidata

Mar 21 2015

mkroetzsch added a comment to T93451: Data format updates for RDF export.

Also, it was suggested that we may want to change the fact that we use entity:P1234 in the Entity->Statement link and give it a distinct URL. However, it is then not clear what the link between entity:P1234 and the rest of the data would be.

Mar 21 2015, 10:45 AM · Patch-For-Review, Wikidata-Query-Service, Wikidata

Mar 20 2015

mkroetzsch added a comment to T93207: Decide which domain to use for the wikibase ontology URI.

@daniel It makes sense to use wikibase rather than wikidata, but I don't think it matters very much at all. We should just define it sooner rather than later.

Mar 20 2015, 4:07 PM · Wikidata-Sprint-2015-04-21, Domains, Wikidata

Mar 19 2015

mkroetzsch added a comment to T93207: Decide which domain to use for the wikibase ontology URI.

@daniel: Have you wondered why XML Schema decided against changing their URIs? It is by far the most disruptive thing that you could possibly do. Ontologies don't work like software libraries where you download a new version and build your tool against it, changing identifiers as required. Changing all URIs of an ontology (even if only on a major version increment) will break third-party applications and established usage patterns in a single step. There is no mechanism in place to do this smoothly. You never want to do this. Even changing a single URI can be very costly, and is probably not what you want if the breaking change affects only a vanishingly small part of your users (How many BCE dates are there in XSD files? How many of those were already assuming the ISO reading anyway?).

Mar 19 2015, 10:43 PM · Wikidata-Sprint-2015-04-21, Domains, Wikidata
mkroetzsch added a comment to T93207: Decide which domain to use for the wikibase ontology URI.

Hi Daniel.

Mar 19 2015, 3:53 PM · Wikidata-Sprint-2015-04-21, Domains, Wikidata

Mar 16 2015

mkroetzsch added a comment to T88735: wikidata-rdf wrong format.

Yes, this refers to the Wikidata Toolkit RDF exports. I have now created an issue: https://github.com/Wikidata/Wikidata-Toolkit/issues/128 As it turns out, the error is actually caused by the code that handles the year-0 issue (which we of course are well aware of). It should be easy to fix as part of our upcoming RDF improvements. Note that negative dates before -9999 may still confuse some RDF tools.
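
For readers unfamiliar with the year-0 issue, the following is a minimal sketch of the kind of conversion involved. It is not the Wikidata Toolkit code, and the function name `xsd_year` and its parameters are made up for this example. It assumes the input uses astronomical year numbering (0 is 1 BCE, -1 is 2 BCE), which matches the ISO reading adopted by XSD 1.1, whereas XSD 1.0 has no year 0 and writes 1 BCE as -0001.

```python
def xsd_year(astronomical_year: int, xsd_version: str = "1.1") -> str:
    """Format the year part of an xsd:dateTime lexical form.

    astronomical_year: 0 means 1 BCE, -1 means 2 BCE, and so on.
    xsd_version: "1.1" keeps year 0000; "1.0" shifts non-positive
    years down by one because XSD 1.0 does not allow a year 0.
    """
    year = astronomical_year
    if xsd_version == "1.0" and year <= 0:
        year -= 1
    sign = "-" if year < 0 else ""
    # At least four digits; years beyond 9999 simply get more digits.
    return f"{sign}{abs(year):04d}"


for y in (2015, 0, -1, -10000):
    print(y, xsd_year(y, "1.1"), xsd_year(y, "1.0"))
```

Years with more than four digits, such as -10000, are valid in the lexical form, but as noted above some RDF tools may not handle them well.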

Mar 16 2015, 2:41 PM · Wikidata

Mar 11 2015

Gerrit Code Review <gerrit@wikimedia.org> committed rMEXTf487d8b6213c: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:22 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXTc1d559ea6671: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:21 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXTee1aaf52e050: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:10 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXT4cca13f1183e: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:10 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXT0fb670fecf3e: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:10 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXT853972b0f077: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:09 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXTb857331c9397: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:09 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXT84023a65bf58: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:06 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXT4b83187aea80: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:06 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXT78176c3c8604: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:06 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXT87053eb316fa: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:05 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXTe32d196a69ae: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:04 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXT18e897a0b25a: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:04 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXT1e69064e4035: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:04 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXTd33770b9381b: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:04 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXTf924ecd20d10: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:04 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXT608349f3c08d: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:03 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXT078f4e8f1514: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:01 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXT2da8d82a2c23: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:01 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXT449d0d8db28e: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:01 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXT7884d6b39954: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:01 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXTef70749fdcd3: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:01 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXT2912a08d0769: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:00 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXT7082325c7c76: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:00 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXTfb7054204b95: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 5:00 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rMEXTab11200465bb: Updated mediawiki/extensions Project: mediawiki/extensions/SemanticMediaWiki… (authored by mkroetzsch).
Mar 11 2015, 4:59 AM