I have a feeling we're overdesigning it a little. I think it should be simple and cover 80% of cases, and if you need more complex things you'd probably be better off using generic boolean syntax like OR/AND.
Fri, Feb 15
I like the structure of the syntax but would probably bikeshed the exact delimiters a bit if possible (later). Also, are we following fallback chains or only seeking exact language match? If we match exactly we may want to also think about allowing fallbacks.
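To illustrate the exact-match vs. fallback question above, here is a minimal sketch of fallback-chain resolution. This is not the actual implementation; the `FALLBACKS` map, the `labels` dict, and `resolve_label` are all invented for illustration.

```python
# Hypothetical fallback map: each language lists the languages to try
# next when an exact match is not found.
FALLBACKS = {"de-at": ["de", "en"], "de": ["en"], "fr": ["en"]}

def resolve_label(labels, lang):
    """Try an exact language match first, then walk the fallback chain."""
    for candidate in [lang] + FALLBACKS.get(lang, []):
        if candidate in labels:
            return labels[candidate]
    return None  # no match even after exhausting the fallbacks

labels = {"de": "Haus", "en": "house"}
print(resolve_label(labels, "de-at"))  # no exact "de-at", falls back to "de"
```

Exact-match-only behavior would be the same loop without the fallback list, which is why the two behaviors are easy to confuse in the syntax design.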
Thu, Feb 14
Another thought about docker and similar technologies. PHP applications - and Mediawiki in particular - can be used on stacks way beyond Linux. I've seen PHP run on HP-UX, AIX, in mainframe containers, and other weird setups. As long as our base requirement remains more or less only "PHP" (the database is separable, so it's not a big issue), we may not be officially supporting this diverse category, but we are serving it. There's a long tail of "weird" cases that our relatively low requirement footprint enables. If we ever changed it to "you need to run docker now", that would cut off some of the "long" part of this long tail - all those weird cases that nobody would bother to support explicitly but that "just work" because requirements are low. Raising requirements has costs, and we should always be mindful of that.
Are our target demographics (and the reason why we are targeting them) documented somewhere?
I am not sure what requirement you refer to, to be made explicit.
Another solution to this problem would be to require Docker and Kubernetes, which are both free and trivial to set up.
I don't have a real opinion on this one. Generally, for dump users the only concern is for the dump to be recent enough to be useful (wikidata is rather volatile, and its linked nature means that volatility in one area can influence results in a wider area). That said, I do not see a huge problem with having dumps on the 1st and 15th. The only argument I can see for weekly dumps is that we could time them for when editor activity is lowest to reduce impact, but I am not sure whether that's an important consideration.
A useful Wikibase (with query service etc) can already not simply be installed on a LAMP stack.
Wed, Feb 13
I've created release 1.3.0 for the textcat package on packagist from the master branch. Turns out it's easy:
- Create the tag in git
- Push it to gerrit
- Click update on packagist site
Tue, Feb 12
I've requested the gerrit repo and as soon as it's made, I'll move the fork there.
I get the idea of server-side HTML rendering to avoid delays. But I am kinda questioning whether the advantage of splitting code outside of PHP into a separate service is worth all the complexities that follow from that.
Yes, judging from our preliminary test, if we get uncontested use of the server, or maybe even a certain chunk of it (not sure if that's possible), it would be enough. Note that an interesting scenario that we want to test in the foreseeable future involves a cluster setup, so we'd want at least 2 hosts (not sure whether they have to be on 2 separate hardware machines) with requirements close to what wdqs hosts have. It could be that splitting a cloudvirt host into two VMs exclusively used by these test hosts would be ok. Not sure if the virtualization we do now allows that kind of fixed resource allocation (probably I/O resources would also need to be taken care of?)
Mon, Feb 11
Sat, Feb 9
Fri, Feb 8
Ok, I've updated the docs accordingly. I think then this can be resolved.
@ was used for documentation but v2 has its own documentation field
Thu, Feb 7
Also, we probably need some test to capture this situation in the future. It looks like the current tests do not cover this branch.
Ah, I see what's up - Wikibase has been updated, but WikibaseLexeme was not.
Looks like some deployment issue - 1.33.0-wmf.14 has this class, but 1.33.0-wmf.16 does not.
Wed, Feb 6
Tue, Feb 5
Mon, Feb 4
Hence at the end of mw-debug-cli.log we have:
There's also this library I've encountered recently: https://cytoscape.org/
It may be interesting for graph visualizations.
Sat, Feb 2
Would it be acceptable to show the results for the Item namespace and the Lexeme namespace separately - e.g. in separate tabs? If not, what would be the expected UI - would both kinds of results be displayed together, distinguished only by prefix, or with some separation between Lexemes and Items?
After discussing this on the offsite, the current approach seems to be like this:
- For full-text search across several kinds of namespaces, it's probably best to separate the searches and display each kind in a separate tab (we'd need to think about how exactly the UI would work for this)
- For Wikidata completion search, we want to have a combined profile that unites the searches on the backend, using the query dispatcher functionality being developed by @dcausse.
Fri, Feb 1
@mmodell thanks, this would be great!
We do not have editorial control over the data in Wikidata, and even less over maps provided by a third party. We are also subject neither to Indian law nor to the Twitter audience's "law". The situation where some governing body objects to some content in open sources or on the wider Internet, whether on Wikimedia or not, is completely routine and happens all the time. It would certainly be unfortunate if somebody overreacted and made some unfortunate moves due to any particular situation, but whatever happens does not change these basic facts. We show what OSM maps provide.
Example: in this dashboard: https://phabricator.wikimedia.org/tag/wikidata-query-service/ I'd like to sort the first two columns by priority, but in the other columns sorting by priority does not make much sense to me. In general, sorting by priority is good for current tasks that are being worked on or soon will be, but for the part of the board that is more long-term classification it makes less sense. So it would be nice if I could use different sorting for these columns, with some columns auto-sorted by priority and others sorted as I specify manually.
Wed, Jan 30
Note that currently we index existing pages on RefreshLinks. We do have a special case for new items, which we index instantly. But I don't think this applies to captions, since they are part of an existing page in the index, right? So maybe the MediaInfo extension should copy some code from onPageContentInsertComplete in CirrusSearch to make instant indexing happen.
We will also need to move the options out. We could move them to a set of WikibaseCirrusSearch options, or keep the current option structure, in which case we'd need to enable extensions to define their own Wikibase options, which, as I understand it, is not the case now. Should we keep these options in the same place (BC) or create a new set of options for WikibaseCirrusSearch?
Searching the article & entity namespaces together is not supported (see T194968: Enable search in all wikidata namespaces combined for the task about it). So we have to either configure search so that by default it searches only one kind of namespace, or solve the task above.
I am not sure whether it's ok to use cdnjs.cloudflare.com for privacy reasons. AFAIK, so far we have avoided directly including non-WMF resources.
Hmm, I updated the entities, and I still see TeX there. However, looking at the raw backend output, I do see mathml - check out the result of https://query.wikidata.org/sparql?query=DESCRIBE%20wd%3AQ631815. So the rendering seems to be the problem. Looking at the HTML I do see <math> tags, but somehow the display is still not right. Not sure what the issue is there, but it seems to be on the frontend side.
OK, it looks like some entities indeed have TeX strings, even though the export returns MathML now. Not sure when it changed - the entities have the right timestamps, but the RDF is different. I'll update the entries and see what happens.