My point was: expanding IDs to resource references is not really the same thing as normalizing values
I've renamed it to statement_keywords. Hopefully it's better.
Looks like SVG support for icons is kinda spotty: https://en.wikipedia.org/wiki/Favicon#File_format_support
Right now it's hardcoded in the RdfRepository class:
I think you meant T=10s, N=3.
I would propose a warning banner or something like that, which contains:
- Warning that the server is not responding
- Link to a page that describes what to do, who to contact, etc. (probably should be configurable)
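The check behind such a banner (given T=10s, N=3 from the earlier comment) could be as simple as the sketch below; the function and probe names are hypothetical, and the probe would wrap whatever request we already make against the server:

```python
def server_responding(probe, n=3):
    """Return True if any of n probe attempts succeeds.

    `probe` is a callable that raises OSError on failure, e.g. a lambda
    wrapping urllib.request.urlopen(url, timeout=10) for T=10s.
    """
    for _ in range(n):
        try:
            probe()
            return True
        except OSError:
            continue
    return False

def failing_probe():
    # Stand-in for a request that times out or is refused.
    raise OSError("server not responding")

show_banner = not server_responding(failing_probe, n=3)
```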
Mon, Sep 18
the actual calculation is just “Earth distance × (globe radius / Earth radius)”, right?
Sun, Sep 17
Not sure whether it's related; it's hard to diagnose from this. Did you check the GC logs? What did the Blazegraph status page show?
Sat, Sep 16
I don't think we have plans to implement the service for test.wikidata, so closing this one.
The language code for links works this way (see SiteLinksRdfBuilder class):
geof:distance definitely assumes Earth for now. Doing it for other globes is tricky: even if we assume they are all spherical (which may be a good approximation for larger planets, less so for dwarf planets, and completely wrong for things like asteroids), we'd need to account for the radius, etc., which is currently hardcoded for Earth. I'm not even sure how to implement it efficiently for an arbitrary globe. It may be possible to do it for a predefined set of globes.
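Under the spherical assumption, the result for another globe is just the Earth distance rescaled by the radius ratio, as mentioned earlier in the thread. A minimal sketch (the radii here are illustrative mean values, not necessarily what Blazegraph uses internally):

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean radius; illustrative
MARS_RADIUS_KM = 3389.5    # mean radius; illustrative

def spherical_distance_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance between two points on a sphere (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Same pair of coordinates on Earth and on Mars: the Mars distance is
# exactly the Earth distance times (Mars radius / Earth radius).
d_earth = spherical_distance_km(0, 0, 0, 90)
d_mars = spherical_distance_km(0, 0, 0, 90, radius_km=MARS_RADIUS_KM)
```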
Blazegraph has a mechanism for stored queries; however, what is not entirely clear to me is how abuse prevention would work in that case. I.e., let's assume we have a heavy query and we have found a way to run it past the common limits. What would happen if somebody, by mistake or out of malice, runs it 100 times? This could take down the whole service, at least temporarily. We need some way to prevent that from happening.
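For example, per-stored-query throttling could cap the damage. A token-bucket sketch of the idea (purely illustrative, not an existing Blazegraph feature; names and rates are made up):

```python
import time

class TokenBucket:
    """Allow a small burst of runs, then throttle to a steady refill rate."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# One bucket per stored query: burst of 3, then one run per 10 seconds.
bucket = TokenBucket(capacity=3, refill_per_sec=0.1)
results = [bucket.allow() for _ in range(5)]
```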
Fri, Sep 15
The heap was bumped from 8G because there were some OOMs with heavy queries (some of them still use a fair amount of heap, even though most of the data goes through Blazegraph's own allocator). So let's not be over-zealous in reducing it yet. 12G could still be fine.
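For reference, assuming the heap ceiling is set with the usual JVM flags in the service's startup options, reducing it would mean something like this (sketch; the variable names and where they live are hypothetical):

```shell
# Hypothetical: wherever the service's JVM options are defined.
HEAP_SIZE="12g"
JAVA_OPTS="-Xmx${HEAP_SIZE} -Xms${HEAP_SIZE}"
```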
Thu, Sep 14
I'm reviewing the patch now and will update by the end of the day.
What I would suggest is maybe storing this as a non-indexed field in the index and returning it together with the response. Of course, it could also be done purely client-side, but at the cost of one extra round-trip.
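In the mapping that could look roughly like the fragment below (a sketch only; the field name is hypothetical). Setting `"enabled": false` keeps the field in `_source`, so it comes back with each hit, without indexing it for search:

```json
{
  "mappings": {
    "properties": {
      "statement_data": {
        "type": "object",
        "enabled": false
      }
    }
  }
}
```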
We may also want to store some values as non-indexed data, e.g. see T140131
Now, if you decide to add P1559 (monolingual text), we should not index it in the "statements" elastic field: the two require totally different analyzers (one is an identifier, the other is written language).
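I.e. roughly this shape in the mapping (field and analyzer names are hypothetical): the keyword field matches identifier-like tokens exactly, while the text field goes through language analysis.

```json
{
  "statement_keywords": { "type": "keyword" },
  "monolingual_text":   { "type": "text", "analyzer": "standard" }
}
```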
Wed, Sep 13
Moving the filtering to the mapping (which I'd find more flexible going forward) will require some custom mapper/analyzer.
Was the LDF server moved off wdqs1001? If not, we should move it first thing.
I think this use of psn::
p:P227 [                                       # full statement
    ps:P227 "4015139-6" ;                      # simple value
    psn:P227 <http://d-nb.info/gnd/4015139-6>  # normalized simple value
] .
is OK. I'll take time to review it more thoroughly in the coming days, but on the face of it it looks fine. Also, please note that psn:P123 and psn:P345 do not have to be of the same type: you have to preserve consistency within the same predicate, but different predicates with the same prefix can have different types. In this case they happen to have the same type, due to how we represent values, but in general that's not a requirement as long as the overall semantics is close.
Tue, Sep 12
Now the ElasticSearch configs account for sitelinks (and in general any field can be used in a search profile with various functions and weights). Do we still need to do anything for this one? Is this about full-text search (which does not use ElasticSearch yet)?
In the patch, an option was raised to index all statements of a certain type, instead of just named properties. I am not sure yet whether it is a good idea; it needs some thought. Probably not in the initial iteration, but possibly later.
Sat, Sep 9
Another example here:
Fri, Sep 8
I'm not sure we should really go as far as indexing all statements right now. Most of them would not be very useful for search purposes for now, and they are already served by the Query Service. The most useful ones would be those that legitimately limit searches to relevant items, which I would imagine are mostly P31/P279. In fact, right now I don't even have much of a use case for anything but those two, though maybe we will in the future. I think it'd be OK for now to just index those explicitly mentioned. The idea of using analyzers/filters may still be workable in the future, but I'd postpone it for now.
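Explicitly whitelisting the two could look like this sketch (names are hypothetical, and the statement shape is simplified):

```python
# P31 = "instance of", P279 = "subclass of" - the two properties the
# comment above suggests indexing explicitly.
INDEXED_PROPERTIES = {"P31", "P279"}

def statements_to_index(statements):
    """Keep only statements whose property is explicitly whitelisted."""
    return [s for s in statements if s["property"] in INDEXED_PROPERTIES]

sample = [
    {"property": "P31", "value": "Q5"},
    {"property": "P1559", "value": "some monolingual text"},
]
kept = statements_to_index(sample)
```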
It's not our component, so having a workboard etc. for it would not be very useful. It's just a tag to easily identify tickets that are related to this functionality.
It's also not the Wikidata Query Service optimizer, it's the Blazegraph optimizer. But since many people do not know what Blazegraph is, I chose to use WDQS. If this does not fit some naming guidelines, please choose a suitable name.
@EBernhardson yes, this looks like what I've done in the patch, I just wondered if it's correct. Looks like it is then :)
@dcausse Could you explain a bit more how to set up the analyzer? I tried to figure how to do it but I'm not sure whether I did it right.
Thu, Sep 7
I also wonder: is it possible to do the (de)boosting at the rescore stage? The reason is that we can select different rescore profiles from the URL (which means different widgets can use different boosts), while getting things added to the search query itself is more complicated. Of course, we can add more query params or query syntax, but it seems that for tuning, profiles may be easier to work with?
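E.g. a rescore profile selected from the URL would end up producing something like the fragment below (a sketch; the field name, term, window size, and weight are all hypothetical):

```json
{
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": { "term": { "statement_keywords": "P31=Q5" } },
      "rescore_query_weight": 2.0
    }
  }
}
```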
Wed, Sep 6
I am not sure these micro-optimizations are worth the increased complexity... Maybe we need a test to see whether it really produces any noticeable difference.
I wonder if we could rather have some sort of relationship (name TBD) keyword field that encodes both parts.
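A sketch of such an encoding (separator and names TBD, as noted; this is just one possibility):

```python
def encode_relationship(prop, value):
    """Encode a property/value pair as a single keyword token, e.g. 'P31=Q5'."""
    return f"{prop}={value}"

def decode_relationship(token):
    """Split the token back into its (property, value) parts."""
    prop, _, value = token.partition("=")
    return prop, value

token = encode_relationship("P31", "Q5")
```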