Mon, Jul 26
Joseph will suggest an optimization for this task when he is back. For now, a simple .distinct() has been applied to the Spark DataFrame to facilitate analysis of the Wikidata dumps.
Sat, Jul 24
Fri, Jul 23
Mon, Jul 19
Fri, Jul 16
- For June, the average daily successful parsing rate was ~85%, ranging from 75% to 90%. Note that this includes only queries with status 200 and 500.
- 11% of the distinct queries ran into prefix-related errors. The number of distinct queries affected by each prefix is shown below. After adding the first 4 prefixes (mwapi, geof, foaf, gas) to the query processor's prefix list, the average daily successful parsing rate rose to >96%. A few prefixes were slightly off (data instead of wdata, ref instead of wdref); these account for very few queries, but I fixed them nevertheless.
Total distinct queries: 67,467,327
- Other errors included:
- Variable used when already in-scope. This happens when the same variable is reused within a query. Testing such queries in WDQS returns results fine. These form 2% of the errors among distinct queries.
- Another notable error is the WITH clause. Although it runs fine in WDQS, the parser doesn't accept it. These form 2.5% of the distinct queries.
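The prefix fix above can be sketched as a small pre-processing step: declare any well-known prefixes a query uses but does not declare before handing it to the parser. The helper name and the heuristics are illustrative only; the namespace IRIs are the standard ones used by WDQS.

```python
import re

# Prefixes that queries commonly use without declaring (standard WDQS IRIs).
EXTRA_PREFIXES = {
    "mwapi": "https://www.mediawiki.org/ontology#API/",
    "geof":  "http://www.opengis.net/def/geosparql/function/",
    "foaf":  "http://xmlns.com/foaf/0.1/",
    "gas":   "http://www.bigdata.com/rdf/gas#",
}

def add_missing_prefixes(query: str) -> str:
    """Prepend PREFIX declarations the query uses but does not declare."""
    declared = set(re.findall(r"(?i)PREFIX\s+(\w+):", query))
    # Rough heuristic: any `name:localPart` occurrence counts as a used prefix.
    used = set(re.findall(r"\b(\w+):\w", query)) - declared
    header = "".join(
        f"PREFIX {p}: <{iri}>\n"
        for p, iri in EXTRA_PREFIXES.items() if p in used
    )
    return header + query
```

The real fix extends the query processor's own prefix list instead of rewriting query text, but the effect on the parsing rate is the same.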
Tue, Jul 13
Sun, Jul 11
Hi @akosiaris, I had to do a fresh OS install and lost my SSH keys. Is it possible to reset them so I can regain access? Should I post a new public key here?
Jun 23 2021
Jun 22 2021
Jun 21 2021
Jun 4 2021
Jun 3 2021
Some of the information suggested for extraction through this analysis:
- Top items
- Top properties
- Top subject, object types
- Top property types
- Top wikidata vs other predicates
- Number of S, P, O that don't involve wikidata
- The aim is to find the size of the subgraph not concerning Wikidata, i.e. the size of the leaves. They are leaves because once they point to something outside of Wikidata, they are not expanded within Wikidata. Some things, such as literals, are not even expandable. If we have too many leaves, we may consider using property graphs (where leaves would be listed as properties of a node).
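The leaf idea above can be sketched on a toy list of (s, p, o) triples: an object is a leaf when it cannot be expanded inside Wikidata, i.e. it is a literal or an IRI outside the wikidata.org namespace. The triple format and the namespace test are illustrative assumptions, not the real dump schema.

```python
# Hypothetical sketch: count triples whose object is a "leaf".
WIKIDATA_NS = "http://www.wikidata.org/"

def is_leaf(obj: str) -> bool:
    if not obj.startswith("http"):       # literal (string, number, date, ...)
        return True
    return not obj.startswith(WIKIDATA_NS)  # IRI pointing outside Wikidata

def leaf_fraction(triples) -> float:
    leaves = sum(1 for _, _, o in triples if is_leaf(o))
    return leaves / len(triples)
```

On the real dumps this would be a Spark aggregation over the triples DataFrame rather than a Python loop.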
Jun 1 2021
Update 1 June 2021:
May 27 2021
May 25 2021
May 24 2021
Idea on how to store the SPARQL query as a list:
Let's make a list of a generic custom class QueryElem[T], where QueryElem contains elemType: String and elem: T.
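The shape of that idea, sketched in Python for illustration (the actual work is in Scala): each element of the parsed query is stored with a type tag and a typed payload, and the whole query becomes a flat list of tagged elements.

```python
from dataclasses import dataclass
from typing import Generic, List, TypeVar

T = TypeVar("T")

@dataclass
class QueryElem(Generic[T]):
    elemType: str   # e.g. "prefix", "triple", "filter" (tags are assumptions)
    elem: T         # the payload, whose shape depends on elemType

# A query as a flat list of tagged elements:
query: List[QueryElem] = [
    QueryElem("prefix", ("wd", "http://www.wikidata.org/entity/")),
    QueryElem("triple", ("?item", "wdt:P31", "wd:Q5")),
]
```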
May 21 2021
May 20 2021
May 19 2021
Unit tests done, patch merged!
- Created a file containing queries that should pass and a file containing queries that shouldn't. Both are checked for correctness in the unit tests.
- Checked correctness of extracted nodes for 2 example queries written inline in the code.
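The pass/fail-file pattern above, sketched in Python (the real tests are in Scala against the actual SPARQL parser; the `parses` stand-in here is a deliberate toy):

```python
def parses(query: str) -> bool:
    # Stand-in for the real parser: accept anything containing SELECT.
    return "SELECT" in query.upper()

def check_query_files(passing: list, failing: list) -> None:
    # Every query from the "pass" file must parse, every query
    # from the "fail" file must be rejected.
    for q in passing:
        assert parses(q), f"expected to parse: {q!r}"
    for q in failing:
        assert not parses(q), f"expected to fail: {q!r}"
```

Keeping the fixtures in two plain files makes it cheap to add a regression query whenever a new parser bug is found.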
May 7 2021
Apr 23 2021
Thanks, I've updated it. The only thing that's left in the 'All of the above' section on the Requesting_access page is 'An ssh key for your shell user'. Let me know if I should add that as well.