**Week:** Dec 7 - Dec 13
**Task:** Identify the best strategy/APIs to find Commons categories that are within a certain radius of the specified GPS coordinates
**Deliverable:** Wiki memo: cURL requests that provide the right categories (30% false positives OK) for all possible use cases and edge cases
I will be using https://github.com/nicolas-raoul/apps-android-commons/wiki/Location-based-category-search to document the results of testing the categories obtained via different APIs/strategies against the benchmark of categories that have been manually entered (by the Commons community or by myself) for each picture.
Pictures are found by:
# Visiting https://commons.wikimedia.org/wiki/Special:Random/File
# Eliminating files that are not photos or could not possibly have been taken with a smartphone
# Keeping only files that have location data available
For each picture, I aim to perform a comparison, for instance:
- Manually: x0 good categories
- WikiData API: x1 good categories, y1 false positives
- Commons API: x2 good categories, y2 false positives
- "Existing pics at that location" strategy: x3 good categories, y3 false positives
**WikiData API**: I am running queries via [[ https://tools.wmflabs.org/wikidata-todo/tabernacle.html?wdq=&pagepile=885&props=373%2C625&items=&show=1 | TABernacle ]], for instance:
```
claim[373] AND around[625,49.27066666666666,14.073769444444444,0.1]
```
Property 373 is the Wikidata property for the Commons category, and property 625 holds the coordinate location. I start with a radius of 0.1 km and increase it if no categories are found.
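Such TABernacle URLs can be generated for arbitrary coordinates. A minimal sketch, assuming the `wdq` URL parameter accepts the same WDQ expression as the example above (`build_tabernacle_url` is a hypothetical helper, not part of any existing tool):

```python
from urllib.parse import quote

def build_tabernacle_url(lat, lon, radius_km):
    """Build a TABernacle URL listing items that have a Commons
    category (P373) within radius_km of the given coordinates (P625)."""
    wdq = "claim[373] AND around[625,{},{},{}]".format(lat, lon, radius_km)
    return ("https://tools.wmflabs.org/wikidata-todo/tabernacle.html"
            "?wdq=" + quote(wdq) + "&props=373%2C625&show=1")

url = build_tabernacle_url(49.27066666666666, 14.073769444444444, 0.1)
print(url)
```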
**Method C: "Search for existing pics at that location" strategy**
This is described in more detail at https://etherpad.wikimedia.org/p/commons-app-android-nearby-categories
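One way to implement this strategy is the MediaWiki `list=geosearch` module on the Commons API, restricted to the File namespace (namespace 6); the categories of the returned files can then be fetched separately with `prop=categories`. A sketch that only builds the request URL (actually issuing it requires network access; parameter choices here are assumptions for illustration):

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"

def geosearch_url(lat, lon, radius_m):
    """URL for files (namespace 6) on Commons within radius_m metres
    of the given coordinates, using the geosearch list module."""
    params = {
        "action": "query",
        "list": "geosearch",
        "gscoord": "{}|{}".format(lat, lon),
        "gsradius": radius_m,  # in metres
        "gsnamespace": 6,      # File: namespace
        "gslimit": 50,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)

print(geosearch_url(49.27066666666666, 14.073769444444444, 100))
```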
**Scoring results for WikiData API and Method C:**
After obtaining the number of good categories and false positives, I ran them through the equation suggested by @Nicolas_Raoul
```
(number of good categories - number of false positives / 3) / number of good categories found by human
```
to obtain the scores for each sample and each method. I then summed up the scores for the 10 samples for each method to obtain its total score. I made a small Python script to automate this process, and the results are as follows:
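The per-sample computation can be sketched as follows (the sample counts below are hypothetical, not the ones from my ten samples):

```python
def score(good, false_positives, human_good):
    """Score one sample: each false positive costs a third of a good
    category; the result is normalised by the human-found count."""
    return (good - false_positives / 3.0) / human_good

# Hypothetical counts: (good, false positives, human-found good)
samples = [(1, 0, 3), (2, 3, 3), (0, 2, 3)]
total = sum(score(g, fp, h) for g, fp, h in samples)
print(total)
```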
**Scoring WikiData results...**
Score for sample 1 = 0.0666666666667
Score for sample 2 = 0.0666666666667
Score for sample 3 = 0.111111111111
Score for sample 4 = -0.111111111111
Score for sample 5 = 0.333333333333
Score for sample 6 = -0.333333333333
Score for sample 7 = 0.0
Score for sample 8 = -0.666666666667
Score for sample 9 = 0.0
Score for sample 10 = 0.111111111111
Total score = -0.422222222222
**Scoring Method C results...**
Score for sample 1 = 0.0
Score for sample 2 = 0.0666666666667
Score for sample 3 = 0.0
Score for sample 4 = -0.111111111111
Score for sample 5 = 0.333333333333
Score for sample 6 = 0.333333333333
Score for sample 7 = -0.222222222222
Score for sample 8 = -1.33333333333
Score for sample 9 = 0.0
Score for sample 10 = 0.222222222222
Total score = -0.711111111111