**Week:** Dec 7 - Dec 13
**Task:** Identify the best strategy/APIs to find Commons categories that are within a certain radius of the specified GPS coordinates
**Deliverable:** Wiki memo: cURL requests that provide the right categories (30% false positives OK) for all possible use cases and edge cases
I will be using https://github.com/nicolas-raoul/apps-android-commons/wiki/Location-based-category-search to document the results of testing the categories obtained via different APIs/strategies against the benchmark of categories that have been manually entered (by the Commons community or by myself) for each picture.
Pictures are found by:
# Visiting https://commons.wikimedia.org/wiki/Special:Random/File
# Eliminating files that are not photos or could not possibly have been taken with a smartphone
# Keeping only files that have location data available
For each picture, I aim to perform a comparison, for instance:
- Manually: x0 good categories
- WikiData API: x1 good categories, y1 false positives
- Commons API: x2 good categories, y2 false positives
- "Existing pics at that location" strategy: x3 good categories, y3 false positives
**WikiData API**: I am running queries via [[ https://tools.wmflabs.org/wikidata-todo/tabernacle.html?wdq=&pagepile=885&props=373%2C625&items=&show=1 | TABernacle ]], for instance:
```
claim[373] AND around[625,49.27066666666666,14.073769444444444,0.1]
```
Property 373 is the Wikidata property for the Commons category, and property 625 holds the coordinate location. I start with a radius of 0.1 km and increase it if no categories are found.
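Such TABernacle URLs can be generated for arbitrary coordinates. A minimal sketch, assuming the `wdq` URL parameter accepts the same WDQ expression as the example above (`build_tabernacle_url` is a hypothetical helper, not part of any existing tool):

```python
from urllib.parse import quote

def build_tabernacle_url(lat, lon, radius_km):
    """Build a TABernacle URL listing items that have a Commons
    category (P373) within radius_km of the given coordinates (P625)."""
    wdq = "claim[373] AND around[625,{},{},{}]".format(lat, lon, radius_km)
    return ("https://tools.wmflabs.org/wikidata-todo/tabernacle.html"
            "?wdq=" + quote(wdq) + "&props=373%2C625&show=1")

url = build_tabernacle_url(49.27066666666666, 14.073769444444444, 0.1)
print(url)
```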
**Method C: "Search for existing pics at that location" strategy**
This is described in more detail at https://etherpad.wikimedia.org/p/commons-app-android-nearby-categories
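One way to implement this strategy is the MediaWiki `list=geosearch` module on the Commons API, restricted to the File namespace (namespace 6); the categories of the returned files can then be fetched separately with `prop=categories`. A sketch that only builds the request URL (actually issuing it requires network access; parameter choices here are assumptions for illustration):

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"

def geosearch_url(lat, lon, radius_m):
    """URL for files (namespace 6) on Commons within radius_m metres
    of the given coordinates, using the geosearch list module."""
    params = {
        "action": "query",
        "list": "geosearch",
        "gscoord": "{}|{}".format(lat, lon),
        "gsradius": radius_m,  # in metres
        "gsnamespace": 6,      # File: namespace
        "gslimit": 50,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)

print(geosearch_url(49.27066666666666, 14.073769444444444, 100))
```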
**Scoring results for WikiData API and Method C:**
After obtaining the number of good categories and false positives, I ran them through the equation suggested by @Nicolas_Raoul
```
(number of good categories - number of false positives / 3) / number of good categories found by human
```
to obtain the scores for each sample and each method. I then summed up the scores for the 10 samples for each method to obtain its total score. I made a small Python script to automate this process, and the results are as follows:
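The per-sample computation can be sketched as follows (the sample counts below are hypothetical, not the ones from my ten samples):

```python
def score(good, false_positives, human_good):
    """Score one sample: each false positive costs a third of a good
    category; the result is normalised by the human-found count."""
    return (good - false_positives / 3.0) / human_good

# Hypothetical counts: (good, false positives, human-found good)
samples = [(1, 0, 3), (2, 3, 3), (0, 2, 3)]
total = sum(score(g, fp, h) for g, fp, h in samples)
print(total)
```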
**Scoring WikiData results...**
Score for sample 1 = 0.0666666666667
Score for sample 2 = 0.0666666666667
Score for sample 3 = 0.111111111111
Score for sample 4 = -0.111111111111
Score for sample 5 = 0.333333333333
Score for sample 6 = -0.333333333333
Score for sample 7 = 0.0
Score for sample 8 = -0.666666666667
Score for sample 9 = 0.0
Score for sample 10 = 0.111111111111
Total score = -0.422222222222
**Scoring Method C results...**
Score for sample 1 = 0.0
Score for sample 2 = 0.0666666666667
Score for sample 3 = 0.0
Score for sample 4 = -0.111111111111
Score for sample 5 = 0.333333333333
Score for sample 6 = 0.333333333333
Score for sample 7 = -0.222222222222
Score for sample 8 = -1.33333333333
Score for sample 9 = 0.0
Score for sample 10 = 0.222222222222
Total score = -0.711111111111