**Week:** Dec 7 - Dec 13
**Task:** Identify the best strategy/APIs to find Commons categories that are within a certain radius of the specified GPS coordinates
**Deliverable:** Wiki memo: cURL requests that provide the right categories (30% false positives OK) for all possible use cases and edge cases
I will be using https://github.com/nicolas-raoul/apps-android-commons/wiki/Location-based-category-search to document the test results. For each picture, the categories obtained via the different APIs/strategies are compared against a benchmark of categories that have been entered manually (by the Commons community or by myself).
Pictures are found by:
# Visiting https://commons.wikimedia.org/wiki/Special:Random/File
# Eliminating files that are not photos or could not possibly be obtained via a smartphone
# Keeping only files that have location data available
For each picture, I aim to perform a comparison, for instance:
- Manually: x0 good categories
- WikiData API: x1 good categories, y1 false positives
- Commons API: x2 good categories, y2 false positives
- "Existing pics at that location" strategy: x3 good categories, y3 false positives
**WikiData API**: I am running queries via [[ https://tools.wmflabs.org/wikidata-todo/tabernacle.html?wdq=&pagepile=885&props=373%2C625&items=&show=1 | TABernacle ]], for instance:
```
claim[373] AND around[625,49.27066666666666,14.073769444444444,0.1]
```
Property 373 is the Commons category and property 625 is the coordinate location. I start with a radius of 0.1 km and increase it if no categories are found.
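The "start small and widen" procedure can be sketched in Python as below. This is only an illustration, not the tooling actually used: `run_query` is a hypothetical callable standing in for whatever executes the query and returns category names, and the doubling factor and 10 km cap are assumptions, since the text does not specify how fast the radius grows or when to give up.

```python
from typing import Callable, List

COMMONS_CATEGORY = 373      # Wikidata property P373 (Commons category)
COORDINATE_LOCATION = 625   # Wikidata property P625 (coordinate location)


def build_query(lat: float, lon: float, radius_km: float) -> str:
    """Build a query string in the same shape as the example above."""
    return (f"claim[{COMMONS_CATEGORY}] AND "
            f"around[{COORDINATE_LOCATION},{lat},{lon},{radius_km}]")


def search_with_growing_radius(
    lat: float,
    lon: float,
    run_query: Callable[[str], List[str]],  # hypothetical: executes a query
    start_km: float = 0.1,                  # starting radius from the text
    factor: float = 2.0,                    # assumption: double each round
    max_km: float = 10.0,                   # assumption: stop widening here
) -> List[str]:
    """Widen the radius until the query returns at least one category."""
    radius = start_km
    while radius <= max_km:
        categories = run_query(build_query(lat, lon, radius))
        if categories:
            return categories
        radius *= factor
    return []  # nothing found within max_km
```

Passing the query runner in as a parameter keeps the widening logic testable without network access.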
**Method C: "Search for existing pics at that location" strategy**
This is described in more detail at https://etherpad.wikimedia.org/p/commons-app-android-nearby-categories
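The etherpad remains the authoritative description. As a rough illustration of the general idea only, the sketch below assumes the lookup is done with the MediaWiki GeoData `geosearch` generator on Commons combined with `prop=categories`; the helper names, the 100 m default radius and the decision to skip hidden categories are my assumptions, not details taken from the etherpad.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def nearby_files_url(lat, lon, radius_m=100, limit=50):
    """Build a request URL for files near (lat, lon) plus their categories."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "geosearch",
        "ggscoord": f"{lat}|{lon}",
        "ggsradius": radius_m,   # metres
        "ggsnamespace": 6,       # File: namespace
        "ggslimit": limit,
        "prop": "categories",
        "clshow": "!hidden",     # assumption: ignore maintenance categories
        "cllimit": "max",
    }
    return COMMONS_API + "?" + urlencode(params)


def unique_categories(response: dict) -> list:
    """Collect the distinct category titles from a decoded JSON response."""
    pages = response.get("query", {}).get("pages", {})
    found = set()
    for page in pages.values():
        for cat in page.get("categories", []):
            found.add(cat["title"])
    return sorted(found)
```

The resulting URL can be passed straight to cURL, and `unique_categories` can be applied to the decoded JSON body to get the candidate list to compare against the manual categories.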
**Method D** - Same as Method C, except that we increase the radius until at least 5 unique categories are found. Results are on the GitHub wiki.
**Scoring results for WikiData API and Method C:**
After obtaining the number of good categories and false positives, I ran them through the equation suggested by @Nicolas_Raoul
```
(number of good categories - number of false positive / 3) / number of good categories found by human
```
to obtain the scores for each sample and each method. I then summed up the scores for the 10 samples for each method to obtain its total score. I made a small [[ https://github.com/misaochan/rating-calculator/blob/master/calculator.py | Python script ]] to automate this process, and the results are as follows:
**WikiData results:**
Sample 1 : 2 good categories, 1 false positives, 5 manual categories.
Score = 0.0666666666667
Sample 2 : 1 good categories, 0 false positives, 5 manual categories.
Score = 0.0666666666667
Sample 3 : 1 good categories, 0 false positives, 3 manual categories.
Score = 0.111111111111
Sample 4 : 0 good categories, 1 false positives, 3 manual categories.
Score = -0.111111111111
Sample 5 : 1 good categories, 0 false positives, 1 manual categories.
Score = 0.333333333333
Sample 6 : 0 good categories, 1 false positives, 1 manual categories.
Score = -0.333333333333
Sample 7 : 0 good categories, 0 false positives, 3 manual categories.
Score = 0.0
Sample 8 : 1 good categories, 3 false positives, 1 manual categories.
Score = -0.666666666667
Sample 9 : 0 good categories, 0 false positives, 2 manual categories.
Score = 0.0
Sample 10 : 1 good categories, 0 false positives, 3 manual categories.
Score = 0.111111111111
Total score = -0.422222222222
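As a sanity check on the figures above, the short sketch below recomputes the WikiData scores. It is not the linked script, only a reconstruction from the numbers listed here. Note that the listed scores match the grouping `((good - false positives) / 3) / manual`. Reading the formula with ordinary operator precedence, `(good - false positives / 3) / manual`, would give 0.333 for Sample 1 rather than 0.0667.

```python
def score(good, false_positives, manual):
    # Grouping that reproduces the scores listed above.
    return ((good - false_positives) / 3) / manual


# (good, false positives, manual) for the ten WikiData samples above
WIKIDATA_SAMPLES = [
    (2, 1, 5), (1, 0, 5), (1, 0, 3), (0, 1, 3), (1, 0, 1),
    (0, 1, 1), (0, 0, 3), (1, 3, 1), (0, 0, 2), (1, 0, 3),
]

scores = [score(*sample) for sample in WIKIDATA_SAMPLES]
total = sum(scores)

for number, value in enumerate(scores, start=1):
    print(f"Sample {number}: {value:.4f}")
print(f"Total score = {total:.4f}")
```

The total comes out at about -0.4222, in line with the figure reported above.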
**Method C results:**
Sample 1 : 3 good categories, 3 false positives, 5 manual categories.
Score = 0.0
Sample 2 : 1 good categories, 0 false positives, 5 manual categories.
Score = 0.0666666666667
Sample 3 : 1 good categories, 1 false positives, 3 manual categories.
Score = 0.0
Sample 4 : 2 good categories, 3 false positives, 3 manual categories.
Score = -0.111111111111
Sample 5 : 1 good categories, 0 false positives, 1 manual categories.
Score = 0.333333333333
Sample 6 : 1 good categories, 0 false positives, 1 manual categories.
Score = 0.333333333333
Sample 7 : 0 good categories, 2 false positives, 3 manual categories.
Score = -0.222222222222
Sample 8 : 1 good categories, 5 false positives, 1 manual categories.
Score = -1.33333333333
Sample 9 : 1 good categories, 1 false positives, 2 manual categories.
Score = 0.0
Sample 10 : 2 good categories, 0 false positives, 3 manual categories.
Score = 0.222222222222
Total score = -0.711111111111
**Conclusion:** We will go with Method D.