
Outreachy Application Task: Develop an Image Similarity API
Closed, Resolved, Public

Description

Overview

This task serves as a tutorial with microtasks for the Outreachy Project T291071 (Develop an Image Similarity Tool). Starting from this notebook, try to go through the steps and complete the different TODOs.

The full Outreachy project will involve more comprehensive coding than what is being asked for here (and some opportunities for additional explorations as desired). This task will introduce some of the basic concepts and give us a sense of your Python skills, how well you work with new data, documentation of your code, and description of your thinking and results. We are not expecting perfection -- give it your best shot! See this tutorial for working with Wikipedia edit tag data as an example of good practices for building an easy-to-follow notebook.

Setup

  • Make sure that you can login to the PAWS service with your wiki account: https://paws.wmflabs.org/paws/hub
  • Using this notebook as a starting point, create your own notebook (see these instructions for forking the notebook to start with) and complete the functions / analyses. All PAWS notebooks have the option of generating a public link, which can be shared back so that we can evaluate what you did. Use a mixture of code cells and markdown to document what you find and your thoughts.
  • As you have questions, feel free to add comments to this task (and please don't hesitate to answer other applicants' questions if you can help).
  • If you feel you have completed your notebook, you may request feedback and we will provide high-level feedback on what is good and what is missing. To do so, send an email to isaac@wikimedia.org with the link to your public PAWS notebook. We will try to make time to give this feedback at least once to anyone who would like it.
  • When you feel you are happy with your notebook, you should include the public link in your final Outreachy project application as a recorded contribution. You may record contributions as you go as well to track progress.

Event Timeline

Hi @Miriam, I'm looking to get started with the task. I have created an account and logged into the PAWS service. Is the starter notebook still under development? Please guide me to the next steps. Thanks

Hello @fkaelin, after going through the setup, what are the next steps to take to start contributing?

I updated the description with a link to the notebook. Please let us know if you run into problems.

Hello, I'm not able to log in to MediaWiki using the same username and password, and I don't know what the issue is.


@Abid999: Please bring up support questions in support forums instead. This task is about developing an Image Similarity API only. Thanks for your understanding.

Is it possible to start my work without logging in to MediaWiki? They are not letting me register, saying I'm using proxies.

@Abid999: Again, please see my previous comment. Please ask general support questions in support places instead. Thanks.

Hi, I am Aniket Bharti and have been approved for the contribution period for the December 2021 internship. Very excited to learn new things by solving project problems.

@AniketArs welcome and looking forward to your contribution!

Hi everyone, I'm very interested in working on this project as an Outreachy applicant. I tried to access T291071 but get the following error:

Access Denied: Restricted Task
You do not have permission to view this object.
Users with the "Can View" capability:
Members of a particular project can take this action. (You can not see this object, so the name of this project is restricted.)
The owner of a task can always view and edit it.

Is this normal? The description on Outreachy pointed to that task as a source of resources and tutorials, so I was looking to get a better grip on the tools to be used. Hope someone can help. Thank you.


Hey @AlexGP thanks for the interest and alerting us. That was an oversight and you should be able to see the task now!


Just wanted to note that, per a request, we clarified the feedback route: you may send your notebooks to me (isaac@wikimedia.org) and I will delegate them to the other mentors for feedback. We can only guarantee this preliminary feedback once before you submit your final application, so I recommend waiting until your notebook feels complete before asking for it. In the meantime, if you get stuck or have specific questions about parts of the notebook, feel free to ask them here!

Running the second cell now returns an error related to the tensorflow_hub import in image_similarity_tools.py, and it persists after a kernel restart. Disabling hub and the example classifier in the .py file removed the error. I tried this in two PAWS instances.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_118/1035847415.py in <module>
      3 
      4 import cv2
----> 5 import image_similarity_tools
      6 import os
      7 import random

~/image_similarity_tools.py in <module>
      5 import os
      6 import tensorflow as tf
----> 7 import tensorflow_hub as hub
      8 
      9 '''

...


/srv/paws/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/canned/dnn.py in <module>
     25 from tensorflow.python.framework import ops
     26 from tensorflow.python.util.tf_export import estimator_export
---> 27 from tensorflow_estimator.python.estimator import estimator
     28 from tensorflow_estimator.python.estimator.canned import head as head_lib
     29 from tensorflow_estimator.python.estimator.canned import optimizers

/srv/paws/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py in <module>
     68 
     69 @estimator_export(v1=['estimator.Estimator'])
---> 70 @doc_controls.inheritable_header("""\
     71   Warning: Estimators are not recommended for new code.  Estimators run
     72   `v1.Session`-style code which is more difficult to write correctly, and

AttributeError: module 'tensorflow.tools.docs.doc_controls' has no attribute 'inheritable_header'


Hi @AlexGP, first restart the server, then add ! pip install tensorflow_estimator==2.6.0 to the first code block, like this:

# install ml dependencies
! pip install --upgrade pip
! pip install tensorflow 
! pip install tensorflow_hub
! pip install opencv-python
! pip install tensorflow_estimator==2.6.0  # forcing to install specific version

# download a python file with helper methods for image similarity
! curl -L https://analytics.wikimedia.org/published/datasets/one-off/image_similarity/image_similarity_tools.py -o image_similarity_tools.py

# download data    
! curl -L https://analytics.wikimedia.org/published/datasets/one-off/image_similarity/microtask_data.tar.gz -o microtask_data.tar.gz
! tar -xf microtask_data.tar.gz
! rm microtask_data.tar.gz
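If `curl` or `tar` is unavailable in your environment, the download-and-extract steps can also be done in pure Python. The following is a minimal sketch using only the standard library. The URLs come from the cell above. The helper names `download` and `extract_and_remove` are hypothetical and not part of `image_similarity_tools.py`. The sketch assumes the archive is trusted, because `extractall` does not sanitise member paths on older Python versions.

```python
import os
import tarfile
import urllib.request

# URLs copied from the notebook cell above.
BASE_URL = "https://analytics.wikimedia.org/published/datasets/one-off/image_similarity/"
TOOLS_URL = BASE_URL + "image_similarity_tools.py"
DATA_URL = BASE_URL + "microtask_data.tar.gz"


def download(url: str, dest: str) -> str:
    """Hypothetical helper: save `url` to `dest`, like `curl -L url -o dest`."""
    # urllib's default opener follows HTTP redirects, similar to curl's -L.
    urllib.request.urlretrieve(url, dest)
    return dest


def extract_and_remove(archive_path: str, dest_dir: str = ".") -> list:
    """Hypothetical helper: extract a .tar.gz into dest_dir, then delete it.

    Returns the member names found in the archive. Assumes a trusted archive.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=dest_dir)
    os.remove(archive_path)  # same effect as `rm microtask_data.tar.gz`
    return names


# Usage in a notebook cell (requires network access):
# download(TOOLS_URL, "image_similarity_tools.py")
# extract_and_remove(download(DATA_URL, "microtask_data.tar.gz"))
```

The usage lines are left as comments so that defining the helpers does not trigger a download. Uncomment them in PAWS to fetch the same files as the shell version.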

Thank you @AniketArs, I can now fully use image_similarity_tools.py :)

Thank you for pointing this out @AlexGP, and thank you for providing a fix @AniketArs!

There seems to have been a dependency bump that is causing an issue with tensorflow_hub. I added @AniketArs's workaround to the main notebook, remember to restart the kernel too if you are running into this issue.
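To check whether the same kind of dependency mismatch is present before importing `tensorflow_hub`, you could compare the installed versions. The sketch below reads them with `importlib.metadata` (standard library, Python 3.8+). It is based on an assumption, not on anything confirmed in this thread: compatible `tensorflow` and `tensorflow_estimator` releases are assumed to share the same major.minor version. The thread itself only confirms that pinning `tensorflow_estimator==2.6.0` fixed the error. The helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Hypothetical helper: installed version of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '2.6.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def versions_match(tf_version: str, estimator_version: str) -> bool:
    """ASSUMED rule: tensorflow and tensorflow_estimator share major.minor."""
    return major_minor(tf_version) == major_minor(estimator_version)


tf_v = installed_version("tensorflow")
est_v = installed_version("tensorflow_estimator")
if tf_v and est_v and not versions_match(tf_v, est_v):
    want = "{}.{}.*".format(*major_minor(tf_v))
    print(f"Mismatch: tensorflow {tf_v}, tensorflow_estimator {est_v}. "
          f"Try: pip install 'tensorflow_estimator=={want}', then restart the kernel.")
```

The check prints nothing when either package is missing or when the versions agree under the assumed rule. After any reinstall, restart the kernel as noted above.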

@fkaelin @Isaac, I need help with the Outreachy internship project timeline. The application form shows:

Please work with your mentor to provide a timeline of the work you plan to accomplish on the project and what tasks you will finish at each step. Make sure to take into account any time commitments you have during the Outreachy internship round. If you are still working on your contributions and need more time, you can leave this blank and edit your application later.


Hey @AniketArs, thanks for the question; I'm sure others will have it too. We don't put much weight on the timeline portion of the application because you're really just becoming familiar with the project, so don't spend too much time on it. There will also be an opportunity to rework it as you learn more about the project. It is most useful for showing what you are interested in and where you'd want additional time to learn more. For example, maybe you have some machine learning background and want to spend time exploring different approaches to embeddings, but have never built an API and would want an additional week to learn about that (don't worry, that's what we mentors are here for). So I'd take the description of the project steps from the project task (T291071) and convert it into a timeline based on how long you think each step will take. And don't forget to set some time aside at the beginning for getting settled. Hope that helps!

srishakatux subscribed.

This was a microtask for Outreachy Round 23. Closing it as the round is long over.