Outreachy Application Task (Round 25): Explore Toolhub Data
Closed, Resolved. Public.

Description

This task is one of two application microtasks for T317083.

Overview

For this task, you're being asked to complete a Jupyter notebook that will help you familiarize yourself with the project. The notebook explores the Toolhub API and the data relevant to building the backend of the web application. https://public.paws.wmcloud.org/User:SStefanova_(WMF)/Outreachy_25_Toolhub_API_microtask.ipynb

Using your knowledge of Python, do your best to complete the notebook, and provide plenty of details to explain what you are doing and seeing in the data. There are also some non-coding questions asking you to reflect on how to structure and design the web application.

The full project will involve more comprehensive coding than what is being asked for here, with support from your mentors (and some opportunities for additional explorations as desired). This task will introduce some of the basic concepts and give us a sense of your Python skills, how well you work with new data, documentation of your code, and description of your thinking and results. We are not expecting perfection -- just do your best and explain what you're doing and why!

Set-up

  • Make sure that you can log in to the PAWS service with your wiki account: https://paws.wmflabs.org/paws/hub
  • Using this notebook as a starting point, create your own notebook (see these instructions for forking the notebook to start with) and complete the functions/analyses. All PAWS notebooks have the option of generating a public link, which can be shared back so that we can evaluate what you did. Use a mixture of code cells and markdown to document what you find and your thoughts.
  • As you have questions, feel free to add comments to this task (and please don't hesitate to answer other applicants' questions if you can help).
  • If you feel you have completed your notebook, you may request feedback, and we will provide high-level feedback on what is good and what is missing. To do so, email both of the mentors (sstefanova@wikimedia.org and dadedoyin@wikimedia.org) with the link to your public PAWS notebook. We will try to make time to provide this feedback once to anyone who would like it.
  • When you are happy with your notebook, you should include the public link in your final Outreachy project application as a recorded contribution. We encourage you to record contributions as you go as well, to track progress.

Event Timeline

Hi @Slst2020 and everyone,

Also, can I send you an email to get a mentor's view/feedback on some parts I have worked on, @Slst2020?

As I mentioned in the Zulip stream for this project

Woah! It's more than enough. Thank you @Slst2020

Some of you have submitted what you have so far to get feedback on whether your work is going in the right direction or not. Please keep in mind that due to the high number of applicants, mentors can only provide feedback once per microtask during the contribution period.

Those who have asked for feedback on in-progress work on the backend API task have received some general advice that I'm now sharing with everybody here:

  • Make sure to include tests for all your functions
  • If you have imported a specific library once somewhere in the notebook, it's available to the whole notebook and doesn't need to be imported again
  • It's recommended to keep your imports at the top-level, and not inside functions or between code blocks
  • It's also a good idea to keep imports in separate code cells so that you don't reimport packages over and over again when rerunning other code.
  • It's good practice to give your functions and variables meaningful names, to improve code readability
  • Similarly, it's important to document your code, using docstrings and comments where needed

You could use this as a checklist right now to self-evaluate your code; once you consider it finished, we will give you detailed and personalized feedback on your notebook.

Hello everyone

I have an urgent question. While going through the backend task, I noticed that the primary focus of the retrieved data is in the annotations dictionary. My question is: when looking for missing information, do I check both the core and annotations layers, or annotations only?

Thank you.

Here @nnekasandra

Hi @Slst2020 and everyone,

Just for further clarification, can I consider data in core info but not in annotations as missing information? For example, subtitle = Null

@Olalekan
"...The conclusion is that we will consider a piece of information missing only if it is absent from both the Core and Annotations layers"
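
The rule quoted above can be sketched in a few lines. This is only an illustration, not the notebook's reference solution: it assumes a tool record is a dict whose core fields sit at the top level and whose annotations are nested under an "annotations" key, and the helper names and sample field names are hypothetical.

```python
def _is_empty(value):
    """Treat None, "", [] and {} as "no value"; 0 and False count as real values."""
    return value is None or value == "" or value == [] or value == {}


def find_missing_fields(tool, fields):
    """Return the names in `fields` that are empty in BOTH layers of `tool`."""
    # Assumption: annotations are nested under the "annotations" key.
    annotations = tool.get("annotations") or {}
    missing = []
    for field in fields:
        # A field only counts as missing if neither layer provides a value.
        if _is_empty(tool.get(field)) and _is_empty(annotations.get(field)):
            missing.append(field)
    return missing


# Made-up sample record for illustration only.
sample_tool = {
    "name": "demo-tool",
    "subtitle": None,
    "keywords": [],
    "annotations": {"subtitle": "A demo", "keywords": [], "wikidata_qid": None},
}
print(find_missing_fields(sample_tool, ["subtitle", "keywords", "wikidata_qid"]))
```

Here "subtitle" is not reported because the annotations layer supplies it, even though the core value is null.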

Alright thank you

Again: Please strip unneeded full quotes to keep things readable - thanks.

Hello, I'm still having an issue with forking. Can anyone assist me? Converting the format to raw gives me what looks like JSON. Do I even need to fork at all, or can I start from my own clean notebook?

Open this link and press Ctrl+S to save it as a file:
https://public.paws.wmcloud.org/User:SStefanova_(WMF)/Outreachy_25_Toolhub_API_microtask.ipynb?format=raw

Then upload the saved file into your own JupyterLab.

This is helpful

Hi @Slst2020 and everyone,

In task 3, we have to use TOOLS_API_ENDPOINT = "https://toolhub.wikimedia.org/api/search/tools". Can you please clarify which parameter we can use to retrieve the data of a single tool? I tried the "name_term" parameter, but it returns an empty results array. I also tried the "q" parameter, but it returns many tools whose names only partially match the tool name.
For example, API_ENDPOINT/?q=toolforge-graphql returns 1573 tools.

Please clarify whether there is a way to retrieve a single tool, or whether I can move ahead with the "q" parameter.

While working with this API, I observed that most of the query parameters return more than one result. This is because it's a search feature: it returns not only the exact tool you need but also any other tool that contains the same string.
If you search for pywikibot, for example, you'd get other tools that have pywikibot in their name string, depending on which parameter you are using to query the database.

What I did was loop over the search results and pick the tool whose name == the tool I need.

i hope this helps
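
The loop-and-compare approach described above might look like the sketch below. It assumes the search response carries its hits as a list of dicts with a "name" key (for example under data["results"]); the function name is hypothetical.

```python
def pick_exact_match(results, tool_name):
    """Return the first result whose name equals `tool_name`, or None if absent."""
    for tool in results:
        # A search returns partial matches too, so compare names exactly.
        if tool.get("name") == tool_name:
            return tool
    return None


# Made-up search hits for illustration only.
hits = [
    {"name": "pywikibot-extras"},
    {"name": "pywikibot"},
    {"name": "my-pywikibot-scripts"},
]
print(pick_exact_match(hits, "pywikibot"))
```

Returning None when nothing matches makes the "tool not found" case explicit, rather than silently relying on the first search hit.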

Yeah Thanks!! @Durotimi-Hector

@Amisha27, @Durotimi-Hector.
Alternatively, you can use the endpoint:
https://toolhub.wikimedia.org/api/search/tools/?q=name:pywikibot. This will return only the results that have the query parameter value (pywikibot) in the name field. I found that the backend was using elasticsearch, which makes this possible.

Next, select the top item from the list of results, that is, data["results"][0].
This will always be the tool that matches the query parameter value.

I would really appreciate your feedback on this approach.

I also don't understand why the tool that matches the string is not returned by the query parameter name__term. I would expect it to be.

@Slst2020, I have also tried searching with name__startswith in combination with name__endswith, which I expected to return the single tool that matches both, but instead I am getting an empty results list.

Yes @tobianointing, I have also used similar approach.

@Slst2020 Hello, I have a couple of questions.

  1. Is it allowed for me to write other functions that are of use to me in the notebook or only the functions we are asked to write are allowed?
  2. Can all my answers to the non-coding questions be written as comments in the notebook?

  1. Is it allowed for me to write other functions that are of use to me in the notebook or only the functions we are asked to write are allowed?

You can write any code you want/need :)

  2. Can all my answers to the non-coding questions be written as comments in the notebook?

I suggest you write text in markdown cells, not as code comments.

Alright thank you so much.

Hello Everyone

I have a question relating to task 2 which says we should return a tool as a dictionary with the tool name as the key and the values as tuples. The task made the following comment as the values:
{<tool name>: (<number of missing fields>, <days since the tool was last edited>)}.
Does this mean we'll return only "number of missing fields" and "days since the tool was last edited" as the values in the tuple or are there more fields to return?

Thank you!

@nnekasandra
Returning this:
{<tool name>: (<number of missing fields>, <days since the tool was last edited>)}
is the task.

But as suggested in the notebook here:

Feel free to use other fields and/or create additional visualizations!

we are invited to use other fields for additional visualizations.

Oh thank you so much

Okay which endpoint exactly are we querying? Is it /api/search/lists or /api/search/tools/ or /api/tools/{toolname}?
Thank you

@nnekasandra

At that point, it's suggested that we use another of the available endpoints to request all of the tools. If you check the API docs, you'll see that making a GET request to /api/tools (without using a specific toolname) will return such a list.

We're asked to use the api/search/tools endpoint in Task 3.

Okay, I understand now. Thank you @NicoleLBee

Hello

In the <days since the tool was last edited> from task 2, are we meant to convert the date-time data type from the modified_date dictionary key to days or leave it as it was?

Since we're being asked to return a number of days, my interpretation is that we should do whatever we need to do with the value of "modified_date" in order to get the result we're after. (I made use of the datetime module, for instance.) I hope that helps!
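
As a rough sketch of the datetime approach mentioned above: the snippet assumes that "modified_date" is an ISO 8601 timestamp string (possibly ending in "Z" for UTC). The function name and the optional `now` parameter are hypothetical and only there to make the calculation easy to check.

```python
from datetime import datetime, timezone


def days_since(timestamp, now=None):
    """Return the number of whole days between an ISO 8601 timestamp and `now`."""
    # Python versions before 3.11 do not accept a trailing "Z" in fromisoformat().
    edited = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if edited.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        edited = edited.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - edited).days


reference = datetime(2022, 10, 11, 12, 0, tzinfo=timezone.utc)
print(days_since("2022-10-01T12:00:00Z", now=reference))
```

Passing a fixed `now` keeps the result reproducible in tests; in the notebook you would normally omit it and let the function use the current time.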

Yes, it helped. Thank you, dear.

I also had the same interpretation and did the same!

Do we have to use brute-force method in task-2 for accessing each page by getting the 'next' url of the page from '/api/search/tools' endpoint?
I am very confused about it :)
@Slst2020

I'm not sure what you mean by brute-force here. Have you tried modifying the page_size parameter?

You can focus on getting the tools on the page first and perform the required operation on those tools. When you change the request URL e.g https://toolhub.wikimedia.org/api/tools/?page=8&page_size=20, it automatically fetches tools from this particular page and returns the desired result.

Hello all, important reminder to keep experimentation to the Toolhub demo server, and NOT the live page. Thank you!

I read in an earlier post that the maximum page_size is 1000. You can loop over the pages, with number_of_pages = ceil(count_of_tools / 1000) (rounding up, so that the last, partially filled page is not skipped), and make a call for each page to gather the information into the dict. It may take some time to run since the count is ~2703.
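
The page arithmetic is easy to get wrong by one, so here is a small sketch of it. It assumes each page of the response is a dict with "count" and "results" keys; `fetch_page` is a hypothetical callable that you would implement in the notebook around your HTTP request, and it is stubbed out here so the example runs without network access.

```python
import math


def page_count(total_items, page_size):
    """Number of pages needed to hold `total_items`, rounding up."""
    return math.ceil(total_items / page_size)


def collect_all(fetch_page, page_size=1000):
    """Gather the results of every page using `fetch_page(page, page_size)`."""
    first = fetch_page(1, page_size)
    items = list(first["results"])
    # Pages are numbered from 1; the first page was already fetched above.
    for page in range(2, page_count(first["count"], page_size) + 1):
        items.extend(fetch_page(page, page_size)["results"])
    return items


def fake_fetch_page(page, page_size):
    """Stand-in for a real API call: serves slices of 2703 fake items."""
    data = list(range(2703))
    start = (page - 1) * page_size
    return {"count": len(data), "results": data[start:start + page_size]}


print(page_count(2703, 1000))
print(len(collect_all(fake_fetch_page)))
```

With 2703 tools and a page size of 1000, floor division (2703 // 1000) gives 2 and would drop the last 703 tools, which is why the sketch rounds up.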

Hello everyone

Please, I'm a bit confused about task 3. It's saying something about getting tools through facets. Does it mean I should loop through the facets and return all missing fields in the facets? Please help enlighten me about this task. Thank you!

Hi @nnekasandra. What you're being asked to do in Task 3 is to write a function that takes a tool name and returns a list containing the missing fields, as in Task 1. The key difference is that your function should call the /api/search/tools endpoint rather than /api/tools.

The part about "facets" refers to the fact that if you check out the API documentation for the /search/tools endpoint, you'll see that it's possible to add a query string that will limit your search to tools that don't have a value for the field(s) that your query string references. For instance, making a GET request to https://toolhub.wikimedia.org/api/search/tools/?keywords__isnull=true would return all of the tools that have empty keyword lists.

You don't have to use these strings for Task 3; you're just encouraged to play around with them and to try to get a feeling for how the /search/tools endpoint works and how it differs from the other one.
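
For experimenting with such query strings, a small helper that builds the URL can save some typing. This is a sketch only: the helper name is hypothetical, and the only parameters used are ones that already appear in this thread (keywords__isnull, q, page_size).

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://toolhub.wikimedia.org/api/search/tools/"


def build_search_url(**filters):
    """Return the search URL with `filters` encoded as a query string."""
    if not filters:
        return SEARCH_ENDPOINT
    # urlencode takes care of escaping characters such as ":" or spaces.
    return f"{SEARCH_ENDPOINT}?{urlencode(filters)}"


print(build_search_url(keywords__isnull="true"))
print(build_search_url(q="name:pywikibot", page_size=5))
```

Building the query string with urlencode avoids hand-concatenating "?" and "&" and keeps special characters properly escaped.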

Does that help at all?

Thank you @NicoleLBee
I now understand the task about getting all missing fields through the /api/search/tools endpoint. Does it mean I don't need to use the facet field while working on the endpoint?

My pleasure, @nnekasandra

I now understand the task about getting all missing fields through the /api/search/tools endpoint. Does it mean I don't need to use the facet field while working on the endpoint?

That's correct: you don't need to use it to complete the task. The suggestion that you explore the available options and try to familiarize yourself with how the API works is purely for your own benefit.

Thanks a lot.

I'm getting JSON decode errors when I try to convert the response from the search query endpoint.
Below is my code block. What am I doing wrong?

TOOLS_API_ENDPOINT = "https://toolhub.wikimedia.org/search?q="
QUERY_PARAMS = "&ordering=-score&page=1&page_size=12"
def getMissingTool(tool):
    url = f'{TOOLS_API_ENDPOINT}{tool}{QUERY_PARAMS}'
    print(url)
    response = requests.get(url, headers=headers)
    print(response.status_code)
    if( response.status_code == 200):
        response_body = json.dumps(response.json(), indent=2)
        core_dict = json.loads(response_body)
        print(core_dict)
        
        
    else:
        print(response.status_code)
print(getMissingTool('youtube'))

The error comes from response_body: JSONDecodeError: Expecting value: line 4 column 1 (char 3)

@nnekasandra You are not using the tool search API endpoint. Your code is calling the route of the page that displays the search results, so the response you receive is an HTML page, not JSON.
https://toolhub.wikimedia.org/api/search/tools/ is the correct endpoint.

Here is your code, refactored:

import requests

TOOLS_API_ENDPOINT = "https://toolhub.wikimedia.org/api/search/tools/?q="
QUERY_PARAMS = "&ordering=-score&page=1&page_size=12"

def getMissingTool(tool):
    # Build the URL for the JSON search API (not the HTML search page).
    url = f'{TOOLS_API_ENDPOINT}{tool}{QUERY_PARAMS}'
    # `headers` is assumed to be defined earlier in the notebook.
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        # response.json() already returns a dict; no json.dumps/json.loads round trip is needed.
        return response.json()
    # On failure, return the status code so the caller can inspect it.
    return response.status_code

print(getMissingTool('youtube'))

Oh wow, 🤦‍♀️ thank you Tobi.

Hi everyone,
I'm a little confused and would appreciate it if you could answer my questions:

  1. Will I use my local PC and the editor of my choice to develop the web application, and then push my project changes to the git repository?
  2. I didn't understand the notebook and how to work on it. What is the role of the notebook, and what is the task about?

I look forward to hearing from you,
Marian,
Thanks.

Hi again, @Marian2023

For the frontend task, yes, you'll build the website on your local PC and push it to a repository on GitHub.

For the backend task, you need to complete a series of assignments that are written in the notebook: mostly writing functions and answering questions. If you haven't already, I'd suggest clicking on the link to the notebook that's given above, in the Set-up section, and reading through it a couple of times.

Then you can follow the instructions above to sign into the PAWS network, copy the notebook, and start filling it in.

I hope that helps!

Hello @Slst2020. I got a couple of questions as regards task 2

  1. Am I allowed to upload an image of my plot to this platform for opinion?
  2. Will I have to visualize all 2000+ tools data dictionary returned by my task 2 function?

Hi @Daniel_Ngene,

Regarding your first question: if you have run the cell containing your plotting code, the plot image will appear in the public PAWS notebook that you share for feedback, so there is no need to upload an image.

Hello @Slst2020. I got a couple of questions as regards task 2

  1. Will I have to visualize all 2000+ tools data dictionary returned by my task 2 function?

If you mean the first #TODO of Task set 2, you can limit the output to the first 5 or so entries. This is good practice for any code cell output, as otherwise the notebook can become very cluttered and difficult to scroll through.
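
One simple way to limit a cell's output to the first few entries of a large dictionary is sketched below. The function name is hypothetical, and the sample data is made up in the {<tool name>: (<number of missing fields>, <days since the tool was last edited>)} shape used in the task.

```python
from itertools import islice


def preview(mapping, limit=5):
    """Return a new dict holding only the first `limit` entries of `mapping`."""
    # Dicts preserve insertion order, so this keeps the first entries added.
    return dict(islice(mapping.items(), limit))


# Made-up data for illustration only.
all_tools = {f"tool-{i}": (i % 4, i * 10) for i in range(2000)}
print(preview(all_tools))
```

The full dictionary stays available for plotting; only what gets printed in the cell is shortened.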

Can anyone tell me how to delete a tool that I created on the main server?

The 'Add or Remove Tools' tab does not work for removing the tool.

@Slst2020

Yes – the UX for this is not great unfortunately, see T308529 – "Add or Remove Tools" has no UI for actually removing a tool. Try doing it through the delete endpoint and let me know how it goes: https://toolhub.wikimedia.org/api-docs#delete-/api/tools/-name-/

Hello all! When sharing the link to your notebook for feedback, or recording it as a contribution on Outreachy, please make sure to create a public link before doing so. This will ensure that we can open and review your work.

  • Click on the "PAWS public link" button in the top-right corner of the notebook UI. A new tab with the public version of the notebook will open
  • Copy the URL of this new tab – this is the public link.

Yes it works for deletion. Thanks.

Hi, Everyone!
Hope all of you are doing well.
Please explain how I can test the output of my function for the API.
For example: I call the API to get a response, then apply a function to filter values from the response. How can I test that the values filtered by the function are the values I want?

I have written some initial tests, like the following.

import requests

# Get the response from the server and check it with assertions.
# `headers` is assumed to be defined earlier in the notebook.
def make_request_3(url, toolname):
    response = requests.get(url, headers=headers)
    # Use a plain string as the assertion message; print() returns None.
    assert response.status_code == 200, 'Unexpected status code!'
    assert response.headers["Content-Type"] == "application/json", 'Unexpected Content-Type header!'
    response_body = response.json()
    assert response_body['results'][0]['name'] == toolname, 'First result is not the requested tool!'
    return response_body['results'][0]

@Slst2020 or anyone.

Thanks.

Many candidates are struggling with unit testing, you are not alone @Mehwish540.

@RoySmith If you have the time, would you mind chiming in here with some advice on unit testing and mocking API responses, for folks with little to no prior experience?
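
One common way to unit-test code like this without hitting the real API is to pass the HTTP client in as an argument and replace it with a mock in the test. The sketch below uses unittest.mock from the standard library; `fetch_tool`, the example URL, and the response shape ("results" list of dicts with a "name" key) are assumptions for illustration. In the notebook you could pass the `requests` module itself as `http`.

```python
from unittest.mock import Mock


def fetch_tool(http, url, toolname):
    """Search for `toolname` via `http.get` and return the exact match, or None."""
    response = http.get(url, params={"q": toolname})
    response.raise_for_status()
    for tool in response.json()["results"]:
        if tool["name"] == toolname:
            return tool
    return None


def test_fetch_tool_returns_exact_match():
    # Fake response object: .json() returns canned data instead of real API output.
    fake_response = Mock()
    fake_response.json.return_value = {
        "results": [{"name": "pywikibot-extras"}, {"name": "pywikibot"}]
    }
    # Fake HTTP client: .get() returns the fake response without any network call.
    fake_http = Mock()
    fake_http.get.return_value = fake_response

    tool = fetch_tool(fake_http, "https://example.org/api/search/tools/", "pywikibot")

    assert tool == {"name": "pywikibot"}
    fake_http.get.assert_called_once_with(
        "https://example.org/api/search/tools/", params={"q": "pywikibot"}
    )


test_fetch_tool_returns_exact_match()
```

Because the canned response is under your control, the test checks your filtering logic rather than whatever the live server happens to return that day.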

Hi guys... would anyone be kind enough to share how they went about task 2? Which endpoint did you use? I can see two endpoints indicated in the task brief.

Hi Salwoch, for Task 2, I believe we should be using endpoint: /api/tools/{toolname}/ and utilizing the missing fields function you created in task 1 to complete it.
I hope this helps!

Hi Ros, in the notebook i see two endpoints https://toolhub.wikimedia.org/search?ordering=-score&page=1&page_size=12 and /api/search/tools and I'm confused on which one we're supposed to use

Hi @Salwoch . I believe the expectation for task 2 is that you use the /api/tools endpoint. If you look at the API documentation, you'll see that by not specifying a tool name when using that endpoint, you can receive a list of all the tools. For task 3 it's specified that you're to use the /api/search/tools endpoint
Hope that helps!

Thanks Nicole. This helps.

This is true

Hey... for task 2, are we returning all 2k+ tools and visualizing all of them, or are we working with the tools returned from a particular page? How are we supposed to go about this?

Hello @Slst2020. I have a couple of questions regarding task 2.

  1. Will I have to visualize the data dictionary for all 2000+ tools returned by my task 2 function?

If you mean the first #TODO of Task set 2, you can limit the output to the first 5 or so entries. This is good practice for any code cell output, as otherwise the notebook can become very cluttered and difficult to scroll through.

This could help you.
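
Limiting a cell's output to the first few entries, as suggested above, might look like this. It is only a sketch; `preview` is a made-up helper name and the sample dictionary is placeholder data.

```python
from itertools import islice


def preview(mapping, n=5):
    """Return a new dict holding only the first n items of mapping."""
    return dict(islice(mapping.items(), n))


# Placeholder data standing in for the task 2 result.
sample = {f"tool-{i}": ("field_a", "field_b") for i in range(2000)}
print(preview(sample))  # shows 5 entries instead of 2000
```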

I have been having issues accessing my jupyter notebook. Is anyone able to log in?

Yes, I'm currently logged in.

Yes, I am also logged in. The only thing I have noticed is that it takes a long time to load: it shows a blank page for so long that you might assume it is not working.

For anyone having issues logging in to PAWS, the service has just been restarted and should be working fine now. If not, please ping the cloud services team on IRC, as I'm off today and will mostly be unavailable. https://wikitech.wikimedia.org/wiki/Help:IRC

Okay. For those still experiencing this problem after the service was restored, here are some steps to fix it:

Try logging in with another browser, just to be certain an extension isn't interfering with the service.
Also, check your browser's security settings. You can temporarily turn off enhanced protection or even safe browsing.
This should take care of the problem. Clear the workspace if prompted to, and log in to the notebook again.

Hi. For task 2, how can I obtain other tool names? I can only seem to get pywikibot.

Hi @DAseneca, without seeing your code it's hard to say but the first thing that springs to my mind is that you're using the variable toolname, which was set to equal pywikibot in the first code block in the notebook. You'll need to give it a new value, or just use the name of the tool you're searching for as a string in the fetch request url.

Hi @DAseneca,
To further @NicoleLBee's point, the example that was given to us was:

TOOLS_API_ENDPOINT = "https://toolhub.wikimedia.org/api/tools"
toolname = 'pywikibot'  # name of tool we want info about 
url = f'{TOOLS_API_ENDPOINT}/{toolname}/'

You should be able to set toolname to whichever tool you are searching for.
For example, you can switch toolname = 'pywikibot' to toolname = 'wikifile-transfer',
or alternatively set url = 'https://toolhub.wikimedia.org/api/tools/wikifile-transfer/' directly.
There are many ways to make an API call or change your variables, depending on the use case and the design you'd like to use!

I would also recommend looking into the API documentation here for a better understanding:
https://toolhub.wikimedia.org/api-docs#get-/api/tools/

Hope this helps!
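
As a small illustration of the point above, the URL can be built by a function so that the tool name is a parameter rather than a fixed variable. This is just one possible sketch; `tool_url` is a made-up name, and percent-encoding the name with `quote` is my own precaution, not something the notebook asks for.

```python
from urllib.parse import quote

TOOLS_API_ENDPOINT = "https://toolhub.wikimedia.org/api/tools"


def tool_url(toolname):
    """Build the detail URL for a single tool."""
    # quote() percent-encodes characters that are not safe in a URL path.
    return f"{TOOLS_API_ENDPOINT}/{quote(toolname, safe='')}/"


for name in ("pywikibot", "wikifile-transfer"):
    print(tool_url(name))
```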

Hi @Slst2020 @Damilare @Daniel_Ngene, could you give me some direction on how I can make my contribution for the Outreachy internship?

Where are you encountering challenges?
There are two microtasks that you are supposed to attempt: a frontend task and a backend task. However, it is sufficient to attempt one. After you are satisfied with your work and have implemented any high-level feedback you might receive, go ahead and record a contribution.

@NicoleLBee @Rossrosales What do you suggest when running the data visualization in task 2? Do I have to run the function on five or more different tool names? How can the fields that are frequently missing be identified?

@DAseneca I think it is important to read the whole thread before asking a question. Many common questions about the project are already answered.
Do I have to run the function on five or more different tool names?
You should be able to retrieve all of the tools for data visualization. The tool count is small enough for analysis.

How can the fields that are frequently missing be identified?
There is no one correct way, but one possible way is to write a function that counts, for each field, how many tools are missing it, and stores the counts in a dictionary.
For example, {name_missing_field : count}
In this example, the field with the highest count would be the most frequently missing field.
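
A rough sketch of that counting idea, assuming the task 2 result is a dict mapping each tool name to a tuple of its missing field names. The function name and the sample tool and field names are placeholders, not real output.

```python
from collections import Counter


def count_missing_fields(missing_by_tool):
    """Tally how many tools are missing each field.

    missing_by_tool maps tool name -> iterable of missing field names.
    """
    counts = Counter()
    for fields in missing_by_tool.values():
        counts.update(fields)
    return counts


# Placeholder data for illustration only.
example = {
    "tool-a": ("icon", "wikidata_qid"),
    "tool-b": ("icon",),
    "tool-c": (),
}
counts = count_missing_fields(example)
print(counts.most_common(1))
```

`Counter.most_common()` returns the fields sorted from most to least frequently missing, which is also a convenient order for plotting.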

Below are some replies that could be helpful to find your answer.

For task 2, we are asked to "Visualize this data in a meaningful way." @Slst2020 @Damilare, could you please give some clue on how I can visualize the data (a dict where the keys are tool names, and the values are tuples)?

You are free to transform the data in any way you need for your visualization. You could start by thinking about what you would like your visualization to look like, then figure out what your data needs to look like. Likewise, you can also extract additional data, if you think that would help you.

I hope this helps!
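
To make the "decide what the plot needs, then reshape the data" advice concrete, here is one hedged example: turning per-field counts into a sorted, text-only bar chart. It is only a stand-in for whatever plotting library you prefer; `text_bar_chart` and the sample counts are invented for illustration.

```python
def text_bar_chart(counts, width=30):
    """Return one line per field, largest count first, with a '#' bar."""
    if not counts:
        return []
    largest = max(counts.values()) or 1  # avoid dividing by zero
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    lines = []
    for field, count in ordered:
        bar = "#" * round(width * count / largest)
        lines.append(f"{field:<20} {bar} {count}")
    return lines


# Placeholder counts, not real Toolhub numbers.
for line in text_bar_chart({"field_a": 12, "field_b": 30, "field_c": 3}):
    print(line)
```

The same sorted (field, count) pairs could be passed to a bar plot in a charting library instead of being printed.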

Yeah it did, thanks. I also agree it is important to review the chat history.

Hello! While this project is not officially closed to new candidates, we are only two days from the deadline. This project has many strong candidates already, so if you have also contributed to other projects, I'd advise you to polish those submissions instead.

Hello @Slst2020

While making a few corrections to my notebook, I discovered that some tools, such as 'toolforge-authors', have fields in the Core info that do not exist at all in the annotations layer.

Example:
"subtitle": null
"sponsor": []

These fields do not exist at all in the annotations layer.
Can they also be considered as missing fields?
Or must missing fields be what exists in the annotations layer but is empty on both Core info and annotations?

Or must missing fields be what exists in the annotations layer but is empty on both Core info and annotations?

Yes, this. A field that doesn't exist in the Annotations layer is not editable, so even if it's empty, there's nothing we could do about it.
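
The rule above could be expressed roughly like this. It is a sketch under my own assumptions: that the Core info and the annotations are both available as plain dicts, and that "empty" means None, an empty string, an empty list or an empty dict. The function names and sample values are illustrative only.

```python
def is_empty(value):
    """Treat None, '', [] and {} as empty (an assumed definition)."""
    return value is None or value == "" or value == [] or value == {}


def missing_fields(core, annotations):
    """List fields that exist in annotations and are empty in both layers."""
    return [
        field
        for field in annotations
        if is_empty(annotations[field]) and is_empty(core.get(field))
    ]


# Made-up sample records for illustration.
core = {"subtitle": None, "sponsor": [], "icon": None, "wikidata_qid": None}
annotations = {"icon": None, "wikidata_qid": "Q1"}
print(missing_fields(core, annotations))
```

Here "subtitle" and "sponsor" are skipped because they are not annotation fields, and "wikidata_qid" is skipped because the annotations layer already fills it in.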

Alright. Thank you so much for the clarification.

Good morning guys, y'all were amazing ...

I'm Mofor Emmanuel btw, and took part in the backend contribution task late, but I must say, this really is an awesome community

Slst2020 claimed this task.