Your first task: classify sample statements using Citation Needed Models
Closed, Resolved, Public

Assigned To

None

Authored By

	Miriam
	Oct 3 2019, 10:47 AM

Description

In this task, you will classify a set of statements using the models provided in the GitHub repository, and write a short summary reporting the classification results.
Please proceed as follows:

Clone the repository https://github.com/mirrys/citation-needed-paper/
Make sure you have the following required packages installed:
- Python 2.7
- Keras 2.1.5 with Tensorflow 1.7.0
- sklearn 0.18.1
Download models from the link in the model/ folder
Download dictionaries from the link in the embeddings/ folder
Run the script "run_citation_need_model.py" using models and dictionaries for English, and using as input the test_input_data_sample.txt file provided.
Send @Miriam the file containing the output of the model and any other observation you might have, over email.

Please reach out to @Miriam if you have any trouble installing the requirements!
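
For anyone double-checking their setup, the pinned versions listed above can be compared against what is actually installed. The following is only a sketch: the helper names `installed_versions` and `find_mismatches` are made up for illustration and are not part of the repository, and it assumes each package exposes a `__version__` attribute.

```python
import importlib

# Pinned versions copied from the task description; the helpers are illustrative.
REQUIRED = {"keras": "2.1.5", "tensorflow": "1.7.0", "sklearn": "0.18.1"}


def installed_versions(names):
    """Import each package by name and read its __version__ (None if unavailable)."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", None)
    return versions


def find_mismatches(installed, required=REQUIRED):
    """Return {package: (installed_version, required_version)} for every mismatch."""
    mismatches = {}
    for name, wanted in required.items():
        found = installed.get(name)
        if found != wanted:
            mismatches[name] = (found, wanted)
    return mismatches
```

Calling `find_mismatches(installed_versions(REQUIRED))` in the environment used to run the script returns an empty dict when all three pins match.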

Related Objects

Status	Assigned	Task
Declined	None	T199190 [2.4] Improve unsourced statement identification tools and algorithms
Resolved	AikoChou	T233707 A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia
Resolved	None	T233709 Onboarding Task: getting familiar with the machine learning models for Citation Need
Resolved	None	T234519 Your first task: classify sample statements using Citation Needed Models

Event Timeline

Miriam created this task. Oct 3 2019, 10:47 AM

Miriam mentioned this in T233709: Onboarding Task: getting familiar with the machine learning models for Citation Need.

Hello @Miriam, my name is Dorothy Kabarozi, an Outreachy intern. I would love to work on this issue/project. I am well versed in Python and am currently learning machine learning. I would love to work on this; kindly advise.

Also, I am a little confused about whether this is part of the projects for interns. I saw Outreachy (round 9) and jumped right in. Kindly guide me through.

Thank you for creating this task, @Miriam.

I was able to install the required libraries in my development environment.

I tried looking for sample_data.txt file in the project's GitHub repo, but I'm afraid I can't seem to find it there. The only data file that I can see is test_input_data_sample.txt file.

I'll appreciate a hint on locating this sample_data.txt file.

In T234519#5543781, @Pgadige01 wrote:

Thank you for creating this task, @Miriam.

I was able to install the required libraries in my development environment.

I tried looking for sample_data.txt file in the project's GitHub repo, but I'm afraid I can't seem to find it there. The only data file that I can see is test_input_data_sample.txt file.

I'll appreciate a hint on locating this sample_data.txt file.

Great catch - I had changed the name of the file and forgot to change it here as well. I modified the task description. THANKS!

In T234519#5543780, @Kabarozi wrote:

Also, I am a little confused about whether this is part of the projects for interns. I saw Outreachy (round 9) and jumped right in. Kindly guide me through.

Hi @Kabarozi, thanks for your interest in the project. As part of your application, you are required to start with an onboarding task. This page describes the steps to follow for your first onboarding task!

Thanks!

Hi @Miriam, can we use Python 3.6 for the tasks?

In T234519#5543837, @Shamima19 wrote:

Hi @Miriam, can we use Python 3.6 for the tasks?

Sure, I think you have to adapt a few things from the main script (minor things like the "print" functions etc) - but feel free to do so!
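
One of those "minor things" is the print statement. Below is a small sketch of a form that runs unchanged on Python 2.7 and Python 3. Note that `format_prediction` and the tab-separated layout are hypothetical and do not reflect the actual output format of run_citation_need_model.py.

```python
# Has no effect on Python 3; on Python 2.7 it turns print into a function.
from __future__ import print_function


def format_prediction(statement, score):
    """Hypothetical helper: one tab-separated line per classified statement."""
    return "{}\t{:.3f}".format(statement, score)


# Python 2 style `print "text"` is a syntax error on Python 3;
# the call form below works on both once the __future__ import is present.
print(format_prediction("Example statement.", 0.8731))
```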

In T234519#5543766, @Kabarozi wrote:

I would love to work on this; kindly advise

@Kabarozi: Hi, what exactly "to advise"? If something is unclear in the task description, please explain what is unclear in the task description or ask specific questions. Please see https://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker#Feedback,_questions_and_support - Thanks! :)

Hi @Miriam!
Just a quick question; we are supposed to send the output file to your email, right?

A few notes, before running the program.
Additional requirements included h5py (to run 'load model') and pandas.

@Achillesheel02 thanks for the feedback, and yes, please send it to my email.
Thanks!

When I was trying to use the command (given in README.md) as it is to run the citation need model on the sample data input file, I realised the command should use the latest file name run_citation_need_model.py instead of test_citation_need_model.py. Please let me know your thoughts on this, @Miriam

I've submitted a PR to update the command.

Thank you.

Hello everyone! I'm also participating in this Outreachy round and I'm very interested in working on this Wikipedia project :)
Thanks for the opportunity, @Miriam!
I'm having the following error when I run the script:

2019-10-03 19:05:10.628372: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_100.dll'; dlerror: cudart64_100.dll not found
Traceback (most recent call last):
  File "run_citation_need_model.py", line 17, in <module>
    K.set_session(K.tf.Session(config=k.tf.ConfigProto(intra_op_parallelism_threads=10, inter_op_parallelism_threads=10)))
AttributeError: module 'keras.backend' has no attribute 'tf'

I'm also using Python 3.7, and I have Keras 2.3.0. Maybe this is my issue?
Thanks in advance!

In T234519#5545523, @Meloju wrote:
Hello everyone! I'm also participating in this Outreachy round and I'm very interested in working on this Wikipedia project :)
Thanks for the opportunity, @Miriam!
I'm having the following error when I run the script:
2019-10-03 19:05:10.628372: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_100.dll'; dlerror: cudart64_100.dll not found
Traceback (most recent call last):
  File "run_citation_need_model.py", line 17, in <module>
    K.set_session(K.tf.Session(config=k.tf.ConfigProto(intra_op_parallelism_threads=10, inter_op_parallelism_threads=10)))
AttributeError: module 'keras.backend' has no attribute 'tf'
I'm also using Python 3.7, and I have Keras 2.3.0. Maybe this is my issue?
Thanks in advance!

Hi @Meloju, did you check whether you installed TensorFlow? The error shows Keras couldn’t find TensorFlow in the backend.

In T234519#5545617, @AikoChou wrote:
Hi @Meloju, did you check whether you installed TensorFlow? The error shows Keras couldn’t find TensorFlow in the backend.

Hi @AikoChou , thanks for your answer.
Yes, I had it installed. Now I reinstalled it and the first error is not happening (cudart64_100.dll not found). However, I still get this:

Using TensorFlow backend.
Traceback (most recent call last):
  File "run_citation_need_model.py", line 17, in <module>
    K.set_session(K.tf.Session(config=k.tf.ConfigProto(intra_op_parallelism_threads=10, inter_op_parallelism_threads=10)))
AttributeError: module 'keras.backend' has no attribute 'tf'

I am using :
Python 3.7
Keras 2.2.4
Tensorflow 2.0.0

Hi @Meloju, no worries. I suggest installing Keras 2.1.5 and TensorFlow 1.7.0 as Miriam said. Since TensorFlow 2.0 has many changes, it may not be compatible with the citation need model script here.
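
If pinning the versions is not an option, note that the failing line reaches TensorFlow through `K.tf`, which the traceback shows is not available in the newer Keras. TensorFlow 1.x exposes `Session` and `ConfigProto` at the top level, while TensorFlow 2.x keeps them under `tf.compat.v1`. The sketch below uses a hypothetical helper, `first_attr`, to pick whichever exists. It only addresses this one line; whether the rest of the script and the saved models work on TensorFlow 2.0 is not established in this thread, so installing the pinned versions remains the safer route.

```python
def first_attr(root, *dotted_names):
    """Return the first dotted attribute path that exists on root (illustrative helper)."""
    for dotted in dotted_names:
        target = root
        try:
            for part in dotted.split("."):
                target = getattr(target, part)
        except AttributeError:
            continue  # this path does not exist, try the next candidate
        return target
    raise AttributeError("none of %r found on %r" % (dotted_names, root))


# Intended use (assumption, not executed here because it needs TensorFlow):
#   import tensorflow as tf
#   Session = first_attr(tf, "compat.v1.Session", "Session")
#   ConfigProto = first_attr(tf, "compat.v1.ConfigProto", "ConfigProto")
#   session = Session(config=ConfigProto(intra_op_parallelism_threads=10,
#                                        inter_op_parallelism_threads=10))
```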

Hi @Meloju, I can see from the Keras documentation that it only supports Python 2.7-3.6, but you're using 3.7. Maybe that's the issue?

Shamima19 unsubscribed. Oct 4 2019, 4:31 AM

IrinaGruz subscribed. Oct 4 2019, 6:32 AM

Hello @Miriam
Thank you for organizing this. I have a few questions:

Is there a deadline for submission of the first task?
How do we send the output file to you (i.e. by email, or other means)?
Will there be more tasks, and if yes, how many?

Thanks!

Ibia-ahmad subscribed. Oct 4 2019, 10:11 AM

Outreachy will ask us to choose an applicant around mid- to end-October according to their timeline, which we'll do based on which tasks they have completed, and how. We have no deadlines other than that - so let's say, any time in the next couple of weeks is perfectly fine.
I believe question 2 was answered a couple of comments ago: so, email for this one, please. You can find Miriam's email in this user page, but I'll also edit this and the parent task to make that obvious, sorry for the omission.
This and https://phabricator.wikimedia.org/T234606 are the only onboarding tasks we have in mind.

Surlycyborg updated the task description. Oct 4 2019, 8:27 PM

Surlycyborg et al,
Thank you very much, everything is clear now.

Ghassanmas subscribed. Oct 6 2019, 11:47 AM

KalindiFonda subscribed. Oct 6 2019, 1:20 PM

Hello! I found it useful to install the packages in a virtual environment, had some issues probably with packages from before, and having a virtual environment solved the "No module found" errors. Here is some info: https://docs.python-guide.org/dev/virtualenvs/

Hello, I am facing an issue here. I am trying to access the model file (.h5.gz) via Google Colab, but I am getting an error saying that it is a read-only file system. Is anyone else facing this while trying to extract the model?

In T234519#5550636, @KalindiFonda wrote:

Hello! I found it useful to install the packages in a virtual environment, had some issues probably with packages from before, and having a virtual environment solved the "No module found" errors. Here is some info: https://docs.python-guide.org/dev/virtualenvs/

Yes, thanks for the suggestion! By the way, I filed a similar issue in the repository itself a few days ago: https://github.com/mirrys/citation-needed-paper/issues/2. If you (or anyone else reading this) would like to send a Pull Request to update the docs, that would also be a nice little contribution :)

In T234519#5552574, @H_bushro wrote:

Hello, I am facing an issue here. I am trying to access the model file (.h5.gz) via Google Colab, but I am getting an error saying that it is a read-only file system. Is anyone else facing this while trying to extract the model?

Hmm, unless @Miriam contradicts me, I'd suggest not using Colab and doing things locally and from the command line, to be honest. You'll probably need to be comfortable using the model in that environment to continue the tasks, especially if we're going to deploy this on Toolforge at some stage.

In T234519#5552996, @Surlycyborg wrote:

Yes, thanks for the suggestion! By the way, I filed a similar issue in the repository itself a few days ago: https://github.com/mirrys/citation-needed-paper/issues/2. If you (or anyone else reading this) would like to send a Pull Request to update the docs, that would also be a nice little contribution :)

Hi @Surlycyborg,
I just sent a pull request to add requirements.txt. :) Thanks!

Hello, @Miriam! My name is Monique and I don't know Python. Will it be difficult for me? Thank you in advance!

Ferculell subscribed. Oct 15 2019, 2:50 AM

In T234519#5570276, @Moniquedejesus wrote:

Hello, @Miriam! My name is Monique and I don't know Python. Will it be difficult for me? Thank you in advance!

Hi @Moniquedejesus, thanks for reaching out! In principle, you could use any other programming language to perform the task, but you might have to at least learn some basics in order to use the available ML models in TensorFlow.

Hello @Miriam @Surlycyborg @Samwalton9,

I've submitted a summary report + the output file in an email to you.

Thank you!

@Miriam @Surlycyborg @Samwalton9

Hi, I am facing some issues running the repository, and it would be great if someone could help me out. I have sent emails as well but have not received a response.
I am using the same dependencies as listed in the main repository. However, even after trying a lot, I am unable to resolve this issue.

ValueError: Dimensions must be equal, but are 200 and 187 for 'dense_25/add' (op: 'Add') with input shapes: [?,200,187], [1,187,1].

I believe I have downloaded everything and am using the same command as well, but there is definitely something that I am missing. Similar issues arose with the other RNN models. It would be great if someone could guide me a bit on this.

Hi @iriyagupta, I don't know why you are facing this issue. It's strange. As far as I know, this error occurs in a neural network when the output shape of one layer does not match the expected input shape of the next layer. We are using a ready-made and tested model here, so if it is fed the appropriate data, this should not occur. I can only think of some library not behaving as expected. Are you running the script in a virtual environment with all the specified requirements? Are you feeding the model the correct test input file? I suggest you review the process step by step:

1- Clone the repository
2- Download the models, dictionaries and training data.
3- Create a virtual environment with Python 2.7 and all the required libraries (requirements.txt)
4- Run the indicated command in a terminal with the appropriate parameters in the virtual environment.

I’m sorry I can’t help you more than that. I hope you can work it out.

Could there be an issue with the virtual environment I am creating? I have been stuck on this issue since 2nd Oct.

I am using the requirements.txt file to install the modules and imports necessary for this repository to run.
I am using the test_input file as the input and the RNN models as given for inference. I am not sure where I am going wrong.

In T234519#5590033, @Ferculell wrote:

Hi @iriyagupta, I don't know why you are facing this issue. It's strange. As far as I know, this error occurs in a neural network when the output shape of one layer does not match the expected input shape of the next layer. We are using a ready-made and tested model here, so if it is fed the appropriate data, this should not occur. I can only think of some library not behaving as expected. Are you running the script in a virtual environment with all the specified requirements? Are you feeding the model the correct test input file? I suggest you review the process step by step:

1- Clone the repository
2- Download the models, dictionaries and training data.
3- Create a virtual environment with Python 2.7 and all the required libraries (requirements.txt)
4- Run the indicated command in a terminal with the appropriate parameters in the virtual environment.

I’m sorry I can’t help you more than that. I hope you can work it out.

Hi @iriyagupta. If you used Python 2.7 and requirements.txt to create the virtual environment, it should not be the cause of the issue. But I don't know what else could be the cause. Have you activated the virtual environment again before running the script in the terminal?
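
A quick way to answer that question from inside Python is sketched below. It assumes the environment was created with virtualenv or venv; `in_virtual_environment` is an illustrative name, not something from the repository.

```python
import sys


def in_virtual_environment(sys_module=sys):
    """Return True if the interpreter appears to run inside a virtualenv/venv."""
    # Older virtualenv releases (common with Python 2.7) set sys.real_prefix.
    if hasattr(sys_module, "real_prefix"):
        return True
    # venv (and newer virtualenv) make sys.base_prefix differ from sys.prefix.
    base_prefix = getattr(sys_module, "base_prefix", sys_module.prefix)
    return base_prefix != sys_module.prefix
```

Running `python -c "import sys; print(sys.prefix)"` in the same terminal gives a similar hint: inside an activated environment the printed path points into the environment's folder.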

Hi! I found the issue, and I would like to describe it here as well. Some time ago I worked with Keras using Theano as the backend. The config file was initially set up for TensorFlow, so I changed "image_data_format" in the main config JSON file. When I switched back to the TensorFlow backend for this task, the format was still set to channels first instead of channels last.
If anyone else hits the same dimension error with the model, please check your config file.

vim ~/.keras/keras.json

This opens the Keras config file. Thanks @Ferculell for your efforts to help me out. Really appreciate it.
I will soon be sending in my observations for the repository and if I am able to make any progress on the tasks given. :)
Thanks.
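
The config check described above can also be done without opening an editor. This is a rough sketch: `read_keras_config` and `config_warnings` are illustrative names, and it assumes the config lives at the default ~/.keras/keras.json with the standard `backend` and `image_data_format` keys.

```python
import json
import os


def read_keras_config(path=None):
    """Load the Keras config; defaults to ~/.keras/keras.json."""
    if path is None:
        path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    with open(path) as handle:
        return json.load(handle)


def config_warnings(config):
    """Flag settings that differ from what this task expects (TensorFlow, channels_last)."""
    warnings = []
    backend = config.get("backend", "tensorflow")
    if backend != "tensorflow":
        warnings.append("backend is %r, expected 'tensorflow'" % backend)
    data_format = config.get("image_data_format", "channels_last")
    if data_format != "channels_last":
        warnings.append("image_data_format is %r, expected 'channels_last'" % data_format)
    return warnings
```

An empty list from `config_warnings(read_keras_config())` means neither setting is the likely cause of the dimension error reported in this thread.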

In T234519#5591626, @Ferculell wrote:

Hi @iriyagupta. If you used Python 2.7 and requirements.txt to create the virtual environment, it should not be the cause of the issue. But I don't know what else could be the cause. Have you activated the virtual environment again before running the script in the terminal?

You're welcome, @iriyagupta. I'm glad you solved it :) Let's go ahead with the tasks!

In T234519#5592401, @iriyagupta wrote:

This opens the Keras config file. Thanks @Ferculell for your efforts to help me out. Really appreciate it.
I will soon be sending in my observations for the repository and if I am able to make any progress on the tasks given. :)
Thanks.

Hi,

@Miriam @Samwalton9 @Surlycyborg .

I have sent my contributions for the first task and have added it to the outreachy contributions list as well.
Do I need to attach anything else besides the report and output file in the email?

-Riya

I want to add a note here in case a Windows user faces similar issues.

I couldn't install TensorFlow from requirements.txt. In a virtualenv with Python 2.7.13, I got this error: "ERROR: Could not find a version that satisfies the requirement tensorflow==1.7.0"

Resolved: Download TensorFlow from here: 'https://github.com/fo40225/tensorflow-windows-wheel'

Also, if you have difficulty with the model file, you can extract it using 7-Zip.
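
As an alternative to 7-Zip, the .h5.gz model file can be decompressed with Python's standard library. A minimal sketch, where `gunzip` is an illustrative helper and the file name in the note below is a placeholder rather than the real model name:

```python
import gzip
import shutil


def gunzip(src_path, dest_path=None):
    """Decompress a .gz file; by default write next to it without the .gz suffix."""
    if dest_path is None:
        if not src_path.endswith(".gz"):
            raise ValueError("expected a path ending in .gz: %r" % src_path)
        dest_path = src_path[:-3]
    # Stream the data in chunks so large model files are not loaded into memory at once.
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)
    return dest_path
```

For example, `gunzip("model.h5.gz")` would write `model.h5` into the same folder. The destination folder must be writable.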

Unit-ade mentioned this in T237422: A System for releasing periodic data dumps from the citation needed model. Nov 5 2019, 3:17 PM

Thanks for creating a proposal! As we are past the deadline, if you would like us to consider your proposal for review, please move it to the submitted column. Thank you!

Samwalton9-WMF closed this task as Resolved. Nov 6 2019, 9:46 AM

Samwalton9-WMF claimed this task.

Samwalton9-WMF removed Samwalton9-WMF as the assignee of this task.
