AikoChou
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 2 2019, 10:06 AM (89 w, 5 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
AikoChou

Recent Activity

Fri, Jun 18

AikoChou added a comment to T276407: An End-to-End Image Classification Pipeline.

Weekly updates:

Fri, Jun 18, 3:06 PM · Research (FY2020-21-Research-April-June), Structured-Data-Backlog, MachineVision

Fri, Jun 11

AikoChou added a comment to T276407: An End-to-End Image Classification Pipeline.

Weekly updates:
We confirmed that (1) how the input data is formatted and (3) the function used to transform the Keras model to an Estimator are not the cause of the poor Estimator performance, since a CNN model that we trained from scratch reaches the same performance in both Keras and Estimator.

Fri, Jun 11, 4:24 PM · Research (FY2020-21-Research-April-June), Structured-Data-Backlog, MachineVision

Sun, May 30

AikoChou awarded T283980: Phacility (Maintainer of Phabricator) is winding down. Upstream support ending. a Burninate token.
Sun, May 30, 5:12 PM · Release-Engineering-Team (Seen), User-Matthewrbowker, Phabricator

May 21 2021

AikoChou added a comment to T276407: An End-to-End Image Classification Pipeline.

Weekly updates:
We wrote documentation of the distributed image inference workflow in the GitHub repo and provided three tasks as examples: image quality inference, face detection, and ResNet feature extraction. With regard to distributed training using tf-yarn, we are looking for an alternative to wrapping a Keras model in an Estimator to solve the accuracy issue.

May 21 2021, 4:21 PM · Research (FY2020-21-Research-April-June), Structured-Data-Backlog, MachineVision

May 18 2021

Ladsgroup awarded T276407: An End-to-End Image Classification Pipeline a Yellow Medal token.
May 18 2021, 5:39 AM · Research (FY2020-21-Research-April-June), Structured-Data-Backlog, MachineVision

May 3 2021

elukey awarded T276407: An End-to-End Image Classification Pipeline a Party Time token.
May 3 2021, 10:29 AM · Research (FY2020-21-Research-April-June), Structured-Data-Backlog, MachineVision
AikoChou added a comment to T276407: An End-to-End Image Classification Pipeline.

Weekly update:

May 3 2021, 9:16 AM · Research (FY2020-21-Research-April-June), Structured-Data-Backlog, MachineVision

Apr 29 2021

AikoChou created P15638 image classification gpu.
Apr 29 2021, 6:44 AM

Apr 6 2021

AikoChou added a comment to T277828: Investigate placeholder image recommendation.

For point 1, I calculated the number of overlapping images in allowed_images and image_placeholders as follows:

Apr 6 2021, 8:12 AM · Research (FY2020-21-Research-April-June), Growth-Team-Filtering, Image-Recommendations, Growth-Team
AikoChou added a comment to T276407: An End-to-End Image Classification Pipeline.

I want to use tf-yarn to train a simple model on the cluster, but I found that some environment variables need to be set up, which are described in this doc:

  • JAVA_HOME: /usr/bin/java
  • HADOOP_HDFS_HOME: /usr/bin/hdfs
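
One way to apply such settings from Python before launching a job is sketched below. This is only an illustration: the helper name `apply_env` is hypothetical, the two values are the ones quoted in the list above, and whether they are right for a given cluster is an assumption.

```python
import os

# Values quoted in the comment above; suitability for a given cluster is an assumption.
REQUIRED_ENV = {
    "JAVA_HOME": "/usr/bin/java",
    "HADOOP_HDFS_HOME": "/usr/bin/hdfs",
}


def apply_env(defaults, environ=None):
    """Set each variable only if it is not already defined.

    Returns the names that were newly set, so existing settings are never
    silently overwritten.
    """
    environ = os.environ if environ is None else environ
    newly_set = []
    for name, value in defaults.items():
        if name not in environ:
            environ[name] = value
            newly_set.append(name)
    return newly_set
```

Calling `apply_env(REQUIRED_ENV)` before the job is submitted changes the current process environment; passing a plain dict as `environ` makes it possible to try the helper without touching the real environment.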
Apr 6 2021, 5:14 AM · Research (FY2020-21-Research-April-June), Structured-Data-Backlog, MachineVision

Mar 27 2021

AikoChou added a comment to T277828: Investigate placeholder image recommendation.

I updated the code in the GitHub repo (in the branch) to improve the filtering of placeholders. The workflow is as follows: first, use PetScan to find all the subcategories of Category:Image_placeholders (https://petscan.wmflabs.org/?psid=18699732). Next, query Hive for all images in those categories. Then, exclude these images when querying for candidates in both the Wikidata Commons category (fewer cases) and other wikis (many cases).
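
The exclusion step of this workflow can be sketched in Python as follows. This is a minimal illustration, not the code in the repo: the function name `exclude_placeholders` and the sample file names are hypothetical, and it assumes the placeholder images have already been collected from the subcategories.

```python
def exclude_placeholders(candidates, placeholder_images):
    """Drop candidate (page, image) pairs whose image is a known placeholder.

    `candidates` is an iterable of (page_title, image_name) pairs and
    `placeholder_images` is any iterable of image names. Order is preserved.
    """
    blocked = set(placeholder_images)  # set gives constant-time membership checks
    return [(page, image) for page, image in candidates if image not in blocked]


# Hypothetical sample data, for illustration only.
placeholders = ["Replace_this_image.svg", "No_image_available.png"]
candidates = [
    ("Article A", "Photo_of_A.jpg"),
    ("Article B", "Replace_this_image.svg"),
    ("Article C", "Map_of_C.png"),
]
filtered = exclude_placeholders(candidates, placeholders)
```

In a Hive or Spark query the same idea would normally be expressed as an anti-join against the placeholder table rather than an in-memory set.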

Mar 27 2021, 3:12 PM · Research (FY2020-21-Research-April-June), Growth-Team-Filtering, Image-Recommendations, Growth-Team

Mar 23 2021

AikoChou added a comment to T274225: Multivariate logistic regression on search scores.

Hi @Cparle - yes of course, there you go:

Mar 23 2021, 2:42 PM · SDAW-MediaSearch (MediaSearch-ImageRecs), Structured-Data-Backlog (Current Work), Image-Recommendations, Structured Data Engineering, WikibaseMediaInfo

Mar 10 2021

AikoChou added a comment to T274878: Estimate the number of images added to each Wiki in a month.

@MMiller_WMF -- here are the results computed using unillustrated articles for which the algorithm has at least one recommendation. Since illustrated articles for February are available to query, I added results for January. Most of them fall within the range of 0.1% to 8%. There are two very high numbers, 21.46% and 31.62%, in arzwiki (in the previous results, these two months also had relatively high percentages). The scatter plot below excludes the two outliers and shows the distribution for most wikis.

Mar 10 2021, 9:02 AM · Research (FY2020-21-Research-January-March), Image-Recommendations, Growth-Team

Mar 4 2021

AikoChou updated the task description for T276407: An End-to-End Image Classification Pipeline.
Mar 4 2021, 1:35 AM · Research (FY2020-21-Research-April-June), Structured-Data-Backlog, MachineVision
AikoChou added a comment to T276407: An End-to-End Image Classification Pipeline.

Summary of the work done so far:

  • Imported the image data locally and saved it to TFRecord files
  • Fine-tuned an Xception model to classify images as either 'sculptures' or 'maiolica'
  • Ran inference on the test data locally
Mar 4 2021, 1:27 AM · Research (FY2020-21-Research-April-June), Structured-Data-Backlog, MachineVision
AikoChou created T276407: An End-to-End Image Classification Pipeline.
Mar 4 2021, 1:08 AM · Research (FY2020-21-Research-April-June), Structured-Data-Backlog, MachineVision

Mar 2 2021

AikoChou added a comment to T274878: Estimate the number of images added to each Wiki in a month.

Here are estimates of the percentage of unillustrated articles that become illustrated after one month for each target wiki.

Mar 2 2021, 5:08 AM · Research (FY2020-21-Research-January-March), Image-Recommendations, Growth-Team

Feb 22 2021

AikoChou added a comment to T272109: Assess prevalence of Wikidata infoboxes.

Hi all,

Feb 22 2021, 8:02 AM · Research (FY2020-21-Research-April-June), Growth-Team-Filtering, Image-Recommendations, Growth-Team, Wikipedia-Android-App-Backlog

Feb 15 2021

AikoChou added a comment to T272109: Assess prevalence of Wikidata infoboxes.

Hi @MMiller_WMF @Tgr -- it's very nice to meet you too. I'm really happy to have the opportunity to help :D

Feb 15 2021, 8:26 AM · Research (FY2020-21-Research-April-June), Growth-Team-Filtering, Image-Recommendations, Growth-Team, Wikipedia-Android-App-Backlog

Feb 11 2021

AikoChou added a comment to T274225: Multivariate logistic regression on search scores.

Hi all,

Feb 11 2021, 6:00 AM · SDAW-MediaSearch (MediaSearch-ImageRecs), Structured-Data-Backlog (Current Work), Image-Recommendations, Structured Data Engineering, WikibaseMediaInfo

Feb 10 2021

AikoChou added a comment to T274225: Multivariate logistic regression on search scores.

@Miriam Yeah, if there is no maximum, it's not appropriate to use normalization. I'll update the results with the non-normalized version.

Feb 10 2021, 1:31 PM · SDAW-MediaSearch (MediaSearch-ImageRecs), Structured-Data-Backlog (Current Work), Image-Recommendations, Structured Data Engineering, WikibaseMediaInfo

Feb 9 2021

AikoChou added a comment to T272109: Assess prevalence of Wikidata infoboxes.

Hi all!
Here are the results after removing icons (.svg). Overall, these numbers drop slightly but do not change much.

Feb 9 2021, 8:43 AM · Research (FY2020-21-Research-April-June), Growth-Team-Filtering, Image-Recommendations, Growth-Team, Wikipedia-Android-App-Backlog
AikoChou created T274225: Multivariate logistic regression on search scores.
Feb 9 2021, 7:18 AM · SDAW-MediaSearch (MediaSearch-ImageRecs), Structured-Data-Backlog (Current Work), Image-Recommendations, Structured Data Engineering, WikibaseMediaInfo

Feb 8 2021

AikoChou added a comment to T272109: Assess prevalence of Wikidata infoboxes.

Hi all,

Feb 8 2021, 8:27 AM · Research (FY2020-21-Research-April-June), Growth-Team-Filtering, Image-Recommendations, Growth-Team, Wikipedia-Android-App-Backlog
AikoChou reopened T273602: Access to analytics-privatedata-users for Research contractor AikoChou as "Open".
Feb 8 2021, 12:05 AM · Research, SRE, SRE-Access-Requests
AikoChou added a comment to T273602: Access to analytics-privatedata-users for Research contractor AikoChou.

Could you double-check that I have LDAP access? I'm not able to access the notebooks.

Feb 8 2021, 12:05 AM · Research, SRE, SRE-Access-Requests

Feb 3 2021

AikoChou added a comment to T273602: Access to analytics-privatedata-users for Research contractor AikoChou.

Hi @CDanis,
My wikitech username: AikoChou
Preferred shell username: aikochou
SSH public key: https://phabricator.wikimedia.org/P14137
I have read and signed the L3 Wikimedia Server Access Responsibilities document.
Thanks! :)

Feb 3 2021, 6:14 AM · Research, SRE, SRE-Access-Requests
AikoChou created P14137 Ai-Jou Chou (AikoChou) production SSH public key.
Feb 3 2021, 6:00 AM

Mar 9 2020

Pavithraes awarded T241518: Outreachy Proposal: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia a Love token.
Mar 9 2020, 7:41 AM · Outreachy (Round 19)
AikoChou closed T233707: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia, a subtask of T199190: [2.4] Improve unsourced statement identification tools and algorithms, as Resolved.
Mar 9 2020, 4:54 AM · Knowledge-Integrity, Epic
AikoChou closed T233707: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia as Resolved.

Completed the wrap-up steps:

Mar 9 2020, 4:54 AM · User-ArielGlenn, Research, Outreachy (Round 19)
AikoChou closed T241518: Outreachy Proposal: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia, a subtask of T233707: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia, as Resolved.
Mar 9 2020, 4:53 AM · User-ArielGlenn, Research, Outreachy (Round 19)
AikoChou closed T241518: Outreachy Proposal: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia as Resolved.

Completed the wrap-up steps:

Mar 9 2020, 4:53 AM · Outreachy (Round 19)

Mar 2 2020

srishakatux awarded T233707: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia a Love token.
Mar 2 2020, 11:25 PM · User-ArielGlenn, Research, Outreachy (Round 19)
Pavithraes awarded T233707: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia a Love token.
Mar 2 2020, 2:54 PM · User-ArielGlenn, Research, Outreachy (Round 19)
AikoChou added a comment to T233707: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia.

In the last week of the internship, I've been working on:

Mar 2 2020, 7:59 AM · User-ArielGlenn, Research, Outreachy (Round 19)
AikoChou added a comment to T233707: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia.

Week 9-10

Mar 2 2020, 7:59 AM · User-ArielGlenn, Research, Outreachy (Round 19)
AikoChou added a comment to T233707: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia.

Week 1-8 Summary

Mar 2 2020, 7:58 AM · User-ArielGlenn, Research, Outreachy (Round 19)
AikoChou added a comment to T241518: Outreachy Proposal: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia.

In the last week of the internship, I've been working on:

Mar 2 2020, 7:56 AM · Outreachy (Round 19)
AikoChou added a comment to T241518: Outreachy Proposal: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia.

Week 9-10

Mar 2 2020, 7:53 AM · Outreachy (Round 19)
AikoChou added a comment to T241518: Outreachy Proposal: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia.

Week 1-8 Summary

Mar 2 2020, 7:49 AM · Outreachy (Round 19)

Jan 22 2020

AikoChou added a comment to T233707: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia.

Weekly update

  • Modified the input pipeline and the format written to the database.
  • Worked on a script to ingest data into Citation Hunt.
  • Created a pull request covering the first two items for Guilherme to review.
Jan 22 2020, 1:57 PM · User-ArielGlenn, Research, Outreachy (Round 19)
AikoChou added a comment to T241518: Outreachy Proposal: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia.

We are in week 8 now but have moved to week 9 work. Just swapped Citation Hunt work with testing/regular job work. Swapped 9-12 with 6-8. :)

Jan 22 2020, 10:04 AM · Outreachy (Round 19)

Dec 30 2019

AikoChou closed T241585: Facing an issue when loading a Tensorflow model as Resolved.
Dec 30 2019, 9:12 PM · Toolforge
AikoChou added a comment to T241585: Facing an issue when loading a Tensorflow model.

Thank you all for your help!
The issue was solved when I ran it using the grid engine and set the -mem option. :)

Dec 30 2019, 9:11 PM · Toolforge
AikoChou added a comment to T241585: Facing an issue when loading a Tensorflow model.
Dec 30 2019, 1:53 PM · Toolforge
AikoChou created T241585: Facing an issue when loading a Tensorflow model.
Dec 30 2019, 11:10 AM · Toolforge

Dec 28 2019

AikoChou updated the task description for T241518: Outreachy Proposal: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia.
Dec 28 2019, 8:21 PM · Outreachy (Round 19)
AikoChou created T241518: Outreachy Proposal: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia.
Dec 28 2019, 8:20 PM · Outreachy (Round 19)

Nov 28 2019

AikoChou updated AikoChou.
Nov 28 2019, 12:52 AM

Oct 14 2019

AikoChou added a comment to T234606: Your second task: classify statements within an article.

@Surlycyborg @Miriam @Samwalton9 I have a point of confusion that I hope you can clear up.

First, could you define "statement" and "sentence" in the scope of this project?

I see in your paper that you did the training on random sentences, but the example in the GitHub repo is doing the prediction on statements, which could be composed of multiple sentences.

Hi @Ghassanmas, I am also confused about the distinction between statement and sentence in this project. Thanks for pointing it out.

Oct 14 2019, 7:33 PM · Outreachy (Round 19)
AikoChou added a comment to T234606: Your second task: classify statements within an article.

Here is my repo:
https://github.com/AikoChou/wikimedia-outreachy-2019

Oct 14 2019, 6:49 PM · Outreachy (Round 19)

Oct 8 2019

AikoChou added a comment to T234519: Your first task: classify sample statements using Citation Needed Models.

Hello! I found it useful to install the packages in a virtual environment, had some issues probably with packages from before, and having a virtual environment solved the "No module found" errors. Here is some info: https://docs.python-guide.org/dev/virtualenvs/

Yes, thanks for the suggestion! By the way, I filed a similar issue in the repository itself a few days ago: https://github.com/mirrys/citation-needed-paper/issues/2. If you (or anyone else reading this) would like to send a Pull Request to update the docs, that would also be a nice little contribution :)

Oct 8 2019, 2:20 AM · Outreachy (Round 19)

Oct 4 2019

AikoChou added a comment to T234606: Your second task: classify statements within an article.

Do we have a deadline for this task? Thanks!

Oct 4 2019, 11:32 AM · Outreachy (Round 19)
AikoChou added a comment to T234519: Your first task: classify sample statements using Citation Needed Models.

I am using:
Python 3.7
Keras 2.2.4
Tensorflow 2.0.0

Oct 4 2019, 3:00 AM · Outreachy (Round 19)

Oct 3 2019

AikoChou added a comment to T234519: Your first task: classify sample statements using Citation Needed Models.

Hello everyone! I'm also participating in this Outreachy round and I'm very interested in working on this Wikipedia project :)
Thanks for the opportunity, @Miriam!
I'm having the following error when I run the script:

2019-10-03 19:05:10.628372: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_100.dll'; dlerror: cudart64_100.dll not found
Traceback (most recent call last):
  File "run_citation_need_model.py", line 17, in <module>
    K.set_session(K.tf.Session(config=k.tf.ConfigProto(intra_op_parallelism_threads=10, inter_op_parallelism_threads=10)))
AttributeError: module 'keras.backend' has no attribute 'tf'

I'm also using Python 3.7, and I have Keras 2.3.0. Maybe this is my issue?
Thanks in advance!

Oct 3 2019, 11:51 PM · Outreachy (Round 19)