@BBlack does the above help? Happy to steer the analysis in other directions if needed.
Hello! Sorry for the slow response, I was traveling in between.
Fri, May 17
Fri, May 10
@BBlack Thank you, I'll get back to you with more insights soon!
Fri, May 3
Thanks @chelsyx ! Yes the questions are somewhat similar -- how performance impacts engagement -- and we want to perform a robust analysis 1 year after switching to the Singapore data center.
Tue, Apr 30
This was via the internet. But we should try to do this from the internal cluster, too, for comparison, if possible. I just need a few instructions on how to do this!
Data collection is over: it took 271044 s (~75 hours) for 320815 images (~160k per class), i.e. about 0.84 sec/image.
I downloaded 600-px thumbnails, and used 4 parallel sessions, as suggested in T215250.
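For reference, the figures above can be re-derived with a few lines of Python. This is only a back-of-the-envelope sketch: `estimate_hours` is a made-up helper, and it assumes another download would run at the same average rate.

```python
# Figures quoted in the comment above.
TOTAL_SECONDS = 271044
TOTAL_IMAGES = 320815

hours = TOTAL_SECONDS / 3600                   # total wall-clock time in hours
sec_per_image = TOTAL_SECONDS / TOTAL_IMAGES   # average wall-clock seconds per image
images_per_hour = TOTAL_IMAGES / hours         # overall throughput across all sessions


def estimate_hours(n_images, rate=sec_per_image):
    """Hypothetical helper: hours needed for n_images at the same average rate."""
    return n_images * rate / 3600


print(f"{hours:.1f} h, {sec_per_image:.3f} s/image, {images_per_hour:.0f} images/h")
print(f"100k images at this rate: {estimate_hours(100_000):.1f} h")
```

Note that the rate is measured across all parallel sessions together, so it describes the overall throughput, not the latency of a single request.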
Mon, Apr 29
Thank you so much @Dzahn!
Fri, Apr 26
@Gilles do you plan to test Thumbor's face detection functionalities too?
Apr 24 2019
Apr 23 2019
Apr 15 2019
Will add one more task for the improvement and release of this.
The paper will be published at The Web Conference 2019. arXiv version here: https://arxiv.org/abs/1902.11116
Apr 11 2019
Apr 3 2019
OK, I can prepare a task for this, or we can start from something like this maybe?
Thanks @EBernhardson and all! Would a CNN fine-tuning task, using only a few thousand images as input, work as a training task for testing performance?
Mar 28 2019
@Gilles, sounds good, thanks!
Mar 26 2019
Thank you @Gilles, I think they aimed for thumbnails of 512px or 600px. Do you think those sizes are reasonably widely used on wikis? I can advise them to keep the number of concurrent downloads below 5 just in case.
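If it helps, here is a minimal sketch of how a client could cap the number of concurrent downloads. It uses Python's standard `ThreadPoolExecutor`; `download_all`, `ConcurrencyTracker` and the example.org URLs are made-up names for illustration, and the stand-in fetch does no network access. A real client would plug in its own fetch function.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 4  # assumption: any value below 5, as suggested above


def download_all(urls, fetch, max_workers=MAX_CONCURRENT):
    """Call fetch(url) for every URL with at most max_workers in flight.

    Results are returned in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


class ConcurrencyTracker:
    """Stand-in for a real fetch function; records the peak concurrency."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def fetch(self, url):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.01)  # simulate the time a download would take
        with self._lock:
            self._active -= 1
        return len(url)  # placeholder for the downloaded bytes


tracker = ConcurrencyTracker()
urls = [f"https://example.org/thumb/{i}.jpg" for i in range(20)]
results = download_all(urls, tracker.fetch)
print(f"fetched {len(results)} items, peak concurrency {tracker.peak}")
```

The pool size is the only knob here, so lowering `MAX_CONCURRENT` is enough to be more conservative.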
Mar 25 2019
Hi @Gilles @fgiunchedi, a group of researchers from HTW Berlin would be interested in doing Commons image visualization. They would like to download ~1M images to their machines, and they are asking what a reasonable number of parallel download requests would be.
Thanks! Grazie! Merci :)
Mar 19 2019
@leila, this task is tracking the analysis bit of the second round of data collection, too. The results for the first round of analysis are available here: https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Citation_Usage/First_Round_of_Analysis
However, for next quarter, we would like to perform a more in-depth analysis. So either we modify the task (as it involves different sub-tasks), or we keep it open?
Yes, announcement just posted!
Feb 27 2019
@mforns yes, that is correct :) Thanks!
Feb 13 2019
@Ottomata yes, the size of the best image classification models is <1G
Feb 12 2019
Hi @RyanSteinberg ! We have decided to turn on the data collection for a short period of time, so that you have real data samples to perform all the quality checks you might need on your side. @bmansurov suggested we can collect data for one day, at a sampling rate of maybe 1% for both schemas. Would that sound good to you? Would it be OK if we switch on this small data-collection in the coming days? Thanks.
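To make the 1% figure concrete, here is a rough sketch of one way to sample sessions deterministically. This is not a description of how the actual instrumentation decides: `in_sample` and the synthetic tokens are hypothetical, and hashing the session token is just one common approach.

```python
import hashlib

SAMPLING_RATE = 0.01  # the 1% suggested above
BUCKETS = 10_000


def in_sample(session_token, rate=SAMPLING_RATE):
    """Hypothetical: hash the token into one of 10,000 buckets, keep the lowest `rate` share."""
    digest = hashlib.sha256(session_token.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % BUCKETS
    return bucket < rate * BUCKETS


# Synthetic tokens, only to check that the sampled share is close to the target rate.
tokens = [f"session-{i}" for i in range(100_000)]
sampled = sum(in_sample(t) for t in tokens)
fraction = sampled / len(tokens)
print(f"{sampled} of {len(tokens)} sessions sampled ({fraction:.2%})")
```

Because the decision depends only on the token, the same session is either always in or always out of the sample.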
Feb 11 2019
Feb 8 2019
@Gilles thanks for this! Images and graphics have very different underlying image statistics: it is therefore fairly easy for a classifier to tell them apart. So it should be feasible.
Feb 7 2019
Feb 6 2019
Yes, it now works after starting a new session on the client. Thanks @Ottomata !
@Ottomata thanks, you're the best! I now see the table corresponding to the new version of the schema, but I don't see the events I generated :/.
I'll try to see if that works from another session, but if you say it's flaky let's maybe not spend too much time on it? I can just parse the log file for the test events I generate, to double check that on the server side everything looks good.
I can see all events on the client side. I'll do some tests there.
On the server side, I can see in the client-side-events.log file all the events generated by my session_token. However, I can't find the same events in the MySQL log database. My understanding is that there should be a table (recording events from the latest version of the Citation Usage Schema) called CitationUsage_18810892, but I can't find it. Am I doing something wrong @Ottomata @elukey ? Thanks!
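As a fallback for checking the server side, the log can be searched for a given session token. A minimal sketch follows; it makes no assumption about the log format beyond the token appearing verbatim in the line. `lines_for_token`, the sample lines and the token value are made up for illustration.

```python
def lines_for_token(lines, session_token):
    """Return the lines that contain session_token as a substring."""
    return [line.rstrip("\n") for line in lines if session_token in line]


# Made-up sample lines; a real run would iterate over the open log file instead,
# e.g. `with open(path) as f: matches = lines_for_token(f, token)`.
sample_log = [
    "event-a session_token=abc123 action=click\n",
    "event-b session_token=zzz999 action=hover\n",
    "event-c session_token=abc123 action=hover\n",
]
matches = lines_for_token(sample_log, "abc123")
print(f"{len(matches)} matching events")
```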
Feb 5 2019
Wow, thank you SO much, that works now :)
320px sounds like a good solution. In general, neural networks resize images to 256px before processing them.
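For illustration, a small sketch of what such a resize does to the image dimensions. It assumes the common convention of scaling the shorter side to 256px while keeping the aspect ratio; pipelines differ, and `resized_dims` is a hypothetical helper.

```python
def resized_dims(width, height, target=256):
    """Scale (width, height) so the shorter side equals target, keeping the aspect ratio."""
    short = min(width, height)
    return round(width * target / short), round(height * target / short)


# Example: a portrait thumbnail that is 320px wide and 480px tall.
w, h = resized_dims(320, 480)
print(w, h)
```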
Jan 24 2019
Sorry @bmansurov and thanks for the explanation ;) I think we should do the test in the beta cluster before deployment. @RyanSteinberg I might need your help to do these tests, as you are more familiar with the latest changes requested.
Jan 17 2019
Hi @bmansurov, I discussed with the others, and we are OK with starting the data collection, no more issues on our side. Thanks!