
Kdhingra2210 (Karan Dhingra)
User

Projects

User does not belong to any projects.

User Details

User Since
Feb 24 2018, 8:40 AM (173 w, 23 h)
Availability
Available
LDAP User
Unknown
MediaWiki User
Kdhingra2210 [ Global Accounts ]

Recent Activity

Nov 5 2018

Kdhingra2210 added a comment to T190660: GSoC 2018 Proposal: [Wikipedia Search] Predict relevance of search results from historical clicks using a Neural Click Model.

@Kdhingra2210: Any news to share about the status of this task? Thanks in advance!

Nov 5 2018, 2:35 PM · Discovery-Search, Google-Summer-of-Code (2018)

Sep 6 2018

Kdhingra2210 added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

@srishakatux yes, testing is ongoing and I think it should be finished in the next week or so.

Sep 6 2018, 7:11 AM · Outreach-Programs-Projects, Discovery-Search, Google-Summer-of-Code (2018)
Kdhingra2210 added a comment to T190660: GSoC 2018 Proposal: [Wikipedia Search] Predict relevance of search results from historical clicks using a Neural Click Model.

@srishakatux the model is complete but it is in testing over the complete data, so those patches and the project should be completed once the testing is over. It should take more than a week to finish those tests and I am currently working on those patches.

Sep 6 2018, 7:10 AM · Discovery-Search, Google-Summer-of-Code (2018)

Jun 1 2018

Kdhingra2210 added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

@TJones and @EBernhardson

Jun 1 2018, 1:35 PM · Outreach-Programs-Projects, Discovery-Search, Google-Summer-of-Code (2018)

May 25 2018

Kdhingra2210 added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

@EBernhardson you were right, I made a mistake while defining an object in the Cython code.

May 25 2018, 7:44 AM · Outreach-Programs-Projects, Discovery-Search, Google-Summer-of-Code (2018)

May 24 2018

Kdhingra2210 added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

@EBernhardson I did implement 3d sparse matrices in scipy too

May 24 2018, 9:47 PM · Outreach-Programs-Projects, Discovery-Search, Google-Summer-of-Code (2018)

May 16 2018

Kdhingra2210 added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

@EBernhardson and @TJones during the past couple of weeks I have mostly worked on building different types of sparse matrices and on understanding the research paper, because some of its minute details can significantly affect the results.

May 16 2018, 11:16 AM · Outreach-Programs-Projects, Discovery-Search, Google-Summer-of-Code (2018)

Apr 23 2018

Kdhingra2210 updated subscribers of T190660: GSoC 2018 Proposal: [Wikipedia Search] Predict relevance of search results from historical clicks using a Neural Click Model.

Thanks, @EBernhardson, @TJones, @srishakatux and other fellow community members for accepting my proposal.

Apr 23 2018, 11:32 PM · Discovery-Search, Google-Summer-of-Code (2018)
Kdhingra2210 added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

Congratulations @Kdhingra2210 on getting selected for this year's Google Summer of Code.

Thank you very much!

Apr 23 2018, 11:30 PM · Outreach-Programs-Projects, Discovery-Search, Google-Summer-of-Code (2018)

Apr 20 2018

Kdhingra2210 added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

Interesting that they have a SparseTensor but it's not supported.

Yep, it is supported just like normal sparse matrices, but converting it to dense batches is very costly compared to scipy.sparse ones (at least when computing on CPU).

Sure, I'll pull some data for 30 and 60 day aggregations and we can compare the datasets.

Yep, I will look into these once they are uploaded.

Apr 20 2018, 10:25 AM · Outreach-Programs-Projects, Discovery-Search, Google-Summer-of-Code (2018)

Apr 5 2018

Kdhingra2210 added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

I have tried plenty of approaches to maintaining 3D sparse matrices, but none has worked out well, at least so far.

Apr 5 2018, 1:18 PM · Outreach-Programs-Projects, Discovery-Search, Google-Summer-of-Code (2018)

Mar 27 2018

Kdhingra2210 moved T190660: GSoC 2018 Proposal: [Wikipedia Search] Predict relevance of search results from historical clicks using a Neural Click Model from Proposals In Progress to Proposals Submitted on the Google-Summer-of-Code (2018) board.
Mar 27 2018, 4:04 PM · Discovery-Search, Google-Summer-of-Code (2018)
Kdhingra2210 updated the task description for T190660: GSoC 2018 Proposal: [Wikipedia Search] Predict relevance of search results from historical clicks using a Neural Click Model.
Mar 27 2018, 4:04 PM · Discovery-Search, Google-Summer-of-Code (2018)
Kdhingra2210 added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

I can certainly run a new aggregation of the data if we find a better way. Some script to convert scipy.sparse to the tensorflow format should be relatively short to evaluate that before hand.
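A conversion along these lines could be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in the project: each 2-D slice is given as a (rows, cols, data) triple, mirroring the `row`, `col` and `data` attributes of a `scipy.sparse.coo_matrix`, and the hypothetical helper `stack_coo_slices` produces the (indices, values, dense_shape) triple that `tf.SparseTensor` takes as arguments. Pure Python is used here so the sketch runs without either library installed.

```python
def stack_coo_slices(slices, n_rows, n_cols):
    """Stack 2-D COO slices into one 3-D (indices, values, dense_shape) triple.

    `slices` is a list of (rows, cols, data) triples, one per 2-D matrix.
    The helper name and input layout are assumptions made for illustration.
    """
    indices, values = [], []
    for k, (rows, cols, data) in enumerate(slices):
        for i, j, v in zip(rows, cols, data):
            if v != 0:  # keep only explicitly non-zero entries
                indices.append((k, i, j))
                values.append(v)
    # Sort into row-major order, the canonical ordering for sparse tensors.
    order = sorted(range(len(indices)), key=indices.__getitem__)
    indices = [indices[n] for n in order]
    values = [values[n] for n in order]
    dense_shape = (len(slices), n_rows, n_cols)
    return indices, values, dense_shape


# Two tiny 2x3 slices in COO form.
slices = [
    ([0, 1], [2, 0], [3, 1]),  # entries (0, 2) = 3 and (1, 0) = 1
    ([1], [1], [5]),           # entry (1, 1) = 5
]
indices, values, dense_shape = stack_coo_slices(slices, n_rows=2, n_cols=3)
```

With TensorFlow installed, the three results could be passed as the `indices`, `values` and `dense_shape` arguments of `tf.SparseTensor`.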

Mar 27 2018, 12:04 AM · Outreach-Programs-Projects, Discovery-Search, Google-Summer-of-Code (2018)

Mar 26 2018

Kdhingra2210 added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

Due to how much SERP_SIZE changes the dimensions, I'm thinking that we need to stay pretty much at 10. For new data I was referring to the amount of time we aggregate the click data over. Within Wikimedia we only keep the input data (click logs, queries, etc.) necessary for the machine learned ranking for 90 days. What I was thinking about here is that if we train a model against 90 days of aggregated data, save it somewhere, and then later want to run new data against it, does that new data need to be for the same number of days? Many of the individual values of the feature vector are counts of clicks over the aggregation period, so the scale of data might make a difference. I'm not sure this is the most important thing to figure out though; setting the aggregation period and keeping it constant is relatively easy.

Currently the model supports variable-length input (in terms of days). We can also save the trained model as a backup in case we ever have to restart the whole model.

Mar 26 2018, 11:57 PM · Outreach-Programs-Projects, Discovery-Search, Google-Summer-of-Code (2018)
Kdhingra2210 updated the task description for T190660: GSoC 2018 Proposal: [Wikipedia Search] Predict relevance of search results from historical clicks using a Neural Click Model.
Mar 26 2018, 8:16 AM · Discovery-Search, Google-Summer-of-Code (2018)
Kdhingra2210 claimed T190660: GSoC 2018 Proposal: [Wikipedia Search] Predict relevance of search results from historical clicks using a Neural Click Model.
Mar 26 2018, 8:15 AM · Discovery-Search, Google-Summer-of-Code (2018)
Kdhingra2210 moved T190660: GSoC 2018 Proposal: [Wikipedia Search] Predict relevance of search results from historical clicks using a Neural Click Model from Backlog to Proposals In Progress on the Google-Summer-of-Code (2018) board.
Mar 26 2018, 8:14 AM · Discovery-Search, Google-Summer-of-Code (2018)
Kdhingra2210 created T190660: GSoC 2018 Proposal: [Wikipedia Search] Predict relevance of search results from historical clicks using a Neural Click Model.
Mar 26 2018, 8:14 AM · Discovery-Search, Google-Summer-of-Code (2018)

Mar 25 2018

Kdhingra2210 added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

@EBernhardson for 3D sparse matrices, I looked through several posts and found that TensorFlow has a sparse input type that supports N dimensions, so we can pipeline the input to TensorFlow sparse directly.

Mar 25 2018, 12:26 PM · Outreach-Programs-Projects, Discovery-Search, Google-Summer-of-Code (2018)
Kdhingra2210 added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

Also, applying the model to datasets of different sizes without retraining can be done using dynamic LSTM cells (preferably GRU). I have worked on dynamic time series analysis, and it gives noticeably better outputs but needs relatively more training.

Since training time will likely be a constraint, we would need to analyze which one works better for this dataset. Here [1] it was concluded that both gating units perform better than a traditional RNN, which is expected. The paper further found that LSTM performed better on one dataset whereas GRU performed better on another. @EBernhardson could direct us from here.

[1] https://arxiv.org/pdf/1412.3555v1.pdf
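To illustrate why a recurrent cell can be applied to inputs of different lengths without changing its weights, here is a minimal pure-Python GRU sketch. It is not the project's model: the sizes, random weights and helper names (`init_gru`, `gru_step`, `run_gru`) are hypothetical, and it uses the common GRU formulation in which the update gate blends the previous state with a candidate state.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def init_gru(input_size, hidden_size, seed=0):
    """Create random weights for the update (z), reset (r) and candidate (h) parts."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for gate in ("z", "r", "h"):
        params["W" + gate] = mat(hidden_size, input_size)   # input weights
        params["U" + gate] = mat(hidden_size, hidden_size)  # recurrent weights
        params["b" + gate] = [0.0] * hidden_size            # bias
    return params


def gru_step(p, x, h):
    """Advance the hidden state h by one time step given input x."""
    z = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b + c) for a, b, c in
            zip(matvec(p["Wh"], x), matvec(p["Uh"], reset_h), p["bh"])]
    # Blend the old state and the candidate state using the update gate.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_gru(p, sequence, hidden_size):
    """Run the same cell over a sequence of any length; return the final state."""
    h = [0.0] * hidden_size
    for x in sequence:
        h = gru_step(p, x, h)
    return h


params = init_gru(input_size=3, hidden_size=4)
short_seq = [[1.0, 0.0, 0.5]] * 5   # 5 time steps
long_seq = [[1.0, 0.0, 0.5]] * 12   # 12 time steps, same weights
h_short = run_gru(params, short_seq, hidden_size=4)
h_long = run_gru(params, long_seq, hidden_size=4)
```

Both sequences yield a hidden state of the same size, which is what lets a single set of trained weights handle inputs covering different numbers of time steps.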

Mar 25 2018, 6:55 AM · Outreach-Programs-Projects, Discovery-Search, Google-Summer-of-Code (2018)

Mar 24 2018

Kdhingra2210 added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

@EBernhardson I ported this model to bare-bones TensorFlow. Though it doesn't support grid search or improve the time-distributed dense functions, it does support dynamic indexing over the time dimension.

Mar 24 2018, 1:07 PM · Outreach-Programs-Projects, Discovery-Search, Google-Summer-of-Code (2018)

Mar 22 2018

Kdhingra2210 added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

@EBernhardson Can you share what results you got with that algorithm? I was getting a fairly good confusion matrix.

The confusion matrices in your link look relatively similar to what I was getting. The confusion matrices are good for a quick look at how it's doing, but i think they sometimes make the results look better than they are. Precision, recall and F1 scores might be a reasonable way to look at the data. One of the strongest models in your link looks to be (roughly):

val_loss: 0.1629 - val_acc: 0.9404
[[255628   6692]
 [  6338  20592]]

This says in the ground truth there were 20592 + 6692 = 27284 clicks. The model predicted 26930 clicks.
This gives a few metrics:

  • precision: 20592 / 26930 = 0.765
  • recall: 20592 / 27284 = 0.755
  • We could calculate F[0.5, 1, 2] from that, but the numbers are pretty close so call it 0.76.

This is certainly not a bad baseline, but seems like there is plenty of room for improvement.
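The arithmetic above can be reproduced with a short sketch. It follows the reading used in the comment (second column sums to the ground-truth clicks, second row sums to the predicted clicks); that orientation is an assumption, and swapping it simply exchanges precision and recall. The helper `f_beta` is an illustrative name.

```python
# Confusion matrix as printed in the comment.
cm = [[255628, 6692],
      [6338, 20592]]

true_positives = cm[1][1]
# Orientation assumed from the comment: columns index ground truth,
# rows index predictions.
actual_clicks = cm[0][1] + cm[1][1]     # 6692 + 20592 = 27284
predicted_clicks = cm[1][0] + cm[1][1]  # 6338 + 20592 = 26930

precision = true_positives / predicted_clicks
recall = true_positives / actual_clicks


def f_beta(p, r, beta):
    """F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


scores = {beta: f_beta(precision, recall, beta) for beta in (0.5, 1, 2)}
print(f"precision={precision:.3f} recall={recall:.3f}")
for beta, score in scores.items():
    print(f"F{beta}={score:.3f}")
```

Because precision and recall are so close, all three F-scores land at roughly 0.76, matching the estimate in the comment.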

I thought it was the other way around, which is why I asked about your confusion matrix.

Mar 22 2018, 12:58 PM · Outreach-Programs-Projects, Discovery-Search, Google-Summer-of-Code (2018)

Mar 20 2018

Kdhingra2210 added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

@EBernhardson Can you share what results you got with that algorithm? I was getting a fairly good confusion matrix.

Mar 20 2018, 10:38 AM · Outreach-Programs-Projects, Discovery-Search, Google-Summer-of-Code (2018)
Kdhingra2210 added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

@Kdhingra2210 Great! I'm not really sure what the best entry point is. Without review on the data collection we have to mostly hope the aggregation I wrote is correct. I still put up a sample of the available click data on the public archive[1]. The data is perhaps deceptively small as it compresses very well. Contained there are around 900k observations (search sessions). The X matrix has an overall shape of (899310, 11, 11265), giving ~111 billion individual values. I wrote a README file that hopefully explains the data format. The data format can also be reviewed in the gerrit link above, which contains the code used to collect these sparse matrices.

[1] https://analytics.wikimedia.org/datasets/discovery/ncm-agg-clicks/
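As a quick check of the scale mentioned above, the following sketch multiplies out the stated shape and estimates what a dense copy would occupy. The 4 bytes per value (float32) is an assumption made for illustration; the comment does not state the stored dtype.

```python
# Shape of the X matrix as stated in the comment.
shape = (899310, 11, 11265)

total_values = 1
for dim in shape:
    total_values *= dim

# Assumption: 4 bytes per value (float32) if the matrix were stored densely.
bytes_per_value = 4
dense_gib = total_values * bytes_per_value / 2**30

print(f"{total_values:,} values, about {dense_gib:.0f} GiB dense at float32")
```

A dense array of that size would not fit in memory on typical hardware, which is why the data is kept and distributed as sparse matrices.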

Mar 20 2018, 4:57 AM · Outreach-Programs-Projects, Discovery-Search, Google-Summer-of-Code (2018)

Feb 24 2018

Kdhingra2210 added a comment to T186742: Predict relevance of search results from historical clicks using a Neural Click Model.

Hi @EBernhardson, I found this project through GSoC 2018 and am interested in it. I have gone through the research paper and would like to work on this. I have already worked on time series models.

Feb 24 2018, 8:55 PM · Outreach-Programs-Projects, Discovery-Search, Google-Summer-of-Code (2018)