===Profile Information
**Name**: Karan Dhingra
**IRC**: kdhingra307
**Github**: https://github.com/kdhingra307
**Resume**: [[ https://drive.google.com/file/d/1POELxrhPLXr4wVFhWxjvDDUwCwGXd9BN/view?usp=sharing | karan_dhingra.pdf ]]
**Location**: New Delhi, India
**Working hours**: 12:00 to 21:59 (UTC+5:30)
===Synopsis
Click models are algorithmic approaches that help in understanding the relevance of documents for a given query by modeling clicks on search results. Currently, Wikimedia Search uses a Dynamic Bayesian Network [DBN], a probabilistic graphical model, to model these clicks. An alternative, the Neural Click Model [NCM], has been proposed; it not only seems to be more accurate than DBN but also provides a way to feed in semantic features in addition to click data. This project is about implementing, testing and analyzing NCM to verify whether it provides any computational or prediction benefits over the current model, and finally integrating it into the Mjolnir library.
- **Mentors**: @EBernhardson and @TJones
- I have been discussing the approaches and methodologies to be followed with the mentors and the community on Phabricator, and have provided different patches through external links.
===Timeline
|Period | Task
| --- | ---
|23rd April - 14th May| **Community Bonding**: discussion with mentors and the community regarding requirements and approach for the project; revising, fixing and refactoring existing code.
|14th May - 21st May| Providing support to create and access **3D sparse matrices**, with functionality to load and store them serialized to file.
|21st May - 28th May| Development of the **core architecture**, building on the patches already introduced in the thread.
|28th May - 11th June| Implementing the **input normalization function** and tuning hyper parameters.
|11th June - 15th June| **Testing** the results obtained on Machine Learning Ranking models (NDCG).
|15th June - 21st June| **Integration of the architecture in the Mjolnir library**, including a wrapper for input of data directly from the **HDFS file system**.
|21st June - 30th June| A/B testing on already available data for the majority of test parameters.
|30th June - 14th July | Normalization functions for converting probabilistic outputs to labels and for the input search query.
|14th July - 21st July | Retesting the whole model over single and multiple wikis.
|21st July - 30th July | Dynamicity over SERP_SIZE.
|30th July - 5th August | Wiki, test cases and documentation (if remaining) for the whole code base.
|5th August - 1st September | Retesting the whole model to analyze the performance improvements by varying SERP_SIZE.
-- **Providing support to create and access 3d sparse matrices**: This model requires the input labels to be 3-dimensional, but scipy does not support 3D sparse matrices. So the data has to be stored in 2D, converted from sparse to dense batch by batch, and finally transformed to 3D. The task would be to make this implicit so that the learning engine can directly access the input matrices without any manual transformation (a minimal sketch of this batch-wise conversion is given after this list).
-- **Implementing Input Normalization function and tuning hyper parameters**: The approach would be to start tuning hyper parameters using grid search and to work on the normalization function while the grid search is running in the background, to save time. Normalization is required because the click counts generated while creating the training data vary with the length of the time window they are collected over.
-- **Integration of the architecture in Mjolnir library**: In this task, the major work would be to write a wrapper that can convert input data from SQL to the 3D sparse format, and to integrate with the data_pipeline.
-- **A/B testing on already available data for the majority of test parameters**: A/B testing was already done last year over the DBN model, so if that data is still available we can compare the proposed model and DBN directly for most of the test cases.
-- **Dynamicity over SERP_SIZE**: SERP_SIZE is the number of search results generated per page. It is deeply interlinked with the input data, which is why I have placed this task last in the timeline. Dynamicity over SERP_SIZE would give us a generalized model and would also let us predict the optimal SERP_SIZE. It would require remodeling the input data and the way the architecture feeds the sequential units (see the padding sketch after this list).
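The following is a minimal sketch of the batch-wise conversion described in the 3D sparse matrices task. The function name, shapes, and the assumption that the flattened labels live in a single scipy CSR matrix are mine for illustration only; the actual layout inside Mjolnir will be decided with the mentors.

```
import numpy as np
from scipy import sparse


def iter_dense_3d_batches(labels_2d, serp_size, label_width, batch_size=128):
    """Yield dense 3D batches from a flattened 2D sparse matrix.

    labels_2d   : scipy CSR matrix of shape (n_sessions, serp_size * label_width),
                  i.e. per-session labels stored flat because scipy has no 3D
                  sparse format.
    serp_size   : number of results per page (second axis of the 3D tensor).
    label_width : number of label columns per result (third axis).
    """
    n_sessions = labels_2d.shape[0]
    for start in range(0, n_sessions, batch_size):
        chunk = labels_2d[start:start + batch_size]      # still sparse, cheap row slice
        dense = np.asarray(chunk.todense())              # densify only this batch
        yield dense.reshape(-1, serp_size, label_width)  # (batch, serp_size, label_width)


# Hypothetical usage: 1000 sessions, SERP_SIZE = 20, 2 label columns per result.
if __name__ == "__main__":
    flat = sparse.random(1000, 20 * 2, density=0.05, format="csr")
    for batch in iter_dense_3d_batches(flat, serp_size=20, label_width=2):
        assert batch.shape[1:] == (20, 2)
```

Making this iterator the default access path would hide the 2D storage detail from the learning engine, which only ever sees dense 3D batches.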
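For dynamicity over SERP_SIZE, one possible approach (a sketch under my own assumptions, not a settled design) is to pad each result list to the longest SERP in a batch and carry a boolean mask, so that the sequential units and the loss can ignore the padded positions.

```
import numpy as np


def pad_serps(serps, pad_value=0.0):
    """Pad variable-length SERPs to a common length and build a mask.

    serps : list of 2D float arrays, one per query, each of shape
            (n_results_i, n_features) where n_results_i may differ per query.
    Returns
        padded : (n_queries, max_len, n_features) array, padded with pad_value
        mask   : (n_queries, max_len) boolean array, True for real results
    """
    max_len = max(s.shape[0] for s in serps)
    n_features = serps[0].shape[1]
    padded = np.full((len(serps), max_len, n_features), pad_value, dtype=np.float32)
    mask = np.zeros((len(serps), max_len), dtype=bool)
    for i, s in enumerate(serps):
        padded[i, :s.shape[0]] = s
        mask[i, :s.shape[0]] = True
    return padded, mask


# Hypothetical usage: two SERPs with 3 and 5 results, 4 features each.
if __name__ == "__main__":
    serps = [np.random.rand(3, 4), np.random.rand(5, 4)]
    padded, mask = pad_serps(serps)
    assert padded.shape == (2, 5, 4) and mask.sum() == 8
```

The mask would be passed alongside the padded tensor so that the GRU steps and the evaluation metrics only count real results, which is what would allow a single trained model to serve different SERP sizes.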
===Participation
Since the original Mjolnir is present on both GitHub and Gerrit, I can work on either platform. I would prefer to fork the repository on GitHub and merge my changes whenever my mentors approve them. I can maintain a PR or add the mentors to the forked repository so that they can provide insights and review the code.
Communication is a key aspect of the success of any project. I can use whichever platform the mentors prefer, as I am available on most of them. For community bonding, I would use Phabricator to discuss with other members.
I would follow proper documentation style and coding standards (to be discussed with the mentors during the community bonding period), and I would prefer to write blog posts about my work after every evaluation (rather than weekly).
===About Me
I am a final-year undergraduate at the University of Delhi, studying "Information Technology and Mathematical Innovation"; my majors are in mathematics and computer science. I was aware of GSoC last year but had prior commitments; this year I will be free from the first week of May, which allows me to devote 100% of my time to this project. While applying to GSoC and choosing which project to work on, my focus was to work in my current domain of specialization (Human-Computer Interaction). In the time I have spent working on this proposal I have been drawn to the community, and I am applying for this project only.
During my last internship, I worked on a similar kind of project, understanding user behavior from keystroke dynamics using time series analysis alone, and I was very surprised with the results I got. With this project, I would get an opportunity to see the impact of similar models on a massive scale.
===Experiences
**Keystrokes Dynamics Using Computational Intelligent Methods**
The main purpose of this project was to understand user behavior and use it as a secondary measure of authentication; keystrokes are not very popular as an authentication measure because of accuracy and privacy concerns. An architecture based on time series analysis of keystrokes and semantic analysis of the user's input was designed.
My Contributions:
-- Designed the architecture to handle inputs of dynamic sequence length using GRU cells.
-- Performed semantic analysis using a Multilayer Perceptron with a mathematical normalizer for ranking the vocabulary.
**Emotion Recognition from Speech**
This project involved the analysis of speech using MFCCs and deep neural networks to predict emotion at the segment level.
My Contributions:
-- Developed a model to generate MFCC feature sets from raw audio and feed them into a deep neural network for training.
-- Designed a feedback loop for automatic tuning of hyperparameters.
===Contributions
I have recently started contributing to open source and am working on handling large-index sparse matrices in scikit-learn, though it is not a major PR. I got introduced to Wikimedia and Phabricator recently through this project and have created different patches.
All of the external patches have been shared in this [[ https://phabricator.wikimedia.org/T186742 | thread ]]. You can also go through my GitHub page, but I cannot share my major projects publicly; I can share them with the mentors or specific people if asked.