Page MenuHomePhabricator

Implement the fuzzy search
Closed, ResolvedPublic


Current code already provides category suggestions as the user types, via AsyncTask with a call to the MediaWiki Commons API using the letters typed as a prefix. We intend to add Method A calls that would run in parallel with the existing prefix method.


  1. Use Utils.executeAsyncTask to run method A in parallel with prefix method
  1. Store categories in LinkedHashSet to preserve order (method A results should be shown above prefix results) -
  1. Display the results returned by Method A in the UI as soon as it returns. Use onPostExecute() for this purpose.
  1. The prefix task waits until method A is done before it shows its results, to prevent misclicks. Use CountDownLatch -

Update: Code appears to work on real device & emulator tests. Pushed release to Google Play.

Event Timeline

josephine_l claimed this task.
josephine_l raised the priority of this task from to Medium.
josephine_l updated the task description. (Show Details)
josephine_l moved this task from Backlog to Week-size tasks on the Commons-App-Android-Upload board.

@Niedzielski @Nicolas_Raoul - I'm encountering some issues with structuring the parallel methods. The problem is that CategoriesUpdater, which extends AsyncTask for the existing prefix method, is huge and bloated and contains a lot of extra logic that isn't relevant to the as-you-type category suggestions. If I want another task to run in parallel with CategoriesUpdater, it seems like I would need to create an additional class which extends AsyncTask, and duplicate most of the logic found in CategoriesUpdater?

I could try to refactor CategoriesUpdater entirely, but that seems like a pretty big undertaking. If I did refactor, what would be the best way to structure CategoriesUpdater? Or can we go with the duplicate logic for now?

I am not against refactoring in principle, Stephen what do you think?

When you say duplicate logic, I'm not sure quite which parts you mean. It's pretty massive, so a copy and paste of that AsyncTask wholesale would be a detriment to the codebase. Ideally, the codebase gets better with each commit. If you're unable to refactor, please do comment any code that is duplicated. Actually, just yesterday I noticed one of my peers made a change to some code that was even commented that it must be kept in sync with another method and failed to do so.

By the byte, on the subject of refactoring, I stumbled upon this talk[0] yesterday. It's not perfectly applicable here, but it is some good food for thought and even mentions the case of duplicate logic vs the wrong abstraction.


@Niedzielski - The existing CategoriesUpdater asynctask seems to contain the code for displaying the recent cats (+gps cats that I added) when the text field is empty, as well as the code for displaying the as-you-type category suggestions (which is the part that Method A actually changes).

I think the simplest way to refactor it would be to extract out some of the code blocks as separate methods that could be called from both parallel tasks, which would look neater but would still end up with, for instance, the database being queried twice for recent categories. Is this okay? But if I only do the recent cat query with CategoriesUpdater and not MethodAUpdater, would that negatively impact the behaviour of the app?

@josephine_l, small reusable methods sounds like a great starting point! We can work from there as needed.

Thanks @Niedzielski ! I've tried to do that - my current implementation is at .

I haven't implemented the LinkedHashSet yet though, still thinking on how to do that. A question I've always wondered about - if I'm using a LinkedHashSet to store the results from two inner classes, where should I call the new LinkedHashSet() constructor? In onCreate()? Or in the member variables at the top of the outer class?

Failed attempt at adding the items to LinkedHashSet and displaying them - . Having problems with adding the items from the two parallel tasks to a Set and then to the adapter. The code at that branch does not display anything when the user types in the text field, even though the logs show that CategoryItems are being added to the Set.

I would initialize the the LHS at declaration or in the constructor of the outer class. The inner classes can be static (or in their own files) and just a return a List or their own LHS in their onPostExectues. Then you just subclass them wherever they're newed up to override the onPostExecutes and update the outer class' LHS. I'll follow up on your patch.

Hi @Niedzielski,

Sorry, just a bit confused - do you mean that I should make a file and a file, and put most of the current code in those? And then in I should make a

private class PrefixUpdaterSub extends PrefixUpdater

(and similarly for MethodAUpdaterSub), where I just call the super onPreExecute() and doinBackgoround() methods, and override the onPostExecute() to add code that updates the outer class's LHS. Is my interpretation correct? If so, could I ask what the benefits of doing this is, compared to just keeping the inner classes the way they are? I don't disagree, just trying to understand the reasoning so I can learn from it. :)

Also, if we do this, how will the etc work without the member variables in CategorizationFragment that they used to have access to when they were inner classes? For instance:

private class MethodAUpdater extends AsyncTask<Void, Void, ArrayList<String>> {

    private String filter;
    protected void onPreExecute() {
        filter = categoriesFilter.getText().toString();


Does that mean that we will need to pass categoriesSkip, categoriesFilter etc to them as parameters in their constructor?

@Niedzielski @Nicolas_Raoul - My second failed attempt at linked-hash-set branch:

  1. I put PrefixUpdater and MethodAUpdater in their own files as recommended. Passed a CategorizationFragment object in as a parameter to their constructors and that seems to work fine for accessing member variables of CategorizationFragment.
  2. Tried to find a way to add items found in the 2 AsyncTasks to a Set in CategorizationFragment and do further results processing after their onPostExecute()s. Unfortunately I can't find a clear explanation of where the control goes after the AsyncTasks onPostExecute()s are done. Usage of AsyncResponse delegates in this SO thread seemed to be my best bet so I tried that and it seems to work, sort of.
  3. Encountering problems with processing the results because the AsyncTasks return THREE types of results: recent + GPS cats if text field is empty, cached items if user types in a string that is found in cache, and the results of the API search if the user types in a string that is not found in cache. When adding to our LHS we only want the 3rd type of result. So I changed the return type of the AsyncTasks to Pair where I pass in Pair<String, List<String>> so that I can choose to only add cats found to the LHS if the 3rd type of result is found. I have yet to find a way to do the appropriate things with the recent + GPS cats and the cache cats.
  4. Another issue - my LHS currently keeps accumulating the results of all the strings typed in. So if I type in 'latte' then it correctly accumulates all the results of Prefix and MethodA 'latte's... but then if I type in 'cup' later on it accumulates that along with the results from 'latte'... and so on and so forth. How do I make the Set empty itself each time the user types something new?

I read your comment on my patch, Niedzielski, thanks. But adding the items to the Adapter will have to come later after I figure all of the above out I guess.

I'm not sure I fully grok the questions :) I think Nicolas had something like this in mind:

class Fragment {
    private final Set<SearchResult> results = new LinkedHashSet<>();

    private void requestSearchResults() {
        final CountDownLatch latch = new CountDownLatch(1);
        execute(new PrefixSearchTask() {
            void doInBackground() {

            void onPostExecute(ResultSet result) {


        execute(new MethodASearchTask() {
            void onPostExecute(ResultSet result) {



Does that make sense? The meat of the Task code still lives in separate files but in the Fragment, you override the methods you need for specialized behavior when you create an instance.

@Niedzielski, thanks for the explanation. I think I went a bit far down the rabbit hole in my last branch so I made a fresh start with another branch, subtask-async. Pull request here.

I tried to follow your structure, with a few tweaks to get it to work with the existing codebase. The problem I'm encountering now is:

  1. This structure makes use of the anonymous inner class to override the methods when creating an instance, right? But since it's anonymous I think I can't use it with this part of the existing codebase:
if (prefixUpdater != null) {

But this part is quite important since without it, the API gets queried for every single keystroke (so even if you type LATTE quickly it searches for 'L', 'LA', 'LAT', etc, displaying the results for each one in turn). Before I understood that you were talking about using the anonymous inner class, I used actual subclasses instead which did work with this part of the code but threw InterruptedException errors (see earlier commit).

  1. Also, the results in the ListView seem to be randomly shown from either the Method A results or the Prefix results. I tried fiddling around with the Adapter code but to no avail.

@josephine_l, I'm a big believer in scrapping drafts an starting over. It's a great way to leverage your previous experience in solving a problem with none of the missteps.

I left a couple comments on your PR. Regarding #1, I'm not sure I fully understand you. Maybe the confusion is because you're trying to hold references to actual implementations of PrefixSearchTask anonymous subclasses, in this case, instead of a more abstract or interface type, AsyncTask, in this case. AsyncTask has a cancel[0] method. You can set prefixUpdaterSub / methodAUpdaterSub to their anonymous type where you new them up, e.g.::

prefixUpdaterSub = new PrefixUpdater(this) {
    protected ArrayList<String> doInBackground(Void... voids) {
        ArrayList<String> result = new ArrayList<String>();
        try {
            result = super.doInBackground();
        catch (InterruptedException e) {
            Log.w(TAG, e);
        return result;

    protected void onPostExecute(ArrayList<String> result) {

        Log.d(TAG, "Prefix result: " + result);

Regarding #2, you update results but I don't think you actually update the adapter's data. It looks like this code maybe lives in the new setCatsAfterAsync() method and you just need to call it towards the end PrefixUpdater.onPoseExecute().

Thanks @Niedzielski ! I had no idea we could do that with the anonymous types. :) I changed some of the code according to your feedback, and it seems to work now, yay! Updated the pull request.

josephine_l updated the task description. (Show Details)
josephine_l moved this task from Doing to Done on the Commons-App-Android-Upload board.

I am testing fuzzy search in 1.8 and here are a few observations. I don't know exactly what should work and what is too much to ask:

"japanese Buildings construction" correctly finds "Building construction in Japan"

Room for improvement:
"Buiding construction" does not find "Building construction"
"Building construction in japa" does not find "Building construction in Japan"
"Building construction of Japan" does not find "Building construction in Japan". It finds "Construction in Japan" and "Buildings in Japan" though. Removing "of" leads to the correct result.

Hi @Nicolas_Raoul

Is it okay if I copy your comment over to T119287 and we can discuss it there? Since this is feedback from manual testing. :)

Ah sorry that's the best place indeed! I copy it there.