I will wait until the implementation is close to being merged before doing this, as there's a small chance that the engineers could encounter issues that require changing the schema.
May 5 2021
May 4 2021
All merged now!
May 3 2021
To check the possibility that KaiOS app pageviews were not being recorded properly in the pageviews_hourly stream, @SBisson used the app to view the Ugandan recommendations from Canada over the weekend.
Apr 29 2021
Apr 27 2021
I've completed the check; you can check out my notebook for the full details.
Apr 23 2021
Apr 21 2021
Apr 19 2021
I've emailed Legal with the question.
Pulling this data is extremely easy. The main issue is the uncertainty about how we can share it internally—see the extensive discussion in T271202. I will check with Legal this week.
For what it's worth, I would recommend not reporting the opt-out rate for registered users at all. Since most of those users are long vanished, it's not really a fair metric. You've already dealt with that by providing a better metric (opt-out rate for active users), but in this case, it's so much better that I think it should just stand on its own.
Apr 18 2021
Apr 17 2021
This is now in Legal's queue. I requested that they do it by 30 April; they said they'll do their best but can't commit. I've offered to do anything that would help expedite the process, such as briefing one of their team members face-to-face.
Apr 15 2021
In T278597#7002929, @DAbad wrote: So we have two time_stamps: Server_dt and Client_dt. We should validate using the Server_dt to validate if the client_dt is off and then determine how to handle it...
After speaking with Jason and Michael today, we likely don't need milliseconds, and if a timestamp has no fractional seconds, we will interpret it as a whole number of seconds.
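As a rough illustration of the check discussed above, here is a minimal Python sketch that compares a client timestamp against the server timestamp and accepts values with or without fractional seconds. The function names, the ISO 8601 input format, and the five-minute tolerance are all assumptions made for this example; none of them come from the schema or the task discussion.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance; the discussion does not specify one.
MAX_SKEW = timedelta(minutes=5)


def parse_dt(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, with or without fractional seconds.

    A timestamp without a fractional part is read as a whole number of
    seconds (microsecond == 0).
    """
    # Replace a trailing "Z" so datetime.fromisoformat accepts it on
    # Python versions before 3.11.
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        # Assumption for this sketch: naive timestamps are UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def client_dt_is_plausible(server_dt: str, client_dt: str,
                           max_skew: timedelta = MAX_SKEW) -> bool:
    """Return True if client_dt is within max_skew of server_dt."""
    return abs(parse_dt(server_dt) - parse_dt(client_dt)) <= max_skew
```

How to handle an implausible client_dt (drop the event, fall back to server_dt, or flag it) is left open here, as it is in the quoted comment.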
The schema draft and diagram have been fully updated with the changes I made in response to the engineering review. There should be no more changes unless I get high-priority feedback or something changes in the product itself.
Apr 14 2021
In the end, we didn't need this task. We went with an announce-the-switch-then-test model rather than the other way around.
I discussed the schema some time ago with @ngkountas and @santhosh. They gave me some helpful suggestions and said it looked good otherwise. I've just finished adjusting the draft to incorporate their suggestions.
In T280178#7000128, @SBisson wrote: I understand it's early but given the early numbers you shared, ~3.5k devices per group per day, we could start seeing a difference between the groups.
I've put in the request to Legal.
Apr 5 2021
Apr 1 2021
In T277348#6959373, @SNowick_WMF wrote: will try the spark-sql query next time I need to run these queries.
The job has now run correctly for two days in a row, so I think we can now rely on it for the experiment.
Mar 31 2021
In T216294#6957007, @elukey wrote: Neil I think that this can be closed, what do you think?
In T275212#6958479, @razzi wrote: @nshahquinn-wmf any luck with that setting?
It looks like the job ran just fine today! I'll check again tomorrow before closing this.
Mar 30 2021
Okay, I believe I've finally got everything set up properly.
If creating this list is too difficult or expensive, a ranking of wiki projects by monthly active editors would be good enough as well. What matters to us is the ranking of projects, not the exact number of counted editors in each project.
Mar 24 2021
I've discussed this with @Pginer-WMF: we do want to keep this job running properly, but I'll do it after T254891/T254891.
Mar 23 2021
This has also caused T275233.
Mar 22 2021
@Ottomata and @JAllemandou, thank you very much for investigating this!
Mar 19 2021
The job is running again, but the daily pageview counts generated by the new job are only 20% of the ones generated by the old job, even though I didn't make any substantive change to the query. I'll need to figure this out if we want to keep producing meaningful data.
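One way to narrow down a discrepancy like this is to compare the two jobs' outputs day by day and see whether the ratio is constant or varies. The sketch below assumes each job's output has already been loaded into a dict mapping a date string to a pageview count; the function names, the 5% tolerance, and the sample numbers are hypothetical and only serve to illustrate the comparison.

```python
def daily_ratios(old_counts: dict, new_counts: dict) -> dict:
    """Return {day: new / old} for days present in both outputs.

    Days where the old count is zero are skipped to avoid dividing by zero.
    """
    return {
        day: new_counts[day] / old
        for day, old in old_counts.items()
        if day in new_counts and old > 0
    }


def flag_discrepancies(old_counts: dict, new_counts: dict,
                       tolerance: float = 0.05) -> list:
    """List the days where the new count differs from the old by more than tolerance."""
    ratios = daily_ratios(old_counts, new_counts)
    return sorted(day for day, ratio in ratios.items() if abs(ratio - 1) > tolerance)


# Made-up numbers, for illustration only.
old = {"2021-03-17": 1000, "2021-03-18": 1200}
new = {"2021-03-17": 200, "2021-03-18": 1190}
print(flag_discrepancies(old, new))  # ['2021-03-17']
```

A ratio that is roughly the same every day would point toward a systematic cause such as a filter or a join, while a ratio that varies would suggest missing input data on particular days.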
Yes, that looks right to me! Great work figuring this out yourself; if you ever get tired of product management, you can certainly switch to data science 😊
In T271962#6928254, @Samwalton9 wrote: We weren't sure if CentralAuth counts deleted edits or not. From some testing it doesn't seem clear - the numbers don't match total edits with or without deleted contribs, so I think it's being calculated differently. In my query, looking at one test user with ~9000 edits, keeping deleted contribs gave a number that was close - but not exactly - the figure CentralAuth has. Removing deleted contribs gave a substantially different number. I'll move forward including deleted edits, since this is going to count more users rather than less, which is I think preferable.
Mar 18 2021
Mostly done now (even if there's less point because of T277781); raising to high because (more) data loss is imminent.
This seems high priority to me; please change if you disagree.
Just to make sure @Samwalton9's question gets looked at 😊
Legal has finished their work, so my role as shepherd and consultant is finished too.