@Lea_WMDE Oh, I see... at this point, I can't find a fix. What the dashboard is showing is essentially correct (N/A; in fact, there are no data points for the respective singlestat). I've tried value mappings, with no change. Give me some time to browse the docs.
@Lea_WMDE Please clarify: what do you mean by 1-value displays? I will see to fixing it; just let me know which display you mean.
@Lea_WMDE This is fixed now: "FileImporter Pageviews and ImportSuccess do not show 0 views/imports, which makes the graph look a bit weird."
@Lea_WMDE Re-scaled to MBs. Please take a look at the Dashboard and let me know whether the numbers in the last row now make sense.
FileImporter Pageviews and ImportSuccess do not show 0 views/imports, which makes the graph look a bit weird
I can pick this one up if someone can provide me with an introduction to Lua module usage in Wikimedia projects.
@Lea_WMDE Please take a look at https://grafana.wikimedia.org/dashboard/db/mediawiki-fileimporter
The Exploratory Data Analysis visualizations are found in the respective Exploratory Data Analysis sections for the Donations and Membership data sets.
Wed, Jun 20
- Lexemes moved to the right Y-axis, alongside properties.
Membership Data Set, campaign-wise analyses
- Lexeme usage is now present at https://grafana.wikimedia.org/dashboard/db/wikidata-datamodel (note: the data has just started to roll in, so you are looking for a point in the lower right corner of the graph).
- We need to understand how to track forms. Will forms have a namespace?
Tue, Jun 19
Campaign-wise analysis (i.e. all the data on donation amounts and opting in analyzed together, regardless of the particular campaign from which they originated):
Thank you for your comments!
Mon, Jun 18
You can track the development of this dashboard at: https://grafana.wikimedia.org/dashboard/db/mediawiki-fileimporter
@Jan_Dittrich Yes, that would be the conclusion.
Just another side note: @Tobi_WMDE_SW is my engineering manager for non-Wikidata related tasks, and that is the reason why he is always poked - namely, he (successfully) takes care of most of my priorities.
@Tobi_WMDE_SW the " I'm sorry very much" part is completely unnecessary. In analytics, it happens every day: complicated data sets, many variables under consideration... Catching it is what is important, and we caught it.
@Tobi_WMDE_SW You say "The reason to say the old landing page underperforms is that certain other perfomance indicators perform worse: the rate of donations without adress ("anonymous") is increased, the rate of email opt-ins is decreased, the rate of female donors (=female salutations) is decreased."
Sun, Jun 17
Gentlemen, all that I can provide in this respect is found at T195242.
Here's a detailed technical report. Please let me know if you need any additional sections (I didn't bother to visualize the differences between group means for the t-tests, as I find that trivial; if you wish to take a look at the distributions per campaign - just let me know).
Sat, Jun 16
- Finally, the analysis of the opt_in variable in the Donations data set:
- binary logistic regressions were run, one model per campaign, with one categorical predictor (the campaign A/B condition, of course) and opt_in as the dependent variable;
- only for mob05-ba-171218 do we obtain no effect; for all other campaigns, the new page actually lowers the odds of opting in significantly (an R sketch of these analyses follows after this list).
- The analysis of the Membership data set relied on non-parametric tests only (the Mann-Whitney U test), because
- we are left with really small sample sizes per campaign after the data clean-up
- (in which case we would need approximately normal distributions in the samples themselves to ensure the validity of t-tests, and I don't think we would have any luck in showing that).
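A minimal sketch of both analyses in R. The data frame, vectors, and column names below (donations, page_version, opt_in) are hypothetical stand-ins for the actual cleaned data sets, not the production code:

```
# Hypothetical stand-in for the cleaned Donations data:
set.seed(42)
donations <- data.frame(
  campaign     = rep(c("c1", "c2"), each = 200),
  page_version = factor(rep(c("old", "new"), times = 200),
                        levels = c("old", "new")),  # 'old' as reference level
  opt_in       = rbinom(400, 1, prob = 0.3)
)
# One binary logistic regression per campaign: the A/B condition is the
# only (categorical) predictor, opt_in (0/1) is the dependent variable.
models <- lapply(split(donations, donations$campaign), function(d) {
  glm(opt_in ~ page_version, data = d, family = binomial(link = "logit"))
})
# Odds ratios: exp(coef) < 1 for page_versionnew means the new page
# lowers the odds of opting in relative to the old page.
lapply(models, function(m) exp(coef(m)))
```

For the Membership data set, the non-parametric comparison reduces to a single call per campaign (again with made-up numbers, chosen to illustrate the small sample sizes):

```
# Hypothetical yearly membership fees for one campaign:
old <- c(24, 60, 12, 120, 36, 48)  # 'old' landing page
new <- c(18, 30, 25, 66, 15)       # 'new' landing page
# Mann-Whitney U test (implemented as wilcox.test in R):
wilcox.test(old, new, alternative = "two.sided")
```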
Fri, Jun 15
- We can conclude with certainty that there was no old page vs. new page effect in any of the campaigns in the Donations data set.
- Thus far, I have:
- (a) conducted t-tests (independent-samples and Welch),
- (b) tested for power-law behavior in the distributions of donation amounts, to see whether the sampling distributions of the means can be assumed to be normal (75% of the time, the answer is yes, so t-tests should do fine),
- (c) run the non-parametric Mann-Whitney U test to confirm the findings, and
- (d) repeated the t-tests and the Mann-Whitney tests after removing extreme outliers (> 3*IQR) from the tail of the distributions (i.e. very large donations).
- Converging methodologies provide the same result, unequivocally, for all campaigns: no effect (a condensed R sketch of this battery of tests follows below).
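To make the workflow concrete, here is a condensed, hedged sketch for a single campaign; the donation vectors are simulated for illustration only, and this is an assumed reconstruction of the steps, not the exact production code:

```
# Hypothetical donation amounts for one campaign (heavy right tail):
set.seed(7)
old <- round(rlnorm(150, meanlog = 3, sdlog = 1), 2)  # 'old' landing page
new <- round(rlnorm(140, meanlog = 3, sdlog = 1), 2)  # 'new' landing page

# (a) t-tests: independent-samples (pooled variance) and Welch:
t.test(old, new, var.equal = TRUE)  # independent-samples t-test
t.test(old, new)                    # Welch's t-test (R's default)

# (b) is omitted here: e.g. the poweRlaw package can fit the tail and check
#     whether the implied variance is finite, in which case the CLT licenses
#     treating the sampling distribution of the mean as normal.

# (c) Non-parametric confirmation (Mann-Whitney U):
wilcox.test(old, new)

# (d) Repeat after removing extreme outliers (> 3*IQR above Q3):
trim_tail <- function(x) x[x <= quantile(x, 0.75) + 3 * IQR(x)]
t.test(trim_tail(old), trim_tail(new))
wilcox.test(trim_tail(old), trim_tail(new))
```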
Thu, Jun 14
- First results for the Donations data set, using parametric tests (t-tests): the 'old' vs. 'new' comparison shows no effect in any of the campaigns.
- Moreover, TOST tests of equivalence (@Jan_Dittrich - thanks) show that none of the effects is significantly larger than the minimal assumed effect (a sketch follows after this list).
- However, non-parametric tests must still be conducted, given that (most probably) the distributions here do not have finite variances in the first place.
- Donations data set is now clean, re-scaled to yearly amounts, and under analysis.
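One way to run such an equivalence test in R is the TOSTER package; the summary statistics below are invented, and TOSTtwo is just one plausible entry point, so treat this as a hedged sketch rather than the code actually used:

```
library(TOSTER)  # equivalence testing via two one-sided tests (TOST)

# Invented summary statistics for one campaign ('old' vs 'new' page):
TOSTtwo(
  m1 = 21.4, sd1 = 35.2, n1 = 312,  # 'old' landing page
  m2 = 20.9, sd2 = 33.8, n2 = 298,  # 'new' landing page
  low_eqbound_d  = -0.3,            # smallest effect size of interest
  high_eqbound_d =  0.3,            # (Cohen's d bounds)
  alpha = 0.05
)
# If both one-sided tests are significant, the observed effect is
# statistically equivalent to zero within the assumed bounds.
```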
Wed, Jun 13
We do need the number of editors who reached their 10th edit in 2018 AND registered via one of our campaigns.
- 15 users who have registered via the respective campaigns reached their 10th edit in 2018.
- Merged https://gerrit.wikimedia.org/r/#/c/analytics/wmde/WDCM/+/440112/.
- Code deployed; wb_terms usage is now completely phased out from the WDCM workflow.
- all tests successful; deploying the code to run from CloudVPS only.
- Experimental code in place and tested; good results.
- Running a test update now (CloudVPS component only).
Tue, Jun 12
@Addshore After reconsidering this, I have to state openly that I am against relying on JSON dumps as the only source of data.
Mon, Jun 11
@abian @Lydia_Pintscher KNIME integrates nicely with R and Java. Given that we already use R extensively to analyse Wikidata, it could be possible to build a set of R-developed Wikidata nodes for KNIME, I guess. If it is ETL only that you need, orchestrating SPARQL and API calls from within R and then integrating with KNIME seems feasible (note: it only seems feasible at this point; a small sketch follows below).
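As a hedged illustration of what "orchestrating SPARQL from within R" could look like - here via the WikidataQueryServiceR package; whether this is the right building block for KNIME nodes is exactly what would need exploring:

```
library(WikidataQueryServiceR)  # thin R client for the Wikidata Query Service

# Example: ten items that are instances of 'cat' (Q146), with English labels:
res <- query_wikidata('
  SELECT ?item ?itemLabel WHERE {
    ?item wdt:P31 wd:Q146 .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  } LIMIT 10
')
head(res)  # a plain data.frame, easy to hand to KNIME through its R nodes
```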
Mon, May 28
I am on it.
- I've seen a lot of data re-engineering going on around the wb_terms SQL table recently.
- The WDCM system also experiences some problems when fetching from this table (check the WDCM dashboards and you will spot too many missing labels for quite frequently used items, while the WDCM R code itself didn't change).
- As soon as I'm done with Cognate/Wiktionary (T166487), I will take a look at the state in which the wb_terms table is found at the present moment, and then suggest a solution for this.
- This is not the first time that I am about to state the following (c.f. T193969#4185028): we are obviously in the Big Data segment here, which implies that SQL must go - it was simply not meant for this.
- Stay tuned.
@Lydia_Pintscher Done. Please check the dashboard.