deliver test report from real wiki
Closed, Resolved, Public

Description

  1. Check if the setup from T174952 is applicable on beta-wiki
  2. Create a test report for 2 days.

The test will be executed by 22.09. at the latest.
The start and end of the tracking period will be posted here ASAP.

@GoranSMilovanovic
Could you do that task?

Event Timeline

@Stefan_Schneider_WMDE One thing only, please:

Any chance you could give me a bit more than 24h to complete the test report? Please, with sugar on top?

If we do want the tracking code to go to prod earlier we can do so, it's just a little bit of extra work / involvement!

...
The final version should be live on beta in the next 30 mins

Oh, I had understood that the hack would be live on beta earlier. If not, delivering the report by the 26th at the latest would be fine too, so that any mistakes can be corrected.

Stefan_Schneider_WMDE renamed this task from deliver test report from beta-wiki to deliver test report from real wiki. Sep 18 2017, 1:13 PM

Hi @GoranSMilovanovic

We want to test the tracking of the following data:

  • banner clicks / page views
  • registrations (per each banner)
  • guided tour shown (yes/no)

You can find the respective campaign tags here: T175912#3617532.

... and we will click today and maybe tomorrow, to create some test data.

@Stefan_Schneider_WMDE

Please find the test report attached.

Do not forget to track https://phabricator.wikimedia.org/T175912 and see why there are no Registration page hits from the mach_mit page, please.

Note: (1) banner impressions not tracked in the test report; (2) coloring scheme is inconsistent across the charts in the test report - will take care of it once we have the mach_mit as the referer for the landing page included.

@GoranSMilovanovic

Thank you for the report! I have a few questions and notes on it:

  • in general the referer 'unknown' and 'other' do not seem reasonable. We could just leave them out.
  • is it possible to have exact numbers above the bars of the charts?
  • Why is there a difference between 1.2.3 (banner clicks) and 1.2.4. (page views)? I guess 1.2.4 is more reliable for page views as (banner clicks)!=(page views).
  • in the dataset 1.2.4 there are only referers from the last page that leads to Benutzerkonto_anlegen. There are 3 different groups (bt1-3) coming from Fehler_korrigieren, so these referers are not exactly what we need. Is it possible to separate them by the campaign tag?
  • From T175912 I gather that registrations are currently not trackable with campaign tags in log.ServerSideAccountCreation_5487345. I hope there will be an answer to that soon.

@Stefan_Schneider_WMDE

(1) "in general the referer 'unknown' and 'other' do not seem reasonable. We could just leave them out."

  • No problem. However, keeping the data on 'unknown' and 'other' gives you an opportunity to learn how much of the web-traffic on the landing/registration pages streams from the Campaign... Please confirm if you really want this data out of the report.

(2) "is it possible to have exact numbers above the bars of the charts?"

  • Yes, but I would not advise doing that in this case. There would be too many numbers on the charts, and they would probably be unreadable. This is what motivated the introduction of the data table in section 1.2.4. Please confirm if you want the numbers on the charts anyway.

(3) "Why is there a difference between 1.2.3 (banner clicks) and 1.2.4. (page views)? I guess 1.2.4 is more reliable for page views as (banner clicks)!=(page views)."

Because the landing pages as well as the registration page received hits from other sources than our campaign banners. If you want to have only "Campaign Page Views", yes then 1.2.3 is relevant; if you want to have both (page views from banners = clicks = 1.2.3, and page views overall = 1.2.4) then everything is already in place.

(4) "in the dataset 1.2.4 there are only referers from the last page that leads to Benutzerkonto_anlegen. There are 3 different groups (bt1-3) coming from Fehler_korrigieren, so these referers are not exactly what we need. Is it possible to separate them by the campaign tag?"

Hmmm... you're right, let me take a look at what's in the source data and I will let you know whether I can take care of it.

(5) "From T175912 I gather that registrations are currently not trackable with campaign tags in log.ServerSideAccountCreation_5487345. I hope there will be an answer to that soon."

Yes, that is true, but there's nothing I can do about it. I see you have provided some feedback on that problem already. Please let me test during the day and I will report back to you.

@GoranSMilovanovic

(1) My understanding is that we can track exactly the numbers that are created by clicks on the banners (= streams from the campaign). All other traffic would be from other sources, right? If that's true, I would leave it out. At the moment I don't see how 'unknown' and 'other' relate to our campaign/banners.

(2) Hmm... I got your point. If it's possible to have some space between the bars and consequently make more space for the numbers, I would like to have the numbers. If it's not possible we can get it from the table.

(3) OK. With that explanation it's clear. Thx!

(4) / (5) Let's check the next test data. Thx for taking care!

@Stefan_Schneider_WMDE

This

(4) "in the dataset 1.2.4 there are only referers from the last page that leads to Benutzerkonto_anlegen. There are 3 different groups (bt1-3) coming from Fehler_korrigieren, so these referers are not exactly what we need. Is it possible to separate them by the campaign tag?"

is fixed; we now track whether a user who came from Fehler_korrigieren and hit the registration page had previously clicked on any of the Specific Task Banners (bt1, bt2, or bt3), and we already know this for the gib_lp and gib_rg banners.

Currently running an update and fixing the Report as you have suggested (hiding 'Unknown' and 'Hidden' data etc).

I'll see what I can do about showing the exact numbers on the charts. If {ggvis} package works nicely with RMarkdown, which should be the case, we can switch to interactive visualizations. The numbers would show up only on mouse hover over the respective chart element (i.e. a bar).

@Stefan_Schneider_WMDE The updated test report is here.

I've run several data consistency checks and everything seems to be in place.

The next steps:

  • testing for Guided Tours, if possible before the onset of the Campaign;
  • introducing {ggvis} plots in place of {ggplot2} to enable viewing data from charts directly.

@GoranSMilovanovic

Thank you, Goran.

Everything looks good. I just have one thing.

(1) Figure 1.2.2 does not show data from GIB_LP_click. Does that mean that it's not possible to track that? Could it maybe be tracked the same way as BT1-BT3? While testing, I remember that I clicked the button to forward myself from 'Mach-Mit' to 'Registration', so there should be some data. I also remember there was an issue with the Mach-Mit page, but could it be solved with the campaign tag?

Please keep me posted when the direct viewing of the charts is possible.

@Stefan_Schneider_WMDE

In our data, the Mach_Mit page is never a referer to Spezial:Benutzerkonto_anlegen. Could the reason be that the user must click on an image on Mach_Mit to get to the registration page? (cf. https://phabricator.wikimedia.org/T175912#3626977 @kai.nissen @Addshore)

As for the Mach_Mit page in general, I've found one case in our data where someone found the Mach_Mit page on google.de first, and then visited the page itself. It would be nice to know if that hit was produced by someone from our team (as it would provide a minor additional data consistency check).

As for the {ggvis} charts - ASAP, but it will take me some time because I need to replace all {ggplot2} code with the new one. If it works at all, {ggvis} is great prima facie, but it's a new R package and still has some issues, which is the reason why the {ggvis} team still does not recommend its usage in production.

@GoranSMilovanovic

That Mach_Mit is never a referer to Spezial:Benutzerkonto_anlegen should not be a problem, if we can use the campaign tag GIB_LP_click to track the origin of the page view. I thought this would be possible the same way as with BT1-3.

Hi,

sorry for the late reply:

Sorry @Stefan_Schneider_WMDE , but we should include 'Others' as source in the Page View Sources.

To publish the results onwiki we would also need a table of the numbers (registrations, page views, impressions). Otherwise it will take some work to extract them manually from the report. Could you add that as an appendix?

It would be interesting for us to see also other sources of entry to the page. @GoranSMilovanovic In that case, would it be also interesting to include unknown in the report? The difference is not quite clear to me yet.

Hi,

Ok, so first we have a new version of the test report here. Changes:

(1) Categories 'Unknown' and 'Other' are back there, *but* now we have two charts each time: (A) including these two categories, and (B) not including these two categories (i.e. campaign only sources are present). Thus you will be able to compare and judge what has happened on the landing pages and the registration page due to our campaigning and how many "spontaneous hits" there are.

(2) Color consistency across the Report is now established.
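
The two views described in (1) could be computed along these lines. This is only a minimal Python sketch, not the R code behind the Report: the function name `page_view_counts` and the sample list of source labels are made up for illustration, and only the category names "Other" and "Unknown" come from this thread.

```python
from collections import Counter

# Categories that are not attributable to the campaign (names as used in this thread).
NON_CAMPAIGN = {"Other", "Unknown"}


def page_view_counts(sources):
    """Return (all_counts, campaign_only_counts) for a list of source labels.

    all_counts corresponds to chart (A), campaign_only_counts to chart (B).
    """
    all_counts = Counter(sources)
    campaign_only = Counter(s for s in sources if s not in NON_CAMPAIGN)
    return all_counts, campaign_only


# Illustrative data only; these are not numbers from the test report.
views = ["bt1", "Other", "gib_rg", "Unknown", "bt1", "Other"]
all_counts, campaign_only = page_view_counts(views)
print(dict(all_counts))
print(dict(campaign_only))
```

Comparing the two counters shows how many hits are "spontaneous" rather than campaign-driven.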

TO DO:

(1) Try to switch from {ggplot2} to {ggvis} charts for interactivity and the possibility to present exact numbers on the charts directly; @Stefan_Schneider_WMDE you will have to give me some time for this.

(2) Tracking/Analytics for the Guided Tours.

(3) Tracking/Analytics of the new user edits.

(4) Putting the dataset together for a Multi-Channel Attribution model of the Campaign (the research design is already there, while getting to an appropriate dataset is another thing).

PLEASE do not forget to check what is happening with the tracking of the path from the Mach_Mit page to the Spezial:Benutzerkonto_anlegen page; it doesn't seem that we have that tracking enabled (or I am making an error somewhere in my R code, but either way I need to know whether the tracking works correctly or not). Thanks a lot.

@Verena The Appendix table as you have requested will be included in the Report, but not before I have at least some data for all phases (currently, I have no options to test the Guided Tours and User Edits).

@GoranSMilovanovic Thx for the report. Regarding the tracking of the Mach_Mit page: I understand that the referer of the page does not work, but I also understand that the campaign tag from BT1-3 works as a referer for the first 3 banners, with a landing page in between. Do you mean that the campaign tag is also not trackable through the Mach_Mit page?

@Stefan_Schneider_WMDE As it can be seen from the Campaign Analytics R Code:

dataSet$Source[dataSet$Page %in% 'Spezial:Benutzerkonto_anlegen' & dataSet$Source == 'Other'] <- str_extract(dataSet$Referer[dataSet$Page %in% 'Spezial:Benutzerkonto_anlegen' & dataSet$Source == 'Other'], "campaign=wmde_abc(.)+$")

and then later on:

dataSet$Source[dataSet$Page %in% 'Spezial:Benutzerkonto_anlegen' & grepl("wmde_abc2017_gib_lp", dataSet$Source)] <- "Mach_mit"

we first search for any wmde_abc tagged referer in the referer field of the dataset that we obtain from wmf.webrequest, and then explicitly search for the wmde_abc2017_gib_lp referer for the registration page. The method as implemented successfully finds referers other than wmde_abc2017_gib_lp, which is the one that would later translate into Mach_mit in the Report.

In other words: no, there are no referers to the registration page from wmde_abc2017_gib_lp.
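
For readers who do not use R, the two assignments above can be paraphrased roughly as follows. This is a Python sketch under assumptions, not the Campaign Analytics code itself: rows are modelled as plain dicts, the function name `tag_source` is hypothetical, and the tag `wmde_abc2017_bt1` in the usage example is an assumed name for a bt1 campaign tag. One deliberate difference: when the regex finds nothing, this sketch keeps "Other", whereas `str_extract` in R would yield `NA`.

```python
import re

# Same pattern as in the R snippet above.
CAMPAIGN_RE = re.compile(r"campaign=wmde_abc(.)+$")
REGISTRATION_PAGE = "Spezial:Benutzerkonto_anlegen"


def tag_source(row):
    """Return the Source label for one row (a dict with Page, Source, Referer)."""
    source = row["Source"]
    on_registration = row["Page"] == REGISTRATION_PAGE
    # Step 1: for registration-page hits labelled 'Other', pull the campaign tag
    # out of the referer, if there is one.
    if on_registration and source == "Other":
        match = CAMPAIGN_RE.search(row["Referer"])
        if match:
            source = match.group(0)
    # Step 2: the gib_lp tag is reported as the Mach_mit landing page.
    if on_registration and "wmde_abc2017_gib_lp" in source:
        source = "Mach_mit"
    return source


base = "https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/"
rows = [
    {"Page": REGISTRATION_PAGE, "Source": "Other",
     "Referer": base + "Mach_mit?campaign=wmde_abc2017_gib_lp"},
    # Hypothetical bt1 tag, for illustration only.
    {"Page": REGISTRATION_PAGE, "Source": "Other",
     "Referer": base + "Fehler_korrigieren?campaign=wmde_abc2017_bt1"},
    {"Page": REGISTRATION_PAGE, "Source": "Other", "Referer": "-"},
]
labels = [tag_source(r) for r in rows]
print(labels)
```

The sketch only produces "Mach_mit" when the tag is present in the referer field, which is exactly the condition that never occurs in the test data.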

However, please, let me re-check everything. I am now running a test update again and the data will be inspected at several intermediate steps before the production of the final dataset. Then I will report back.

Ok - Got it,

@Addshore
The campaign tag wmde_abc2017_gib_lp does not function when tracking the page views on Spezial:Benutzerkonto_anlegen. In between there is the landing page Mach_Mit. Analysing the campaign tag in the link from the test banner shows https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/Mach_mit?campaign=wmde_abc2017_gib_lp. So everything should be working.

Q: Could there be something in the hack or on the Mach_Mit page that prevents the campaign tag from being forwarded to the weblogs, or is there some other cause of the problem?

@GoranSMilovanovic

I just created another test account with the gib_lp-tag. Could you check if that registration was tracked? The account name was 'T1e5st'.

@Stefan_Schneider_WMDE it seems to be working just fine for me:

Navigating to https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/Mach_mit?campaign=wmde_abc2017_gib_lp produces the following server side log line:

2017-09-28 14:24:53 [Wc0GNQpAAD0AAHrLNSEAAAAO] mw1266 dewiki 1.30.0-wmf.19 WMDE INFO: wmde_abc2017_gib_lp - 1 - Banner click by anon user without cookie

And navigating to the account registration page results in:

2017-09-28 14:24:55 [Wc0GNwpAMFkAAKfXMIYAAABM] mw1254 dewiki 1.30.0-wmf.19 WMDE INFO: wmde_abc2017_gib_lp - 2 - Inject campaign value on CreateAccount
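
If these server-side lines need to be checked in bulk, they could be split into fields with something like the sketch below. The field layout (timestamp, request id, host, wiki, version, campaign, step, message) is an assumption inferred only from the two lines quoted here, and the names `LOG_RE` and `parse_wmde_log` are hypothetical.

```python
import re

# Layout inferred from the two example lines in this comment; treat it as an assumption.
LOG_RE = re.compile(
    r"^(?P<timestamp>\S+ \S+) \[(?P<request_id>[^\]]+)\] (?P<host>\S+) "
    r"(?P<wiki>\S+) (?P<version>\S+) WMDE INFO: "
    r"(?P<campaign>\S+) - (?P<step>\d+) - (?P<message>.*)$"
)


def parse_wmde_log(line):
    """Return a dict of fields for one WMDE INFO log line, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


click = parse_wmde_log(
    "2017-09-28 14:24:53 [Wc0GNQpAAD0AAHrLNSEAAAAO] mw1266 dewiki 1.30.0-wmf.19 "
    "WMDE INFO: wmde_abc2017_gib_lp - 1 - Banner click by anon user without cookie"
)
inject = parse_wmde_log(
    "2017-09-28 14:24:55 [Wc0GNwpAMFkAAKfXMIYAAABM] mw1254 dewiki 1.30.0-wmf.19 "
    "WMDE INFO: wmde_abc2017_gib_lp - 2 - Inject campaign value on CreateAccount"
)
print(click["campaign"], click["step"], inject["step"])
```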

When testing, you will need to use a new incognito window for each banner click / test, otherwise new cookies can not / will not be set.
Either that, or clear your browser / session cookies between each test.

@Stefan_Schneider_WMDE @Verena

In the meantime, and as of the following:

"It would be interesting for us to see also other sources of entry to the page. @GoranSMilovanovic In that case, would it be also interesting to include unknown in the report? The difference is not quite clear to me yet."

In the campaign dataset:

"Other" - the referer is something outside of the scope of the campaign, i.e. a campaign page view was registered after viewing something other than our landing pages, or following an action which is not a campaign banner click;

"Unknown" - the same as "Other", except that we cannot determine what has led to a campaign page view, e.g. when I find "-" as a single datum in the referer field of the wmf.webrequest table.

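The distinction between the two categories can be illustrated with a small Python sketch. It is an approximation of the definitions above, not the Report's actual logic: the function name `classify_referer`, the label "Campaign", and the choice to treat an empty or missing referer like "-" are all assumptions.

```python
def classify_referer(referer, campaign_marker="campaign=wmde_abc"):
    """Classify one referer value as 'Unknown', 'Campaign' or 'Other'."""
    # "-" is what the thread reports seeing in wmf.webrequest when nothing is known;
    # treating None and "" the same way is an assumption of this sketch.
    if referer is None or referer.strip() in ("", "-"):
        return "Unknown"
    if campaign_marker in referer:
        return "Campaign"
    return "Other"


print(classify_referer("-"))
print(classify_referer("https://www.google.de/"))
print(classify_referer(
    "https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/Mach_mit"
    "?campaign=wmde_abc2017_gib_lp"
))
```
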
@Addshore @Stefan_Schneider_WMDE Houston, we have a problem.

I have inspected the raw data as obtained from the wmf.webrequest table, and the conclusion is certain:

  • gib_lp was never registered as a referer to any of our pages between 09/20/2017 and 09/28/2017.

Once again, this was a regex search through the raw values of the referer field as obtained from the wmf.webrequest table, so I'm pretty sure that it's simply not there.

I can provide my HiveQL scripts and filtering procedures in R for inspection, however, I don't think that it would help - this test was performed on an unfiltered, unmodified dataset directly obtained from Hadoop. If there was a bug there, it would be so obvious that even I couldn't have missed it.

@Stefan_Schneider_WMDE @Addshore

Also, rather important:

"I just created another test account with the gib_lp-tag. Could you check if that registration was tracked? The account name was 'T1e5st'."

  • No, my friend: there are no users registered under gib_lp found in the log.ServerSideAccountCreation_5487345.

I'm not sure how much I can help here, but again, take into your consideration the following possibilities:

  • A user must click on an image on the mach_mit page in order to get to the registration page; might that be the cause of the problem?
  • We had test banners with alphabetic suffixes ('a', 'b', 'c', 'd', 'e') running; could it be that they have caused some interactions that we didn't consider?

    Just thinking aloud and trying to help.

When I tested for my comment at T175914#3642982 I did not click on a banner, only triggered my code by following a simulated link so there would have been no referrer.

I have just stepped through the banner click to registration page for that banner and campaign and the following is what you should be able to see in webrequest:

The request where I actually view the test banner:

{
    "uri_host": "de.wikipedia.org",
    "uri_path": "/",
    "uri_query": "?banner=WMDE_editor_campaign_fall17_d",
    "content_type": "text/html; charset=UTF-8",
    "referer": "-",
    "x_analytics": "ns=4;page_id=5248757;https=1;nocookies=1"
}

The request once I have clicked on the test banner, landing on the landing page:

{
    "uri_host": "de.wikipedia.org",
    "uri_path": "/wiki/Wikipedia:Wikimedia_Deutschland/Mach_mit",
    "uri_query": "?campaign=wmde_abc2017_gib_lp",
    "content_type": "text/html; charset=UTF-8",
    "referer": "https://de.wikipedia.org/?banner=WMDE_editor_campaign_fall17_d",
    "x_analytics": "ns=4;page_id=9688087;WMF-Last-Access=28-Sep-2017;WMF-Last-Access-Global=28-Sep-2017;https=1"
}

The request to Special:Register:

{
    "uri_host": "de.wikipedia.org",
    "uri_path": "/w/index.php",
    "uri_query": "?title=Spezial:Benutzerkonto_anlegen&returnto=Wikipedia%3AWikimedia+Deutschland%2FMach+mit&returntoquery=campaign%3Dwmde_abc2017_gib_lp",
    "content_type": "text/html; charset=UTF-8",
    "referer": "https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/Mach_mit?campaign=wmde_abc2017_gib_lp",
    "x_analytics": "ns=-1;special=CreateAccount;WMF-Last-Access=28-Sep-2017;WMF-Last-Access-Global=28-Sep-2017;https=1"
}
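
In the last record the campaign tag does not appear as a top-level `campaign` parameter; it is URL-encoded inside `returntoquery`. A minimal Python sketch of how it could be recovered from `uri_query` follows. The helper name `campaign_from_query` is hypothetical, and whether the Report should rely on `returntoquery` at all is not settled in this thread.

```python
from urllib.parse import parse_qs


def campaign_from_query(uri_query):
    """Return the campaign tag found in a uri_query string, or None."""
    params = parse_qs(uri_query.lstrip("?"))
    # Direct case, e.g. the landing page request: ?campaign=...
    if "campaign" in params:
        return params["campaign"][0]
    # Nested case, e.g. the registration request: returntoquery=campaign%3D...
    for nested in params.get("returntoquery", []):
        inner = parse_qs(nested)
        if "campaign" in inner:
            return inner["campaign"][0]
    return None


registration_query = (
    "?title=Spezial:Benutzerkonto_anlegen"
    "&returnto=Wikipedia%3AWikimedia+Deutschland%2FMach+mit"
    "&returntoquery=campaign%3Dwmde_abc2017_gib_lp"
)
print(campaign_from_query(registration_query))
print(campaign_from_query("?campaign=wmde_abc2017_gib_lp"))
print(campaign_from_query("?banner=WMDE_editor_campaign_fall17_d"))
```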

"I have inspected the raw data as obtained from the wmf.webrequest table, and the conclusion is certain: gib_lp was never registered as a referer to any of our pages between 09/20/2017 and 09/28/2017. Once again, this was a regex search through the raw values of the referer field as obtained from the wmf.webrequest table, so I'm pretty sure that it's simply not there."

Regex is probably not the most efficient thing to do here, you should be able to parse the url parts.

The link in the banner is https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/Mach_mit?campaign=wmde_abc2017_gib_lp
You will not know what referer to look for as the banners can be shown on any page.
You need to look for the campaign in uri_query, and as said in other tickets and in discussions the banners can have an extra param (for example wmdesource=bannerclick) to confirm that they came from a banner, if implemented this would also appear in the uri_query, although its usefulness is debatable.

"I can provide my HiveQL scripts and filtering procedures in R for inspection, however, I don't think that it would help - this test was performed on an unfiltered, unmodified dataset directly obtained from Hadoop. If there was a bug there, it would be so obvious that even I couldn't have missed it."

Please do provide a link to look at.

"No, my friend: there are no users registered under gib_lp found in the log.ServerSideAccountCreation_5487345."

The most recent timestamp entry in the table is 20170928233929 so it looks like it hasn't updated in a while, but it should eventually catch up.

"I'm not sure how much I can help here, but again, take into your consideration the following possibilities: A user must click on an image on the mach_mit page in order to get to the registration page; might that be the cause of the problem?"

I still don't think this is the case.
What evidence do we have that this is the case?

@Addshore It's an .Rmd file (RMarkdown), found here, and the HiveQL section that you want to take a look at is under:

2. Banner Clicks and Landing Page Views

But I don't think it's HiveQL that is problematic here.

"Regex is probably not the most efficient thing to do here, you should be able to parse the url parts."

Of course, I've used regex on the raw data directly to make sure whether anything like "gib_lp" is found under the referer field or not.

"You need to look for the campaign in uri_query, and as said in other tickets and in discussions the banners..."

However, in your "The request to Special:Register" example we find: "referer": "https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/Mach_mit?campaign=wmde_abc2017_gib_lp" - so it can be found under the referer field, right?

Here's a HiveQL model that I've used:

SELECT uri_path, uri_query, referer FROM webrequest
  WHERE (uri_host = 'de.wikipedia.org'
    AND (uri_path = '/wiki/Wikipedia:Wikimedia_Deutschland/Fehler_korrigieren' OR uri_path = '/wiki/Wikipedia:Wikimedia_Deutschland/Mach_mit' OR uri_path = '/wiki/Spezial:Benutzerkonto_anlegen')
    AND year = ", y,
    " AND month = ", m,
    " AND day = ", d, ");

Are you suggesting that my choice of uri_path is wrong?
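
For reference, the fragment above is assembled by string concatenation around the date parts `y`, `m` and `d`. A self-contained Python sketch of the same query construction is shown below; the function name `build_query` and the `PATHS` constant are hypothetical, the table name `webrequest` is taken as written in the fragment, and the sketch only builds the query text without running it.

```python
# The three uri_path values used in the HiveQL fragment above.
PATHS = (
    "/wiki/Wikipedia:Wikimedia_Deutschland/Fehler_korrigieren",
    "/wiki/Wikipedia:Wikimedia_Deutschland/Mach_mit",
    "/wiki/Spezial:Benutzerkonto_anlegen",
)


def build_query(year, month, day):
    """Return the HiveQL text for one day of campaign page requests."""
    path_filter = " OR ".join(f"uri_path = '{p}'" for p in PATHS)
    # int() guards against anything but numbers ending up in the partition filter.
    return (
        "SELECT uri_path, uri_query, referer FROM webrequest "
        f"WHERE (uri_host = 'de.wikipedia.org' AND ({path_filter}) "
        f"AND year = {int(year)} AND month = {int(month)} AND day = {int(day)});"
    )


query = build_query(2017, 9, 28)
print(query)
```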

"You need to look for the campaign in uri_query, and as said in other tickets and in discussions the banners..."

However, in your "The request to Special:Register" example we find: "referer": "https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/Mach_mit?campaign=wmde_abc2017_gib_lp" - so it can be found under the referer field, right?

Yes, but I don't know why your code would be looking at requests to Special:Register.
Also, people can register after seeing the banner, in their session, while not having a referrer that links to the banner or campaign at all.
Registration details will come from EventLogging (the log database)

Banner clicks can be inferred from the webrequest table by looking in the uri_path and uri_query fields for the URL parameters that are in the link on the banner.
If the banner links to de.wikipedia.org/wiki/FOOBAR?campaign=SOMECAMPAIGN
then you can look for uri_path="/wiki/FOOBAR" && uri_query="?campaign=SOMECAMPAIGN"
The referrer does not matter; at most you could check that the referrer is de.wikipedia.org, to make sure that the banner link has not been copied, sent to someone else and opened (as they will not have seen the banner).
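
The check described above could look roughly like this in Python. It is a sketch under assumptions: records are modelled as dicts with the webrequest field names shown earlier in this thread, the function name `is_banner_click` and its `require_host` parameter are hypothetical, and the query string is parsed instead of compared for exact equality so that an extra parameter (such as the suggested wmdesource=bannerclick) would not break the match.

```python
from urllib.parse import parse_qs, urlparse


def is_banner_click(record, path, campaign, require_host=None):
    """Return True if a webrequest-style record looks like a click on the banner link.

    record       -- dict with uri_path, uri_query and referer
    path         -- landing page path the banner links to
    campaign     -- campaign tag carried in the banner link
    require_host -- optional referer host check (to exclude copied links)
    """
    if record["uri_path"] != path:
        return False
    params = parse_qs(record["uri_query"].lstrip("?"))
    if campaign not in params.get("campaign", []):
        return False
    if require_host is not None:
        return urlparse(record["referer"]).hostname == require_host
    return True


landing = {
    "uri_path": "/wiki/Wikipedia:Wikimedia_Deutschland/Mach_mit",
    "uri_query": "?campaign=wmde_abc2017_gib_lp",
    "referer": "https://de.wikipedia.org/?banner=WMDE_editor_campaign_fall17_d",
}
path = "/wiki/Wikipedia:Wikimedia_Deutschland/Mach_mit"
print(is_banner_click(landing, path, "wmde_abc2017_gib_lp"))
print(is_banner_click(landing, path, "wmde_abc2017_gib_lp", require_host="de.wikipedia.org"))
```

With `require_host` set, a request whose referer is "-" (for example a link pasted directly into the browser) is not counted as a banner click.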

@Stefan_Schneider_WMDE @Addshore

Ok, it looks like this can be managed (within the scope of the tracking methodology that we already have) by merely placing slightly different links on the images in the Mach_mit page. Adam has sent out an e-mail on this following our IRC discussion.

However, if the image links cannot be changed for any reason, I will work on a somewhat more complex way to track everything that we need during the weekend (following Adam's suggestions to hack everything out from the uri_query fields).

@Stefan_Schneider_WMDE Well, including the {ggvis} interactive plots in the Campaign Report calls for a re-format of the current RMarkdown layout that I use. To remind you, this has to do with the question of displaying counts on the charts directly. Also, the Report would need to be hosted from a Shiny Server. Good news: all this can be done. Drawback: please let me focus on the campaign tracking and analytics at the moment, and I'll see to switching from static to dynamic charts during the course of the campaign, once I have made sure that all essential things are running smoothly. Ok?

@GoranSMilovanovic

Great that you guys found the cause of the problem.

Good that it's possible to switch to displaying counts on the charts. I agree that campaign tracking and analytics go first. Nevertheless, it would ease our workflow in case some corrections to the banner diet become necessary.

Another little thing:

"Other" - the referer is something outside of the scope of the campaign, i.e. a campaign page view was registered after viewing something other than our landing pages, or following an action which is not a campaign banner click;

"Unknown" - the same as "Other", except that we cannot determine what has led to a campaign page view, e.g. when I find "-" as a single datum in the referer field of the wmf.webrequest table.

--> Could you include that explanation in the report, please?

@Stefan_Schneider_WMDE

  • All the response variable categories (like "gib_lp", "bt1", "Other", "Unknown" etc) will be explained in the Report; let me remind you that this is a test version only;
  • I've already undertaken some steps to host the Report from a Labs instance running a Shiny Server that will enable us to deploy an interactive Rmarkdown which is needed for the {ggvis} plots; hopefully that setup will not clash with the current notebook setup that I use;
  • Thank you for recognizing the priorities.

@Stefan_Schneider_WMDE Confirming that I see the "Mach_mit" page as a referer to the Registration page under the new campaign page setup.

Status report:

  • Banner Impressions: ready to gather data
  • Banner Clicks: ready to gather data
  • Page Views: ready to gather data
  • User Registration: ready to gather data
  • Guided Tour: ready to gather data
  • User Edits: ready to gather data

Please take a look at https://phabricator.wikimedia.org/T177110#3648246 and let me know what you think. Thanks.

@Stefan_Schneider_WMDE As for the possible implementation of {ggvis} interactive charts in our Report - the ones that would enable you to hover over a bar on a chart and retrieve the exact datum directly:

  • Theoretically, it can be done; I am running a live test on one Labs instance and I can display {ggvis} plots under a Shiny server there; however, *I do not advise going for it*. {ggvis} is not a mature R package yet, and the visualizations that it can produce are way below the standards that can be achieved with {ggplot2}.

My advice would be to go for non-interactive but clear and comprehensive {ggplot2} charts, and then combine reading from charts and tables to understand the results. I will make sure that all charts are immediately followed by the respective data tables.