Page MenuHomePhabricator

Scrape HTML from IBM UI and bring queries into civi
Closed, ResolvedPublic

Description

We investigates and chose a direction here: T227363

We want to try and scrap the queries from the UI and save them in civi as a whole text field. @Eileenmcnaughton , @Ejegg Please add more technical info if I am missing something.

notes from Brian

We login a post to the login url https://login4.silverpop.com/login, full code attached. I think on pod4 you might login to https://pod4.ibmmarketingcloud.com/login instead.

Then we get mailing summary data from "action=displayHtmlBody&mailingId=XXXXX" and "action=mailingSummary&mailingId=XXXXX"

You can get the Query criteria from https://pod4.ibmmarketingcloud.com/lists.do?action=listSummary&listId=XXXXX and pull what’s in the newQueryEnglish span and format however you need.

private async Task<HttpClient> GetAuthenticatedWebClientAsync()

{
    var httpClientHandler = new HttpClientHandler { UseCookies = true };
    var cookieContainer = new CookieContainer();
    httpClientHandler.CookieContainer = cookieContainer;
    var httpClient = new HttpClient(httpClientHandler);
    httpClient.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36");

    // Load the login page and extract a few key values
    var loginGetResponse = await httpClient.GetAsync(this.WebLoginUri);
    string loginHtml = await loginGetResponse.Content.ReadAsStringAsync();
    var ltRegex = new Regex(@"name=""lt"" value=""(?<ltValue>.*?)""", RegexOptions.IgnoreCase);
    var executionRegex = new Regex(@"name=""execution""\s+value=""(?<executionValue>.*?)""", RegexOptions.IgnoreCase);
    string lt = ltRegex.Match(loginHtml).Groups["ltValue"].Value;
    string execution = executionRegex.Match(loginHtml).Groups["executionValue"].Value;

    // Post the username and password to the action url to log in the user
    var loginRequest = new HttpRequestMessage(HttpMethod.Post, this.WebLoginUri);
    var formValues = new Dictionary<string, string>();
    formValues.Add("username", this.tenant.Username);
    formValues.Add("password", this.tenant.Password);
    formValues.Add("lt", lt);
    formValues.Add("execution", execution);
    formValues.Add("_eventId", "submit");
    loginRequest.Content = new FormUrlEncodedContent(formValues);

    var loginPostResponse = await httpClient.SendAsync(loginRequest);
    var loginPostHtml = await loginPostResponse.Content.ReadAsStringAsync();

    if (loginPostHtml.Contains("loginForm"))
    {
        throw new Exception("Could not log in via web interface");
    }

    return httpClient;
}

Details

Related Gerrit Patches:
wikimedia/fundraising/crm : masterAdd ability to turn omnihell on & off
wikimedia/fundraising/crm : masterSlight cleanup for Omnihell
wikimedia/fundraising/crm : masterAdd finally-working attempt to log into Silverpop & get the query html
wikimedia/fundraising/crm : masterOmnimail test cleanup

Event Timeline

Change 534561 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm@master] Omnimail test cleanup

https://gerrit.wikimedia.org/r/534561

Change 534561 merged by jenkins-bot:
[wikimedia/fundraising/crm@master] Omnimail test cleanup

https://gerrit.wikimedia.org/r/534561

Change 534736 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm@master] [WIP] Add not-yet-working attempt to log into Silverpop

https://gerrit.wikimedia.org/r/534736

I've been battling this today - TBH I'm a bit stumped at the moment - curl output on login attempt

Testing started at 18:23 ...
/Applications/MAMP/bin/php/php7.2.14/bin/php -dxdebug.remote_enable=1 -dxdebug.remote_mode=req -dxdebug.remote_port=9000 -dxdebug.remote_host=127.0.0.1 /Users/eileenmcnaughton/buildkit/build/wmff/vendor/phpunit/phpunit/phpunit --configuration /Users/eileenmcnaughton/buildkit/build/wmff/sites/default/civicrm/extensions/org.wikimedia.omnimail/phpunit.xml.dist OmnihellGetTest /Users/eileenmcnaughton/buildkit/build/wmff/drupal/sites/default/civicrm/extensions/org.wikimedia.omnimail/tests/phpunit/OmnihellGetTest.php --teamcity
PHPUnit 5.7.27 by Sebastian Bergmann and contributors.

*   Trying 74.121.50.37...
* TCP_NODELAY set
* Connected to login4.silverpop.com (74.121.50.37) port 443 (#0)
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /Applications/MAMP/Library/OpenSSL/cert.pem
  CApath: none
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: C=US; ST=New York; L=Armonk; O=INTERNATIONAL BUSINESS MACHINES CORPORATION; OU=Operations; CN=*.ibmmarketingcloud.com
*  start date: Mar 21 00:00:00 2019 GMT
*  expire date: Mar 25 12:00:00 2020 GMT
*  subjectAltName: host "login4.silverpop.com" matched cert's "*.silverpop.com"
*  issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA
*  SSL certificate verify ok.
> GET /login HTTP/1.1
Host: login4.silverpop.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36

< HTTP/1.1 200 OK
< Strict-Transport-Security: max-age=31536000;includeSubDomains
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
< Pragma: no-cache
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Cache-Control: no-cache
< Cache-Control: no-store
< Set-Cookie: wcasid=1141BEEB9689774DF55D141B9975EA9E; Path=/; Secure; HttpOnly
< Content-Type: text/html;charset=UTF-8
< Transfer-Encoding: chunked
< Date: Fri, 06 Sep 2019 06:24:08 GMT
< Server:  
< Set-Cookie: Silverpop_cookie=!fWvgHm5kHaqo165KZhA6sy+5kE98ylkeXX6KtoozJSjL99RIY6ieJQpwyUl9/GgEKUPerLYGZbnjaA==; path=/; Httponly; Secure
< 
* Connection #0 to host login4.silverpop.com left intact

##teamcity[testFailed* Hostname login4.silverpop.com was found in DNS cache
*   Trying 74.121.50.37...
* TCP_NODELAY set
* Connected to login4.silverpop.com (74.121.50.37) port 443 (#0)
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /Applications/MAMP/Library/OpenSSL/cert.pem
  CApath: none
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: C=US; ST=New York; L=Armonk; O=INTERNATIONAL BUSINESS MACHINES CORPORATION; OU=Operations; CN=*.ibmmarketingcloud.com
*  start date: Mar 21 00:00:00 2019 GMT
*  expire date: Mar 25 12:00:00 2020 GMT
*  subjectAltName: host "login4.silverpop.com" matched cert's "*.silverpop.com"
*  issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA
*  SSL certificate verify ok.
> POST /login HTTP/1.1
Host: login4.silverpop.com
Content-Length: 0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36
form_params: REDACTED
form_params: REDACTED
form_params: LT-5190-eFNFBauETvQTDebKJpJ5rOZ6lg6WCb-cas
form_params: e1s1
form_params: submit
Cookie: wcasid=1141BEEB9689774DF55D141B9975EA9E; Silverpop_cookie=!fWvgHm5kHaqo165KZhA6sy+5kE98ylkeXX6KtoozJSjL99RIY6ieJQpwyUl9/GgEKUPerLYGZbnjaA==

< HTTP/1.1 200 OK
< Strict-Transport-Security: max-age=31536000;includeSubDomains
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
< Pragma: no-cache
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Cache-Control: no-cache
< Cache-Control: no-store
< Content-Type: text/html;charset=UTF-8
< Transfer-Encoding: chunked
< Date: Fri, 06 Sep 2019 06:24:12 GMT
< Server:  
< 
* Connection #0 to host login4.silverpop.com left intact
 name='testOmnihellGet' message='Failure in api call for Omnihell get:  login failed|n#0 /Users/eileenmcnaughton/buildkit/build/wmff/sites/default/civicrm/extensions/org.wikimedia.omnimail/vendor/guzzlehttp/promises/src/Promise.php(203): {closure}(Object(GuzzleHttp\Psr7\Response))|n#1 /Users/eileenmcnaughton/buildkit/build/wmff/sites/default/civicrm/extensions/org.wikimedia.omnimail/vendor/guzzlehttp/promises/src/Promise.php(156): GuzzleHttp\Promise\Promise::callHandler(1, Object(GuzzleHttp\Psr7\Response), Array)|n#2 /Users/eileenmcnaughton/buildkit/build/wmff/sites/default/civicrm/extensions/org.wikimedia.omnimail/vendor/guzzlehttp/promises/src/TaskQueue.php(47): GuzzleHttp\Promise\Promise::GuzzleHttp\Promise\{closure}()|n#3 /Users/eileenmcnaughton/buildkit/build/wmff/sites/default/civicrm/extensions/org.wikimedia.omnimail/vendor/guzzlehttp/guzzle/src/Handler/CurlMultiHandler.php(98): GuzzleHttp\Promise\TaskQueue->run()|n#4 /Users/eileenmcnaughton/buildkit/build/wmff/sites/default/civicrm/extensions/org.wikimedia.omnimail/vendor/guzzlehttp/guzzle/src/Handler/CurlMultiHandler.php(125): GuzzleHttp\Handler\CurlMultiHandler->tick()|n#5 /Users/eileenmcnaughton/buildkit/build/wmff/sites/default/civicrm/extensions/org.wikimedia.omnimail/vendor/guzzlehttp/promises/src/Promise.php(246): GuzzleHttp\Handler\CurlMultiHandler->execute(true)|n#6 /Users/eileenmcnaughton/buildkit/build/wmff/sites/default/civicrm/extensions/org.wikimedia.omnimail/vendor/guzzlehttp/promises/src/Promise.php(223): GuzzleHttp\Promise\Promise->invokeWaitFn()|n#7 /Users/eileenmcnaughton/buildkit/build/wmff/sites/default/civicrm/extensions/org.wikimedia.omnimail/vendor/guzzlehttp/promises/src/Promise.php(267): GuzzleHttp\Promise\Promise->waitIfPending()|n#8 /Users/eileenmcnaughton/buildkit/build/wmff/sites/default/civicrm/extensions/org.wikimedia.omnimail/vendor/guzzlehttp/promises/src/Promise.php(225): GuzzleHttp\Promise\Promise->invokeWaitList()|n#9 /Users/eileenmcnaughton/buildkit/build/wmff/sites/default/civicrm/extensions/org.wikimedia.omnimail/vendor/guzzlehttp/promises/src/Promise.php(62): GuzzleHttp\Promise\Promise->waitIfPending()|n#10 /Users/eileenmcnaughton/buildkit/build/wmff/sites/default/civicrm/extensions/org.wikimedia.omnimail/api/v3/Omnihell.php(66): GuzzleHttp\Promise\Promise->wait()|n#11 /Users/eileenmcnaughton/buildkit/build/wmff/civicrm/Civi/API/Provider/MagicFunctionProvider.php(101): civicrm_api3_omnihell_get(Array)|n#12 /Users/eileenmcnaughton/buildkit/build/wmff/civicrm/Civi/API/Kernel.php(168): Civi\API\Provider\MagicFunctionProvider->invoke(Array)|n#13 /Users/eileenmcnaughton/buildkit/build/wmff/civicrm/Civi/API/Kernel.php(99): Civi\API\Kernel->runRequest(Array)|n#14 /Users/eileenmcnaughton/buildkit/build/wmff/civicrm/api/api.php(23): Civi\API\Kernel->runSafe(|'Omnihell|', |'get|', Array, NULL)|n#15 /Users/eileenmcnaughton/buildkit/build/wmff/civicrm/Civi/Test/Api3TestTrait.php(287): civicrm_api(|'Omnihell|', |'get|', Array)|n#16 /Users/eileenmcnaughton/buildkit/build/wmff/civicrm/Civi/Test/Api3TestTrait.php(163): OmnimailBaseTestClass->civicrm_api(|'Omnihell|', |'get|', Array)|n#17 /Users/eileenmcnaughton/buildkit/build/wmff/sites/default/civicrm/extensions/org.wikimedia.omnimail/tests/phpunit/OmnihellGetTest.php(34): OmnimailBaseTestClass->callAPISuccess(|'Omnihell|', |'get|', Array)|n#18 |[internal function|]: OmnihellGetTest->testOmnihellGet()|n#19 /Users/eileenmcnaughton/buildkit/build/wmff/vendor/phpunit/phpunit/src/Framework/TestCase.php(1062): ReflectionMethod->invokeArgs(Object(OmnihellGetTest), Array)|n#20 /Users/eileenmcnaughton/buildkit/build/wmff/vendor/phpunit/phpunit/src/Framework/TestCase.php(913): PHPUnit_Framework_TestCase->runTest()|n#21 /Users/eileenmcnaughton/buildkit/build/wmff/vendor/phpunit/phpunit/src/Framework/TestResult.php(686): PHPUnit_Framework_TestCase->runBare()|n#22 /Users/eileenmcnaughton/buildkit/build/wmff/vendor/phpunit/phpunit/src/Framework/TestCase.php(868): PHPUnit_Framework_TestResult->run(Object(OmnihellGetTest))|n#23 /Users/eileenmcnaughton/buildkit/build/wmff/vendor/phpunit/phpunit/src/Framework/TestSuite.php(733): PHPUnit_Framework_TestCase->run(Object(PHPUnit_Framework_TestResult))|n#24 /Users/eileenmcnaughton/buildkit/build/wmff/vendor/phpunit/phpunit/src/TextUI/TestRunner.php(517): PHPUnit_Framework_TestSuite->run(Object(PHPUnit_Framework_TestResult))|n#25 /Users/eileenmcnaughton/buildkit/build/wmff/vendor/phpunit/phpunit/src/TextUI/Command.php(186): PHPUnit_TextUI_TestRunner->doRun(Object(PHPUnit_Framework_TestSuite), Array, true)|n#26 /Users/eileenmcnaughton/buildkit/build/wmff/vendor/phpunit/phpunit/src/TextUI/Command.php(116): PHPUnit_TextUI_Command->run(Array, true)|n#27 /Users/eileenmcnaughton/buildkit/build/wmff/vendor/phpunit/phpunit/phpunit(52): PHPUnit_TextUI_Command::main()|n#28 {main}|nFailed asserting that a integer is empty.' details=' /Users/eileenmcnaughton/buildkit/build/wmff/civicrm/Civi/Test/Api3TestTrait.php:106|n /Users/eileenmcnaughton/buildkit/build/wmff/civicrm/Civi/Test/Api3TestTrait.php:164|n /Users/eileenmcnaughton/buildkit/build/wmff/sites/default/civicrm/extensions/org.wikimedia.omnimail/tests/phpunit/OmnihellGetTest.php:34|n /Users/eileenmcnaughton/buildkit/build/wmff/vendor/phpunit/phpunit/phpunit:52|n ' flowId='8701']

@Eileenmcnaughton let me know if I should pull in Trilogy for help!

Thanks - I'll see if a new day brings new eyes

Change 535954 had a related patch set uploaded (by Ejegg; owner: Ejegg):
[wikimedia/fundraising/crm@master] Slight cleanup for Omnihell

https://gerrit.wikimedia.org/r/535954

Change 534736 merged by jenkins-bot:
[wikimedia/fundraising/crm@master] Add finally-working attempt to log into Silverpop & get the query html

https://gerrit.wikimedia.org/r/534736

Change 535954 merged by jenkins-bot:
[wikimedia/fundraising/crm@master] Slight cleanup for Omnihell

https://gerrit.wikimedia.org/r/535954

@CCogdill_WMF I deployed a cut at this today but it's failing at the moment because of the firewall. I opened T232803 for that. Note that until that is fixed the whole job is failing & mailing data is not being updated. I think it will be resolved in the next day & there is no real impact but just letting you know

Eileenmcnaughton added a subtask: Restricted Task.Sep 13 2019, 5:03 AM

Change 537344 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm@master] Add ability to turn omnihell on & off

https://gerrit.wikimedia.org/r/537344

Change 537344 merged by jenkins-bot:
[wikimedia/fundraising/crm@master] Add ability to turn omnihell on & off

https://gerrit.wikimedia.org/r/537344

@CCogdill_WMF I just managed to get past the last hurdle blocking data coming in - try doing this query

select hash, query_criteria_220 FROM civicrm_mailing m LEFT JOIN civicrm_value_mailing_data_21 v ON v.entity_id = m.id WHERE query_criteria_220 IS NOT NULL LIMIT 10;

AND let me know if you can work with the data like that

Ah yessss this is so awesome, Eileen!! Would be good to rename that hash
field if possible to utm_source or mailing_id or something so it's easier
to find and join across tables, but I think this is exactly what I wanted
:D And just in time for our new analysts to use.

The hash field is in core table so we wouldn't want to change the field - I think it's duplicated in the civicrm_campaign table but I think my reason for that I think was civicrm didn't support custom fields for mailings at the time.

Dwisehaupt closed subtask Restricted Task as Resolved.Sep 26 2019, 6:03 PM
Ejegg closed this task as Resolved.Oct 1 2019, 8:14 PM

Thanks so much for this! Just one last question--how often will this be
updated? Daily? Will there ever be a lag?

Ejegg added a comment.Oct 1 2019, 8:37 PM

Hi @CCogdill_WMF - This was added to the existing job that fetches the recipient information, so it should run every 10 minutes. That job sometimes fails for a bunch of consecutive runs when API calls time out, and that would cause all recipient data including queries to be up to a few hours behind.

Ooh awesome, sounds like hardly any lag at all, even in the event of a
failure.

Thank you!