Scrape HTML from IBM UI and bring queries into civi
Open, Needs Triage, Public


We investigated and chose a direction here: T227363

We want to try to scrape the queries from the UI and save each one in civi as a single text field. @Eileenmcnaughton, @Ejegg: please add more technical info if I am missing something.

Notes from Brian:

We log in by sending a POST to the login URL (full code attached). I think on pod4 you might need to log in to a different URL instead.

Then we get mailing summary data from "action=displayHtmlBody&mailingId=XXXXX" and "action=mailingSummary&mailingId=XXXXX"
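The two requests above differ only in their `action` parameter, so the URLs can be built by one small helper. This is a minimal sketch, not Brian's code: the `MailingUrls` class, the `BuildActionUri` name, and the `https://example.invalid/mailing.do` base address are all hypothetical placeholders, since the real endpoint is not given in these notes.

```csharp
using System;

public static class MailingUrls
{
    // Builds "<base>?action=<action>&mailingId=<mailingId>".
    // Assumption: the UI accepts both values as plain query-string parameters,
    // as the "action=...&mailingId=XXXXX" fragments in the notes suggest.
    public static Uri BuildActionUri(Uri baseUri, string action, string mailingId)
    {
        if (baseUri == null) throw new ArgumentNullException(nameof(baseUri));
        if (string.IsNullOrEmpty(action)) throw new ArgumentException("action is required", nameof(action));
        if (string.IsNullOrEmpty(mailingId)) throw new ArgumentException("mailingId is required", nameof(mailingId));

        // Escape both values so unexpected characters cannot break the query string
        string query = "action=" + Uri.EscapeDataString(action)
                     + "&mailingId=" + Uri.EscapeDataString(mailingId);

        // Drop any query or fragment already present on the base address
        string basePath = baseUri.GetLeftPart(UriPartial.Path);
        return new Uri(basePath + "?" + query);
    }
}
```

The resulting `Uri` could then be passed to `GetStringAsync` on the authenticated `HttpClient` returned by the login code, once for `displayHtmlBody` and once for `mailingSummary`.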

You can fetch the query criteria page, pull out what's in the newQueryEnglish span, and format it however you need.
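Pulling the criteria out of the fetched HTML could look roughly like the sketch below, which uses a regex in the same spirit as the login code. It rests on two assumptions that the notes do not confirm: that `newQueryEnglish` is the span's `id` attribute (not a class), and that the span contains no nested `<span>` elements. `QueryCriteriaScraper` and `TryExtractQueryCriteria` are hypothetical names; a real HTML parser would be more robust if the markup turns out to be more complex.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class QueryCriteriaScraper
{
    // Assumption: the criteria sit in <span id="newQueryEnglish">...</span>
    private static readonly Regex CriteriaSpan = new Regex(
        @"<span[^>]*\bid\s*=\s*[""']newQueryEnglish[""'][^>]*>(?<body>.*?)</span>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    private static readonly Regex Tag = new Regex(@"<[^>]+>");
    private static readonly Regex Whitespace = new Regex(@"\s+");

    // Returns true and the plain-text criteria when the span is found,
    // otherwise false and an empty string.
    public static bool TryExtractQueryCriteria(string html, out string criteria)
    {
        criteria = string.Empty;
        if (string.IsNullOrEmpty(html))
            return false;

        Match match = CriteriaSpan.Match(html);
        if (!match.Success)
            return false;

        // Replace inner markup with spaces, decode entities, collapse whitespace
        string text = Tag.Replace(match.Groups["body"].Value, " ");
        text = WebUtility.HtmlDecode(text);
        criteria = Whitespace.Replace(text, " ").Trim();
        return true;
    }
}
```

The extracted string is plain text, so it can be stored as a single text field without further HTML handling.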

// Requires: System, System.Collections.Generic, System.Net, System.Net.Http,
// System.Text.RegularExpressions, System.Threading.Tasks
private async Task<HttpClient> GetAuthenticatedWebClientAsync()
{
    // Keep cookies across requests so the session survives the login POST
    var httpClientHandler = new HttpClientHandler
    {
        UseCookies = true,
        CookieContainer = new CookieContainer()
    };
    var httpClient = new HttpClient(httpClientHandler);
    httpClient.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36");

    // Load the login page and extract the hidden "lt" and "execution" form values
    var loginGetResponse = await httpClient.GetAsync(this.WebLoginUri);
    string loginHtml = await loginGetResponse.Content.ReadAsStringAsync();
    var ltRegex = new Regex(@"name=""lt""\s+value=""(?<ltValue>.*?)""", RegexOptions.IgnoreCase);
    var executionRegex = new Regex(@"name=""execution""\s+value=""(?<executionValue>.*?)""", RegexOptions.IgnoreCase);
    string lt = ltRegex.Match(loginHtml).Groups["ltValue"].Value;
    string execution = executionRegex.Match(loginHtml).Groups["executionValue"].Value;

    // Post the username and password to the action url to log in the user
    var loginRequest = new HttpRequestMessage(HttpMethod.Post, this.WebLoginUri);
    var formValues = new Dictionary<string, string>
    {
        { "username", this.tenant.Username },
        { "password", this.tenant.Password },
        { "lt", lt },
        { "execution", execution },
        { "_eventId", "submit" }
    };
    loginRequest.Content = new FormUrlEncodedContent(formValues);

    var loginPostResponse = await httpClient.SendAsync(loginRequest);
    var loginPostHtml = await loginPostResponse.Content.ReadAsStringAsync();

    // If the response still contains the login form, the login was rejected
    if (loginPostHtml.Contains("loginForm"))
    {
        throw new Exception("Could not log in via web interface");
    }

    return httpClient;
}