Page MenuHomePhabricator

Grants output metrics
Closed, ResolvedPublic

Description

SQL / Python code to transform input data in to Output Rank Metrics

Input MetricsOutput Rank Metrics
Historical Grants 2010 to current USD ($), Total Grants Calendar Year 2021 USD ($), Historical Grants 2010-2021 Count, Annual Grants presence weighted by PPP (Current international $) (Percentile), Historical Grants presence weighted by PPP (2017 constant international $) (Percentile), Historical Grants 2010-2021 Count (Percentile)Grants Presence
Five Year Grants ChangeGrants Growth
Annual Grants (PPP Weighted )* (5-year growth/5), Historical Grants (PPP weighted ) * (5-year growth/5), Historical Grants 2010-2021 Count (Percentile), Annual Grants 2021 Count (Percentile)Grants Leadership

Event Timeline

Raw grants data loaded under

ntsako.grants

initial roll-up at

kcv.grants_output_rank_metrics

pending:

  • 3-letter to 2-letter country codes
  • not having all countries causes issues with percent rank
KCVelaga_WMF renamed this task from Grants Metrics Transformation to Grants Leadership Output Metrics transformation.Oct 17 2022, 4:08 PM
KCVelaga_WMF moved this task from In Code Review to In QA Review on the Equity-Landscape board.

Transformation code

WITH
    grants_input_metric_hstack AS (
        SELECT
            geography_level,
            iso2_country_code AS country_code,
            CAST(COALESCE(grants_usd_historical_to_date, 0) AS FLOAT) AS grants_usd_historical_to_date,
            CAST(COALESCE(grants_usd_calendar_year, 0) AS FLOAT) AS grants_usd_calendar_year,
            CAST(COALESCE(grants_count_historical_to_date, 0) AS FLOAT) AS grants_count_historical_to_date,
            CAST(COALESCE(grants_count_calendar_year, 0) AS FLOAT) AS grants_count_calendar_year,
            CAST(COALESCE(grants_usd_calendar_year_presence_weighted, 0) AS FLOAT) AS grants_usd_calendar_year_presence_weighted,
            CAST(COALESCE(grants_usd_historical_presence_weighted, 0) AS FLOAT) AS grants_usd_historical_presence_weighted,
            CAST(COALESCE(grants_count_calendar_year_ranked, 0) AS FLOAT) AS grants_count_calendar_year_ranked,
            CAST(COALESCE(grants_five_year_change, 0) AS FLOAT) AS grants_five_year_change,
            CAST(COALESCE(grants_total_ann_five_year_change, 0) AS FLOAT) AS grants_total_ann_five_year_change,
            CAST(COALESCE(grants_total_historical_five_year_change, 0) AS FLOAT) AS grants_total_historical_five_year_change
        FROM
            input_metrics),
            
    grants_input_metric_vstack AS (
        SELECT
            geography_level,
            country_code,
            STACK(10,
                    grants_usd_historical_to_date, 'grants_usd_historical_to_date',
                    grants_usd_calendar_year, 'grants_usd_calendar_year',
                    grants_count_historical_to_date, 'grants_count_historical_to_date',
                    grants_count_calendar_year, 'grants_count_calendar_year',
                    grants_usd_calendar_year_presence_weighted, 'grants_usd_calendar_year_presence_weighted',
                    grants_usd_historical_presence_weighted, 'grants_usd_historical_presence_weighted',
                    grants_count_calendar_year_ranked, 'grants_count_calendar_year_ranked',
                    grants_five_year_change, 'grants_five_year_change',
                    grants_total_ann_five_year_change, 'grants_total_ann_five_year_change',
                    grants_total_historical_five_year_change, 'grants_total_historical_five_year_change') AS (metric_value, metric)
        FROM 
            grants_input_metric_hstack),
    
    grants_input_metrics_ranks AS (
        SELECT
            geography_level,
            country_code,
            metric,
            PERCENT_RANK() OVER (
                PARTITION BY geography_level, metric
                ORDER BY metric_value) AS p_rank
        FROM 
            grants_input_metric_vstack),
    
    presence AS (
        SELECT
            geography_level,
            country_code,
            ROUND(
                PERCENT_RANK() OVER (
                    PARTITION BY geography_level
                    ORDER BY AVG(p_rank))*10, 2) AS metric_value,
            'grantee_presence' AS metric
        FROM
            grants_input_metrics_ranks
        WHERE
            metric IN ('grants_usd_historical_to_date', 'grants_usd_calendar_year', 
                        'grants_usd_calendar_year_presence_weighted', 'grants_usd_historical_presence_weighted', 
                        'grants_count_historical_to_date')
        GROUP BY geography_level, country_code),
        
    growth AS (
        SELECT
            geography_level,
            country_code,
            ROUND(
                PERCENT_RANK() OVER (
                    PARTITION BY geography_level
                    ORDER BY AVG(p_rank))*10, 2) AS metric_value,
            'grantee_growth' AS metric
        FROM
            grants_input_metrics_ranks
        WHERE
            metric IN ('grants_five_year_change')
        GROUP BY geography_level, country_code),

    leadership AS (
        SELECT
            geography_level,
            country_code,
            ROUND(
                PERCENT_RANK() OVER (
                    PARTITION BY geography_level
                    ORDER BY AVG(p_rank))*10, 2) AS metric_value,
            'grants_leadership' AS metric
        FROM
            grants_input_metrics_ranks
        WHERE
            metric IN ('grants_total_ann_five_year_change', 'grants_total_historical_five_year_change', 
                        'grants_count_historical_to_date', 'grants_count_calendar_year')
        GROUP BY geography_level, country_code)
        

SELECT * FROM presence
UNION ALL
SELECT * FROM growth
UNION ALL
SELECT * FROM leadership

access output rank metrics at

SELECT
    geography_level,
    country_code,
    country_name,
    sub_continent,
    continent,
    grantee_presence,
    grantee_growth,
    grants_leadership
FROM
    kcv.all_output_rank_metrics
JAnstee_WMF changed the task status from Open to In Progress.Nov 14 2022, 3:58 PM
JAnstee_WMF triaged this task as High priority.
KCVelaga_WMF renamed this task from Grants Leadership Output Metrics transformation to Grants output metrics.Dec 10 2022, 11:28 AM

@JAnstee_WMF the output metrics also align well now: https://docs.google.com/spreadsheets/d/1smlxmLZN3igND0vW1Zhsr5BRnXgWxx_zbrd5rxMhkqc/edit#gid=1985942781&range=I2:K250

once these are reviewed along with the inputs (T309276), Grants QA can be closed.

@KCVelaga Looks good and I am signing off with the exception of some column headers that still need adjusting.

NM code label / KCV code label >> RECOMMENDATION
Inputs:
grants_usd_historical_to_date / total_historical_grants_to_date >> sum_historical_grants_to_date
grants_usd_calendar_year / total_calendar_year_grants >> sum_calendar_year_grants
grants_usd_calendar_year_presence_weighted / total_annual_grants_presence_weighted >> annual_grants_weighted
grants_usd_historical_presence_weighted / total_historical_grants_presence_weighted >> historical_grants_weighted
grants_five_year_change / five_year_grants_change >> annual_grants_change_rate
grants_total_ann_five_year_change / total_ann_grants_five_year_change >> annual_grants_by_annual_change_rate
grants_total_historical_five_year_change / total_historical_grants_five_year_change >> historical_grants_by_annual_change_rate

Outputs:
grantee_presence >> grants_presence
grantee_growth >> grants_growth

Also noted here: https://docs.google.com/spreadsheets/d/1F8KTv6qwLmE7OhvoXRnFqgJ4A4QiBQtPttI5TzJrgGI/edit#gid=0

@JAnstee_WMF: @ntsako will handle column name changes.

Ntsako: assigning back to you as this is signed-off.