Description
As part of T401384: FY25-26 SDS2.1.5 User Experience - Attribute Selection, we need to improve xLab validation so that it advises users when the selected contextual attributes change the risk level of the instrument they are registering. It should also advise users when the contextual attributes they are adding may have implications for privacy or for the risk level that can be assigned to their instrument. All of this must follow the Data Collection Guidelines.
This task aims to implement the needed validation and guidance in xLab.
Affected contextual attributes
According to the Data Collection Guidelines, the relevant contextual attributes would be the following:
- agent_ua_string: Considered as Personal Information (not available yet but we are working on it T385180: Implement agent.ua_string as contextual attribute)
- performer_id: Considered as user ID
- performer_name: Considered as username
- page_id: Considered as long-term viewing history
- page_title: Considered as long-term viewing history
Relevant criteria
The Data collection risk tiering grid defines the 5 criteria that any data collection must meet to be considered as Low risk. Some of these criteria are already implicitly met by the platform and xLab (the ones related to the data subject, data sender, data recipient and the retention period), so here we will focus on the one that is related to the collected data itself.
Depending on that data (which is collected via contextual attributes), the risk level rises to medium when the following criterion is not met:
- The data collected does not include:
- multiple items of unhashed personal information (not applicable, there is only one contextual attribute, agent_ua_string, that can be considered as personal information)
- personal information + username/user ID or app ID:
- Any combination of agent_ua_string + performer_id/performer_name/agent_app_install_id could make this criterion fail
- Technically page_id/page_title + agent_app_install_id when the user is logged-in could make this criterion fail (not possible to know ahead of time but a warning message could be shown)
- long-term viewing history + unique ID
- Any combination of page_id/page_title + performer_id/performer_name could make this criterion fail
- granular geographic data + unique ID (not applicable, there are no contextual attributes related to geographical data)
- sensitive data (not applicable, there are no contextual attributes related to sensitive data)
There are also some cases where the instrument risk level will have to be considered as high:
- The data collected includes agent_ua_string + performer_name/performer_id + page_id/page_title because two low risk criteria (see above) would fail:
- Any combination of agent_ua_string + performer_id/performer_name/agent_app_install_id could make this criterion fail
- Any combination of page_id/page_title + performer_id/performer_name could make this criterion fail
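The criteria above can be read as a small rule set. The following is a minimal sketch in Python, not xLab's actual implementation: the names `Tier`, `failed_criteria` and `minimum_tier` are hypothetical, and it assumes that one failed criterion means medium risk and two failed criteria mean high risk, as described above.

```python
from enum import IntEnum


class Tier(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Attribute groups taken from the "Affected contextual attributes" list.
PERSONAL_INFO = {"agent_ua_string"}
USER_IDS = {"performer_id", "performer_name"}
APP_ID = {"agent_app_install_id"}
VIEWING_HISTORY = {"page_id", "page_title"}


def failed_criteria(selected):
    """Return the low-risk criteria that the selected attributes fail."""
    selected = set(selected)
    failed = []
    # personal information + username/user ID or app ID
    if selected & PERSONAL_INFO and selected & (USER_IDS | APP_ID):
        failed.append("personal information + username/user ID or app ID")
    # long-term viewing history + unique ID
    if selected & VIEWING_HISTORY and selected & USER_IDS:
        failed.append("long-term viewing history + unique ID")
    return failed


def minimum_tier(selected):
    """Assumed mapping: no failures is low, one is medium, two is high."""
    count = len(failed_criteria(selected))
    if count == 0:
        return Tier.LOW
    return Tier.MEDIUM if count == 1 else Tier.HIGH
```

With this reading, agent_ua_string + agent_app_install_id + page_id stays at medium, because only the first criterion fails. The page_id/page_title + agent_app_install_id case is deliberately not a failure here, since it depends on whether the user is logged in and can only be surfaced as a warning.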
Warning/error messages and user guidance
The main goal here should be to give advice to users when registering their instruments, specifically when filling the fields that are related to contextual attributes and the risk level. The following would be potential scenarios where xLab can take some actions:
- The user is selecting the contextual attributes:
- xLab could show a message suggesting the required risk level based on the selected contextual attributes as they select them
- xLab could show a warning message if page_id/page_title + agent_app_install_id are selected because that would require a security and legal review in the case the user is logged-in (as we mentioned above, not possible to know ahead of time)
- The user defines Risk assessment pending:
- xLab wouldn't need to check anything else because the user hasn't yet finished configuring the instrument. They could modify the instrument later to add/remove contextual attributes and define the corresponding risk level. xLab should wait until then
- The user chooses Low risk as the risk level:
- xLab will check that the selected contextual attributes meet that risk level, according to the relevant criteria explained above. If not, an error message will be thrown and the user won't be able to register/modify the instrument until they fix this
- The user chooses Medium risk as the risk level:
- xLab could check whether there really is a combination of contextual attributes that requires that risk level. If there isn't, xLab could show an error message explaining this (taking the criteria above into account). The user should change the risk level to be able to register/modify the instrument
- The user chooses High risk as the risk level:
- xLab could check whether there really is a combination of contextual attributes that requires that risk level. If there isn't, xLab could show an error message explaining this (taking the criteria above into account). The user should change the risk level to be able to register/modify the instrument
- The user is using a custom schema:
- Some additional attributes could be collected via the custom schema so we should, at least, show a warning message about it so that instrument owners can check those attributes and whether the chosen risk level is the appropriate one
In any case, if the user doesn't yet have the corresponding Security and legal review, they will have to set the risk level to Risk assessment pending. That will allow them to register/modify the instrument, but it won't be possible to activate it until the Risk Level is set to a specific tier and the Security and legal review link is provided, if needed. The user will always be able to modify the instrument again to set the appropriate risk level and the corresponding link.
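The register/activate distinction described above could be sketched as follows. This is only an illustration: the `Instrument` class and `can_activate` function are hypothetical, and the rule that a review link is needed for medium and high risk (but not for low risk) is an assumption, since this task only says the link is required "if needed".

```python
from dataclasses import dataclass
from typing import Optional

PENDING, LOW, MEDIUM, HIGH = "pending", "low", "medium", "high"


@dataclass
class Instrument:
    # "Risk assessment pending" is enough to register or modify.
    risk_level: str = PENDING
    review_link: Optional[str] = None


def can_activate(instrument: Instrument) -> bool:
    """Activation needs a specific tier and, when required, a review link."""
    if instrument.risk_level == PENDING:
        return False
    # Assumption: only tiers above low risk require the review link.
    needs_review = instrument.risk_level in (MEDIUM, HIGH)
    return not needs_review or bool(instrument.review_link)
```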
Acceptance criteria
Scenario 0: Before selection
- Update the “Contextual attributes” field description to mention their impact on risk: “Collect extra information about the users who triggered the event and the wiki where the event occurred. Some attribute combinations will increase the risk level of this instrument. Learn more”.
- We could include information about "risk-increasing" combinations in the contextual attributes' doc page, which would complement the field's description and also support selection. Maybe this is what was meant by the AC "Documentation should be updated to include a section on regulation and data collection guidelines".
Scenario 1: Users select contextual attributes that impact risk before selecting the Risk level.
- Display an information message below the Risk level field, where it can inform the user’s selection. Example copy: “Based on the selected Contextual attributes, this instrument has a minimum risk level of “{{Tier:Risk}}” and it requires a Security and Legal review”. The message is more significant in the context of that field, as it supports user selection.
Scenario 2: Users have selected, or select, a risk level that is lower than the one their latest attribute selection requires
In this case, regarding the selected attributes, we will consider the following:
- Any combination of agent_ua_string + performer_id/performer_name/agent_app_install_id would require Tier 2: Medium risk as the selected risk level
- Any combination of page_id/page_title + performer_id/performer_name would require Tier 2: Medium risk as the selected risk level
- Any combination of agent_ua_string + performer_name/performer_id + page_id/page_title would require Tier 3: High risk as the selected risk level
- As specified in the ticket, displaying an inline warning message under the Contextual attributes field sounds good. I'd suggest indicating something like: "The selected attributes increase the data collection risk level of this instrument. Please review the Risk level field selection" (because a corresponding error message will be displayed there)
- The “Risk level” field should display an error state, as indicated in the task description. The copy could be simplified: “The contextual attributes selected increase the risk of this instrument to “{{Tier:Risk level}}””
No specific scenario
- The selection of a higher level of risk won't be validated, given that there might be other factors influencing this choice
- A warning message is shown (along with the Contextual attributes field) when page_id/page_title + agent_app_install_id are selected, regardless of the risk level the user selects (that combination could require a security and legal review if the user is logged in, which is not possible to know ahead of time)
- A warning message (along with the Risk level field) is shown when the user selects a custom schema (additional and unknown attributes could be collected via that custom schema)
- If the user doesn't yet have the corresponding Security and legal review, they will be able to continue by choosing Risk assessment pending as the Risk level for their instrument. That will allow them to register/modify the instrument, but it won't be possible to activate it until the Risk Level is set to a specific tier and the Security and legal review link is provided, if needed. The user will always be able to modify the instrument again to set the appropriate risk level and the corresponding link (that's the current behaviour)
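Taken together, the acceptance criteria could be expressed as a single validation step. The sketch below is in Python and is not xLab's code: `required_tier`, `validate`, the field keys and the "Tier 1: Low risk" label are hypothetical, and the two warning texts without agreed copy are placeholders.

```python
from dataclasses import dataclass, field

PENDING, LOW, MEDIUM, HIGH = 0, 1, 2, 3
TIER_LABELS = {
    LOW: "Tier 1: Low risk",  # assumed label, by analogy with the other two
    MEDIUM: "Tier 2: Medium risk",
    HIGH: "Tier 3: High risk",
}
PERSONAL_INFO = {"agent_ua_string"}
USER_IDS = {"performer_id", "performer_name"}
APP_ID = "agent_app_install_id"
VIEWING_HISTORY = {"page_id", "page_title"}


def required_tier(selected):
    """Minimum tier for a set of attributes, following Scenario 2."""
    has_pi = bool(selected & PERSONAL_INFO)
    has_user = bool(selected & USER_IDS)
    has_history = bool(selected & VIEWING_HISTORY)
    if has_pi and has_user and has_history:
        return HIGH
    if (has_pi and (has_user or APP_ID in selected)) or (has_history and has_user):
        return MEDIUM
    return LOW


@dataclass
class Result:
    errors: list = field(default_factory=list)    # (field key, message)
    warnings: list = field(default_factory=list)  # (field key, message)


def validate(selected, chosen, custom_schema=False):
    selected = set(selected)
    result = Result()
    required = required_tier(selected)
    # Only a risk level that is too low is an error; a higher level and
    # "Risk assessment pending" are accepted without validation.
    if chosen != PENDING and chosen < required:
        result.warnings.append(("contextual_attributes",
            "The selected attributes increase the data collection risk level "
            "of this instrument. Please review the Risk level field selection"))
        result.errors.append(("risk_level",
            "The contextual attributes selected increase the risk of this "
            f"instrument to “{TIER_LABELS[required]}”"))
    # Shown regardless of the chosen risk level (placeholder copy).
    if selected & VIEWING_HISTORY and APP_ID in selected:
        result.warnings.append(("contextual_attributes",
            "This combination may require a security and legal review "
            "when the user is logged in"))
    if custom_schema:  # placeholder copy
        result.warnings.append(("risk_level",
            "Custom schemas may collect additional attributes. Check that "
            "the chosen risk level is appropriate"))
    return result
```

An instrument would be allowed to register only when `errors` is empty; warnings are informational and never block registration.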
Documentation
- Documentation should be updated to include a section on regulation and data collection guidelines: https://wikitech.wikimedia.org/wiki/Experimentation_Lab/Regulation_section
- The contextual attributes section is also updated with the combination of contextual attributes that can increase the risk level of an instrument: https://wikitech.wikimedia.org/wiki/Experimentation_Lab/Contextual_attributes#Privacy_considerations




