Response to EU AI Office's WG 2 - Risk identification and assessment for systemic risk: Feedback to the third draft of the General-Purpose AI Code of Practice
March 28, 2025
Safety and Security Section Preamble
Please provide below your thoughts on the preamble of the Safety and Security Section of the Code of Practice.
The preamble of the Safety and Security Section of the General-Purpose AI Code of Practice presents several significant shortcomings that undermine its effectiveness and clarity. Primarily, it suffers from a lack of specificity, clarity, and decisive direction, resulting in an overly broad and vague framework that allows too many optional approaches. This flexibility significantly dilutes the regulatory strength and potential effectiveness of the Code.
Firstly, the preamble does not sufficiently specify what constitutes a systemic risk. Given the importance of systemic risk as a foundational concept in this section, the absence of explicit and illustrative examples severely limits its clarity and practical applicability. This ambiguity risks inconsistent interpretations and potentially weakens enforcement and compliance.
Secondly, the designation of critical systemic risks—such as risks to fundamental rights, public health, privacy, and democratic processes—as optional significantly weakens the protective intent of the Code. These risks should be explicitly mandatory and rigorously enforced, as they directly affect core human rights and societal stability. Making them optional suggests a troubling concession to commercial interests, undermining the Code's commitment to safeguarding human rights and public safety.
Thirdly, the emphasis on proportionality, while intended to support SMEs, remains inadequately defined. Without clear guidelines or standards for operationalising proportionality, there is a real risk of inconsistencies and gaps in the application of the Code, allowing entities to potentially circumvent rigorous safety assessments and mitigations.
Moreover, the preamble inadequately addresses the necessity for independent external assessments. The absence of an explicit mandate for external, independent oversight significantly compromises the transparency, objectivity, and accountability of systemic risk assessments. Relying too heavily on internal evaluations creates inherent conflicts of interest and reduces public trust in the process.
Finally, while the preamble briefly mentions the Precautionary Principle, it fails to provide practical guidance on its application. Without concrete procedural detail, this principle becomes merely symbolic rather than a robust component of risk management strategy.
In conclusion, to enhance effectiveness and credibility, the preamble requires substantial revisions, including illustrative definitions of systemic risks, mandatory assessments for all systemic risks (not only those detailed in Appendix 1.1), clear operational guidelines for proportionality, mandatory external evaluations, and detailed practical guidance on precautionary measures.
Risk assessment for providers of GPAISRs
Risk assessment, Commitment II.2: Systemic risk assessment and mitigation along the entire model lifecycle, including during model development
Commitment II.2 should be substantially edited and clarified. While the current commitment appropriately emphasises systemic risk assessment throughout the model lifecycle, including development phases, it leaves too much flexibility for providers to determine what constitutes appropriate milestones. Such ambiguity risks creating loopholes and inconsistent application, potentially undermining the effectiveness of the risk assessment process.
Specifically, the commitment's lack of precise definitions and mandated checkpoints during critical lifecycle phases—such as initial data collection, pre-training, training, fine-tuning, deployment, and ongoing post-market monitoring—significantly weakens its regulatory strength. This gap not only reduces clarity and accountability but also increases the likelihood of human rights violations slipping through undetected, particularly at early stages when issues can more readily be addressed.
Moreover, the absence of explicitly required external assessments at these milestones further compounds risks related to conflicts of interest, internal biases, and incomplete transparency. As seen in other high-stakes industries, independent external evaluations play a critical role in maintaining objectivity and rigor. Incorporating mandatory external reviews at defined checkpoints would significantly strengthen the commitment by ensuring unbiased risk assessments and consistent application across providers.
Lastly, systemic risk mitigation must not be perceived merely as a procedural hurdle, but rather as a foundational safeguard for human rights and societal interests. Clarifying mandatory risk mitigation measures and clearly outlining procedures, timelines, and reporting requirements for each phase of the lifecycle would enhance compliance, transparency, and public trust in AI systems.
In summary, Commitment II.2 should explicitly define mandatory lifecycle milestones, mandate independent external assessments at these points, and provide clearer guidance for systemic risk mitigation measures. Such substantial edits would significantly improve the commitment's capacity to protect human rights and promote responsible AI deployment.
Risk assessment, Measure II.2.1: Planning development
While Measure II.2.1 lightly emphasises having an appropriate Safety and Security Framework and initiating systemic risk assessment early, it offers too much interpretative leeway regarding when and how precisely these steps should occur. The vague requirement that these actions begin, at the latest, 4 weeks after notifying the AI Office creates potential timing issues. It implicitly allows crucial early development activities, such as pre-training data collection or initial model architecture design, to proceed without adequate risk oversight.
To enhance its clarity and robustness, the measure must explicitly mandate risk assessment and mitigation at clearly defined early stages, such as initial project conceptualisation, data collection, and architectural design decisions, well before significant development milestones or substantial resource investments. Furthermore, the specified timeframe (at the latest 4 weeks) should be revised to require that a robust, detailed risk assessment and mitigation plan be developed and documented prior to any substantial training or resource allocation. NB: several tools can support this effort, including Google's SAIF Risk Assessment Tool and UNESCO's Ethical Impact Assessment.
Also, the measure should explicitly require external validation or independent oversight of the initial planning phase. Independent verification at this critical early stage would help reduce the risk of biased assessments and ensure that the foundational elements of the AI model adequately protect human rights and prevent foreseeable systemic harms.
In summary, Measure II.2.1 should be substantially edited to clearly specify early-stage milestones, impose stricter timelines for implementing risk assessments, and mandate independent external verification of these critical initial planning steps. This would significantly strengthen the measure's efficacy, enhance transparency, and ensure robust protection against systemic risks.
Risk assessment, Measure II.2.2: During development
Measure II.2.2 attempts to ensure systemic risk assessments occur during model development, including pre-training, training, and post-training phases. However, its current wording introduces ambiguity and too much flexibility, potentially undermining effective enforcement. Specifically, the measure permits providers broad discretion in defining appropriate milestones, such as compute-based, development-process-based, or metrics-based milestones, which risks inconsistent application and inadequate monitoring of systemic risks.
To address these critical shortcomings, the measure should be substantially clarified in several ways:
Explicit Milestone Definitions: The measure should clearly prescribe specific, standardised milestones, rather than allowing providers full autonomy to select their own. For instance, milestones might explicitly include initial data set compilation, model architecture finalisation, halfway points of training, initiation of fine-tuning or reinforcement learning, and pre-deployment evaluation stages.
Mandatory Independent Reviews: Each clearly defined milestone should trigger mandatory external, independent evaluations to ensure objectivity, rigor, and transparency in the risk assessment process. External assessments at multiple stages significantly reduce potential biases or conflicts of interest inherent in purely internal evaluations.
Detailed Documentation and Transparency: The measure should explicitly require comprehensive documentation of decisions, assessments, and mitigations at each milestone, publicly available or accessible to the AI Office for scrutiny. Transparent documentation ensures better accountability and allows independent stakeholders to verify compliance rigorously.
Real-time Monitoring and Automated Triggers: The measure currently references automated benchmarks but lacks clarity on how these benchmarks should be operationalised. Clear requirements on automated monitoring methods and specific criteria for triggering immediate development pauses or deeper assessments would enhance responsiveness to emergent risks; a minimal illustrative sketch of such a trigger follows this list.
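To illustrate what an operationalised trigger could look like, the sketch below shows one possible way a provider might encode benchmark thresholds agreed in its Safety and Security Framework and pause development when they are breached. The benchmark names, scores, and thresholds are hypothetical placeholders chosen for illustration, not values drawn from the Code.

```python
# Illustrative sketch only: benchmark names and thresholds are hypothetical
# placeholders, not values defined in the Code of Practice.
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    name: str         # e.g. a hypothetical "cyber_offense_eval"
    score: float      # higher score = greater capability/risk in this sketch
    threshold: float  # limit agreed in the provider's Safety and Security Framework

def breached(results: list[BenchmarkResult]) -> list[str]:
    """Return the benchmarks whose scores meet or exceed their thresholds."""
    return [r.name for r in results if r.score >= r.threshold]

def milestone_decision(results: list[BenchmarkResult]) -> str:
    """Decide whether development may continue past a defined milestone."""
    over = breached(results)
    if over:
        # In practice this would pause training and notify internal governance
        # and, where required, the AI Office.
        return "PAUSE: thresholds breached for " + ", ".join(over)
    return "CONTINUE: no thresholds breached at this milestone"

if __name__ == "__main__":
    results = [
        BenchmarkResult("cyber_offense_eval", score=0.62, threshold=0.50),
        BenchmarkResult("cbrn_uplift_eval", score=0.10, threshold=0.40),
    ]
    print(milestone_decision(results))
```

Even a simple rule of this kind, if its thresholds and escalation paths were standardised and documented, would give the AI Office a concrete artefact to audit.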
If these substantial clarifications and edits are implemented, Measure II.2.2 would more robustly ensure consistent, thorough, and transparent systemic risk assessment throughout AI model development, significantly strengthening protections against potential human rights violations and systemic harms.
Risk assessment, Commitment II.3: Systemic risk identification
Commitment II.3, on Systemic risk identification, addresses a critical aspect of the risk management framework, yet it suffers from significant shortcomings. The primary issue is that it makes only certain systemic risks mandatory while designating others, including significant risks to fundamental rights, public health, democratic integrity, privacy, and safety, as optional. This distinction lacks a clear justification and significantly weakens the protective intent of the Code, potentially exposing individuals and societies to severe, foreseeable harms.
To improve this commitment substantially, I recommend the following:
Mandatory comprehensive risk coverage — All systemic risks listed, particularly those related to fundamental rights (discrimination, privacy violations), democratic processes, and public safety, must be explicitly mandatory for assessment and mitigation. The current optional approach for critical systemic risks contradicts established international standards such as those outlined in the Hiroshima Code of Conduct, OECD AI Principles, and the AI Act itself, and significantly dilutes accountability.
Explicit criteria and methods — Clarify precisely how systemic risks must be identified, specifying standardised criteria or methodologies. For instance, clearly requiring methods like detailed threat modelling, comprehensive risk scenario development, and explicit consideration of known historical incidents would greatly enhance the thoroughness and consistency of risk identification.
Enhanced transparency and documentation — The commitment should require clear documentation on how systemic risks were selected or excluded, including justifications, methodologies applied, and stakeholders involved. Such transparency facilitates external verification, accountability, and trust.
Independent review of risk identification — Require independent external assessments or audits of the systemic risk identification process to reduce the risk of internal biases, conflicts of interest, or incomplete evaluations.
If these suggestions are implemented, Commitment II.3 would provide clearer guidance, ensure thorough risk coverage, enhance transparency, and significantly strengthen the overall framework’s protective capabilities against systemic and human rights risks posed by AI systems.
Risk assessment, Measure II.3.1: Systemic risk selection
Measure II.3.1 currently delegates the responsibility of systemic risk selection primarily to signatories themselves. This arrangement introduces a significant conflict of interest, as providers could selectively identify risks based on business convenience or resource constraints rather than genuine societal concerns. Allowing AI developers to self-determine which risks merit attention creates potential gaps, especially concerning critical areas such as fundamental rights, privacy, discrimination, and democratic integrity.
Given the complexity and gravity of these systemic risks, the responsibility for their identification should not rest exclusively with providers. Instead, it should be guided by mandatory, externally-defined criteria and oversight by an independent body, such as the EU AI Office. Clear external standards would ensure consistency, impartiality, and comprehensive coverage of systemic risks. Additionally, external oversight would strengthen accountability, transparency, and public trust in the assessment process.
Moreover, robust independent validation or audits of the risk selection process should be explicitly required to minimise potential biases and conflicts of interest inherent in self-assessment. Providers should also document and justify their risk selection transparently, ensuring external evaluators and stakeholders can verify and hold signatories accountable for their assessments.
In sum, revising the measure to clearly mandate external definitions, oversight, and rigorous independent validation would substantially enhance its integrity, effectiveness, and alignment with the protective intent of the EU AI Act.
Risk assessment, Measure II.3.2: Determining systemic risk scenarios
Measure II.3.2 addresses the crucial step of determining systemic risk scenarios, but it currently provides insufficient guidance and clarity, weakening its practical effectiveness. As written, the measure allows signatories considerable flexibility to define scenarios based on their interpretation, capabilities, or available resources. Such broad flexibility poses serious concerns, as providers may inadvertently—or intentionally—overlook scenarios that might highlight severe yet inconvenient or costly risks.
To address these critical shortcomings, the measure should explicitly require standardised methodological guidelines for scenario development. Clear guidelines could include structured methods such as robust threat modelling, historical incident analysis, expert elicitation, stakeholder consultations, and scenario stress-testing. These structured methods help ensure thoroughness, consistency, and comparability of systemic risk scenarios across all providers.
Moreover, the measure should explicitly mandate external, independent review or validation of these systemic risk scenarios to prevent internal biases or conflicts of interest. Independent oversight helps ensure that risk scenarios are both comprehensive and sufficiently detailed, capturing potential risks to fundamental rights, privacy, democracy, public health, and safety—even if these scenarios might present challenges or additional costs for the provider.
Furthermore, documentation and transparency requirements should be strengthened. Providers should not only clearly document their scenarios and rationales but also make these available for external scrutiny by the AI Office or independent auditors. Such transparency facilitates accountability, enabling external stakeholders to verify the comprehensiveness and appropriateness of the scenarios considered.
In sum, Measure II.3.2 should be substantially revised to mandate clearer methodological standards, independent validation, and enhanced transparency and accountability mechanisms. This approach will ensure robust, comprehensive, and impartial identification of systemic risk scenarios, significantly enhancing the protective effectiveness of the Code.
Risk assessment, Commitment II.4: Systemic risk analysis
Commitment II.4, concerning systemic risk analysis, appropriately emphasises the importance of thorough and consistent analysis of identified risks. However, as currently worded, it remains overly broad and lacks sufficient detail about how providers should conduct these analyses and what specific standards or tools they should employ.
To enhance clarity and effectiveness, the commitment should explicitly reference and recommend standardised, well-established analytical tools and frameworks. For example, structured analytical methods such as:
Algorithmic Impact Assessments (AIAs) — developed by the Canadian government, this framework systematically evaluates risks based on severity, scale, and likelihood.
UNESCO Ethical Impact Assessment — a tool emphasising ethics, human rights, and sustainability in AI development.
ISO/IEC standards for AI risk management — established international standards that define risk assessment methods, processes, and documentation clearly and rigorously.
Including explicit references to these established tools would provide providers with clear guidance on acceptable standards, ensuring consistency and comparability of systemic risk analyses across the industry. The 2025 AI Safety report and the 2025 Global Index for AI Safety report are also useful resources for systemic risk analysis.
Also, the commitment should mandate independent external audits or validations of systemic risk analyses to maintain objectivity and avoid internal biases. Providers should be required to document their analytical methodologies comprehensively, transparently demonstrating how and why certain tools or standards were chosen, and ensuring these documents are accessible to external evaluators or the AI Office.
In summary, Commitment II.4 should be substantially clarified by explicitly recommending specific, internationally recognised analytical tools and standards, mandating independent external validation, and strengthening documentation and transparency requirements. These enhancements will ensure robust, consistent, and comprehensive systemic risk analysis aligned with best practices and international frameworks.
Risk assessment, Measure II.4.1: Systemic risk estimation
Measure II.4.1 rightly emphasises estimating the likelihood and severity of systemic risks but remains overly vague on how providers should concretely perform these estimations. Currently, the measure's broad language allows for inconsistent approaches, potentially undermining comparability and accuracy across assessments. To address this, the measure should specify standardised risk estimation methods (e.g., probabilistic modelling, historical data analysis, structured expert judgment) and criteria for clearly differentiating severity levels. In addition, the measure should explicitly require providers to justify their risk estimations transparently, outlining assumptions, uncertainties, and the reasoning behind assigned likelihood and impact scores. This would reduce subjectivity, increase accountability, and strengthen oversight.
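As an illustration of what a standardised estimation method could look like, the sketch below combines ordinal likelihood and severity scales into a coarse risk tier, in the spirit of a conventional risk matrix. The scales, weights, and tier boundaries are assumptions chosen for demonstration, not thresholds prescribed by the Code.

```python
# Illustrative sketch only: the scales and tier boundaries below are assumed
# for demonstration, not prescribed by the Code of Practice.
LIKELIHOOD = {"rare": 1, "possible": 2, "likely": 3, "almost_certain": 4}
SEVERITY = {"minor": 1, "moderate": 2, "major": 3, "critical": 4}

def risk_tier(likelihood: str, severity: str) -> str:
    """Combine ordinal likelihood and severity ratings into a coarse tier."""
    score = LIKELIHOOD[likelihood] * SEVERITY[severity]
    if score >= 9:
        return "high"
    if score >= 4:
        return "medium"
    return "low"

# Example: a scenario judged "likely" with "major" severity.
print(risk_tier("likely", "major"))  # -> "high" (3 * 3 = 9)
```

Whatever scale is adopted, the essential point is that the mapping from evidence to likelihood and severity ratings, and from ratings to tiers, should be fixed in advance, documented, and open to external scrutiny rather than left to each provider's discretion.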
Risk assessment, Measure II.4.2: Safely derived models
Measure II.4.2 concerning safely derived models is currently ambiguous and lacks sufficient clarity on essential safeguards required to deem a model as safely derived. While it aims to enable model providers to rely on previously assessed models, it does not adequately define clear criteria or thresholds for establishing equivalence or comparability in risk profiles between original and derived models.
The measure must specify explicit conditions under which a derived model can inherit safety assurances from a previously assessed model. Clear standards, such as similarity in training data, architectural modifications, or usage scenarios, should be articulated to prevent misuse or overly broad assumptions of safety.
Moreover, independent external validation should be required to confirm equivalence or safety inheritance claims. Providers should thoroughly document and transparently justify how derived models maintain or enhance the safety assurances of the original models.
Clarifying these points would prevent providers from inappropriately leveraging past assessments to bypass necessary risk analyses, thus ensuring that derived models remain genuinely safe and robust against systemic risks.
Risk assessment, Measure II.4.3: Model-independent information
Measure II.4.3, which addresses using model-independent information in risk assessments, currently lacks specificity and enforceability. It broadly suggests using external information but fails to detail exactly what types or quality of information should be considered, how providers should incorporate it systematically, or how to validate its relevance and accuracy. This ambiguity creates considerable room for inconsistent or superficial applications.
To clarify and strengthen this measure, specific categories of required model-independent information should be explicitly defined—for instance, documented historical incidents, public risk registries, industry-wide analyses, international standards (OECD, UNESCO, ISO/IEC), and academic research databases. Providers should also be explicitly required to document the process by which they select, review, and apply such external information, providing transparent justification and methodology. Moreover, independent verification or auditing should be mandated to ensure the thoroughness and legitimacy of the selected sources, eliminating the risk of selective or biased inclusion of external information.
These explicit requirements would significantly improve clarity, consistency, accountability, and overall rigor of systemic risk assessments conducted by providers.
Risk assessment, Measure II.4.4: State-of-the-art model evaluations
Measure II.4.4 on State-of-the-art model evaluations is significantly weakened by the removal of explicit KPIs (key performance indicators). By eliminating clearly defined KPIs, the measure loses critical benchmarks for objectively evaluating and comparing model safety and performance against established standards. Without these concrete metrics, evaluations become subjective, inconsistent, and less transparent, reducing accountability and the ability to detect systemic risks effectively.
The measure should therefore explicitly reintegrate detailed, measurable KPIs, clearly defined by relevant international standards or best practices (e.g., fairness benchmarks, robustness and reliability scores, transparency indicators, or privacy preservation metrics). Providers should be required not only to measure their models against these standardised KPIs but also to document and publicly report outcomes, including detailed justifications for their scoring and comparisons.
Furthermore, external independent validation or auditing of these KPI-based evaluations must be mandated to ensure objectivity and fairness and to prevent internal bias or selective reporting.
Reintroducing clear KPIs, mandating transparent documentation, and requiring external validation would substantially enhance the measure’s clarity, rigor, comparability, and effectiveness in addressing systemic risks.
Risk assessment, Measure II.4.5: Rigorous model evaluations
Measure II.4.5 on Rigorous model evaluations currently lacks essential detail regarding what constitutes rigorous evaluation. The term itself is subjective and insufficiently precise, allowing significant discretion that could result in inconsistencies and inadequate assessments. To meaningfully address systemic risks, the measure should explicitly define rigorous evaluation methods, criteria, and standards.
Specifically, it should include clear references to established evaluation frameworks, benchmarks, or best practices—such as standardised testing protocols, adversarial robustness testing, bias audits, transparency checks, or stress tests under realistic scenarios. Also, the measure should mandate clearly defined evaluation criteria, metrics, and minimum performance thresholds that models must meet to demonstrate safety and compliance.
Another crucial improvement is mandating external independent oversight or validation of these evaluations. Allowing providers to decide for themselves what counts as rigorous invites internal biases, conflicts of interest, and selective reporting. Independent third-party verification ensures objectivity, consistency, and trustworthiness of evaluation results.
Finally, providers should be explicitly required to document and transparently report their evaluation methodologies, outcomes, and justifications for meeting or failing defined criteria. Such transparency would strengthen accountability, enable meaningful external scrutiny, and reinforce public confidence in model safety and reliability.
In sum, explicitly defined evaluation standards, external independent verification, and comprehensive transparency requirements are essential for making Measure II.4.5 truly rigorous, consistent, and effective in mitigating systemic AI risks.
Risk assessment, Measure II.4.7: Models as part of systems
Measure II.4.7, addressing Models as part of systems, currently acknowledges that risks can arise from AI models operating within broader technological systems or integrated workflows. However, the measure is overly broad and lacks specificity, making its application uncertain. Its general wording does not clearly outline expectations for evaluating systemic risks in integrated environments, nor does it explicitly mandate coordinated risk management strategies across interconnected system components.
To strengthen this measure, clear and detailed guidelines are necessary. Firstly, the measure should explicitly require providers to assess and document how interactions between their AI models and other system components can amplify existing risks or create new ones. Providers should use standardised assessment methodologies—such as systems-oriented risk analyses, scenario simulations, and interface testing—to systematically evaluate these interactions and their implications.
Secondly, explicit guidelines should be included for collaboration and information-sharing among different providers or stakeholders responsible for different system components. Since integrated systems often span multiple providers, clear requirements for coordinated risk assessments, mitigation strategies, and communication protocols between parties are essential.
Thirdly, the measure must mandate independent external oversight or validation of these integrated assessments to ensure objectivity, transparency, and thoroughness. Without independent oversight, there is a high likelihood of gaps and biases in identifying systemic risks.
Finally, the measure should explicitly call for clear documentation and public transparency regarding how systemic integration risks are identified, assessed, mitigated, and continuously monitored over time.
These explicit guidelines and mandatory external validation could ensure rigorous and consistent assessments of AI models within broader systems, significantly enhancing overall systemic risk mitigation.
Risk assessment, Measure II.4.9: Exploratory research and open-ended red-teaming
Measure II.4.9 appropriately recognises the importance of exploratory research and open-ended red-teaming as crucial practices for uncovering unforeseen risks in AI models. The emphasis on proactive exploration and adversarial testing is commendable, as these techniques help surface vulnerabilities and emerging risks not easily identified through structured evaluations alone.
However, the measure could be improved by clarifying certain key aspects. Firstly, it should explicitly require documentation standards for exploratory research and red-teaming exercises. Providers should be mandated to thoroughly document methodologies, participant expertise, specific vulnerabilities or risks identified, and the actions taken in response.
Secondly, guidelines should clearly articulate the scope and scale of required red-teaming activities, ensuring they are adequately broad and diverse to realistically stress-test the model under challenging and varied scenarios. Explicit encouragement or mandatory inclusion of diverse perspectives—including external experts, interdisciplinary teams, and civil society representatives—would significantly enhance the robustness and comprehensiveness of these exercises.
Finally, clear expectations for reporting and transparency should be established. Providers should be required to publicly report summaries or key findings from red-teaming exercises, including identified risks, vulnerabilities, and subsequent mitigation actions. This transparency will increase accountability, inform public trust, and contribute valuable insights to the broader AI safety community.
Taxonomy of Systemic Risks, Appendix 1.1: Selected types of systemic risk
Appendix 1.1, which outlines selected types of systemic risks, significantly narrows the scope of risks that require mandatory assessment, leaving many well-documented and highly impactful risks—such as discrimination, privacy violations, threats to democratic processes, and environmental harm—as optional. This tiered approach undermines the EU AI Act’s broader objective of safeguarding fundamental rights and public interest and creates a concerning misalignment with international standards such as the Hiroshima Process, OECD AI Principles, and UNESCO’s recommendations.
The selective inclusion of only four mandatory risks (e.g., CBRN, cyber offense, loss of control, and harmful manipulation) sends a troubling signal that harms impacting societal structures and vulnerable groups are secondary, despite growing evidence of their real-world consequences. Discrimination, for instance, is a well-known and persistent risk across AI systems, and privacy violations are already prompting legal action in several jurisdictions. Making such risks optional dilutes accountability and allows providers to avoid engaging with harms that are harder to quantify but no less damaging.
This appendix should be substantially revised to re-prioritise a comprehensive and rights-based taxonomy of systemic risks. All risks related to fundamental rights, public safety, and democratic stability should be elevated to mandatory status. The taxonomy should also include guidance on how risks may evolve over time and mechanisms for regular updates based on new evidence, stakeholder input, and post-deployment incident data.
In summary, Appendix 1.1 should be expanded, rights-aligned, and designed for adaptability—ensuring that systemic risk assessments are holistic, responsive, and grounded in the public interest.
Taxonomy of Systemic Risks, Appendix 1.2: Other types of risks for potential consideration in the selection of systemic risks
Appendix 1.2 currently presents a list of other types of risks for potential consideration, but its framing as optional undermines the seriousness and urgency of many of the included risks. Several of the listed risks—such as those related to public health and safety, fundamental rights (e.g., discrimination, freedom of expression, and privacy), threats to democratic processes, and environmental degradation—are not merely theoretical or peripheral concerns. They are well-documented, pressing, and already materialising in the real-world deployment of powerful AI systems.
Categorising these risks as optional, and leaving it to providers to decide whether to consider them, weakens the Code’s ability to uphold the public interest and erodes alignment with the EU AI Act’s intent. It also creates space for inconsistent implementation, where some providers may choose to overlook critical harms that are harder to measure or politically sensitive, despite their societal relevance.
To improve the appendix, many of these risks—especially those impacting human rights, democratic institutions, and vulnerable populations—should be moved to the mandatory risk category in Appendix 1.1. For risks that remain in Appendix 1.2, clearer guidance should be provided on how providers are expected to justify inclusion or exclusion, including documentation requirements and review by the AI Office. Furthermore, the list should be regularly updated based on input from civil society, affected communities, and emerging evidence from deployment environments.
In its current form, the appendix presents a wide array of risks without clear prioritisation or enforcement mechanisms, potentially weakening the overall effectiveness of the risk assessment framework. Substantial edits are needed to elevate overlooked risks, ensure consistency, and uphold the EU’s leadership in rights-based AI governance.
Taxonomy of Systemic Risks, Appendix 1.3: Nature of systemic risks
Appendix 1.3 attempts to describe the nature of systemic risks but lacks sufficient clarity and practical utility. The framing is overly abstract and reads more like a conceptual reflection than a functional guide for implementation. Given the centrality of systemic risk in the Code of Practice, this appendix should offer sharper, operational definitions and real-world examples to ensure consistent understanding and application across providers.
Currently, the language remains vague: phrases like "difficult to predict", "emergent", or "arising through interaction effects" are valid descriptions, but without concrete guidance or illustrative case studies, they do little to help providers identify these risks in practice. This vagueness can lead to underreporting, inconsistent interpretations, or strategic avoidance of systemic risk classification altogether.
To improve the appendix, it should offer the following:
Concrete definitions and criteria — Clearly define systemic risks with measurable thresholds or indicators (e.g., scale of harm, number of users affected, cross-sector impact) to help distinguish them from ordinary risks.
Illustrative examples — Include practical, real-world examples (e.g., algorithmic discrimination in hiring at scale, misinformation amplification via chatbots, or privacy breaches from regurgitated training data) to guide understanding of how systemic risks manifest in practice.
Clarify the role of interdependencies — The current text references interactions between models and environments but doesn’t clarify how to assess these complex relationships. A structured framework or checklist could help providers identify interaction effects more systematically; a minimal illustrative sketch follows this list.
Mandate documentation and justification — Providers should be required to not only assess for systemic risk characteristics but also to justify and document their determinations in a way that is accessible to regulators and independent reviewers.
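For illustration, a structured checklist of the kind suggested above could be as simple as a fixed set of indicator questions with an agreed escalation rule. The sketch below shows one hypothetical form such a checklist might take; the indicator wording and the escalation threshold are assumptions made purely for demonstration, not criteria defined in Appendix 1.3.

```python
# Illustrative sketch only: the indicator questions and escalation threshold
# are hypothetical, not criteria defined in Appendix 1.3.
from dataclasses import dataclass

@dataclass
class Indicator:
    question: str
    met: bool

def flag_as_systemic(indicators: list[Indicator], minimum_met: int = 2) -> bool:
    """Escalate for full systemic-risk analysis if enough indicators are met."""
    return sum(i.met for i in indicators) >= minimum_met

checklist = [
    Indicator("Could the harm propagate across sectors or borders?", met=True),
    Indicator("Would a large number of users or third parties be affected?", met=True),
    Indicator("Does the risk arise from model-environment interaction effects?", met=False),
]

print(flag_as_systemic(checklist))  # -> True: escalate for full analysis
```

Codifying even a simple rule of this kind would make providers' determinations reproducible and auditable by regulators and independent reviewers.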
Appendix 1.3 should not be treated as a conceptual appendix alone; it must serve a practical, enforceable function. Substantially refining this section would support a clearer, more consistent, and more accountable application of the Code’s risk taxonomy, reinforcing the overall goal of protecting public safety, rights, and democratic stability.
Taxonomy of Systemic Risks, Appendix 1.4: Sources of systemic risk
Appendix 1.4 on Sources of Systemic Risk overlooks the real-world impact that certain AI design and deployment choices can have on vulnerable populations, including women, children, people with disabilities, the elderly, and marginalised communities. While the appendix lists broad categories such as model capabilities, deployment environments, and user interaction, it fails to frame these sources of risk through the lens of equity, inclusion, and human rights. This omission risks minimising or overlooking the disproportionate harms that AI systems can cause to groups already facing structural disadvantages.
To strengthen this appendix, each source of systemic risk should be re-examined with an explicit focus on how it may impact vulnerable populations. For example, under deployment environments, the appendix should require providers to assess whether AI systems will be used in high-stakes or sensitive contexts (e.g., education, healthcare, welfare services) that directly affect disadvantaged groups. Similarly, when evaluating user interactions, providers should be asked to consider accessibility challenges faced by people with disabilities or linguistic barriers that may impact certain communities.
Moreover, the appendix should include guidance for conducting inclusive and participatory assessments involving representatives from affected populations. This can help uncover risks that would otherwise be missed in a purely technical evaluation. Providers should also be encouraged to consider how biased training data or flawed assumptions about average users can lead to systemic exclusion or harm.
Finally, a stronger connection should be made between identifying sources of systemic risk and the obligation to mitigate them; especially when those risks are likely to exacerbate existing inequalities. Risk mitigation strategies should be proportionate not only to the severity of harm but also to the vulnerability of the affected population.
By reframing Appendix 1.4 to centre human impact and equity, the Code of Practice can better fulfil its purpose of protecting fundamental rights and ensuring AI systems serve all members of society—especially those most at risk of being harmed.
Kadian Davis-Owusu, PhD
Co-founder, TeachSomebody