<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="../part2stratml.xsl"?><PerformancePlanOrReport><Name>Government by Algorithm: Artificial Intelligence in Federal Administrative Agencies</Name><Description>... we describe the challenges that lie ahead for agencies seeking to develop and deploy AI/ML tools, and where possible, recommend how to mitigate them. We focus on six major implications: (1) the challenges of building AI capacity in the public sector, including data infrastructure, human capital, and regulatory barriers; (2) the difficulties inherent in promoting transparency and accountability; (3) the potential for unwanted bias and disparate impact; (4) potential risks to hearing rights and due process; (5) risks and responses associated with gaming and adversarial learning; and (6) the role of contracting for supplementing agency technical expertise and capacity.</Description><OtherInformation>Managed well, algorithmic governance tools can modernize public administration, promoting more efficient, accurate, and equitable forms of state action. Managed poorly, government deployment of AI tools can hollow out the human expertise inside agencies with few compensating gains, widen the public-private technology gap, increase undesirable opacity in public decision-making, and heighten concerns about arbitrary government action and power. Given these stakes, agency administrators, judges, technologists, legislators, and academics should think carefully about how to spur government innovation involving the appropriate use of AI tools while ensuring accountability in their acquisition and use. 
This report seeks to stimulate that thinking.</OtherInformation><StrategicPlanCore><Organization><Name>Administrative Conference of the United States</Name><Acronym>ACUS</Acronym><Identifier>_a9c4480e-5f04-11ea-8ee8-a29b1183ea00</Identifier><Description>This report was commissioned by the Administrative Conference of the United States in furtherance of its mission to “study the efficiency, adequacy, and fairness of . . . administrative procedure”; “collect information and statistics from . . . agencies and publish such reports as it considers useful for evaluating and improving administrative procedure”; and to “improve the use of science in the regulatory process.” 5 U.S.C. §§ 591, 594. The opinions, views, and recommendations expressed are those of the authors. They do not necessarily reflect those of the Conference or its members.</Description><Stakeholder StakeholderTypeType="Person"><Name>David Freeman Engstrom</Name><Description>Co-Author -- Stanford University -- David Freeman Engstrom is the Bernard D. Bergreen Faculty Scholar and an Associate Dean at Stanford Law School. He is an elected member of the American Law Institute and a faculty affiliate at the Stanford Institute for Human-Centered Artificial Intelligence (HAI), CodeX: The Stanford Center for Legal Informatics, and the Regulation, Evaluation, and Governance Lab (RegLab). He received a J.D. from Stanford Law School, an M.Sc. from Oxford University, and a Ph.D. in political science from Yale University and clerked for Chief Judge Diane P. Wood on the U.S. Court of Appeals for the Seventh Circuit. </Description></Stakeholder><Stakeholder StakeholderTypeType="Person"><Name>Daniel E. Ho</Name><Description>Co-Author -- Stanford University -- Daniel Ho is the William Benjamin Scott and Luna M. Scott Professor of Law, Professor of Political Science, and Senior Fellow at the Stanford Institute for Economic Policy Research at Stanford University. 
He directs the Regulation, Evaluation, and Governance Lab (RegLab) at Stanford, and is a Faculty Fellow at the Center for Advanced Study in the Behavioral Sciences and Associate Director of the Stanford Institute for Human-Centered Artificial Intelligence (HAI). He received his J.D. from Yale Law School and Ph.D. from Harvard University and clerked for Judge Stephen F. Williams on the U.S. Court of Appeals for the District of Columbia Circuit.</Description></Stakeholder><Stakeholder StakeholderTypeType="Person"><Name>Catherine M. Sharkey</Name><Description>Co-Author -- New York University -- Catherine Sharkey is the Crystal Eastman Professor of Law at NYU School of Law. She is an appointed public member of the Administrative Conference of the United States, an elected member of the American Law Institute, and an adviser to the Restatement Third, Torts: Liability for Economic Harm and Restatement Third, Torts: Remedies projects. She was a 2011-12 Guggenheim Fellow. She received an M.Sc. from Oxford University and a J.D. from Yale Law School. She clerked for Judge Guido Calabresi of the U.S. Court of Appeals for the Second Circuit and Justice David H. Souter of the U.S. Supreme Court.</Description></Stakeholder><Stakeholder StakeholderTypeType="Person"><Name>Mariano-Florentino Cuéllar</Name><Description>Co-Author -- Stanford University and Supreme Court of California -- Mariano-Florentino Cuéllar is a Justice on the Supreme Court of California, the Herman Phleger Visiting Professor of Law at Stanford University, and a faculty affiliate at the Stanford Center for AI Safety. A Fellow of the Harvard Corporation, he also serves on the boards of the Hewlett Foundation, the American Law Institute, and the Stanford Institute for Human-Centered Artificial Intelligence (HAI), and chairs the boards of the Center for Advanced Study in the Behavioral Sciences and AI Now. He received a J.D. from Yale Law School and a Ph.D. 
in political science from Stanford University and clerked for Chief Judge Mary M. Schroeder of the U.S. Court of Appeals for the Ninth Circuit.</Description></Stakeholder><Stakeholder StakeholderTypeType="Person"><Name>Ryan Azad</Name><Description>Contributor</Description></Stakeholder><Stakeholder StakeholderTypeType="Person"><Name>Jami Butler</Name><Description>Contributor</Description></Stakeholder><Stakeholder StakeholderTypeType="Person"><Name>Mikayla Hardisty</Name><Description>Contributor</Description></Stakeholder><Stakeholder StakeholderTypeType="Person"><Name>Alexandra Havrylyshyn</Name><Description>Contributor</Description></Stakeholder><Stakeholder StakeholderTypeType="Person"><Name>Luci Herman</Name><Description>Contributor</Description></Stakeholder><Stakeholder StakeholderTypeType="Person"><Name>Liza Starr</Name><Description>Contributor</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Federal Administrative Agencies</Name><Description/></Stakeholder></Organization><Vision><Description>Artificial intelligence (AI) transforms how government agencies do their work.</Description><Identifier>_a9c449da-5f04-11ea-8ee8-a29b1183ea00</Identifier></Vision><Mission><Description>To spur careful thought about government innovation involving the appropriate use of AI tools while ensuring accountability in their acquisition and use.</Description><Identifier>_a9c44b42-5f04-11ea-8ee8-a29b1183ea00</Identifier></Mission><Value><Name>Innovation</Name><Description/></Value><Value><Name>Transparency</Name><Description>Transparency and Accountability -- Administrative law—the mix of constitutional and statutory law that governs how agencies do their work—is premised on transparency, accountability, and reason-giving.51 When government takes action that affects rights, it must explain why. 
Yet many of the algorithmic tools that federal agencies use to make and support public decisions are not, by their structure, fully explainable.52 The challenge is how to craft concrete legal and regulatory mechanisms for algorithmic tools that meaningfully fulfill transparency values and ensure fidelity to the agency’s legislative mandate and other legal commitments (e.g., non-arbitrariness, non-discrimination, privacy).</Description></Value><Value><Name>Accountability</Name><Description/></Value><Goal><Name>Internal Capacity</Name><Description>Build Internal Capacity</Description><Identifier>_a9c44d04-5f04-11ea-8ee8-a29b1183ea00</Identifier><SequenceIndicator>1A</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>No agency can effectively deliver on its mission without access to the people, infrastructure, and organizational resources necessary to understand and respond to its environment. As policymakers and civil servants increasingly seek to rely on AI and algorithmic governance, a core challenge for agencies is how to generate the necessary technical capacity—the ability to identify, develop, responsibly use, and maintain complex technical solutions. Our report suggests that building internal capacity, rather than simply embracing a default practice of contracting out for technical capacity, will be crucial to realizing algorithmic governance’s promise and avoiding its perils.</OtherInformation><Objective><Name>Infrastructure &amp; Data</Name><Description>BUILD TECHNICAL INFRASTRUCTURE AND DATA CAPACITY</Description><Identifier>_a9c44e62-5f04-11ea-8ee8-a29b1183ea00</Identifier><SequenceIndicator>1A.1</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>SSA</Name><Description>By one estimate, SSA has over 14 petabytes of data, but data is stored in roughly 200 separate databases. Linking, cleaning, and merging such data remains an ongoing process. 
Most of SSA’s supporting applications remain written in outdated (COBOL) programming language, stemming from initial development some 30 years ago. SSA is in a process of updating these applications into more modern languages, but such modernization is resource intensive, requiring, for instance, personnel trained in different generations of languages.9</Description></Stakeholder><OtherInformation>Because AI tools require complex software packages and computing power to process large datasets, agencies may have to upgrade legacy systems or integrate new systems with old ones.7 This is a challenge for agencies that excel, as one agency official facetiously put it, at “having the latest technology of the last decade.”8 ... Since all AI tools—whether supervised or unsupervised—are data-hungry, agencies must also invest in the necessary input data. Investing in data capacity requires addressing the interrelated challenges of data collection, data standardization, and data security.</OtherInformation></Objective><Objective><Name>Data Collection</Name><Description>Collect the right data.</Description><Identifier>_a9c44fc0-5f04-11ea-8ee8-a29b1183ea00</Identifier><SequenceIndicator>1A.2</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>NHTSA</Name><Description>For example, NHTSA’s enforcement and vehicle safety research divisions seek to use AI/ML to model historical crash data for simulated testing of automated vehicles. But NHTSA may currently lack authority to compel manufacturers to produce crash data.15 The agency’s voluntary data collection mechanism16 captures only a fraction of the vast data that manufacturers generate.17 Data collection poses related logistical challenges for agencies that rely on third-party data. 
Third-party data may be hard to obtain, incomplete, or unrepresentative due to selective or inaccurate reporting.18</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Pharmaceutical Companies</Name><Description>For example, pharmaceutical companies may not want to share the most comprehensive clinical trial data on which the FDA could train its AI/ML.19</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>FDA</Name><Description>At present, agencies like NHTSA and the FDA are encouraging third parties to voluntarily provide data.</Description></Stakeholder><OtherInformation>Deploying AI tools requires collecting the right data and enough of it. But before collecting data at scale, agencies may need to clarify their statutory and regulatory authority. A far-flung statutory fabric, including constitutional provisions,  federal, state, and local laws, defines government duties and obligations around data and includes transparency statutes such as the federal Freedom of Information Act and state law equivalents. 
At the federal level, the Privacy Act and amendments provide the closest to a comprehensive scheme for information practices.10 Among other things, agencies must, where possible, obtain data from individuals and may not use data for secondary purposes without consent.11 The law also significantly constrains the government’s ability to knit together datasets across agencies.12 Other pillars of the federal regime include the Paperwork Reduction Act, which constrains an agency’s ability to collect new data from the public,13 and the Information Quality Act, which constrains agencies’ ability to open-source data holdings to achieve transparency.14 Agencies may also face data collection limitations due to lack of specific authority.</OtherInformation></Objective><Objective><Name>Data Standardization</Name><Description>Consider which data to standardize and at what stage.</Description><Identifier>_a9c4510a-5f04-11ea-8ee8-a29b1183ea00</Identifier><SequenceIndicator>1A.3</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>FDA</Name><Description>As detailed in Part II, an alternative path at the FDA is to defer the agency’s NLP projects until it can obtain standardized, fit-for-purpose data.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>IRS</Name><Description>Some standardization issues arise from the data storage or submission medium. The IRS, for example, continues to process paper-filed tax returns that often contain missing information.20</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>SSA</Name><Description>Even digital data may not be standardized. 
The SSA, as another example, processes unstructured digital text, such as paragraphs describing disability circumstances and non-uniform medical records maintained in PDF files.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>SEC</Name><Description>The SEC, too, struggles to compare companies in its centralized CIRA system because companies can use varying semantic tags or use incorrect tags.21</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Regulated Entities</Name><Description>Many agencies face a trade-off between data depth and uniformity.22 In addressing data standardization challenges, agencies must consider which data they are willing to standardize and at what stage: at the collection phase—by “outsourcing” standardization to regulated entities—or at the processing phase by developing advanced tools that can standardize unstructured data.</Description></Stakeholder><OtherInformation>To be of any use, data must be in an appropriate format.  
Different types of AI tools require different levels of data standardization, but standardization can pose significant barriers to virtually any AI deployment.</OtherInformation></Objective><Objective><Name>Data Security</Name><Description>Leverage technology to reconcile data sharing needs with data privacy concerns.</Description><Identifier>_a9c4525e-5f04-11ea-8ee8-a29b1183ea00</Identifier><SequenceIndicator>1A.4</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>Department of Veterans Affairs</Name><Description>Researchers at the Department of Veterans Affairs, for example, used cryptographic hashes to obscure lab results and other sensitive data in its partnership with Alphabet’s DeepMind unit.25</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>Alphabet</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>IRS</Name><Description>An official at the IRS similarly proposed employing Generative Adversarial Networks (GANs),26 </Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>CFTC</Name><Description>and the CFTC proposed anonymizing data to enable collaboration with market participants.27</Description></Stakeholder><OtherInformation>Data often comes with security requirements. The Federal Information Security Management Act, for example, requires agencies to develop data security programs, breach notification policies, and disposal routines,23 and then subjects them to civil suits for failures.24 Building data capacity requires agencies to address these requirements, typically by developing strict internal guidelines for the use and sharing of data that contains personal information. 
Agencies should also leverage technology to reconcile data sharing needs with data privacy concerns.</OtherInformation></Objective></Goal><Goal><Name>Staff Capacity</Name><Description>BUILD INTERNAL STAFF CAPACITY</Description><Identifier>_a9c453bc-5f04-11ea-8ee8-a29b1183ea00</Identifier><SequenceIndicator>1B</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name>Software Engineers</Name><Description>Software engineers, especially those outside the agency, may lack the insights or training necessary to faithfully translate law into code. While in-house production may strain project budgets and introduce recruitment challenges, building internal staff capacity may yield tools that are better tailored to the relevant task and legal requirements.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>SSA</Name><Description>Several agencies have already demonstrated the value of embedded expertise. As detailed in Part II, the SSA developed NLP tools to identify potential errors in draft disability determinations as a result of a multi-year strategy to hire and then repurpose lawyers with technical skillsets.29 This strategy helped facilitate an iterative design process in which system architects could readily work back and forth between code choices and legal, policy, and organizational considerations.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>IRS</Name><Description>The Internal Revenue Service’s (IRS) development of algorithmic enforcement tools similarly illustrates the value of in-house, embedded expertise in automating tasks that are inherently dynamic.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>SEC</Name><Description>As Part II’s case study of the SEC noted, enforcement agencies must engage in continuous, iterative updating of their AI tools as enforcers unearth new modes of wrongdoing. 
30</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>OPM</Name><Description>The Competitive Service hiring process also constrains recruitment,31 although the Office of Personnel Management (OPM) has taken steps to ease hiring burdens for technical positions, including by establishing a “data scientist” classification.32 OPM also recently established a governmentwide “direct hire” appointing authority for a variety of STEM positions and for all IT positions for agencies that can demonstrate “the existence of a severe shortage of candidates or critical hiring need.”33</Description></Stakeholder><OtherInformation>An agency’s AI tools must be both usable and compliant. As to usability, optimal design and deployment will often depend on a deep understanding of the problem an algorithmic tool seeks to solve, an ability to convince skeptical agency staff to utilize the tool, and a user-friendly interface that eases that pitch. And as to compliance, algorithmic tools themselves encode legal and policy choices, some of which will be subject to judicial review.28</OtherInformation><Objective><Name>Stability &amp; Balance</Name><Description>Offer job stability and work-life balance.</Description><Identifier>_a9c45524-5f04-11ea-8ee8-a29b1183ea00</Identifier><SequenceIndicator>1B.1</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>Federal agencies seeking to build internal technical capacity must grapple with budgetary and other human resource constraints. In addition to overall budget caps, civil service laws capping allowable salaries can price government agencies out of the technical labor markets. 
Agencies can offer job stability and work-life balance, whereas technology companies incentivize talent by offering competitive salaries or stock options.</OtherInformation></Objective></Goal><Goal><Name>Strategy &amp; Sandboxes</Name><Description>INVEST IN AI STRATEGY AND “SANDBOXES”</Description><Identifier>_a9c456aa-5f04-11ea-8ee8-a29b1183ea00</Identifier><SequenceIndicator>1C</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>Deploying AI technology requires agencies to invest in comprehensive strategies to test, evaluate, update, and retire AI tools. An important part of this strategic process is articulating metrics for measuring the success of innovations that align with the agency’s risk-profile and level of comfort with failure. Agencies should also develop testing “sandboxes” that allow for failure and iterative evaluation of new governance tools and, for agencies that regulate private-sector AI, a testing infrastructure that can help guide regulated entities.</OtherInformation><Objective><Name>Evaluation Metrics</Name><Description>Develop metrics for measuring the success of AI tools.</Description><Identifier>_a9c45826-5f04-11ea-8ee8-a29b1183ea00</Identifier><SequenceIndicator>1C.1</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>In advance of deployment, agencies should develop metrics for measuring the success of AI tools. These evaluation metrics should be tied to the agency’s broader mission, rather than focused purely on efficiency or return on investment.34 Correspondingly, agencies should establish a process for “returning to the drawing board” when tools fail to satisfy these metrics. 
Given the dynamic nature of AI/ML models, these metrics should also guide subsequent evaluations and decisions about when to refine or retire a given tool.35 Frontline enforcers may provide ongoing feedback on models.36</OtherInformation></Objective><Objective><Name>Testing &amp; Risk</Name><Description>Develop a comfort level with technological failure.</Description><Identifier>_a9c45aa6-5f04-11ea-8ee8-a29b1183ea00</Identifier><SequenceIndicator>1C.2</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>FDA</Name><Description>Agencies like the FDA must maintain a relatively low risk tolerance: Failing to detect adverse postmarket effects of a pharmaceutical can have critical public health consequences... For example, at the FDA, “INFORMED has created a unique sandbox for networking, ideation and sharing of technical and organizational resources, empowering project teams with the tools needed to succeed in developing novel data science solutions.”42 These sandboxes, moreover, can signal minimum standards for AI and help regulated entities “de-risk” their development decisions. The proliferation of guidance and reports can also serve this goal. In the context of cybersecurity, the FDA has provided some guidance on what the agency expects to see in premarket submissions, including certain specific design features and cybersecurity design controls.43</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>IRS</Name><Description>By contrast, the IRS has continued to experiment with technology despite low accuracy rates.38 ... 
Although it began over twenty years ago, the IRS’s Compliance Data Warehouse established a foundation that is enabling the agency to consider more complex AI applications moving forward.41 Similarly, agencies that regulate AI deployments in the private sector should also build regulatory “sandboxes.”</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Medical Device Manufacturers</Name><Description>Recent approval of several AI-included devices, along with the release of a discussion paper on its plans to regulate AI/ML-based software as a medical device,44 serves to provide additional guidance to manufacturers.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>MITRE</Name><Description>Further, the FDA, in conjunction with MITRE, released a report entitled Medical Device Cybersecurity: Regional Incident Preparedness and Response Playbook.45</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Health Care Delivery Organizations</Name><Description>The FDA explained that the report can serve “as a customizable tool for health care delivery organizations to aid in their preparedness and response activities for medical device cyber incidents.”46</Description></Stakeholder><OtherInformation>“Sandbox” testing and regulatory infrastructures -- To build technical capacity, agencies will likely have to develop a comfort level with technological failure—and this may be easier for some agencies than for others.37 ... Risk-taking is crucial to developing successful tools, and agencies seeking to employ AI must be willing to fail.39 As many agencies found, initial efforts and failures create a “supersized sandbox”—a playground for developing future AI applications and learning important lessons. 
Agencies should structure projects to allow some margin of error and treat failures not as losses but as opportunities to share lessons across the agency.40 </OtherInformation></Objective></Goal><Goal><Name>Accountability</Name><Description>LINK CAPACITY TO ACCOUNTABILITY</Description><Identifier>_a9c45d4e-5f04-11ea-8ee8-a29b1183ea00</Identifier><SequenceIndicator>2</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>FDA</Name><Description>However, some agencies are more likely than others to incorporate accountability and transparency by design—with agencies such as the FDA and NHTSA being more incentivized, given that both are subject to a high potential of judicial review and public scrutiny.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>NHTSA</Name><Description/></Stakeholder><OtherInformation>Building internal expertise and technical capacity may also be essential to accountability and building trust.47 The scholarly literature may be moving away from individual, privately enforced rights as the best way to achieve accountability in favor of “accountability by design.”48 Kroll et al. 
offer a catalog of tools that engineers can incorporate into algorithmic systems to facilitate evaluation and testing.49 This “accountability by design” trend links to longstanding calls among administrative law scholars for agencies to develop an “internal law of administration” distinct from—and often more effective than—externally imposed accountability.50</OtherInformation><Objective><Name>Bridges</Name><Description>BRIDGE TRANSPARENCY AND ACCOUNTABILITY</Description><Identifier>_a9c45ede-5f04-11ea-8ee8-a29b1183ea00</Identifier><SequenceIndicator>2.1</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>Subjecting algorithmic decision systems to meaningful accountability poses two main challenges: achieving transparency into a tool’s workings, and then selecting the best regulatory mechanism for translating that information into desired compliance.</OtherInformation></Objective><Objective><Name>Transparency</Name><Description>Achieve transparency into a tool’s workings.</Description><Identifier>_dfdd7290-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>2.1.1</SequenceIndicator><Stakeholder><Name>System Engineers</Name><Description>Even a system’s engineers may not understand how it arrived at a particular result or be able to isolate the data features that drove the model’s prediction.</Description></Stakeholder><OtherInformation>The gold standard of transparency in any decision-making context is a full account of a decision’s “provenance,” including its inputs, outputs, and the main factors that drove it.53 The problem, as Part II noted, is that machine learning models are often inscrutable...Algorithmic outputs are also often nonintuitive in that the data relationships they surface may not map to any common-sense understanding of how the world works. 
Even full disclosure of a system’s source code and data and an opportunity to observe its operation “in the wild” will not necessarily facilitate either insight or accountability.54 Two approaches to transparency have begun to emerge in response to these concerns.</OtherInformation></Objective><Objective><Name>Mixed Modes</Name><Description>Mix modes of explanation to achieve desired transparency.</Description><Identifier>_dfdd815e-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>2.1.1.1</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>One camp focuses on how to mix modes of explanation to achieve desired transparency. For instance, an incomplete accounting of a particular decision can be supplemented by a “system-level” accounting of the tool that made it,55 including data descriptions,56 modeling choices,57 and general descriptions of factors that drive the model’s predictions.58</OtherInformation></Objective><Objective><Name>Simplification</Name><Description>Simplify models to make them more parseable.</Description><Identifier>_dfdd909a-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>2.1.1.2</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>A second camp advocates simplification of models to make them more parseable.59 These measures might take the form of a ceiling on the number of data features used or outright bans on particular tools (e.g., facial recognition) or particular models, such as powerful “deep learning” techniques that generate more accurate predictions but are often less interpretable.60</OtherInformation></Objective><Objective><Name>Regulation</Name><Description>Select the best regulatory mechanism for translating that information into desired compliance.</Description><Identifier>_dfdd966c-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>2.1.2</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name>Regulatory 
Architects</Name><Description>Here regulatory architects have numerous options. They can choose mechanisms that promote legal accountability (e.g., judicial review of agency action) or political accountability (e.g., public ventilation through notice and comment or mandatory agency-conducted “impact assessments”61).</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>FDA</Name><Description>They can also opt for “hard” rules (e.g., prohibitions on certain models, a licensing or certification requirement prior to use akin to FDA drug approvals, or liability rules that allow injured parties to recover damages), “soft” rules (e.g., impact assessments designed to ventilate concerns about algorithmic tools but confer no substantive rights),62 or something in between (e.g., notice, consent, correction, and erasure rights like those given data subjects in the European Union’s General Data Protection Regulation63 or the U.S. Fair Credit Reporting Act64).</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Public Enforcers</Name><Description>If hard rules are chosen, regulatory designers can choose to delegate enforcement authority to public enforcers, including, as some advocate, an “FDA for AI,”65 or to private enforcers deputized to sue in court or incentivized via whistleblower bounty schemes.66 Finally, regulatory architects can opt for ex ante regulation before a model runs—think once again of an FDA-style pre-certification scheme or prohibitions on uses or model types—or ex post regulation of results, as with lawsuits seeking damages.67</Description></Stakeholder><OtherInformation>Even where AI systems can be made transparent, there remains the challenge of choosing regulatory mechanisms that can translate that transparency into meaningful accountability.</OtherInformation></Objective><Objective><Name>Principles</Name><Description>Establish working premises that can frame the possibilities and limits of 
competing approaches.</Description><Identifier>_a9c4608c-5f04-11ea-8ee8-a29b1183ea00</Identifier><SequenceIndicator>2.2</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>DESIGN PRINCIPLES -- For the moment, no single best solution from this menu of options has emerged. However, Part II’s in-depth case studies, by showcasing a wide range of AI-based governance tools, help establish some working premises that can frame the possibilities and limits of competing approaches.</OtherInformation></Objective><Objective><Name>Accountability &amp; Efficacy</Name><Description>Reveal trade-offs between accountability and efficacy.</Description><Identifier>_dfdd99c8-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>2.2.1</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>SEC</Name><Description>As just one example, requiring the SEC to deploy a less sophisticated but more interpretable algorithmic tool in making enforcement decisions may make it easier for regulated parties or agency overseers to evaluate the tool’s workings but may also bring substantial costs, subjecting regulated parties to undue prosecutions and wasting scarce agency resources in the process. Here and elsewhere, interpretability may come only at the cost of efficacy.69</Description></Stakeholder><OtherInformation>First, consideration of actual use cases reveals hard trade-offs between accountability and efficacy. 
Imposing constraints on model choices—by, for example, limiting the number of data features or prohibiting more sophisticated modeling approaches—trades off interpretability against a tool’s analytic power and, thus, its usefulness.68</OtherInformation></Objective><Objective><Name>Tasks &amp; Interests</Name><Description>Consider the pros and cons of transparency by governance task and the rights and interests at issue.</Description><Identifier>_dfdd9b94-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>2.2.2</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name>Veterans</Name><Description>One might conclude, for instance, that disability or veterans’ benefits determinations are too important to risk erroneous determinations and, in any event, present a lower risk of gaming by beneficiaries.</Description></Stakeholder><OtherInformation>Second, the pros and cons of transparency will often vary by governance task and the rights and interests at issue. In the enforcement context, public disclosure of the “internals” of an algorithmic enforcement tool can impair or defeat the tool’s utility by facilitating evasion and gaming by regulated parties—an issue we explore in more detail later in Part III’s section on “Adversarial Learning.” In certain mass adjudicatory contexts, by contrast, full open-sourcing of algorithmic tools might make sense as an accountability measure.</OtherInformation></Objective><Objective><Name>Administrative Law</Name><Description>Reckon with the existing structure of administrative law.</Description><Identifier>_dfdd9fea-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>2.2.3</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name>Courts</Name><Description>This is problematic because the doctrine of constitutional avoidance—which holds that courts should avoid ruling on constitutional issues in favor of other grounds—means that much, or even most, of the hard work of regulating algorithmic 
governance tools will come not in the constitutional clouds but rather in the streets of administrative law.72</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>Supreme Court</Name><Description>So-called “reviewability” doctrines in administrative law offer a compelling example. Current administrative law, as exemplified by the Supreme Court’s Heckler v. Chaney decision, insulates agency enforcement decisions from judicial review except where Congress has clearly specified a standard for the agency’s exercise of discretion or where an agency has wholly “abdicated” its enforcement duties.73</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>Congress</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Judges</Name><Description>The reasons are many, but the main anxiety is about judges’ ability to reconstruct or evaluate specific enforcement decisions, which often rest on subtle judgments about how best to allocate scarce agency resources. Interestingly, algorithmic enforcement tools may make these reviewability concerns worse or better. On one hand, the black box nature of machine learning tools may further obscure agency enforcement decisions, strengthening the rationale for hiving off those decisions from judicial review. Something very near the opposite, however, may also result.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Agencies</Name><Description>By allowing agencies to formalize and make explicit organizational priorities, algorithmic tools have the potential to render enforcement decision-making somewhat more tractable than the dispersed human judgments of enforcement staff. 
For instance, if appropriately balanced with the need for a degree of confidentiality of agency enforcement goals, code may help provide the missing “focal point” for judicial evaluation of agency enforcement decisions and rebut the current doctrine’s presumption against reviewability. Moreover, because algorithms encode legal principles and agency priorities, they perform regulatory work and so may qualify as “rules” under administrative law, thus requiring mandatory ventilation via the notice-and-comment process or exposing them to pre-enforcement judicial review. The counter-intuitive result is that continued proliferation of algorithmic enforcement tools may, on net, yield an enforcement apparatus that is more transparent and less opaque than the current system.74 Reviewability only scratches the surface of ways that administrative law will modulate federal agency use of AI tools. Administrative law may also need to adapt in determining whether agency decisions made or supported by an algorithmic tool are “arbitrary and capricious.” Courts will thus grapple once more with whether such review is a matter of light-touch review75 or deeper “hard look” review.76 And, as we explore in more detail elsewhere in Part III, agency use of AI-based tools to support adjudication raises distinct legal questions relating to hearing rights and due process.</Description></Stakeholder><OtherInformation>Third, efforts to build effective accountability systems will have to reckon with the existing structure of administrative law. 
To date, much academic debate has focused, at a high level of abstraction, on procedural due process under the Fifth and Fourteenth Amendments to the United States Constitution.70 Far less work explores the more fine-grained statutory requirements of administrative law and, even then, offers mostly a surface-level tour of potentially applicable doctrines.71</OtherInformation></Objective><Objective><Name>Limitations</Name><Description>Understand the limits of administrative law in achieving algorithmic accountability.</Description><Identifier>_dfdda2a6-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>2.2.4</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name>Judges</Name><Description>It does little good to give judges transparency into an algorithmic system’s “internals” if they lack the technical understanding necessary to make sense of it.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Citizens</Name><Description>The same is true of ordinary citizens who are the objects of algorithmic decisions.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Engineers</Name><Description>If engineers cannot understand a system’s outputs, then there is little reason to believe that less technically trained actors can do any better.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>SEC</Name><Description>Actionable transparency can also falter when data and algorithms change dynamically. For instance, the SEC’s supervised learning model for Form ADV disclosures is trained on past referrals to the SEC’s enforcement arm, but the pool of referrals grows over time, with different human input for each referral. This means that each model may be distinct. A model reviewed at one stage (during the notice-and-comment process) may already be substantively different upon deployment. 
Conversely, problematic predictions at one point (a specific enforcement decision) might vanish as the model is updated. By their nature, the notice-and-comment process and APA-type judicial proceedings are static and may not generate the information required to understand an algorithm in action.</Description></Stakeholder><OtherInformation>Fourth, looking across concrete use cases underscores administrative law’s potential limits in achieving algorithmic accountability. Meaningful accountability must be built upon actionable transparency. </OtherInformation></Objective><Objective><Name>Data &amp; Disclosure Laws</Name><Description>Understand that administrative law works in tandem with an array of data and disclosure laws that, at least in their current form, can sharply limit transparency</Description><Identifier>_dfdda490-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>2.2.5</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>SSA</Name><Description>In the SSA context, individual data is protected under the Privacy Act of 1974.77</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>SEC</Name><Description>Similarly, the raw disclosures that serve as inputs for the SEC’s enforcement tools are publicly available, but data from prior investigations—that is, the filings that triggered elevated review—are likely protected under the Freedom of Information Act’s exemption for law enforcement purposes.78</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Contractors</Name><Description>Finally, a contractor-provided algorithmic tool’s technical guts may be protected by patent, copyright, or trade secrecy laws,79 and government use of the tool provides no further right of disclosure.80</Description></Stakeholder><OtherInformation>Finally, administrative law works in tandem with an array of data and disclosure laws that, at least in their current form, can sharply limit 
transparency.</OtherInformation></Objective><Objective><Name>Reforms</Name><Description>Consider whether to retrofit existing accountability frameworks or mint new ones.</Description><Identifier>_a9c46230-5f04-11ea-8ee8-a29b1183ea00</Identifier><SequenceIndicator>2.3</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>CONCRETE REFORM IDEAS -- Given these challenges, judges, agency administrators, and legislators will face difficult questions about whether to retrofit existing accountability frameworks or mint new ones.</OtherInformation></Objective><Objective><Name>Retrofitting &amp; Reinterpretation</Name><Description>Retrofit or, to the extent feasible, reinterpret the APA to enable prudent ex ante review of algorithmic tools through the notice-and-comment process and/or judicious ex post review by courts.</Description><Identifier>_dfdda8fa-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>2.3.1</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>Congress</Name><Description>On the latter, ex post side, Congress or courts may wish to relax the presumption against reviewability of enforcement decisions under Heckler v. Chaney. 
81</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Courts</Name><Description/></Stakeholder><OtherInformation>A minimalist option would retrofit or, to the extent feasible, reinterpret the APA to enable prudent ex ante review of algorithmic tools through the notice-and-comment process and/or judicious ex post review by courts.</OtherInformation></Objective><Objective><Name>Triggers</Name><Description>Set new triggers for when an algorithmic tool is subject to notice and comment.</Description><Identifier>_dfddab98-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>2.3.2</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name/><Description/></Stakeholder><OtherInformation>On the ex ante side, an amended APA could set new triggers for when an algorithmic tool is subject to notice and comment. One could peg notice and comment to whether staff use of the tool is mandatory or voluntary as a crude proxy for how much the tool displaces human discretion. A more technical approach would key notice and comment to the numerical threshold the tool establishes. For example, an enforcement tool that flags potential violators as “high risk” necessarily sets a probability threshold from 0 to 1. The higher the threshold, the greater the risk that human discretion is displaced.82 The chosen threshold also fixes the relative number of false negatives and false positives to be expected. 
As a result, the choice of threshold cannot be made without weighing the social costs of each type of error—precisely where public participation via notice and comment may be most useful.</OtherInformation></Objective><Objective><Name>Oversight Board</Name><Description>Create an AI oversight board.</Description><Identifier>_dfddaddc-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>2.3.3</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name/><Description/></Stakeholder><OtherInformation>Given the limitations of ex ante and ex post review under the APA, a more comprehensive institutional solution would be to create an AI oversight board, either within each agency or as a freestanding agency with oversight over all other agencies. Staffed with technologists, lawyers, and agency representatives, an oversight board could be tasked with monitoring, investigating, and recommending adjustments to agency adoption and use of AI.83</OtherInformation></Objective><Objective><Name>Benchmarking</Name><Description>Require agencies to engage in prospective benchmarking.</Description><Identifier>_dfddb476-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>2.3.4</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>SSA</Name><Description>In the SSA context, for instance, the Insight system could be deactivated for a random hold-out set and each case adjudicated in analog fashion.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>SEC</Name><Description>In the SEC context, investigators could be required to investigate a subset of cases without the aid of risk scores.</Description></Stakeholder><OtherInformation>A third possibility would be to require agencies to engage in prospective “benchmarking”84—that is, to create random hold-out sets to compare AI-assisted outcomes and human (status quo) decision-making... 
Benchmarking would provide a practical test of a tool’s facial validity, smoking out bias and arbitrariness and enabling agencies, courts, and the public to meaningfully assess the impact of AI use cases. Benchmarking would also generate new training data and provide a check on procurement-provided tools.</OtherInformation></Objective><Objective><Name>Expertise &amp; Supervision</Name><Description>Embed expertise and internal agency supervision.</Description><Identifier>_a9c463e8-5f04-11ea-8ee8-a29b1183ea00</Identifier><SequenceIndicator>2.4</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>LINKING ACCOUNTABILITY TO CAPACITY -- While these are promising reforms, it is worth noting that formal accountability frameworks are not the only way to ensure responsible agency use of algorithmic governance tools. Internal agency supervision and embedded expertise can also be a powerful source of accountability. As noted in the previous section on capacity building, embedded expertise facilitates “accountability by design” in which agency technologists proactively design and maintain systems that are more transparent and auditable and less arbitrary and biased not as a response to legal or other external threats, but as a matter of good government, good engineering, and professional ethics.</OtherInformation></Objective></Goal><Goal><Name>Bias</Name><Description>Develop agency-level mechanisms to detect, monitor, and correct for bias.</Description><Identifier>_dfddb854-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>3</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name/><Description/></Stakeholder><OtherInformation>Bias, Disparate Treatment, and Disparate Impact -- The administrative state’s growing adoption of AI tools risks compounding biases against vulnerable groups. If biases go unchecked, agency tools will only deepen existing inequities and also likely run afoul of anti-discrimination law. 
Yet, many proposed solutions to combat bias would themselves violate other core legal principles, such as equal protection. In short, agencies can find themselves in a bind. Given these challenges, it is critical that agency administrators, legislators, judges, and academics devote more attention to developing agency-level mechanisms to detect, monitor, and correct for bias, as well as appropriate legal regimes to govern them.</OtherInformation><Objective><Name>Evidence</Name><Description>Consider the evidence of bias.</Description><Identifier>_dfddbb2e-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>3.1</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>EMERGING EVIDENCE OF BIAS -- It is well-documented that AI tools have the potential to exacerbate bias against vulnerable groups. Three lessons have emerged from a rapidly developing literature on fairness and machine learning.</OtherInformation></Objective><Objective><Name>Encoding</Name><Description>Consider the potential for machine learning to encode bias.</Description><Identifier>_dfddc218-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>3.1.1</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name>African-Americans</Name><Description>Criminal risk assessment scores, for instance, may exhibit higher “false positive rates” (wrongly classifying individuals as “high risk”) for African-American than white individuals.86</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Women's College Graduates</Name><Description>An NLP-based engine for job applicants may score applicants who graduated from women’s colleges more poorly, because of the existing demographic composition of the work force.87</Description></Stakeholder><OtherInformation>First, the potential for machine learning to encode bias is significant.85 </OtherInformation></Objective><Objective><Name>Fairness</Name><Description>Consider divergent notions of 
fairness.</Description><Identifier>_dfddc524-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>3.1.2</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>Second, while many potential approaches to “fair machine learning” have been proposed, a basic challenge is that divergent notions of fairness can be mutually incompatible.88 In the presence of underlying differences between demographic groups, for instance, it is not possible to simultaneously equalize false positive rates, false negative rates, and predictive parity across groups.</OtherInformation></Objective><Objective><Name>AI versus Humans</Name><Description>Consider how AI-assisted decisions fare compared to human decisions.</Description><Identifier>_dfddc740-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>3.1.3</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>Third, critical questions remain as to how AI-assisted decisions fare compared to human decisions, given that human decisions are themselves often the origin of bias.89</OtherInformation></Objective><Objective><Name>Risk</Name><Description>Consider the risk of bias across the administrative state.</Description><Identifier>_dfddcc0e-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>3.2</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>Internal Revenue Service</Name><Description>The Internal Revenue Service, for instance, developed a Return Review Program (RRP) to detect fraudulent refunds.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>Federal Bureau of Prisons</Name><Description>The RRP uses a wide range of sources, including data from the Federal Bureau of Prisons and prison systems in all states.91</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Subgroups</Name><Description>Such record linkage poses a risk of disparate impact on subgroups, although it 
remains hard to assess in the abstract. To illustrate, consider a similar setting that has been subject to more examination. The Allegheny Family Screening Tool for child welfare relies extensively on record linkage of administrative data from means-tested programs.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Poor People</Name><Description>Eubanks argues that the system hence relies on data for the poor that it does not observe for the wealthy (e.g., private drug treatment, mental health counseling).</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Children</Name><Description>The effect is that it disproportionately rates the poor as “high risk” of child welfare placements.92</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>PTO</Name><Description>In the PTO’s prior art search, a machine learning model trained on historical labeled data may replicate the tendency by patent examiners to neglect non-patent literature.94</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Patent Examiners</Name><Description/></Stakeholder><OtherInformation>THE POTENTIAL FOR BIAS IN USE CASES BY THE ADMINISTRATIVE STATE -- Our case studies corroborate this risk across the administrative state. The sources of such bias can be varied.</OtherInformation></Objective><Objective><Name>Representation</Name><Description>Consider that training data may be unrepresentative of the population of interest. 
</Description><Identifier>_dfddced4-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>3.2.1</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>CBP</Name><Description>Facial recognition technology that has been trained disproportionately on lighter skin tones, for instance, may be significantly less accurate for darker skinned individuals,90 potentially introducing bias into CBP’s reliance on facial recognition.</Description></Stakeholder><OtherInformation>First, training data may be unrepresentative of the population of interest. </OtherInformation></Objective><Objective><Name>Skew</Name><Description>Consider skew toward certain demographic groups.</Description><Identifier>_dfddd0f0-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>3.2.2</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name/><Description/></Stakeholder><OtherInformation>Second, a number of use cases rely on linking different administrative datasets together, and coverage may skew toward certain demographic groups...</OtherInformation></Objective><Objective><Name>Replication</Name><Description>Consider the possibility that AI systems may simply replicate existing bias in human decisions.</Description><Identifier>_dfddd5e6-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>3.2.3</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name/><Description/></Stakeholder><OtherInformation>Third, some systems may simply replicate existing bias in human decisions. 
If agencies used a predictive model for which comments are likely relevant, for instance, such decisions may simply encode existing agency tendencies to rely on lengthier documents, written in non-vernacular, submitted by legal counsel.93</OtherInformation></Objective><Objective><Name>Response</Name><Description>Grapple with bias.</Description><Identifier>_dfddd960-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>3.3</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>THE WAY FORWARD -- Grappling with such forms of bias will be a significant undertaking for federal agencies adopting AI/ML.</OtherInformation></Objective><Objective><Name>Standards &amp; Methods</Name><Description>Engage with evolving standards and methods for assessing the potential for bias in machine learning.</Description><Identifier>_dfdddb90-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>3.3.1</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>First, the emerging consensus within machine learning is that “blinding” algorithms to protected characteristics is unlikely to be effective. 
As the feature set (i.e., the number of variables in the model) grows, protected characteristics, such as race and gender, can be inferred with extremely high probability.95 Formal blindness can be functional discrimination. Researchers have hence argued that “fairness-through-awareness,” not blinding, will be a more promising approach to ensure fairness.96 Yet because there are no consensus measures for fairness, government agencies will have to increasingly engage with evolving standards and methods for assessing the potential for bias in machine learning, and such judgments may be highly domain-specific.</OtherInformation></Objective><Objective><Name>Disparate Impact</Name><Description>Consider disparate impact.</Description><Identifier>_dfdde054-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>3.3.2</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>Supreme Court</Name><Description>The Supreme Court has not clarified the operation of those principles specifically in the context of machine learning, but its affirmative action cases illustrate the tension.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>University of Michigan</Name><Description>In the affirmative action context, the Supreme Court held that the University of Michigan law school’s consideration of race in “individualized” admissions was constitutional,97 but held that the practice of awarding 20 points on a 150-point scale for underrepresented minorities in undergraduate admissions violated equal protection.98 “[I]ndividualized consideration,” the Court noted, “demands that race be used in a flexible, nonmechanical way.”99 Machine learning, however, challenges this doctrinal distinction. Is an algorithm that uses 1,000 features, including a protected attribute, “individualized” or is it “mechanical”? 
Is the mere use of the point scale problematic, or is it about the relative weight of protected characteristics?</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>L.A. Water &amp; Power</Name><Description>In L.A. Water &amp; Power v. Manhart, the Supreme Court found that the use of gender in calculating pension plan contributions violated equal protection, despite the actuarial gender difference in longevity.100</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>Wisconsin Supreme Court</Name><Description>In State v. Loomis, the Wisconsin Supreme Court did not find a due process violation when gender was used in a criminal risk assessment score, finding that the “use of gender promotes accuracy that ultimately inures to the benefit of the justice system.”101</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>States</Name><Description>Due to the doctrinal uncertainty, states and localities using criminal risk assessment scores remain split in whether they rely on gender.102</Description></Stakeholder><OtherInformation>Second, the rise of AI decision tools will increasingly challenge conventional principles of antidiscrimination law. As noted, protected characteristics can be inferred with high likelihood as the feature set (of unprotected characteristics) grows. This challenges the anticlassification principle, which posits that the law should not classify individuals based on protected attributes (e.g., gender and race). 
Similarly, the rise of AI/ML tools will test doctrinal frameworks of narrow tailoring and “individualized” consideration under the Equal Protection Clause... To the extent that the machine learning literature calls for awareness of protected attributes to promote fairness, it is on a collision course with equal protection doctrine. Even if an algorithm passes constitutional muster, it is unclear how administrative law will grapple with claims of disparate impact. Litigants may claim that the adoption of an algorithmic decision tool causes disparate impact across demographic groups and that the failure to address and explain such consequences is arbitrary and capricious. Yet whether courts will entertain such claims and how courts weigh the fairness-accuracy trade-off remains an open question. The D.C. Circuit, for instance, has held that disparate impact arguments may not be brought under the APA when Title VI of the Civil Rights Act—then assumed to provide a private right of action—provides an alternative adequate remedy.103 Since that decision, the Supreme Court held that there was no private right of action under Title VI, but no court has explicitly considered whether that opens the door to disparate impact claims under the APA. 
Mounting evidence of the potential for disparate impact with AI decision tools will put pressure on courts to grapple with this gap.104</OtherInformation></Objective><Objective><Name>Protocols</Name><Description>Establish systematic protocols for assessing the potential for an AI tool to encode bias.</Description><Identifier>_dfdde342-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>3.3.3</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>FDA</Name><Description>Might FDA’s adverse event reporting system be driven by reporting bias along demographic groups, say due to differences in access to health care?105</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>SSA</Name><Description>If SSA builds out its expedited review program using electronic health records, does that advantage certain types of applicants who are more likely to have access to health care providers with interoperable electronic health record systems?</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>PTO</Name><Description>Would neural network models deployed by the PTO actually fail to capture temporal drift and, as a result, disadvantage path-breaking research by smaller entrepreneurs?106</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Federal Administrative Agencies</Name><Description>The upshot here, as earlier, is that developing internal capacity to rigorously evaluate, monitor, and assess the potential for disparate impact will be critical for trustworthy deployment of AI in federal administrative agencies.107</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Administrators</Name><Description>Fortunately, administrators, technologists, legislators, and judges can draw from the rapidly emerging literature on bias in machine learning to proactively assess the potential for bias.</Description></Stakeholder><Stakeholder 
StakeholderTypeType="Generic_Group"><Name>Technologists</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Legislators</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Judges</Name><Description/></Stakeholder><OtherInformation>Third, no agency examined in this report has established systematic protocols for assessing the potential for an AI tool to encode bias. While some application areas (e.g., facial recognition) present obvious risks, the need for such protocols may be even greater for use cases where bias is less obvious...In sum, the rise of algorithmic decision-making raises novel and important questions about disparate impact...Efforts will need to be focused on developing the appropriate institutional mechanisms for detecting, monitoring, and correcting for bias in AI decision tools.</OtherInformation></Objective></Goal><Goal><Name>Hearing Rights</Name><Description>Provide affected parties the opportunity to submit evidence in-person or on paper.</Description><Identifier>_dfdde586-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>4</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>SSA</Name><Description>The role of hearing rights cuts across adjudicatory contexts, from formal adjudication at the SSA to more informal patent decisions at the PTO and enforcement decisions at the SEC, IRS, or CMS.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>PTO</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>SEC</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>IRS</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>CMS</Name><Description/></Stakeholder><OtherInformation>Hearing Rights &amp; Algorithmic Governance -- Much of the decision-making of modern administrative government comes after a “hearing.” Such hearings 
provide affected parties the opportunity to submit evidence in-person or on paper to a decision-maker, often an administrative judge or other agency employee. An array of laws dictates the form these hearings take, among them the default requirements of the APA, agency enabling acts, agency regulations, and the Constitution’s Due Process Clause. In some contexts, the procedural bundle is meager: An agency need only rise above the floor set by due process by providing advance notice of the decision and a brief opportunity to be heard.108 In others, constitutional and statutory mandates require an administrative approximation of a full-dress trial, with rules dictating who can participate, the types of evidence that can be considered, record-keeping requirements, appeal rights, and restrictions on ex parte contacts.109 How will the rise of AI decision tools alter the form and function of these hearings, and how should administrative law adapt in response? ...This section makes three points about the future of hearing rights in the face of the AI revolution. First, while the most optimistic version of AI tools may improve the accuracy and efficiency of adjudicatory decisions, such tools may also expose trade-offs in the normative values underpinning hearing rights. Second, we articulate how procedural due process and statutory hearing rights may need to adapt if AI tools proliferate. A core challenge in the near term will be crafting legal and institutional vehicles to detect and address systemic sources of error in light of the current structure of individualized decision-making. Third, the rise of AI tools in adjudication potentially raises longer-term, foundational questions: Do due process and statutory hearing rights imply a right to a human decision-maker? 
And what role is left for hearing rights in a world in which legal and regulatory mandates are crafted, adjudicated, and enforced with increasingly limited human involvement?</OtherInformation><Objective><Name>Promises &amp; Perils</Name><Description>Consider the promises and perils of algorithmic adjudication.</Description><Identifier>_dfddea72-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>4.1</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>SSA</Name><Description>On the latter, one need look no further than the significant backlogs at the SSA and PTO, which can delay desperately needed disability benefits and innovation-spurring intellectual property protections. The promise of AI is that it may cut the Gordian knot of this accuracy/efficiency trade-off by making possible efficiency gains without reductions in accuracy and vice versa. Some AI-based decision tools may even yield simultaneous improvements in both, producing better decisions at lower cost. The SSA’s tool for clustering like cases, for instance, potentially enables adjudicators to work through cases more quickly and more equitably, improving the consistency of decision making. Similarly, the SSA’s “easy grant” identification tool routes easy cases to staff-level decision-makers for rapid resolution so that administrative judges can focus their energies on more difficult and demanding cases. These and other AI-based tools profiled in Part II’s case studies might finally crack the code of mass adjudication, improving accuracy while shrinking the inter-judge decision disparities and backlogs that have long plagued a wider range of agencies. If AI tools are indeed able to solve the quantity-quality trade-off, they may also make room for other adjudicatory values. As noted in the SSA chapter, AI tools might help reclaim a part of constitutional due process that has been in part sidelined in modern jurisprudence: the dignity interests of the parties. 
By eliminating rote and repetitive tasks, AI might free adjudicators to focus on procedural fairness: to engage parties more extensively, to issue tentative orders, and to explain the complex legal provisions to affected parties. A long line of research establishes that individuals may perceive a process as more legitimate if afforded a voice.110 Dignitary interests may have value independent of accuracy.111 Despite such optimistic glosses, AI-based tools also raise significant concerns.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>PTO</Name><Description/></Stakeholder><OtherInformation>THE PROMISE AND PERIL OF ALGORITHMIC ADJUDICATION -- Conventional wisdom holds that due process poses an accuracy-efficiency trade-off. Adding procedures can improve a decision’s accuracy by ensuring close consideration of a wider range of evidence and subjecting arguments and evidence to more robust and often adversarial testing. But process is also costly. Importantly, procedure’s costs are both social (e.g., the resources required to operate the system) and individual (e.g., the costs to parties in real resources and delay).</OtherInformation></Objective><Objective><Name>Discretion &amp; Independence</Name><Description>Consider the displacement of adjudicator discretion and independence.</Description><Identifier>_dfdded74-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>4.1.1</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>SSA</Name><Description>Adjudicators at the SSA may review cases solely to pass Insight quality flags, progressively ignoring errors that evade automated detection.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Administrative Judges</Name><Description>Machine predictions might allow an administrative judge to readily compare her inclination to that of others, threatening notions of decisional independence.</Description></Stakeholder><OtherInformation>First, 
AI tools displace adjudicator discretion and independence, potentially draining the system of its deliberative and adaptive capacities.112 Importantly, displacement of discretion can occur even where manual review nominally remains. One reason is automation bias—i.e., the over-reliance of decision-makers on automated predictions, even when such deference is unreasonable and mistaken.113 Faced with rigid quotas, patent examiners may be unwilling to expend additional effort to second-guess AI-prioritized search results...And algorithmic search tools may diminish engagement with the record, functionally undermining de novo review. All of these dynamics can stifle the emergence of exceptions and the dynamic, iterative effort to conform legal mandates to changing circumstances.</OtherInformation></Objective><Objective><Name>Ignorance</Name><Description>Consider the possibility that adjudicators may simply ignore AI tools.</Description><Identifier>_dfddf01c-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>4.1.2</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name>Adjudicators</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Decision-Makers</Name><Description>Under-reliance on algorithmic tools may be particularly likely when decision-makers are field experts, as is the case with administrative judges.115</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Administrative Judges</Name><Description>And because administrative judges may vary in their receptiveness to AI tools and in their willingness to review machine outputs or deviate from recommended results, inter-judge decision disparities and high reversal rates may persist.</Description></Stakeholder><OtherInformation>Second, adjudicators may simply ignore AI tools. 
Such aversion to algorithms erodes the accuracy and efficiency gains of automation, even where human decision-making may be demonstrably inferior.114 ...A key challenge then is to build decision tools that complement, rather than substitute for, human decision-making—i.e., human-centered AI.</OtherInformation></Objective><Objective><Name>Mistakes</Name><Description>Mitigate the risk that algorithmic systems may simply get things wrong.</Description><Identifier>_dfddf576-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>4.1.3</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>Last, algorithmic systems may simply get things wrong, eroding decision quality under a false veneer of efficiency gains. Statutory interpretation and implementation are open-ended and difficult tasks. Algorithmic outputs might deviate from the statutory mandate or prove non-policy-compliant. And, as we describe next, current systems are ill-suited to detecting such sources of systemic error.</OtherInformation></Objective><Objective><Name>Adaptation</Name><Description>Adapt hearing rights to harness AI’s positive potential while mitigating its costs.</Description><Identifier>_dfddf896-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>4.2</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name>Judges</Name><Description>For now, we make several points that can help guide judges, administrators, and legislators in adapting the current system.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Administrators</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Legislators</Name><Description/></Stakeholder><OtherInformation>GET HEARING RIGHTS RIGHT -- As AI-based decision tools proliferate, how can hearing rights adapt to harness AI’s positive potential while mitigating its costs? 
Administrative hearings come in myriad policy contexts and, as already noted, the procedures that apply in each take many forms. The optimal mix of procedural rights may vary significantly across contexts.</OtherInformation></Objective><Objective><Name>Errors</Name><Description>Identify and remedy systemic sources of error.</Description><Identifier>_dfddfb16-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>4.2.1</SequenceIndicator><Stakeholder StakeholderTypeType="Person"><Name>Danielle Citron</Name><Description>Specific to the algorithmic context, Danielle Citron argues that there is an additional doctrinal challenge: The Supreme Court’s longstanding test for procedural due process, which requires courts to focus on only the case at hand and weigh the private interest, the government interest, and the likely value of additional process, may neglect the fact that algorithmic tools are designed to operate at scale.117 Lost in case-level balancing is the possibility that a one-time but costly increase in procedural scrutiny of an algorithmic tool can yield massive social benefits across the thousands or millions of cases to which the tool is applied.118</Description></Stakeholder><OtherInformation>First, the current system of hearing rights fits awkwardly with the most pressing challenge raised by algorithmic decision tools: identifying and remedying systemic sources of error. Part of the challenge is inherent to the structure of individualized hearing rights. 
A single judicial challenge to agency decision-making may correct a specific error, but such challenges are unlikely to surface and remediate entrenched pathologies within the system.116</OtherInformation></Objective><Objective><Name>Rulemaking &amp; Adjudication</Name><Description>Consider the foundational distinction between rulemaking and adjudication.</Description><Identifier>_dfde011a-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>4.2.2</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>Supreme Court</Name><Description>In Heckler v. Campbell, the Supreme Court affirmed the statutory authority of the SSA to decide common issues in adjudications via rulemaking.121</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>SSA</Name><Description>Despite the fact that the Social Security Act requires “individualized determinations based on evidence adduced at a hearing,” the Court held that the act “does not bar the Secretary from relying on rulemaking to resolve certain classes of issues.”122 As AI systems become more sophisticated, a key question will be when they function as “legislative rules” that have “binding effects” on the agency and regulated parties, triggering notice-and-comment rulemaking. Was SSA required to undergo notice-and-comment for its QDD system? And if so, should it have been required to disclose more of the underlying feature set and model? Answers to these questions are easier for top-down expert-based AI systems (if-then rules).</Description></Stakeholder><OtherInformation>Second, AI-based decision tools may progressively scramble the foundational distinction between rulemaking and adjudication under the APA and Constitution. 
For adjudication, procedural due process and applicable statutes safeguard the interests of a single person or a small group of affected people.119 For rulemaking, the Constitution requires little and the APA requires only a general level of public participatory engagement when a rule is addressed to a large class of people with common circumstances.120 ...But modern machine learning systems are “bottom-up” in that they construct rules based on learned associations from prior decisions. Whether the system has a binding effect hence depends empirically on (a) the level of adherence to the rule, and (b) the extent to which models prospectively adapt. Such adaptation also makes it more challenging ex ante to disclose the nature of the decision system in contrast to a decision tree from an expert-based system.</OtherInformation></Objective><Objective><Name>Accuracy</Name><Description>Think about more appropriate legal and institutional vehicles to challenge the accuracy not only of individual decisions, but also of algorithms.</Description><Identifier>_dfde044e-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>4.2.3</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name>Judges</Name><Description>Going forward, judges, administrators, and legislators will need to think about more appropriate legal and institutional vehicles to challenge the accuracy not only of individual decisions, but also of algorithms. 
The APA’s interpretation of a binding rule may need to be pegged to the degree to which human discretion is displaced or, alternatively put, the degree to which a human remains “in the loop.”</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Agencies</Name><Description>Agencies will need to experiment with the best ways to surface, investigate, and debug potential errors when adjudicators and affected parties suspect such errors.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>SSA</Name><Description>Such mechanisms appear to be lacking currently. In its audit of the Insight system, for instance, the SSA’s Office of Inspector General surveyed adjudicators and 20% indicated that the flagged errors were inaccurate and 35% reported that there was no method for submitting feedback where improvement was necessary.123 </Description></Stakeholder><OtherInformation>Third, as technology advances, parties may petition agencies to adopt such systems. 
Forms of pure internal agency management are typically seen to escape notice and comment and judicial review, but as AI systems become increasingly powerful, parties might challenge the failure to adopt an AI-based system as arbitrary and capricious or as violating due process...The broader scholarship suggests that appeal rights alone are unlikely to provide a full solution, so other institutional and managerial solutions, such as quality control programs, audits, oversight, and external review, are well worth piloting and evaluating.</OtherInformation></Objective><Objective><Name>Implications</Name><Description>Consider the implications for hearing rights.</Description><Identifier>_dfde06e2-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>4.3</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>SSA</Name><Description>The SSA Insight system’s ability to spot errors in draft decisions and arm administrative judges with raw materials, including agency non-acquiescence decisions, points to a world in which decision-making becomes more fully automated.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Administrative Judges</Name><Description>While administrative judges and other adjudicators may balk at full automation, some interview subjects seek tools that can build a “decisional shell” around a case by gathering factual and legal materials to which an adjudicator can then more efficiently apply her human discretion. Yet even decisional shells will displace human discretion based on editorial judgments about which legal issues, and which materials, are and are not relevant. These tools may be different in degree, not in kind.</Description></Stakeholder><OtherInformation>HEARING RIGHTS INTO THE FUTURE -- The academic literature is replete with references to “robojudges”124 and even an eventual state of “legal singularity,”125 when machines can perfectly predict the outcomes of cases before they are filed. 
Only slightly less futuristic are predictions that the law will steadily transform into a “vast catalogue of precisely tailored laws,” or “microdirectives,”126 that adjust in real-time—for instance, an individualized speed limit for a given driver with a given amount of experience operating in specific driving conditions—and are enforced via automatic penalties.127 These possibilities may seem far-fetched in the current moment, but the more limited tools profiled in this report do gesture toward the longer technological horizon...While it may be far away, fully automated decision-making raises rich, and existential, questions for the American legal system, which is built around participatory rights and adversarialism. Does the notion of due process imply the right to a human decision-maker?128 Full automation promises “a fast and refined prediction of the relevant legal effect”129 and thus achieves one of the highest purposes of law, but may drain the law’s capacity to adapt and to ventilate legal rules through dialogue and debate in fully public interpretive exercises.130 Something may be lost when the process of enforcing collective value judgments about right conduct plays out in server farms rather than as part of a prolonged and often messy deliberative and adjudicatory process, even where the machine-driven version proves perfectly accurate.131 These debates are well beyond the scope of this report. 
But the tools profiled herein suggest it is not too early to start them.</OtherInformation></Objective></Goal><Goal><Name>Gaming &amp; Adversarial Learning</Name><Description>Address risk of adversarial learning and gaming by regulated parties.</Description><Identifier>_dfde0c46-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>5</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name>Regulated Parties</Name><Description/></Stakeholder><OtherInformation>A challenge that cuts across growing agency reliance on algorithmic governance tools is the risk of adversarial learning and gaming by regulated parties.</OtherInformation><Objective><Name>Variables &amp; Cutoffs</Name><Description>Guard against the manipulation of variables or cutoffs.</Description><Identifier>_dfde0f98-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>5.1</SequenceIndicator><Stakeholder StakeholderTypeType="Organization"><Name>PTO</Name><Description>As a concrete illustration, consider how adversaries might exploit the PTO’s tools to adjudicate applications, as described in Part II. These tools help classify patent and trademark applications according to the PTO’s taxonomy, as well as search for “prior art” and visually similar trademarks.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Patent Applicants</Name><Description>Patent applicants have long tweaked their applications to try to obtain a desired classification, pushing their application to a unit with higher grant rates. 
Machine learning magnifies these opportunities for gaming.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Adversaries</Name><Description>For example, adversaries could manipulate images in their patent applications to include random noise, which has been shown to dupe leading machine learning models into mis-classifying images.135 Adversaries could thereby divert their applications to units more likely to rule in their favor, undermining the fairness and accountability of the underlying algorithm.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Regulated Parties</Name><Description>Where algorithms are known to rely on particular variables or cutoffs, regulated parties can manipulate those variables and the values they take in order to secure a desirable result from the system.</Description></Stakeholder><OtherInformation>THE RISK OF GAMING AND ADVERSARIAL LEARNING -- Whenever the government brings greater transparency to previously discretionary decisions, those decisions become more gameable, with parties adjusting their behavior to maximize their chances at a favorable outcome. Algorithmic governance is no exception...“Adversarial machine learning,” or the use of machine learning to fool algorithmic models, only exacerbates this inherent risk.132
With simpler forms of adversarial machine learning, adversaries can, for instance, exploit algorithmic tools to obtain favorable determinations, without changing the underlying characteristic the algorithm is designed to measure.133 At the extreme, regulatory targets can even gain access to the tool itself and feed it new data to corrupt its outputs.134</OtherInformation></Objective><Objective><Name>Efficacy &amp; Political Support</Name><Description>Consider the implications of gaming and adversarial learning for the efficacy of algorithmic governance tools as well as political support for their use.</Description><Identifier>_dfde1240-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>5.2</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name>Gamers</Name><Description>Gamers, depending on one’s perspective on automation, range “from parasitic, to benign, to downright noble.”137 Gaming opportunities can also be deliberately built into a system to avoid unduly regressive policies, promote redistribution, or otherwise blunt the force of rigid regulatory regimes. 
Some have suggested that lax tax enforcement of the cash economy is one such example, where gaming in fact serves potentially desirable redistributive ends.138</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>SEC</Name><Description>Consider, for example, the SEC’s Form ADV Fraud Predictor, which aims to identify bad apple investment brokers and subject them to greater regulatory scrutiny.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Investment Brokers</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Regulated Parties</Name><Description>Regulated parties with knowledge of that tool’s inner workings can adversarially craft their disclosures; they can include or omit key language in order to foil the system’s classifier and keep their personnel off the SEC’s radar. This type of gaming poses profound distributive concerns. Better-heeled and more sophisticated regulated individuals and entities may have the time, resources, or know-how to navigate or even reverse-engineer algorithmic systems and then take the evasive actions necessary to yield positive determinations and avoid adverse ones.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Computer Scientists</Name><Description>As noted in Part II’s profile of the SEC’s enforcement tools, larger and better-resourced firms with a deeper bench of computer scientists and quantitative analysts may prove better able than smaller ones to reverse-engineer algorithmic enforcement tools and avoid regulatory action.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Quantitative Analysts</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Contractors</Name><Description>These distributive concerns can be amplified by contractor conflicts. 
Government contractors may seek to monetize or exploit their relationship to algorithmic tools for financial gain in other business relationships. Given that contractors are responsible for roughly 30% of AI/ML use cases, these concerns are grave.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>PTO</Name><Description>To take but one example, the company that provides the PTO’s classification tool also sells services to patent applicants, advertising its PTO experience as one of its major assets.139 Yet because not all parties can afford such services, better-resourced companies and individuals will be better able to game the system, whether to obtain government benefits or avoid regulatory scrutiny. These distributive challenges may politicize the use of AI/ML tools over time.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Regulatory Have Nots</Name><Description>While regulatory “haves” may welcome government uptake of algorithmic tools if they believe they are better-equipped to game them or that the new tools will yield enforcement against a more diverse set of regulatory targets,140 the “have nots,” including the poor but also more middling segments of society, may not support a more efficient and effective algorithm-wielding government if they believe they will disproportionately shoulder its burdens.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Citizens</Name><Description>Indeed, some initial research suggests that citizens tend to rate algorithmic decision-making negatively compared to the status quo.141 Support for government innovation can evaporate quickly if it is perceived as unfairly wielded.</Description></Stakeholder><OtherInformation>THE EFFECT ON AGENCIES AND ALGORITHMIC SYSTEMS -- Gaming and adversarial learning have profound implications for the efficacy of algorithmic governance tools as well as political support for their use. At the outset, it 
is worth noting that gaming can sometimes be salutary. While gamers are often self-serving—that is, seeking to maximize their take or minimize their loss within an algorithmic system—they need not be.136 ...That said, gaming often reduces the accuracy and efficacy of algorithmic systems.</OtherInformation></Objective><Objective><Name>Mindfulness</Name><Description>Be mindful in developing and deploying algorithmic tools.</Description><Identifier>_dfde17f4-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>5.3</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name>Programmers</Name><Description>For example, programmers can increase model complexity, reconfigure models periodically, and/or add randomness, all of which will make models harder to game.142 They can also build models that rely on immutable traits, which regulated parties cannot readily change.143 And they can use generative adversarial networks, training new tools against hostile adversaries that seek to fool them, which will make the algorithm less susceptible to attack in the long run.144 But these measures, while making AI/ML tools harder to game, also come at a cost. They risk making models less interpretable to regulated parties (let alone the average citizen), reducing transparency and accountability.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Agencies</Name><Description>Another reform would have agencies impose sanctions to encourage compliance with underlying regulatory procedures.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>PTO</Name><Description>For example, the PTO sanctions parties who breach duties of disclosure, candor, and good faith. 
To be effective, however, this approach requires strong mechanisms to detect regulatory violations, which can prove expensive and difficult to implement.</Description></Stakeholder><OtherInformation>THE WAY FORWARD -- Given these challenges, administrative agencies need to be mindful in developing and deploying algorithmic tools. Architects of algorithmic models must consider whether and how to design their models to minimize opportunities for gaming and adversarialism...Ultimately, none of these reforms is a panacea. As administrative agencies develop algorithmic tools, they must balance the risk of gaming against other public values, including transparency, efficacy, and distributive concerns. Sometimes, agencies must tolerate gaming and adversarialism in service of a more transparent, more effective algorithmic system. In other cases, the right answer may be to create no algorithm at all, especially if it would lead to an expensive arms race of machine learning tools, without ultimately improving efficacy or citizen confidence in the system.</OtherInformation></Objective></Goal><Goal><Name>External Sourcing</Name><Description>Address the benefits and costs of external sourcing over internal sourcing and flesh out some of the trade-offs.</Description><Identifier>_dfde1b64-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>6</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name/><Description/></Stakeholder><OtherInformation>The External Sourcing Challenge: Contractors and Competitions -- Part I’s canvass of federal agency use of AI identified 157 use cases. While more than half of these (53%) were developed in-house by agency technologists, nearly as many came from external sources, with one-third (33%) coming from private commercial sources via the procurement process and a further, non-trivial proportion (14%) resulting from non-commercial collaborations, including agency-hosted competitions and government-academic partnerships. 
This roughly even split between internal and external sourcing suggests that each approach has significant advantages and disadvantages that agency personnel must weigh when developing AI-based tools. This section focuses on the benefits and costs of external sourcing over internal sourcing and fleshes out some of the trade-offs agencies face when choosing between them.</OtherInformation><Objective><Name>Internal Production</Name><Description>Consider the benefits of internal agency production of AI tools.</Description><Identifier>_dfde1e3e-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>6.1</SequenceIndicator><Stakeholder StakeholderTypeType="Person"><Name>Kurt Glaze</Name><Description>An apt illustration, as described in detail previously, is the Insight tool internally developed at the Social Security Administration by Kurt Glaze, the attorney-turned-programmer. In designing that system, Glaze specifically designed the error flags that can be raised in draft decisions based upon “the flags that [he would have] wanted to have available as an adjudicator.”147</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>Social Security Administration</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Policymakers</Name><Description>Importantly, colocation of policymakers and technologists can matter even where an agency opts to make its own tools.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Technologists</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Person"><Name>James Ridgway</Name><Description>James Ridgway, who helped oversee the Board of Veterans’ Appeals Caseflow project, ensured that the staff of the U.S. 
Digital Service would remain on site to avoid “deliver[ing] a system two years later that no one [would] use.”148</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>Board of Veterans’ Appeals</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>U.S. Digital Service</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>SEC</Name><Description>Embedded technical expertise may also be necessary to automate tasks that are dynamic and changeable. For example, algorithmic enforcement tools like those deployed by the SEC use classifiers trained on past enforcement actions to “shrink the haystack” of current violators and direct the attention of line-level enforcement staff. But as noted in Part II, the misconduct those tools target is rarely static. Embedded expertise facilitates the continuous, iterative updating of algorithmic enforcement tools necessary to incorporate new modes of wrongdoing unearthed by agency staff and avoid an undue focus on past forms of misconduct.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>IRS</Name><Description>Finally, internal agency development of AI tools limits leakage of information about a tool’s technical and operational details that can undermine its utility. 
Here again, the enforcement tools under development and in use at the SEC, IRS, CMS, and EPA provide a compelling illustration because of their potential vulnerability to being reverse-engineered and evaded through adversarial learning.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>CMS</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>EPA</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>DHS</Name><Description>Another example is DHS’s facial recognition system, which attackers might, with access to technical details about the system, be able to trick into incorrectly matching an innocent face with the no-fly list or permitting an individual on the no-fly list to escape detection. Even relatively simple attacks, as noted previously, can defeat the most advanced algorithmic systems.149 Leakage of a tool’s technical and operational details facilitates those attacks.</Description></Stakeholder><OtherInformation>THE “MAKE” DECISION REVISITED -- Sourcing decisions, as noted in the earlier section on capacity-building challenges, reflect the basic make-or-buy choice that agencies often face when performing governance tasks.145 An agency can either hire and train personnel and assemble the raw materials needed to perform government tasks, or it can contract through the procurement process to buy them.146 As described previously, internal agency production of AI tools requires substantial agency technical capacity but can also yield a range of benefits. 
Advantages of internal sourcing include tools that are better tailored, more policy-compliant, and more accountable.</OtherInformation></Objective><Objective><Name>External Production</Name><Description>Consider the benefits and costs of external production.</Description><Identifier>_dfde241a-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>6.2</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name>Private Sector</Name><Description>One reason is that the private sector is not burdened by the compensation and hiring limitations that restrict the pool of talent that government agencies can tap.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Government Agencies</Name><Description>Budget constraints, civil service laws capping allowable salaries, and political sensitivities mean that government agencies may be priced out of labor markets for employees with advanced technical skillsets.</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Agency Leaders</Name><Description>Further, agency leadership may not prioritize technological innovation.</Description></Stakeholder><Stakeholder StakeholderTypeType="Person"><Name>Gerald Ray</Name><Description>Gerald Ray, a longtime Administrative Appeals Judge at the SSA who eventually became deputy executive director of the Office of Appellate Operations (OAO), worked around limitations on hiring technologists by identifying attorneys skilled in data analysis and computer science in order to develop the agency’s AI toolkit.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>OPM</Name><Description>While the Office of Personnel Management (OPM) has since established a “data scientist” classification, thus easing the hiring burdens for technical positions, compensation caps and other limitations remain.150</Description></Stakeholder><Stakeholder 
StakeholderTypeType="Generic_Group"><Name>Contractors</Name><Description>Because contractors typically operate at a remove from agency operations, external sourcing can also impose heavy monitoring and transaction costs. Where monitoring costs are low, as with well-specified services like garbage collection, the government can gain from the efficiency and expertise of the private sector. Where monitoring costs are high, however, and the governance tasks at issue involve significant discretion, private contractors may have incentives to engage in strategic corner-cutting, thus systematically degrading quality.153 Profit-motivated contractors may also be less likely than civil servants, as a matter of professional identity, to ground key design and implementation decisions in public values like transparency and non-discrimination.154 In the AI context, technically complex but standardized tasks, such as consolidating databases and upgrading computer infrastructure,155 may prove more amenable to external contracting than the design and maintenance of enforcement tools, where the need for tailoring and updating is greater and consideration of public values is thought to be more salient.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>PTO</Name><Description>In the patent and trademark context, the same contractor that produced the PTO’s classification tool advertises its experience supporting the agency in order to sell its services to patent applicants.156 This raises the potential for conflicts of interest and deliberate leakage of information about governance tools. 
At the same time, because contractors seek to maximize commercial gain, they also face incentives to cloak the technical and operational details of AI tools by invoking intellectual property and trade-secret protections.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>DHS</Name><Description>As just one example, the DHS reported that it could not explain the failure rates of iris scanning technology due to the “proprietary technology being used.”157 This example underscores both the potential accountability costs of procurement-generated AI tools and the importance of developing and maintaining a baseline level of internal technical capacity even when an agency chooses to buy AI tools.</Description></Stakeholder><OtherInformation>THE “BUY” DECISION: PROS AND CONS -- While internal sourcing has many virtues, the benefits of external production are also significant. First, external sourcing may yield more technically sophisticated tools...Differences across the public and private sector can also make externally sourced governance tools cheaper than internally sourced ones. A long academic literature concludes that the private sector will often produce goods and services at lower cost because of a better-incentivized workforce and tighter managerial control.151 Government-side employment constraints again loom large, including limits on hiring and firing and an inability to offer incentive-based compensation. While external sourcing has numerous benefits, its drawbacks are also significant. 
Some of these are merely the flip-side of internal sourcing’s advantage in generating well-tailored, policy-compliant, and accountable tools. Algorithmic enforcement tools, as just noted, may require frequent updating to maximize efficacy in ways that generic commercialized AI systems, and the often protracted back-and-forth of the procurement process, are ill-suited to provide.152 ...Finally, external sourcing of algorithmic governance tools raises significant conflict-of-interest concerns...In sum, usability may militate in favor of internal capacity building. Privately produced, procurement-generated tools may boast the most cutting-edge analytics, but may also be less tailored to the task at hand, be less attuned to legal requirements and an agency’s bureaucratic realities, and not necessarily come with ongoing and regular engagement between technologists and agency enforcement staff. In contrast, in-house production may strain agency budgets, but will yield governance tools that are, on average, better tailored to subtle governance tasks, more law- and policy-compliant, more attuned to complex organizational dynamics, and less subject to information leakage and conflicts of interest that can reduce a tool’s efficacy and raise significant distributive concerns.158</OtherInformation></Objective><Objective><Name>Noncommercial Collaborations &amp; Competitions</Name><Description>Pursue noncommercial collaborations and competitions.</Description><Identifier>_dfde285c-5f5d-11ea-9cae-4db22783ea00</Identifier><SequenceIndicator>6.3</SequenceIndicator><Stakeholder StakeholderTypeType="Generic_Group"><Name>Professional Associations</Name><Description>Collaborations with professional associations, academe, and NGOs allow the government to leverage mutually beneficial relationships and gain access to external talent and expertise while maintaining control and monitoring quality.</Description></Stakeholder><Stakeholder 
StakeholderTypeType="Generic_Group"><Name>Academe</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>NGOs</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>FDA</Name><Description>Examples of successful non-commercial collaborations are growing and include: the FDA’s partnerships to address cybersecurity risk;160 </Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>NHTSA</Name><Description>NHTSA’s use of IBM’s Watson to process and respond to safety complaints;161</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>IBM</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>Stanford</Name><Description>Stanford’s partnership with the EPA;162</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>EPA</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>VA</Name><Description>the VA’s partnership with Google’s DeepMind to protect personal information and thus permit data-sharing;163</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>Google</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>FDA</Name><Description>and the FDA’s “regulatory science” with Johns Hopkins, MIT, Stanford, and Harvard.164 The FDA is also “exploring the use of a neutral third party [to] collect large annotated imaging data sets for purposes of understanding the performance of a novel AI algorithm.”165</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>Johns Hopkins</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>MIT</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>Harvard</Name><Description/></Stakeholder><Stakeholder 
StakeholderTypeType="Organization"><Name>NHTSA</Name><Description/></Stakeholder><OtherInformation>A THIRD WAY: NON-COMMERCIAL COLLABORATIONS -- While the make-or-buy choice clearly entails significant trade-offs, a third approach may be gaining momentum: noncommercial collaborations and competitions.159 Questions of scalability remain, but this third approach highlights the potential for government to realize the benefits of make and buy while avoiding some of the costs of each...Government-sponsored competitions, which leverage the public’s ideas and talent around declared government priorities, often with prize money attached, are a potentially valuable source of innovation and an increasingly prevalent part of the capacity-building landscape.166 Through the use of prize money, public recognition, and even follow-up contracting work, government can leverage the public’s talent to generate and prototype ideas.167 While there is relatively little empirical or theoretical work on the subject, the benefits seem clear: incentivizing innovation while maximizing return by only rewarding success.168 At the same time, of the 28 competitions documented in Part I, half showed no public evidence of government adoption or intended adoption of technology created in the competition, raising doubts about their usefulness. Moreover, while competitions have grown exponentially from $247,000 in prize money awarded in FY2011 to over $30 million in FY2016, this amount remains small in comparison to the trillions in annual government outlays, raising questions about whether competitions can sufficiently scale to meet agency needs.169 Finally, competition-generated tools are sometimes criticized as interstitial and small-bore. 
They may not substitute for a comprehensive automation strategy and, as with tools generated through the traditional procurement process, may be insufficiently attuned to the complexities of tasks or organizational environments. Finally, agencies can collaborate with each other, pooling scarce resources to tackle parallel technical challenges. Preliminary examples of such partnerships include the FDA and DHS’ announced cybersecurity “memorandum of agreement … for greater coordination and cooperation … for addressing cybersecurity in medical devices.”170 The FDA’s analysis of drug adverse event reports is remarkably similar in objectives to NHTSA’s identification of trends in consumer complaints, raising the prospect of cross-agency technical collaboration through a central team building shared AI infrastructure.171 Agencies like the FDA and SSA are consolidating technical expertise and self-assessing their technical infrastructure in order to improve technical performance, and a new bill has been introduced to promote innovation and develop AI governance government-wide.172</OtherInformation></Objective></Goal></StrategicPlanCore><AdministrativeInformation><StartDate>2020-02-29</StartDate><EndDate/><PublicationDate>2020-03-05</PublicationDate><Source>https://www-cdn.law.stanford.edu/wp-content/uploads/2020/02/ACUS-AI-Report.pdf</Source><Submitter><GivenName>Owen</GivenName><Surname>Ambur</Surname><PhoneNumber/><EmailAddress>Owen.Ambur@verizon.net</EmailAddress></Submitter></AdministrativeInformation></PerformancePlanOrReport>
