Overview of Approach

IMPACC Monitoring Framework

Development of Final Measure Sets

We developed a rapid, consensus-based process to establish health system-wide AI monitoring measures, piloting it with our system-wide AI scribe implementation. The goal was to create an initial set of feasible metrics focused on immediate priorities such as safety, equity, and workflow impact. First, we reviewed the literature to identify measure concepts evaluated in prior studies. Using the UCSF IMPACC AI Monitoring Metrics Framework, which builds on the HHS Trustworthy AI Playbook¹, we identified the key domains to prioritize for AI scribe monitoring and mapped measures from the literature, along with additional measures, to the framework. Next, we convened a modified Delphi consensus panel of AI officers, informaticists, data scientists, clinicians, and researchers, who rated the importance and feasibility of each measure on a scale of 1 to 9. Panelists also added new concepts as needed. Measures rated high in importance (≥7) and at least moderate in feasibility (≥5) advanced to specification. Interdisciplinary groups then detailed the sampling frames, data sources, and analytic strategies for each measure. Finally, the full group met again to review and finalize the measures before system-wide implementation.
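The advancement rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how individual panelist ratings were combined, so the sketch assumes the median; the measure names and ratings are hypothetical and are not drawn from the actual measure set. Only the two thresholds (importance ≥7, feasibility ≥5) come from the process description.

```python
from statistics import median

# Thresholds taken from the process description (ratings on a 1 to 9 scale).
IMPORTANCE_MIN = 7
FEASIBILITY_MIN = 5


def advances(importance_ratings, feasibility_ratings):
    """Return True if a measure meets both thresholds.

    Assumption: panel ratings are summarized by their median. The source
    does not state which summary statistic the panel used.
    """
    return (
        median(importance_ratings) >= IMPORTANCE_MIN
        and median(feasibility_ratings) >= FEASIBILITY_MIN
    )


# Hypothetical measures and panelist ratings, for illustration only.
ratings = {
    "note_accuracy_audit": {
        "importance": [8, 9, 7, 8],
        "feasibility": [6, 5, 7, 5],
    },
    "patient_reported_trust": {
        "importance": [7, 8, 8, 9],
        "feasibility": [3, 4, 5, 4],
    },
    "after_hours_ehr_time": {
        "importance": [6, 5, 7, 6],
        "feasibility": [8, 9, 8, 7],
    },
}

# Keep only the measures that pass both thresholds.
selected = [
    name
    for name, r in ratings.items()
    if advances(r["importance"], r["feasibility"])
]
print(selected)
```

In this made-up example, only the first measure advances: the second is rated important but falls below the feasibility threshold, and the third is feasible but falls below the importance threshold.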

Want more information?

To access the measure sets and specifications, or for additional information about the measure set development process described above, please reach out via email.

AI Monitoring Existing Evidence

References
Adler-Milstein J, DeMasi O, Soleimani H, Beck S, Byron ME, Oates A, Thombley R, Yazdany J, Murray SG. Subjective and Objective Impacts of Ambulatory AI Scribes. Am J Manag Care. 2026;32(1):In Press
Holmgren AJ, Fenton CL, Thombley R, Soleimani H, Croci R, DeMasi O, Byron ME, Murray SG, Adler-Milstein J, Yazdany J. Ambient Artificial Intelligence Scribes and Physician Financial Productivity. JAMA Netw Open. 2026;9(1):e2553233. doi: 10.1001/jamanetworkopen.2025.53233.
Kim JY, Hasan A, Kellogg KC, Ratliff W, Murray SG, Suresh H, Valladares A, Shaw K, Tobey D, Vidal DE, Lifson MA, Patel M, Raji ID, Gao M, Knechtle W, Tang L, Balu S, Sendak MP. Development and preliminary testing of Health Equity Across the AI Lifecycle (HEAAL): A framework for healthcare delivery organizations to mitigate the risk of AI solutions worsening health inequities. PLOS Digit Health. 2024 May 9;3(5):e0000390. doi: 10.1371/journal.pdig.0000390. PMID: 38723025; PMCID: PMC11081364.
Rotenstein LS, Wachter RM. Are Artificial Intelligence-Generated Replies the Answer to the Electronic Health Record Inbox Problem? JAMA Netw Open. 2024 Oct 1;7(10):e2438528. doi: 10.1001/jamanetworkopen.2024.38528. PMID: 39401042.
Nong P, Adler-Milstein J, Apathy NC, Holmgren AJ, Everson J. Current Use And Evaluation Of Artificial Intelligence And Predictive Models In US Hospitals. Health Aff (Millwood). 2025 Jan;44(1):90-98. doi: 10.1377/hlthaff.2024.00842. PMID: 39761454.
Adler-Milstein J, Redelmeier DA, Wachter RM. The Limits of Clinician Vigilance as an AI Safety Bulwark. JAMA. 2024 Apr 9;331(14):1173-1174. doi: 10.1001/jama.2024.3620. PMID: 38483397.
Adler-Milstein J, Chen JH, Dhaliwal G. Next-Generation Artificial Intelligence for Diagnosis: From Predicting Diagnostic Labels to "Wayfinding". JAMA. 2021 Dec 28;326(24):2467-2468. doi: 10.1001/jama.2021.22396. PMID: 34882190; PMCID: PMC12049696.