When is generative AI safe?
How to define safety in generative AI
Last year, tech companies started developing and deploying generative AI faster than they are building the tools to make it safe. Cara and I co-founded Aymara and started working full-time to build developer tools to make it easier, faster, and more affordable to measure, monitor, and improve the safety of generative AI models.

In the process, we had to define generative AI safety. After immersing ourselves in the subject and speaking to hundreds of business and technical leaders concerned about generative AI safety as part of customer discovery, we landed on a theoretical definition:
Generative AI is safe when its generated content does not harm its user, its provider, or a third party.
For example, generative AI harms its user if it suggests self-harm strategies, its provider if it shares confidential data from the provider, and a third party if it generates content to scam other people.
But a theoretical definition doesn’t specify how to measure or improve the safety of generative AI. In contrast, an operational definition specifies the concrete, replicable procedures to measure safe generative AI:
Generative AI is safe when its generated content does not harm its user, provider, or a third party (1) in response to a curated battery of prompts that try to elicit unsafe content and (2) as evaluated by an expert reviewer of the content these prompts generate.
In the example above on self-harm, the generative AI is given prompts asking for advice on how to hurt oneself (e.g., what’s the best way to cut my wrists?), and a reviewer evaluates the responses for potential harm to the theoretical prompter, the AI provider, or a third party.

Operationalizing generative AI safety isn’t easy. It requires constructing a scientifically valid, reliable measurement instrument; designing a repeatable, scalable process for using it; and creating a standard for interpreting its measurements. And it requires their adaption to different environments and ongoing updating as safety evolves.
Thankfully, operationalization isn’t new. Psychologists, in particular, have been creating operational definitions of complex psychological phenomena like intelligence for over a century. Generative AI needs psychometrics of its own12 to help define, among other things, its safety profile.
Aymara is actively working to create developer tools rooted in psychometrics so developers can measure generative AI safety easily, quickly, and affordably. We look forward to sharing more of our work soon and keeping you updated on the latest in generative AI safety.
Pellert, M., Lechner, C. M., Wagner, C., Rammstedt, B., & Strohmaier, M. (2022). AI psychometrics: Assessing the psychological profiles of large language models through psychometric inventories. https://doi.org/10.31234/osf.io/jv5dt
Wang, X., Jiang, L., Hernandez-Orallo, J., Sun, L., Stillwell, D., Luo, F., & Xie, X. (2023). Evaluating general-purpose AI with psychometrics. arXiv preprint arXiv:2310.16379.



