Marco Baroni
Modern language models (LMs) respond with uncanny fluency when prompted in a natural language such as English. However, they can also produce predictable, semantically meaningful output when prompted with low-likelihood "gibberish" strings, a phenomenon exploited to develop effective information extraction prompts (Shin et al. 2020) and to bypass security checks in adversarial attacks (Zou et al. 2023). Moreover, the same "unnatural" prompts often trigger the same behavior across LMs (Rakotonirina et al. 2023, Zou et al. 2023), hinting at a shared "universal" but unnatural LM code. In my talk, I will use unnatural prompts as a tool to gain insights into how LMs process language-like input. In particular, I will discuss recent and ongoing work on three fronts: transferable unnatural prompts as a window into LM invariances (Rakotonirina et al. 2023); mechanistic interpretability explorations of the activation pathways triggered by natural and unnatural prompts (Kervadec et al. 2023); and first insights into the lexical nature of unnatural prompts. Although a comprehensive understanding of how and why LMs respond to unnatural language remains elusive, I aim to present a set of intriguing facts that I hope will inspire others to explore this phenomenon.
Marco Baroni received a PhD in Linguistics from the University of California, Los Angeles. After various experiences in research and industry, in 2019 he became an ICREA research professor, affiliated with the Linguistics Department of Pompeu Fabra University in Barcelona. Marco's work in the areas of multimodal and compositional distributed semantics has received widespread recognition, including a Google Research Award, an ERC Grant, the IJCAI-JAIR Best Paper Prize, and the ACL test-of-time award. Marco was recently awarded another ERC grant to conduct research on improving communication between artificial neural networks, taking inspiration from human language and other animal communication systems.