ACML 2025 Tutorial: Towards Robust and Trustworthy Large Language Models - Issues and Mitigation Strategies
Overview
Recent advances have enabled large language models (LLMs) to achieve superhuman performance across various domains and tasks, unlocking their potential in real-world applications. However, robustness and trustworthiness issues still hinder their reliable deployment. Robustness issues refer to inconsistent model behavior under equivalent conditions, such as sensitivity to minor prompt variations. Trustworthiness encompasses issues such as hallucinations, where LLMs produce factually incorrect or input-conflicting outputs, and fairness issues, including biases toward certain races, genders, value systems, or languages. Addressing these issues is crucial for building reliable applications with these powerful yet vulnerable models.
This tutorial will survey these issues and the corresponding mitigation strategies, and will conclude by outlining future research directions in this area.
Speakers
Outline
Introduction
Definitions, importance, and taxonomy of robustness and trustworthiness issues.
Adversarial attacks and jailbreaking
Intentional manipulation of inputs to disrupt model behavior or bypass safety guardrails (Zou et al., 2023; Chao et al., 2025).
Prompt variations
Effects of semantically equivalent changes in prompt formatting (e.g., spacing, separators, and casing) on model performance (Sclar et al., 2024).
Position bias
Bias toward information based on its position in the input (Wang et al., 2025); impact on tasks such as pairwise response evaluation (Zheng et al., 2023), multiple-choice question answering (Zheng et al., 2024), and retrieval-augmented generation (RAG) (Liu et al., 2024).
Hallucinations
Cases where an LLM's generation is factually incorrect, contradicts its own earlier generations, or conflicts with the provided input context (Huang et al., 2025).
Fairness and social biases
Biases related to ethnicity, gender, value systems, or language.
Reasoning models
Issues specific to models trained to generate reasoning tokens before producing the final answer (Kumar et al., 2025).
Multimodal models
Issues introduced by models that process multimodal inputs (Tong et al., 2024).
Conclusion and open challenges
Summary of best practices for developing robust and trustworthy LLMs, open research questions, and future directions.