Machine learning, deep learning, and generative AI are now widely used in many scientific and industrial fields, particularly for data analysis, simulation, and design. Their success stems from their performance and their ability to tackle complex problems, especially large-scale ones.
However, these models have significant limitations, particularly regarding the reproducibility, verification, and interpretability of their results. Despite substantial research progress, no universal methodology exists for reliably quantifying the errors of these models.
In this context, it becomes essential to assess the reliability and uncertainties associated with predictions. The VVUQ framework (Verification, Validation, and Uncertainty Quantification), already used in numerical simulation, offers a way to verify model performance, validate model agreement with reality, and measure uncertainties.
In this talk, I will present the conclusions of a workshop organized by the CEA. The workshop provided an opportunity to survey the state of the art in existing methods, explore their limits, identify relevant application areas, and propose research avenues and use cases to improve the validation of AI models.

Biography: Christophe Calvin, Director of Research and Senior Fellow at the CEA, serves as the Administrator of Algorithms and Source Code for CEA Research (ADAC). He has spent his entire career in the fields of numerical simulation and high-performance computing (HPC). In recent years, he has focused on the impact of AI on research and the FAIR management of scientific and technical data. In this capacity, he oversees the CEA’s policy regarding its scientific and technical data (DST). Involved in software development since the beginning of his career, he is also interested in how development models evolve in response to technological advancements and organizational models.