Minisymposium Presentation

Ethics of Large Language Models and Code Generation

Wednesday, June 5, 2024, 10:30 - 11:00 CEST
Climate, Weather and Earth Sciences
Chemistry and Materials
Computer Science and Applied Mathematics
Humanities and Social Sciences
Engineering
Life Sciences
Physics

Presenter

Jay Lofstead - Sandia National Laboratories

Jay Lofstead is a Principal Member of Technical Staff at Sandia National Laboratories. His research focuses on large-scale data management and trust in scientific computing. In particular, he works on storage, I/O, metadata, workflows, reproducibility, software engineering, machine learning, and operating-system-level support for any of these topics. Across all of this work, he is also deeply interested in the ethics of these areas and of computing in general, and in how to drive inclusivity across the computation-related science domains. Dr. Lofstead received his Ph.D. in Computer Science from the Georgia Institute of Technology in 2010.

Description

Programming scientific computing systems is a complex job under constant labor pressure. Financial incentives from industry, along with potential work visa and export control restrictions, make employment in this sector even more difficult. With the appearance of LLM-based code generators like Copilot, several ethical concerns arise. First, how are these models trained and created? The training data may include illegally used material, such as open source code whose license requires attribution, yet the generators provide none. Second, using these systems correctly requires greater skill, because someone must validate that the generated code does exactly and only what the requester wants. Third, security flaws introduced by poisoning the source pool on repositories like GitHub make it questionable to rely on the safety of the generated code.

This talk will delve into some of the ethical problems with these code generators and ask whether it is ethical to use them, trusted or not, to advance scientific inquiry. Chronic labor shortages encourage using these tools as a shortcut, but is that wise, ethical, or even useful?

Authors