Event
Machine Learning as a Magnifying Glass to study Society
- 10 April 2024
- 3:00 pm - 4:00 pm
Location
- Attendance: in person
- Language: EN
Machine Learning Algorithms (MLAs) are trained on vast amounts of data and work by learning patterns and finding non-linear, often black-box mathematical relations within those data. A central challenge MLAs face is that the data used to train them are not generated in a social vacuum: if the data or the targets are biased, the models will be biased too. This raises an important problem: how should MLAs be trained to identify relevant differences in data without perpetuating, or even amplifying, prejudice and social bias?
To date, the main approach has been deductive, or top-down: researchers or coders start by listing known biases, such as racial prejudice, and then search for signs of their presence in the data, the models, or in societies. The implicit assumptions are that a) all biases, or all types of biased features, are known a priori; b) they are identifiable; and c) once identified, they can be corrected for. However, there is no comprehensive and universal list of biases, new biases emerge dynamically, and the coders' or researchers' own contextual backgrounds influence the debiasing approaches. In short, even screened datasets and models are likely to contain biased patterns.
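As a concrete, purely hypothetical illustration of this top-down style of audit, the sketch below checks a single, pre-specified bias: it compares positive-prediction rates across the two levels of a known protected attribute (a demographic-parity gap). All names and data are invented for illustration; this is not the method discussed in the talk.

```python
# Minimal sketch of a deductive (top-down) bias check: the protected
# attribute ("group") and the bias to look for are chosen a priori.
# Data and thresholds are hypothetical, for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Toy model outputs for two pre-defined groups (e.g. a known protected attribute).
group = rng.integers(0, 2, size=1000)                    # 0 = group A, 1 = group B
positive_pred = rng.random(1000) < np.where(group == 0, 0.60, 0.45)

# Demographic-parity gap: difference in positive-prediction rates between groups.
rate_a = positive_pred[group == 0].mean()
rate_b = positive_pred[group == 1].mean()
print(f"positive rate, group A: {rate_a:.2f}")
print(f"positive rate, group B: {rate_b:.2f}")
print(f"demographic-parity gap: {abs(rate_a - rate_b):.2f}")
```

The key limitation is visible in the sketch itself: the audit can only find the bias it was told to look for.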
Therefore, it is crucial to develop inductive systems to identify biases in MLAs.
The talk will be divided into two parts. In the first, I will describe ongoing work from the SPAC lab, in particular what is, to the best of our knowledge, the first experimental audit study to detect possible differential tracking on misinformation websites and its impact on third-party content and search-engine results.
In the second part, I will discuss the possibility of expanding on this and other work by taking advantage of MLAs themselves to identify novel biases. The fact that MLAs learn widely recognized prejudices so efficiently suggests that the problem can be reversed: the same algorithms could be used to build statistical, bottom-up tools that identify latent, unknown biases. This is a very preliminary project, and I would value the community's input.
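To make the bottom-up idea more tangible, here is a rough, hypothetical sketch of what such a tool might look like: instead of naming a bias up front, it clusters the data with no pre-specified protected attribute and flags the clusters where a model's error rate deviates most, leaving the interpretation of those clusters to human inspection. It is not the method under development in the talk; all names, data, and thresholds are illustrative.

```python
# Schematic sketch of an inductive (bottom-up) bias scan: surface data slices
# where a model behaves unusually, without specifying any group in advance.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Hypothetical features, labels, and model predictions.
X = rng.normal(size=(2000, 8))
y = rng.integers(0, 2, size=2000)
y_pred = y.copy()

# Plant a latent problem: errors concentrate in one unlabeled region of feature space.
hidden_slice = X[:, 0] > 1.0
flip = hidden_slice & (rng.random(2000) < 0.5)
y_pred[flip] = 1 - y_pred[flip]

# Cluster without any pre-specified protected attribute, then compare error rates.
n_clusters = 10
clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
errors = np.array([(y_pred[clusters == c] != y[clusters == c]).mean()
                   for c in range(n_clusters)])
worst = errors.argmax()
print(f"overall error rate: {(y_pred != y).mean():.2f}")
print(f"most suspicious cluster: {worst} (error rate {errors[worst]:.2f})")
```

The flagged cluster is only a statistical signal; whether it corresponds to a socially meaningful bias is exactly the kind of question the proposed research would need to address.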