Project Overview
In This Work
This project came about toward the end of my doctoral studies as a synthesis of ideas in fairness and theoretical reinforcement learning. We wanted to lay out well-motivated fairness objectives and to understand how sequential decision-making processes affect multiple stakeholders. At the same time, we placed significant emphasis on the dynamics of learning, and on algorithms whose behavior can be guaranteed after a relatively small amount of exploration. This line of study led to notions of learning that require approximate optimality with respect to various fairness objectives, together with exploration-efficiency guarantees that generalize those of the mistake-bound, KWIK, and PAC-MDP RL frameworks.
Ultimately, this work argues that objectives in toy RL problems and in modern RL systems should be treated as fundamentally different. While RL agents acting independently in simple environments can reasonably maximize their own value functions, as reinforcement learning is used to control and define systems of greater complexity that affect ever more people, we should frame these learning problems in terms of how an agent's actions impact society at large. In other words, we argue for a shift from egocentric to socially aware reinforcement learning as the complexity and influence of RL systems continue to expand. This work was first presented at RLDM 2022 at Brown University, and a greatly expanded and more theoretical treatment of the subject was given in our award-winning RLC 2024 paper, presented at the University of Massachusetts Amherst.