| Call Number | 13322 |
|---|---|
| Day & Time, Location | MW 1:10pm-2:25pm, To be announced |
| Points | 3 |
| Grading Mode | Standard |
| Approvals Required | None |
| Instructor | Shipra Agrawal |
| Type | SEMINAR |
| Method of Instruction | In-Person |
| Course Description | Theory of Markov Decision Processes (MDP) and Dynamic Programming. Design and convergence properties of Reinforcement Learning (RL) algorithms including Q-learning and Policy iteration methods. Function approximation and deep RL algorithms: DQN, policy gradient, actor-critic methods. Exploration-Exploitation and regret bounds in RL. Multi-agent RL. RL with Human Feedback (RLHF). RL and Monte Carlo Tree Search (MCTS) for Agentic Systems. Note: Only one of ORCS E4529 or 6529 may be taken for credit. |
| Web Site | Vergil |
| Department | Industrial Engineering and Operations Research |
| Enrollment | 0 students (50 max) as of 5:06PM Saturday, November 8, 2025 |
| Subject | Op Research - Computer Science |
| Number | E6529 |
| Section | 001 |
| Division | School of Engineering and Applied Science: Graduate |
| Section key | 20261ORCS6529E001 |