Verifying Value Iteration and Policy Iteration in Coq
Ohio University / OhioLINK, 2021
Thesis
Reinforcement learning is a growing field of research, but little work is being done to verify the correctness of reinforcement learning algorithms. Researchers are exploring the use of reinforcement learning in safety-critical systems such as self-driving cars and autonomous aircraft, so mathematical proofs of correctness of the underlying algorithms would greatly improve confidence in the systems that use them. This project verifies convergence and optimality of two fundamental reinforcement learning algorithms: value iteration and policy iteration. These algorithms are said to converge and be optimal if they eventually produce an optimal policy. The project is also designed to be extensible to future research into verified reinforcement learning.
Title: | Verifying Value Iteration and Policy Iteration in Coq |
---|---|
Author / Contributor: | Masters, David M. |
Publication: | Ohio University / OhioLINK, 2021 |
Media type: | Thesis |