Basic Information
Bio
My goal is to make the conceptual advances necessary for machine learning systems to be reliable and aligned with human values. This includes the following directions:
Robustness: How can we build models robust to distributional shift, to adversaries, to model mis-specification, and to approximations imposed by computational constraints? What is the right way to evaluate such models?
Reward specification and reward hacking: Human values are too complex to be specified by hand. How can we infer complex value functions from data? How should an agent make decisions when its value function is approximate due to noise in the data or inadequacies in the model? How can we prevent reward hacking, i.e., degenerate policies that exploit differences between the inferred and true reward?
Scalable alignment: Modern ML systems are often too large, and deployed too broadly, for any single person to reason about in detail, posing challenges to both design and monitoring. How can we design ML systems that conform to interpretable abstractions? How do we enable meaningful human oversight at training and deployment time despite the large scale? How will these large-scale systems affect societal equilibria?
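The reward-hacking direction above can be illustrated with a toy sketch. This is not drawn from any specific system; the action names and reward values are invented for illustration. The point is only that a policy optimized against an inferred (proxy) reward can select exactly the action where the proxy and true reward diverge most.

```python
# Toy illustration of reward hacking: a policy optimized against an
# inferred (proxy) reward exploits the gap between proxy and true reward.
# All action names and numbers are made up for illustration.

true_reward = {"clean_room": 1.0, "hide_mess": 0.0}

# The inferred reward over-scores "hide_mess" because, say, the training
# data only recorded whether the room *looked* clean afterward.
inferred_reward = {"clean_room": 1.0, "hide_mess": 1.2}

def greedy_policy(reward):
    """Pick the action that maximizes the given reward function."""
    return max(reward, key=reward.get)

chosen = greedy_policy(inferred_reward)
print(chosen)               # the policy picks "hide_mess"
print(true_reward[chosen])  # but its true reward is 0.0
```

The degenerate behavior appears precisely because the optimizer is applied to the approximation rather than to the true objective; making inference and decision-making robust to that gap is the research question posed above.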
Papers: 115
Recent publication venues: bioRxiv (2024), arXiv (2024), ICLR 2023 (2023), CVPR 2024 (2023), ICLR 2024 (2023)
Data Disclaimer
The page data come from open Internet sources, cooperative publishers, and automatic analysis via AI technology. We make no commitments or guarantees regarding the validity, accuracy, correctness, reliability, completeness, or timeliness of the page data. If you have any questions, please contact us by email: report@aminer.cn