Basic Information
views: 34

Bio
My research focuses on deep learning, AI safety and alignment, and more specifically on understanding how to communicate or “specify” what behavior is desired. I’m pursuing the three research directions I view as most promising in this area:
Learning what humans want from human feedback (e.g. via reward modelling).
Managing the incentives an AI system has to influence the world (e.g. to prevent user manipulation in content recommendation systems).
Getting deep nets to understand the world the same way people do (e.g. so that they can solve out-of-distribution generalization problems).
Papers (59 in total)
arxiv(2025)
Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atilim Gunes Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Soren Mindermann
Science, no. 6698 (2024): 842-845
SSRN Electronic Journal (2024)
CoRR (2024)
CoRR (2024)
CoRR (2024)
TMLR 2024 (2024)
Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell
Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomek Korbak, David Lindner, Pedro Freire, Tony Tong Wang, Samuel Marks, Charbel-Raphael Segerie, Micah Carroll, Andi Peng, Phillip J.K. Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Biyik, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
TMLR 2024 (2024)
Author Statistics
#Papers: 59
#Citation: 8345
H-Index: 21
G-Index: 38
Sociability: 6
Diversity: 1
Activity: 23