
Rethinking Machine Learning Benchmarks in the Context of Professional Codes of Conduct.

Symposium on Computer Science and Law (2024)

Abstract
Benchmarking efforts for machine learning have often mimicked (or even explicitly used) professional licensing exams to assess capabilities in a given area, focusing primarily on accuracy as the metric of choice. However, this approach neglects a variety of essential skills required in professional settings. We propose that professional codes of conduct and rules can guide machine learning researchers in addressing potential gaps in benchmark construction. These guidelines frequently account for situations professionals may encounter and must handle with care. A model may excel on an exam but still fall short in critical scenarios deemed unacceptable under professional codes or rules. To motivate this idea, we conduct a case study and comparative examination of machine translation in legal settings. We point out several areas where standard deployments and benchmarks do not assess key requirements under professional rules. We suggest further refinements that would bring the two closer together, including requiring a measurement of uncertainty so that models opt out of uncertain translations. We then share broader insights on constructing and deploying foundation models, particularly in critical domains like law and legal translation.
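The abstract's proposal that models "opt out of uncertain translations" can be illustrated with a minimal selective-prediction sketch. This is not the paper's method, just one common way to implement abstention: score a candidate translation by its average per-token log-probability and defer to a human when the score falls below a tunable threshold. The function name, the threshold value, and the inputs are all illustrative assumptions.

```python
import math

def translate_or_abstain(candidate, token_probs, threshold=-0.5):
    """Return the candidate translation only if the model's average
    per-token log-probability clears the threshold; otherwise abstain.

    candidate:   the model's translation string (assumed given)
    token_probs: model-assigned probability for each generated token, in (0, 1]
    threshold:   minimum acceptable average log-probability (illustrative value)
    """
    avg_logprob = sum(math.log(p) for p in token_probs) / len(token_probs)
    if avg_logprob < threshold:
        return None  # opt out: defer the translation to a human expert
    return candidate

# High per-token confidence -> the translation is returned.
print(translate_or_abstain("the contract is void", [0.9, 0.95, 0.85, 0.9]))
# Low per-token confidence -> abstention (None).
print(translate_or_abstain("the contract is void", [0.3, 0.2, 0.4, 0.25]))
```

A benchmark aligned with professional rules could then score abstentions differently from wrong answers, rewarding a model that declines rather than one that confidently mistranslates.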