Date: 2020-06-22 09:38
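Editor's note: On June 16, CVPR 2020 opened on schedule as a global online event. In the conference's first keynote, Microsoft CEO Satya Nadella and Harry Shum, former Executive Vice President of Microsoft, held a fireside chat in which they shared their thoughts on the research and application prospects of computer vision and artificial intelligence. The full transcript follows.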
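Harry: Good morning, everyone. Welcome to CVPR 2020, and welcome to Seattle. Thank you all for joining online from around the world. I'm Harry Shum, and I'm delighted to host the first keynote of this year's CVPR. Thank you for watching my fireside chat with Satya Nadella, CEO of Microsoft. Hi, Satya, we are honored to have you at CVPR 2020. I have known you for almost 20 years and had the privilege of working for you for many of them. For today's audience, could you start by sharing your journey: growing up in India, coming to the US to study computer science, joining Microsoft in 1992, and ultimately becoming its CEO?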
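Satya: First of all, thank you so much, Harry. It's a pleasure to be invited to speak at CVPR. We are living through an unprecedented period, and I'm especially excited to have this chance to come together and talk about the major breakthroughs in computer vision and other technologies, and about the positive contribution that innovation can make to the world. Harry asked me to talk about my own journey. I grew up in Hyderabad, India, and spent many years there; back then I never imagined I would be here. Hyderabad in the mid-1970s was a completely different place from what it is today, and two things from America ended up changing my life. The first was American technology, the personal computer: I was lucky to come into contact with one at that time, and it let me dream my own dream. The second was American immigration policy, which gave me the chance to come to the US to study. I had never been west of Bombay, and then I showed up in Wisconsin, which seemed an unbelievable choice at the time. But the opportunities that followed, whether in school or at Microsoft, ultimately shaped my life and who I am today, so I remain deeply grateful for everything I have experienced. Today I am convinced that opportunity can have an enormous impact; it let a kid from India like me live a very different life. I consider that a privilege, and those of us who have it should ask how, in creating technology and technology platforms, we can make them benefit everyone. That idea is what fundamentally drives me. It is also the foundation of Microsoft's mission to "empower every person and every organization on the planet to achieve more." It pushes me to work hard every day and inspires us to do our very best work.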
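Harry: That's great. Thank you, Satya. Your story is very inspiring. CVPR is the premier gathering for computer vision, so let's get straight to the topic. Microsoft has been involved in computer vision for a long time. Just a few days ago my good friend Professor Gerard Medioni, chair of CVPR 2020, reminded me that Microsoft Research has supported CVPR for almost 30 years. Can you share with us why Microsoft is so passionate about computer vision, and what you are most focused on in the field?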
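Satya: Absolutely. Harry, you were yourself a key driver of Microsoft's journey here. Starting in the early 2000s, we did basic research in computer vision, including skeleton tracking and human sensing. In 2010 we turned that technology into its first commercial product, Kinect, which I think was a breakthrough product and which did in fact become the best-selling consumer product of that year. Building on that, Microsoft Research Cambridge created Kinect Fusion, which can fully reconstruct a 3D environment, followed by HoloDesk. I will always remember the first time I saw the demo: it merged a person completely into a 3D environment and let you handle a 3D object on a desk. In 2015 we delivered the productized HoloLens, and in 2019 we launched the second generation, HoloLens 2, with twice the field of view, twice the comfort, and many remarkable applications.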
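Beyond that, what excites me most is seeing innovative applications of computer vision in the cloud and on edge devices. In 2016 we reached near-human parity in object recognition. Azure Kinect, released in 2018, provides onboard edge computing, and with it we brought Azure Cognitive Services to Azure Kinect. With this complete tool chain, we can do more creative computer vision research and build more creative applications. In fact, many ISVs and third-party developers are already applying Azure Kinect to scenarios ranging from manufacturing to healthcare. In US hospitals, roughly one million falls occur each year. An ISV called Ocuvera has built a video-based monitoring solution that uses Azure Kinect to analyze patients' movement patterns. When an unattended patient tries to get out of bed, the system alerts nurses and caregivers so that the fall can be prevented, and its accuracy has reached 96%.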
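Nationwide Children's Hospital in Ohio is using Kinect for early screening of uncoordinated movement in infants, to detect signs of conditions such as cerebral palsy. They built a prototype that uses a computer vision model to assess how healthy an infant's movements are, so that caregivers can intervene as early as possible. For conditions like these, early treatment is extremely important. The medical device company Evolve uses Kinect to improve the effectiveness of physical rehabilitation for stroke survivors. Its solution combines traditional physical exercises with interactive games and is tailored to each patient's individual situation.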
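As we discuss the future of computer vision at CVPR, three breakthrough directions interest me in particular, and I hope to push them forward so that they have a positive impact on the real world. The first I call "4D Understanding"; Harry and I have also talked about developing it into "Reality-as-a-Service." For example, in places such as hospitals or factory floors, where safety and quality matter most, being able to use real-time computer vision to reason over people, places, and things in order to ensure safety would be a remarkable breakthrough. We are already seeing real deployments in some cases, so let me use a video to show it in action.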
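Video narration (translated): We are developing a technology called "4D Understanding." It combines data from multiple Azure Kinect devices and performs real-time spatial analysis with computer vision models. The system can track objects, people, interactions, and group activity. Video and action-understanding models in the cloud detect that this person is lifting a large jug in an unsafe way, using his back rather than his legs. In the object recognition window, a person is assembling parts. The red and green circles show hand tracking. The Computer Vision API in Azure Cognitive Services has been trained on these objects and can detect them. Other models analyze the assembly actions, which lets us catch missed steps. Here, the system detects that a cable has not been installed, so the assembly is judged incomplete. By combining multiple computer vision technologies, our system can provide real-time insights to guide decisions.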
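Satya: With remote collaboration everywhere today, another area that interests me is background matting. Even if you are sitting at home, we can put you on stage. In fact, at the recent Microsoft Build developer conference, we took footage that presenters recorded at home and projected it seamlessly onto a virtual stage, with no green screen at all. I think this is another breakthrough in computer vision, and the following video shows the result of our collaboration with the University of Washington.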
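Video narration (translated): Computer vision models and depth data from Azure Kinect can be used to create virtual backgrounds. To protect everyone's health, this year's Microsoft Build developer conference was held virtually. The presenters appear on a virtual stage, but you can see that they are actually in an ordinary room with no green screen. We had presenters record themselves with an Azure Kinect connected to a laptop. The Kinect records RGB color values and depth data, which are fed into an AI model based on research from the University of Washington to generate a dynamic transparency matte; we can then replace the background with the virtual stage. Compared with other technologies currently on the market, the quality of the result is very good. We hope to produce realistic virtual backgrounds and so create a more immersive experience. The code for background matting has been open-sourced on GitHub.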
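The last step described in the video, replacing the background once a transparency matte is available, is standard alpha compositing: each output pixel is `alpha * foreground + (1 - alpha) * background`. The sketch below illustrates only that blending step. It is not the Build pipeline or the University of Washington model; the function name `composite` and the nested-list image layout are assumptions made for illustration, and the matte is taken as given rather than predicted.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]   # RGB, each channel in [0, 1]
Image = List[List[Pixel]]            # rows of pixels (hypothetical layout)
Matte = List[List[float]]            # per-pixel alpha in [0, 1]


def composite(foreground: Image, alpha: Matte, background: Image) -> Image:
    """Blend a foreground over a new background using a given alpha matte.

    Per pixel and per channel: out = alpha * fg + (1 - alpha) * bg.
    alpha = 1 keeps the presenter, alpha = 0 shows the virtual stage,
    and values in between blend soft edges such as hair.
    """
    out: Image = []
    for fg_row, a_row, bg_row in zip(foreground, alpha, background):
        out_row = []
        for fg, a, bg in zip(fg_row, a_row, bg_row):
            out_row.append(tuple(a * f + (1.0 - a) * b for f, b in zip(fg, bg)))
        out.append(out_row)
    return out


# A 1x3 "frame": red presenter pixels over a blue virtual stage.
frame = [[(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
stage = [[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]]
matte = [[1.0, 0.5, 0.0]]  # opaque, half-transparent edge, fully background
result = composite(frame, matte, stage)
```

In a real system the matte would come from the matting model for every frame; the compositing itself stays this simple.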
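Satya: The third breakthrough is Holoportation, which we demonstrated almost a year ago. This video shows our colleague Julia White presenting on stage in English while her hologram speaks Japanese, combining neural text-to-speech (TTS), holographic computing, and other technologies. Being able to transcend the limits of time, space, and language like this is, to me, a remarkable breakthrough, and I hope this kind of technology will develop even faster.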
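First, Julia starts up HoloLens 2, and a miniature version of herself appears in the palm of her hand.
Then, after a striking visual effect, a life-size, 1:1 hologram of Julia appears in front of the audience.
The "replica" Julia has the same expressions, manner, and tone of voice as the real one. Even more striking, she delivers the talk in fluent Japanese (Julia herself does not speak Japanese).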
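Harry: That was wonderful, Satya, and it brings back some of my fondest memories. I think back to many years ago, when we had just set up the computer vision research group at Microsoft, with Rick Szeliski, Matthew Turk, and many other remarkable people. You also mentioned that Microsoft Research has many labs around the world, such as the Cambridge lab in the UK, Microsoft Research Asia in China, and Microsoft Research India. Many vision technologies from Microsoft Research have been successfully integrated into Microsoft products. What excites me most, as you said, is the close connection and transfer between research and product, as in Kinect, HoloLens, and many other projects, such as the Holoportation video with Julia White. I think the future of computer vision is limitless.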
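Harry: Next, let's move from computer vision to artificial intelligence, AI. Microsoft has been doing AI research for a long time, especially since Bill Gates founded Microsoft Research in 1991. I remember that the first three research groups at Microsoft Research were natural language processing, speech, and vision, all of which are foundations of AI. Recently you have repeatedly emphasized that cloud computing and AI will be key to Microsoft's future growth. The AI supercomputer announced at Build last month was also very exciting. So what is Microsoft's view of where AI goes next?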
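Satya: In my view, one of the most notable developments of the past few years is large-scale computation: models with more parameters deliver ever more surprising results, especially in language. As you know, in going from recurrent neural networks (RNNs) to Transformer models, the results have been tremendous. When you were still leading the research team at Microsoft, we released the Turing model with 17 billion parameters. Now, in partnership with OpenAI, that number has been raised to 170 billion, which is very exciting progress. We went a step further and built a supercomputer specifically for this. Working with models at this scale brings all kinds of challenges, including working around the limits of Moore's Law, so we had to reinvent the whole system to make very large-scale machine learning possible. I'm glad that we ultimately built an AI supercomputer in Azure, on which we and OpenAI are training these models. We are also turning these models into platforms so that others can fine-tune them for their own needs. What excites me even more is that we can generalize, extending AI training methods that learn from text, speech, and images in order to build better representations of knowledge. So I think that over the next few years we will see more breakthroughs in the systems layer, in modeling techniques, in training techniques, and of course in applications. In healthcare, for example, if we want progress in precision medicine, we need innovations in clinical notes, medical imaging, and other areas, and we need to bring them together to drive real breakthroughs.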
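Harry: That is indeed something to look forward to. I'd like to add a few words about the 17-billion-parameter Turing model you mentioned and the large-scale GPT-3 model with 175 billion parameters. We know that inside Microsoft, many research groups are not only using Azure to train their own models but have also achieved few-shot, one-shot, and even zero-shot learning. The opportunities here are truly amazing. Satya, today's keynote is being held virtually, online, because we are living through an extraordinary time. With so many exciting prospects for computer vision and AI, I'd like you to share your most candid thoughts on AI and vision technology: how should we use these technologies to help people live their lives and do their work well, not only now, while facing the pandemic, but more importantly in the post-pandemic world? Can you share some examples of what Microsoft is doing and how Microsoft is helping people, especially those working on the front lines?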
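Satya: Absolutely, Harry. I think this pandemic has brought people's urgent need for digital technology to the fore. We are thinking about how technology can help people, at the scale of society as a whole, to respond, recover, and reimagine how we work and live going forward. I think these three phases are actually happening at the same time, and digital technology, including computer vision, will play an important role in them. In fact, the video we just saw showed remote sensing and remote monitoring on a factory floor, using Reality-as-a-Service to ensure safety and digital twins to keep operations safe; these are very important trends for manufacturing. Another application we have seen in manufacturing is adjusting production lines promptly and quickly, for example to switch rapidly to making ventilators. That process requires remote guidance from experts to help workers reconfigure the line, and HoloLens combined with the Dynamics Remote Guides application played an important role in it. That is manufacturing.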
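In healthcare, in hospitals in the United Kingdom, we have seen HoloLens and Microsoft Teams used together. Doctors caring for patients infected with the novel coronavirus wear a HoloLens along with full personal protective equipment. The HoloLens captures what the doctor sees and transmits it through Microsoft Teams, so that other doctors outside the isolation area can also see the patient and give treatment advice remotely. Safety and collaboration are playing their part on the front lines of the pandemic in an entirely new way. In medical teaching, Case Western Reserve University's School of Medicine had students use HoloLens at home to take part remotely in anatomy classes, ensuring that the curriculum could continue with a realistic experience. That is a remarkable breakthrough.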
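Harry: It's great to hear your thoughts, Satya. My takeaway is that whatever challenges we face, such as the current pandemic, we can always innovate our way out. It will take enormous effort, but we will get through it. Satya, let's come back to Microsoft as a company. Over the past six-plus years, you have done a great deal to lead Microsoft through a successful transformation. You have invested particularly heavily in building communities and made many bold bets, such as acquiring LinkedIn for business professionals and GitHub for developers. CVPR, in fact, is a large community of computer vision researchers and practitioners, and in the last couple of years attendance at each CVPR has approached 10,000 people. Many of us would like some advice and insight from you and your experience to help the whole community grow. How do you think we in the computer vision community should help each other, work together, grow together, and contribute more to society?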
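Satya: Absolutely, Harry. When we talk about empowering every person and every organization on the planet to achieve more, the key is to use digital technology to help people, and the institutions and communities they build, create and thrive together. That is the central idea of Microsoft's mission and the core of Microsoft's business model. We do well only if the world we serve does well, whether that means helping small businesses become more productive, helping large multinational companies around the world become more competitive, helping public sector institutions become more efficient, helping education and healthcare advance, or helping communities at large thrive. For us, this is the core. The acquisitions you mentioned, including the developer community on GitHub, the business professionals on LinkedIn, and gamer communities such as Minecraft, are communities we are privileged to serve, and they in turn keep us grounded.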
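The same is true in computer vision. Computer vision researchers have a long tradition of coming together to create technology breakthroughs, and there are countless precedents of Microsoft Research working with academia to advance product innovation. Marc Pollefeys of ETH Zurich is the best example. He works with Microsoft and has driven many product innovations, and at the same time he has founded a world-class research center at ETH. This kind of cross-pollination is at the core of community building. It is not limited to computer vision; it applies to the broader field of AI and extends to digital technology as a whole. At Microsoft, we want to foster ecosystem and platform thinking that helps communities come together and, more importantly, encourages different communities to work with each other and amplify their impact through collaboration.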
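Harry: Well said, Satya. You have given us a lot to think about. One important attribute of communities is that they are international. GitHub is international, LinkedIn is international, gaming communities are international, and the CVPR community in computer vision is no exception. So, as the leader of a multinational company who is familiar with many academic communities, how do you think they can better advance international collaboration?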
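Satya: Sure, Harry, let's talk about international collaboration. Whether for a research community like CVPR or a multinational company like Microsoft, I think one thing we must think about soberly is how our work, whether done together or individually, can truly help every community in every country. Globalization has no standing unless its benefits reach people locally. In fact, in the last round of globalization, many people benefited, but many were also left behind. So what I want to say now is that Microsoft should contribute in some way, which is why, wherever I go in the world, I look closely and make clear that Microsoft wants to take an active part in helping the region and the country develop. I hope our points of light can make some contribution to the growth of small businesses, large businesses, the public sector, healthcare, and education, and help improve local supply, local employment, and local skills. As a global community, whether of researchers or of multinational companies, we must take an active part in and contribute to local development while promoting global cooperation. The more we think about this and the more we contribute in this direction, the better we will be able to keep progress going.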
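Harry: That's a great point, Satya. In fact, the CVPR community and other large computing communities, such as ICCV, think the same way. That is why CVPR has been held in almost every US town that has a university or a research institute. ICCV rotates among continents. As you said, only when local communities thrive can we truly be a global organization. Very good. Now, Satya, we still have a little time, and I have a few questions that we received from the audience in advance. The first is a very timely one, and I know you have thought a lot about it. It concerns AI, the ethical use of AI, and responsible AI. We have noticed that you and Microsoft have devoted a great deal of effort to this issue and have made some tough decisions along the way. Can you share some of the lessons you have learned?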
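Satya: Absolutely, Harry. We have been thinking carefully about how to establish a set of design principles so that, when we create AI, the core ethical considerations are built right into the engineering process. In our view, it helps to treat AI safety and the ethical use of AI as a design-time as well as a runtime consideration. On the design side, we started by establishing a set of concrete engineering principles, from fairness and reliability to security, privacy, and so on. This ensures that ethics is part of the design process, as a first-class design requirement rather than an abstract concept.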
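In computer vision, we have been putting this into practice. Take facial recognition based on our Face API. One of the first challenges is how to make sure we eliminate bias. Thanks to NIST, there are now robust benchmarks for comparing facial recognition performance across different ethnic groups, which helps ensure there is no bias in our models, and the transparency this creates is also very helpful. Soon we will provide guidance to customers on how to measure Face API performance against their own data, set the right thresholds, and balance false matches. That is one example.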
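On the other side are runtime considerations and the ethical use of AI. We have to recognize that sometimes, even with the best intentions at design time, if there are no safeguards at runtime to protect privacy and democratic freedoms, the result can be bad unintended consequences. For the past two years we have focused on developing and implementing strict principles to govern our facial recognition technology, and since 2018 we have also been calling on governments to enact correspondingly strong regulation. We have published the principles we use to decide which projects to take on, and we have turned down many projects that fall outside them. We do not sell facial recognition technology to police departments in the United States, and we have committed not to sell it to US police departments until there is a strong national law grounded in human rights. We are actively calling for such a law; otherwise, we will see responsible companies leave this market and others step in.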
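Harry: That is indeed an effective way to govern AI. Looking at both design time and runtime makes people think seriously about the important ethical questions and responsibilities in designing AI, and computer vision is certainly an important part of that. So, Satya, the last question: as CEO of the number one company in the world, you think every day about major opportunities and about how to help more people. Can you tell us which industries you think are best suited today for applying cloud computing, AI, and computer vision?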
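Satya: That's a great question, Harry, because in a sense it is the question my colleagues at Microsoft and I think about most: what comes next, and how can we make digital technology have a more far-reaching impact? Think about it: the progress of the past 10 to 15 years has been very visible, but I would say some of its use cases have been quite narrow. There have been many breakthroughs in the consumer internet, but if you look at productivity and at the curve of productivity-driven economic growth, and at the effect on small businesses, large businesses, different parts of the economy, and different regions of the world, you find that our growth rate is lower even than the growth brought by the rise of the PC from the 1990s to the early 2000s. If you look at Robert Gordon's assessment of US productivity, he clearly points to astonishing progress between 1870 and the 1940s. He also notes that information technology, especially the personal computer, brought productivity growth from the 1990s to the early 2000s. But since then we have not seen significant productivity growth. Why? Some of it may be down to biases in statistics and measurement. But what I want to say is that I hope that in the next phase, with the help of technologies such as AI, cloud computing, and computer vision, we will see broad growth across more industries.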
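I'm full of anticipation about this. Take healthcare: 19% of US GDP comes from healthcare. Could we achieve a breakthrough in precision medicine? We can use clinical data and molecular imaging to make real breakthroughs in how we treat patients and manage care. Then, in this world of remote everything, there is autonomy, whether inside-out or outside-in. What we saw in the Reality-as-a-Service video, for example, is an inside-out form: people are moving, things are moving, and someone is observing and helping to make sure those people and things operate safely, or the objects operate automatically in the real world. Objects that can move autonomously in the real world will transform transportation and operational safety in many settings. Retail and commerce will change significantly as a result. Everyone now talks about omnichannel, offline and online, and the COVID-19 pandemic has in fact accelerated solutions such as contactless shopping and ordering online for in-store pickup. I think this will be a major watershed for retail. Precision agriculture, which can help ensure food security, will be another big area.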
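Another area that excites me, and one especially relevant to computer vision, is accessibility. One billion people around the world cannot fully take part in society and the economy because of disabilities. The technologies we have today can help: machine reading comprehension can help people with reading difficulties read; tools such as Seeing AI, developed by Microsoft, use the latest computer vision breakthroughs to describe the world to people with visual impairments; and the EyeGaze project tracks the eye movements of people with ALS to help them type and communicate with others. I sincerely hope that we can make real breakthroughs in AI and deliver newer, more powerful accessibility technologies that help these billion-plus people around the world take part in social and economic life.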
Harry: Good morning, everyone. Welcome to CVPR 2020, and welcome to Seattle. I really appreciate everyone logging on from around the world. My name is Harry Shum, and I'm your host for the first keynote at CVPR. Thank you for joining the fireside chat with Satya Nadella, CEO of Microsoft Corporation. Hi, Satya, we are very honored to have you join us at CVPR 2020. I have known you for almost 20 years and had the privilege to work for you over many years. But for our audience, first of all, can you please share with us your journey, from growing up in India and studying computer science in the US to joining Microsoft in 1992 and ultimately becoming the CEO of Microsoft?
Satya: First of all, Harry, thank you so much. It's such a pleasure to join you and be a speaker at CVPR. I know we're living through unprecedented times on many dimensions, and so for this group to come together and talk about the breakthroughs of computer vision and technology, and also the positive impact that technology can have in the world, is really great, and it's an honor for me to join you all. To your point, Harry, I grew up in the city of Hyderabad in India for most of my life, and never did I think that I would be here. When I was growing up in the mid-'70s, Hyderabad was a very different place than it is today. But I must say I've been shaped by two uniquely American things: American technology, the PC, reaching me where I was growing up and letting me dream the dream, so to speak, and then later the American immigration policy that let me come here and go to school. I had never been west of Bombay, and then I showed up in Wisconsin; that was quite a shock to the system. That said, the opportunities that were given to me, whether in school or at Microsoft, really shaped a lot of my life and who I am today, and I'm very, very thankful for all of it. And now, given that opportunity and that platform, I'm very grounded in the question of impact: what it means for a kid growing up in India to be able to live the life I've lived. I also know that that's a privilege, and the question is, for those of us who have that privilege, what are we doing in terms of creating technology, and technology platforms in particular, so that we can truly democratize access? That's what I think fundamentally motivates me. It's what grounds Microsoft's mission, which you know well, of empowering every person and every organization on the planet to achieve more. It's what excites me to come to work each day and make sure that we are doing our very best work.
Harry: That's great. Thank you, Satya. Certainly, your story is very inspiring to many of us, including many computer vision researchers. Since this is CVPR, let's get straight into computer vision. Microsoft has a long history in the computer vision space. In fact, just a few days ago my good friend Professor Gerard Medioni, chairman of CVPR 2020, reminded me that Microsoft Research (MSR) has been a supporter of CVPR for almost 30 years. So can you share with us why Microsoft is so excited about computer vision, and what you are particularly excited about in computer vision now?
Satya: Absolutely. Harry, you obviously were very much part of the Microsoft journey here all through, starting in the early 2000s, when there was basic research on computer vision with skeleton tracking and human sensing. That got translated into the very first product in 2010, Kinect, which I think was a breakthrough product; in fact, it was the best-selling consumer product of the year. That led to our Cambridge team doing some amazing work around Kinect Fusion to be able to do full 3D reconstruction. Then came HoloDesk. I'll always remember the first time I saw that demo, where you could do a full synthesis of a 3D environment and put a 3D object on it, in this case on a desk. That opened our eyes to what ultimately became the HoloLens product in 2015. And of course in 2019 we now have the next-generation HoloLens, HoloLens 2, with twice the field of view, twice the comfort, and amazing breakthrough applications. In parallel, one of the other exciting things was to see how we're now innovating on computer vision with the cloud and the edge. Whether it's the object recognition parity that we achieved in 2016 or the release of Azure Kinect with its onboard edge computing capability in 2018, between the cognitive services in Azure and Azure Kinect you really have, I think, a fantastic tool chain to do very innovative computer vision research as well as applications. In fact, we've seen many ISVs and third-party developers apply the power of Azure Kinect to scenarios across industries, including healthcare. Consider that roughly one million falls occur in US hospitals each year. Ocuvera is an ISV that built a video-based monitoring solution that uses Azure Kinect to analyze patients' movement patterns and sense when they're trying to get out of bed unassisted. It alerts the nurse and caregivers before a fall occurs, with 96% accuracy.
Researchers at Nationwide Children's Hospital in Columbus, OH are using Kinect for early detection of movement disorders like cerebral palsy in infants. The prototype they built uses a computer vision model to classify an infant's movement as healthy or at risk so that caregivers can intervene early, which is super important. And medical device company Evolve is using it to improve the efficacy of physical therapy for stroke survivors. Their solution incorporates traditional therapeutic exercises into interactive games, which are tailored to each patient's individual needs. And so as I'm thinking about CVPR and the future of computer vision, there are three great breakthroughs that I'm excited about that will push the frontiers of the impact of computer vision in the real world. The first is what I would call 4D understanding. Sometimes you and I have talked about it as Reality-as-a-Service. Take a space like a hospital or a manufacturing plant, especially around safety and quality. If we are able to reason over the people, the place, and the things, in real time using computer vision, to help ensure safety, that can be an amazing breakthrough, and we're seeing that already in deployment in many cases. So I just wanted to roll the video to show you that in action.
Satya: The next area, Harry, that we're also excited about in this world of remote everything is background matting. Say you're presenting at home, but we want to put you on stage. In fact, we recently had the Build developer conference, and we were able to take presenters who just recorded themselves at home and superimpose them on a virtual stage without needing a green screen. That's again, I think, a breakthrough in computer vision, one that we worked with the University of Washington on. So let's roll the video.
Satya: And Harry, the third breakthrough, I would say, is what we demoed just about a year ago around Holoportation. This is a video that shows Julia White, one of our colleagues, presenting on stage in English. And there is a hologram of her speaking Japanese, thanks to the combination of neural TTS and holographic computing coming together, so that you can in fact transcend time and space and language. That, to me, is just an amazing breakthrough, and I hope that these are the types of technologies that will get further accelerated.
Harry: Yes, Satya, I think this is just so fantastic. It really brings back my fond memories of when we started the first computer vision group many years ago, with great people like Rick Szeliski and Matthew Turk and others. You also mentioned that Microsoft Research has many different labs internationally, like the Cambridge lab and the labs in China and India. And many vision technologies coming from MSR have contributed to successful products from Microsoft. Even more exciting to me, from what you described, is this reinforcement between research and product, from Kinect to more research and then to HoloLens. And now you are talking about this amazing video of Julia White with Holoportation. I think the future of computer vision is very exciting.
Harry: Thank you, Satya. Next I'd like to move from computer vision to artificial intelligence, to AI. Microsoft has invested in AI for a long time, especially since Bill Gates started Microsoft Research in 1991. I remember the first three groups at MSR were Natural Language Processing, Speech, and Vision, all very much AI. And recently you made it very clear that the cloud and AI are the key to Microsoft's future growth. It's also very exciting to see the AI supercomputer announced just last month at the Build conference. So what is Microsoft's vision in AI going forward?
Satya: Yeah, it's so fascinating, as you said. In fact, the first three groups that were created in Microsoft Research were Speech, Vision, and Language, and here we are in 2020 talking about the same three with higher ambition and more success. It's definitely great to see. For me, one of the fundamental things we have been doing in the last couple of years is looking at these large-scale, high-parameter-count models that are yielding amazing results, in particular on language, especially going from RNNs to these transformer models. The results we are seeing are tremendous. We released something called the Turing model, with 17 billion parameters, when you were leading that team right here at Microsoft, and now we've partnered with OpenAI, and they of course have taken it to the next level with 170 billion parameters or what have you. It's pretty stunning to see that. And the thing that we did was build specialized compute. After all, when you are doing these kinds of large-scale models, with all of the challenges you have even with Moore's Law as we speak, we have had to reinvent the system on which you can do this large-scale learning. And it's great to see: we've built essentially the world's supercomputer, an AI supercomputer right in Azure, on which OpenAI is training these models. In fact, we're going to use these models as platforms, so others can use them and do fine-tuning on top for their particular use cases. So we're very excited. I'm also excited about the multimodal nature of this. The dream has always been: how can you have AI training regimes that learn from text, learn from speech, and learn from images, so that there is a better representation of knowledge?
And so I think those are all areas where, in the years to come, you'll see breakthroughs in the systems layer, in the modeling techniques and the training techniques, and of course breakthroughs around applications. In healthcare, for example, if I look at what needs to happen around precision medicine, you need breakthroughs where you're able to take the clinical notes, take the clinical images, and bring all of it together in order to really drive breakthroughs.
Harry: It's really exciting. I just want to respond a little bit about those large models you mentioned, from the Turing model with 17 billion parameters to GPT-3 with 175 billion parameters. With those models, Microsoft is using Azure to really enable and empower many research groups in the company not only to train their own models but even to apply few-shot learning, one-shot learning, even zero-shot learning. The opportunities there are really, really amazing. Satya, even with these keynotes, we're doing this online, virtually; right now we're living in an extraordinary time. With all this excitement in computer vision and AI, I want to pick your brain on how you really think about AI and vision technologies and how they should be applied to help people deal with their daily lives and daily work during the pandemic right now and, even more importantly, in the post-COVID world. Can you share with us some examples of how Microsoft is doing it and how Microsoft is helping other people? You also mentioned how we can help first responders.
Satya: Absolutely, Harry. I think one of the things is that this crisis has perhaps brought to the fore the need for digital technology and how it can help us at large as a society to respond, recover, and reimagine how we work and live going forward. I think all three of these phases are going to be happening simultaneously, and digital technology, including computer vision, is going to play a huge role in it. In fact, the video we saw earlier, of remote sensing and remote monitoring of a manufacturing plant, that Reality-as-a-Service, to ensure safety of operations with Digital Twins, can be very, very important in the manufacturing sector. The other application we saw in manufacturing was where production lines had to be rejiggered in real time, for example in order to make ventilators. That meant you needed remote expertise to help the people who were reconfiguring those production lines, and that was all done through these HoloLens and Dynamics applications called Remote Guides, which is amazing to see. That is one example in manufacturing. In healthcare, in hospitals in the United Kingdom, we saw a very interesting use of HoloLens and Teams in combination. A doctor would go in to care for the COVID patient wearing a HoloLens along with the rest of their PPE. The HoloLens saw everything and transmitted that capture back through Teams, so that all the other doctors could be outside the patient's room and yet give instructions on care. So this has really brought to the fore both safety and collaboration in new ways on the front lines.
And when it comes to medical teaching, we saw Case Western Reserve University's medical school in fact send its students home with HoloLens and continue their anatomy classes, where the students and the teacher were able to ensure that the curriculum continued with full fidelity. It's amazing to see that type of breakthrough. So I think we're going to start seeing breakthroughs, whether in manufacturing, in healthcare, or in education, where computer vision is going to be key to this world of remote everything.
Harry: Yeah, indeed. I think you said it very well. It is, in a way, an acceleration of digital transformation, and I think the pandemic has made us think about that even more. One thing I want to ask you about is your view of the future of work in this post-COVID world. Do you see people working more remotely? I think I read somewhere that you said it is unthinkable that we will do things only virtually from now on, even though some other companies are announcing more and more along this line.
Satya: I think at a core level we will always want to have this capability of remoting every function inside our enterprise, whether it's remote sales, remote operations, remote support, or remote work at scale. There's no question about that, because I think it's going to be foundational to business continuity and resilience. I think we're also going to learn a lot, Harry, about the effectiveness of remote work for which role, in which industry, in which function. I'm positive that there will be certain roles in certain industries where it absolutely works. In fact, if you look at Microsoft even before the pandemic, we had many roles that were 100% remote, and they were very productive. And there were certain roles that required people to come together to collaborate sometimes. So I would say that instead of replacing one dogma with another dogma, what is more important is for us to exercise the advantage we now have, the ability to remote anything, and then fit it to purpose. Once we come out of the COVID-19 crisis, we will use that flexibility to help people, not only with their productivity but also with their wellbeing and their needs. For example, even in the Seattle region, where we have now sent a lot of people home, we're realizing that some people would rather have a workspace at work once the COVID-19 crisis goes away, because they want dedicated workspace with good network connectivity; we have some structural problems, even in developed countries and in cities like Seattle, where Wi-Fi and bandwidth constraints exist. So I want us to be grounded in the realities of people everywhere around the world and in what is the best way to exercise that flexibility, recognizing that remote work can be fantastic and empowering for many people.
Harry: It's great to hear your thoughts, Satya. My takeaway is that whatever challenges we face, like the pandemic right now, we can always innovate our way out. It takes a lot of effort, but we'll get through this. So, Satya, let's get back to Microsoft, back to the company. You have done a marvelous job leading and transforming Microsoft in the past six-plus years. You have always paid attention to and made big bets on communities, such as LinkedIn for business professionals and GitHub for developers. CVPR, in fact, is a big community for computer vision researchers and industry practitioners. In the last couple of years, attendance at the conference has been approaching 10,000 people a year. So many of us would like to ask for some advice and wisdom from you and your experience with fostering communities. How can each of us in the computer vision community help each other, work together, grow together, and contribute to an even better society?
Satya: Absolutely, Harry. When we talk about empowering every person and every organization on the planet to achieve more, the central point is: through digital technology, how do we help people, and the institutions and communities people build, to thrive? That's been central to Microsoft's mission, and it's central to Microsoft's business model. We only do well if the world around us that we're serving does well: small businesses becoming more productive, large multinational companies in every part of the world becoming more competitive, public sector institutions becoming more efficient, educational outcomes and health outcomes improving, and communities at large thriving. So it's very central to us. In fact, the acquisitions you mentioned, whether it was the developer community with GitHub, the professionals with LinkedIn, or even gamers with Minecraft, are all communities we have the privilege to serve, and it's their outcomes that ground us. Similarly with computer vision, I think there's a very rich history of computer vision researchers coming together to create technology breakthroughs, including the work we're doing across academia and Microsoft Research in our product innovation. Marc Pollefeys at ETH is a great example of that. He is working with us on some of the product breakthroughs, and he's also creating a great world-class research center at ETH. That cross-pollination is core to community building. And it's not just about computer vision; it's about computer vision in the broader field of AI and the broader field of digital technologies. So at Microsoft we absolutely want to be someone who can help bring that ecosystem and platform thinking, so that these communities can come together and, more importantly, work together with other communities to amplify their work.
Harry: That's really great to hear, Satya; there's a lot of wisdom there. One particular aspect of communities is that they are international. Just as GitHub is international, LinkedIn is international, and gamers are international, the computer vision community at CVPR is no exception. So, as you think about how you run this multinational company, now with even more communities, what lessons have you learned about how we can do better in terms of international collaboration?
Satya: Yeah. In fact, I would say, Harry, in this next phase of international collaboration, whether it's for the research community at CVPR or for a multinational company like Microsoft, one thing we have to be very grounded in is how our work, collectively and individually, is helping every community in every country. If we talk about globalization without its benefits being locally relevant, I think we lose permission. If anything, that is what we have learned: in the last phase of globalization, many benefited, but unfortunately many were left behind. That's why, at Microsoft, whenever I go to any part of the world, I look and ask what Microsoft's participation in that region, that country, has done for the local surplus: small businesses, large businesses, the public sector, health outcomes, education outcomes. Unless and until we can concretely point to points of light, to progress around local supplies, local employment, and local skills, I think we, as a global community of researchers or multinational companies, will lose permission to even operate. So I would say we all need to recommit ourselves to that next phase of local impact while globally cooperating. The more we can think about it and frame what we do along those lines, the more we will really be able to keep the progress going.
Harry: That's a really great point, Satya. You might also be happy to hear that the CVPR community, and the larger computing community, like ICCV, has always been thinking like that as well. That's why practically every university town and college town in the US has probably by now hosted a CVPR in some year. And ICCV of course also rotates among different continents. As you said, only when we have local communities thriving can we actually have a global organization. That's fantastic. So, Satya, we still have a little bit of time, and I want to ask you a couple of questions that we received from the audience ahead of time. The first one is a very timely one that I know you have been thinking about a lot. It's about AI, about the ethical use of AI, about responsible AI. We have noticed that you and Microsoft have paid a lot of attention to this space and have even made some tough decisions along this line. So I wonder if you can share with the audience some of the lessons.
Satya: Absolutely, Harry. I mean, one of the things that we have been grounded in is, when we create AI, how do we ensure we have a set of design principles that codify the core ethical considerations right into the engineering process. It's helpful in fact to think about AI safety and the ethical use of AI as a design-time as well as a runtime engineering consideration. So, for example, on the design-time side, we started by establishing a set of concrete engineering principles, from fairness to accountability, to security, privacy and so on, so that we can ensure they are part of the design process and we have them as first-class constructs, not just abstractions. Then, in the context of computer vision, we're practicing this. Take what's happening with facial recognition in our Face API. One of the first challenges was how do we ensure that there is no bias. And thanks to NIST, there are robust benchmarks now to measure performance across a number of ethnic groups, to ensure that there's no bias in our models and to create a level of transparency that is very helpful. And soon we'll be providing guidance to our customers on how they can measure the Face API's performance relative to their own data, to set the right thresholds and balance false matches. So that's one example. Then on the other side are the runtime considerations and the ethical use of AI. And I think we all have to realize that sometimes, even with all the good intentions during design time, if you don't have safeguards at runtime protecting privacy and our democratic freedoms, for example, there could be really bad unintended consequences. For the past two years, we've been focused on developing and implementing strong principles that govern our use of facial recognition, and we've also been calling for strong government regulation since 2018. We have published principles that we use to define which projects, for example, we'll engage on.
And there are many projects we say no to, which fall outside these guidelines. We do not sell our facial recognition technology to police departments in the United States today. And we have made the commitment that we will not sell this technology to US police departments until there is a strong national law grounded in human rights. We need to use this moment to advocate for a strong national law; otherwise we'll see responsible companies leaving the market and others stepping in. So we think that going forward, companies like ours need to do our best work around the practice of responsible AI by building it into our engineering process. And when it comes to usage at runtime, we have both principles we have to adhere to and government regulation around the world.
Harry: Right, that's actually a great way to frame it. There's design time and there's runtime, and at both we have to really think about those important ethical and responsibility problems for AI. And of course computer vision is a very important part of that. So Satya, my last question for you: as the CEO of the number-one company in this universe, every day you must be thinking about all those big opportunities and how you can help more people. So, tell us which industry you are most excited about right now for applying cloud, AI, and computer vision.
Satya: I mean, it's a great question, Harry, because in some sense one of the things I think a lot about, and we at Microsoft think a lot about, is how digital technology in this next phase can have much broader impact. If you think about it, the last 10, 15 years have been phenomenal. But I would claim that some of the usage scenarios have been narrower. There have been a lot more consumer Internet breakthroughs. But if you look at the broad arc of productivity, and productivity leading to economic growth that is helping small businesses, large businesses, every sector of the economy, in every part of the world, we've not really achieved even some of the growth rates that were there in the 90s and the early 2000s, thanks to the PC. In fact, if you go to Robert Gordon's critique of productivity, in the United States in particular, he sort of points out that between 1870 and 1940 there was amazing progress, and he will even claim that information technology, especially the PC, led to productivity growth in the 90s and the early 2000s. But since then, we've not had great productivity gains. Why is that? Some of it could be statistics and how we measure it. But that said, I hope that in this next phase, thanks to AI, thanks to cloud and technologies like computer vision, we can see broad sectoral growth. I'm excited about, for example, healthcare. After all, in the US 19% of our GDP is in healthcare. So can we really have breakthroughs in precision medicine, where we are able to take clinical data and the molecular profile and then really make breakthroughs in how people are treated and how care is administered? And autonomy, after all, in a remote-everything world, whether it's inside-out or outside-in. That video we saw of Reality-as-a-Service is what I described as the inside-out economy, where people are moving, things are moving, and someone is observing and helping to keep things safe.
Or if it's an autonomous object that is moving in the real world. I think autonomy will change transportation as well as operational safety in many, many locations. Retail and commerce, we know, are going to change significantly too. There it's going to be omni-channel, right? It's offline and online. In fact, COVID-19 has even brought forth some of these solutions, like contactless shopping and curbside pickup. These, I think, will have huge ramifications for how retail is done. Precision agriculture for food security is another huge area. The other area I'm very excited about, even with computer vision, is accessibility. We have a billion people in the world who still don't participate in our society and economy because of their accessibility needs. And I think that we now have the technologies, whether it's machine reading and comprehension technologies for people with dyslexia, so that they can read, or Seeing AI-like tools that we built, which use the latest breakthroughs in computer vision so that someone with a visual impairment is able to interpret the world, or eye gaze, so that someone with ALS is able to type and communicate with just the gaze of their eyes. So I hope we will have real breakthroughs in AI that bring newer and greater accessibility technology, empowering the billion-plus people in the world who need to participate in our societies and economies. So I'm excited about all of these.
Harry: Indeed, it's really, really exciting. And what I picked up as the core of your thesis here is productivity. It's productivity for everyone and everywhere.
Satya: Yeah, that is correct.
Harry: Satya, my real last question for you, the one that I have to ask you for this audience: is Microsoft going to hire more computer vision people?
Satya: We are always there to hire more computer vision people and, quite frankly, to partner with all the computer vision people out there in this community. Again, thank you so much, Harry, for the opportunity. I think that this is a very important community. I'm so glad that, even with all the constraints, this community is getting together to talk about the advances. And as you rightfully said, to me what matters the most is not just our technology and technology breakthroughs for their own sake. I know what motivates everyone at this conference is how this technology is leading to economic growth that is more inclusive, more equitable, and really helping everyone on the planet get better at achieving their dreams. That, I think, is what each of us cares deeply about, and it's fantastic to have this opportunity to talk to you all.
Harry: Thank you very much, Satya, it's so wonderful. I really appreciate your taking the time to join us, and thanks to everyone online for tuning in. We'll have a wonderful future with computer vision. Thank you!
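This article is reprinted with permission from the WeChat official account "微软研究院AI头条" (Microsoft Research AI Headlines); the author is Microsoft Research Asia. Thanks to the original author for sharing.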