[…]'s data without privacy consideration and other owners' data with differential privacy guarantees. This setting was initiated in [Jain et al., 2021] with a focus on linear regression. In this paper, we study this setting for stochastic convex optimization (SCO). We present an algorithm that is a variant of DP-SGD [Song et al., 2013; Abadi et al., 2016] and provide theoretical bounds on its population loss. We compare our algorithm to several baselines and discuss the parameter regimes in which our algorithm is preferable. We also empirically study joint differential privacy on multi-class classification over two public datasets. Our empirical findings align well with the insights from our theoretical results.

Learning across Data Owners with Joint Differential Privacy. Yangsibo Huang, Haotian Jiang, Daogao Liu, Mohammad Mahdian, Jieming Mao, Vahab Mirrokni. CoRR abs/2305.15723, 2023. https://arxiv.org/abs/2305.15723

Federated learning allows distributed users to collaboratively train a model while keeping each user's data private.
Recently, a growing body of work has demonstrated that an eavesdropping attacker can effectively recover image data from gradients transmitted during federated learning. However, little progress has been made in recovering text data. In this paper, we present FILM, a novel attack for federated learning of language models (LMs). For the first time, we show the feasibility of recovering text from large batches of up to 128 sentences. Unlike image-recovery methods, which are optimized to match gradients, we take a distinct approach that first identifies a set of words from the gradients and then directly reconstructs sentences using beam search and a prior-based reordering strategy. We conduct the FILM attack on several large-scale datasets and show that it can successfully reconstruct single sentences with high fidelity for large batch sizes, and even multiple sentences if applied iteratively. We evaluate three defense methods: gradient pruning, DP-SGD, and a simple approach we propose that freezes word embeddings. We show that both gradient pruning and DP-SGD lead to a significant drop in utility. However, if we fine-tune a public pre-trained LM on private text without updating word embeddings, it effectively defends against the attack with minimal loss of data utility. Together, we hope that our results encourage the community to rethink the privacy concerns of LM training and its standard practices in the future.
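The first stage of the attack, identifying a set of words from the gradients, rests on a simple observation: in a model with a word-embedding layer, the gradient of the embedding matrix is nonzero only in the rows of tokens that occur in the batch, so an eavesdropper who sees the gradient can read off which words were used. A minimal sketch of this observation with a toy model (the loss, shapes, and names here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 50, 8                      # vocabulary size, embedding dimension
E = rng.normal(size=(V, d))       # embedding matrix (model weights)

batch = [3, 17, 17, 42]           # token ids in the private batch
h = E[batch].mean(axis=0)         # toy forward pass: mean-pooled embeddings

# Toy loss L = ||h||^2. Its gradient w.r.t. E is nonzero only in the
# rows of tokens that actually occur in the batch.
grad_E = np.zeros_like(E)
for t in batch:
    grad_E[t] += 2.0 * h / len(batch)

# The eavesdropper recovers the bag of words from the nonzero rows.
recovered = sorted(int(i) for i in np.flatnonzero(np.abs(grad_E).sum(axis=1) > 0))
print(recovered)  # → [3, 17, 42]
```

Freezing the embedding layer (the defense evaluated above) removes exactly this signal, since the embedding gradient is then never computed or transmitted.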
","authors":[{"id":"63ab9cb7cb0eafdb3179bc2b","name":"Samyak Gupta","org":"Princeton University","orgid":"5f71b2831c455f439fe3c663","orgs":["Princeton University"]},{"id":"617855dd60a9657359707126","name":"Yangsibo Huang","org":"Princeton University","orgid":"5f71b2831c455f439fe3c663","orgs":["Princeton University"]},{"id":"54590e85dabfaeb0fe2cfa71","name":"Zexuan Zhong","org":"Princeton University","orgid":"5f71b2831c455f439fe3c663","orgs":["Princeton University"]},{"id":"542fc54bdabfae0c059c4744","name":"Tianyu Gao","org":"Princeton University","orgid":"5f71b2831c455f439fe3c663","orgs":["Princeton University"]},{"id":"6142e5629e795e72b1876f4a","name":"Kai Li"},{"id":"562ce7f545cedb3398cfa6d0","name":"Danqi Chen","org":"Department of Computer Science, Princeton University","orgs":["Department of Computer Science, Princeton University"]}],"citations":{"google_citation":1,"last_citation":1},"create_time":"2022-05-18T13:47:04.687Z","hashs":{"h1":"rptfl","h3":"lm"},"id":"628464665aee126c0facb2e2","keywords":["Federated learning","Privacy","Natural Language Processing"],"lang":"en","num_citation":18,"pdf":"https:\u002F\u002Fcz5waila03cyo0tux1owpyofgoryroob.aminer.cn\u002F4B\u002F9F\u002F5D\u002F4B9F5DFE307A2ECB9D33D0CF73D7DCDF.pdf","pdf_src":["https:\u002F\u002Farxiv.org\u002Fpdf\u002F2205.08514","https:\u002F\u002Fapi.openreview.net\u002Fpdf\u002F0dd93f164106bc32859fb8d3252a26104b52c600.pdf"],"title":"Recovering Private Text in Federated Learning of Language Models","update_times":{"u_c_t":"2023-10-24T05:50:43.059Z","u_v_t":"2023-04-12T18:42:54.829Z"},"urls":["db\u002Fconf\u002Fnips\u002Fneurips2022.html#GuptaHZGLC22","http:\u002F\u002Fpapers.nips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Fhash\u002F35b5c175e139bff5f22a5361270fce87-Abstract-Conference.html","https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.08514","https:\u002F\u002Fopenreview.net\u002Fforum?id=dqgzfhHd2-"],"venue":{"info":{"name":"NeurIPS 
2022"}},"venue_hhb_id":"5ea1e340edb6e7d53c011a4c","versions":[{"id":"628464665aee126c0facb2e2","sid":"2205.08514","src":"arxiv","year":2022},{"id":"63a413f690e50fcafd6d19d9","sid":"neurips2022#97020","src":"conf_neurips","year":2022},{"id":"6479e3acd68f896efa4e54c5","sid":"conf\u002Fnips\u002FGuptaHZGLC22","src":"dblp","year":2022}],"year":2022},{"abstract":"Dataset auditing for machine learning (ML) models is a method to evaluate if a given dataset is used in training a model. In a Federated Learning setting where multiple institutions collaboratively train a model with their decentralized private datasets, dataset auditing can facilitate the enforcement of regulations, which provide rules for preserving privacy, but also allow users to revoke authorizations and remove their data from collaboratively trained models. This paper first proposes a set of requirements for a practical dataset auditing method, and then present a novel dataset auditing method called Ensembled Membership Auditing (EMA). Its key idea is to leverage previously proposed Membership Inference Attack methods and to aggregate data-wise membership scores using statistic testing to audit a dataset for a ML model. We have experimentally evaluated the proposed approach with benchmark datasets, as well as 4 X-ray datasets (CBIS-DDSM, COVIDx, Child-XRay, and CXR-NIH) and 3 dermatology datasets (DERM7pt, HAM10000, and PAD-UFES-20). Our results show that EMA meet the requirements substantially better than the previous state-of-the-art method. 
Our code is at: https://github.com/Hazelsuko07/EMA.

A Dataset Auditing Method for Collaboratively Trained Machine Learning Models. Yangsibo Huang, Chun-Yin Huang, Xiaoxiao Li, Kai Li. IEEE Transactions on Medical Imaging, 2022. https://doi.org/10.1109/TMI.2022.3220706