Actions Speak Louder than Words: Entity-Sensitive Privacy Policy and Data Flow Analysis with PoliCheck

USENIX Security Symposium, pp. 985-1002, 2020.


Abstract:

Identifying privacy-sensitive data leaks by mobile applications has been a topic of great research interest for the past decade. Technically, such data flows are not “leaks” if they are disclosed in a privacy policy. To address this ...

Introduction
  • Privacy is a long-standing open research challenge for mobile applications.
  • Empirical studies [16, 18, 23, 24, 25, 27] have demonstrated pervasive and continual disclosure of privacy-sensitive information such as device identifiers and geographic location.
  • In the case of mobile applications, data collection and sharing are often considered acceptable if they are disclosed in the application's privacy policy.
  • While there have been several manual analyses of application privacy policies [9, 20], it is hard to reason computationally about what privacy policies say and how applications adhere to them.
Highlights
  • We propose POLICHECK, which provides an entity-sensitive flow-to-policy consistency model to determine if an application’s privacy policy discloses relevant data flows
  • We model an application a as a tuple, a = (F, P), where F is a set of data flows observed for the application and P is a set of sharing and collection policy statements extracted from the application’s privacy policy
  • Validation Methodology: For validation, one of three authors began by reading through the sentences that were extracted from each privacy policy to ensure that the policy statements were extracted correctly.
  • Several efforts have sought to more fully automate the detection of privacy leaks by contrasting data flows with the application’s privacy policy
  • We proposed POLICHECK and an entity-sensitive flow-to-policy consistency model
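The application model a = (F, P) from the highlights can be illustrated with a minimal sketch. It is not POLICHECK's implementation: the class names, the field layout, and the three labels ("disclosed", "denied", "undisclosed") are assumptions made for illustration, matching is exact string equality only, and the semantic granularities and contradictions that the paper accounts for are ignored. The `entity_sensitive` switch shows how dropping the entity check can change the verdict for the same flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    entity: str      # who receives the data (hypothetical label, e.g. "unity")
    data_type: str   # what is sent (e.g. "advertising id")


@dataclass(frozen=True)
class PolicyStatement:
    entity: str      # who the policy says collects or receives the data
    collects: bool   # True for "collects/shares", False for "does not collect"
    data_type: str


@dataclass(frozen=True)
class App:
    flows: frozenset    # F: observed data flows
    policy: frozenset   # P: extracted policy statements


def classify_flow(flow, policy, entity_sensitive=True):
    """Label one flow against the policy using exact matching only."""
    def matches(stmt):
        same_data = stmt.data_type == flow.data_type
        same_entity = stmt.entity == flow.entity
        # An entity-insensitive model ignores who receives the data.
        return same_data and (same_entity or not entity_sensitive)

    relevant = [s for s in policy if matches(s)]
    if any(s.collects for s in relevant):
        return "disclosed"
    if relevant:
        return "denied"       # only negative statements mention this flow
    return "undisclosed"      # no statement mentions this flow


# The policy only says the first party ("we") collects the advertising id,
# while the observed flow sends it to a third party.
app = App(
    flows=frozenset({DataFlow("unity", "advertising id")}),
    policy=frozenset({PolicyStatement("we", True, "advertising id")}),
)
flow = next(iter(app.flows))
print(classify_flow(flow, app.policy))
print(classify_flow(flow, app.policy, entity_sensitive=False))
```

With the entity check enabled the flow is reported as undisclosed; with it disabled the same flow is accepted as disclosed, which is the kind of misclassification an entity-insensitive model can make.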
Methods
  • The core contribution of this paper is the formalization and enhancement of flow-to-policy consistency analysis with the knowledge of which entities collect information.
  • Data Flow Extraction: AppCensus [6] identifies privacy-sensitive data flows in Android apps using the approach proposed by Reyes et al. [27].
  • Reyes et al. instrument the Android operating system to log access to sensitive resources and use the Android VPN API to intercept and log network traffic.
  • They exercise the application with Monkey [5] and collect both the system and network logs.
  • Domain-to-Entity Mapping: While data flows are represented as a type of data being transmitted to a domain or IP address, privacy policies discuss data flows using terms for entities instead of domains (e.g., cdp.cloud.unity3d.com
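The domain-to-entity step can be pictured as a suffix lookup. The sketch below is an assumption-laden illustration rather than the paper's mapping procedure: the table `DOMAIN_TO_ENTITY`, its entries, and the function name `entity_for_domain` are hypothetical, and a real mapping would need a curated list of domains.

```python
# Hypothetical lookup table; the entries are illustrative only.
DOMAIN_TO_ENTITY = {
    "unity3d.com": "Unity",
    "example.org": "Example Org",
}


def entity_for_domain(domain, table=DOMAIN_TO_ENTITY):
    """Return the entity that owns a host name, or None if it is unknown."""
    labels = domain.lower().rstrip(".").split(".")
    # Try the full host name first, then progressively shorter suffixes.
    for start in range(len(labels)):
        suffix = ".".join(labels[start:])
        if suffix in table:
            return table[suffix]
    return None


print(entity_for_domain("cdp.cloud.unity3d.com"))
print(entity_for_domain("tracker.unknown.net"))
```

Matching on suffixes lets one table entry cover every subdomain of an organization, at the cost of needing separate handling for shared hosting or CDN domains.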
Results
  • The authors present the evaluation of POLICHECK and additional findings from the evaluation.
  • The authors read through the rest of the policy to determine if any statements disclose the data flow.
  • If it is not apparent and there is any uncertainty, the authors mark the flow as “uncertain” to avoid bias.
  • Note that the authors marked 27 flows as uncertain, resulting in 153 data flows across 151 applications.
  • POLICHECK achieves an overall 90.8% precision (139/153) for performing flow-to-policy consistency analysis.
  • POLICHECK had 35 true positives and 3 false positives.
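The precision figure above is plain arithmetic over the validated flows. The short sketch below recomputes it; the helper name `precision` is an assumption, and the second value is derived here from the 35/3 counts rather than quoted from the paper.

```python
def precision(true_positives, false_positives):
    """Fraction of reported results that are correct."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("precision is undefined with no reported results")
    return true_positives / total


# 139 correct classifications out of 153 validated flows.
overall = precision(139, 153 - 139)
# 35 true positives and 3 false positives.
subset = precision(35, 3)

print(f"overall: {overall:.1%}")
print(f"35 TP / 3 FP: {subset:.1%}")
```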
Conclusion
  • Privacy threats from mobile applications are arguably a greater risk than malware for most smartphone users.
  • Several efforts have sought to more fully automate the detection of privacy leaks by contrasting data flows with the application’s privacy policy.
  • These works have a fundamental limitation: they do not consider the entity receiving the data.
  • POLICHECK provides the highest-precision method to date to determine if apps properly disclose their privacy-sensitive behaviors
Summary
  • Objectives:

    While dynamic analysis may under-approximate data flows if sufficient code coverage is not achieved during testing, the goal was to optimize for precision over recall.
  • Based on the output of the entity-insensitive consistency analysis, the authors aim to measure the potential error rate
Tables
  • Table 1: Types of conflicting policy statements in a privacy policy: narrowing definitions (N1–4) and logical contradictions from PolicyLint [4], and flow-sensitive contradictions
  • Table 2: Data types tracked via dynamic analysis
  • Table 3: Data Flows and Apps for each Disclosure Type
  • Table 4: Sensitivity Analysis of Flow-to-Policy Consistency: Entity-insensitive models frequently misclassify data flows
Related work
  • In recent years, there has been an increased focus on analyzing flow-to-policy inconsistencies in mobile applications. The works differ in how app behavioral flows and privacy policies are analyzed. While much of the prior work [29, 34, 38] uses Android's application program interface (API) calls to evaluate privacy breaches, Wang et al. [32] extended the taint sources to include sensitive data entered through an app's UI. For policy analysis, Zimmeck et al. [38] and Yu et al. [34] rely on keyword-based approaches, using bi-grams and verb modifiers, respectively, to infer the privacy policies, while Slavin et al. [29] and Wang et al. [32] use crowdsourced ontologies for policy analysis. POLICHECK makes a significant advance over all these prior works by considering the DNS domains of the data-receiving entity for comprehensive entity-sensitive analysis. Accuracy of the analysis is further improved by considering entities and statement sentiment, and by accounting for different semantic granularities and internal contradictions. Our empirical results (Section 5) further demonstrate the effectiveness of these capabilities.
Funding
  • This work is supported in part by NSF grant CNS-1513690
  • Any findings and opinions expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies
References
  • [1] California Consumer Privacy Act (CCPA). https://oag.ca.gov/privacy/ccpa.
  • [2] Children’s Online Privacy Protection Rule. https://www.ftc.gov/enforcement/rules/rulemaking-regulatory-reform-proceedings/childrens-online-privacy-protection-rule.
  • [3] The EU General Data Protection Regulation. https://eugdpr.org.
  • [4] Benjamin Andow, Samin Yaseer Mahmud, Wenyu Wang, Justin Whitaker, William Enck, Bradley Reaves, Kapil Singh, and Tao Xie. PolicyLint: Investigating Internal Privacy Policy Contradictions on Google Play. In Proceedings of the USENIX Security Symposium, August 2019.
  • [5] Android Studio. UI/Application Exerciser Monkey. https://developer.android.com/studio/test/monkey.html, 2019. Accessed: May 15, 2019.
  • [6] AppCensus AppSearch. appcensus.io/.
  • [7] Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel. FlowDroid: Precise Context, Flow, Field, Object-sensitive and Lifecycle-aware Taint Analysis for Android Apps. In Proceedings of the ACM Conference on Programming Language Design and Implementation (PLDI), 2014.
  • [8] David Barrera, H. Günes Kayacik, Paul C. van Oorschot, and Anil Somayaji. A Methodology for Empirical Analysis of Permission-based Security Models and Its Application to Android. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), October 2010.
  • [9] J. Bowers, B. Reaves, I. Sherman, P. Traynor, and K. Butler. Regulators, Mount Up! Analysis of Privacy Policies for Mobile Money Applications. In Proceedings of the USENIX Symposium on Usable Privacy and Security (SOUPS), 2017.
  • [10] Manuel Egele, Christopher Kruegel, Engin Kirda, and Giovanni Vigna. PiOS: Detecting Privacy Leaks in iOS Applications. In Proceedings of the ISOC Network and Distributed System Security Symposium (NDSS), February 2011.
  • [11] William Enck, Peter Gilbert, Byung-Gon Chun, Landon P. Cox, Jaeyeon Jung, Patrick McDaniel, and Anmol N. Sheth. TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), October 2010.
  • [12] William Enck, Damien Octeau, Patrick McDaniel, and Swarat Chaudhuri. A Study of Android Application Security. In Proceedings of the USENIX Security Symposium, August 2011.
  • [13] Adrienne Porter Felt, Erika Chin, Steve Hanna, Dawn Song, and David Wagner. Android Permissions Demystified. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), October 2011.
  • [14] Fengguo Wei, Sankardas Roy, Xinming Ou, and Robby. Amandroid: A Precise and General Inter-component Data Flow Analysis Framework for Security Vetting of Android Apps. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), November 2014.
  • [15] In the Matter of Goldenshores Technologies, LLC, and Erik M. Geidl. https://www.ftc.gov/enforcement/cases-proceedings/132-3087/goldenshores-technologies-llc-erik-m-geidl-matter.
  • [16] Michael Grace, Wu Zhou, Xuxian Jiang, and Ahmad-Reza Sadeghi. Unsafe Exposure Analysis of Mobile In-App Advertisements. In Proceedings of the ACM Conference on Security and Privacy in Wireless and Mobile Networks (WiSec), 2012.
  • [17] Catherine Han, Irwin Reyes, Amit Elazari Bar On, Joel Reardon, Álvaro Feal, Kenneth A. Bamberger, Serge Egelman, and Narseo Vallina-Rodriguez. Do You Get What You Pay For? Comparing The Privacy Behaviors of Free vs. Paid Apps. In Workshop on Technology and Consumer Protection (ConPro), May 2019.
  • [18] Jin Han, Qiang Yan, Debin Gao, Jianying Zhou, and Robert Deng. Comparing Mobile Privacy Protection through Cross-Platform Applications. In Proceedings of the ISOC Network and Distributed Systems Symposium (NDSS), February 2013.
  • [19] Hamza Harkous, Kassem Fawaz, Rémi Lebret, Florian Schaub, Kang G. Shin, and Karl Aberer. Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning. In Proceedings of the USENIX Security Symposium, 2018.
  • [20] J. Bowers, I. Sherman, K. Butler, and P. Traynor. Characterizing Security and Privacy Practices in Emerging Digital Credit Applications. In Proceedings of the ACM Conference on Security and Privacy in Wireless and Mobile Networks (WiSec), 2019.
  • [21] Ehimare Okoyomon, Nikita Samarin, Primal Wijesekera, Amit Elazari Bar On, Narseo Vallina-Rodriguez, Irwin Reyes, Álvaro Feal, and Serge Egelman. On The Ridiculousness of Notice and Consent: Contradictions in App Privacy Policies. In Workshop on Technology and Consumer Protection (ConPro), May 2019.
  • [22] Hao Peng, Chris Gates, Bhaskar Sarma, Ninghui Li, Alan Qi, Rahul Potharaju, Cristina Nita-Rotaru, and Ian Molloy. Using Probabilistic Generative Models for Ranking Risks of Android Apps. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), October 2012.
  • [23] Abbas Razaghpanah, Rishab Nithyanand, Narseo Vallina-Rodriguez, Srikanth Sundaresan, Mark Allman, Christian Kreibich, and Phillipa Gill. Apps, Trackers, Privacy, and Regulators: A Global Study of the Mobile Tracking Ecosystem. In Proceedings of the Network and Distributed System Security Symposium (NDSS), 2018.
  • [24] Joel Reardon, Álvaro Feal, Primal Wijesekera, Amit Elazari Bar On, Narseo Vallina-Rodriguez, and Serge Egelman. 50 Ways to Leak Your Data: An Exploration of Apps’ Circumvention of the Android Permission System. In Proceedings of the USENIX Security Symposium, 2019.
  • [25] Jingjing Ren, Ashwin Rao, Martina Lindorfer, Arnaud Legout, and David R. Choffnes. ReCon: Revealing and Controlling Privacy Leaks in Mobile Network Traffic. In Proceedings of the ACM SIGMOBILE MobiSys, pages 361–374, 2016.
  • [26] Irwin Reyes, Primal Wijesekera, Abbas Razaghpanah, Joel Reardon, Narseo Vallina-Rodriguez, Serge Egelman, and Christian Kreibich. “Is Our Children’s Apps Learning?” Automatically Detecting COPPA Violations. In Workshop on Technology and Consumer Protection (ConPro), May 2017.
  • [27] Irwin Reyes, Primal Wijesekera, Joel Reardon, Amit Elazari Bar On, Abbas Razaghpanah, Narseo Vallina-Rodriguez, and Serge Egelman. “Won’t Somebody Think of the Children?” Examining COPPA Compliance at Scale. In Proceedings on Privacy Enhancing Technologies (PETS), July 2018.
  • [28] Sanae Rosen, Zhiyun Qian, and Z. Morley Mao. AppProfiler: A Flexible Method of Exposing Privacy-related Behavior in Android Applications to End Users. In Proceedings of the ACM Conference on Data and Application Security and Privacy (CODASPY), February 2013.
  • [29] Rocky Slavin, Xiaoyin Wang, Mitra Bokaei Hosseini, James Hester, Ram Krishnan, Jaspreet Bhatia, Travis D. Breaux, and Jianwei Niu. Toward a Framework for Detecting Privacy Policy Violations in Android Application Code. In Proceedings of the International Conference on Software Engineering (ICSE), 2016.
  • [30] https://www.ftc.gov/enforcement/cases-proceedings/132-3078/snapchat-inc-matter.
  • [31] John W. Stamey and Ryan A. Rossi. Automatically Identifying Relations in Privacy Policies. In Proceedings of the ACM International Conference on Design of Communication (SIGDOC), 2009.
  • [32] Xiaoyin Wang, Xue Qin, Mitra Bokaei Hosseini, Rocky Slavin, Travis D. Breaux, and Jianwei Niu. GUILeak: Tracing Privacy Policy Claims on User Input Data for Android Applications. In Proceedings of the International Conference on Software Engineering (ICSE), 2018.
  • [33] Primal Wijesekera, Arjun Baokar, Ashkan Hosseini, Serge Egelman, David Wagner, and Konstantin Beznosov. Android Permissions Remystified: A Field Study on Contextual Integrity. In Proceedings of the USENIX Security Symposium, August 2015.
  • [34] Le Yu, Xiapu Luo, Xule Liu, and Tao Zhang. Can We Trust the Privacy Policies of Android Apps? In Proceedings of the IEEE/IFIP Conference on Dependable Systems and Networks (DSN), 2016.
  • [35] Razieh Nokhbeh Zaeem, Rachel L. German, and K. Suzanne Barber. PrivacyCheck: Automatic Summarization of Privacy Policies Using Data Mining. ACM Transactions on Internet Technology (TOIT), 2013.
  • [36] Yuan Zhang, Min Yang, Bingquan Xu, Zhemin Yang, Guofei Gu, Peng Ning, X. Sean Wang, and Binyu Zang. Vetting Undesirable Behaviors in Android Apps with Permission Use Analysis. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), November 2013.
  • [37] Sebastian Zimmeck and Steven M. Bellovin. Privee: An Architecture for Automatically Analyzing Web Privacy Policies. In Proceedings of the USENIX Security Symposium, 2014.
  • [38] Sebastian Zimmeck, Ziqi Wang, Lieyong Zou, Roger Iyengar, Bin Liu, Florian Schaub, Shomir Wilson, Norman Sadeh, Steven M. Bellovin, and Joel Reidenberg. Automated Analysis of Privacy Requirements for Mobile Apps. In Proceedings of the ISOC Network and Distributed System Security Symposium (NDSS), 2017.