Research Methodology, Pre-Installed Android App Analysis

Explainer
Android-Privacy is not a luxury

Abstract

Over the past few years, smartphones have become incredibly inexpensive, connecting millions of people to the internet for the first time. While growing connectivity is undeniably positive, some device vendors have recently come under scrutiny for harvesting user data and for invasive data collection practices.

Because the Android operating system is open source, vendors can add pre-installed apps (often called “bundled apps” or “bloatware”) to mobile phones. In 2019, the first large-scale study of pre-installed software on Android devices, covering more than 200 vendors, showed that many of these apps come with security vulnerabilities, facilitate potentially harmful behaviours, and provide backdoored access to sensitive data and services without user consent or awareness. The study concluded that the supply chain around Android's open-source model lacks transparency.

Many pre-installed apps are not available on the Google Play Store and therefore often do not receive the same level of audit. They also come with pre-accepted permissions (e.g. a custom permission used to access users’ location data outside of the location services subsystem), which means that users may be unaware of the data that these apps collect. Finally, many of these apps send data to third parties, including large, well-known companies such as Google, Facebook, Tencent and Baidu as well as lesser-known companies in the advertising and tracking ecosystem.

Objective

The objective of this ongoing research is to build on existing research and illustrate what the lack of transparency around the Android supply chain looks like in practice for a select number of low-end phones. Our goal is to draw further attention to the issue globally and to showcase how privacy threats often disproportionately affect those with limited resources. The ultimate goal is to put pressure on Google and telecommunications companies to better protect the privacy and security of their users, especially those in emerging markets and with limited resources.

Acquisition of data

We used the Android Debug Bridge (ADB) to copy every file in the APK (Android application package) format from /vendor, /system/persistent, /system/priv-app/ and /system/app.
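The extraction step can be sketched in Python as follows. This is a minimal illustration rather than the tooling actually used: the helper names, the local `extracted` folder and the use of `find` on the device are all assumptions, and it presumes `adb` is on the PATH with a single device attached.

```python
import os
import posixpath
import subprocess

# Directories named in the text above.
SYSTEM_DIRS = ["/vendor", "/system/persistent", "/system/priv-app", "/system/app"]


def select_apks(listing):
    """Return the sorted, de-duplicated APK paths from a newline-separated file listing."""
    paths = {line.strip() for line in listing.splitlines()}
    return sorted(p for p in paths if p.lower().endswith(".apk"))


def pull_command(remote_path, local_root="extracted"):
    """Build an `adb pull` command that mirrors the remote layout under local_root."""
    local_path = posixpath.join(local_root, remote_path.lstrip("/"))
    return ["adb", "pull", remote_path, local_path]


def list_remote_files(remote_dir):
    """List files under remote_dir on the device (assumes `find` exists on the device)."""
    result = subprocess.run(
        ["adb", "shell", "find", remote_dir, "-type", "f"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def extract_all(local_root="extracted"):
    """Copy every APK found under SYSTEM_DIRS to the local machine."""
    for remote_dir in SYSTEM_DIRS:
        for apk in select_apks(list_remote_files(remote_dir)):
            cmd = pull_command(apk, local_root)
            # Create the destination folder before pulling into it.
            os.makedirs(os.path.dirname(cmd[-1]), exist_ok=True)
            subprocess.run(cmd, check=False)
```

Calling `extract_all()` with a device attached would mirror the APK files into `extracted/`, keeping the original directory structure so that each file's origin partition stays visible.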

We also established whether /data/app was populated. Finally, we extracted the certificate files in /system/etc/security/cacerts/ for later analysis. Once we had extracted the APK files, we ran a minimisation process intended to remove apps that exist in all versions of Android (e.g. the phone dialer, the settings app and the clock). We cannot tell whether a vendor has modified these apps, but they are outside the scope of this research.
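One way to picture the minimisation process is as a filter against a baseline of stock Android packages. The sketch below is an assumption about how such a filter could look, not a description of the process actually run: the `minimise` helper is hypothetical, and the three baseline package names are only an illustrative sample of stock apps.

```python
# Illustrative sample of stock package names (assumption: a real baseline
# would be built from a reference Android image and would be much longer).
BASELINE_PACKAGES = {
    "com.android.dialer",
    "com.android.settings",
    "com.android.deskclock",
}


def minimise(extracted, baseline=BASELINE_PACKAGES):
    """Drop apps whose package name appears in the baseline.

    `extracted` maps a package name to the path of its APK file.
    """
    return {pkg: path for pkg, path in extracted.items() if pkg not in baseline}


extracted = {
    "com.android.settings": "extracted/system/priv-app/Settings/Settings.apk",
    "com.example.vendor.weather": "extracted/system/app/Weather/Weather.apk",
}
remaining = minimise(extracted)
```

Matching on package name alone cannot detect a stock app that a vendor has modified, which is the limitation the text above acknowledges.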

Analysis

Extracted APK files were run through the following two processes:

• A static analysis using exodus-standalone by Exodus Privacy, which lists each app’s permissions, signing certificate and integrated trackers
• If needed, a dynamic analysis of selected apps using the methodology and setup we previously used for our app analysis

Exodus, from Exodus Privacy, is a tool designed to detect the presence of known trackers in an application without running it. An online version of the tool allows anyone to analyse applications on the Google Play Store. Exodus-Privacy/exodus-standalone is designed for app developers to test an application before its publication on Google Play, and is licensed under the GNU Affero General Public License v3.0.

Using Exodus-Privacy/exodus-standalone, we conducted a static analysis of each app’s install package (APK). Android applications are developed in JVM-compatible languages such as Java and Kotlin. As a result, class names are readable directly in the program's binary file without requiring decompilation. Exodus analyses APK files to find trackers, which can be identified by their namespace. For example, the root namespace (package) of the Amazon Ads library is com.amazon.device.ads. The library shares the com.amazon prefix with other Amazon libraries, so only the longer, more specific prefix identifies it. At this point, then, a tracker signature is a namespace (a package name).
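The namespace matching described above can be illustrated with a short Python sketch. It is not the exodus-standalone implementation: the function names and the one-entry signature table are assumptions made for illustration, and class names are assumed to be already available in dotted form.

```python
# Hypothetical signature table: tracker label -> root namespace.
TRACKER_SIGNATURES = {
    "Amazon Ads": "com.amazon.device.ads",
}


def in_namespace(class_name, namespace):
    """True if class_name is the namespace itself or lives beneath it."""
    return class_name == namespace or class_name.startswith(namespace + ".")


def find_trackers(class_names, signatures=TRACKER_SIGNATURES):
    """Return the sorted labels of all trackers whose namespace contains a class."""
    found = set()
    for label, namespace in signatures.items():
        if any(in_namespace(name, namespace) for name in class_names):
            found.add(label)
    return sorted(found)


classes_with_ads = ["com.example.app.MainActivity", "com.amazon.device.ads.AdLayout"]
classes_without_ads = ["com.example.app.MainActivity", "com.amazon.identity.Auth"]
```

Matching on `namespace + "."` rather than on the bare prefix keeps an unrelated package such as `com.amazon.device.adsx` from being counted, and the second class list shows why the shorter `com.amazon` prefix would be too broad to serve as a signature.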

The analysis generates a report that describes the code signatures of trackers found in the application (i.e. pieces of software meant to collect data about you or your usage), as well as the permissions in the application (i.e. actions that the application can perform on your phone), ranked by severity as defined by Google's protection levels. Google classifies some permissions as dangerous, since they “cover areas where the app wants data or resources that involve the user's private information, or could potentially affect the user's stored data or the operation of other apps”.
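To make the permission part of such a report concrete, here is a minimal Python sketch that separates dangerous permissions from the rest. The `group_permissions` helper and the report layout are assumptions, and the set below is only a small sample of the permissions Android classifies as dangerous, not the full list.

```python
# Small sample of permissions with the "dangerous" protection level
# (assumption: a real report would use Android's complete list).
DANGEROUS_PERMISSIONS = {
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.READ_CONTACTS",
    "android.permission.CAMERA",
    "android.permission.RECORD_AUDIO",
}


def group_permissions(requested):
    """Split requested permissions into 'dangerous' and 'other', sorted and de-duplicated."""
    report = {"dangerous": [], "other": []}
    for permission in sorted(set(requested)):
        key = "dangerous" if permission in DANGEROUS_PERMISSIONS else "other"
        report[key].append(permission)
    return report


report = group_permissions([
    "android.permission.INTERNET",
    "android.permission.CAMERA",
    "com.vendor.permission.LOCATION",  # hypothetical custom vendor permission
    "android.permission.CAMERA",
])
```

Custom vendor permissions fall into the "other" group here because they are not part of Android's own list, so they need manual review rather than an automatic severity label.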

As with the apps, we excluded all certificates that are included by default in the Android operating system (and documented which ones were omitted) to establish whether the manufacturer and/or telcos had added any additional certificates.
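The certificate comparison can be sketched as a set difference on file fingerprints. This is an illustrative assumption about how the exclusion could be done: the helper names are hypothetical, and hashing the raw file bytes means a re-encoded copy of a stock certificate would still be reported as added.

```python
import hashlib


def fingerprint(data):
    """SHA-256 hex digest of a certificate file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def added_certificates(device_certs, baseline_certs):
    """Return the sorted names of device certificates absent from the baseline.

    Both arguments map a file name to the file's bytes, for example as read
    from a local copy of /system/etc/security/cacerts/.
    """
    known = {fingerprint(data) for data in baseline_certs.values()}
    return sorted(
        name for name, data in device_certs.items()
        if fingerprint(data) not in known
    )


# Placeholder byte strings stand in for real certificate files.
baseline = {"aaaa0001.0": b"stock certificate A", "aaaa0002.0": b"stock certificate B"}
device = {
    "aaaa0001.0": b"stock certificate A",
    "bbbb0001.0": b"vendor certificate X",
    "renamed.0": b"stock certificate B",
}
extra = added_certificates(device, baseline)
```

Comparing content rather than file names means a stock certificate stored under a different name is still recognised, while a new certificate is flagged whatever it is called.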

We will use Privacy International's data interception environment to carry out the dynamic analysis, in a similar vein to our previous Facebook AppData project. The methodology is described in that project's report.