Step by step governance of privacy | Android black magic

Background

The content of this article belongs to Bilibili (the "small broken station"). Please do not reprint it without authorization.

Recently, many companies have faced the same problem as us: cooperating with the cyberspace administration office on privacy compliance rectification. The main requirement is that sensitive APIs, such as those returning the IMEI, Android ID, IP address, or MAC address, must not be called before the user agrees to the privacy authorization.

An earlier article by another developer described how to decompile the APK with a Python script, search it for calls that need sensitive permissions, and then notify the callers to fix them.

Teach you how to efficiently check where sensitive permissions are used in APK and where a system method is called

However, that approach has a problem: the project keeps iterating, so the check has to be rerun every so often and the business teams reminded to make changes. That is too passive.

Our network library and the base tracking (event-reporting) components depend on unique identifiers, which may not be read before the privacy permission is granted, so our initialization tasks had become disordered. These base libraries are also provided to other Bilibili apps. So this work is partly privacy governance and partly a cleanup of our initialization tasks.

The scheme itself is fairly simple. We first abstract a privacy middleware: as long as the privacy permission has not been granted, every API call made through it returns an empty value.

Then we replace the direct API calls in business code with calls to the privacy middleware.
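As a rough illustration of the idea, a gate like the following could sit in front of every sensitive read. This is only a sketch in plain Kotlin: PrivacyGate, guarded, and the readFromSystem parameter are hypothetical names, not the actual middleware used in the project.

```kotlin
// Hypothetical sketch: PrivacyGate and its members are illustrative names,
// not the article's real PrivacyUtil implementation.
object PrivacyGate {
    // Set to true once the user accepts the privacy agreement.
    @Volatile
    var agreed: Boolean = false

    // Runs the sensitive read only after consent; otherwise returns the fallback
    // without invoking the read at all.
    fun <T> guarded(fallback: T, read: () -> T): T =
        if (agreed) read() else fallback
}

// Example wrapper: callers never touch the platform API directly.
// `readFromSystem` stands in for a real call such as TelephonyManager.getDeviceId().
fun getDeviceId(readFromSystem: () -> String?): String? =
    PrivacyGate.guarded(fallback = null, read = readFromSystem)
```

The point of passing the read as a lambda is that the platform API is never touched before consent, rather than being called and its result discarded.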

Demo project

Pipeline

Jenkins Pipeline (or simply "Pipeline") is a suite of plugins which supports implementing and integrating continuous delivery pipelines into Jenkins. In other words, Pipeline is an officially provided set of Jenkins plugins for building continuous delivery into Jenkins.

A pipeline is a user-defined process that lays out the steps of a CI/CD run, so that those steps run automatically instead of by hand.

On GitLab, the pipeline is defined in .gitlab-ci.yml.

The pipeline covers every step that runs after a branch is pushed to the remote. Only after all of these steps pass can the code be merged.

GitHub Actions

GitHub currently provides a very simple way to set up CI/CD. Interested readers can give it a try.

For an example run, see https://github.com/Leifzhang/AndroidLint/runs/3276855999?check_suite_focus=true

Static inspection

If you want to learn the basics of lint, please refer to my previous articles "Android custom lint development" and "Talk about Android Lint again".

Bilibili's code repository is basically a source-level monorepo: all source code lives together, which makes static code inspection convenient.

At the same time, because every code merge must first pass the static scanning pipeline, we can ensure that all code merged afterwards conforms to the rules, so the problem is prevented effectively and continuously.

This change involves a relatively large number of APIs, and each one needs a different prompt message, so writing a dedicated lint rule for each would be very troublesome. We therefore need a simpler, more extensible way to make these simple lint rules configurable.

This part is not our original technology. It builds on earlier work from Meituan and mihuyou, which we then iterated on and adapted. The mihuyou developer had also shared it on GitHub.

GitHub reference: AndroidLint

JSON format

First, let's look at this simple JSON definition, since the lint rule does its matching dynamically based on this JSON.

Because constructor calls and method calls are checked by two different kinds of lint code, we define two separate arrays here.

{
  "methods": [
    {
      "name_regex": "android.net.wifi.WifiManager.getSSID",
      "message": "This is a privacy API. Please use PrivacyUtil.getWifiName instead.",
      "excludes": [
        "com.xxxxxx.privacy.PrivacyImp"
      ]
    },
    {
      "name_regex": "android.net.wifi.WifiManager.getBSSID",
      "message": "This is a privacy API. Please use PrivacyUtil.getWifiName instead.",
      "excludes": [
        "com.xxxxx.privacy.PrivacyImp"
      ]
    },
    {
      "name_regex": "Settings.Secure.getString",
      "message": "This is a privacy API. Please use PrivacyUtil.getAndroidId instead.",
      "excludes": [
        "com.bilibili.privacy.PrivacyImp",
        "com.bilibili.adcommon.util.LocationUtil"
      ]
    },
    {
      "name_regex": "android.telephony.TelephonyManager.getImei",
      "message": "This is a privacy API. Please use PrivacyUtil.getDeviceId instead.",
      "excludes": [
        "com.xxxx.privacy.PrivacyImp"
      ]
    },
    {
      "name_regex": "android.telephony.TelephonyManager.getDeviceId",
      "message": "This is a privacy API. Please use PrivacyUtil.getDeviceId instead.",
      "excludes": [
        "com.xxxx.privacy.PrivacyImp"
      ]
    },
    {
      "name_regex": "java.net.getHostAddress",
      "message": "This is a privacy API. Please use PrivacyUtil.getIpAddress instead.",
      "excludes": [
        "com.xxxxx.privacy.PrivacyImp"
      ]
    },
    {
      "name_regex": "android.content.pm.PackageManager.getInstalledApplications",
      "message": "This is a privacy API. Please use PrivacyUtil.getPackageList instead.",
      "excludes": [
        "com.Xxxxxx.privacy.PrivacyImp"
      ]
    },
    {
      "name_regex": "android.content.pm.PackageManager.getInstalledPackages",
      "message": "This is a privacy API. Please use PrivacyUtil.getAppList instead.",
      "excludes": [
        "com.xxxxx.privacy.PrivacyImp"
      ]
    }
  ],
  "constructions": [
    {
      "name_regex": "",
      "message": ""
    }
  ]
}

The above is our current list of privacy-related rules in JSON, with the code of our own middleware excluded. We hope to move all privacy-related API calls over to the middleware in one pass.

Because this requirement is relatively simple, we define only two arrays: methods and constructions. name_regex is the matching rule, message is the prompt text, and excludes is the whitelist. Since our requirement is that everything calls the middleware we define, all known middleware classes are on the whitelist.
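The JSON maps naturally onto two small model classes. The sketch below reuses the names DynamicConfigEntity and DynamicEntity that appear in the article's code, but the field defaults and the activeMethodRules helper are assumptions added for illustration.

```kotlin
// Sketch of the config model. The class names and the `name_regex` / `methods`
// fields come from the article's code; defaults and the helper are assumptions.
data class DynamicEntity(
    val name_regex: String = "",
    val message: String = "",
    val excludes: List<String> = emptyList()
)

data class DynamicConfigEntity(
    val methods: List<DynamicEntity> = emptyList(),
    val constructions: List<DynamicEntity> = emptyList()
)

// Hypothetical helper: drops placeholder entries whose regex is empty,
// such as the single empty item in "constructions" above.
fun DynamicConfigEntity.activeMethodRules(): List<DynamicEntity> =
    methods.filter { it.name_regex.isNotEmpty() }
```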

Dynamically configurable Lint

The difficulty here is how to make lint code read our configured json file.

class DynamicLint : Detector(), Detector.UastScanner {

    lateinit var globalConfig: DynamicConfigEntity


    override fun beforeCheckRootProject(context: Context) {
        super.beforeCheckRootProject(context)
        globalConfig = GsonUtils.inflate(context.project.dir)
    }

}

The Detector provides a beforeCheckRootProject method, which receives a Context carrying the directory of the current project. We use this Context to locate and read our configurable JSON file.

There is one small detail: our project uses Gradle composite builds, and the Context normally carries only the module path, so a simple recursive search up the directory tree is needed.

private fun findCodeQuality(projectDir: File): File? {
     if (projectDir.parent != null) {
         val parent = projectDir.parentFile
         val file = parent.listFiles()?.firstOrNull {
             it.name == ".codequality" && it.isDirectory
         }
         return file ?: findCodeQuality(parent)
     }
     return null
 }

A simple recursive lookup up the directory tree. It is one of the few algorithm problems a scrub like me can solve, ha ha.
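For comparison, the same upward search can be written without recursion. This is a sketch only: findUpwards and its dirName parameter are names introduced here, not part of the original project.

```kotlin
import java.io.File

// Iterative variant of the upward search (illustrative; `findUpwards` and
// `dirName` are hypothetical names). Like the recursive version, it starts
// at the parent of `start` and walks toward the filesystem root.
fun findUpwards(start: File, dirName: String = ".codequality"): File? =
    generateSequence(start.absoluteFile.parentFile) { it.parentFile }
        .map { File(it, dirName) }
        .firstOrNull { it.isDirectory }
```

Calling findUpwards(context.project.dir) would then play the same role as findCodeQuality in the listing above.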

      /**
       * nameRegex is matched as a regular expression against qualifiedName
       * inClassName is the class that contains the call being checked
       * exclude lists classes to skip (exclusion is currently at class granularity)
       * excludeRegex skips any containing class whose name matches the pattern
       */
      private fun match(
          nameRegex: String?,
          qualifiedName: String?,
          inClassName: String? = null,
          exclude: List<String> = emptyList(),
          excludeRegex: String? = null
      ): Boolean {
          qualifiedName ?: return false

          //exclude

          if (inClassName != null && inClassName.isNotEmpty()) {
              if (exclude.contains(inClassName)) return false

              if (excludeRegex != null &&
                  excludeRegex.isNotEmpty() &&
                  Pattern.compile(excludeRegex).matcher(inClassName).find()
              ) {
                  return false
              }
          }

          if (nameRegex != null && nameRegex.isNotEmpty() &&
              Pattern.compile(nameRegex).matcher(qualifiedName).find()
          ) { // nameRegex matched
              return true
          }
          return false
      }
  }

The code above does the checking and matching. The logic is fairly simple: whitelisted classes are skipped first, then the regex is searched for in the qualified name. If you are interested, just take a look.
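One detail worth knowing when writing rules: because match compiles name_regex as a regular expression and uses find(), an unescaped dot matches any character and the pattern may match anywhere inside the name. The sketch below shows the difference using Pattern.quote; the exact shape of qualifiedName passed in by the detector is an assumption here.

```kotlin
import java.util.regex.Pattern

// Rule exactly as written in the JSON: '.' matches any single character.
val loose: Pattern = Pattern.compile("android.net.wifi.WifiManager.getSSID")

// Stricter variant: quote the literal so that dots only match dots.
val strict: Pattern =
    Pattern.compile(Pattern.quote("android.net.wifi.WifiManager.getSSID"))

// Same matching style as the article's match(): a substring search via find().
fun hits(pattern: Pattern, qualifiedName: String): Boolean =
    pattern.matcher(qualifiedName).find()
```

For a small, hand-written rule list the loose form is unlikely to cause false positives, but quoting (or escaping dots as `\\.` in the JSON) makes the rules say exactly what they mean.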

How to verify

Although lint constrains the source code in the project, code that has already been compiled into class files is not seen by this UastScanner.

Class files can be scanned by lint through ClassScanner, but the logic is relatively complex. By the time that is written, the ASM work is effectively done anyway.

Moreover, when this requirement goes to QA, the testers have no way to verify it. So we need another mechanism that provides hook capabilities at runtime: when one of these privacy APIs is called, either write a record to a file or crash directly.

Dynamic hook based on Epic

Epic's hook mechanism is based on ART's ELF file format, so it can hook every method call in the code. Although it is a passive measure, it catches privacy API calls made in extreme cases such as reflection or calls inside third-party SDKs.

First, we reuse the dynamic JSON file defined earlier in the project and bring it into the debug assets folder through a symbolic link (soft link).

A symbolic link is a common Linux feature: it creates a link at another location that points to a file. Usage: ln -s <source file> <target file>.

    // Debug-only: read the same rule file from assets and hook every listed method.
    fun hookManager(context: Context) {
        val stream = context.resources.assets.open("dynamic.json")
        val configEntity = GsonUtils.inflate(stream)
        configEntity.methods.forEach {
            start(it)
        }
    }

    fun start(entity: DynamicEntity) {
        if (entity.name_regex.isNotEmpty()) {
            // Split "pkg.Class.method" at the last dot.
            val methodName = entity.name_regex.substring(entity.name_regex.lastIndexOf(".") + 1)
            val className = entity.name_regex.substring(0, entity.name_regex.lastIndexOf("."))
            val lintClass = Class.forName(className)
            DexposedBridge.hookAllMethods(lintClass, methodName, object : XC_MethodHook() {
                override fun beforeHookedMethod(param: MethodHookParam?) {
                    super.beforeHookedMethod(param)
                    // Record the call; a stricter setup could write a file or crash here.
                    Log.i("EpicHook", "$className.$methodName called")
                }
            })
        }
    }

After that, in debug builds, the hook configuration is produced by deserializing the JSON, and DexposedBridge.hookAllMethods is called for each entry.
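Note that start passes everything before the last dot straight to Class.forName, so a rule that is not a fully qualified class name (for example "Settings.Secure.getString" or "java.net.getHostAddress" from the list above) would throw ClassNotFoundException. A defensive variant might look like the following sketch; splitRule and resolveClass are hypothetical helpers, not part of the original code.

```kotlin
// Splits "pkg.Class.method" at the last dot; returns null when the rule has
// no usable class part or method part.
fun splitRule(rule: String): Pair<String, String>? {
    val dot = rule.lastIndexOf('.')
    if (dot <= 0 || dot == rule.length - 1) return null
    return rule.substring(0, dot) to rule.substring(dot + 1)
}

// Resolves the class without throwing, so one bad rule does not abort
// the hooks for all the remaining rules.
fun resolveClass(className: String): Class<*>? =
    runCatching { Class.forName(className) }.getOrNull()
```

With these, the hook loop could skip (and log) any rule for which either helper returns null instead of crashing the debug build at startup.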

Warm tip: dynamic hook frameworks are extremely unstable, so please do not ship this feature in production. It is also best to add an OS version check, because it crashes on Android 10.
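A guard for this could be as small as the sketch below. The cut-off is an assumption taken from the remark above (API level 29 is Android 10); the right limit for a given Epic version should be checked on real devices, and shouldInstallHooks is a hypothetical name.

```kotlin
// Assumption: hooks are only installed below API 29 (Android 10), where the
// article reports crashes. Verify the limit for the Epic version in use.
const val FIRST_UNSUPPORTED_SDK = 29

// On Android, pass BuildConfig.DEBUG and Build.VERSION.SDK_INT as arguments.
fun shouldInstallHooks(isDebugBuild: Boolean, sdkInt: Int): Boolean =
    isDebugBuild && sdkInt < FIRST_UNSUPPORTED_SDK
```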

Remember that debug tools must never reach production: features designed for debugging are generally risky operations, so they must be confined to debug build variants.

Privacy calls in third-party libraries

Although we now have dynamic hooks, a hook only fires when the method is actually called, so pages buried deep in the call flow may not be covered.

A better solution is to rewrite the privacy-related calls in third-party code with ASM and redirect them to our middleware.

This gives multiple layers of protection, which helps greatly when facing reviews by regulators.

Locate sensitive permissions through Transform + Asm

This part is also relatively simple. I wrote a small demo to verify the rewrite; it targets the getDeviceId call.

It can also pinpoint privacy API calls inside third-party libraries in more detail, which helps us push the library owners to make adjustments.

The usual approach applies: using ASM's Tree API, check whether the current method instruction is a call to getDeviceId on "android/telephony/TelephonyManager". If so, replace it with a call to a static method that we define.

One small tip: before the call, the bytecode has already pushed the android/telephony/TelephonyManager instance onto the operand stack, so the preceding call that produced it must be removed as well.
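A possible alternative to removing the preceding instructions, sketched here rather than taken from the demo: let the static replacement accept the original receiver as its first parameter. The TelephonyManager already on the operand stack is then consumed by the static call, and only the invoke instruction itself has to change (from INVOKEVIRTUAL to INVOKESTATIC, with a new owner and descriptor). The helper below only computes that new descriptor; staticDescriptorFor is a hypothetical name.

```kotlin
// Builds the descriptor of a static replacement that takes the original
// receiver as its first argument. For owner "android/telephony/TelephonyManager"
// and instance descriptor "()Ljava/lang/String;" the result is
// "(Landroid/telephony/TelephonyManager;)Ljava/lang/String;".
fun staticDescriptorFor(owner: String, instanceDescriptor: String): String {
    require(instanceDescriptor.startsWith("(")) {
        "not a method descriptor: $instanceDescriptor"
    }
    // Insert the receiver type right after the opening parenthesis.
    return "(L$owner;" + instanceDescriptor.substring(1)
}
```

The trade-off is that the middleware method receives a TelephonyManager it may never use, in exchange for a bytecode edit that touches a single instruction.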

Summary

Because the static inspection is now configurable, later requirements only need a change to the scanning rules. This greatly improves our ability to respond to reviews instead of reacting passively, and it also validates our current monorepo model.

The main purpose of this sharing is to contribute to a harmonious mobile ecosystem in China. We all share the responsibility of keeping the network environment clean, and the importance of user privacy in today's society is self-evident. Because all code is subject to both static inspection and manual review, we can ensure that everything merged afterwards has passed this check. I hope this article helps you.

Added by blunt on Sun, 06 Mar 2022 04:57:41 +0200