An in-depth study of smali syntax

Preface

When injecting code into an APK, we usually work with decompiled smali code rather than the original Java source, so it is necessary to understand the basics of smali syntax. First, a word about the Dalvik virtual machine: Dalvik is a virtual machine designed by Google specifically for the Android platform. Although Android programs can be written in Java, the Dalvik VM and the Java VM are two different virtual machines. The Dalvik VM is register based, while the Java VM is stack based. The Dalvik VM executes its own file format, dex (Dalvik Executable), while the Java VM executes Java bytecode in .class files. The register-based design typically needs fewer instructions than equivalent stack-based bytecode, and a dex file is usually smaller than the corresponding set of .class files, which suits devices with limited memory.

Smali file structure

The following smali code is taken from a test demo (obtained by decompiling the .apk file with apktool). It is shown first to give a general picture of how a smali file is laid out, which makes the syntax details that follow easier to place.

.class public abstract Lcom/happy/learnsmali/BaseActivity;
.super Landroidx/appcompat/app/AppCompatActivity;
.source "BaseActivity.kt"

# interfaces
.implements Lcom/happy/learnsmali/action/ActivityAction;
.implements Lcom/happy/learnsmali/action/ClickAction;
.implements Lcom/happy/learnsmali/action/HandlerAction;
.implements Lcom/happy/learnsmali/action/BundleAction;
.implements Lcom/happy/learnsmali/action/KeyboardAction;


# annotations
.annotation system Ldalvik/annotation/MemberClasses;
    value = {
        Lcom/happy/learnsmali/BaseActivity$Companion;,
        Lcom/happy/learnsmali/BaseActivity$OnActivityCallback;
    }
.end annotation

.annotation system Ldalvik/annotation/SourceDebugExtension;
    value = "SMAP\nBaseActivity.kt\nKotlin\n*S Kotlin\n*F\n+ 1 BaseActivity.kt\ncom/happy/learnsmali/BaseActivity\n+ 2 fake.kt\nkotlin/jvm/internal/FakeKt\n*L\n1#1,179:1\n1#2:180\n*E\n"
.end annotation

# static fields
.field public static final Companion:Lcom/happy/learnsmali/BaseActivity$Companion;

.field public static final RESULT_ERROR:I = -0x2


# instance fields
.field private final activityCallbacks$delegate:Lkotlin/Lazy;


# direct methods
.method public static synthetic $r8$lambda$mAxgPA6JBXhjuhBfNvUeqmKUmlk(Lcom/happy/learnsmali/BaseActivity;Landroid/view/View;)V
    .locals 0

    invoke-static {p0, p1}, Lcom/happy/learnsmali/BaseActivity;->initSoftKeyboard$lambda-0(Lcom/happy/learnsmali/BaseActivity;Landroid/view/View;)V

    return-void
.end method

.method static constructor <clinit>()V
    .locals 2

    new-instance v0, Lcom/happy/learnsmali/BaseActivity$Companion;

    const/4 v1, 0x0

    invoke-direct {v0, v1}, Lcom/happy/learnsmali/BaseActivity$Companion;-><init>(Lkotlin/jvm/internal/DefaultConstructorMarker;)V

    sput-object v0, Lcom/happy/learnsmali/BaseActivity;->Companion:Lcom/happy/learnsmali/BaseActivity$Companion;

    return-void
.end method

.method public constructor <init>()V
    # ...
.end method

If you are new to smali, it is normal for the code above to look confusing. The following sections explain what each of these directives and instructions means, which saves a lot of effort when decompiling an APK and injecting code.

Inheritance, interface and package information in smali

First, let's look at the first few lines:

.class public abstract Lcom/happy/learnsmali/BaseActivity;    # .class: access flags + class descriptor (package path + class name)
.super Landroidx/appcompat/app/AppCompatActivity;             # .super: descriptor of the parent class
.source "BaseActivity.kt"                                     # .source: name of the original source file

# interfaces
.implements Lcom/happy/learnsmali/action/ActivityAction;
.implements Lcom/happy/learnsmali/action/ClickAction;
.implements Lcom/happy/learnsmali/action/HandlerAction;
.implements Lcom/happy/learnsmali/action/BundleAction;
.implements Lcom/happy/learnsmali/action/KeyboardAction;


# annotations
.annotation system Ldalvik/annotation/MemberClasses;
    value = {
        Lcom/happy/learnsmali/BaseActivity$Companion;,
        Lcom/happy/learnsmali/BaseActivity$OnActivityCallback;
    }
.end annotation

Lines 1-3 define the basic information: this smali file was decompiled from the source file BaseActivity.kt (line 3); the class is com/happy/learnsmali/BaseActivity, declared public abstract (line 1); and it inherits from androidx/appcompat/app/AppCompatActivity (line 2).

Lines 5-10 define the interface information: the interfaces implemented by the BaseActivity class are:

  • com/happy/learnsmali/action/ActivityAction
  • com/happy/learnsmali/action/ClickAction
  • com/happy/learnsmali/action/HandlerAction
  • com/happy/learnsmali/action/BundleAction
  • com/happy/learnsmali/action/KeyboardAction

Lines 13-19 declare the inner classes: the dalvik/annotation/MemberClasses annotation records that BaseActivity has two member classes, Companion and OnActivityCallback.

After analyzing the header of the smali file, we can reconstruct roughly equivalent Java code (the original source is Kotlin, as the .kt file name shows):

public abstract class BaseActivity extends AppCompatActivity
    implements ActivityAction, ClickAction, HandlerAction, BundleAction, KeyboardAction {
    
    class Companion {
        // ...
    }
    
    class OnActivityCallback {
        // ...
    }
}

Other methods

# virtual methods   (section marker: the methods below are virtual methods)
.method protected onCreate(Landroid/os/Bundle;)V
    .locals 1
    .param p1, "savedInstanceState"    # Landroid/os/Bundle;

    .line 10
    invoke-super {p0, p1}, Landroid/app/Activity;->onCreate(Landroid/os/Bundle;)V

    .line 11
    const/high16 v0, 0x7f050000

    invoke-virtual {p0, v0}, Lcom/justart/samlidemo/MainActivity;->setContentView(I)V

    .line 12
    return-void
.end method

  • A method starts with .method and ends with .end method;
  • .locals 1 declares that the method uses one local register (v0);
  • The V at the end of the first line indicates that the return type is void;
  • The parameter Landroid/os/Bundle; indicates that onCreate() takes one parameter of type Bundle;
  • .param p1, "savedInstanceState" records the source-level name of the parameter held in register p1;
  • Finally, return-void returns from the method without a value, matching the void return type.

Data types

  • byte: B
  • char: C
  • double: D
  • float: F
  • int: I
  • long: J
  • short: S
  • void: V
  • boolean: Z
  • array: [XXX
  • Object: Lxxx/yyy;

Readers who know JNI will recognize these type descriptors. The last two items deserve a closer look:

array: [XXX

Prefix a type with [ to denote an array of that type. For example, int[] and byte[] are written [I and [B. Each extra dimension adds another [, so int[][] is [[I, and an array of objects keeps the full object descriptor: String[] is [Ljava/lang/String;.

Object: Lxxx/yyy;

Types starting with L are object types. For example, String is written Ljava/lang/String; (an object type must end with a semicolon), where java/lang is the java.lang package and String is a class in that package.

A question may come up here: if a class is written as Ljava/lang/String;, how is an inner class written in smali? Readers who have used Java reflection will immediately think of the $ symbol. Indeed, smali writes Ljava/lang/String$xxx; to indicate that xxx is an inner class of the String class.
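
The descriptor rules above are mechanical enough to sketch in a few lines of Java. The class below is a hypothetical helper written for this article (SmaliTypes and toJava are made-up names, not part of smali or apktool), and it assumes Java 11 or later. It treats every $ as an inner-class separator, which is a simplification.

```java
import java.util.Map;

/** Hypothetical helper: turns a smali type descriptor into a readable Java type name. */
final class SmaliTypes {
    // Primitive descriptors, as listed in the table above.
    private static final Map<Character, String> PRIMITIVES = Map.of(
            'B', "byte", 'C', "char", 'D', "double", 'F', "float", 'I', "int",
            'J', "long", 'S', "short", 'V', "void", 'Z', "boolean");

    static String toJava(String descriptor) {
        // Each leading '[' adds one array dimension.
        int dims = 0;
        while (dims < descriptor.length() && descriptor.charAt(dims) == '[') {
            dims++;
        }
        String element = descriptor.substring(dims);
        String name;
        if (element.length() > 2 && element.startsWith("L") && element.endsWith(";")) {
            // Lpkg/Outer$Inner;  ->  pkg.Outer.Inner
            name = element.substring(1, element.length() - 1)
                    .replace('/', '.')
                    .replace('$', '.');
        } else if (element.length() == 1 && PRIMITIVES.containsKey(element.charAt(0))) {
            name = PRIMITIVES.get(element.charAt(0));
        } else {
            throw new IllegalArgumentException("Not a type descriptor: " + descriptor);
        }
        return name + "[]".repeat(dims);
    }
}
```

For instance, SmaliTypes.toJava("[B") gives byte[], and SmaliTypes.toJava("Ljava/lang/String;") gives java.lang.String.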

Registers

One of the biggest differences between the Dalvik VM and the JVM is that the Dalvik VM is register based. What does that mean? Roughly, it resembles assembly language: data is stored and passed around in registers. In smali, local registers are named v followed by a number (v0, v1, v2, ...), and parameter registers are named p followed by a number (p0, p1, p2, ...). Note that p0 is not necessarily the first parameter. In a non-static method, p0 holds this, p1 holds the first parameter, and p2 holds the second. In a static method, p0 holds the first parameter (a static method has no this object). A long or double parameter occupies two consecutive registers. The number of local registers is declared by the method's .locals directive. A method may declare many registers, but many instructions can only address v0 through v15, so values in higher-numbered registers often have to be moved into low ones before use.
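
As a rough illustration of the p-register numbering, the following Java sketch lists which parameter register holds which argument. ParamRegisters and assign are hypothetical names invented for this example. The sketch assumes the usual Dalvik convention that long (J) and double (D) values occupy two consecutive registers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: shows how parameters are laid out in p registers. */
final class ParamRegisters {
    /**
     * @param isStatic   true for a static method (no implicit this)
     * @param paramTypes parameter type descriptors in declaration order
     * @return one description per parameter, e.g. "p0 = this"
     */
    static List<String> assign(boolean isStatic, List<String> paramTypes) {
        List<String> out = new ArrayList<>();
        int next = 0;
        if (!isStatic) {
            // Non-static methods receive the object itself in p0.
            out.add("p" + next + " = this");
            next++;
        }
        for (int i = 0; i < paramTypes.size(); i++) {
            String type = paramTypes.get(i);
            boolean wide = type.equals("J") || type.equals("D");
            if (wide) {
                // long and double take a register pair.
                out.add("p" + next + "/p" + (next + 1) + " = arg" + (i + 1) + " (" + type + ")");
                next += 2;
            } else {
                out.add("p" + next + " = arg" + (i + 1) + " (" + type + ")");
                next += 1;
            }
        }
        return out;
    }
}
```

Calling ParamRegisters.assign(false, List.of("Landroid/os/Bundle;")) describes the onCreate(Bundle) case: p0 is this and p1 is the Bundle.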

Member variables

Next, we will continue to introduce the content of member variables:

# static field
.field private static final PREFS_INSTALLATION_ID:Ljava/lang/String; = "installationId"
# ...

# instance field
.field private _activityPackageName:Ljava/lang/String;

The static field and the instance field defined above are both member variables, declared in the following format:

.field <public|private|protected> [static] [final] <name>:<type> [= <initial value>]

Although static fields and instance fields are both member variables, they differ. The most obvious difference is whether they are tied to an object: a static field belongs to the class, while an instance field belongs to each object.

Where there are member variables, there are reads and writes of them. In smali, the read instructions include iget, sget, iget-boolean, sget-boolean, iget-object, sget-object, and so on, while the write instructions include iput, sput, iput-boolean, sput-boolean, iput-object, sput-object, and so on.

iget / iput read and write an instance field;

sget / sput read and write a static field;

The prefix (i or s) tells you whether an instruction operates on an instance field or a static field. The -object suffix means the field holds an object reference (including arrays). No suffix means a 32-bit primitive such as int or float. The other primitive types have their own suffixes: -boolean, -byte, -char and -short, plus -wide for long and double.

Here is an example:

const/4 v0, 0x0  
iput-boolean v0, p0, Lcom/disney/xx/XxActivity;->isRunning:Z

In the example above, the first instruction loads 0x0 into local register v0, and the second uses iput-boolean to store the value of v0 into the isRunning field of com.disney.xx.XxActivity. This is equivalent to this.isRunning = false; (as mentioned above, in a non-static method p0 is this, here the instance of com.disney.xx.XxActivity).

static field member variable

sget-object v0, Lcom/disney/xx/XxActivity;->PREFS_INSTALLATION_ID:Ljava/lang/String;

The sget-object instruction reads a static field and stores it in the register named right after the instruction. Here, the value of the static member PREFS_INSTALLATION_ID of the com.disney.xx.XxActivity class is loaded into local register v0.

instance field member variable

iget-object v0, p0, Lcom/disney/xx/XxActivity;->_view:Lcom/disney/common/WMWView;

The iget-object instruction works the same way for an instance field, except that it also names the register holding the object (here p0, that is, this). Here, the _view member of the com.disney.xx.XxActivity object is loaded into local register v0.

From the static field and instance field examples above, the operand format can be summarized as follows (the object register appears only in iget / iput; sget / sput have no object operand):

**<value register>, [<object register>,] <owning class>-><field name>:<field type>**

The put instructions follow the same format as the get instructions above. Here are two examples:

const/4 v3, 0x0
iput-object v3, p0, Lcom/disney/xx/XxActivity;->globalIapHandler:Lcom/disney/config/GlobalPurchaseHandler;

Equivalent Java code: this.globalIapHandler = null; (null is represented by 0x0)

.local v0, wait:Landroid/os/Message;  
const/4 v1, 0x2  
iput v1, v0, Landroid/os/Message;->what:I

Equivalent Java code: wait.what = 0x2; (wait is an instance of Message)
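
The naming pattern of the field instructions can be written down as a small lookup. The Java sketch below is only an illustration: FieldOps and accessor are hypothetical names. It builds the instruction name from the i/s prefix, get/put, and a suffix chosen by the first character of the field's type descriptor, including the -byte, -char, -short and -wide variants of the Dalvik instruction set.

```java
/** Hypothetical helper: picks the field access instruction for a given field. */
final class FieldOps {
    /**
     * @param isStatic  true for a static field (sget/sput), false for an instance field (iget/iput)
     * @param isWrite   true for an assignment (put), false for a read (get)
     * @param fieldType the field's type descriptor, e.g. "Z" or "Ljava/lang/String;"
     */
    static String accessor(boolean isStatic, boolean isWrite, String fieldType) {
        if (fieldType.isEmpty()) {
            throw new IllegalArgumentException("Empty type descriptor");
        }
        String base = (isStatic ? "s" : "i") + (isWrite ? "put" : "get");
        switch (fieldType.charAt(0)) {
            case 'L':
            case '[':
                return base + "-object";   // object references and arrays
            case 'Z':
                return base + "-boolean";
            case 'B':
                return base + "-byte";
            case 'C':
                return base + "-char";
            case 'S':
                return base + "-short";
            case 'J':
            case 'D':
                return base + "-wide";     // 64-bit long and double
            case 'I':
            case 'F':
                return base;               // 32-bit int and float take no suffix
            default:
                throw new IllegalArgumentException("Not a field type: " + fieldType);
        }
    }
}
```

For the isRunning:Z assignment shown earlier, FieldOps.accessor(false, true, "Z") yields iput-boolean.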

Function calls

Format of a function signature:

function(type1type2type3...)RetValue

Note that the parameter types must be written as smali type descriptors, with no separators between them. For example:

helloSmali()V
means void helloSmali()

helloSmali([BI)Z
means boolean helloSmali(byte[], int)

helloSmali(ZLjava/lang/String;[I[I)V
means void helloSmali(boolean, String, int[], int[])
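
Because the parameter types are written with no separators, reading a signature means splitting the parameter list one descriptor at a time. The following Java sketch shows one way to do that. MethodDescriptors and toJavaSignature are hypothetical names, Java 11 or later is assumed, and object types are shortened to their simple class name to match the examples above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: renders a smali method signature as a Java-style signature. */
final class MethodDescriptors {
    static String toJavaSignature(String smali) {
        int open = smali.indexOf('(');
        int close = smali.indexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("Not a method signature: " + smali);
        }
        String name = smali.substring(0, open);
        String params = smali.substring(open + 1, close);
        String returnType = smali.substring(close + 1);

        List<String> javaParams = new ArrayList<>();
        int pos = 0;
        while (pos < params.length()) {
            int end = endOfType(params, pos);
            javaParams.add(typeName(params.substring(pos, end)));
            pos = end;
        }
        return typeName(returnType) + " " + name + "(" + String.join(", ", javaParams) + ")";
    }

    /** Returns the index just past the type descriptor that starts at 'start'. */
    private static int endOfType(String s, int start) {
        int i = start;
        while (i < s.length() && s.charAt(i) == '[') {
            i++;                       // skip array dimensions
        }
        if (i >= s.length()) {
            throw new IllegalArgumentException("Truncated type in: " + s);
        }
        if (s.charAt(i) == 'L') {
            int semi = s.indexOf(';', i);
            if (semi < 0) {
                throw new IllegalArgumentException("Missing ';' in: " + s);
            }
            return semi + 1;           // object types run up to the semicolon
        }
        return i + 1;                  // primitives are a single character
    }

    private static String typeName(String descriptor) {
        int dims = descriptor.lastIndexOf('[') + 1;
        String element = descriptor.substring(dims);
        String name;
        if (element.length() > 2 && element.startsWith("L") && element.endsWith(";")) {
            String path = element.substring(1, element.length() - 1);
            name = path.substring(path.lastIndexOf('/') + 1);   // simple class name
        } else {
            switch (element) {
                case "B": name = "byte"; break;
                case "C": name = "char"; break;
                case "D": name = "double"; break;
                case "F": name = "float"; break;
                case "I": name = "int"; break;
                case "J": name = "long"; break;
                case "S": name = "short"; break;
                case "V": name = "void"; break;
                case "Z": name = "boolean"; break;
                default: throw new IllegalArgumentException("Unknown type: " + descriptor);
            }
        }
        return name + "[]".repeat(dims);
    }
}
```

MethodDescriptors.toJavaSignature("helloSmali([BI)Z") reproduces the second example above: boolean helloSmali(byte[], int).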

Like member variables, functions in smali are divided into two kinds, but instead of static fields and instance fields, the kinds are direct methods and virtual methods. What is the difference? In short, direct methods are private methods, static methods and constructors (which is why <clinit>, <init> and the static synthetic method appear under # direct methods in the first listing), while virtual methods are all the others, for example public and protected instance methods.

Accordingly, there are several call instructions: invoke-direct, invoke-virtual, invoke-static, invoke-super and invoke-interface. Each also has an invoke-xxx/range form, which is used when the call needs more than five argument registers, or registers that the short form cannot address.

invoke-static

invoke-static {}, Lcom/disney/xx/UnlockHelper;->unlockCrankypack()Z

invoke-static calls a static method of a class. The Java equivalent is UnlockHelper.unlockCrankypack(). Note the {} after invoke-static: it lists the object instance (if any) followed by the arguments of the call. Since this method is static and takes no parameters, there is neither an instance nor any argument, so {} is empty. Here is another example:

const-string v0, "fmodex"  
invoke-static {v0}, Ljava/lang/System;->loadLibrary(Ljava/lang/String;)V

This calls static void System.loadLibrary(String) to load a native .so library; v0 passes the argument "fmodex".

invoke-super

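The instruction used to call a method of the parent class; it typically appears in overridden methods (the onCreate() and onDestroy() examples in this article call their parent implementations this way).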

invoke-direct

Calls a private method or a constructor (<init>), for example:

invoke-direct {p0}, Lcom/disney/xx/XxActivity;->getGlobalIapHandler()Lcom/disney/config/GlobalPurchaseHandler;

Here getGlobalIapHandler() is a private method defined in the XxActivity class; it takes no parameters and returns a GlobalPurchaseHandler.

invoke-virtual

Calls an ordinary instance method (for example a public or protected one), dispatched on the runtime type of the object:

sget-object v0, Lcom/disney/xx/XxActivity;->shareHandler:Landroid/os/Handler;  
invoke-virtual {v0, v3}, Landroid/os/Handler;->removeCallbacksAndMessages(Ljava/lang/Object;)V

Here v0 holds the shareHandler field, of type Landroid/os/Handler;, and is the object the method is called on, while v3 is the argument of type Ljava/lang/Object; passed to removeCallbacksAndMessages.

invoke-xxxxx/range

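The /range suffix is required when a call passes more than five argument registers (this counts as one, and each long or double counts as two). Instead of listing each register, the instruction names a consecutive range, for example {v0 .. v5}.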
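
To tie the invoke variants together, here is a hedged Java sketch that picks an instruction name from a few facts about the call. InvokeOps, instruction and argumentRegisters are hypothetical names made up for this article. The sketch only checks the register count; a real assembler also has to switch to /range when a register number is too high for the short form, which is not modeled here.

```java
import java.util.List;

/** Hypothetical helper: chooses the invoke instruction for a call. */
final class InvokeOps {
    /** Registers the call occupies: one for this (if any), two per long/double, one otherwise. */
    static int argumentRegisters(boolean isStatic, List<String> paramTypes) {
        int count = isStatic ? 0 : 1;
        for (String type : paramTypes) {
            count += (type.equals("J") || type.equals("D")) ? 2 : 1;
        }
        return count;
    }

    static String instruction(boolean isStatic, boolean isPrivateOrConstructor,
                              boolean isSuperCall, boolean onInterface,
                              List<String> paramTypes) {
        String kind;
        if (isStatic) {
            kind = "invoke-static";
        } else if (isPrivateOrConstructor) {
            kind = "invoke-direct";      // private methods and <init>
        } else if (isSuperCall) {
            kind = "invoke-super";       // super.method(...)
        } else if (onInterface) {
            kind = "invoke-interface";   // method declared on an interface type
        } else {
            kind = "invoke-virtual";
        }
        // The short form can list at most five registers inside { }.
        boolean needsRange = argumentRegisters(isStatic, paramTypes) > 5;
        return needsRange ? kind + "/range" : kind;
    }
}
```

An instance method with four int parameters uses five registers and still fits the short form; a fifth int pushes it to invoke-virtual/range.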

You may have noticed that the examples above only call functions and never use a return value. In smali, if the called function returns something other than void and the value is needed, a move-result instruction must come immediately after the call to fetch it: move-result for a 32-bit primitive, move-result-wide for a long or double, and move-result-object for an object:

const/4 v2, 0x0  
invoke-virtual {p0, v2}, Lcom/disney/xx/XxActivity;->getPreferences(I)Landroid/content/SharedPreferences;  
move-result-object v1

v1 holds the SharedPreferences object returned by this.getPreferences(0).

invoke-virtual {v2}, Ljava/lang/String;->length()I  
move-result v2

v2 holds the int returned by String.length().

Example analysis

The sections above covered fields, method definitions and calls. The following example takes the analysis of smali syntax a step further:

.method protected onDestroy()V
    .locals 0

    .line 79
    invoke-super {p0}, Landroidx/appcompat/app/AppCompatActivity;->onDestroy()V

    .line 80
    invoke-virtual {p0}, Lcom/happy/learnsmali/BaseActivity;->removeCallbacks()V

    .line 81
    return-void
.end method

This is the familiar onDestroy() function. The first line in the body, .locals 0, gives the number of local registers used by the function. Since this method uses no local registers (both calls only pass p0), the count is 0. If we add the statement this.exited = true; to the method, it must be changed to:

.method protected onDestroy()V
    .locals 1

    .line 79
    invoke-super {p0}, Landroidx/appcompat/app/AppCompatActivity;->onDestroy()V

    .line 80
    invoke-virtual {p0}, Lcom/happy/learnsmali/BaseActivity;->removeCallbacks()V
    
    .line 81
    const/4 v0, 0x1
    iput-boolean v0, p0, Lcom/happy/learnsmali/BaseActivity;->exited:Z

    .line 82
    return-void
.end method

Because the modified onDestroy() uses one local register, v0, .locals 0 is changed to .locals 1. You may also have noticed the .line directive, which records the line number in the original source file that the following smali instructions were compiled from. When a program crashes during debugging in Android Studio, the line number shown in the logcat stack trace comes from this value. The directive is optional, but it is worth keeping for easier debugging.

Information sharing

To close, the author shares the material collected while learning smali syntax with anyone who needs it:

It includes a more detailed list of the basic smali instructions (usable as a reference manual), the steps for reverse engineering an app, and more.

To get it: search WeChat for the official account "Android security engineering", follow it, and reply with the keyword smali.

Keywords: Android

Added by darence on Mon, 07 Feb 2022 22:16:06 +0200