Assembly language study notes -- assembly program format

  • Defining, assuming, and referencing segments

    • Definition of segment

      • format

        • segment_name  segment [align] [combine] ['class']
          	statements
          segment_name ends
          ;The keyword segment marks the beginning of a segment definition and the keyword ends marks its end. segment_name is the name of the segment (every segment must have one) and statements stands for the assembly language statements in the segment
          ;align specifies the alignment and is one of the following five keywords: byte word dword para page
          ;These keywords align the start of the segment on a byte, word, doubleword, paragraph or page boundary. A paragraph is 16 bytes and a page is 256 bytes. If the alignment is omitted, para is the default. It is usually unnecessary to specify the alignment
          ;combine specifies the combine type and is one of the following five keywords:
          ;public stack common memory at
          ;public is generally used for code or data segments: segments with the same name and combine type public are merged into one segment at link time. stack is used for the stack segment: segments with the same name and combine type stack are also merged, and when the program is loaded and about to run, ss is automatically initialized to the segment address of the stack segment and sp to the offset of the last byte of the segment + 1. The combine type can usually be omitted, but if the program defines a stack segment, its combine type should be stack
          ;'class' is the class name. Its content is not fixed and can be chosen freely. It makes the linker place this segment together with other segments that have the same class name. It usually does not need to be specified
          
          ;In most cases a segment can simply be defined like this
          segment_name segment
          statements
          segment_name ends
          
          ;A stack segment is usually defined like this
          stk segment stack
          db 100h dup(0) ;defines 100h bytes, each initialized to 0
          stk ends
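
        • The segment syntax above can be put together in a minimal sketch of a complete program. The names data, stk, code, msg and start are chosen only for illustration, and the sketch assumes MASM-style syntax and a program run under DOS

          ```assembly
          data segment                    ;data segment, default para alignment
          msg db 'Hello',0Dh,0Ah,'$'      ;'$' ends the string for DOS function 09h
          data ends

          stk segment stack               ;combine type stack: ss:sp are set on load
          db 100h dup(0)
          stk ends

          code segment
          assume cs:code, ds:data, ss:stk
          start:
          	mov ax,data                 ;ds is not loaded automatically
          	mov ds,ax
          	mov dx,offset msg
          	mov ah,9                    ;DOS function 09h: print '$'-terminated string
          	int 21h
          	mov ax,4c00h                ;DOS function 4ch, return code 0
          	int 21h
          code ends
          end start
          ```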
          
    • Segment assumptions

      • The assembler directive assume establishes the correspondence between segment registers and segments

      • format

        • assume segreg:segment_name
          ;segreg is one of the segment registers cs, ds, es, ss
          
      • Note that the correspondence established by assume only tells the assembler which segment register to use when it assembles references to names in that segment. It does not load the segment register, so it cannot guarantee that the register actually holds the corresponding segment address

      • When the program starts running, only cs and ss are set automatically by the operating system. ds and es do not hold the segment addresses named in assume, so the program has to load them itself
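
      • A minimal sketch of what this means in practice, assuming MASM-style syntax under DOS. The segment names data and data2 and the variables val and buf are hypothetical. assume only affects how the references are encoded, so ds and es still have to be loaded by hand

        ```assembly
        data segment
        val db 12h
        data ends

        data2 segment
        buf db 0
        data2 ends

        code segment
        assume cs:code, ds:data, es:data2
        start:
        	mov ax,data         ;assume does not load ds, so load it explicitly
        	mov ds,ax
        	mov ax,data2        ;the same applies to es
        	mov es,ax
        	mov al,val          ;val is in data, so it is addressed through ds
        	mov buf,al          ;buf is in data2, so the assembler adds an es: override
        	mov ax,4c00h
        	int 21h
        code ends
        end start
        ```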

    • Reference to segment

      • Reference segment address with segment name

        • Segment names can be used instead of segment addresses

        • Reference segment address with seg operator

          • format

            • seg variablename ;segment address of the segment that contains the variable variablename
              seg labelname ;segment address of the segment that contains the label labelname
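
            • Both ways of referencing a segment address can be compared in a short sketch. It assumes MASM-style syntax under DOS, and the names data, msg and start are made up for the example

              ```assembly
              data segment
              msg db 'Hi$'
              data ends

              code segment
              assume cs:code, ds:data
              start:
              	mov ax,data         ;segment name used as the segment address
              	mov ds,ax
              	mov ax,seg msg      ;same value: segment of the variable msg
              	mov es,ax
              	mov bx,seg start    ;segment address of code, where the label start is
              	mov ax,4c00h
              	int 21h
              code ends
              end start
              ```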
              
  • Ending a program

    • End of the source program

      • The source program ends with the assembler directive end

        • Format:

          • end labelname ;end marks the end of the source program. labelname is the name of a label in the code segment and tells the program to start running at that label. If labelname is omitted after end, execution starts at the beginning of the program, which is the first instruction of the code segment only when the code segment comes first in the source
            
    • Program segment prefix

      • The PSP (program segment prefix) is a block of memory 100h bytes long

      • When DOS runs an executable program, it first allocates a PSP for the program in memory, then reads the contents of the program file and loads them into memory right after the PSP. Finally, DOS sets DS and ES to the PSP segment address, SS to the stack segment address, SP to the offset of the last byte of the stack segment + 1, CS to the code segment address of the program, and IP to the offset of the label specified by end in the source program. The program then starts running at CS:IP

      • Offset  Length (bytes)                 Content
        0000h   2                              int 20h (0cdh,20h)
        0016h   2                              PSP segment address of the parent program
        002ch   2                              Environment block segment address
        0080h   1                              Length of the command-line parameters
        0081h   Indefinite (up to 7Fh bytes)   Command-line parameters
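
      • As an illustration of the layout above, the following sketch prints the command-line parameters stored in the PSP. It assumes MASM-style syntax under DOS and relies on ds still holding the PSP segment address at program start. The label start is arbitrary

        ```assembly
        code segment
        assume cs:code
        start:
        	;on entry ds = es = PSP segment address
        	mov bl,ds:[80h]                 ;length of the command-line parameters
        	mov bh,0
        	mov byte ptr ds:[bx+81h],'$'    ;end the text for DOS function 09h
        	mov dx,81h                      ;ds:dx points to the parameters
        	mov ah,9
        	int 21h
        	mov ax,4c00h
        	int 21h
        code ends
        end start
        ```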
    • Program termination

      • The end in an assembly source program only marks the end of the source. It is just an assembler directive: after the program is assembled, end disappears and is not translated into any machine code

      • To actually terminate the program, the usual way is to call DOS function 4ch

      • mov ah,4ch
        mov al,return_code
        int 21h
        ;the return code in al passes this program's exit status to the parent program (the program that started the currently running one)
        
      • Besides DOS function 4ch, the int 20h interrupt call and DOS function 00h can also terminate the program. However, both calls require the code segment register CS to equal the PSP segment address. If CS does not equal the PSP segment address when either one is called, the system will crash

      • There is another way to terminate

        • code segment
          assume cs:code
          	push ds ;at program start ds holds the PSP segment address, so this saves it on the stack
          	mov ax,0
          	push ax
          	
          	....
          	retf ;pops 0 into IP and then the saved ds value (the PSP segment address) into CS, which is equivalent to jmp psp:0000. An int 20h instruction is stored there, so the program terminates
          code ends
          
  • Assembly language statements

    • Format of an assembly statement

      • name mnemonic operand ;comment
        ;name is the name field. It is mainly a variable name or label name, but can also be a segment name, procedure name, etc. The name field is optional and most statements do not need it
        ;mnemonic is mainly an 8086 instruction (e.g. mov, add, jmp), but can also be an assembler directive (assume, end, segment) or a pseudo-instruction (db, dw)
        ;operand is the operand, that is, the argument of the mnemonic
        ;comment is a comment and always starts with a semicolon
        
    • Constants and constant expressions

      • The constants supported by the assembler are integer constants, character constants and string constants

        • Integer constant

          • Integer constants can be 8, 16 or 32 bits wide, can be positive, negative or unsigned, and can be written in decimal, binary, octal or hexadecimal

          • 10,-10 ;decimal, no suffix
            1011B ;binary, suffix B
            177Q ;octal, suffix Q
            3Fh,0FFh ;hexadecimal, suffix h (a constant that starts with a letter needs a leading 0)
            
        • Character constant

          • A single character enclosed in single or double quotation marks, e.g. 'a' or "a". Its value is the ASCII code of the character
        • String constant

          • A sequence of characters enclosed in single or double quotation marks, e.g. 'ABC' or "ABC"
          • Note that a string constant in assembly contains only the characters inside the quotation marks. There is no terminating \0
        • Constant expression

          • Operators supported by the assembler

          • Operator  Format                              Meaning
            +         +expression                         positive
            -         -expression                         negative
            +         expression + expression             add
            -         expression - expression             subtract
            *         expression * expression             multiply
            /         expression / expression             divide
            MOD       expression MOD expression           remainder
            SHR       expression1 SHR expression2         shift expression1 right by expression2 bits
            SHL       expression1 SHL expression2         shift expression1 left by expression2 bits
            NOT       NOT expression                      bitwise inversion
            AND       expression AND expression           bitwise and
            OR        expression OR expression            bitwise or
            XOR       expression XOR expression           bitwise exclusive or
            SEG       SEG variable name or label name     segment address
            OFFSET    OFFSET variable name or label name  offset address
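
          • A few of these operators in use, as a rough sketch that assumes MASM-style syntax under DOS. Every expression is evaluated by the assembler, so each instruction simply receives a constant. The label start is arbitrary

            ```assembly
            code segment
            assume cs:code
            start:
            	mov ax,(3+5)*2          ;ax = 16
            	mov bx,100 mod 7        ;bx = 2
            	mov cx,1 shl 4          ;cx = 10h
            	mov dx,0F0h shr 4       ;dx = 0Fh
            	mov al,0FFh and 0Fh     ;al = 0Fh
            	mov ah,12h xor 0FFh     ;ah = 0EDh
            	mov si,offset start     ;si = 0, start is the first thing in code
            	mov ax,4c00h
            	int 21h
            code ends
            end start
            ```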
        • Symbolic constant

          • A symbolic constant is a constant written as a symbol. It is defined with EQU or =

          • symbol equ expression
            symbol = expression ;symbol is the symbol name, expression is an expression
            ;The operand of = can only be a constant or constant expression of numeric or character type. The same symbol may be redefined with =
            ;The operand of equ can be a constant or constant expression of numeric or character type, and also a string or even an assembly statement, but the same symbol cannot be redefined
            char = 'A'
            exitfun equ  <mov ah,4ch>
            dosint equ  <int 21h>
            code segment
            assume cs:code
            main:
            	mov ah,2
            	mov dl,char ;equivalent to mov dl,'A'
            	dosint ;int 21h
            	char = 'B' ;redefine
            	mov ah,2
            	mov dl,char ;equivalent to mov dl,'B'
            	dosint
            	exitfun ;mov ah,4ch
            	dosint
            code ends
            end main
            
      • Variables and labels

        • Variable names and label names

          • A name cannot start with a digit, and $ and ? cannot be used alone as a name. A name can contain at most 31 characters. By default, names are not case-sensitive. A name cannot be defined more than once, cannot be a keyword, and cannot contain spaces
        • Definition of variables

          • variable_name db|dw|dd initial_value
            ;db (define byte)
            ;dw (define word)
            ;dd (define double word)
            
            ;examples:
            x db 3Fh
            y db 1,2,3 ;like defining y[3]
            z db 'ABC',0Dh,0Ah,'$' ;z[6]
            abc dw 1234h,5678h ;abc[2]
            
            ;the dup operator repeats the same initial value
            abc db 100 dup(0)
            ;abc[100], each element initialized to 0
            ;the parentheses after dup can hold several values
            x db 3 dup(1,2)
            ;defines a byte array x with 3*2 elements: 1,2,1,2,1,2
            ;dup can also be nested
            y db 2 dup('A',3 dup('B'),'C')
            ;equivalent to 'ABBBCABBBC'
            
        • Definition of label

          • A label is used as the target of a jump (the jmp family) or a call

          • label_name: ;the simplest definition, equivalent to the near form below
            label_name label near|far|byte|word|dword
            ;near holds only the offset address, far holds both the segment address and the offset address
            ;if the type is byte, word or dword, what is actually defined is a variable
            abc label byte
            db 7Fh
            ;the first line defines abc as a byte variable but allocates no memory, so the address of abc equals the address of the variable that follows
            ;equivalent to abc db 7Fh
            ;this is used to give the same data both a byte name and a word name
            www label word
            abc db 12h,34h
            ;www is 3412h
            bbb label byte
            xyz dw 5678h
            ;bbb is 78h
            
        • Variable reference

          • In the code segment, var or [var] can be used as the operand of an 8086 instruction to refer to the value of the variable
        • Casting a variable to another type

          • word ptr [var]
            byte ptr
            dword ptr
            near ptr
            far ptr
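
          • A short sketch of variable references and ptr casts, assuming MASM-style syntax under DOS. The variables abc and xyz reuse the values from the label examples, and the names data, code and start are arbitrary

            ```assembly
            data segment
            abc db 12h,34h
            xyz dw 5678h
            data ends

            code segment
            assume cs:code, ds:data
            start:
            	mov ax,data
            	mov ds,ax
            	mov al,abc              ;al = 12h, same as mov al,[abc]
            	mov bx,xyz              ;bx = 5678h
            	mov cx,word ptr abc     ;read abc as a word: cx = 3412h
            	mov dl,byte ptr xyz     ;low byte of xyz: dl = 78h
            	mov bx,offset abc
            	mov byte ptr [bx],0     ;ptr is needed here: [bx] alone has no size
            	mov ax,4c00h
            	int 21h
            code ends
            end start
            ```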
            
        • Position counter

          • When assembling the source program, the assembler uses a variable called the position counter to record the current offset address within the current segment. When a segment definition begins, the position counter is set to 0. As each statement is assembled, the position counter is increased by the number of bytes that statement occupies. $ gives the current value of the position counter, which can be used, for example, to get the length of a string

          • data segment
            poem db "abcsdefsdq"
            len db $-offset poem ;$ is the offset just after poem, so this expression is the length of poem
            data ends
            
            code segment
            assume cs :code
            main:
            	mov ah,4ch
            	int 21h
            code ends
            end main
            

      - As expected, len holds the length of the string, 10
    
      - Setting or modifying the value of the position counter
    
        - org
    
        - ```assembly
          data segment
          org 1000h ;sets $ to 1000h, so the offset address of abc is 1000h rather than 0000h
          abc db 12h,34h
          org $+100h
          xyz dw 5678h
          data ends
          
          code segment
          assume cs :code
          main:
          	mov ah,4ch
          	int 21h
          code ends
          end main
          ```
    

You can see that the data that would otherwise have started at ss:0 is not there, and the space is filled with 0

12 and 34 appear at 1000h, which shows that $ works somewhat like a pointer: it points to the offset address where the next item will be placed. If we modify it, the data is no longer laid out contiguously. We can also work out that the offset address of xyz should be 1000h + 2 + 100h = 1102h

So do not change the value of $ unless you need to. You can also see that the org directive itself occupies no memory. It appears to take effect only at assembly time

  • Reference to label
    • If lab is a label name, lab or offset lab can represent the offset address of the label
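
    • Both forms can be seen in the following sketch, which assumes MASM-style syntax under DOS. The labels start, next and done are hypothetical

      ```assembly
      code segment
      assume cs:code
      start:
      	jmp next                ;the label name itself is the jump target
      next:
      	mov bx,offset done      ;bx = offset address of the label done
      	jmp bx                  ;near jump to the offset held in bx
      	mov ax,4c01h            ;skipped by the jump above
      	int 21h
      done:
      	mov ax,4c00h            ;return code 0
      	int 21h
      code ends
      end start
      ```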

Added by nalkari on Wed, 02 Mar 2022 02:10:35 +0200