- 1. General
- 2. Composition of minimum works
- 3. Link script
- 4. Executable program source code analysis
- 5. Compilation and operation
  - 5.1 compilation
  - 5.2 operation
  - 5.3 debugging
- 6. Summary
1. General
Every chip executes a piece of assembly code at startup, and this code reflects some characteristics of the architecture's design. In low-level embedded development it is worth understanding what this code does, so that we know exactly what happens during startup and can trace back through it when later code runs into problems.
This article analyzes the initial startup code of riscv64, starting from the smallest bare-metal program, to understand the riscv64 startup process thoroughly.
The environment used here is riscv64 qemu, and the compiler can be downloaded from the following address:
https://www.sifive.com/software
2. Composition of minimum works
A minimal project consists of two things: a link script and source code.
The source code is compiled by the cross-compilation toolchain into an executable binary that the CPU can run.
The link script describes the layout of the program, such as the code segments and the function entry. With these two files, the program can be compiled, loaded, and run on the board.
3. Link script
Let's take a look at the hello.ld file.
    OUTPUT_ARCH("riscv")
    OUTPUT_FORMAT("elf64-littleriscv")
    ENTRY(_start)

    SECTIONS
    {
        /* text: test code section */
        . = 0x80000000;
        .text : { *(.text) }
        /* data: Initialized data segment */
        .gnu_build_id : { *(.note.gnu.build-id) }
        .data : { *(.data) }
        .rodata : { *(.rodata) }
        .sdata : { *(.sdata) }
        .debug : { *(.debug) }
        . += 0x8000;
        stack_top = .;
        /* End of uninitialized data segment */
        _end = .;
    }
A link script specifies how the sections of the input files are placed in memory at specific addresses.
For the above script:
OUTPUT_ARCH("riscv"): indicates that the architecture of the output file is riscv.
OUTPUT_FORMAT("elf64-littleriscv"): indicates a little-endian elf64 file. arm, riscv and x86 are commonly little-endian, and little-endian is the more mainstream byte order.
ENTRY(_start): indicates that the function entry is _start.
Then the layout of the code segment begins, with a starting address of 0x80000000. After it come the code segment, data segment, read-only data segment, global data segment, debug segment and so on.
Note here:
    . += 0x8000;
    stack_top = .;
Here, 0x8000 bytes are reserved as the stack space of the program, and stack_top marks the high end of that region. By RISC-V convention the stack grows downward, so sp starts at stack_top and moves down into the reserved space.
The layout of the generated program can be viewed through disassembly:
    # riscv64-unknown-elf-objdump -d hello

    hello:     file format elf64-littleriscv

    Disassembly of section .text:

    0000000080000000 <_start>:
        80000000:   f14022f3    csrr    t0,mhartid
        80000004:   00029c63    bnez    t0,8000001c
        80000008:   00008117    auipc   sp,0x8
        8000000c:   04410113    addi    sp,sp,68 # 8000804c <_end>
        80000010:   00000517    auipc   a0,0x0
        80000014:   03450513    addi    a0,a0,52 # 80000044
        80000018:   008000ef    jal     ra,80000020

    000000008000001c <halt>:
        8000001c:   0000006f    j       8000001c

    0000000080000020 <puts>:
        80000020:   100102b7    lui     t0,0x10010
        80000024:   00054303    lbu     t1,0(a0)
        80000028:   00030c63    beqz    t1,80000040
        8000002c:   0002a383    lw      t2,0(t0) # 10010000
        80000030:   fe03cee3    bltz    t2,8000002c
        80000034:   0062a023    sw      t1,0(t0)
        80000038:   00150513    addi    a0,a0,1
        8000003c:   fe9ff06f    j       80000024
        80000040:   00008067    ret
For qemu, the starting address of the sifive_u machine is 0x80000000, so the entry of the code segment is placed there.
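The two `la` pseudo-instructions in `_start` appear in the disassembly as `auipc`/`addi` pairs. As a worked check of the addresses that objdump prints in its comments, the sketch below recomputes them in Python. The helper names `sign_extend` and `auipc_addi_target` are hypothetical and exist only for this illustration; the input values are copied from the listing.

```python
MASK64 = (1 << 64) - 1

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def auipc_addi_target(pc, upper20, lower12):
    """Address formed by `auipc rd, upper20` at `pc` followed by `addi rd, rd, lower12`."""
    upper = sign_extend(upper20 << 12, 32)  # auipc adds a sign-extended 32-bit offset to pc
    lower = sign_extend(lower12, 12)        # addi takes a signed 12-bit immediate
    return (pc + upper + lower) & MASK64

# la sp, stack_top -> auipc sp,0x8 at 0x80000008, then addi sp,sp,68
print(hex(auipc_addi_target(0x80000008, 0x8, 68)))  # 0x8000804c
# la a0, msg -> auipc a0,0x0 at 0x80000010, then addi a0,a0,52
print(hex(auipc_addi_target(0x80000010, 0x0, 52)))  # 0x80000044
```

The first result is the stack_top/_end address, and the second is the address of msg, which sits directly after the last instruction of puts.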
4. Executable program source code analysis
The layout given by the link script has been described above: the program has been assigned its execution addresses, and the address of each function and of the function entry has been planned. How, then, is the entry function itself written?
Look at the code of hello.s:
    .align 2
    .equ UART_BASE,         0x10010000
    .equ UART_REG_TXFIFO,   0

    .section .text
    .globl _start

    _start:
            csrr  t0, mhartid             # read hardware thread id (`hart` stands for `hardware thread`)
            bnez  t0, halt                # run only on the first hardware thread (hartid == 0), halt all the other threads

            la    sp, stack_top           # setup stack pointer

            la    a0, msg                 # load address of `msg` to a0 argument register
            jal   puts                    # jump to `puts` subroutine, return address is stored in ra register

    halt:   j     halt                    # enter the infinite loop

    puts:                                 # `puts` subroutine writes null-terminated string to UART (serial communication port)
                                          # input: a0 register specifies the starting address of a null-terminated string
                                          # clobbers: t0, t1, t2 temporary registers

            li    t0, UART_BASE           # t0 = UART_BASE
    1:      lbu   t1, (a0)                # t1 = load unsigned byte from memory address specified by a0 register
            beqz  t1, 3f                  # break the loop, if loaded byte was null

                                          # wait until UART is ready
    2:      lw    t2, UART_REG_TXFIFO(t0) # t2 = uart[UART_REG_TXFIFO]
            bltz  t2, 2b                  # t2 becomes positive once UART is ready for transmission
            sw    t1, UART_REG_TXFIFO(t0) # send byte, uart[UART_REG_TXFIFO] = t1

            addi  a0, a0, 1               # increment a0 address by 1 byte
            j     1b

    3:      ret

    .section .rodata
    msg:
            .string "Hello. "
According to the rules of assembly language,
    .align 2
indicates that the entry program is aligned to 2^2, that is, 4 bytes.
    .equ UART_BASE,         0x10010000
    .equ UART_REG_TXFIFO,   0
These lines define the base address of the UART registers and the offset of the TXFIFO register.
The analysis then mainly starts from _start:
    csrr  t0, mhartid             # read hardware thread id (`hart` stands for `hardware thread`)
    bnez  t0, halt                # run only on the first hardware thread (hartid == 0), halt all the other threads
According to the design of riscv, a component that contains an independent instruction fetch unit is called a core.
A RISC-V compatible core can support multiple RISC-V compatible hardware threads (harts) through multithreading (or hyper-threading) technology. A hart is a hardware thread.
The SiFive FU540 SoC modeled by qemu's sifive_u machine, for example, contains one E51 core and four U54 cores.
This assembly parks all the other harts in an infinite loop and runs only the hart with hartid == 0.
Then
    la    sp, stack_top           # setup stack pointer
Here the stack pointer sp is assigned, so that sp points to the top of the stack.
    la    a0, msg                 # load address of `msg` to a0 argument register
    jal   puts                    # jump to `puts` subroutine, return address is stored in ra register
In the riscv architecture, the a0 register carries the first argument of a call. It is assigned here, and then the code jumps to the puts function.
The argument passed in a0 is the address of
    .section .rodata
    msg:
            .string "Hello. "
which points to read-only string data.
Implementation of puts
This part matters because it is, in effect, a minimal serial driver written in assembly.
    puts:                                 # `puts` subroutine writes null-terminated string to UART (serial communication port)
                                          # input: a0 register specifies the starting address of a null-terminated string
                                          # clobbers: t0, t1, t2 temporary registers

            li    t0, UART_BASE           # t0 = UART_BASE
    1:      lbu   t1, (a0)                # t1 = load unsigned byte from memory address specified by a0 register
            beqz  t1, 3f                  # break the loop, if loaded byte was null

                                          # wait until UART is ready
    2:      lw    t2, UART_REG_TXFIFO(t0) # t2 = uart[UART_REG_TXFIFO]
            bltz  t2, 2b                  # t2 becomes positive once UART is ready for transmission
            sw    t1, UART_REG_TXFIFO(t0) # send byte, uart[UART_REG_TXFIFO] = t1

            addi  a0, a0, 1               # increment a0 address by 1 byte
            j     1b

    3:      ret
The string address arrives in the a0 register set up earlier. Starting at label 1:, one byte of the string is read into t1. beqz t1, 3f means that when t1 == 0 the code jumps forward to label 3:, which leaves the loop and returns.
Label 2: is the process of sending data to the serial FIFO: the code polls the TXFIFO register until the value read is no longer negative, then writes the byte to it.
At this point, a string can be output normally.
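To make the control flow of `puts` easier to follow, here is a rough Python model of the same loop. It is only a simulation: `FakeUart` is a hypothetical stand-in for the memory-mapped TXFIFO register, and the assumption that a set bit 31 means "FIFO full" is inferred from the `bltz` test in the assembly.

```python
class FakeUart:
    """Hypothetical stand-in for the register at UART_BASE + UART_REG_TXFIFO."""

    def __init__(self, busy_reads=0):
        self.busy_reads = busy_reads  # reads that report "full" before each byte is accepted
        self._pending = busy_reads
        self.sent = bytearray()

    def read_txfifo(self):
        # A set bit 31 makes the 32-bit value negative, which is what `bltz t2, 2b` tests.
        if self._pending > 0:
            self._pending -= 1
            return 0x80000000
        return 0

    def write_txfifo(self, byte):
        self.sent.append(byte)
        self._pending = self.busy_reads


def puts(uart, memory, addr):
    """Mirror of the assembly: send bytes starting at addr until a null byte is read."""
    while True:
        t1 = memory[addr]                       # 1: lbu  t1, (a0)
        if t1 == 0:                             #    beqz t1, 3f
            return                              # 3: ret
        while uart.read_txfifo() & 0x80000000:  # 2: lw t2, ... ; bltz t2, 2b
            pass
        uart.write_txfifo(t1)                   #    sw   t1, UART_REG_TXFIFO(t0)
        addr += 1                               #    addi a0, a0, 1 ; j 1b


uart = FakeUart(busy_reads=2)
puts(uart, b"Hello.\x00", 0)
print(uart.sent.decode())
```

The inner `while` corresponds to the busy-wait at label 2:, and the outer loop to the jump back to label 1:.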
5. Compilation and operation
5.1 compilation
With the program analysis complete, it can be compiled.
    riscv64-unknown-elf-gcc -march=rv64g -mabi=lp64 -static -mcmodel=medany -fvisibility=hidden -nostdlib -nostartfiles -Thello.ld -Isifive_u hello.s -o hello
The above command generates the hello program.
    # readelf -h hello
    ELF Header:
      Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF64
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              EXEC (Executable file)
      Machine:                           RISC-V
      Version:                           0x1
      Entry point address:               0x80000000
      Start of program headers:          64 (bytes into file)
      Start of section headers:          4680 (bytes into file)
      Flags:                             0x0
      Size of this header:               64 (bytes)
      Size of program headers:           56 (bytes)
      Number of program headers:         1
      Size of section headers:           64 (bytes)
      Number of section headers:         7
      Section header string table index: 6
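The first bytes of the Magic line already encode the class and byte order that readelf reports. The following Python sketch decodes them from the byte values in the readelf output; the function name `describe_ident` is made up for this illustration, while the indices and codes come from the ELF format.

```python
# e_ident bytes copied from the Magic line of the readelf output
MAGIC = bytes.fromhex("7f454c46020101000000000000000000")

EI_CLASS, EI_DATA = 4, 5  # positions of the class and data-encoding bytes in e_ident
CLASS_NAMES = {1: "ELF32", 2: "ELF64"}
DATA_NAMES = {1: "little endian", 2: "big endian"}

def describe_ident(ident):
    """Return (class, byte order) for a 16-byte ELF identification array."""
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return CLASS_NAMES[ident[EI_CLASS]], DATA_NAMES[ident[EI_DATA]]

print(describe_ident(MAGIC))
```

The byte 02 at index 4 and the byte 01 at index 5 correspond to the ELF64 class and little-endian data that OUTPUT_FORMAT("elf64-littleriscv") asked for.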
The options passed to gcc can now be analyzed.
-march: specifies the target architecture, such as rv32 or rv64; here it is rv64g.
-static: indicates static linking.
-mabi=lp64: selects the data model and the floating-point parameter passing rules.
Data model:
ABI | int word length | long word length | Pointer word length |
---|---|---|---|
ilp32/ilp32f/ilp32d | 32 bits | 32 bits | 32 bits |
lp64/lp64f/lp64d | 32 bits | 64 bits | 64 bits |
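As a rough illustration of what the table means in practice, the Python sketch below checks which values fit into a pointer under each data model. The dictionary simply restates the table; `fits_unsigned` is a hypothetical helper, and 0x8000804C is the stack_top address seen in the disassembly of hello.

```python
# Widths in bits, copied from the data model table
DATA_MODELS = {
    "ilp32": {"int": 32, "long": 32, "pointer": 32},
    "lp64": {"int": 32, "long": 64, "pointer": 64},
}

def fits_unsigned(value, bits):
    """True if a non-negative value can be stored in an unsigned field of `bits` bits."""
    return 0 <= value < (1 << bits)

stack_top = 0x8000804C  # address taken from the disassembly of hello
for abi, widths in DATA_MODELS.items():
    print(abi, "pointer holds stack_top:", fits_unsigned(stack_top, widths["pointer"]))
    print(abi, "pointer holds 2**40:", fits_unsigned(1 << 40, widths["pointer"]))
```

Only the lp64 family has pointers wide enough for addresses above 4GiB, which is why a 64-bit build such as this one uses -mabi=lp64.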
Floating-point parameter passing rules:
ABI | Floating point extensions required? | float parameter | double parameter |
---|---|---|---|
ilp32/lp64 | Not required | Passed through integer registers (a0-a1) | Passed through integer registers (a0-a3) |
ilp32f/lp64f | F extension required | Passed through floating-point registers (fa0-fa1) | Passed through integer registers (a0-a3) |
ilp32d/lp64d | F extension and D extension are required | Passed through floating-point registers (fa0-fa1) | Passed through floating-point registers (fa0-fa1) |
-mcmodel=medany: selects the code model. There are two choices, -mcmodel=medlow and -mcmodel=medany.
-mcmodel=medlow
The LUI instruction is used to get the upper 20 bits of the symbol address. Combined with other instructions that carry the low 12-bit immediate, LUI can reach an address space of -2GiB ~ +2GiB.
For RV64, this means 0x0000000000000000 ~ 0x000000007FFFFFFF and 0xFFFFFFFF80000000 ~ 0xFFFFFFFFFFFFFFFF. The former area is the +2GiB address space and the latter area is the -2GiB address space. Other address spaces are inaccessible.
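The medlow window can be checked with a few lines of Python. This is a simplified sketch: it treats the window as exactly the sign-extended 32-bit range, as the description here does, and ignores the small edge effects of the 12-bit immediate. The function name `medlow_reachable` is hypothetical.

```python
def medlow_reachable(addr):
    """True if a 64-bit address lies in the -2GiB ~ +2GiB window around address zero."""
    signed = addr - (1 << 64) if addr >> 63 else addr  # reinterpret as signed 64-bit
    return -(1 << 31) <= signed < (1 << 31)

for addr in (0x000000007FFFFFFF, 0x0000000080000000, 0xFFFFFFFF80000000):
    print(hex(addr), medlow_reachable(addr))
```

Under this model the load address 0x80000000 used by hello.ld falls just outside the medlow window on RV64, which is presumably why the program is built with -mcmodel=medany.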
-mcmodel=medany
The AUIPC instruction is used to get the upper 20 bits of the symbol address. Combined with other instructions that carry the low 12-bit immediate, AUIPC can reach the 2GiB before and after the current PC (PC - 2GiB ~ PC + 2GiB).
For RV64, the accessible address space therefore depends on the current PC value. Assuming that the current PC is 0x1000000000000000, the accessible address range is 0x0FFFFFFF80000000 ~ 0x100000007FFFFFFF. Assuming that the current PC is 0xA000000000000000, the accessible address range is 0x9FFFFFFF80000000 ~ 0xA00000007FFFFFFF.
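The PC-relative window is simple arithmetic, and the sketch below recomputes it in Python for two example PC values and for an address inside hello. It again treats the window as exactly PC - 2GiB to PC + 2GiB - 1; `medany_window` is a hypothetical helper name.

```python
MASK64 = (1 << 64) - 1
TWO_GIB = 1 << 31

def medany_window(pc):
    """Lowest and highest address reachable from pc under the simplified medany model."""
    low = (pc - TWO_GIB) & MASK64
    high = (pc + TWO_GIB - 1) & MASK64
    return low, high

for pc in (0x1000000000000000, 0xA000000000000000, 0x80000008):
    low, high = medany_window(pc)
    print(f"pc={pc:#018x}: {low:#018x} ~ {high:#018x}")
```

For code running at 0x80000008, the window starts at 0x8, so stack_top and msg, which lie a few KiB away from the PC, are comfortably in range.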
-fvisibility=hidden: symbols are hidden by default, so only the function interfaces explicitly marked as visible are exposed externally (this matters mainly for dynamic libraries).
-nostdlib: does not link the system standard startup files and standard library files; only the specified files are passed to the linker.
-nostartfiles: does not link the standard startup files, so the program provides its own entry without a main function.
-Thello.ld: uses hello.ld as the link script.
5.2 operation
Enter the following command to see the Hello. string output.
    # qemu-system-riscv64 -nographic -machine sifive_u -bios none -kernel hello
    Hello.
5.3 debugging
Debugging only requires adding -s -S to the run command (-s opens a gdb server on TCP port 1234, and -S halts the CPU at startup), that is
    qemu-system-riscv64 -nographic -machine sifive_u -bios none -kernel hello -s -S
Then open another terminal and enter
    riscv64-unknown-elf-gdb hello
Then enter target remote localhost:1234.
Set a breakpoint with b _start, and single-step with si to step through the program one instruction at a time.
6. Summary
After this walkthrough, the operation of the smallest bare-metal program on riscv64 should be clear; the key points are its startup address and its link script. The compilation options of gcc also deserve attention, since they matter just as much for getting a riscv program to start.
Editor in charge: xj
Original title: riscv64 bare metal programming practice and analysis