RISC-V bare metal programming

  • 1. Overview
  • 2. Composition of a minimal project
  • 3. Linker script
  • 4. Executable program source code analysis
  • 5. Compiling and running
    • 5.1 Compiling
    • 5.2 Running
    • 5.3 Debugging
  • 6. Summary

1. Overview

Every chip runs a piece of assembly code at startup, and that code reflects some characteristics of the architecture's design. In low-level embedded development it pays to understand what this code means: you then know exactly what has been done during startup, and you can retrace the steps when later programs run into problems.

This article analyzes the initial startup code of riscv64, starting from the smallest bare metal program, in order to thoroughly understand the riscv64 startup process.

The environment used here is QEMU for riscv64, and the compiler can be downloaded from the following address:

https://www.sifive.com/software

2. Composition of a minimal project

A minimal project consists of two things: a linker script and the source code.

The source code is compiled by the cross-compilation toolchain into an executable binary that the CPU can run.

The linker script describes the layout of the program, such as where the code section goes and what the entry point is. With these two files, the program can be compiled, loaded and run on the board.

3. Linker script

Let's take a look at the hello.ld file.

OUTPUT_ARCH("riscv")
OUTPUT_FORMAT("elf64-littleriscv")
ENTRY(_start)
SECTIONS
{
    /* text: test code section */
    . = 0x80000000;
    .text : { *(.text) }
    /* data: Initialized data segment */
    .gnu_build_id : { *(.note.gnu.build-id) }
    .data : { *(.data) }
    .rodata : { *(.rodata) }
    .sdata : { *(.sdata) }
    .debug : { *(.debug) }
    . += 0x8000;
    stack_top = .;

    /* End of uninitialized data segment */
    _end = .;
}

A linker script specifies how the sections of the input files are placed in memory at specific addresses.

For the above script:

OUTPUT_ARCH("riscv"): indicates that the architecture of the output file is RISC-V.

OUTPUT_FORMAT("elf64-littleriscv"): indicates a 64-bit little-endian ELF file. ARM (as commonly configured), RISC-V and x86 are all little-endian, and little-endian is the more mainstream choice.

ENTRY(_start): indicates that the entry point of the program is _start.

Then the layout begins, with a starting address of 0x80000000. The code section is placed first, followed by the data section, the read-only data section, the small data section, the debug section and so on.

Note here:

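. += 0x8000;
stack_top = .;

Here the location counter is advanced by 0x8000 bytes and stack_top is set to the resulting address, so the 0x8000 bytes below stack_top are reserved as the program's stack space. Because the stack grows downward on RISC-V (sp starts at stack_top and decreases), the space has to be reserved below the stack top.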
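The location-counter arithmetic can be checked by hand. The following is a minimal sketch, assuming that .text is 0x44 bytes and .rodata is 8 bytes (the sizes implied by the disassembly of this particular build) and that every other output section is empty; the variable names are illustrative only.

```python
# Sketch of the location-counter arithmetic in hello.ld.
# Assumption: .text is 0x44 bytes, .rodata holds only "Hello.\n" plus its
# terminating NUL, and all other output sections are empty.
BASE = 0x80000000                   # ". = 0x80000000;"
TEXT_SIZE = 0x44                    # assumed size of .text
RODATA_SIZE = len(b"Hello.\n\0")    # assumed size of .rodata (8 bytes)
STACK_SIZE = 0x8000                 # ". += 0x8000;"

stack_bottom = BASE + TEXT_SIZE + RODATA_SIZE   # location counter before the gap
stack_top = stack_bottom + STACK_SIZE           # "stack_top = .;"

print(hex(stack_bottom))  # 0x8000004c
print(hex(stack_top))     # 0x8000804c
```

Under these assumptions the result agrees with the address 8000804c that objdump annotates as _end, which the script defines at the same location as stack_top.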

View the layout of the generated program through disassembly:

# riscv64-unknown-elf-objdump -d hello

hello:     file format elf64-littleriscv


Disassembly of section .text:

0000000080000000 <_start>:
    80000000:   f14022f3    csrr    t0,mhartid
    80000004:   00029c63    bnez    t0,8000001c
    80000008:   00008117    auipc   sp,0x8
    8000000c:   04410113    addi    sp,sp,68 # 8000804c <_end>
    80000010:   00000517    auipc   a0,0x0
    80000014:   03450513    addi    a0,a0,52 # 80000044
    80000018:   008000ef    jal     ra,80000020

000000008000001c <halt>:
    8000001c:   0000006f    j       8000001c

0000000080000020 <puts>:
    80000020:   100102b7    lui     t0,0x10010
    80000024:   00054303    lbu     t1,0(a0)
    80000028:   00030c63    beqz    t1,80000040
    8000002c:   0002a383    lw      t2,0(t0) # 10010000
    80000030:   fe03cee3    bltz    t2,8000002c
    80000034:   0062a023    sw      t1,0(t0)
    80000038:   00150513    addi    a0,a0,1
    8000003c:   fe9ff06f    j       80000024
    80000040:   00008067    ret
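The two auipc/addi pairs in the listing are what the assembler emits for the la pseudo-instruction. As a worked check, the sketch below recomputes the addresses they form; the helper names sign_extend and auipc_addi are made up for this illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def auipc_addi(pc, imm20, imm12):
    """Address formed by `auipc rd, imm20` at `pc` followed by `addi rd, rd, imm12`."""
    upper = sign_extend(imm20 << 12, 32)   # auipc adds a sign-extended 32-bit offset to pc
    return (pc + upper + sign_extend(imm12, 12)) & 0xFFFFFFFFFFFFFFFF

sp = auipc_addi(0x80000008, 0x8, 68)   # la sp, stack_top
a0 = auipc_addi(0x80000010, 0x0, 52)   # la a0, msg
print(hex(sp), hex(a0))  # 0x8000804c 0x80000044
```

Both results match the addresses objdump prints in the comments next to the addi instructions.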

For QEMU's sifive_u machine, the starting address is 0x80000000, so the entry of the code section is placed there.

4. Executable program source code analysis

The linker script described above assigns the program its execution addresses: the location of each function and of the entry point is already planned. So how is the entry code itself written?

Look at the code of hello.s:

.align 2
.equ UART_BASE,         0x10010000
.equ UART_REG_TXFIFO,   0

.section .text
.globl _start

_start:
        csrr  t0, mhartid             # read hardware thread id (`hart` stands for `hardware thread`)
        bnez  t0, halt                   # run only on the first hardware thread (hartid == 0), halt all the other threads

        la    sp, stack_top           # setup stack pointer

        la    a0, msg                 # load address of `msg` to a0 argument register
        jal   puts                    # jump to `puts` subroutine, return address is stored in ra register

halt:   j     halt                    # enter the infinite loop

puts:                                 # `puts` subroutine writes null-terminated string to UART (serial communication port)
                                      # input: a0 register specifies the starting address of a null-terminated string
                                      # clobbers: t0, t1, t2 temporary registers

        li    t0, UART_BASE           # t0 = UART_BASE
1:      lbu   t1, (a0)                # t1 = load unsigned byte from memory address specified by a0 register
        beqz  t1, 3f                  # break the loop, if loaded byte was null

                                      # wait until UART is ready
2:      lw    t2, UART_REG_TXFIFO(t0) # t2 = uart[UART_REG_TXFIFO]
        bltz  t2, 2b                  # t2 becomes non-negative once UART is ready for transmission
        sw    t1, UART_REG_TXFIFO(t0) # send byte, uart[UART_REG_TXFIFO] = t1

        addi  a0, a0, 1               # increment a0 address by 1 byte
        j     1b

3:      ret

.section .rodata
msg:
     .string "Hello.\n"

According to the rules of the assembler,

.align 2

indicates that the entry code is aligned to 2^2, that is, 4 bytes.
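To make the power-of-two meaning concrete, here is a small sketch of the rounding that such an alignment request implies. It assumes, as the text does, that the argument is an exponent rather than a byte count; align_up is a hypothetical helper, not part of any toolchain.

```python
def align_up(addr, power):
    """Round addr up to the next multiple of 2**power."""
    boundary = 1 << power
    return (addr + boundary - 1) & ~(boundary - 1)

print(1 << 2)                          # 4 (bytes)
print(hex(align_up(0x80000001, 2)))    # 0x80000004
print(hex(align_up(0x80000004, 2)))    # 0x80000004 (already aligned)
```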

.equ UART_BASE,         0x10010000
.equ UART_REG_TXFIFO,   0

These define the base address of the UART registers and the offset of the TX FIFO register.

The analysis proper starts from _start:.

csrr  t0, mhartid             # read hardware thread id (`hart` stands for `hardware thread`)
bnez  t0, halt                # run only on the first hardware thread (hartid == 0), halt all the other threads

In RISC-V terminology, a component that contains an independent instruction fetch unit is called a core.

A RISC-V compatible core can support multiple RISC-V compatible hardware threads (harts) through multithreading (or hyper-threading) technology. "Hart" is short for hardware thread.

The FU540 SoC emulated by QEMU's sifive_u machine contains one E51 core and four U54 cores.

These two instructions park all the other harts and let only the hart with hartid == 0 continue.
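The csrr line is a pseudo-instruction for csrrs with x0 as the source register. As an illustration, the sketch below splits the machine word f14022f3 from the disassembly into its fields; decode_csr_instruction is a name invented for this example.

```python
def decode_csr_instruction(word):
    """Split a 32-bit RISC-V CSR instruction word into its bit fields."""
    return {
        "opcode": word & 0x7F,          # bits 6:0,   0x73 = SYSTEM
        "rd": (word >> 7) & 0x1F,       # bits 11:7,  5 = x5 = t0
        "funct3": (word >> 12) & 0x7,   # bits 14:12, 2 = csrrs
        "rs1": (word >> 15) & 0x1F,     # bits 19:15, 0 = x0 (read only, no bits set)
        "csr": (word >> 20) & 0xFFF,    # bits 31:20, 0xf14 = mhartid
    }

fields = decode_csr_instruction(0xF14022F3)
print({name: hex(value) for name, value in fields.items()})
# {'opcode': '0x73', 'rd': '0x5', 'funct3': '0x2', 'rs1': '0x0', 'csr': '0xf14'}
```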

Then

la    sp, stack_top           # setup stack pointer

Here the stack pointer sp is initialized so that it points to the top of the stack.

la    a0, msg                 # load address of `msg` to a0 argument register
jal   puts                    # jump to `puts` subroutine, return address is stored in ra register

In the RISC-V calling convention, the a0 register carries the first argument. The code loads the address of msg into a0 and then jumps to the puts function.

The argument passed in a0 is therefore the address of

.section .rodata
msg:
     .string "Hello.\n"

that is, a pointer to a read-only, null-terminated string.

Implementation of puts

The key part here is a minimal serial driver written in assembly.

puts:                                 # `puts` subroutine writes null-terminated string to UART (serial communication port)
                                      # input: a0 register specifies the starting address of a null-terminated string
                                      # clobbers: t0, t1, t2 temporary registers

        li    t0, UART_BASE           # t0 = UART_BASE
1:      lbu   t1, (a0)                # t1 = load unsigned byte from memory address specified by a0 register
        beqz  t1, 3f                  # break the loop, if loaded byte was null

                                      # wait until UART is ready
2:      lw    t2, UART_REG_TXFIFO(t0) # t2 = uart[UART_REG_TXFIFO]
        bltz  t2, 2b                  # t2 becomes non-negative once UART is ready for transmission
        sw    t1, UART_REG_TXFIFO(t0) # send byte, uart[UART_REG_TXFIFO] = t1

        addi  a0, a0, 1               # increment a0 address by 1 byte
        j     1b

3:      ret

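The address of the string arrives in the a0 register. At label 1:, one byte of the string is loaded into t1. beqz t1, 3f means that when t1 == 0 (the terminating null byte), execution jumps forward to label 3:, which leaves the loop and returns.

Label 2: is the process of sending data to the serial FIFO: it polls the TX FIFO register until the value read is no longer negative, then writes the byte to the same register.

With that, a string can be output normally.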
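The control flow of puts can be modeled on the host to see the two nested loops more clearly. This is only a sketch: MockUart is a hypothetical stand-in for the hardware, and treating bit 31 of the TX register as a "FIFO full" flag is an assumption inferred from the bltz test in the assembly.

```python
TXDATA_FULL = 1 << 31  # assumption: bit 31 of the TX register means "FIFO full"

class MockUart:
    """Hypothetical UART model: reports 'full' a few times, then accepts a byte."""
    def __init__(self, busy_reads=2):
        self.busy_reads = busy_reads
        self.remaining = busy_reads
        self.sent = bytearray()

    def read_txfifo(self):
        if self.remaining > 0:
            self.remaining -= 1
            return TXDATA_FULL
        return 0

    def write_txfifo(self, byte):
        self.sent.append(byte)
        self.remaining = self.busy_reads

def to_signed32(word):
    """View a 32-bit word as signed, as bltz sees the value loaded by lw."""
    return word - (1 << 32) if word & 0x80000000 else word

def puts(uart, memory, addr):
    while True:
        byte = memory[addr]                          # 1: lbu  t1, (a0)
        if byte == 0:                                #    beqz t1, 3f
            return                                   # 3: ret
        while to_signed32(uart.read_txfifo()) < 0:   # 2: lw / bltz t2, 2b
            pass
        uart.write_txfifo(byte)                      #    sw   t1, UART_REG_TXFIFO(t0)
        addr += 1                                    #    addi a0, a0, 1 ; j 1b

uart = MockUart()
puts(uart, b"Hello.\n\0", 0)
print(uart.sent.decode(), end="")  # Hello.
```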

5. Compiling and running

5.1 Compiling

With the program analysis complete, the program can be compiled:

riscv64-unknown-elf-gcc -march=rv64g -mabi=lp64 -static -mcmodel=medany -fvisibility=hidden -nostdlib -nostartfiles -Thello.ld -Isifive_u hello.s -o hello

This command generates the executable hello.

# readelf -h hello
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x80000000
  Start of program headers:          64 (bytes into file)
  Start of section headers:          4680 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         1
  Size of section headers:           64 (bytes)
  Number of section headers:         7
  Section header string table index: 6
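The Class and Data lines of this header come straight from the Magic bytes. As a small illustration, the sketch below decodes the first few identification bytes shown above; the dictionary names are chosen for this example only.

```python
# The 16 identification bytes printed by readelf on the "Magic:" line.
magic = bytes.fromhex("7f454c46020101000000000000000000")

is_elf = magic[:4] == b"\x7fELF"                             # bytes 0-3: file signature
ei_class = {1: "ELF32", 2: "ELF64"}[magic[4]]                # byte 4: word size
ei_data = {1: "little endian", 2: "big endian"}[magic[5]]    # byte 5: byte order
ei_version = magic[6]                                        # byte 6: ELF version

print(is_elf, ei_class, ei_data, ei_version)  # True ELF64 little endian 1
```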

The options passed to gcc are analyzed below.

-march: specifies the target architecture, such as rv32 or rv64.

-static: indicates static linking.

-mabi=lp64: selects the data model and the floating-point parameter passing rules.

Data model:

ABI                    int word length   long word length   pointer word length
ilp32/ilp32f/ilp32d    32 bits           32 bits            32 bits
lp64/lp64f/lp64d       32 bits           64 bits            64 bits

Floating-point passing rules:

ABI             Floating-point extensions required   float parameter                                   double parameter
ilp32/lp64      None                                 Passed through integer registers (a0-a1)          Passed through integer registers (a0-a3)
ilp32f/lp64f    F extension                          Passed through floating-point registers (fa0-fa1) Passed through integer registers (a0-a3)
ilp32d/lp64d    F and D extensions                   Passed through floating-point registers (fa0-fa1) Passed through floating-point registers (fa0-fa1)

-mcmodel=medany: selects the code model. There are two choices, -mcmodel=medlow and -mcmodel=medany.

-mcmodel=medlow

Use the LUI instruction to get the upper 20 bits of the symbol address. Combined with other instructions that carry a low 12-bit immediate, LUI can reach an address space of -2 GiB to +2 GiB.

For RV64, it can access 0x0000000000000000 ~ 0x000000007FFFFFFF and 0xFFFFFFFF80000000 ~ 0xFFFFFFFFFFFFFFFF. The former is the +2 GiB address space and the latter is the -2 GiB address space. Other addresses are inaccessible.

-mcmodel=medany

Use the AUIPC instruction to get the upper 20 bits of the symbol address. Combined with other instructions that carry a low 12-bit immediate, AUIPC can reach the 2 GiB before and after the current PC (PC - 2 GiB ~ PC + 2 GiB).

For RV64, the accessible address space therefore depends on the current PC value. Assuming that the current PC is 0x1000000000000000, the accessible address range is 0x0FFFFFFF80000000 ~ 0x100000007FFFFFFF. Assuming that the current PC is 0xA000000000000000, the accessible address range is 0x9FFFFFFF80000000 ~ 0xA00000007FFFFFFF.
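The two ranges can be expressed as simple predicates. The sketch below is an approximation that follows the ±2 GiB description in the text and ignores the small skew introduced by the sign-extended 12-bit immediate; medlow_reachable and medany_reachable are illustrative names, not toolchain functions.

```python
MASK64 = (1 << 64) - 1
TWO_GIB = 1 << 31

def medlow_reachable(addr):
    """True if addr is in the lowest or highest 2 GiB of the 64-bit address space."""
    return addr < TWO_GIB or addr >= (1 << 64) - TWO_GIB

def medany_reachable(pc, addr):
    """True if addr lies within [pc - 2 GiB, pc + 2 GiB)."""
    offset = (addr - pc) & MASK64
    if offset >= 1 << 63:          # reinterpret as a signed 64-bit distance
        offset -= 1 << 64
    return -TWO_GIB <= offset < TWO_GIB

print(medlow_reachable(0x80000000))                              # False
print(medany_reachable(0x80000010, 0x80000044))                  # True
print(medany_reachable(0x1000000000000000, 0x0FFFFFFF80000000))  # True
print(medany_reachable(0x1000000000000000, 0x1000000080000000))  # False
```

The first result suggests why this build uses medany: a program linked at 0x80000000 sits just outside the low 2 GiB that medlow can address on RV64, while PC-relative addressing reaches it without trouble.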

-fvisibility=hidden: symbols are hidden by default, so only the interfaces explicitly marked as visible are exported from a dynamic library.

-nostdlib: do not link the standard system startup files and standard libraries; only the specified files are passed to the linker.

-nostartfiles: do not link the standard startup files, so the program supplies its own entry point instead of relying on a main function.

-Thello.ld: use hello.ld as the linker script.

5.2 Running

Enter the following command to see the string Hello. printed:

# qemu-system-riscv64 -nographic -machine sifive_u -bios none -kernel hello
Hello.

5.3 Debugging

To debug, simply append -s -S to the run command (-s starts a gdb server on TCP port 1234, and -S keeps the CPU stopped at startup), that is

qemu-system-riscv64 -nographic -machine sifive_u -bios none -kernel hello -s -S

Then open another terminal and enter

riscv64-unknown-elf-gdb hello

Then enter target remote localhost:1234.

Set a breakpoint with b _start, and use si to step through the program one instruction at a time.

6. Summary

This walkthrough covers how the smallest riscv64 bare metal program runs, focusing on its start address and linker script. The gcc compilation options also deserve attention, because they matter just as much for bringing up a RISC-V program.

Editor in charge: xj

Original title: riscv64 bare metal programming practice and analysis

Keywords: Linux risc-v

Added by tkm on Tue, 19 Oct 2021 00:08:37 +0300