Design of Cache Controller

I. Experimental Contents

To realize Cache and its address transformation logic (also called Cache controller), the CPU reads data from Cache by directly connected address transformation. If it cannot read, it must consider reading data from main memory first, then writing data to Cache, and then sending data to CPU; secondly, the CPU writes data to memory.
Description: CLK is the system clock (used for counter counting control, etc.), CLR is the total clearing signal (clearing table memory, counter), WCT is the memory signal of writing Cache table, AB31.AB0 is the address bus of CPU accessing memory, 32-bit data is DB31.DB0 (data bus), RD (0, read) is the read signal of Cache, MWr (1, write) is the main one. Memory write signal, MRd (0, read) is the main memory read signal, M (1, valid, hit) is the Cache effective bit sign signal (area table memory mark), CA17. CA0 is the Cache address, MD31. MD0 is the main memory to send Cache data, D31. D0 is the Cache to send CPU data, LA3. LA0 is the intrablock address.

II. Experimental Principles

The address transformation used in this experiment is a direct association mapping method, which is simple and direct. The hardware implementation is simple, and the access speed is relatively fast, but the block conflict rate is relatively high. The main principle is that a block in main memory can only be mapped to a specific block in Cache.
Assuming that the block number of main memory is B and that of Cache is b, the mapping relationship between them can be expressed as follows:

          b = B mod Cb
 
Cb is the block capacity of Cache. If the block capacity of main memory is Mb and the area capacity is Me, the relationship between direct mapping methods is shown in Figure 1. Main memory is divided into partitions according to the size of Cache. Generally, the capacity of main memory is an integral multiple of the capacity of Cache. The number of blocks in each partition of main memory is equal to the total number of blocks in Cache. Direct mapping can only map those blocks with the same block number in each main memory area to the specific block with the same block number in Cache. For example, Block 0 of main memory can only be mapped to block 0 of Cache, block 1 of main memory can only be mapped to block 1 of Cache. Similarly, block Cb of main memory 1 (the relative block number in block 1 is 0) can only be mapped to block 0 of Cache, as shown in Figure 1. According to the address mapping rule given above, the whole Cache address is identical to the lower part of the main memory address.
  
| Figure 1: Direct Connection

The process of address transformation in direct mapping is shown in Figure 2. The main memory address is divided into three parts: area number E, block number B and block address W. The Cache address is divided into two parts: block number B and block address W. Block number B in main memory address is exactly the same as block number B in Cache address. Similarly, the intra-block address W in the main memory address is exactly the same as the intra-block address w in the Cache address. The part where the main memory address is longer than the Cache address is called the area code E.

| Figure 2: Diagram of Direct Associated Address Transform

In the process of program execution, when accessing Cache, in order to realize the conversion from main memory block number to Cache block number, a small capacity memory (called area table memory) is needed to store the main memory area number. The capacity of this memory is equal to the number of blocks of Cache, and the word length is the length of area number E in the main memory address, plus a valid bit (hit/failure).
In the process of conversion from main memory address to Cache address, the block number B in the main memory address is used to access the area table memory (using block number B as the address of the area table memory to access it), and then the read area number is compared with the area number E in the main memory address. The result of comparison is equal. If the effective bit is 1, the Cache hits, indicating that the block to be accessed has been loaded into Cache. If the comparison results are not equal, the effective bit is 1, which can be replaced, and if the effective bit is 0, the required block can be directly transferred. As for the relatively unequal situation, whether the effective bit is 1 or 0, Cache has not hit, or is called Cache failure, which means that the block to be accessed has not been loaded into Cache. At this time, the main memory address is used to access the main memory. First, the block where the address is located is read into Cache, and then the data of the address in Cache is read and sent to CPU.
The relationship between Cache and CPU and memory is shown in the following figure.

| Figure 3: Basic framework of Cache

As shown in Figure 3, 32-bit main memory address is AB31.AB0 (address bus), 32-bit data is DB31.DB0 (data bus), RD (0, read) is the read signal of Cache, MWr (1, write) is the write signal of main memory, MRd (0, read) is the read signal of main memory, D31.D0 is the data signal sent by Cache to CPU (through a three-state buffer at the exit and then output), MD31.M. D0 is the data signal sent to Cache by RAM.
As shown in Figure 4, the area number E is 14 bits, the block number B is 14 bits, and the intra-block address is 4 bits, so the Cache address is 18 bits, of which the Cache block number B is 14 bits and the intra-block address w is 4 bits, so the capacity of Cache is 256 KB (2 ^ 18) units and the block number B is 14 bits. Then the Cache is divided into 16 KB (2 ^ 14) blocks and the intra block address w is 4 bits, and each block is 16 units (one byte per unit).

| Figure 4 Address Segmentation Diagram

The way to implement Cache's memory is to first implement an 8-bit storage unit, and then use this 8-bit storage unit to construct a 256 Kb X 8-bit Cache (address 18 bits).
A 15-bit (14+1) storage unit is implemented, and then a 16kX 15-bit area table memory (14-bit address is the same as block number B) is constructed with this 15-bit storage unit, which is used to store area number (14-bit) and effective bit M(1-bit). In this part, we also need to implement an area code E comparator, that is, if the area number E of the main memory address is equal to the area number E of the corresponding unit taken out by the block number B as the address in the area table memory, the effective bit sign is M, and the effective bit M=1, then the Cache hits, otherwise the Cache fails, and the M=0 indicates the Cache fails.
When a Cache hits, it sends the data of the corresponding unit in the Cache storage to the CPU, which is a relatively simple process. When Cache fails, the data in the corresponding block in main memory is read out and written into Cache, so the Cache controller generates the read signal MRd (0, read) of the main memory. Since each block of Cache occupies 16 units and is accessed by 32 bits (4 bytes), it is necessary to access the main memory four times continuously, read the data of the block in the memory, that is, 16 bytes, and then write. Enter the corresponding block of Cache, and then modify the area table memory. As for access to main memory, counters are used. When writing data, if there is the address data in Cache, then modify the address content of memory (MWr is 1, write, write signal for main memory); if there is no address data in Cache, directly modify the content of the address unit of memory. When reading/writing memory, attention should be paid to interlocking.
  

3. Implementation of experimental code

There are two modules, one is ten RAM module and the other is cache module. Here, the RAM module is implemented first:
RAM.v

`timescale 1ns / 1ps

module RAM(
    input wire CMWr,//cache and Write Signal in Main Memory
    input wire MRd,//Main Memory Read Signal
    input wire[2:0] counterFromCache,//A counter for reading data
    input wire[31:0] Address,//address
    input wire[31:0] DataFromCPU,//Data from CPU
    output reg[1:0] signalToCache,//The signal passed to the cache triggers the write operation of the cache
    output reg[31:0] DataWithCache//Data passed to cache
    );

    reg[7:0] register[0:255];//Number of registers,2^32Each occupies8bit;Here's just a simulation. Let's make one.256Space
    reg[31:0] wordNum;//Font size

    integer i;

    initial begin
        signalToCache = 0;//Semaphore0
        for(i = 0; i < 256; i = i + 1) begin//Initialize main memory space
            register[i] = 8'b00001111;
        end
        register[32'h0000000] = 0;//Partial Assignment for Easy Testing
        register[32'h0000001] = 1;
        register[32'h0000002] = 2;
        register[32'h0000003] = 3;
        register[32'h0000004] = 4;
        register[32'h0000005] = 5;
        register[32'h0000006] = 6;
        register[32'h0000007] = 7;
        register[32'h0000008] = 8;
        register[32'h0000009] = 9;
        register[32'h000000A] = 10;
        register[32'h000000B] = 11;
        register[32'h000000C] = 12;
        register[32'h000000D] = 13;
        register[32'h000000E] = 14;
        register[32'h000000F] = 15;
    end

    always @(CMWr or MRd or counterFromCache) begin
        if(MRd == 0) begin//Reading signal is valid
            wordNum = Address[31:4] * 16 + counterFromCache * 4;//The count signal of cache acts as addressing base at the same time
            DataWithCache[31:24] = register[wordNum];
            DataWithCache[23:16] = register[wordNum + 1];
            DataWithCache[15:8] = register[wordNum + 2];
            DataWithCache[7:0] = register[wordNum + 3];
            signalToCache = signalToCache + 1;//Semaphore+1
        end
        else if(CMWr == 1) begin//Write signal is valid
            register[Address] = DataFromCPU[31:24];
            register[Address + 1] = DataFromCPU[23:16];
            register[Address + 2] = DataFromCPU[15:8];
            register[Address + 3] = DataFromCPU[7:0];
        end
    end

endmodule

Then implement the Cache module.
Mycache.v

`timescale 1ns / 1ps

module Mycache(
    input wire RD,//The read signal of cache; = 0, read
    input wire CMWr,//cache and Write Signal in Main Memory
    input wire[1:0] signalFromRAM,//A signal from RAM that reads data from RAM once and triggers a cache write operation
    input wire[31:0] DataFromCPU,
    input wire[31:0] DataWithRAM,
    input wire[31:0] Address,
    output reg MRd,//Read signal in main memory
    output reg[2:0] counterToRAM,
    output reg[31:0] DataToCPU
    );

    reg[2:0] counter;//Counter 2 bits, maximum 3
    reg[7:0] cacheData [0:127];//cache capacity size, 2 ^ 18 (18'h3ffff) x 8bit; 128 array spaces open for test use
    reg[14:0] blockTable [0:63];//Area table capacity size, 2 ^ 14 x 15 bit; open 64 spaces for testing
    reg[13:0] blockNum;//Block number 14 bit size
    reg[13:0] partNum;//Area code 14 bit size
    reg[31:0] cacheAddress;//The address of cache; should be 18 bits, later changed

    integer i;//Indexes

    initial begin
        for(i = 0; i < 64; i = i + 1)//Initialization of effective bits of area table memory
            blockTable[i][14] = 0;
        MRd = 1;//Main memory read signal set bit unreadable
        counter = 0;//The counter is set to 0. Because each block is 16 bytes and reads 4 bytes at a time, it needs to read 4 times.
    end

    always @(RD or CMWr or Address or DataFromCPU or DataWithRAM or signalFromRAM) begin
        partNum = Address[31:18];//The first 14 are area codes
        blockNum = Address[17:4];//The middle 14 bits are block numbers

        if(RD == 0 && counter == 0) begin//If it is a read signal, and the counter is 0 (not the state of reading data from main memory)
            if(partNum == blockTable[blockNum][13:0] && blockTable[blockNum][14] == 1) begin//cache hit
                cacheAddress = 16*blockNum + Address[3:0];//Location. Location is block number * size + intra-block address
                DataToCPU[31:24] = cacheData[cacheAddress];//Reading data to CPU data storage mode is large-end mode
                DataToCPU[23:16] = cacheData[cacheAddress + 1];
                DataToCPU[15:8] = cacheData[cacheAddress + 2];
                DataToCPU[7:0] = cacheData[cacheAddress + 3];
            end
            else begin
                MRd = 0;//Otherwise, the main memory read signal is valid; start fetching read data
                counter = 1;//And the value of the counter is set to 1; then it is accumulated in turn.
                counterToRAM = 0;//Set the counter passed to main memory to 0
            end
        end
        else if(RD == 0 && counter >= 1) begin//At this point, cache is reading the data in RAM
            cacheAddress = 16*blockNum + 4*(counter - 1);//Start saving data
            cacheData[cacheAddress] = DataWithRAM[31:24];
            cacheData[cacheAddress + 1] = DataWithRAM[23:16];
            cacheData[cacheAddress + 2] = DataWithRAM[15:8];
            cacheData[cacheAddress + 3] = DataWithRAM[7:0];
            if(counter == 4) begin//When the counter equals 4, 16 bytes are saved and stored.
                MRd = 1;//Invalid data reading in main memory
                cacheAddress = 16*blockNum + Address[3:0];//Start reading data from cache into CPU
                DataToCPU[31:24] = cacheData[cacheAddress];
                DataToCPU[23:16] = cacheData[cacheAddress + 1];
                DataToCPU[15:8] = cacheData[cacheAddress + 2];
                DataToCPU[7:0] = cacheData[cacheAddress + 3];
                blockTable[blockNum][13:0] = partNum;
                blockTable[blockNum][14] = 1;
                counter = 0;//Counter clearing
            end
            else begin
                #10;//Delay, wait for the step to complete
                counter = counter + 1;//Counter increase
                counterToRAM = counter - 1;//The counters passed to RAM are also increased accordingly.
            end
        end
        else if(CMWr == 1) begin//cache writes data from CPU
            if(partNum == blockTable[blockNum][13:0] && blockTable[blockNum][14] == 1) begin
                cacheAddress = 16*blockNum + Address[3:0];
                cacheData[cacheAddress] = DataFromCPU[31:24];
                cacheData[cacheAddress + 1] = DataFromCPU[23:16];
                cacheData[cacheAddress + 2] = DataFromCPU[15:8];
                cacheData[cacheAddress + 3] = DataFromCPU[7:0];
            end
        end
    end

endmodule

Through the top-level module, the data path between them is established.
Cache.v

`timescale 1ns / 1ps
`include "Mycache.v"
`include "RAM.v"

module Cache(
    input wire rd,
    input wire cmwr,
    input wire[31:0] address,
    input wire[31:0] from_cpu,
    output wire mrd,
    output wire[31:0] to_cpu,
    output wire[31:0] datawithram
    );

    wire[2:0] counter;//Counter
    wire[1:0] signal;//cache writes trigger signals from RAM data

    Mycache mycache(.RD(rd), .CMWr(cmwr), .MRd(mrd), .DataToCPU(to_cpu),
        .DataFromCPU(from_cpu), .Address(address), .DataWithRAM(datawithram),
        .counterToRAM(counter), .signalFromRAM(signal));

    RAM myram(.CMWr(cmwr), .MRd(mrd), .counterFromCache(counter), .Address(address),
        .DataFromCPU(from_cpu), .signalToCache(signal), .DataWithCache(datawithram));

endmodule

Then, a simple test module is built.
Test.v

`timescale 1ns / 1ps
`include "Cache.v"

module Test;

    reg rd;
    reg cmwr;
    reg[31:0] from_cpu;
    reg[31:0] address;

    wire[31:0] to_cpu;
    wire mrd;
    wire[31:0] datawithram;

    Cache uut(
        .rd(rd),
        .cmwr(cmwr),
        .from_cpu(from_cpu),
        .to_cpu(to_cpu),
        .address(address),
        .mrd(mrd),
        .datawithram(datawithram)
    );

    initial begin
        //Initialize input
        rd = 1;
        cmwr = 0;
        from_cpu = 0;
        address = 0;
        #100;//Delay #100ns for a period

        //The data is not in cache, and the cpu reads the cache
        address = 32'h00000002;
        rd = 0;
        cmwr = 0;
        #100;

        //The data is in the cache, and the block has been stored in the data
        address = 32'h00000003;
        rd = 0;
        cmwr = 0;
        #100;

        //CPU writes data to cache and RAM
        address = 32'h00000007;
        from_cpu = 32'haabbccdd;
        rd = 1;
        cmwr = 1;
        #100;

        //Writing of test data
        address = 32'h00000005;
        rd = 0;
        cmwr = 0;
        #100;

        //Change only the data in RAM
        address = 32'h00000012;
        from_cpu = 32'h11223344;
        rd = 1;
        cmwr = 1;
        #100;

        //Test the write condition. Data is first written to the cache
        address = 32'h00000012;
        rd = 0;
        cmwr = 0;
        #100;

    end

endmodule

IV. EXPERIMENTAL RESULTS

Added by Shaba1 on Mon, 17 Jun 2019 00:14:55 +0300