Target Reflected Echo Detection Algorithms and Their Implementation on FPGA Part 3: Top-level Implementation of Square, Integral Circuits and Algorithms

Target Reflected Echo Detection Algorithms and Their Implementation on FPGA Part 3:

Top-level Implementation of Square and Integral Circuits and Algorithms

Some time ago, I came into contact with a project of sonar target echo detection. The core function of sonar receiver is to recognize the echo of the excitation signal emitted by the transmitter in the reflected echo which contains a lot of noise. I will share the development process and code of this echo recognition algorithm based on FPGA in several articles. Welcome your comments. The following original content is welcomed to be reproduced by netizens, but please indicate the source: https://www.cnblogs.com/helesheng.

In the first part of this series of blogs, based on the simulation results, I believe that the method of calculating target distance by using the results of "cross correlation of reflected echo and excitation signal" has high performance and computational efficiency. In the second blog post in this series, I implemented data cross-correlation/convolution/FIR filter calculation using a semi-parallel "dual-memory convolution section" structure in the low-cost FPGA of Cyclone series. As the third blog article in this series, I will implement the square sum integral calculation of cross-correlation signals, and combine all the algorithms in the top-level file as a whole.

By looking for The location of the extreme point of the target reflector is used to determine the time point when the target reflects the echo.

In Formula (1)It is the part of cross-correlation algorithm, and its implementation on FPGA has been introduced in the previous paper. According to the symbols defined above, the discretized cross-correlation signal is recorded as R[k]. After further discretization, (1) can be rewritten to the following form that can be implemented by FPGA:

1. Realization of Square Circuit

Using MegaWizard in Quartus-II to configure the square computing circuit, the structure is shown in the following figure.

Figure 1 Square Circuit Configuration

II. Design of Integral Circuit

According to formula (2), in order to calculate the value of objective function P[n], it is necessary to sum (integrate) the historical values. Of course, we can use buffers to store historical k0 values and sum the k0 values in the buffers before each output. If k0 is equal to the length of the excitation signal N, then N-1 addition must be calculated before each output P[n]. When N is 64 (as mentioned earlier), this is almost an impossible task. I designed the circuit structure shown in the following figure to achieve the summation of 64 historical data.

Figure 2 Integrator Circuit Structure

One of the 64-depth shift registers serves to provide "historical data" before 64 samples. First, the current dataAnd the oldest historical dataFind the difference (complement), and then sum the difference continuously. In this way, according to the law of addition and exchange, the final summation result is equivalent to adding the current data, subtracting the oldest historical data. As long as the initial value of the shift register is all 0, after 64 operations, the sum (integral) of 64 historical data is obtained from the output. The shift register configuration with MegaWizard is shown in the following figure.

Figure 3 Structure of shift register

The structure of the former subtractor is shown in the following figure.

Figure 4 Structure of subtractor

The accumulator is implemented in hardware description language with the following code.

1 always @ (posedge start or negedge rst_n)
2 begin
3     if(!rst_n)
4         acc_out[39:0] <= 40'd0;
5     else begin
6         acc_out[39:0] = $signed(acc_out[39:0]) + $signed(acc_in[31:0]);
7     end
8 end

The keyword $signed denotes an adder with signed numbers.

3. Top-level design of algorithm

In order to connect the aforementioned A/D and D/A control circuit, cross-correlation/convolution/FIR filter circuit and square-sum integral circuit into one system, it is necessary to exemplify and connect the above module circuits in the top-level design documents. In addition, the top-level design document will also control the overall working schedule of the system. The top-level file I designed is shown below.

  1 module CONV_POW_AD_DA(
  2 ///////////Top-level module, responsible for calling ADC to collect data, convolute, and then output the results of convolution/FIR with DAC/////////
  3     input rst_n,//Low level reset signal
  4     input iclk20,//External crystal input 20 MHz
  5     output sck_da,//D/A Converter SPI Clock
  6     output mosi_da,//D/A Converter SPI data signal
  7     output cs_da,//D/A Chip Selection Signal of Converter
  8     output ld_da,//D/A Dual Channel Data Loading Signal of Converter
  9     output sck_ad,//A/D Converter SPI Clock
 10     output mosi_ad,//A/D Converter SPI data output
 11     input miso_ad,////A/D Converter SPI data input
 12     output cs_ad//A/D Converter SPI Mouth Selection Signal
 13     );
 14      wire clk;
 15   reg start;
 16   reg[15:0] cnt;//A timer used to generate the overall cycle
 17      reg[11:0] tst_data;//Counter for generating test data
 18   parameter CNT_NUM = 16'd2000;//At 100M clock, 2000 frequency division means 50KHz overflow rate
 19   wire[11:0] ad_data;//AD Internal connection of conversion results
 20   wire[11:0] data_cha;//A Channel data
 21   wire[11:0] data_chb;//B Channel data
 22      wire[15:0] shft_data1;//Data passed between two sections of convolution
 23      wire[15:0] shft_data2;//Data passed between two sections of convolution
 24      wire[15:0] shft_data3;//Data passed between two sections of convolution
 25      wire[15:0] shft_data4;//Data passed between two sections of convolution
 26      wire[39:0] conv_res1;//The results of convolution in Section I
 27      wire[39:0] conv_res2;//CONVOLUTION RESULTS IN SECTION 2
 28      wire[39:0] conv_res3;//CONVOLUTION RESULTS IN SECTION 3
 29      wire[39:0] conv_res4;//CONVOLUTION RESULTS IN SECTION 4
 30      wire[39:0] conv_res;//The result of total convolution
 31      wire[33:0] acc_sum;//Finally, the cumulative sum of each section is calculated.
 32      wire[31:0] shiftout;//Removal data from shift registers
 33      wire[15:0] ac_sig;//AC signal after removal of DC bias
 34      wire[31:0] pow_sig;//Energy of signal
 35      wire signed[31:0] acc_in;//The input of a summation accumulator is also an addition./Output of subtractor
 36      reg signed[39:0] acc_out;//The output of the accumulator
 37      wire signed[19:0] data_out;//Output of Cumulative Results after Cumulative Formulation
 38      
 39      //assign data_cha[11:0] = acc_out[25:14];//passageway a Output 64 Point Energy Accumulation Results
 40      //assign data_cha[11:0] = shiftout[18:7];//passageway a Output data
 41      assign data_cha[11:0] = pow_sig[19:8];//passageway a Square of output AC signal
 42      //assign data_cha[11:0] = ad_data[11:0];//passageway a Output data
 43      //assign data_cha[11:0] = shft_data2[11:0];//passageway a Output data
 44      //assign data_chb[11:0] = conv_res[26:15];//passageway b Output data
 45      //The coefficient itself is 16. bit Symbolic, as FIR If the sum of the coefficients of the filter is 1, then 16 bits The sum of all the signed coefficients is 32768.
 46      //Therefore, it is necessary to shift the filtering result to the right by 15. bit，So take 26 of the results.:15. But because of DA Reference voltage 2.048V，less than AD 3.3V So the filtering result is still slightly smaller.
 47      //assign data_chb[11:0] = data_out[14:3];//passageway b Output cumulative result prescription
 48      assign data_chb[11:0] = acc_out[28:17];//passageway b Output 64 Point Energy Accumulation Results
 49      //assign data_chb[11:0] = pow_sig[20:9];//passageway b Square of output AC signal
 50      //assign data_chb[11:0] = acc_sum[26:15];//passageway b Output filtering/Convolution results
 51      //assign data_chb[11:0] = conv_res1[26:15];//passageway b Output data
 52      
 53      /////Connections used in the following coefficient configurations/////
 54      wire rdy_work;//Configuration Completion Signal, High Level Indicates Configuration Completion
 55      wire[7:0] wr_blk_addr;//Backward Coefficient blk Configured address
 56      wire[15:0] data_bank;//Directional coefficients blk Configured data
 57      wire csh0,csh1,csh2,csh3;//Coefficient allocated for each section blk Gating line
 58      PLL20_100 i_pll20_100(//Instantiation PLL Clock generation
 59         .inclk0(iclk20),
 60         .c0(clk)//pll Output 100 M Working clock
 61         );
 62         
 63    always @ (posedge clk or negedge rdy_work)
 64     begin
 65         if(!rdy_work)
 66         begin
 67             start <= 1'd0;
 68             cnt[15:0] <= 16'd0;
 69                 tst_data[11:0] <= 12'd0;        //Counters for testing
 70         end
 71         else begin
 72             //////////Maintenance Period Counter///////////
 73             if(cnt < CNT_NUM-1) begin
 74                 cnt[15:0] <= cnt[15:0] + 16'd1;
 75                      tst_data[11:0] <= tst_data[11:0];
 76                 end     
 77             else begin
 78                 cnt[15:0] <= 16'd0;
 79                      tst_data[11:0] <= tst_data[11:0] + 12'd1;        //Counters for testing
 80                 end
 81                
 82             ////////Generate startup signal/////
 83             if((cnt > 102)&(cnt <= 105))//In 105th clk Generate startup signal
 84                 start <= 1'D1;//Start MCP4822 Output State Machine
 85             else
 86                 start <= 1'D0;        
 87         end
 88     end
 89 
 90      init_coe_blk i_init_coe_blk(//The instantiation coefficient configuration module reads the coefficients from the coefficient pool and directs them to the coefficients. blk The function of configuring data is only valid at the beginning of reset. ready Signal Control Subsequent Module
 91                     .clk(clk),
 92                     .rst_n(rst_n),//Overall reset signal
 93                     .ready(rdy_work),//Configuration Completion Signal, High Level Indicates Configuration Completion
 94                     .wr_blk_addr(wr_blk_addr),//Backward Coefficient blk Configured address
 95                     .data_bank(data_bank),//Directional coefficients blk Configured data
 96                     .csh0(csh0),//Section I Coefficient of Allocation blk Gating line
 97                     .csh1(csh1),//Section II Coefficient of Allocation blk Gating line
 98                     .csh2(csh2),//Section 3 Coefficient of Allocation blk Gating line
 99                     .csh3(csh3)//Section IV Coefficient of Allocation blk Gating line
100                     );
101      
102     CONV_SER16 i_conv_ser16_I(//Instantiate the first 16-order convolution/fir Circuit
103                     .clk(clk),
104                     .a({4'd0,ad_data}),//AD conversion results as data input
105                     //.a({4'd0,tst_data[11:0]}),//AD Transform results as data input
106                     .en(rdy_work),//Coefficient configuration is completed before enabling
107                     .coe_data_in16(data_bank),//Initialization Coefficient Data Input
108                     .wr_coe_addr(~wr_blk_addr[3:0]),//Address Input of Initialization Coefficient
109                     ////!!!Particular attention is paid here to the fact that the multiply-add data enters the multiply-add operation from the large address, so the address requires a complement to flip the coefficients. Amount to: fliplr();!!!!!//
110                     .wr_coe_clk(clk),//Initialization Coefficient Writing Clock
111                     .wr_coe_en(csh0),//The enabler of coefficient configuration is generated by address decoding of initialization module, which facilitates different coefficients. blk Gating
112                     .start(start),//Input Start Convolution Sum fir Control end
113                     //.shft_out_dp_data(),
114                     .shft_out_dp_data(shft_data1[15:0]),
115                     //.s_latch(conv_res)
116                     .s_latch(conv_res1[39:0])
117                     );
118     CONV_SER16 i_conv_ser16_II(//Instantiate the second 16-order convolution/fir Circuit
119                     .clk(clk),
120                     .a(shft_data1[15:0]),//Data removed from the first section of the cascade
121                     .en(rdy_work),//Coefficient configuration is completed before enabling
122                     .coe_data_in16(data_bank),//Initialization Coefficient Data Input
123                     .wr_coe_addr(~wr_blk_addr[3:0]),//Address Input of Initialization Coefficient
124                     ////!!!Particular attention is paid here to the fact that the multiply-add data enters the multiply-add operation from the large address, so the address requires a complement to flip the coefficients. Amount to: fliplr();!!!!!//
125                     .wr_coe_clk(clk),//Initialization Coefficient Writing Clock
126                     .wr_coe_en(csh1),//The enabler of coefficient configuration is generated by address decoding of initialization module, which facilitates different coefficients. blk Gating
127                     .start(start),//Input Start Convolution Sum fir Control end
128                     .shft_out_dp_data(shft_data2[15:0]),
129                     .s_latch(conv_res2[39:0])
130                     );
131     CONV_SER16 i_conv_ser16_III(//Instantiate the third 16-order convolution/fir Circuit
132                     .clk(clk),
133                     .a(shft_data2[15:0]),//Data removed from the first section of the cascade
134                     .en(rdy_work),//Coefficient configuration is completed before enabling
135                     .coe_data_in16(data_bank),//Initialization Coefficient Data Input
136                     .wr_coe_addr(~wr_blk_addr[3:0]),//Address Input of Initialization Coefficient
137                     ////!!!Particular attention is paid here to the fact that the multiply-add data enters the multiply-add operation from the large address, so the address requires a complement to flip the coefficients. Amount to: fliplr();!!!!!//
138                     .wr_coe_clk(clk),//Initialization Coefficient Writing Clock
139                     .wr_coe_en(csh2),//The enabler of coefficient configuration is generated by address decoding of initialization module, which facilitates different coefficients. blk Gating
140                     .start(start),//Input Start Convolution Sum fir Control end
141                     .shft_out_dp_data(shft_data3[15:0]),
142                     .s_latch(conv_res3[39:0])
143                     );
144     CONV_SER16 i_conv_ser16_IV(//Instantiate the fourth 16-order convolution/fir Circuit
145                     .clk(clk),
146                     .a(shft_data3[15:0]),//Data removed from the first section of the cascade
147                     .en(rdy_work),//Coefficient configuration is completed before enabling
148                     .coe_data_in16(data_bank),//Initialization Coefficient Data Input
149                     .wr_coe_addr(~wr_blk_addr[3:0]),//Address Input of Initialization Coefficient
150                     ////!!!Particular attention is paid here to the fact that the multiply-add data enters the multiply-add operation from the large address, so the address requires a complement to flip the coefficients. Amount to: fliplr();!!!!!//
151                     .wr_coe_clk(clk),//Initialization Coefficient Writing Clock
152                     .wr_coe_en(csh3),//The enabler of coefficient configuration is generated by address decoding of initialization module, which facilitates different coefficients. blk Gating
153                     .start(start),//Input Start Convolution Sum fir Control end
154                     .shft_out_dp_data(shft_data4[15:0]),
155                     .s_latch(conv_res4[39:0])
156                     );
157     PADD    i_PADD (//For convenience DA Output biases all outputs to positive numbers
158                     .clock ( start ),
159                     .data0x ( conv_res1[31:0] ),
160                     .data1x ( conv_res2[31:0] ),
161                     .data2x ( conv_res3[31:0] ),
162                     .data3x ( conv_res4[31:0] ),
163                     .result ( acc_sum[33:0] )
164                     );                    
165 
166     MCP4822 i_mcp4822(.clk100m(clk),
167                       .rst_n(rdy_work),
168                       .start(start),
169                       .dac_data_a(data_cha),//passageway a Output data
170                       //.dac_data_b(dds_data12),//passageway b Connect DDS content
171                       //.dac_data_b(product12),//passageway b High 12-bit join multiplication
172                       //.dac_data_b(conv_data12),//passageway b The result of join convolution
173                            .dac_data_b(data_chb),//passageway b Output data
174                       .cs(cs_da),
175                       .mosi(mosi_da),
176                       .sck(sck_da),
177                       .ld(ld_da)
178                       );   
179     MCP3202 i_mcp3202(
180                      .clk100m(clk),
181                      .rst_n(rdy_work),
182                      .start(start),
183                      .adc_data_a(ad_data),
184                      .cs(cs_ad),
185                      .mosi(mosi_ad),
186                      .miso(miso_ad),
187                      .sck(sck_ad)
188                      );
189                             
190     sub_offset    i_sub_offset (//This subtractor is used from AD The result is a 16-bit complement when the DC bias is removed.
191     .dataa ( {4'd0,acc_sum[26:15]} ),//Convolution results for calculation
192     //.dataa ( {4'd0,ad_data[11:0]} ),//AD It turns out to be a positive integer, with a 16-bit complement to zero.
193     .datab ( 16'h0000 ),//DC bias is considered to be 1/2 full range 16'h0800，No DC bias is 16'h0000
194     .result ( ac_sig[15:0] )//After removing DC bias, the result is 16. bit Complement code
195     );
196     
197     pow_cal    i_pow_cal (//Square operation to calculate signal energy
198     .dataa ( ac_sig[15:0] ),
199     .result ( pow_sig[31:0] )
200     );
201     
202     shft_reg    i_shft_reg (//Instantiated shift register
203         .clock ( start ),
204         .shiftin ( pow_sig[31:0] ),
205         .shiftout ( shiftout[31:0] ),
206         .taps (  )
207         );
208     sub_add    sub_add_inst (//A subtractor that subtracts the number to be taken from the cumulative result by the number fed into the cumulative result. The result may be negative.
209                 //After subtraction, it is fed into the accumulator to maintain the contents of the accumulator: the sum of the latest 64
210     .dataa ( pow_sig[31:0] ),
211     .datab ( shiftout[31:0]),
212     .result ( acc_in[31:0] )
213     );
214 ////////Calculate the sum of the nearest 64 numbers/////////    
215 always @ (posedge start or negedge rst_n)
216 begin
217     if(!rst_n)
218         acc_out[39:0] <= 40'd0;
219     else begin
220         acc_out[39:0] = $signed(acc_out[39:0]) + $signed(acc_in[31:0]);
221     end
222 end    
223 ///////Increasing the Resolution of Small Signals by Accumulating and Reciprocating
224     ip_sqrt    ip_sqrt_inst (
225         .radical ( acc_out[39:0] ),
226         .q ( data_out[19:0] ),
227         .remainder (  )
228         );    
229 endmodule

Most of the above top-level design documents are examples and connection statements of each functional module circuit. Because the code annotations are very detailed, the details will not be repeated here. I mainly introduce the following points:

1. There is only one always procedure assignment statement, which maintains the work of the counter cnt[15:0]. The total synchronous start signal is generated when cnt[15:0] is a specific value, so the period of cnt[15:0] technology also determines the period of start signal, that is, the period of signal.

2. As I prepared for this series“ Control Hardware Interface with Verilog-HDL State Machine ” The advantage of using A/D and D/A to debug the algorithm of FPGA is that the signal of each intermediate node in the algorithm can be presented through D/A converter. The assignment of data_cha[11:0] and data_chb[11:0] in the code has a large number of optional annotations, which are reserved for this reason.

3. In the instantiation code of CONV_SER16 for each convolution section, the interface wr_coe_addr is the initial write address of the coefficient memory (for the content of the initialization part of the coefficient memory, see the description in the second blog article of this series, "Implementing the cross-correlation/convolution/FIR circuit". But when connecting, wr_coe_addr reverses the address wr_blk_addr of the output of the coefficient initializer, because each convolution section multiplies the input first by the content of the coefficient memory with a larger address, so the address should be flipped. Otherwise, the complete sequence of coefficients stored in the coefficient initialization pool will be reversed in each convolution section, which will not produce the correct results.

IV. Test results

The results of FPGA synthesis for Cyclone-I in Quartus-II are shown in the following figure.

Figure 5 Results of synthesis, layout and wiring

Using matlab to generate 6 sinusoidal cycles of signals with 60 sampling points in length, two zeros are added before and after the signals, thus the sinusoidal signals with 64 sampling points in length are obtained. The reason why 60 sampling points contain 6 sinusoidal cycles is to simulate the sampling rate of 5KHz sinusoidal signal @50KSPS. Finally, the blackman window is added to 64 sampling points to reduce the impact of leakage.

Figure 6 Ideal target reflection echo signal

I downloaded the signal to SideTech arbitrary signal generator 33521 and manually triggered 33521 to generate the waveform shown above to simulate the theoretical echo signal. Connect it to the A/D input of the FPGA platform, and output the processing results from the D/A of the FPGA platform. Observing input and processing results with an oscilloscope is shown in the following figure.

Figure 7 Processing results of field measurements

The above figure shows that the algorithm in the FPGA realizes the output waveform simulated by matlab theory in the first part of this series of blog posts, which can meet the requirements of the target echo detection algorithm. The delay of blue convolution results is due to the latency of the FPGA algorithm, which does not affect the real-time performance of the system.

Of course, what we have accomplished here is only a validation experiment of the algorithm. If we want to really use it, we need to further improve the speed of A/D and D/A conversion.

For the principle and implementation of target echo detection algorithm, please start with the beginning of this series of blog posts.[ Target Reflected Echo Detection Algorithms and Their Implementation on FPGA: An Overview of Algorithms].

Keywords: Verilog MATLAB less

Added by headcutter on Sat, 18 May 2019 14:58:01 +0300

Programming VIP

Target Reflected Echo Detection Algorithms and Their Implementation on FPGA Part 3: Top-level Implementation of Square, Integral Circuits and Algorithms

Popular Keywords