Using non-blocking assignment for sequential execution (not sequential logic)

I am working on a Verilog design that uses SRAM inside an FSM. It has to be synthesizable, since I want to fabricate the IC later. I have fully working code that uses reg registers, where I use blocking assignments for concurrent operation. Since there is no clock in this system, it works fine. Now I want to replace these registers with SRAM-based memory, which brings a clock into the system. My first thought is to use non-blocking assignments and change the sensitivity list from always @(*) to always @(negedge clk).

In the code snippet below, I want to read five sets of data from the SRAM (SR4). To do that, I add a counter (wait_var) that counts up while the reads happen. The additional counter ensures that the code enters the counting branch on the first clock edge and that the five sets of data are read from the SRAM on the subsequent clock edges. This technique works for simple logic such as this.

    S_INIT_MEM: begin
                    // ******Off-Chip (External) Controller will write the data into SR4. Once data is written, init_data will be raised to 1.******
                    if (init_data == 1'b0) begin
                        CEN4            <= CEN;
                        WEN4            <= WEN;
                        RETN4           <= RETN;
                        EMA4            <= EMA;
                        A4              <= A_in;
                        D4              <= D_in;
                    end
                    else begin
                        CEN4            <= 1'b0;    //SR4 is enabled
                        EMA4            <= 3'b0;    //EMA set to 0
                        WEN4            <= 1'b1;    //SR4 set to read mode
                        RETN4           <= 1'b1;    //SR4 RETN is turned ON
                        A4              <= 8'b0000_0000;
                        if (wait_var < 6) begin
                            if (A4 == 8'b0000_0000 ) begin
                                NUM_DIMENSIONS <= Q4;
                                A4 <= 8'b0000_0001;
                            end
                            if (A4 == 8'b0000_0001 ) begin
                                NUM_PARTICLES <= Q4;
                                A4 <= 8'b0000_0010;
                            end
                            if (A4 == 8'b0000_0010 ) begin
                                n_gd_iterations <= Q4;
                                A4 <= 8'b0000_0011;
                            end
                            if (A4 == 8'b0000_0011 ) begin
                                iterations  <= Q4;
                                A4 <= 8'b0000_0100;
                            end
                            if (A4 == 8'b0000_0100 ) begin
                                threshold_val   <= Q4;
                                A4 <= 8'b0000_0101;
                            end
                            wait_var <= wait_var + 1;
                        end
                        //Variables have been read from SR4
                        if(wait_var == 6) begin
                            CEN4 <= 1'b1;
                            next_state <= S_INIT_PRNG;
                            wait_var <= 0;
                        end
                        else begin
                            next_state <= S_INIT_MEM;
                        end
                    end
                end

However, when I need to write more complex logic in a similar fashion, the counter-based delay method gets too complicated. For example, say I want to read data from one SRAM (SR1) and write it to another SRAM (SR3).

                    CEN1 = 1'b0;
                    A1 = ((particle_count-1)*NUM_DIMENSIONS) + (dimension_count-1);
                    if (CEN1 == 1'b0) begin
                        CEN3 = 1'b0;
                        WEN3 = 1'b0;
                        A3 = ((particle_count-1)*NUM_DIMENSIONS) + (dimension_count-1);
                        if(WEN3 == 1'b0) begin
                            D3 = Q1;
                            WEN3 = 1'b1;
                            CEN3 = 1'b1;
                        end
                        CEN1 = 1'b1;
                    end

I know this still uses blocking assignments and that I need to convert them to non-blocking assignments, but if I do so without manually introducing a one-clock-cycle delay using a counter, it will not work as desired. Is there a simpler way to get around this?

Any help would be highly appreciated.



Solution 1:[1]

The main point is that non-blocking assignments are a simulation-only artifact: they give the simulator a way to match hardware behavior. If you use them incorrectly, you might end up with simulation-time races and mismatches with the hardware, in which case your verification effort is wasted.

There is a set of common practices used in the industry to handle this situation. One is to use non-blocking assignments for the outputs of all sequential devices. This avoids races and makes sure that sequential flops and latches pipe data the same way as in real hardware.

Hence, the one-cycle delay caused by non-blocking assignments is a myth. If you design sequential flops where the second one latches the data from the first, the data moves from flop to flop, one stage per clock cycle:

            clk ------v----------------v
            in1 -> [flop1] -> out1 -> [flop2] -> out2
    clk 1    1                  1                  0
    clk 2    1                  1                  1
    clk 3    0                  0                  1
    clk 4    0                  0                  0

In the above example, data propagates from out1 to out2 on the next clock cycle, which can be expressed in Verilog as:

   always @(posedge clk)
      out1 <= in1;
 
   always @(posedge clk)
      out2 <= out1;

Or you can combine them:

   always @(posedge clk) begin
        out1 <= in1;
        out2 <= out1;
   end
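
For contrast, here is a minimal sketch of the same combined block written with blocking assignments instead. The module name `shift_blocking` is made up for illustration. Because `out1 = in1` takes effect immediately, `out2` picks up the new value on the same clock edge, so the two-stage pipeline collapses and both outputs follow `in1` together:

```verilog
// Hypothetical module, for illustration only: shows why blocking
// assignments break the flop-to-flop pipeline described above.
module shift_blocking (
    input  wire clk,
    input  wire in1,
    output reg  out1,
    output reg  out2
);
    always @(posedge clk) begin
        out1 = in1;    // blocking: out1 is updated right away
        out2 = out1;   // reads the value just written, i.e. in1
    end
endmodule
```

With the non-blocking version, both right-hand sides are sampled before either left-hand side is updated, which is what gives `out2` the previous value of `out1`.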

So the task in your design is to cleanly separate sequential logic from combinatorial logic, and therefore to separate the blocks that use non-blocking assignments from the blocks that use blocking assignments.
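
Applied to the SR1-to-SR3 copy from the question, that separation could look roughly like the sketch below, where FSM states take the place of a delay counter. This is only a sketch under assumptions that are not stated in the question: both SRAMs are synchronous, use active-low CEN and WEN, sample their pins on the rising clock edge, and present read data one cycle later; data and addresses are 8 bits wide. The module name, the `rst_n`, `start`, `addr` and `done` ports, and the state names are hypothetical.

```verilog
// Sketch only, under the assumptions listed above. Names other than
// CEN1/A1/Q1/CEN3/WEN3/A3/D3 are hypothetical. SR1 is assumed to be held
// in read mode elsewhere. addr must stay stable from S_READ to S_WRITE.
module sram_copy_fsm (
    input  wire       clk,
    input  wire       rst_n,   // hypothetical active-low synchronous reset
    input  wire       start,   // hypothetical: request one copy
    input  wire [7:0] addr,    // address computed elsewhere
    input  wire [7:0] Q1,      // read data from SR1
    output reg        CEN1,
    output reg  [7:0] A1,
    output reg        CEN3,
    output reg        WEN3,
    output reg  [7:0] A3,
    output reg  [7:0] D3,
    output reg        done
);
    localparam S_IDLE  = 2'd0,
               S_READ  = 2'd1,
               S_WRITE = 2'd2;

    reg [1:0] state, next_state;

    // Sequential part: only the state register, non-blocking.
    always @(posedge clk) begin
        if (!rst_n) state <= S_IDLE;
        else        state <= next_state;
    end

    // Combinational part: blocking assignments, defaults first so that
    // no latches are inferred.
    always @(*) begin
        next_state = state;
        CEN1 = 1'b1; A1 = 8'd0;
        CEN3 = 1'b1; WEN3 = 1'b1; A3 = 8'd0; D3 = 8'd0;
        done = 1'b0;
        case (state)
            S_IDLE: if (start) next_state = S_READ;
            S_READ: begin              // SR1 samples A1 at the end of this cycle
                CEN1 = 1'b0; A1 = addr;
                next_state = S_WRITE;
            end
            S_WRITE: begin             // Q1 is valid now; SR3 writes it
                CEN3 = 1'b0; WEN3 = 1'b0; A3 = addr; D3 = Q1;
                done = 1'b1;
                next_state = S_IDLE;
            end
            default: next_state = S_IDLE;
        endcase
    end
endmodule
```

The one-cycle wait between the read and the write comes from the state transition itself, so no counter is needed. A real SRAM macro may have different timing or extra pins, so the datasheet should decide how many states are actually required.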

There are cases where blocking assignments can, and must, be used inside sequential blocks, as mentioned in the comments: when you use temporary variables to simplify expressions inside a sequential block, provided those variables are never used anywhere else.
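
A minimal sketch of that temporary-variable case follows. The module and signal names are invented for illustration, and the 8-bit input width is an assumption. The temporary `sum` is assigned with a blocking assignment and consumed on the next line, while the registered output still uses a non-blocking assignment:

```verilog
// Hypothetical module, for illustration only.
module sum_sq_reg (
    input  wire        clk,
    input  wire [7:0]  a,
    input  wire [7:0]  b,
    output reg  [17:0] result
);
    reg [8:0] sum;  // temporary: written and read only inside the block below

    always @(posedge clk) begin
        sum    = a + b;        // blocking: needed immediately on the next line
        result <= sum * sum;   // non-blocking: the registered output
    end
endmodule
```

Since `sum` is never read outside this block, no other process can race against the blocking assignment.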

Apart from that temporary-variable case, never mix blocking and non-blocking assignments in a single always block.

Also, because of common synthesis methodologies, the use of 'negedge' is usually discouraged. Avoid it unless your synthesis methodology does not care.

You should browse around for more information and examples of blocking and non-blocking assignments and their use.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
