Gate Count Instability from Functionally Equivalent RTL

Hello Yosys team and community,

I’m teaching myself digital logic and synthesizing a tiny CPU for nangate45, and I’m seeing significant instability in my synthesis results: minor, functionally equivalent RTL changes make the total gate count fluctuate by 100-200 gates. My script is, in essence: read_verilog ...; flatten; synth; dfflibmap -liberty $LIB; abc -liberty $LIB.

I have two examples of this:

Shift: within a larger design, a constant shift (pc = w >> 2) synthesizes differently from a direct part-select (pc = w[31:2]).
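To confirm that the two forms really are bit-for-bit identical, so that any gate-count difference comes from the tool rather than the logic, here is a small Python model. It assumes w and pc are both 32 bits wide, so the 30-bit part-select is zero-extended on assignment; the helper names are hypothetical.

```python
import random

WIDTH = 32  # assumption: w and pc are both 32-bit


def shift_form(w: int) -> int:
    """Model of `pc = w >> 2`: logical right shift, zeros enter at the top."""
    return (w >> 2) & ((1 << WIDTH) - 1)


def part_select_form(w: int) -> int:
    """Model of `pc = w[31:2]`: bits 31..2 land in pc[29:0]; pc[31:30] are zero."""
    pc = 0
    for i in range(2, WIDTH):
        pc |= ((w >> i) & 1) << (i - 2)
    return pc


random.seed(0)  # reproducible random samples
samples = [0, 1, 3, 0x80000000, 0xFFFFFFFF]
samples += [random.getrandbits(WIDTH) for _ in range(10_000)]
assert all(shift_form(w) == part_select_form(w) for w in samples)
print("equivalent on", len(samples), "samples")
```

A random check is only evidence; an actual proof would use a formal equivalence check, for example with Yosys's `miter` and `sat` passes.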

MUX: in several places a signal (pc, pc_next, reg1, etc.) is muxed from different sources (pc, ALU, register file read, …), with a lot of overlap between the source sets. I tried to factor this into a function:

// General function
function logic [31:0] mux_src;
  input logic [4:0] control;
  input logic [31:0] s1, s2, s3, s4; // ... and so on

  unique case (control)
    S1: mux_src = s1;
    S2: mux_src = s2;
    // ...
    default: mux_src = 'x;
  endcase
endfunction

// Call site for a register that never uses s2 (passed as 'x, a don't-care)
always_ff @(posedge clk) begin
  pc <= mux_src(pc_ctrl, s1, 'x, s3, ...);
end

For some signals this produces a larger netlist and for others a smaller one; the total swings up and down by 100-200 gates.

Question: Why do these simple, equivalent structures fail to converge to the same optimized result?

Question: What are the RTL best practices to get optimal Yosys results?

This isn’t limited to functionally equivalent RTL: synthesis results can change because of comments or blank lines in the Verilog source. As described in one of the replies there, the cause is the chaotic behavior of ABC. Synthesis is deterministic, but even minor non-functional changes can produce significantly different output. The same applies across operating systems, where the same design can give different synthesis results.

Question: Why do these simple, equivalent structures fail to converge to the same optimized result?

It’s not a matter of “failing to converge”; there is no convergence process at all. Nothing searches for “the” optimized result: in its current form, synthesis runs once and only once, and functionally equivalent designs are not canonicalized. Even if two designs are optimized to the same RTLIL, differing source lines (line numbers are embedded in Yosys’s auto-generated cell and wire names) or differing ordering can and will lead ABC to a different result.
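As a toy analogy (this is not ABC’s algorithm, just an illustrative stand-in), the sketch below shows how a fully deterministic greedy heuristic can return different-size answers to the same problem when only the input order changes. The set-cover problem and the `greedy_cover` name are assumptions chosen for illustration.

```python
from typing import FrozenSet, Iterable, List


def greedy_cover(universe: Iterable[int],
                 subsets: List[FrozenSet[int]]) -> List[FrozenSet[int]]:
    """Deterministic greedy set cover: repeatedly take the subset covering the
    most uncovered elements; ties go to whichever subset appears first."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # max() returns the first maximal element, so list order breaks ties
        best = max(subsets, key=lambda s: len(s & uncovered))
        if not best & uncovered:
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= best
    return chosen


U = range(1, 7)
A = frozenset({1, 2, 3})
B = frozenset({4, 5, 6})
C = frozenset({1, 2, 4})

print(len(greedy_cover(U, [A, B, C])))  # 2
print(len(greedy_cover(U, [C, A, B])))  # 3
```

Both calls describe the same problem and the function has no randomness; only the tie-breaking, which depends on input order, differs.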

Question: What are the RTL best practices to get optimal Yosys results?

There is no RTL best practice that avoids this. All you can do is run ABC multiple times and keep the best result. There is evidence that around 20 iterations find an optimal result from ABC for most circuits, but some circuits can produce thousands of distinct results without reaching optimality. Instead of interfacing with ABC directly, you can also perturb the results from within Yosys: scrambling names, or changing the autoidx or hash seed. If you require an optimal design, your best bet is to use those features to synthesize the design multiple times and keep the best result. Yosys may support running multiple ABC iterations directly in the future, but for now you have to do it yourself.
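A minimal sketch of the “synthesize multiple times and keep the best” loop, driven from Python. The perturbation here is the autoidx value, set with Yosys’s `autoidx` command before reading the design. The file names, the autoidx offsets, and the assumption that `stat -liberty` prints a “Chip area for …: <number>” line are all hypothetical; check the log format against your Yosys version.

```python
import re
import subprocess
from typing import Callable, Iterable, Tuple

# Hypothetical paths: replace with your own design and Liberty file.
SOURCES = "cpu.sv"
TOP = "cpu"
LIB = "nangate45.lib"

# The flow from the question, with autoidx raised per run to perturb the
# auto-generated names that reach ABC.
FLOW = ("autoidx {seed}; read_verilog -sv {src}; flatten; synth -top {top}; "
        "dfflibmap -liberty {lib}; abc -liberty {lib}; stat -liberty {lib}")

# Assumption: stat reports area on a line like "Chip area for module '\cpu': 1234.5".
AREA_RE = re.compile(r"Chip area for .*?:\s*([0-9.]+)")


def yosys_area(seed: int) -> float:
    """Run one synthesis pass with the given autoidx and return the reported area."""
    script = FLOW.format(seed=seed, src=SOURCES, top=TOP, lib=LIB)
    log = subprocess.run(["yosys", "-p", script], capture_output=True,
                         text=True, check=True).stdout
    matches = AREA_RE.findall(log)
    if not matches:
        raise RuntimeError("could not find chip area in Yosys log")
    return float(matches[-1])


def best_of(seeds: Iterable[int],
            run: Callable[[int], float]) -> Tuple[int, float]:
    """Call `run` once per seed and return the (seed, score) with the lowest score."""
    results = [(seed, run(seed)) for seed in seeds]
    if not results:
        raise ValueError("no seeds given")
    return min(results, key=lambda r: r[1])
```

For example, `best_of(range(1000, 21000, 1000), yosys_area)` tries 20 different autoidx offsets. Keeping the runner as a parameter means the same loop works if you switch the perturbation to name scrambling or score by cell count instead of area.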