Let's steal some time

Author : Sarma Jonnavithula

Even before the formal layout phase begins, some important decisions are taken in chip design. One such activity is time budgeting. Recently I came across a good technique for negotiating timing problems, one which I had hitherto thought tough, had generally avoided and would not have recommended. The process of learning it tore apart all the myths I had about this technique.

The following explanation is only meant to introduce the reader to the fundamentals of timing analysis on digital circuits. A reader who is already familiar with these concepts can choose to skip it. Note: So as not to annoy physical measurement buffs, all units are assumed and thus won't be mentioned explicitly.

If a particular piece of logic is to be placed between two registers, then for proper generation and storage of the logic outputs, certain conditions have to be met. I will use the figure below as a reference to explain, in brief, the timing constraints in digital circuits.

FF1 and FF2 are registers. D1 and D2 are the sets of data that are to be registered by FF1 and FF2 respectively. The logic cloud in between the two flip-flops generates D2 depending upon Q1. "clk" is the clock driving the two flip-flops. Let's assume that

  • both FF1 and FF2 are characterised by a clock-to-q delay of tc, a setup time of ts and a hold time of th, and are positive edge triggered
  • the propagation delay of the logic cloud between the two flip-flops is tp
  • the clock frequency is f Hz, i.e. the clock period is tclk = 1/f

Now, it is a well-known fact that for a flip-flop to properly register data, the following conditions have to be met:

  • data at the input should be held constant for some time, called the setup time, before the clock edge occurs
  • data at the input should be held constant for some time, called the hold time, after the clock edge occurs
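As a small illustration of these two conditions, here is a minimal Python sketch. The function name and the example numbers are my own, chosen only for illustration; it simply checks how long the data was stable before and after one capturing clock edge.

```python
def meets_setup_and_hold(last_change_before, first_change_after, edge, ts, th):
    """Check one capturing clock edge against setup and hold requirements.

    last_change_before: time of the last data transition before the edge
    first_change_after: time of the first data transition after the edge
    edge:               time of the capturing clock edge
    ts, th:             setup and hold times of the flip-flop
    Returns a (setup_ok, hold_ok) tuple.
    """
    setup_ok = (edge - last_change_before) >= ts   # stable long enough before the edge
    hold_ok = (first_change_after - edge) >= th    # stable long enough after the edge
    return setup_ok, hold_ok


# Hypothetical numbers: clock edge at 100, ts = 5, th = 2.
print(meets_setup_and_hold(93, 103, 100, 5, 2))  # (True, True)
print(meets_setup_and_hold(97, 103, 100, 5, 2))  # (False, True): data changed too late
print(meets_setup_and_hold(93, 101, 100, 5, 2))  # (True, False): data changed too soon
```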

Thus, the timing diagram for the current scenario, assuming that D1 doesn't violate the setup time or hold time on FF1, is as shown in the figure.

 

 

 

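tclk = tc + tp + x + ts    (1)

Here x is the slack, i.e. the spare time left in the clock period once the data has settled at the input of FF2. In the worst case that still avoids a timing violation, x -> 0. Thus,

tclk >= tc + tp + ts, i.e. f = 1/tclk <= 1/(tc + tp + ts), to ensure no timing violations on FF2.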
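The relation above can be tried out with a few lines of Python. This is only a sketch: the function names are mine and the numbers are invented, with units assumed as in the rest of this article.

```python
def min_clock_period(tc, tp, ts):
    """Smallest clock period that still meets setup on FF2 (slack x = 0)."""
    return tc + tp + ts


def setup_slack(tclk, tc, tp, ts):
    """The slack x from equation (1); a negative value is a setup violation."""
    return tclk - (tc + tp + ts)


def max_frequency(tc, tp, ts):
    """Highest clock frequency allowed by the setup condition."""
    return 1.0 / min_clock_period(tc, tp, ts)


# Hypothetical values: tc = 1, tp = 6, ts = 1.
print(min_clock_period(1, 6, 1))   # 8
print(setup_slack(10, 1, 6, 1))    # 2  (timing met with slack x = 2)
print(setup_slack(7, 1, 6, 1))     # -1 (violation by 1)
print(max_frequency(1, 6, 1))      # 0.125
```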

This is the fundamental condition that arises from how digital circuits work. Now let's move on to the topic of time budgeting and time stealing.

Time Budgeting and time stealing

Time budgeting in chip design is a very interesting activity. In this stage, the designer plays with the basic condition described above to reap an engineering advantage. Even the latest and best software tools can't match the advantage that can be gained by carefully crafting the design to adjust timing on chip. At the outset this might look like an overstatement, but I must confess that I am truly amazed at this technique.

Let's take a look at this technique.

Consider a circuit like the one shown below.

Let's assume that

  • FF1, FF2, FF3 and FF4 are characterised by a clock-to-q delay of tc, a setup time of ts and a hold time of th, and are positive edge triggered
  • the propagation delay of the logic cloud between flip-flops FF1 and FF2 is tp1
  • the propagation delay of the logic cloud between flip-flops FF2 and FF3 is tp2, with tp2 > tp1
  • the propagation delay of the logic cloud between flip-flops FF3 and FF4 is tp3, with tp3 < tp2
  • the clock frequency is f Hz, i.e. the clock period is tclk = 1/f
  • RAM shown in the figure has an access time of ta
  • tclk < tc + tp1 + ts and tclk < tc + ta + tp2 + ts and tclk > tc + tp3 + ts
  • Timing violation on FF1-FF2 path is by t1
  • Timing violation on FF2-FF3 path is by t2
  • Timing is being met on FF3-FF4 path by t3 slack i.e., x = t3 from equation (1)
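To make the assumptions above concrete, the following Python sketch computes t1, t2 and t3 from the other quantities. The helper names and the numbers are hypothetical and chosen only so that they satisfy the assumptions in the list (tp2 > tp1, tp3 < tp2, violations on the first two paths, slack on the third).

```python
def path_slack(tclk, tc, ts, logic_delay):
    """Setup slack of one register-to-register path (negative = violation)."""
    return tclk - (tc + logic_delay + ts)


def budget(tclk, tc, ts, tp1, tp2, tp3, ta):
    """Return (t1, t2, t3): the two violations and the available slack."""
    t1 = -path_slack(tclk, tc, ts, tp1)        # violation on FF1-FF2
    t2 = -path_slack(tclk, tc, ts, ta + tp2)   # violation on FF2-FF3 (RAM + logic)
    t3 = path_slack(tclk, tc, ts, tp3)         # slack on FF3-FF4
    return t1, t2, t3


# Hypothetical values: tclk = 20, tc = 1, ts = 1, tp1 = 19, tp2 = 20, tp3 = 10, ta = 2.
t1, t2, t3 = budget(20, 1, 1, 19, 20, 10, 2)
print(t1, t2, t3)  # 1 4 8
```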

Since paths FF1-FF2 and FF2-FF3 are both already overloaded, logic can't be moved from one of these paths to the other. Also, since FF2-FF3 has a RAM and a logic cloud that acts on the data from the RAM, this logic can't easily be moved to the FF3-FF4 path either. In principle we could move the logic cloud in the FF2-FF3 path to the FF3-FF4 path, but for the moment let's assume we can't do that.

So how does one attack this problem? We steal time.

The problem here: tclk, the time between the launching and capturing clock edges seen by the flip-flops, is just not enough to fit the whole logic in. The solution is to extend this time. But extending it at the chip level is not an option, because reducing the clock frequency of the whole chip may violate the specification and reduce the throughput of the chip itself. The problem has to be solved locally, in between the four flip-flops, and that is exactly what this technique does. It is applied in the time-budgeting phase, where such problems are foreseen. Confused? Let me explain.

In the path FF1-FF2,

tclk < tc + tp1 + ts by t1. If we can make FF2 see its capturing clock edge t1 later, so that this path effectively gets a clock period of tclk + t1, then this timing violation on FF2 can be avoided.

In the path FF2-FF3,

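tclk < tc + ta + tp2 + ts by t2. Since FF2 is already clocked t1 late, if we can make FF3 see its clock edge t1 + t2 later than the original clock, so that this path effectively gets a clock period of tclk + t2, then this timing violation on FF3 can be avoided.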

We also know that tp3 < tp2 and that the FF3-FF4 path has a slack of t3, which means the delays t1 and t1 + t2 added to the clocks of FF2 and FF3 respectively can be absorbed by t3, provided t1 + t2 <= t3. Thus the clock need not be adjusted on FF4. The timing-adjusted circuit might look as shown in the figure below.
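The effect of the delayed clocks can be checked with a short Python sketch. It assumes, as in the discussion above, that only the clocks of FF2 and FF3 are delayed (by d2 and d3, names of my own choosing) while FF1 and FF4 see the original clock. Only setup is modelled here; hold checks are left out of this sketch.

```python
def stolen_time_slacks(tclk, tc, ts, tp1, tp2, tp3, ta, d2=0, d3=0):
    """Setup slack of the three paths when FF2 and FF3 get delayed clocks.

    d2, d3: extra clock delay at FF2 and FF3; FF1 and FF4 are not delayed.
    Returns [slack FF1-FF2, slack FF2-FF3, slack FF3-FF4].
    """
    clock_arrival = [0, d2, d3, 0]        # clock delay at FF1..FF4
    logic_delay = [tp1, ta + tp2, tp3]    # the RAM access time adds to the second path
    slacks = []
    for i, delay in enumerate(logic_delay):
        # Effective window: one period, plus capture delay, minus launch delay.
        window = tclk + clock_arrival[i + 1] - clock_arrival[i]
        slacks.append(window - (tc + delay + ts))
    return slacks


# Hypothetical values giving t1 = 1, t2 = 4, t3 = 8.
params = dict(tclk=20, tc=1, ts=1, tp1=19, tp2=20, tp3=10, ta=2)
print(stolen_time_slacks(**params))                 # [-1, -4, 8]
print(stolen_time_slacks(**params, d2=1, d3=1 + 4)) # [0, 0, 3]
```

With the clocks of FF2 and FF3 delayed by t1 and t1 + t2, both violations disappear and the slack left on the FF3-FF4 path in this example is t3 - (t1 + t2) = 3.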

The clock buffers for FF2 and FF3 account for the negative slacks t1 and t2 respectively, which are caused by the immovable logic clouds feeding those two flip-flops. Why and how does this circuit work and resolve the timing violations? The following figure explains.

In the figure above, the timing diagram of the modified circuit is shown. First of all, the whole circuit works in the same amount of time as the unmodified circuit: 4 clock cycles. The second thing to note is that we only added two buffers whose delay is known. I will deal with how to implement these buffers shortly. It is to be noted that all the modifications hold good only when t1 + t2 + t3 < tclk, and that the time being stolen has to fit into the slack available on the FF3-FF4 path, i.e. t1 + t2 <= t3. The third and most important thing to note: the modifications in between these four flip-flops don't affect any other circuit connected to this circuit in any way. Thus the design of the circuit driven by net q4 can proceed without any changes, even though there are modifications in this circuit.

Now let's talk about the how part of the whole deal. The how part is about implementing clock buffers that can produce the required delay. The implementation can't happen at the RTL phase, because RTL files don't carry any timing information, i.e. all cells are ideal primitives. These buffers have to be added during the synthesis/layout phase. The netlist is generally modified to introduce buffers already present in the technology library in use. Technology libraries generally come with a wide variety of clock buffers. Proper buffers with the required fanout and drive capacities are chosen and introduced in multiples, if required. Note that these buffers are generally small in size too.
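As a back-of-the-envelope sketch of "introduced in multiples", the Python snippet below estimates how many identical buffers are needed to reach a target delay. It assumes every buffer contributes the same fixed delay, which is a simplification of my own: in a real library the delay of a buffer depends on its load, input slew and operating corner, so the actual numbers come from the timing tools.

```python
import math


def buffers_needed(target_delay, buffer_delay):
    """Number of identical buffers in series to reach at least target_delay.

    Assumes each buffer adds the same fixed delay (an idealisation).
    Returns (count, achieved_delay).
    """
    if buffer_delay <= 0:
        raise ValueError("buffer_delay must be positive")
    count = math.ceil(target_delay / buffer_delay)
    return count, count * buffer_delay


# Hypothetical: delays of t1 = 1 and t1 + t2 = 5 built from buffers of delay 2.
print(buffers_needed(1, 2))  # (1, 2)
print(buffers_needed(5, 2))  # (3, 6)
```

Any overshoot beyond the target (here 1 in both cases) eats into the slack of the following path, so it has to be accounted for in the budget as well.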

Thus the whole process needs a great level of co-ordination and communication between design engineer and layout engineer. Generally, the time budgeting decisions are taken very early in the design phase. Best Practice: The layout engineer is made aware of these decisions well in advance and any suggestions are taken by the designer so that the modifications happen smoothly.

One other repercussion of the modification concerns verification. I have always felt that the chip design process is like a thriller: you get chills and thrills all the time. A heavily conservative approach will not help, as it will eat up time. A take-care-of-it-later approach will not help either, because changes early in the design process are easy and cheap, whereas changes at any later stage increase the cost exponentially and are very tough to handle. This is the reason why the verification engineer is like a whistle blower almost all through the process. With respect to the example at hand, it is general practice to introduce the buffers into the netlist and run GLS (Gate Level Simulations). During the Gate Level Simulations, the modified netlist is put through tough testing by the verification environment to catch any timing violations and any repercussions of the modifications that went unnoticed.

No advantage comes free. There is a cost to be paid. It might seem that the buffers are the cost being paid in this example, but as mentioned before these buffers don't occupy much area. The cost of adding buffers manifests itself in a different form: jitter. Jitter is a serious problem in any chip. Jitter can be defined as "spurious change in the clock edge due to on chip variations". Generally it is recommended that the clock tree (which distributes the clock to all the flip-flops on the chip) not be tampered with. However, when the need arises to solve timing problems, as has been done in this example, this cost has to be paid. The increase in clock jitter due to the introduction of clock buffers can result in some issues, and this is where GLS again plays a crucial role. The verification engineer's job is to make sure the effect of jitter is not missed in the verification environment.

To conclude, I would say time budgeting is a phase where problems like the one above are solved by the engineer by arriving at a good trade-off, which is in the spirit of engineering development. This is a very important phase in the chip design process and demands utmost care, co-ordination and communication among the engineers involved, from design to layout.

Note: OCV (On Chip Variations) are not taken into account in this whole discussion. However, they too play a very important role in time budgeting and have to be considered seriously in this process, because of the increase in jitter mentioned in the discussion.
