:: [ece466] Question on PhD qualifier. -- solution ::
HOME


[Date Prev][Date Next][Date Index]

[ece466] Question on PhD qualifier. -- solution



---------- Forwarded message ----------
Date: Thu, 18 Sep 2003 11:38:45 -0500 (CDT)
From: mtheys@uic.edu
To: Shashank Khanvilkar <shashank@mia.ece.uic.edu>
Subject: Re: Question on PhD qualifier.

I think what you wrote makes sense. Of course not knowing the
answer means we never quite know for sure what the professor
who wrote the question was looking for.

My only other concern would be that the question in part (b)
states that the load is moved up by two instructions, and
in your diagram it looks like it has moved up by three instructions.
Overall I think you have sounds logic, its just a matter of getting
a clear understanding of what the question is asking.

Prof. Theys


Shashank Khanvilkar wrote:
*Hello Prof. Thanks for your reply. Please look inline for my comments.
*> Shashank Khanvilkar wrote:
*> *Hello Prof. Theys, 
*> *The below question came in one of the PHd qualifiers and I would 
*> *really appreciate if you can give ur perspective on it.
*> *Thanks 
*> *shashank

*> *Datapath, Pipeling:
*> *Consider a CPU piplelinee with the following stages for instructions using 
*> *register operands:
*> *IF -> Inst. Decode (ID) --> Register Read (RR) -->  Exec --> WriteBack(WB)
*> *
*> *On the average IF takes 2cc, Exec takes 1.5 cc and all other stages take 
*> *1cc. Consider a 500 Mhz clock and assume that the register file has 2 read 
*> *ports and 1 write port.
*> 
*> My confusion is why they state "on average".  A true pipeline would
*> require the 2 cc that you assume resulting in the throughput you state.
*> The "on average" makes it seem like they expect you to do some other
*> work, but I would go with your answer.
*> If they are discussing not a true pipeline but just a multi execution
*> cpu, then the throughput would be (500 / (2 + 1.5 + 1 + 1 + 1)) = 500 / 6.5 
*> 
*> *a. What is the throughput (# of instrctions processed per sec) of this 
*> *pipeline for the above type of instructions?
*> *
*> *(My Soln: I think, since the pipeline should exsecute at the speed of the 
*> *slowest stage, every stage would take 2cc's.. Hence we will get one 
*> *instruction out of the pipeline every 2 cc's, which gives a throughput of 
*> *250 M instructions/sec). 
*You are correct in noticing the "on average" words, which even i thought 
*had a trap. So going by your suggestion, I asked myself the question, why 
*can't this pipeline have 5 stages, each working at 1cc...
*
*Let us assume that each stage indeed takes 1cc. So in this case, if we do 
*want to find the throughput, then we can see that on an average the IF 
*stage takes 2cc, hence on an average one instruction will come out of the 
*pipeline every 2cc, which gives the throughput of 250M, which is the same 
*as my previous ans.. 
*
*Assuming each stage to be 1cc, helps in the second part, as i discuss 
*below.
*
*
*> *
*> *b. Now also consider LOAD instructions that directly address memory 
*> *locations. Suppose a LOAD goes through the following stages:
*> *
*> *IF --> ID --> RR (for address operands)  --> Exec(to compute address) --> 
*> *Read Data (from memory)(RM) --> Write to regiter file (WR)
*> *
*> *with all except the Read Data and WR stages being common with the 
*> *register-instruction pipeling mentioned above. The Read Data stage takes 
*> *3cc's on the average and WR stage takes 1cc or 2cc depending on whether 
*> *the single write port of the register file is free or being used (by the 
*> *WB stage of the register-instruction pipleline), repectively. In order to 
*> *prevent stalling of the register-instruction pipeline in order to wait for 
*> *data from a previous LOAD instruction to be written to a register operand, 
*> *LOAD instructions have been moved ahead from the original position by 2 
*> *instructions; hence assume that there are no pipeling stalls due to the 
*> *Read Data stage of the a LOAD instruction. However, there may be a 
*> *pipeling stall with 0.8 probability if the WR stage of a LOAD takes 2 
*> *cc's.
*> *1. If the pipeline stalls in the above scenario, how many cc's would it 
*> *need to stall for?
*> *2. what is the best-, average- and worst-case throughput of the CPU for a 
*> *program in which 20% of the instructions are LOAD?
*> *
*> 
*> I think you should use the actual cc required for each stage and go
*> from there and see if the timing diagram shows a stall. for the 1.5cc 
*> case you would of course use a 2cc slot. In your example why wouldn't
*> all the stages now require 3cc since the RM requires 3cc? 
*> I think the question is vague in many places. I think you need
*> to use the actual cc counts and not make the assumption that 
*> each stage requires the same amount of time.
*> 
*> Let me know if that makes more sense.
*
*Now here if i still assume 1cc for each stage, I will get the LOAD 
*pipeline to be (Note RM1, RM2 and RM3 are not pipelined, but just drawn 
*for illustration)
*
*IF   ID    RR   Ex   RM1  RM2  RM3  WR
*
*We can see from the above pipline that load will get a value from memory 
*only at the end of RM3. Hence to prevent any data hazards for cases where 
*the next instruction uses the result of the previous load, it is moved up 
*from its original position by two instructions.. 
*
*Thus, if we have a situation like this:
*
*LOAD             IF   ID    RR   Ex   RM1  RM2  RM3  WR
*Reg. Inst.            IF    ID   RR   EX????             (RAW hazard)
*
*
*We overcome it as:
*
*LOAD             IF   ID    RR   Ex   RM1  RM2  RM3  WR
*i                     IF    ID   RR    EX  WB
*i+1                         IF   ID    RR  EX   WB
*i+2                              IF    ID  RR   EX   WB
*Reg. Inst.                             IF  ID   RR   EX  WB    (forwardin)
*
*Now you can see that there is a structural hazard between the LOAD inst. 
*and instruction "i+2". Hence the Load will stall the pipline (which will 
*not stall instructions i+1 and i+2, because they are not in any of the 
*common stages) but will stall the last inst, as it is in EX state. 
*
*Thus we have somthing like this:
*LOAD             IF   ID    RR   Ex   RM1  RM2  RM3  XX  WF
*i                     IF    ID   RR    EX  WB
*i+1                         IF   ID    RR  EX   WB
*i+2                              IF    ID  RR   EX   WB
*Reg. Inst.                             IF  ID   RR   XX  EX  WB    
*
*
*Thus to ans the part (b-i) we can easily say that the pipeline stalls for 
*1cc.
*
*Now for part (b - ii):
*Let us assume here that the LOAD instuctions are well spaced so as not to 
*cause a structural hazard. So in the best case 
*Load instuctions will require a CPI of 1. 
*Hence throughput of the pipleline will still be limited by the average IF 
*delay of 2cc, which is 250M
*
*In the worst case, each of the LOAD takes 2cc. Even in this case, the 
*throughout will be 250M.
*
*Hence the average case will also have a throughput of 250M.
*
*Let me know what you think..
*
*Thanks
*/shank
*
*
*
*
*
*> 
*> Prof. Theys
*> 
*> 
*
*


-- 
--------------------------------------------------------------------------
|  Mitchell D. Theys, PhD   |  Department of Computer Science (MC 152)   |
|    Assistant Professor    |  University of Illinois at Chicago         |
|  Email: mtheys@uic.edu    |  1120 Science and Engineering Offices      |
|  Phone: (312) 413-9267    |  851 South Morgan Street                   |
|  FAX:   (312) 413-0024    |  Chicago, IL 60607-7053, USA               |
--------------------------------------------------------------------------