delays and errors from SCSI drives...

  • delays and errors from SCSI drives...

    Hello,

    I am experiencing intermittent problems with my system. The computer stalls for no apparent reason: transfers to and from the hard disks stop for a while and then seem to continue. The System log in Event Viewer has entries (at exactly the time of each delay) with IDs 9, 11 and 15: they all concern a SCSI timeout, an error on the controller, an error on the drive, ...

    Initially, I had a SuperMicro X5DA8, with a Quantum Atlas 10K (CH A, ID 0) and an IBM UltraStar 36LZX (CH A, ID 1). With this issue, I went to SuperMicro, who replaced both the mainboard and the SCSI cable. This appeared to solve the problem.
    After a couple of months, it occurred again (but rarely)...

    In the meantime, I have changed hard disks: now I have a Cheetah 10K.6 (CH A, ID 0) and a Hitachi IBM UltraStar 36LZX (warranty replacement, CH A, ID 1). This configuration worked for months without problems. Last week the problem resurfaced: numerous entries in the event log, unacceptable slowdowns, ...
    I have now moved the Hitachi to CH B, and while the problem seems to occur slightly less frequently, it still happens at least daily (commonly when I read from or write to the Hitachi drive).

    I really don't know what to do with this anymore... Each hard drive works flawlessly on its own, but as soon as another hard disk is somewhere in the PC (it doesn't even have to be on the same SCSI bus), there is a chance of this happening. I had the same issues with 2 entirely different systems (my first system, and my current one, which has none of the drives/controllers/cables of the first one).

    Any thoughts on what is causing this? Or how to troubleshoot it?

    full specs:
    Supermicro X5DA8
    2x Xeon 2.4 GHz (FSB=533)
    2x Infineon 512 MB reg. DDR ECC
    Matrox Parhelia 512 (128 MB)
    Adaptec 2906
    Seagate Cheetah 10K.6
    Hitachi-IBM Ultrastar 36 LZX
    Toshiba SD-M1612 DVD
    Plextor PX708A
    Lian Li PC70
    PC Power Cooling 510XE




    Jörg
    pixar
    Dream as if you'll live forever. Live as if you'll die tomorrow. (James Dean)

  • #2
    The first thing that comes to mind is termination.

    Make sure that whatever is at the end of the cable has termination enabled (easy if you have a terminated cable), and make sure that no other devices are terminated.

    You can enable termination power (TERM PWR on many drives) on as many devices as you like, and should enable it on at least one.

    - Steve



    • #3
      Currently, I have 2 cables: one is an LVD/U320 rated terminated cable (twisted, came supplied with the mainboard, 5 drop I believe), which connects the Cheetah. The other is a UW rated 5 drop cable with an LVD/SE terminator at one end, which connects the Hitachi. Neither hard drive has a termination option; for both cables the controller is at one cable end and the terminator is at the other.
      Both channels are set to autoterminate.

      TERMPWR is not enabled on any drive (I read that the onboard controller should take care of that).

      To experiment (and following your advice), I'll now try this config:
      Both harddrives on CH A (cheetah ID0, hitachi ID1), using the U320 rated cable; I will enable TERMPWR on the Cheetah.
      (main reason for doing this is that I have the impression it occurs more frequently when both drives are on the same channel)

      It always happens when I extract 2 or 3 large archives at the same time from one disk to another, so I'll do this...
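
The load described above (several large transfers running at once from one disk to another) can be approximated with a small script. This is only a sketch under assumptions: the function name `concurrent_copy_load`, its parameters, and the use of plain file copies instead of archive extraction are all illustrative and are not what was actually run in this thread.

```python
import os
import shutil
import time
from concurrent.futures import ThreadPoolExecutor


def concurrent_copy_load(sources, dest_dir, workers=3):
    """Copy every file in `sources` into `dest_dir` in parallel.

    Returns a dict mapping each source path to the seconds its copy took.
    A copy that takes far longer than its siblings hints at a bus stall.
    """
    def timed_copy(src):
        dst = os.path.join(dest_dir, os.path.basename(src))
        start = time.perf_counter()
        shutil.copyfile(src, dst)
        return src, time.perf_counter() - start

    # Threads are enough here: the work is I/O bound, not CPU bound.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(timed_copy, sources))
```

Pointing `sources` at large files on one drive and `dest_dir` at a folder on the other drive gives roughly the same kind of cross-disk load; the timings can then be compared against the timestamps of any new event log entries.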

      I will let you know tomorrow how it turned out!



      Jörg
      pixar
      Dream as if you'll live forever. Live as if you'll die tomorrow. (James Dean)



      • #4
        Aside from needing to ensure that your SCSI devices are properly terminated, I'd check that a) your controller is at the latest firmware level and b) your drives pass their manufacturers' diagnostic utilities. If you're running a version of Windows, you might want to try disabling the write cache on these drives to see if there's any difference (also an indication of a growing defect).
        P.S. You've been Spanked!



        • #5
          Originally posted by schmosef
          a) your controller is at the latest firmware level
          Already done...
          b) your drives pass their manufacturers' diagnostic utilities.
          Also done...

          If you're running a version of Windows, you might want to try disabling the write cache on these drives to see if there's any difference (also an indication of a growing defect).
          I'm running XP, and I am aware of the SCSI performance issues. However, the SCSI performance issue discussed on StorageReview has to do with the write cache being disabled. That causes delays, but doesn't produce errors in the event log.
          (I have installed the XPCachefilter, which enables the write cache: http://faq.storagereview.com/tiki-pa...roblems&diff=3 , but the problem also occurs when it isn't installed)


          Jörg
          pixar
          Dream as if you'll live forever. Live as if you'll die tomorrow. (James Dean)



          • #6
            I've had this type of problem in two circumstances:

            1) bad cable. I've had two different customer HP Proliant servers need to have a SCSI cable replaced because their backup system would fail on large files. It seemed odd to us that a bad cable would only affect larger files, but we were troubleshooting and had tried everything else. The replacement fixed the problem.

            2) bad drive. I've had a Seagate SCSI drive start to give me those write fail warnings in my dev PC. I'm pretty sure it failed the Seagate diagnostic tool too. Replacing the drive (and cable at the time) fixed the problem for me.

            edit: clarification and grammar
            Last edited by schmosef; 16 August 2004, 11:00.
            P.S. You've been Spanked!



            • #7
              Didn't you have the same problem a year ago???
              Chief Lemon Buyer no more Linux sucks but not as much
              Weather nut and sad git.

              My Weather Page



              • #8
                Termination

                ...your terminator isn't more than 3" from the last device on the chain, is it? if it is, you could be getting signal reflection...

                cc
                Last edited by Chucky Cheese; 16 August 2004, 15:09.



                • #9
                  The Pit: Yes, every now and then it comes back...

                  Schmosef: cable has been replaced, all the drives and even the mainboard have been replaced, the problem came back...

                  Chucky Cheese: Yes... my terminator is more than 3" from the last drive... This is how the cable looks (C=controller, D=drive, +=unused connector, *=terminator):
                  C-D-D-+-+-*
                  Should I put it like this:
                  C-+-D-+-D-*
                  to minimize the risk of reflection ?

                  But :
                  Originally posted by spadnos
                  You can enable termination power (TERM PWR on many drives) on as many devices as you like, and should enable it on at least one.
                  I set both drives on channel A, set the TERM PWR jumper on the IBM, booted and tested: I copied a 2GB file from one disk to another, whilst extracting 3 .ace-archives. This kind of load virtually always gave the problems, but
                  NOW it worked!
                  After a year it finally appears to be solved!


                  Even the SuperMicro technician (I went there with my system) didn't think of this; he replaced the mainboard and the cable.

                  If there was a kiss-smiley...




                  Jörg
                  pixar
                  Dream as if you'll live forever. Live as if you'll die tomorrow. (James Dean)



                  • #10
                    SCSI Terminator

                    VJ...yes, i would move the terminator closer to the last device.

                    do you have a utility that allows you to look at the "grown defect list"??

                    has the manufacturer's primary defect list grown??

                    i would still move the terminator even though the problem appears to have been fixed by spadnos...good call spadnos!

                    ...it goes toward optimization of your scsi bus.

                    cc
                    Last edited by Chucky Cheese; 17 August 2004, 02:23.



                    • #11
                      Re: SCSI Terminator

                      Originally posted by Chucky Cheese
                      VJ...yes, i would move the terminator closer to the last device.

                      i would still move the terminator even though the problem appears to have been fixed by spadnos...good call spadnos!

                      ...it goes toward optimization of your scsi bus.
                      So, any of these configs:
                      C-D-+-+-D-*
                      C-+-D-+-D-*
                      C-+-+-D-D-*
                      (something the SuperMicro-guy also didn't do...)
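
The cable diagrams in this thread are regular enough to check mechanically. The following is a minimal sketch, assuming the notation used above (C=controller, D=drive, +=unused connector, *=terminator); the function name `empty_drops_before_terminator` is made up for illustration. It counts the unused connectors sitting between the last drive and the terminator, which is the stretch of cable the 3" advice is about.

```python
def empty_drops_before_terminator(layout: str) -> int:
    """Count unused connectors ('+') between the last drive and the terminator.

    `layout` uses the thread's notation, e.g. "C-D-D-+-+-*".
    Zero means the terminator directly follows the last drive.
    """
    parts = layout.split("-")
    if parts[0] != "C" or parts[-1] != "*":
        raise ValueError("layout must start with C and end with *")
    if "D" not in parts:
        raise ValueError("layout contains no drive")
    # Index of the last 'D', found by searching the reversed list.
    last_drive = len(parts) - 1 - parts[::-1].index("D")
    return parts[last_drive + 1:-1].count("+")


if __name__ == "__main__":
    for layout in ("C-D-D-+-+-*", "C-D-+-+-D-*", "C-+-D-+-D-*", "C-+-+-D-D-*"):
        print(layout, empty_drops_before_terminator(layout))
```

The original layout leaves two unused connectors before the terminator, while each of the three proposed layouts leaves none. The sketch says nothing about physical distances, which depend on the actual cable.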

                      do you have a utility that allows you to look at the "grown defect list"??

                      has the manufacturers primary defect list grown??
                      Seagate's Seatools enterprise can do this for my Cheetah (not sure about the Hitachi). Last time I checked it didn't show any defects. I might have to look for something to view the list on the Hitachi.

                      BTW, is there a performance gain from putting the two drives on separate controllers? The Cheetah is U320, the Hitachi is U160, both controllers are U320. I think there will be no gain, as even the max data rates of both drives added together still don't max out the bus; right?
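
The reasoning in that question can be written out as a small calculation. This is a sketch only: the bus limits are the nominal Ultra160 and Ultra320 figures (160 and 320 MB/s), but the per-drive sustained rates below are placeholder assumptions, not measurements or datasheet values for these drives, and `bus_headroom` is a hypothetical helper.

```python
# Nominal bus bandwidth in MB/s for the two SCSI generations mentioned.
BUS_MB_S = {"U160": 160, "U320": 320}


def bus_headroom(drive_rates_mb_s, bus):
    """Return bus bandwidth left over when all drives stream at once.

    A positive result means the drives together cannot saturate the bus.
    """
    return BUS_MB_S[bus] - sum(drive_rates_mb_s.values())


if __name__ == "__main__":
    # ASSUMED sustained rates, for illustration only.
    rates = {"Cheetah 10K.6": 70, "Ultrastar 36LZX": 35}
    for bus in BUS_MB_S:
        print(bus, "headroom:", bus_headroom(rates, bus), "MB/s")
```

With any plausible sustained rates for two drives of this era the headroom stays positive on either bus, which is the point being made: the bus is not the bottleneck for sustained transfers.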

                      Thanks!


                      Jörg
                      pixar
                      Dream as if you'll live forever. Live as if you'll die tomorrow. (James Dean)



                      • #12
                        YES

                        VJ

                        i like that middle combination the best

                        any performance gain to be had by moving each drive to its own controller would be minimal...i personally would keep all HDDs on their own bus and removable devices on their own bus, unless you don't have any.

                        measuring the difference

                        ...basically, if you take that 2GB file and copy it from one directory to another directory on the same physical disk you should get the internal transfer rate of the drive minus (controller overhead + device contention/arbitration + device overhead). as you might guess, vendor performance will vary. there are general values to use in calculating this, i just can't find them at the moment.
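
The copy test described above can be timed with a few lines of script. This is a minimal sketch, assuming a plain file copy is an acceptable stand-in for the measurement; `copy_throughput_mb_s` is a hypothetical helper name, and the operating system's file cache can inflate the result for files that fit in RAM.

```python
import os
import shutil
import time


def copy_throughput_mb_s(src, dst):
    """Copy `src` to `dst` and return the observed rate in MiB per second."""
    size_bytes = os.path.getsize(src)
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    # Guard against a zero interval on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return size_bytes / (1024 * 1024) / elapsed
```

Running it once with source and destination on the same disk, and once with them on different disks, gives two numbers to compare before and after a cabling or termination change.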

                        cc



                        • #13
                          Re: YES

                          Originally posted by Chucky Cheese
                          i like that middle combination the best

                          any performance gain to be had by moving each drive to its own controller would be minimal...i personally would keep all HDDs on their own bus and removable devices on their own bus, unless you don't have any.
                          Thanks!
                          My Zip drive and scanner are on their own controller (a 2940UW). This looks like overkill, but I had this controller lying around...

                          measuring the difference

                          ...basically, if you take that 2GB file and copy it from one directory to another directory on the same physical disk you should get the internal transfer rate of the drive minus (controller overhead + device contention/arbitration + device overhead). as you might guess, vendor performance will vary. there are general values to use in calculating this, i just can't find them at the moment.
                          No problem!
                          I will copy the file just to see what performance I get now... (previously, it was occasionally crippled by the delays).

                          man, we really need a kiss smiley...


                          Jörg
                          pixar
                          Dream as if you'll live forever. Live as if you'll die tomorrow. (James Dean)



                          • #14
                            :x ?
                            Join MURCs Distributed Computing effort for Rosetta@Home and help fight Alzheimers, Cancer, Mad Cow disease and rising oil prices.
                            [...]the pervading principle and abiding test of good breeding is the requirement of a substantial and patent waste of time. - Veblen



                            • #15
                              There would be a very minimal performance gain. Right now everything is running at U160. The only performance you're missing is that the Cheetah's initial cache transfer is at U160 instead of U320, but that's not much at all.
                              Gigabyte P35-DS3L with a Q6600, 2GB Kingston HyperX (after *3* bad pairs of Crucial Ballistix 1066), Galaxy 8800GT 512MB, SB X-Fi, some drives, and a Dell 2005fpw. Running WinXP.
