SCSI problems.... advice would be welcome...



VJ
10th July 2003, 01:41
Hello,

Yesterday, my system started acting up very weirdly. My first harddisk (Quantum Atlas 10K) made the seek noises it makes when spinning up (a few specific clicks and beep-like sounds).

It eventually led to a blue screen (it flashed by very fast, but I thought I read something about "pagefile operation"). Now, the system boots, SMART gives no errors, the disks have been scanned for errors (the Quantum even at boot by XP !).

The driver is (see signature for config):
Adaptec AIC-7902-Ultra320 SCSI
Adaptec
02/12/2002
1.3.0.0
not digitally signed
In the system log, I have numerous entries (id11, grouped in 6 at a time; id15, less frequently, but when it occurs it is listed 20-30 times in a row, for both disks).
Here are the details of the log-entries:



adpu320
id 11: The driver detected a controller error on \Device\Scsi\adpu3201.
0000: 0f 00 10 00 01 00 68 00 ......h.
0008: 00 00 00 00 0b 00 04 c0 .......
0010: 24 50 00 c1 00 00 00 00 $P.....
0018: 46 01 00 00 00 00 00 00 F.......
0020: 00 00 00 00 00 00 00 00 ........
0028: 00 00 00 00 01 00 00 00 ........
0030: 00 00 00 00 05 00 00 00 ........
disk
id 15: The device, \Device\Harddisk0\D, is not ready for access yet.
0000: 0f 00 68 00 01 00 b6 00 ..h....
0008: 00 00 00 00 0f 00 04 c0 .......
0010: 04 01 00 00 9d 00 00 c0 ....?..
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 00 00 00 00 00 00 00 ........
0028: f2 06 07 00 00 00 00 00 .......
0030: ff ff ff ff 00 00 00 00 ....
0038: 40 00 00 0a 00 00 05 00 @.......
0040: 05 20 06 12 08 01 20 00 . .... .
0048: 00 00 00 00 0a 00 00 00 ........
0050: 00 00 00 00 f8 54 c2 85 ....T?
0058: 00 00 00 00 c8 e1 b8 85 ....?
0060: 00 80 3d 86 00 00 00 00 .?=?....
0068: 1e 00 00 00 00 00 00 00 ........
0070: 00 00 00 00 00 00 00 00 ........
0078: 00 00 00 00 00 00 00 00 ........
0080: 00 00 00 00 00 00 00 00 ........
0088: 00 00 00 00 00 00 00 00 ........
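
The data bytes above can be decoded by hand. Below is a minimal sketch, assuming the dumps follow the documented IO_ERROR_LOG_PACKET layout (a little-endian 32-bit ErrorCode at offset 0x0c and FinalStatus at offset 0x14). The helper names parse_dump and dword_at are hypothetical, and only the first three rows of each dump are copied in.

```python
import struct

def parse_dump(text):
    """Turn event-log dump lines into raw bytes (hypothetical helper)."""
    data = bytearray()
    for line in text.strip().splitlines():
        tokens = line.split()
        # tokens[0] is the offset ("0008:"); the next 8 tokens are byte values.
        # Anything after that is the ASCII column and is ignored.
        data.extend(int(t, 16) for t in tokens[1:9])
    return bytes(data)

def dword_at(data, offset):
    """Read a little-endian 32-bit value at the given byte offset."""
    return struct.unpack_from("<I", data, offset)[0]

# First rows of the two dumps quoted in this post.
EVENT_11 = """
0000: 0f 00 10 00 01 00 68 00 ......h.
0008: 00 00 00 00 0b 00 04 c0 .......
0010: 24 50 00 c1 00 00 00 00 $P.....
"""
EVENT_15 = """
0000: 0f 00 68 00 01 00 b6 00 ..h....
0008: 00 00 00 00 0f 00 04 c0 .......
0010: 04 01 00 00 9d 00 00 c0 ....?..
"""

# Assumption: ErrorCode sits at 0x0c and FinalStatus at 0x14.
code11 = dword_at(parse_dump(EVENT_11), 0x0C)
code15 = dword_at(parse_dump(EVENT_15), 0x0C)
final15 = dword_at(parse_dump(EVENT_15), 0x14)

for name, value in (("id 11 ErrorCode", code11),
                    ("id 15 ErrorCode", code15),
                    ("id 15 FinalStatus", final15)):
    print(f"{name}: 0x{value:08X}")
```

Under that assumption, the low 16 bits of each ErrorCode match the event id (11 and 15), and the id 15 FinalStatus comes out as 0xC000009D, which ntstatus.h lists as STATUS_DEVICE_NOT_CONNECTED.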


This article has got me worried:
http://support.microsoft.com/?kbid=259237

I'm now looking for newer drivers, but there don't seem to be any more recent ones...
I also have contacted SuperMicro tech support...

Any other suggestions ?


Jrg

Marshmallowman
10th July 2003, 02:18
Check your cables?

unplug/replug?

VJ
10th July 2003, 02:22
Tried both things; I have a second cable + terminator, but the problem persists. Plugging/unplugging doesn't seem to help... :(

Currently, I get the event-entries upon boot, and occasionally while the system is running. Fortunately without any consequences, but I don't trust the system anymore. I want to know where these messages come from, prevent the cause from occurring and have a "clean" log-file when there are no issues. :)


Jrg

The PIT
10th July 2003, 02:26
Me votes for a failing hard drive. Have you done any checks on it?

VJ
10th July 2003, 02:43
Yes, I ran scan-software and smart-diagnostics. The IBM supports SMART-selftests and passes them; the Quantum does not support selftests, but reports as OK. Besides, it shouldn't occur when the "failing" drive is disconnected, right ?

The event-log entries were most likely present as soon as I moved my disks to this controller (came from an Adaptec 2940UW). I then contacted SuperMicro regarding this, but they didn't respond. As there were no problems at that time, I decided (stupidly enough ?) to ignore these log-entries. For what it is worth, all hardware is still covered by warranty (drives 5 years, mainboard is quite recent).

I also checked the directions here:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;314093

If both internal and external SCSI devices are attached, make sure that the last device on each SCSI chain is terminated, and make sure that intermediate devices are not terminated.

Check.


If there is only a single SCSI chain (either all internal or external), make sure that the last device of the SCSI chain is terminated and that the SCSI controller itself is terminated. This is usually a BIOS setting.

Check.


Check for loose or poor-quality SCSI cabling. A long chain of cables with mixed internal and external cabling can degrade the signal. A SCSI specification that allows for a long distance assumes that the cabling allows no leakage or interference. The allowable reality is generally a shorter distance. External cables that are six feet long or longer should be replaced with three-foot cables.

2 different scsi-cables and terminators yield the same issues; cable length is well within limits.


Take note of when the event messages were recorded, and try to determine whether the messages coincide with certain processing schedules (such as backups) or heavy disk processing. This might pinpoint the device that is causing the errors.

No particular action is undertaken at the time of the entries.


The tendency of drives to have these types of problems under heavy stress is often due to slow microprocessors. In a multitasking environment, the processor may not be fast enough to process all the input/output (I/O) commands that arrive almost simultaneously.

Problem also occurs when system is close to idle.


Slow down the transfer rate settings if timeouts are associated with tape drives; using a 5-mbs transfer rate usually cures the timeouts.

No tape drives.


Simplify the SCSI/IDE chain by removing devices. If you suspect that a particular device is causing the problem, move that device to another controller. If the behavior follows the device, replace the device.

Simplifying doesn't help (only thing that has not been replaced is the onboard controller).


Check the revisions of the SCSI controller BIOS and of device firmware, and obtain the latest revisions from the manufacturer. (There is a procedure for checking the model number and firmware revision later in this article.)

I haven't found new firmware for either the controller or the drives.


Check the version of SCSI device driver. The SCSI driver is located in the %SystemRoot%\System32\Drivers folder. Look at the version in the properties for the driver file. If the driver is not up-to-date, see whether the manufacturer has a newer version.

No newer version (on Adaptec or on Supermicro)


Remove any other controllers that might create bus contention issues.

Not possible, bus is hardly fully loaded (no network, U320 is the only traffic-intensive device).


See whether a low-level format performed by the SCSI controller resolves the event messages.

The drives were not low-level formatted by me when I got them; they first were attached to a 2940UW (without any problems, so I didn't perform a low-level format back then).
Could it be this ? From all the options, this is the only one that seems viable...?


Try substituting a different make or model of any suspect hardware.

Not possible (onboard controller, no other ones available).
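
The checklist item about noting when the event messages were recorded can be made less tedious with a small script. The following is only a rough sketch: it groups event timestamps into bursts so that runs of entries (like the groups of 6 mentioned earlier) stand out. The function name group_bursts, the 5-second gap, and the sample timestamps are all made up for illustration; none of them come from the actual system log.

```python
from datetime import datetime, timedelta

def group_bursts(timestamps, max_gap=timedelta(seconds=5)):
    """Group timestamps into bursts; a new burst starts after a gap > max_gap."""
    bursts = []
    for ts in sorted(timestamps):
        if bursts and ts - bursts[-1][-1] <= max_gap:
            bursts[-1].append(ts)   # close enough to the previous entry
        else:
            bursts.append([ts])     # gap too large: start a new burst
    return bursts

# Made-up timestamps for illustration only (not taken from the real log).
sample = [datetime(2003, 7, 9, 21, 14, s) for s in (1, 2, 2, 3, 3, 4)]
sample.append(datetime(2003, 7, 9, 23, 40, 10))

bursts = group_bursts(sample)
for burst in bursts:
    print(burst[0].isoformat(), "entries:", len(burst))
```

Comparing the start time of each burst with scheduled jobs (or with boot times) is then a matter of reading a short list instead of scrolling through the whole log.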


Jrg

The PIT
10th July 2003, 05:00
Go back to 2940 and see if the problem still occurs.

VJ
10th July 2003, 05:08
Now why didn't I think of that... :rolleyes:
(perhaps because the controller is in another computer, but that shouldn't be too much of a problem)
Hopefully, XP will not complain about the controller with the bootdrive being changed... :)

Will be able to try it after the weekend though...
(nice thinking)

Euhm, any ideas in both cases (i.e. if the problem does not appear anymore, and if the problem persists) ?


Jrg

Byock
10th July 2003, 07:26
It really sounds to me like a failing HDD. My Quantum Viking did this on a 2940UW. Worked fine after, then did it a couple of weeks later, and a week after that, failed. Did you try verifying the disk with the Adaptec controller BIOS? This has been known to find and fix errors for me. Other than that....Stupid question, but do you have your power mgmt set to turn off the HDD after say 20 minutes? Just a couple of thoughts....

Tjalfe
10th July 2003, 07:32
I had a 9.1GB 7200RPM SCSI Quantum die on me, sounding pretty much as you described. Luckily they had 5-year warranties on them, and when I RMA'd it, I got a 9.1GB 10K RPM drive back :D
It has worked fine since :)

VJ
10th July 2003, 07:54
Byock:
Not a stupid idea, but I haven't set the drives to spin down.

Well, this Quantum is actually a replacement for another one (about 3 years ago), which started giving numerous SMART errors.

I will try verifying the media (and low-level formatting them), but I dreaded doing so, as I would have problems moving my data to other disks... :( Still, better safe than sorry... :)
The fact that it just started acting up on the new controller would then be merely a coincidence ? Also, how could one failing drive result in event id 15 for both drives on this controller ?


Jrg

VJ
10th July 2003, 08:38
Supermicro mailed back (pretty fast :)). They advised me to perform a bios update, and they mailed a new SCSI-driver I should try.

If the problem persists, they advised me to disconnect my channel B (cd-writer) and remove my Adaptec 2906 (to see if any of those caused the conflicts).


Jrg

VJ
11th July 2003, 00:37
Problem has been located!

I followed SuperMicro's instructions, but after removing the 2906, my U320 could not find my bootdrive. :confused: I then started looking further, and it turns out there is too much clearance between the Quantum PCB and the disk. Pressing the PCB results in the system working again; releasing it leaves the drive without power. I then wedged an object in to keep the PCB in place, and lo and behold: the system booted, all devices were connected, and there was no error-entry in the system log.

So the Quantum is being sent back (for the second time; it was itself a replacement after my initial one got corrupt).


Jrg

Technoid
11th July 2003, 02:13
bummer :(

VJ
11th July 2003, 04:19
Yeah... it means I have to completely re-install my system (and I just had it configured with all the software I needed). Either way, better like this (I got to copy all the necessary files) than experience a total failure when in the midst of something...

Odd though, that the system was working for about 3 months without any problem besides the messages in the log-files. I must say, the disk-access seemed a lot faster and more responsive when I had it running with the PCB held in place by an object. So I'm guessing perhaps the power connector is not the only one to suffer from disturbances. :(

Hope it is covered by warranty; either way, I'm going to purchase an additional drive (never hurts :)): waiting for stuff to return usually takes some time (last time it took 3 weeks).


Jrg

mutz
11th July 2003, 04:33
...interesting how simple things end up and how complicated and exhausting the paths are to find the end.

So often I find myself forced to learn things I don't even want to know about, but am richer for it in the end.

Richer, but not so sure I need the scare and pulled out hair...

;)

VJ
25th July 2003, 06:34
Hmm, I recently got a new drive (Seagate Cheetah 10K.6), and it turns out there now is a similar problem...

Again, I get event errors 9, 11 and 15 (time-out, controller error, not ready for access). The problem now occurs when accessing the IBM, causing this access to slow down immensely. Disconnecting the Plextor from the second channel solves the problem.

Supermicro's tech support is very helpful :up: in locating the problem; they now mailed me new IBM firmware (though I should check; I think my drive has this version), and have a very similar test setup configured :eek:.

No wonder the problem was difficult to locate (both the Quantum problem and this one yield the same error messages). :(


Jrg