Symptom:
Large file copies to the iSCSI SAN are slow.
This document (3699166) is provided subject to the disclaimer at the end of
this document.
Environment
Novell NetWare 6.5 Support Pack 5
Novell Open Enterprise Server (OES) Support Pack 1 NetWare
Cybernetics miSAN - http://www.cybernetics.com/news/misannews.html
Cisco 3560 gigabit switch
ISCSIHAM.HAM NetWare iSCSI HAM Driver - Version 1.05.00 December 15, 2005
Situation
The NetWare iSCSI initiator was connected to this third-party iSCSI SAN target.
Disk I/O performance was sluggish.
Copying a large file to the target device appeared to hang the server.
The volume behaved as if it were no longer accessible: users could not map
drives or access the volume.
Current Disk Requests in Monitor showed over 1000.
Resolution
Applied new ISCSI code that addresses a problem with DeviceBlockSize and CHAP.
ISCSI.HAM Version 1.05.03 (July 26, 2006) or newer is required.
Additional Information
Troubleshooting Steps
1. Obtained a core dump during the apparent hang condition. Pending IOs were at
1000. Used the command NSS /ZLSSIOSTATUS (output shown in the notes below) to
verify the state of IO at the NSS layer.
2. Gathered the ISCSI REPORT. To do this, type ISCSI REPORT at the server
console. This generates a log file at SYS:\ISCSI.TXT
3. Gathered a LAN trace between server and iSCSI SAN.
4. Applied updated Winsock code from NW65SP5UPD1.EXE patch.
NOTES
ISCSI.TXT showed the following:
0x000010FB="[!] initiator_get_connection no connection available hacb=0xB2E31B
This error means that no connection was available at the time the HACB was
sent down, so the initiator queued the request and returned this error. It is
not really a critical error. It generally means that the initiator is somewhat
busy with write requests.
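Because a single occurrence of this message is harmless, it can help to see how often it appears in the report. The following is a minimal sketch, not a Novell tool: the pattern is modeled only on the one sample line shown above, and the function name count_no_connection is hypothetical. The real ISCSI.TXT layout may differ.

```python
import re

# Pattern modeled on the single sample line in this document (assumption).
NO_CONN = re.compile(
    r"initiator_get_connection no connection available\s+hacb=(0x[0-9A-Fa-f]+)"
)

def count_no_connection(report_text):
    """Return (count, sorted distinct HACB addresses) for queued-request messages."""
    # Report lines may be wrapped, so collapse all whitespace before matching.
    flat = " ".join(report_text.split())
    hacbs = NO_CONN.findall(flat)
    return len(hacbs), sorted(set(hacbs))

sample = '''0x000010FB="[!] initiator_get_connection no connection available
hacb=0xB2E31B'''

count, hacbs = count_no_connection(sample)
print(count, hacbs)
```

A steadily growing count during a large copy would be consistent with the initiator falling behind on writes, but the count alone does not prove a fault.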
[DeviceHandle=0x00000000]
VenderID="CYBERNET"
ProductID="iSAN Vault "
RevisionLevel="0214"
DeviceAddress=0x026500E0
DeviceLanHandle=33554434
DeviceType=0x0
DeviceLun=0
DeviceSCSIID=2
MaximumFragmentSize=8192
MaximumNumberOfFragments=64
MaximumTransferSize=8192
DeviceRequests=60963
DeviceRequestsQueued=30181
DeviceRequestsAborted=0
DeviceUnitSize=512
DevicePreferredUnitSize=512
DeviceBlockSize=16
DeviceCapacity=-100663296
TotalCapacitySize=""
After the updated ISCSI code was applied, the report showed DeviceBlockSize=128.
The update addresses a CHAP authentication defect in which this value was
negotiated down to 16, which makes iSCSI send small blocks of data and is much
less efficient.
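The device section of the report can be checked mechanically. The sketch below parses the KEY=VALUE lines and reports the share of requests that were queued along with the block size. The helper names and the threshold of 128 (taken from the value seen after the fix) are assumptions for illustration, not part of the ISCSI report format.

```python
def parse_device_section(text):
    """Parse KEY=VALUE lines into a dict, stripping surrounding quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the bracketed section header.
        if "=" not in line or line.startswith("["):
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip().strip('"')
    return fields

def summarize(fields, expected_block_size=128):
    """Report queued-request ratio and whether the block size looks too small."""
    requests = int(fields["DeviceRequests"])
    queued = int(fields["DeviceRequestsQueued"])
    block_size = int(fields["DeviceBlockSize"])
    return {
        "queued_ratio": queued / requests if requests else 0.0,
        "block_size": block_size,
        "block_size_low": block_size < expected_block_size,
    }

# Values copied from the report excerpt above.
sample = """[DeviceHandle=0x00000000]
DeviceRequests=60963
DeviceRequestsQueued=30181
DeviceRequestsAborted=0
DeviceBlockSize=16
"""

summary = summarize(parse_device_section(sample))
print(f"queued {summary['queued_ratio']:.1%}, block size {summary['block_size']}, "
      f"low={summary['block_size_low']}")
```

With the values from this report, roughly half of all device requests were queued, which fits the sluggish behavior described in the Situation section.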
NSS /ZLSSIOSTATUS
Async IO Information
Write count queue level = 1000
Pending Write IOs on queue = 49169 NOTE: Number of requests held up in NSS
Current Outstanding Write IOs = 1000
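The same kind of check can be applied to the NSS /ZLSSIOSTATUS output. This is a rough sketch that assumes each counter appears as "name = number" as in the excerpt above. Treating "outstanding writes equal to the queue level, with more pending" as a sign of a stalled write path is an interpretation of this case, not documented NSS behavior.

```python
import re

def parse_io_status(text):
    """Extract 'name = integer' counters; text after the number is ignored."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*=\s*(\d+)", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters

def writes_saturated(counters):
    """True when outstanding writes reached the queue level and more are pending."""
    level = counters["Write count queue level"]
    outstanding = counters["Current Outstanding Write IOs"]
    pending = counters["Pending Write IOs on queue"]
    return outstanding >= level and pending > 0

# Output copied from the excerpt above.
sample = """NSS /ZLSSIOSTATUS
Async IO Information
Write count queue level = 1000
Pending Write IOs on queue = 49169 NOTE: Number of requests held up in NSS
Current Outstanding Write IOs = 1000
"""

counters = parse_io_status(sample)
print(counters, writes_saturated(counters))
```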
Document
Document ID: 3699166
Creation Date: 09-28-2006
Modified Date: 12-24-2008
Novell Product: NetWare
Disclaimer
Problem:
Check bond0. It appears that eth0 is set to an MTU of "1500" while eth1 is set
to "9000" (jumbo frames). This mismatch could cause connection problems and
missed packets, and could prevent the Novell server from connecting, though we
are not entirely sure because we do not have any Novell experience.
The errors from eth0 and eth1 are shown below. Try testing without the bond and
without jumbo frames first. If everything connects correctly, set jumbo frames
on the iSCSI server, the switch, and the Novell server to the same frame size.
bond0 Link encap:Ethernet HWaddr 02:B8:AD:0F:A5:75
inet addr:10.175.1.1 Bcast:10.175.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:9000 Metric:1
RX packets:1363132 errors:1884 dropped:0 overruns:0 frame:1884
TX packets:2731468 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:290126338 (276.6 MiB) TX bytes:524019647 (499.7 MiB)
eth0 Link encap:Ethernet HWaddr 02:B8:AD:0F:A5:75
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:1363117 errors:1884 dropped:0 overruns:0 frame:1884
TX packets:2731461 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:290125272 (276.6 MiB) TX bytes:524019143 (499.7 MiB)
Base address:0x2000 Memory:d8020000-d8040000
eth1 Link encap:Ethernet HWaddr 02:B8:AD:0F:A5:75
UP BROADCAST SLAVE MULTICAST MTU:9000 Metric:1
RX packets:15 errors:0 dropped:0 overruns:0 frame:0
TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1066 (1.0 KiB) TX bytes:504 (504.0 b)
Base address:0x2020 Memory:d8060000-d8080000
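The MTU mismatch in the output above can be detected with a short script. The sketch below parses classic ifconfig output (the "Link encap:" style shown here) and lists interfaces whose MTU differs from the bond. It assumes every other interface in the pasted text is a member of bond0, and the function names are hypothetical.

```python
import re

def parse_mtus(ifconfig_text):
    """Map interface name -> MTU from classic (net-tools style) ifconfig output."""
    mtus = {}
    current = None
    for line in ifconfig_text.splitlines():
        header = re.match(r"(\S+)\s+Link encap:", line)
        if header:
            current = header.group(1)
        mtu = re.search(r"\bMTU:(\d+)", line)
        if mtu and current:
            mtus[current] = int(mtu.group(1))
    return mtus

def mtu_mismatches(mtus, master="bond0"):
    """Return interfaces whose MTU differs from the bond master's MTU."""
    target = mtus[master]
    return {name: mtu for name, mtu in mtus.items()
            if name != master and mtu != target}

# Abbreviated from the output above.
sample = """bond0 Link encap:Ethernet HWaddr 02:B8:AD:0F:A5:75
UP BROADCAST RUNNING MASTER MULTICAST MTU:9000 Metric:1
eth0 Link encap:Ethernet HWaddr 02:B8:AD:0F:A5:75
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
eth1 Link encap:Ethernet HWaddr 02:B8:AD:0F:A5:75
UP BROADCAST SLAVE MULTICAST MTU:9000 Metric:1
"""

mtus = parse_mtus(sample)
print(mtus)
print(mtu_mismatches(mtus))
```

In this capture eth0 also carries nearly all of the traffic and all 1884 RX frame errors, so it is the interface flagged here that matters most.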
Solution:
We had set jumbo frames on the bond but not on eth0 and eth1.
We changed the MTU of eth0 and eth1 to match the bond.
The connection now seems to be stable.
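To confirm that the bond and its member interfaces agree after such a change, the MTU values can be read from Linux sysfs. This is a sketch under the assumption that the storage server exposes the standard /sys/class/net tree and the bonding driver's "slaves" file. Whether shell access of this kind is available on the appliance is not stated in the ticket.

```python
from pathlib import Path

def bond_mtu_report(bond="bond0", sysfs_root="/sys/class/net"):
    """Return {interface: mtu} for a bond and each of its member interfaces."""
    root = Path(sysfs_root)
    # The bonding driver lists member interfaces as space-separated names.
    slaves = (root / bond / "bonding" / "slaves").read_text().split()
    report = {bond: int((root / bond / "mtu").read_text())}
    for name in slaves:
        report[name] = int((root / name / "mtu").read_text())
    return report

def is_consistent(report):
    """True when every interface in the report has the same MTU."""
    return len(set(report.values())) == 1

if __name__ == "__main__":
    try:
        report = bond_mtu_report()
    except OSError as exc:
        print(f"could not read bond information: {exc}")
    else:
        print(report, "consistent" if is_consistent(report) else "MISMATCH")
```

The switch ports and the NetWare server still have to be checked separately, since sysfs only reflects the local interfaces.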