Brad Peczka's Blog $ cat /dev/random > /dev/blog

7 Jan 2010, 0 comments

Exchange 2007 Services Shutdown Order

Following on from my earlier post regarding the fun and games I've had with Exchange 2007, here's a brief running sheet I use when I want to shut down Exchange services, but keep the server running.

This is especially handy when you're performing network adapter driver updates and your Exchange Information Store is hosted on an iSCSI LUN. Driver updates while the Store is still running == weird, weird issues and potential Store corruption!

net stop msexchangeadtopology /y
net stop msftesql-exchange /y
net stop msexchangeis /y
net stop msexchangesa /y
net stop iisadmin /y

Once these services have shut down, you're free to proceed with any driver updates or cable pulling, or to continue shutting down other services on the same server.
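If you'd rather not type the list by hand each time, the running sheet can be wrapped in a small script. This is only a sketch: the Python wrapper, the function names and the dry-run default are my own assumptions, not part of the original sheet. It shells out to the same `net stop ... /y` commands in the same order, and only does so when asked to.

```python
import subprocess

# Stop order taken from the running sheet above.
# /y answers "yes" when net stop asks about stopping dependent services.
STOP_ORDER = [
    "msexchangeadtopology",
    "msftesql-exchange",
    "msexchangeis",
    "msexchangesa",
    "iisadmin",
]


def build_stop_commands(services=STOP_ORDER):
    """Return one ['net', 'stop', <service>, '/y'] argument list per service, in order."""
    return [["net", "stop", name, "/y"] for name in services]


def stop_services(execute=False):
    """Print each command; run it only when execute=True (Windows, elevated prompt)."""
    results = []
    for cmd in build_stop_commands():
        print(" ".join(cmd))
        code = None
        if execute:
            # A non-zero code may simply mean the service was already stopped
            # by an earlier /y, so it is recorded rather than treated as fatal.
            code = subprocess.run(cmd).returncode
        results.append((cmd[2], code))
    return results


if __name__ == "__main__":
    stop_services(execute=False)  # dry run: just prints the five commands
```

Leaving `execute=False` as the default means a stray double-click does nothing more than print the list.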

7 Jan 2010, 2 comments

Troubleshooting Fun with Exchange 2007 Queues

I recently resolved an issue involving two Exchange 2007 servers in two different AD Sites. The issue was simply slow email delivery when emailing from Site 'A' to Site 'B', and a quick check showed that both servers had backlogged mail queues with no obvious cause.

Both sites are part of the same domain, and both servers are identical in hardware (HP DL380 G5) and patch level (Windows Server 2003 Standard x64 R2 and Exchange 2007 SP2). Connectivity between the two sites tested perfectly, and talking to other servers in each site also revealed no issues. It was only when the two Exchange servers attempted to communicate with each other that the issue occurred.

Mail in both queues reported errors of either "451 4.4.0 Primary target IP address responded with: "421 4.4.2 Connection dropped." Attempted failover to alternate host, but that did not succeed. Either there are no alternate hosts or delivery failed to all alternate hosts." or "421 4.4.2 Connection dropped.", which seemed to point to network issues. Packet captures from both servers also showed a large number of retransmits on both SMTP and SMB communication:

SMTP:

338     XXXMAIL02    192.168.15.63   SMTP  SMTP:Cmd EHLO XXXMAIL02.testdomain.com, 31 bytes
1197   192.168.15.63   XXXMAIL02    SMTP  SMTP:Rsp 250 -YYYMAIL02.testdomain.com Hello [192.168.24.34], 255 bytes
1198  XXXMAIL02    192.168.15.63   SMTP  SMTP:Data Payload, 16 bytes
4159   XXXMAIL02    192.168.15.63   TCP     TCP:[ReTransmit #1198] [Bad CheckSum]Flags=...AP..., SrcPort=44217, DstPort=SMTP(25), PayloadLen=16, Seq=3183382952 - 3183382968, Ack=1774779495, Win=65181
8142   XXXMAIL02    192.168.15.63   TCP     TCP:[ReTransmit #1198] [Bad CheckSum]Flags=...AP..., SrcPort=44217, DstPort=SMTP(25), PayloadLen=16, Seq=3183382952 - 3183382968, Ack=1774779495, Win=65181
11786  XXXMAIL02    192.168.15.63   TCP     TCP:[ReTransmit #1198] [Bad CheckSum]Flags=...AP..., SrcPort=44217, DstPort=SMTP(25), PayloadLen=16, Seq=3183382952 - 3183382968, Ack=1774779495, Win=65181
15476  XXXMAIL02    192.168.15.63   TCP     TCP:[ReTransmit #1198] [Bad CheckSum]Flags=...AP..., SrcPort=44217, DstPort=SMTP(25), PayloadLen=16, Seq=3183382952 - 3183382968, Ack=1774779495, Win=65181
17902  XXXMAIL02    192.168.15.63   TCP     TCP:[ReTransmit #1198] [Bad CheckSum]Flags=...AP..., SrcPort=44217, DstPort=SMTP(25), PayloadLen=16, Seq=3183382952 - 3183382968, Ack=1774779495, Win=65181
20735  XXXMAIL02    192.168.15.63   TCP     TCP:[ReTransmit #1198] [Bad CheckSum]Flags=...AP..., SrcPort=44217, DstPort=SMTP(25), PayloadLen=16, Seq=3183382952 - 3183382968, Ack=1774779495, Win=65181
23227  XXXMAIL02    192.168.15.63   TCP     TCP:[ReTransmit #1198] [Bad CheckSum]Flags=...AP..., SrcPort=44217, DstPort=SMTP(25), PayloadLen=16, Seq=3183382952 - 3183382968, Ack=1774779495, Win=65181

SMB:

1/5/2010 15:22        14560  {TCP:358, IPv4:16}  XXXMAIL02     192.168.15.63   SMB     SMB:R; Negotiate, Dialect is NT LM 0.12 (#5), SpnegoNegTokenInit
1/5/2010 15:22        14650  {TCP:358, IPv4:16}  XXXXMAIL02    192.168.15.63   TCP     TCP:[ReTransmit #14560] [Bad CheckSum]Flags=...AP..., SrcPort=Microsoft-DS(445), DstPort=44946, PayloadLen=186, Seq=3446414444 - 3446414630, Ack=2070264315, Win=65398 (scale factor 0x0) = 65398
1/5/2010 15:22        14943  {TCP:358, IPv4:16}  XXXXMAIL02    192.168.15.63   TCP     TCP:[ReTransmit #14560] [Bad CheckSum]Flags=...AP..., SrcPort=Microsoft-DS(445), DstPort=44946, PayloadLen=186, Seq=3446414444 - 3446414630, Ack=2070264315, Win=65398 (scale factor 0x0) = 65398
1/5/2010 15:23        15334  {TCP:358, IPv4:16}  XXXXMAIL02    192.168.15.63   TCP     TCP:[ReTransmit #14560] [Bad CheckSum]Flags=...AP..., SrcPort=Microsoft-DS(445), DstPort=44946, PayloadLen=186, Seq=3446414444 - 3446414630, Ack=2070264315, Win=65398 (scale factor 0x0) = 65398
1/5/2010 15:23        15862  {TCP:358, IPv4:16}  XXXXMAIL02    192.168.15.63   TCP     TCP:[ReTransmit #14560] [Bad CheckSum]Flags=...AP..., SrcPort=Microsoft-DS(445), DstPort=44946, PayloadLen=186, Seq=3446414444 - 3446414630, Ack=2070264315, Win=65398 (scale factor 0x0) = 65398
1/5/2010 15:23        16383  {TCP:358, IPv4:16}  XXXXMAIL02    192.168.15.63   TCP     TCP:[ReTransmit #14560] [Bad CheckSum]Flags=...AP..., SrcPort=Microsoft-DS(445), DstPort=44946, PayloadLen=186, Seq=3446414444 - 3446414630, Ack=2070264315, Win=65398 (scale factor 0x0) = 65398
1/5/2010 15:23        17225  {TCP:358, IPv4:16}  XXXXMAIL02    192.168.15.63   TCP     TCP:[ReTransmit #14560] [Bad CheckSum]Flags=...AP..., SrcPort=Microsoft-DS(445), DstPort=44946, PayloadLen=186, Seq=3446414444 - 3446414630, Ack=2070264315, Win=65398 (scale factor 0x0) = 65398
1/5/2010 15:24        18568  {TCP:358, IPv4:16}  XXXXMAIL02    192.168.15.63   TCP     TCP:[ReTransmit #14560] [Bad CheckSum]Flags=...AP..., SrcPort=Microsoft-DS(445), DstPort=44946, PayloadLen=186, Seq=3446414444 - 3446414630, Ack=2070264315, Win=65398 (scale factor 0x0) = 65398
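When a capture is exported as text like the excerpts above, a quick tally makes the pattern obvious: one frame being resent over and over. The sketch below is my own illustration, not something used during the original troubleshooting. It assumes the export keeps the `[ReTransmit #N]` marker shown above, and the function name is made up.

```python
import re
from collections import Counter

# Matches the "[ReTransmit #1198]" marker seen in the capture text above.
RETRANSMIT = re.compile(r"\[ReTransmit #(\d+)\]")


def count_retransmits(lines):
    """Map original frame number -> how many times it was retransmitted."""
    counts = Counter()
    for line in lines:
        match = RETRANSMIT.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Three lines shortened from the SMTP capture above.
sample = [
    "1198  XXXMAIL02    192.168.15.63   SMTP  SMTP:Data Payload, 16 bytes",
    "4159   XXXMAIL02    192.168.15.63   TCP     TCP:[ReTransmit #1198] [Bad CheckSum]Flags=...AP...",
    "8142   XXXMAIL02    192.168.15.63   TCP     TCP:[ReTransmit #1198] [Bad CheckSum]Flags=...AP...",
]
print(count_retransmits(sample))  # Counter({1198: 2})
```

Run against the full excerpts above, this would report seven retransmits of frame 1198 (SMTP) and seven of frame 14560 (SMB).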

Revisiting the issue, we noticed that XXXMAIL02 had two network adapters in a Team, while YYYMAIL02 was running off a single network adapter. Both servers also had old network card drivers: the cards are HP NC373i Multifunction Gigabit Adapters (rebadged Broadcom cards) and were using driver v2.8.13.0, dated 30/06/2006. As part of the troubleshooting, we upgraded these drivers to the latest available version (v5.0.13.0, 23/06/2009) at the next maintenance window. During the upgrade, XXXMAIL02 was also changed from a Network Team to a single adapter, to match YYYMAIL02.

(Bootnote: We did the upgrade by installing the latest ProLiant Support Pack, and ran into a small issue of note while doing so. You can't upgrade the network drivers straight to v5.0.13.0, otherwise the installation will fail with the error "HP Virtual Bus Device installation requires a newer version. Version 4.6.16.0 is required". The easy way around this is to download v4.6.16.0 from HP (available in 64-bit and 32-bit versions) and install it prior to running the PSP.)
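The two-step upgrade in the bootnote boils down to a version comparison, sketched below. This is an illustration only: the rule "anything older than 4.6.16.0 needs the intermediate driver first" is my reading of the error message, not documented HP behaviour, and the function names are made up.

```python
def parse_version(text):
    """Turn '2.8.13.0' or 'v2.8.13.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def upgrade_path(installed, target="5.0.13.0", minimum="4.6.16.0"):
    """List the driver versions to install, in order, to reach the target."""
    steps = []
    if parse_version(installed) < parse_version(minimum):
        # Assumed rule: the intermediate version named in the error goes first.
        steps.append(minimum)
    if parse_version(installed) < parse_version(target):
        steps.append(target)
    return steps


print(upgrade_path("v2.8.13.0"))  # ['4.6.16.0', '5.0.13.0']
```

Comparing tuples of integers rather than raw strings matters here, since a plain string comparison would sort "10.0.0.0" before "4.6.16.0".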

HP Network Drivers

Within minutes of the upgrade being completed, mail and other traffic was flowing freely between both servers. A speed test run with iperf showed speeds of ~60 Mb/s (previously we were seeing ~557 bytes/s), and new emails were being delivered to the server within seconds.
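To put those two figures on the same scale, here is a rough back-of-the-envelope calculation. It assumes "Mb/s" means megabits per second with 1 Mb = 1,000,000 bits; if the figure was actually megabytes, the gap is eight times larger still. The 1 MB message size is a made-up example.

```python
def megabits_to_bytes_per_second(megabits_per_second):
    """Convert Mb/s to bytes/s, assuming decimal megabits."""
    return megabits_per_second * 1_000_000 / 8


def transfer_seconds(size_bytes, bytes_per_second):
    """Time to move size_bytes at a steady rate, ignoring protocol overhead."""
    return size_bytes / bytes_per_second


before = 557.0                             # bytes/s, before the driver upgrade
after = megabits_to_bytes_per_second(60)   # 7,500,000 bytes/s after the upgrade

print(round(after / before))                       # 13465 (times faster)
print(round(transfer_seconds(1_000_000, before)))  # 1795 seconds for a 1 MB message
print(round(transfer_seconds(1_000_000, after), 2))  # 0.13 seconds for the same message
```

In other words, a 1 MB email that would have taken roughly half an hour to cross the link now takes a fraction of a second, which fits the backlogged queues seen earlier.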

HP iperf Test

This was a tricky one to diagnose - but it proves how often simple things are overlooked, in search of a bigger problem!