Problem Symptom
In a native IPv6 network, every server receives an IPv6 prefix from the IPv6 router, generates an IPv6 address from it, and adds a default IPv6 gateway pointing to that router.
The problem here is that the default gateway goes missing, whether it was created by hand or generated automatically.
Troubleshooting
1. Capture an in-depth TCPIP trace:
netsh trace start tracefile=c:\ipv6trace.etl provider="Microsoft-Windows-TCPIP" keywords=0xffffffffffffffff level=0xff report=yes maxsize=4096
Then decode it:
netsh trace convert c:\ipv6trace.etl c:\ipv6trace.txt
Log sample (route add/delete logging):
[0]0000.0000::2017-10-24 00:20:57.535 [Microsoft-Windows-TCPIP]IP: Received router advertisement on Interface = 14 from SourceIpAddress = fe80::d493:91b0:c6a2:7b68 for TargetIpAddress = ff02::1.
[0]0000.0000::2017-10-24 00:20:57.535 [Microsoft-Windows-TCPIP]IP: Route 0xFFFFE000007525C0 created on interface 14. Protocol = IPv6, DestinationPrefix = 0.0.0.0 (Ignore IPv4 address), IPv6 address = :: /0, Nexthop = 0.0.0.0 (Ignore IPv4 address), IPv6 address = fe80::d493:91b0:c6a2:7b68.
[0]0000.0000::2017-10-24 00:21:10.543 [Microsoft-Windows-TCPIP]IP: Received router advertisement on Interface = 14 from SourceIpAddress = fe80::d493:91b0:c6a2:7b68 for TargetIpAddress = ff02::1.
[0]0000.0000::2017-10-24 00:21:10.543 [Microsoft-Windows-TCPIP]IP: Route 0xFFFFE000007525C0 deleted on interface 14, Protocol = IPv6, DestinationPrefix = 0.0.0.0 (Ignore IPv4 address), IPv6 address = :: /0, Nexthop = 0.0.0.0 (Ignore IPv4 address), IPv6 address = fe80::d493:91b0:c6a2:7b68.
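On a busy server the decoded trace can be large, so it may help to filter it for default-route deletions. The following is a minimal Python sketch, assuming the decoded text keeps the exact wording of the sample lines above; the regular expression and the helper names (`route_events`, `default_route_deletions`) are illustrative and not part of any Microsoft tooling.

```python
import re

# Pattern modeled only on the "Route ... created/deleted" sample lines;
# other TCPIP event formats are intentionally not handled.
ROUTE_EVENT = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\[Microsoft-Windows-TCPIP\]IP: Route (?P<route>0x[0-9A-Fa-f]+) "
    r"(?P<action>created|deleted) on interface (?P<ifindex>\d+)"
    r".*?IPv6 address = (?P<prefix>\S+ ?/\d+)"
    r".*?IPv6 address = (?P<nexthop>[0-9A-Fa-f:]+)"
)

def route_events(text):
    """Return one dict per route created/deleted event found in the text."""
    events = []
    for m in ROUTE_EVENT.finditer(text):
        events.append({
            "timestamp": m.group("ts"),
            "action": m.group("action"),
            "interface": int(m.group("ifindex")),
            # The trace prints ":: /0"; normalize it to "::/0".
            "prefix": m.group("prefix").replace(" ", ""),
            "nexthop": m.group("nexthop"),
        })
    return events

def default_route_deletions(text):
    """Keep only deletions of the IPv6 default route (::/0)."""
    return [e for e in route_events(text)
            if e["action"] == "deleted" and e["prefix"] == "::/0"]

# Two of the sample lines from the trace above.
SAMPLE_TRACE = (
    "[0]0000.0000::2017-10-24 00:20:57.535 [Microsoft-Windows-TCPIP]IP: "
    "Route 0xFFFFE000007525C0 created on interface 14. Protocol = IPv6, "
    "DestinationPrefix = 0.0.0.0 (Ignore IPv4 address), IPv6 address = :: /0, "
    "Nexthop = 0.0.0.0 (Ignore IPv4 address), "
    "IPv6 address = fe80::d493:91b0:c6a2:7b68.\n"
    "[0]0000.0000::2017-10-24 00:21:10.543 [Microsoft-Windows-TCPIP]IP: "
    "Route 0xFFFFE000007525C0 deleted on interface 14, Protocol = IPv6, "
    "DestinationPrefix = 0.0.0.0 (Ignore IPv4 address), IPv6 address = :: /0, "
    "Nexthop = 0.0.0.0 (Ignore IPv4 address), "
    "IPv6 address = fe80::d493:91b0:c6a2:7b68.\n"
)

if __name__ == "__main__":
    for event in default_route_deletions(SAMPLE_TRACE):
        print(event["timestamp"], "default route via", event["nexthop"], "deleted")
```

Each reported timestamp can then be matched against the nearest "Received router advertisement" line to see which packet preceded the deletion.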
2. Get a memory dump while the gateway is missing,
Dump analysis shows some deleted routes still present in memory, and they appear to have been removed by the function DeleteUnicastRoute. The next step is to trace the route delete behavior.
0: kd> dt -r1 ffffe00048c26000+0x2d8-0x88 _IPV6_UNICAST_ROUTE
   +0x040 SitePrefixLength  : 0 ''
   +0x064 ValidLifetime     : 0    <== deleted
   +0x068 PreferredLifetime : 0
3. Create a private TCPIP.sys driver that triggers a BSOD when a route entry gets deleted,
logman create trace "minio_netio" -p "Microsoft-Windows-TCPIP" 0x0000000000000020 0x5 -nb 400 400 -bs 1024 -mode BufferOnly -max 4096 -ets
Root Cause Analysis
Cause 1. A network device sent out an RA with Router Lifetime 0, which triggers removal of the default gateway,
1: kd> kL
 # Child-SP          RetAddr           Call Site
00 ffffd001`a2396698 fffff801`58100e1b nt!KeBugCheck
01 ffffd001`a23966a0 fffff801`58101913 tcpip!IppLogRouteChangeEvents+0x2a3
02 ffffd001`a2396830 fffff801`58101df4 tcpip!IppDeleteUnicastRoute+0x2f
03 ffffd001`a2396860 fffff801`58108b76 tcpip!IppDereferenceRouteForUser+0x80
04 ffffd001`a23968b0 fffff801`58108c74 tcpip!IppCommitSetAllRouteParameters+0x2b6
05 ffffd001`a2396910 fffff801`58108d65 tcpip!IppUpdateUnicastRouteUnderLock+0xa4
06 ffffd001`a2396990 fffff801`58124d71 tcpip!IppUpdateUnicastRoute+0xd5
07 ffffd001`a2396a40 fffff801`5814e270 tcpip!IppUpdateAutoConfiguredRoute+0xb5
08 ffffd001`a2396ac0 fffff801`581597dc tcpip!Ipv6pHandleRouterAdvertisement+0x68c
09 ffffd001`a2396ca0 fffff801`5811db65 tcpip!Icmpv6ReceiveDatagrams+0x3b4
0a ffffd001`a2396d30 fffff801`5811e73b tcpip!IppDeliverListToProtocol+0x39
0b ffffd001`a2396de0 fffff801`5811ed59 tcpip!IppProcessDeliverList+0x6f
0c ffffd001`a2396e40 fffff801`580fbecf tcpip!IppReceiveHeaderBatch+0x2d9
0d ffffd001`a2396ef0 fffff801`5817c644 tcpip!IppFlcReceivePacketsCore+0x1527
0e ffffd001`a2397170 fffff801`5817bc06 tcpip!FlpReceiveNonPreValidatedNetBufferListChain+0x8c0
0f ffffd001`a2397260 fffff803`aaaa69f3 tcpip!FlReceiveNetBufferListChainCalloutRoutine+0x27e
10 ffffd001`a2397300 fffff801`5817bcbc nt!KeExpandKernelStackAndCalloutInternal+0xf3
11 (Inline Function) --------`-------- tcpip!NetioExpandKernelStackAndCallout+0x47
12 ffffd001`a23973f0 fffff801`578e1a53 tcpip!FlReceiveNetBufferListChain+0xa4
13 ffffd001`a2397470 fffff801`578e1e7f NDIS!ndisMIndicateNetBufferListsToOpen+0x123
14 (Inline Function) --------`-------- NDIS!ndisIndicateSortedNetBufferLists+0x41
15 (Inline Function) --------`-------- NDIS!ndisMDispatchReceiveNetBufferListsInternal+0x1e4
16 ffffd001`a2397530 fffff801`578e2094 NDIS!ndisMTopReceiveNetBufferLists+0x22f
17 (Inline Function) --------`-------- NDIS!ndisInvokeNextReceiveHandler+0x2f
18 (Inline Function) --------`-------- NDIS!ndisMIndicateReceiveNetBufferListsInternal+0x84
19 ffffd001`a23975c0 fffff801`58867387 NDIS!NdisMIndicateReceiveNetBufferLists+0x114
1a ffffd001`a23977b0 fffff801`58868b2d vmxnet3n61x64+0xc387
1b ffffd001`a23978b0 fffff801`578e3e12 vmxnet3n61x64+0xdb2d
1c (Inline Function) --------`-------- NDIS!ndisMiniportDpc+0x110
1d ffffd001`a23978f0 fffff803`aaaea400 NDIS!ndisInterruptDpc+0x1a3
1e ffffd001`a23979d0 fffff803`aaae9747 nt!KiExecuteAllDpcs+0x1b0
1f ffffd001`a2397b20 fffff803`aabc98ea nt!KiRetireDpcList+0xd7
20 ffffd001`a2397da0 00000000`00000000 nt!KiIdleLoop+0x5a
And the route being deleted is exactly the default route,
1: kd> !netioext.routes
Route            Comp IfIndex Metric PathCount State Destination Prefix                       NextHop
---------------- ---- ------- ------ --------- ----- ---------------------------------------- --------------------------
ffffe000bf1a1250    1      12    256        59 Alive ::/0                                     fe80::xxxx:xxxx:xxxx:xxRT
ffffe000be7da040    1       1    256         1 Alive ::1/128                                  Local address
1: kd> dv
Route = 0xffffe000`bf1a1250
The request came from the default gateway. Here are the packet details, parsed manually below,
1: kd> db 0xffffdff03f39b000 L0x76
ffffdff0`3f39b000  33 33 00 00 00 01 00 00-xx RT xM AC 86 dd 6e 00  33......s.....n.
                   ---DEST-MAC------ ----SRC-MAC------ -V6-- -Ver-
ffffdff0`3f39b010  00 00 00 40 3a ff fe 80-00 00 00 00 00 00 00 xx  ...@:...........
                         -LEN- -ICMP ------SRC IPV6 ADDRESS-------
ffffdff0`3f39b020  xx xx xx xx xx RT ff 02-00 00 00 00 00 00 00 00  s...............
                   ----------------- -----DEST IPV6 Address-------
ffffdff0`3f39b030  00 00 00 00 00 01 86 00-f6 00 40 00 00 00 00 0d  ..........@.....
                   -----------------                   -Router Life Time
ffffdff0`3f39b040  bb a0 00 00 00 00 01 01-00 00 xx RT xM AC 03 04  ..........s.....
                                     -SRC Link-Layer Addr--- -Prefix Information
ffffdff0`3f39b050  40 80 00 27 8d 00 00 09-3a 80 00 00 00 00 20 01  @..'....:..... .
                   --    ----------- -----------             --Prefix
ffffdff0`3f39b060  xx xx xx xx xx FX 00 00-00 00 00 00 00 00 05 01  ..`.............
                   -----------------------------------------
ffffdff0`3f39b070  00 00 00 00 05 dc                                ......
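The manual parse above can be cross-checked programmatically. Below is a minimal Python sketch that decodes the fixed 16-byte Router Advertisement header as laid out in RFC 4861, section 4.2, using only the unredacted ICMPv6 bytes from the dump (starting at `86 00`). The function names and dictionary keys are illustrative assumptions, and RA options (source link-layer address, prefix information, MTU) are not parsed.

```python
import struct

ICMPV6_ROUTER_ADVERTISEMENT = 134  # 0x86, RFC 4861 section 4.2

def parse_ra_header(icmpv6: bytes) -> dict:
    """Decode the fixed part of a Router Advertisement (first 16 bytes)."""
    if len(icmpv6) < 16:
        raise ValueError("a Router Advertisement header needs 16 bytes")
    (msg_type, code, checksum, cur_hop_limit, flags,
     router_lifetime, reachable_time, retrans_timer) = struct.unpack(
        "!BBHBBHII", icmpv6[:16])
    if msg_type != ICMPV6_ROUTER_ADVERTISEMENT:
        raise ValueError(f"not a Router Advertisement (type {msg_type})")
    return {
        "cur_hop_limit": cur_hop_limit,
        "managed": bool(flags & 0x80),       # M flag
        "other_config": bool(flags & 0x40),  # O flag
        "router_lifetime_s": router_lifetime,
        "reachable_time_ms": reachable_time,
        "retrans_timer_ms": retrans_timer,
    }

def withdraws_default_router(ra: dict) -> bool:
    """A Router Lifetime of 0 means the sender is not a default router."""
    return ra["router_lifetime_s"] == 0

# ICMPv6 bytes copied from offsets 0x36..0x45 of the dump above.
RA_FROM_DUMP = bytes.fromhex("86 00 f6 00 40 00 00 00 00 0d bb a0 00 00 00 00")

if __name__ == "__main__":
    ra = parse_ra_header(RA_FROM_DUMP)
    print(ra)
    print("withdraws default router:", withdraws_default_router(ra))
```

For the bytes in this dump the Router Lifetime field decodes to 0, which is consistent with the stack above going through Ipv6pHandleRouterAdvertisement into IppDeleteUnicastRoute.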
Cause 2. A network device sent out an NA with the IsRouter flag set to false (conflicting with its role as a router). The source address of that NA matches the IPv6 address of the default gateway,
The default gateway was deleted because the following NA was received,
1: kd> db ffffdff03f34a800 L56
ffffdff0`3f34a800  33 33 00 00 00 01 00 00-xx RT xM AC 86 dd 60 00  33......^.....`.
                   ---DEST-MAC------ ----SRC-MAC------ -V6-- -Ver-
ffffdff0`3f34a810  00 00 00 20 3a ff fe 80-00 00 00 00 00 00 xx xx  ... :......... .
                         -LEN- -ICMP ------SRC IPV6 ADDRESS-------
ffffdff0`3f34a820  xx xx xx xx xx RT ff 02-00 00 00 00 00 00 00 00  ................
                   ----------------- -----DEST IPV6 Address-------
ffffdff0`3f34a830  00 00 00 00 00 01 88 00-18 78 20 00 00 00 fe 80  .........x .....
                   ----------------- NA          -NA-FLAG--- -----
ffffdff0`3f34a840  00 00 00 00 00 00 xx xx-xx xx xx xx xx RT 02 01  ...... .........
                   -------TARGET--IPV6--ADDRESS-------------------
ffffdff0`3f34a850  00 00 xx xx xx xx                                ..^...
                   ---MAC-ADDRESS---
As per the definition above, NA flags 0x20000000 (Router = 0, Solicited = 0, Override = 1) mean the device is proactively announcing that it is not a router. This invalidates the default gateway setting, because the gateway must be a router,
Routing info,
1: kd> !netioext.routes
Route            Comp IfIndex Metric PathCount State Destination Prefix                       NextHop
---------------- ---- ------- ------ --------- ----- ---------------------------------------- --------------------------
ffffe000bd20b510    1      12    256        68 Alive ::/0                                     fe80::xxxx:xxxx:xxxx:xxRT
The theory is that fe80::xxxx:xxxx:xxxx:xxRT was a router as per the RA, but we then received an unsolicited NA indicating that the same neighbor is not a router, which triggers the OS to purge the default gateway route.
1: kd> kL
 # Child-SP          RetAddr           Call Site
00 ffffd001`7b99a7f8 fffff801`f3d2de1b nt!KeBugCheck
01 ffffd001`7b99a800 fffff801`f3d2e913 tcpip!IppLogRouteChangeEvents+0x2a3
02 ffffd001`7b99a990 fffff801`f3d3d0b9 tcpip!IppDeleteUnicastRoute+0x2f
03 ffffd001`7b99a9c0 fffff801`f3d3d582 tcpip!IppInvalidateRouter+0xbd
04 ffffd001`7b99aa50 fffff801`f3d84b68 tcpip!IppHandleNeighborAdvertisement+0x49e
05 ffffd001`7b99abb0 fffff801`f3d867c2 tcpip!Ipv6pHandleNeighborAdvertisement+0x26c
06 ffffd001`7b99aca0 fffff801`f3d4ab65 tcpip!Icmpv6ReceiveDatagrams+0x39a
07 ffffd001`7b99ad30 fffff801`f3d4b73b tcpip!IppDeliverListToProtocol+0x39
08 ffffd001`7b99ade0 fffff801`f3d4bd59 tcpip!IppProcessDeliverList+0x6f
09 ffffd001`7b99ae40 fffff801`f3d28ecf tcpip!IppReceiveHeaderBatch+0x2d9
0a ffffd001`7b99aef0 fffff801`f3da9644 tcpip!IppFlcReceivePacketsCore+0x1527
0b ffffd001`7b99b170 fffff801`f3da8c06 tcpip!FlpReceiveNonPreValidatedNetBufferListChain+0x8c0
0c ffffd001`7b99b260 fffff803`fe72a813 tcpip!FlReceiveNetBufferListChainCalloutRoutine+0x27e
0d ffffd001`7b99b300 fffff801`f3da8cbc nt!KeExpandKernelStackAndCalloutInternal+0xf3
0e (Inline Function) --------`-------- tcpip!NetioExpandKernelStackAndCallout+0x47
0f ffffd001`7b99b3f0 fffff801`f36d7a53 tcpip!FlReceiveNetBufferListChain+0xa4
10 ffffd001`7b99b470 fffff801`f36d7e7f NDIS!ndisMIndicateNetBufferListsToOpen+0x123
11 (Inline Function) --------`-------- NDIS!ndisIndicateSortedNetBufferLists+0x41
12 (Inline Function) --------`-------- NDIS!ndisMDispatchReceiveNetBufferListsInternal+0x1e4
13 ffffd001`7b99b530 fffff801`f36d86b2 NDIS!ndisMTopReceiveNetBufferLists+0x22f
*** ERROR: Module load completed but symbols could not be loaded for vmxnet3n61x64.sys
14 (Inline Function) --------`-------- NDIS!ndisIterativeDPInvokeHandlerOnTracker+0x2d3
15 (Inline Function) --------`-------- NDIS!ndisInvokeNextReceiveHandler+0x64d
16 (Inline Function) --------`-------- NDIS!ndisMIndicateReceiveNetBufferListsInternal+0x6a2
17 ffffd001`7b99b5c0 fffff801`f34cc387 NDIS!NdisMIndicateReceiveNetBufferLists+0x732
18 ffffd001`7b99b7b0 fffff801`f34cdb2d vmxnet3n61x64+0xc387
19 ffffd001`7b99b8b0 fffff801`f36d9e12 vmxnet3n61x64+0xdb2d
1a (Inline Function) --------`-------- NDIS!ndisMiniportDpc+0x110
1b ffffd001`7b99b8f0 fffff803`fe6af6f0 NDIS!ndisInterruptDpc+0x1a3
1c ffffd001`7b99b9d0 fffff803`fe6aea37 nt!KiExecuteAllDpcs+0x1b0
1d ffffd001`7b99bb20 fffff803`fe7d0dea nt!KiRetireDpcList+0xd7
1e ffffd001`7b99bda0 00000000`00000000 nt!KiIdleLoop+0x5a
As per RFC 4861, section 7.2.5 (Receipt of Neighbor Advertisements), pages 63-65, the OS should delete the route entry when it receives such an NA.
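To make the flag reading concrete, here is a minimal Python sketch that decodes the R/S/O bits of a Neighbor Advertisement (RFC 4861, section 4.4) from the unredacted ICMPv6 bytes in the dump (`88 00 18 78 20 00 00 00`). The helper names are illustrative, and `router_must_be_removed` is a simplification: it assumes the advertisement is accepted for the neighbor cache entry and ignores the other checks RFC 4861 section 7.2.5 applies first.

```python
import struct

ICMPV6_NEIGHBOR_ADVERTISEMENT = 136  # 0x88, RFC 4861 section 4.4

R_FLAG = 0x80000000  # sender is a router
S_FLAG = 0x40000000  # sent in response to a Neighbor Solicitation
O_FLAG = 0x20000000  # override the existing cache entry

def parse_na_flags(icmpv6: bytes) -> dict:
    """Decode type, checksum and the 32-bit flags word of an NA."""
    if len(icmpv6) < 8:
        raise ValueError("need at least 8 bytes of ICMPv6 data")
    msg_type, code, checksum, flags = struct.unpack("!BBHI", icmpv6[:8])
    if msg_type != ICMPV6_NEIGHBOR_ADVERTISEMENT:
        raise ValueError(f"not a Neighbor Advertisement (type {msg_type})")
    return {
        "router": bool(flags & R_FLAG),
        "solicited": bool(flags & S_FLAG),
        "override": bool(flags & O_FLAG),
    }

def router_must_be_removed(na: dict, neighbor_was_router: bool) -> bool:
    """Simplified check: IsRouter changes from true to false.

    Assumption: the NA is accepted for this neighbor; the full
    acceptance rules of RFC 4861 section 7.2.5 are not modeled.
    """
    return neighbor_was_router and not na["router"]

# ICMPv6 bytes copied from offsets 0x36..0x3d of the NA dump above.
NA_FROM_DUMP = bytes.fromhex("88 00 18 78 20 00 00 00")

if __name__ == "__main__":
    na = parse_na_flags(NA_FROM_DUMP)
    print(na)
    print("remove default router:", router_must_be_removed(na, True))
```

With the flags word 0x20000000 only the Override bit is set, so the neighbor previously learned as a router through its RA is reported as a non-router and its default route becomes a candidate for removal.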
Extra
When a server has received more than 100 auto-created route entries (including deleted ones), new routes will not be accepted until,
a. Unplug/plug the network cable,
b. Disable/enable the NIC.
c. Promote the server as a Router.