Have you ever seen any problem with ARP table caching?
What happens if node A has two NICs configured with the same IP? At first, only one of them is active and holds the IP. If something goes wrong, the second NIC becomes active and takes over the same IP, so the node keeps operating without hanging.
This situation looks similar to NIC bonding. But suppose we do this without bonding: clients are connected to the server through the old NIC, the old NIC suddenly dies, and the new one becomes active.
When a client tries to connect to this IP, the node sends ARP information to the client with the new MAC address.
The problem here is that the client sometimes ignores this new ARP information, that is, the new MAC address.

From no. 2 to no. 440, the changed information is not reflected in the ARP table. Why does this happen?
If you look at the kernel documentation, you can find the following sentences in “/usr/share/doc/kernel-doc-xxx/Documentation/filesystems/proc.txt”:
locktime
--------
An ARP/neighbor entry is only replaced with a new one if the old is at least
locktime old. This prevents ARP cache thrashing.
This file has a default value of 100 (or 99). The unit is clock_t ticks, so with the usual 100 ticks per second the default corresponds to about 1 second.
This prevents frequent updates of the ARP table. After an ARP entry is updated, it can be replaced again only after at least 1 second has passed.
So, as you can see in the figure, the entry is not allowed to be updated until 1 second has passed.
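To check what the value means on a given machine, something like the following small C sketch could be used. It reads the raw tick count from a file and converts it to seconds. The function names are my own for illustration, and I am assuming the usual sysctl location /proc/sys/net/ipv4/neigh/default/locktime; a per-interface directory can hold a different value.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Reads a single integer from a sysctl-style file; returns -1 on failure. */
long read_locktime_ticks(const char *path)
{
    FILE *f = fopen(path, "r");
    long ticks = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &ticks) != 1)
        ticks = -1;
    fclose(f);
    return ticks;
}

/* Converts clock_t ticks to seconds, given the number of ticks per second. */
double locktime_seconds(long ticks, long ticks_per_sec)
{
    assert(ticks_per_sec > 0);
    return (double)ticks / (double)ticks_per_sec;
}

/* Default locktime in seconds, or -1.0 if it cannot be read.
 * The path is an assumption about where the sysctl is exposed. */
double default_locktime_seconds(void)
{
    long ticks = read_locktime_ticks("/proc/sys/net/ipv4/neigh/default/locktime");
    long per_sec = sysconf(_SC_CLK_TCK); /* clock_t ticks per second */

    if (ticks < 0 || per_sec <= 0)
        return -1.0;
    return locktime_seconds(ticks, per_sec);
}
```

With a value of 100 and 100 ticks per second, locktime_seconds() gives 1.0, which matches the 1 second window described above.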
The following code snippet, taken from lxr.linux.no, shows arp_process, which processes received ARP requests and replies.
http://lxr.linux.no/linux-bk+v2.6.9/net/ipv4/arp.c#L707
707 int arp_process(struct sk_buff *skb)
708{
709 struct net_device *dev = skb->dev;
710 struct in_device *in_dev = in_dev_get(dev);
711 struct arphdr *arp;
712 unsigned char *arp_ptr;
713 struct rtable *rt;
714 unsigned char *sha, *tha;
715 u32 sip, tip;
716 u16 dev_type = dev->type;
717 int addr_type;
718 struct neighbour *n;
...
878
879 /* Update our ARP tables */
880
881 n = __neigh_lookup(&arp_tbl, &sip, dev, 0);
882
883 #ifdef CONFIG_IP_ACCEPT_UNSOLICITED_ARP
884 /* Unsolicited ARP is not accepted by default.
885 It is possible, that this option should be enabled for some
886 devices (strip is candidate)
887 */
888 if (n == NULL &&
889 arp->ar_op == htons(ARPOP_REPLY) &&
890 inet_addr_type(sip) == RTN_UNICAST)
891 n = __neigh_lookup(&arp_tbl, &sip, dev, -1);
892 #endif
893
894 if (n) {
895 int state = NUD_REACHABLE;
896 int override;
897
898 /* If several different ARP replies follows back-to-back,
899 use the FIRST one. It is possible, if several proxy
900 agents are active. Taking the first reply prevents
901 arp trashing and chooses the fastest router.
902 */
903 override = time_after(jiffies, n->updated + n->parms->locktime);
904
905 /* Broadcast replies and request packets
906 do not assert neighbour reachability.
907 */
908 if (arp->ar_op != htons(ARPOP_REPLY) ||
909 skb->pkt_type != PACKET_HOST)
910 state = NUD_STALE;
911 neigh_update(n, sha, state, override ? NEIGH_UPDATE_F_OVERRIDE : 0);
912 neigh_release(n);
913 }
914
915 out:
916 if (in_dev)
917 in_dev_put(in_dev);
918 kfree_skb(skb);
919 return 0;
920}
921
The upper part of the function is not related to the update, so I removed it from the code. After getting the entry at line 881, the code checks how much time has elapsed at line 903.
With this information, it decides whether to override at line 911.
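The decision at lines 903 and 911 can be tried out in user space with a small model. This is only a simplified sketch, not kernel code: struct model_neigh, model_time_after() and model_arp_update() are hypothetical names, and I am assuming that a changed MAC address is simply dropped when the override flag is not set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical, simplified stand-in for the kernel's struct neighbour. */
struct model_neigh {
    unsigned char ha[MAC_LEN]; /* cached hardware (MAC) address */
    unsigned long updated;     /* tick of the last accepted update */
    unsigned long locktime;    /* lock window, in ticks */
};

/* Wrap-safe "a is later than b", the same idea as the kernel's time_after(). */
bool model_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Returns true if the cached MAC was replaced by new_ha.
 * Assumption: a different address arriving inside the lock window is ignored.
 */
bool model_arp_update(struct model_neigh *n,
                      const unsigned char *new_ha,
                      unsigned long now)
{
    assert(n != NULL && new_ha != NULL);

    /* Same comparison as line 903 of the listing. */
    bool override = model_time_after(now, n->updated + n->locktime);

    if (memcmp(n->ha, new_ha, MAC_LEN) == 0)
        return false; /* same address, nothing to replace */
    if (!override)
        return false; /* still inside the lock window */

    memcpy(n->ha, new_ha, MAC_LEN);
    n->updated = now;
    return true;
}
```

With updated = 1000 and locktime = 100, a reply carrying a new MAC at tick 1050 or 1100 leaves the entry unchanged, and the first tick at which the new MAC is accepted is 1101. That is the same effect as the client ignoring the new MAC address until the lock window has passed.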