Blog roll: CPU

A trusty network administrator has just been on the phone with Symantec regarding a couple of SEPM (Symantec Endpoint Protection Manager) servers running at 90% CPU load for no apparent reason.

We're using version 12.1 by the way.

Symantec say this is normal if replication is used (which it is). You probably won't see this load with a stand-alone SEPM installation.
Symantec say that although the CPU is consumed, it will be released should the server need it for other services or processing. The high CPU load is the SEPM's way of checking its database for consistency. Seems like a lot of checking to me, but when the server was asked to perform another duty, it did so without exploding - so maybe Symantec are right, it's their product after all...

This applies when using the embedded database, which we are, but Symantec also say that a SQL server (if one were used) would run at high CPU too, as it follows the same principles.

CPU queries seem to be flavour of the month at our place!

And so the learning continues today...

Another network manager has reported an issue with some legacy routers. The CPU is over 80% and the SNMP engine seems to be partly responsible. The manager has turned off the SNMP engine to prevent the router from melting - but is unsure why the problem has occurred.

Another network administrator has already found some information that may help. He discovered that the ARP tables on the 'busy' routers were large. In fact they were extremely large for a remote site/branch router: 50,000 entries on one of them is a serious number of ARP requests.

Delving further into the problem, it appears that the router is proxy-ARPing everything. Running a show arp summary (on the slightly newer IOS versions that some of ours have) gives some indication of which interfaces the ARPing was being conducted on. In this case it was a VLAN interface (the router was using an EtherSwitch card) that connected the remote site to its main/hub site. It was also the route for traffic that didn't have a specific route in the routing table - namely a default route (ip route 0.0.0.0 0.0.0.0 vlan128). The ARP table showed requests to sites on the internet as well as internal ranges.
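Where the summary command isn't available on an older IOS, a rough per-interface count can be made from the plain ARP table output instead. The following is only a minimal sketch in Python: the sample text is made up for illustration, the column layout is an assumption about typical "show ip arp" output, and summarise_arp is a hypothetical helper name, not anything from Cisco.

```python
from collections import Counter

# Made-up sample; real "show ip arp" output may differ between IOS versions.
SAMPLE = """\
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.20.128.1             5   0011.2233.4455  ARPA   Vlan128
Internet  8.8.8.8                 0   0011.2233.4455  ARPA   Vlan128
Internet  93.184.216.34           0   Incomplete      ARPA
Internet  10.20.1.10             12   0aaa.bbbb.cccc  ARPA   FastEthernet0/0
"""

def summarise_arp(text):
    """Count resolved ARP entries per interface and tally incomplete ones."""
    per_interface = Counter()
    incomplete = 0
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not an IP ARP entry.
        if len(fields) < 5 or fields[0] != "Internet":
            continue
        if fields[3].lower() == "incomplete":
            incomplete += 1
            continue
        # Assumption: the interface, when present, is the sixth column.
        interface = fields[5] if len(fields) > 5 else "(none)"
        per_interface[interface] += 1
    return per_interface, incomplete

counts, incomplete = summarise_arp(SAMPLE)
for interface, n in counts.most_common():
    print(f"{interface}: {n}")
print(f"incomplete: {incomplete}")
```

An interface with a count wildly out of proportion to the number of hosts actually behind it is the one to look at first.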

The Cisco documentation regarding troubleshooting high CPU utilization does cover this aspect as highlighted here:

************************************

ARP Input

High CPU utilization in the Address Resolution Protocol (ARP) Input process occurs if the router has to originate an excessive number of ARP requests. The router uses ARP for all hosts, not just those on the local subnet, and ARP requests are sent out as broadcasts, which causes more CPU utilization on every host in the network. ARP requests for the same IP address are rate-limited to one request every two seconds, so an excessive number of ARP requests would have to originate for different IP addresses. This can happen if an IP route has been configured pointing to a broadcast interface. A most obvious example is a default route such as:

ip route 0.0.0.0 0.0.0.0 Fastethernet0/0

In this case, the router generates an ARP request for each IP address that is not reachable through more specific routes, which practically means that the router generates an ARP request for almost every address on the Internet. For more information about configuring next hop address for static routing, see Specifying a Next Hop IP Address for Static Routes.

Alternatively, an excessive amount of ARP requests can be caused by a malicious traffic stream which scans through locally attached subnets. An indication of such a stream would be the presence of a very high number of incomplete ARP entries in the ARP table. Since incoming IP packets that would trigger ARP requests would have to be processed, troubleshooting this problem would essentially be the same as troubleshooting high CPU utilization in the IP Input process.

************************************

So it appears that specifying the next hop as an interface may not be good practice. Luckily we already have some knowledge of this and have used the IP addresses of remote router interfaces as next hops at most sites (it is usually wise to base the next hop on a reachable IP, in case you have more than one route to that IP, rather than basing it on an interface state), but obviously some legacy configs are still around.
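To hunt down the legacy configs, something like the following could be run over saved router configurations. It is a hedged sketch rather than a tested audit tool: interface_only_routes is a hypothetical helper, the sample config lines and addresses are invented, and it only understands the simple "ip route prefix mask target" form (with an optional vrf name).

```python
import ipaddress

def is_ip(token):
    """Return True if token parses as an IP address."""
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False

def interface_only_routes(config_text):
    """Return static route lines whose next hop is an interface with no IP."""
    flagged = []
    for raw in config_text.splitlines():
        fields = raw.split()
        if fields[:2] != ["ip", "route"]:
            continue
        rest = fields[2:]
        if rest[:1] == ["vrf"]:
            rest = rest[2:]  # drop "vrf NAME"
        # rest should now be: prefix, mask, next hop or interface, options...
        if len(rest) < 3 or is_ip(rest[2]):
            continue
        if rest[2].lower() == "null0":
            continue  # discard routes never ARP
        # "interface + next-hop IP" is fine; interface alone is the risk.
        if len(rest) >= 4 and is_ip(rest[3]):
            continue
        flagged.append(raw.strip())
    return flagged

# Invented sample config for illustration only.
SAMPLE_CONFIG = """\
ip route 0.0.0.0 0.0.0.0 Vlan128
ip route 10.50.0.0 255.255.0.0 10.20.128.1
ip route 10.60.0.0 255.255.0.0 Vlan128 10.20.128.1
ip route 192.168.0.0 255.255.0.0 Null0
"""

for line in interface_only_routes(SAMPLE_CONFIG):
    print("check:", line)
```

Routes pointing at point-to-point interfaces will also be flagged by this sketch even though they do not ARP, so the output is a list to review by hand rather than a list of definite faults.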

It must be that the SNMP engine was then processing all this ARP data and thus adding to the CPU load for an already busy router processing ARP requests!

By changing the next hop from an interface to an IP address, the router calmed and normal service was resumed - although there is still some monitoring to be done to see if this is a long-term cure. One suspects that turning proxy-arp off is not only good security practice, but may help the router out in future.
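For anyone with a number of routers to fix, the change itself is small enough to generate. The sketch below just builds the IOS command lines as text; replacement_commands is a hypothetical helper, 192.0.2.1 is a placeholder next hop from the documentation range (not our real address), and the order (add the new route, then remove the old one) is an assumption intended to avoid a gap with no default route.

```python
import ipaddress

def replacement_commands(prefix, mask, interface, next_hop,
                         disable_proxy_arp=False):
    """Build IOS config lines swapping an interface route for an IP next hop."""
    # Validate the inputs; both calls raise ValueError on bad values.
    ipaddress.ip_network(f"{prefix}/{mask}")
    ipaddress.ip_address(next_hop)
    commands = [
        # Add the IP-based route first so a route is always present.
        f"ip route {prefix} {mask} {next_hop}",
        f"no ip route {prefix} {mask} {interface}",
    ]
    if disable_proxy_arp:
        # Optional extra; whether it suits a given interface is a local call.
        commands += [f"interface {interface}", " no ip proxy-arp"]
    return commands

# Placeholder values for illustration only.
cmds = replacement_commands("0.0.0.0", "0.0.0.0", "Vlan128", "192.0.2.1",
                            disable_proxy_arp=True)
print("\n".join(cmds))
```

The output is only text to paste or push after review; nothing here talks to a router.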

Tuesday, 17 January 2012

SEPM CPU 90%? - its normal say Symantec

Friday, 13 January 2012

High CPU on Cisco 1841 Router - ARP or SNMP?