Friday, 27 January 2012

IP GRE Tunnel + PBR (Policy Based Routing)

A little project that has taken up a few days at work recently: how do you route traffic from a remote network out of an internet connection on a local network?  The closest idea the department could come up with was to turn off the remote network's internet connection (it has 1/10th the capacity of the local one in question) and let the routing tables take over.
I thought there was a better way and suggested PBR as the answer.  This led to another network administrator and myself drawing up how it would work.  Cue whiteboard and a bit of head scratching.  We only wanted certain traffic to NOT use the remote internet connection - even though it was slower than the local one, it was still perfectly serviceable.

I might use local and remote in the wrong context here, but the LOCAL internet connection is the one I'm sat next to and the REMOTE network is at the other side of the city - just for reference!

The idea behind using PBR is to detect the traffic on the remote network (matching it by source and destination) and policy route it to the local internet router.  Whilst this sounds like a solid enough idea, PBR relies on the next hop being directly connected (which it definitely isn't in our case - the traffic must traverse a core network of approximately 4 hops).  Also, the core network uses HP Layer 3 switches and this model is not capable of PBR (or a great deal else... but that is another story), so it's not like we can simply keep PBR'ing all the way to the local network!  PBR would play its part in matching the traffic and assigning a next hop; we just needed to create some form of connection between the two sites that appeared local, and then point PBR at it.
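To make the match-and-set part concrete, here is a minimal sketch in Cisco IOS syntax.  It is not our actual configuration: the ACL number, route-map name, interface name and every address are made up for illustration, and 10.255.0.2 simply stands for a next hop that the router sees as directly connected.

```
! Hypothetical: match traffic from the remote LAN to one destination host
access-list 101 permit ip 10.20.0.0 0.0.255.255 host 203.0.113.10
!
! Matched traffic gets a next hop; everything else is routed normally
route-map PBR-TO-LOCAL permit 10
 match ip address 101
 set ip next-hop 10.255.0.2
!
! Apply the policy where the traffic ENTERS the remote router
interface GigabitEthernet0/1
 ip policy route-map PBR-TO-LOCAL
```

Traffic that does not match the ACL falls through the route-map and follows the normal routing table, which is what lets the rest of the remote site keep using its own internet connection.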

IP Tunnel
What we needed to do was create an IP tunnel from the remote network to the local one.  To accomplish this we could have used a VPN, but encryption on the internal network was overkill, so a simple GRE IP tunnel was investigated.  GRE stands for 'Generic Routing Encapsulation' and does the job we want of logically creating a point-to-point link between two remote devices.  I must say that it is super easy to set up.  We set up the remote site router to run keepalives so that it would not use the connection if the tunnel was down - a cheap form of disaster recovery, you could say!  This forces the router to pass traffic through the normal routing table if the other end of the tunnel is offline.
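A rough sketch of what such a tunnel looks like in Cisco IOS syntax follows.  The tunnel number, the 10.255.0.0/30 tunnel subnet, the 192.0.2.x loopback addresses and the keepalive timers are all assumptions for illustration, not the values we used.

```
! Remote site router (A-end), where the policy-routed traffic originates
interface Tunnel0
 ip address 10.255.0.1 255.255.255.252
 tunnel source Loopback0
 ! Hypothetical address of the local router's loopback
 tunnel destination 192.0.2.2
 ! Send a keepalive every 5 seconds, mark the tunnel down after 3 misses
 keepalive 5 3
!
! Local site router (B-end), next to the faster internet connection
interface Tunnel0
 ip address 10.255.0.2 255.255.255.252
 tunnel source Loopback0
 ! Hypothetical address of the remote router's loopback
 tunnel destination 192.0.2.1
```

GRE over IP is the default tunnel mode in IOS, so no explicit tunnel mode command is shown.  With both ends in the same /30, each router sees the other's tunnel address as directly connected, which is exactly what PBR wants from a next hop.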

Stuff I forgot about
Setting up the PBR to state a next hop for the 'matched' traffic was easy, and the tunnel was also easy to create, but it still didn't work.
On the local router, the only configuration I had added was the B-end of the tunnel.  Nothing more.  It was this oversight that proved to be the problem.
Of course, to traverse the router and access the internet we use NAT.  You guessed it already: I didn't make the tunnel interface an inside NAT interface - lesson 1.
Lesson 2 was also NAT related.  We use ACLs to determine which traffic can access the internet, and the remote site's IP range was not included.  That concludes our lessons for today.
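For reference, both fixes on the local router amount to something like the sketch below.  The outside interface name, the ACL number and the address ranges are hypothetical, and I am assuming an overloaded (PAT) NAT rule here.

```
! Lesson 1: traffic arriving over the tunnel must count as "inside"
interface Tunnel0
 ip nat inside
!
! Hypothetical internet-facing interface
interface Dialer0
 ip nat outside
!
! Lesson 2: the NAT ACL must also permit the remote site's range
access-list 10 permit 10.10.0.0 0.0.255.255
access-list 10 permit 10.20.0.0 0.0.255.255
!
ip nat inside source list 10 interface Dialer0 overload
```

Here 10.10.0.0/16 stands in for the local site and 10.20.0.0/16 for the remote site; the second permit line is the one that was missing.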

Learnt today?
All of it really.  I did have an idea about PBR already but didn't know that only directly connected gateways could be used as next hops.  I even thought the recursive next-hop option would be useful, but it turns out that the router performs a recursive lookup for the remote gateway and routes the packet TOWARDS it, rather than encapsulating it - which was obviously still an issue with the HP switches in the path.
I also learnt that a GRE tunnel isn't intelligent.  By default, the far end being down makes no difference to the routing decisions at the near end, which creates a routing black hole.  I thought about using IP SLA and tracking to check the remote interface and base the PBR on that, but keepalives did the job on the remote router where the traffic originates - the keepalive state actually affects whether PBR is used or not, which was news to me.
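For comparison, the IP SLA and tracking idea that we did NOT end up using might look roughly like this on the remote router.  It is an untested sketch: the SLA and track numbers, the probe frequency, the route-map name, ACL 101 (which would hold the traffic to be matched) and the addresses are all assumptions, and the exact syntax varies between IOS releases.

```
! Probe the far end of the tunnel every 5 seconds
ip sla 1
 icmp-echo 10.255.0.2 source-interface Tunnel0
 frequency 5
ip sla schedule 1 life forever start-time now
!
! Track object follows the reachability result of the probe
track 1 ip sla 1 reachability
!
! Only use the next hop while the track object is up
route-map PBR-TO-LOCAL permit 10
 match ip address 101
 set ip next-hop verify-availability 10.255.0.2 1 track 1
```

The tunnel keepalive achieves the same fallback with a single line, which is why it won.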
Also, did you know that when using PBR you are actually process switching, unless you state (at interface level) - ip route-cache policy?
PBR was also a bit confusing when using a loopback interface to test the tunnel - it didn't work as expected.  This is because router-generated traffic is not influenced by PBR unless you turn it on globally using - ip local policy route-map *map-tag*.
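Put together, those two commands sit in the configuration roughly as below.  The interface and route-map names are hypothetical.

```
! Interface level: policy route transit traffic arriving here,
! and fast-switch it instead of process switching
interface GigabitEthernet0/1
 ip policy route-map PBR-TO-LOCAL
 ip route-cache policy
!
! Global level: also apply the policy to traffic the router itself
! generates, e.g. pings sourced from a loopback
ip local policy route-map PBR-TO-LOCAL
```

Whether ip route-cache policy is needed at all depends on the IOS release and the switching path in use, so check it against your own platform.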

The most important thing learnt today was that working with others has its benefits when drafting up solutions like this.  Brainstorming and bouncing ideas off each other is a form of learning, even if the outcome isn't as expected!

1 comment:

  1. I am trying to do the same thing but my 'remote' router is a NEXUS 7K and my 'local' router is a 'remote' 887VA. The PBR on the 7K is not working. You did not mention what 'next hop' you used for your PBR - Was it the IP address of the Tunnel or the IP address of the loopback interface you used to establish the tunnel?