Network Traffic Telemetry on modern routers: part 2

In the first part of this series I covered well-known protocols such as Netflow v5, Netflow v9 and IPFIX. In this article I'll continue the conversation and talk about protocols that offer more flexibility:

  • sFlow v5
  • Port mirror
  • Sampled port mirror
  • Sampled port mirror over GRE
  • Raw headers over IPFIX or Netflow v9

My EPYC-based XDP lab at home

sFlow v5

I would like to start with the protocol name. Although it has "flow" in its name, sFlow has nothing to do with flows: it lacks any flow tracking logic and works on a per-packet basis. The name may suggest similarities with the Netflow protocol family, but that's not true; it's a completely different protocol. It was developed by a very smart team at InMon Corporation.

For the Netflow family of protocols sampling is optional, but for sFlow it's part of the protocol design and you have to enable sampling to use it. The main advantage of the protocol is the almost complete absence of state information and flow tracking tables. To avoid the complexity of sampling rate encoding in Netflow v9 and IPFIX, sFlow provides the sampling rate value directly in each packet (similar to Netflow v5).
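Because every sFlow sample carries its own sampling rate, a collector can scale each sample independently, with no global configuration. A minimal sketch, assuming samples have already been decoded into (sampling rate, frame length) pairs; the `Sample` type and `estimate_totals` helper are hypothetical names used only for illustration:

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Sample:
    sampling_rate: int  # 1-in-N rate copied from the sample itself
    frame_length: int   # original length of the sampled frame in bytes


def estimate_totals(samples: Iterable[Sample]) -> Tuple[int, int]:
    """Estimate total packets and bytes seen by the agent.

    Each sample stands for `sampling_rate` packets, so both counters are
    scaled by the rate carried in that particular sample. Mixed rates in
    one batch are fine because no global rate is assumed.
    """
    packets = 0
    octets = 0
    for s in samples:
        packets += s.sampling_rate
        octets += s.sampling_rate * s.frame_length
    return packets, octets
```

For example, two samples at 1:1000 (1500 and 64 bytes) plus one at 1:2048 (1500 bytes) are estimated as 4048 packets and 4,636,000 bytes.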

Instead of parsing each network packet and handing it to the collector as a set of fields (as Netflow does), sFlow simply provides the first X bytes of the packet as-is (from 60 to 140 bytes or even more) and defers packet parsing to the collector. As a result, collector developers have to write their own network packet parser, and we did exactly that for FastNetMon. It may sound more complex, but it allows us to support non-standard packets (MPLS, VXLAN, GRE) and to extract fields that vendors do not usually provide in Netflow (TTL, for example). For DDoS detection this information is absolutely necessary.
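To give an idea of what "your own packet parser" means, here is a minimal sketch that extracts addresses, protocol and TTL from a raw header copy. It is not FastNetMon's parser; it assumes Ethernet with optional 802.1Q/802.1ad (QinQ) tags and IPv4 only, and `parse_header` is a hypothetical name. The important habit it shows is checking bounds before every read, because the header copy is truncated and attacker-controlled:

```python
import ipaddress
import struct
from typing import NamedTuple, Optional

ETH_P_IPV4 = 0x0800
VLAN_TPIDS = (0x8100, 0x88A8)  # 802.1Q and 802.1ad (QinQ) tag types


class ParsedHeader(NamedTuple):
    src: str
    dst: str
    protocol: int
    ttl: int
    vlans: tuple


def parse_header(data: bytes) -> Optional[ParsedHeader]:
    """Parse an Ethernet/VLAN/IPv4 header from a truncated packet copy.

    Returns None instead of raising when the copy is too short or carries
    something this sketch does not handle (IPv6, MPLS, GRE, ...).
    """
    if len(data) < 14:
        return None
    (ethertype,) = struct.unpack_from("!H", data, 12)
    offset = 14
    vlans = []
    # Walk any stacked VLAN tags, checking bounds before every read.
    while ethertype in VLAN_TPIDS:
        if len(data) < offset + 4:
            return None
        tci, ethertype = struct.unpack_from("!HH", data, offset)
        vlans.append(tci & 0x0FFF)  # VLAN ID is the low 12 bits of the TCI
        offset += 4
    if ethertype != ETH_P_IPV4 or len(data) < offset + 20:
        return None
    version_ihl = data[offset]
    if version_ihl >> 4 != 4 or (version_ihl & 0x0F) < 5:
        return None
    ttl = data[offset + 8]
    protocol = data[offset + 9]
    src = str(ipaddress.IPv4Address(data[offset + 12:offset + 16]))
    dst = str(ipaddress.IPv4Address(data[offset + 16:offset + 20]))
    return ParsedHeader(src, dst, protocol, ttl, tuple(vlans))
```

A real parser also has to handle IPv6 extension headers, MPLS label stacks and tunnels, which is exactly where most of the complexity (and most of the security risk) lives.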

Let's look at the sFlow sample header in C structure format:

sFlow v5 sample header

You can see that the sampling rate is provided directly in the packet, together with some meta information about the input and output interfaces. Everything else is carried in the binary packet header, which needs to be parsed.
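The fixed part of a flow sample is easy to decode because every field is a 32-bit big-endian (XDR) integer. A minimal sketch, assuming the compact flow_sample format (enterprise 0, format 1) from the sFlow v5 specification and a buffer that starts right after the sample's own format and length words; `parse_flow_sample` is a hypothetical helper, and the expanded sample format is not handled:

```python
import struct
from typing import NamedTuple


class FlowSampleHeader(NamedTuple):
    sequence_number: int
    source_id_type: int
    source_id_index: int
    sampling_rate: int
    sample_pool: int
    drops: int
    input_format: int
    input_value: int
    output_format: int
    output_value: int
    num_records: int


def parse_flow_sample(body: bytes) -> FlowSampleHeader:
    """Decode the fixed fields of an sFlow v5 flow_sample.

    Raises struct.error if the buffer is shorter than the 8 fixed words.
    """
    seq, source_id, rate, pool, drops, inp, out, n = struct.unpack_from("!8I", body, 0)
    return FlowSampleHeader(
        sequence_number=seq,
        source_id_type=source_id >> 24,          # top 8 bits: data source type
        source_id_index=source_id & 0x00FFFFFF,  # low 24 bits: index
        sampling_rate=rate,                      # 1-in-N, carried in every sample
        sample_pool=pool,
        drops=drops,
        input_format=inp >> 30,                  # 0 means a single ifIndex follows
        input_value=inp & 0x3FFFFFFF,
        output_format=out >> 30,
        output_value=out & 0x3FFFFFFF,
        num_records=n,                           # flow records (e.g. raw header) follow
    )
```

The flow records that follow the fixed part are where the raw packet header lives; the collector then hands those bytes to its packet parser.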

From a theoretical point of view, sFlow is basically the best protocol for DDoS detection, at least on paper. What is the reality? Unfortunately, a large number of sFlow agent implementations have very serious issues (check my own article about Huawei or just check the list of limitations of the Juniper MX sFlow v5 implementation), which hurts the popularity of the sFlow protocol.

Some models of switches and routers simply do not allow setting a sampling rate that provides a reasonably accurate calculation of traffic rates, which makes them unusable for attack detection. Usually such behaviour is caused by very slow CPUs on the control plane.
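A rough way to see why an overly high sampling rate breaks rate estimation is the rule of thumb from sFlow packet sampling theory: at roughly 95% confidence, the relative error is at most 196 * sqrt(1/c) percent, where c is the number of samples collected for the traffic of interest. The sketch below simply applies that formula; the helper names are hypothetical:

```python
import math


def samples_collected(packets_per_second: float, sampling_rate: int, seconds: float) -> float:
    """Expected number of samples for one traffic class over an interval."""
    return packets_per_second * seconds / sampling_rate


def max_error_percent(samples: float) -> float:
    """Upper bound on relative error (%) at about 95% confidence.

    Rule of thumb from sFlow packet sampling theory: 196 * sqrt(1 / samples).
    """
    if samples <= 0:
        return math.inf  # nothing sampled, nothing can be estimated
    return 196.0 * math.sqrt(1.0 / samples)
```

For example, a 100,000 pps flood towards one host sampled at 1:10,000 yields only about 10 samples per second, so a one-second estimate can be off by roughly 62%; at 1:1000 the same flood yields about 100 samples and the bound drops to about 19.6%.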

To give you a hint about reasonable sampling rate values, I'll quote an article from InMon:

sFlow v5 recommended sampling rates

Let's summarise our view on sFlow v5.

Benefits of sFlow v5

  • Almost instant export of observed traffic (hundreds of milliseconds)
  • Provides access to packet header
  • Simple sampling encoding

Issues with sFlow v5

  • Sampling rate selection process is not easy to understand
  • Performance of sFlow agents on routers and switches is heavily constrained by hardware, which leads to extremely high sampling rates that do not provide an accurate view of traffic
  • Traffic parsing is complicated and very hard to do in a secure manner (IPv6 headers, MPLS, QinQ)
  • Lack of useful meta information (MPLS tags, VRF IDs, next hop)
  • Long list of constraints and limitations on the router side (lack of LAG support, for example)

Port mirror

It may also be referred to as TAP, SPAN or RSPAN. Usually, port mirror is a last-resort mode when no other network traffic telemetry method works for you. It's way more expensive than other methods in terms of port cost, forwarding capacity and the hardware cost of the machine that monitors traffic. I recommend carefully checking all possible alternatives before going this way.

Almost any switch can mirror traffic, but it may be a challenging task for routers (especially soft routers), so you need to carefully check your vendor's documentation. Port mirror is implemented by selecting a list of source interfaces whose TX and RX traffic will be copied to one or a few target interfaces. Please note that the target interface needs much larger capacity, as it receives both the RX and TX traffic of every source.
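The capacity check for the target interface is simple arithmetic: sum RX and TX over all source ports and compare the result with the target's speed. A minimal sketch with hypothetical helper names, assuming loads are given in Gbit/s:

```python
from typing import Iterable, Tuple


def mirrored_load_gbps(sources: Iterable[Tuple[float, float]]) -> float:
    """Total traffic copied to the target: RX plus TX of every source port."""
    return sum(rx + tx for rx, tx in sources)


def target_is_oversubscribed(sources: Iterable[Tuple[float, float]],
                             target_capacity_gbps: float) -> bool:
    """True when the target cannot carry every copy and will drop some of them."""
    return mirrored_load_gbps(sources) > target_capacity_gbps
```

A single 10G port carrying 8 Gbit/s in and 6 Gbit/s out already produces 14 Gbit/s of mirrored traffic, which cannot fit into a 10G target port.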

If your network equipment cannot do port mirroring at all, you can install optical splitters, which accomplish the same task at the level of the transmission medium.

Once port mirroring is configured, we need to plug the target interface into a Linux machine and then use special tools that act as sFlow v5 or Netflow agents.
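Conceptually, such a tool reads every frame from the mirror port, randomly picks about one in N and keeps only the first bytes of each picked frame. A minimal sketch of that idea, assuming a Linux `AF_PACKET` raw socket (which needs root or `CAP_NET_RAW`); the helper names are hypothetical, and real tools use much faster capture paths because plain Python will not keep up with more than modest packet rates:

```python
import random
import socket
from typing import Iterable, Iterator, Optional

ETH_P_ALL = 0x0003  # from <linux/if_ether.h>: receive frames of every protocol


def sample_and_truncate(frames: Iterable[bytes], rate: int, snaplen: int = 128,
                        rng: Optional[random.Random] = None) -> Iterator[bytes]:
    """Yield roughly 1 of every `rate` frames, cut to the first `snaplen` bytes.

    Random (rather than every-Nth) selection avoids locking onto periodic traffic.
    """
    rng = rng or random.Random()
    for frame in frames:
        if rate <= 1 or rng.randrange(rate) == 0:
            yield frame[:snaplen]


def capture_frames(interface: str) -> Iterator[bytes]:
    """Read raw frames from a mirror port (Linux only)."""
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    sock.bind((interface, 0))
    try:
        while True:
            yield sock.recv(65535)
    finally:
        sock.close()
```

Usage would look like `for header in sample_and_truncate(capture_frames("eth1"), rate=1000): ...`, where `eth1` is a placeholder for the interface that receives the mirror.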

If you're looking to process high-bandwidth traffic with your own tools, I recommend checking my series of blog posts dedicated to traffic processing on Linux.

After reading that series you will learn that just receiving 100G of traffic on a machine takes quite extraordinary effort and requires an outstanding level of hardware, network and Linux expertise. During a DDoS attack we may face line-rate packet-per-second floods, which make this task almost impossible to solve. For all these reasons, regular port mirror is not a great option for DDoS detection.
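To put "line-rate packet per second" into numbers: every Ethernet frame also occupies 20 bytes of preamble, start-of-frame delimiter and inter-frame gap on the wire, so minimum-size 64-byte frames take 84 bytes each. A small sketch of that arithmetic (helper names are hypothetical):

```python
ETHERNET_OVERHEAD_BYTES = 20  # 7-byte preamble + 1-byte SFD + 12-byte inter-frame gap


def line_rate_pps(link_gbps: float, frame_bytes: int = 64) -> float:
    """Maximum packets per second on an Ethernet link for a given frame size.

    `frame_bytes` includes the FCS; 64 bytes is the minimum Ethernet frame.
    """
    bits_per_frame = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_per_frame


def ns_per_packet(pps: float) -> float:
    """Time budget per packet if a single core had to handle all of them."""
    return 1e9 / pps
```

A 100G link can therefore carry about 148.81 million minimum-size packets per second, which leaves roughly 6.72 ns per packet for a single core to receive and parse it.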

Let's summarise our view on port mirror.

Benefits of port mirror

  • Complete access to all information in packet
  • Supported by almost any network switch or router

Issues of port mirror

  • Requires a lot of CPU time for collector to receive and then parse traffic
  • Lack of meta information (ASN, VRF IDs, source and destination ports)
  • Requires spare ports on router
  • Requires high performance network cards on collector

Sampled port mirror

To address the extraordinary demand for port capacity and CPU power needed to process 1:1 port mirror traffic, we can rely on some assistance from network equipment capable of statistical packet sampling, which delivers only a small fraction of all traffic observed by the router or switch to the monitoring machine. Sadly, the only vendors that support sampled mirror are Juniper (on their MX platform) and Nokia. In addition to sampling, Juniper MX can crop packets: instead of delivering the whole packet it can send only the first X bytes, which further reduces the amount of CPU resources needed to handle all the traffic.

In this case you can monitor a 100G interface loaded to 100% in both TX and RX via a 1G interface that will carry just 200 Mbit/s (RX+TX). 1G is very easy to handle, and you can even use tcpdump to look at the traffic.
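The bandwidth of a sampled mirror is the monitored traffic divided by the sampling rate, further reduced by cropping. The 200 Mbit/s figure above does not state which sampling rate it assumes; 1:1000 without cropping is one combination that produces it. A minimal sketch, assuming a given average packet size and ignoring the mirror's own encapsulation overhead (the helper name is hypothetical):

```python
from typing import Optional


def sampled_mirror_mbps(rx_gbps: float, tx_gbps: float, sampling_rate: int,
                        avg_packet_bytes: float = 1500,
                        crop_bytes: Optional[int] = None) -> float:
    """Estimated load (Mbit/s) on the port receiving a sampled, optionally cropped mirror."""
    # Only one of every `sampling_rate` packets is copied.
    mbps = (rx_gbps + tx_gbps) * 1000 / sampling_rate
    if crop_bytes is not None:
        # Cropping keeps only the first `crop_bytes` of each copied packet.
        mbps *= min(crop_bytes, avg_packet_bytes) / avg_packet_bytes
    return mbps
```

With the same 1:1000 rate and cropping to 128 bytes of 1500-byte packets, the mirror shrinks further to about 17 Mbit/s.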

Regular port mirror requires direct L2 connectivity between the interfaces we want to monitor and the machine that receives the mirrored traffic. This may not be easy to accomplish, so some vendors allow sending port mirror traffic via GRE encapsulation, which lets us place the traffic analytics machine basically anywhere in our network and route traffic to it via L3.
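On the receiving side the collector has to strip the outer IPv4 and GRE headers before it can look at the mirrored frame. A minimal sketch, assuming plain GRE over IPv4 with the optional checksum, key and sequence fields from RFC 2784/2890; `strip_gre` is a hypothetical helper. Protocol type 0x6558 means the payload is a full Ethernet frame; some vendors use ERSPAN (0x88BE) instead, which adds another header this sketch does not remove:

```python
import struct
from typing import Optional, Tuple

IPPROTO_GRE = 47
GRE_FLAG_CHECKSUM = 0x8000
GRE_FLAG_KEY = 0x2000
GRE_FLAG_SEQUENCE = 0x1000


def strip_gre(ip_packet: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (GRE protocol type, inner payload) from an outer IPv4+GRE packet.

    Returns None for anything that is not a well-formed IPv4 GRE packet.
    """
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        return None
    ihl = (ip_packet[0] & 0x0F) * 4  # outer IPv4 header length in bytes
    if ihl < 20 or ip_packet[9] != IPPROTO_GRE or len(ip_packet) < ihl + 4:
        return None
    flags, proto = struct.unpack_from("!HH", ip_packet, ihl)
    offset = ihl + 4  # base GRE header: flags/version + protocol type
    for flag in (GRE_FLAG_CHECKSUM, GRE_FLAG_KEY, GRE_FLAG_SEQUENCE):
        if flags & flag:
            offset += 4  # checksum+reserved, key and sequence are 4 bytes each
    if len(ip_packet) < offset:
        return None
    return proto, ip_packet[offset:]
```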

Let's summarise our view on sampled port mirror.

Benefits of sampled port mirror

  • Requires way less port capacity
  • Requires way less CPU on collector side
  • No need for high-performance NICs

Issues of sampled port mirror

  • Many vendors do not support it
  • No way to learn the sampling rate from the traffic; it needs static configuration on the collector
  • Lack of meta information (ASN, VRF IDs, source and destination ports)
  • GRE requires MTU tuning to deliver 1500-byte and larger packets
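The MTU issue comes from encapsulation overhead: a full-size 1500-byte packet mirrored as an Ethernet frame inside GRE over IPv4 grows by the inner Ethernet header, the GRE header and the outer IPv4 header. A minimal sketch of that sum, assuming transparent Ethernet bridging (protocol type 0x6558), an IPv4 underlay and no FCS carried inside the tunnel; if a vendor adds an ERSPAN header or other extras, those bytes have to be added too. The helper name is hypothetical:

```python
OUTER_IPV4_BYTES = 20
GRE_BASE_BYTES = 4
INNER_ETHERNET_BYTES = 14  # inner MAC header carried with protocol type 0x6558
VLAN_TAG_BYTES = 4


def required_underlay_mtu(inner_ip_mtu: int = 1500, vlan_tags: int = 0,
                          gre_option_words: int = 0) -> int:
    """Smallest IP MTU on the path to the collector that avoids drops or fragmentation.

    `gre_option_words` counts the optional 4-byte GRE fields (checksum, key, sequence).
    """
    return (inner_ip_mtu + INNER_ETHERNET_BYTES + VLAN_TAG_BYTES * vlan_tags
            + GRE_BASE_BYTES + 4 * gre_option_words + OUTER_IPV4_BYTES)
```

Under these assumptions an untagged 1500-byte packet needs at least 1538 bytes of MTU on the path, and 1542 bytes if the mirrored frame keeps one VLAN tag.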

Payload via IPFIX or Netflow v9

Sadly, this protocol has no specific name, as it's just a very peculiar way of using the Netflow v9 and IPFIX protocols to carry packet headers inside them. We may call it PSAMP, as it's referred to in the RFC, but that's not entirely correct naming either.

Do we have vendor-specific names? Sure. Juniper calls it inline monitoring services and Cisco calls it IPFIX 315.

What are the differences from regular Netflow v9 and IPFIX?

  • It's sampled by default and cannot be used in 1:1 mode
  • Lacks any flow tracking or flow aggregation, which addresses all issues with hardware overload
  • Instead of parsing the packet and providing it as a list of fields, the router simply exports the first X bytes of the packet, which brings all the benefits of the sFlow v5 protocol while using the well-known Netflow v9 or IPFIX encoding
  • Implemented in the data plane of routers and lightning fast

Let's look at an example flow:

PSAMP packet format

As you can see, it's significantly different from regular Netflow v9 and IPFIX. Instead of dozens of fields kindly provided by the router, we have only interface numbers, the direction of traffic and a large binary field carrying the packet header, which needs to be parsed by the collector.
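The packet header travels as a variable-length IPFIX field, so the collector must handle the IPFIX variable-length encoding (RFC 7011): a template length of 65535 means the record itself carries a one-byte length, or the byte 255 followed by a two-byte length for longer values. A minimal sketch, assuming the template has already been received and stored as a list of (Information Element ID, length) pairs; the example template is hypothetical, and real vendor templates may contain more fields:

```python
import struct
from typing import Dict, List, Tuple

VARIABLE_LENGTH = 0xFFFF

# Information Element IDs from the IANA IPFIX registry
INGRESS_INTERFACE = 10
EGRESS_INTERFACE = 14
FLOW_DIRECTION = 61           # 0 = ingress, 1 = egress
DATA_LINK_FRAME_SECTION = 315  # the raw packet header


def decode_record(template: List[Tuple[int, int]], data: bytes,
                  offset: int = 0) -> Tuple[Dict[int, bytes], int]:
    """Decode one IPFIX data record according to a (field_id, length) template.

    Returns raw field values keyed by IE ID and the offset of the next record.
    """
    fields: Dict[int, bytes] = {}
    for field_id, length in template:
        if length == VARIABLE_LENGTH:
            if offset >= len(data):
                raise ValueError(f"missing length prefix for field {field_id}")
            length = data[offset]
            offset += 1
            if length == 255:  # long form: 255 followed by a 2-byte length
                if offset + 2 > len(data):
                    raise ValueError(f"truncated long length for field {field_id}")
                (length,) = struct.unpack_from("!H", data, offset)
                offset += 2
        if offset + length > len(data):
            raise ValueError(f"record truncated at field {field_id}")
        fields[field_id] = data[offset:offset + length]
        offset += length
    return fields, offset
```

Integer fields such as interface numbers are then read with `int.from_bytes(value, "big")`, and the bytes of field 315 go to the same packet parser that handles sFlow headers.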

Sadly, to carry the sampling rate this family of protocols uses the standard Netflow v9 and IPFIX encoding, which is quite challenging for a Netflow collector to parse:

Sampling rate encoding via IPFIX options packets
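In practice, the collector has to remember the sampling rate announced in options data records and apply it to data records that arrive later from the same exporter; until the first options record arrives, it cannot scale the traffic at all. A minimal sketch, assuming options records are already decoded into {Information Element ID: integer value} and keyed by (exporter address, observation domain). Which elements a router actually sends varies by vendor, so the set handled here (samplingInterval 34, samplerRandomInterval 50, samplingPacketInterval 305 with samplingPacketSpace 306) is an assumption, and the class name is hypothetical:

```python
from typing import Dict, Hashable, Mapping, Optional

SAMPLING_INTERVAL = 34          # samplingInterval (Netflow v9 / early IPFIX)
SAMPLER_RANDOM_INTERVAL = 50    # samplerRandomInterval
SAMPLING_PACKET_INTERVAL = 305  # samplingPacketInterval (PSAMP)
SAMPLING_PACKET_SPACE = 306     # samplingPacketSpace (PSAMP)


class SamplingRateCache:
    """Remember the 1-in-N rate announced in options data records per exporter."""

    def __init__(self) -> None:
        self._rates: Dict[Hashable, float] = {}

    def update(self, key: Hashable, options: Mapping[int, int]) -> Optional[float]:
        """Store the rate derived from one decoded options record, if it carries one."""
        rate: Optional[float] = None
        if SAMPLING_PACKET_INTERVAL in options and SAMPLING_PACKET_SPACE in options:
            interval = options[SAMPLING_PACKET_INTERVAL]
            space = options[SAMPLING_PACKET_SPACE]
            if interval > 0:
                # `interval` packets selected, then `space` packets skipped
                rate = (interval + space) / interval
        elif options.get(SAMPLER_RANDOM_INTERVAL):
            rate = options[SAMPLER_RANDOM_INTERVAL]
        elif options.get(SAMPLING_INTERVAL):
            rate = options[SAMPLING_INTERVAL]
        if rate:
            self._rates[key] = rate
        return rate

    def rate_for(self, key: Hashable) -> Optional[float]:
        """Rate for data records from this exporter, or None if none was announced yet."""
        return self._rates.get(key)
```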

The only routers known to support this family of protocols are:

  • Juniper MX
  • Juniper PTX
  • Cisco NCS
  • Cisco ASR 9000

This protocol is basically the best protocol for DDoS detection, and the majority of our largest deployments are based on it.

Benefits of PSAMP family of protocols

  • Clearly the best and most capable protocol on the market
  • Almost instant traffic delivery
  • Well defined format for sampling rate encoding
  • Provides all information available in the packet header
  • Provides meta information (interface numbers, direction)
  • Can be extended easily

Issues with PSAMP family of protocols

  • Only a few vendors support it
  • Extremely high complexity of integration on the collector side
  • Limited by the set of fields provided by the vendor

Clearly this protocol competes with sFlow v5, and it has one great advantage: vendors implemented it properly on their top-tier routers.

Thank you for reading! I hope more vendors will address the issues in existing protocols and add support for the more state-of-the-art protocols covered in this article. Please share your feedback in the comments.

Subscribe to Pavel's blog about underlying Internet technologies
