By Pavel Odintsov — Jun 27, 2023

It's just wrong to redefine IPFIX templates

In this blog post I'll cover the rather curious behaviour of the IPFIX protocol on Juniper MX 204 boxes running JunOS 18.2R3-S6.5. IPFIX is the industry standard protocol for exporting information about traffic observed by carrier-grade routers and switches.

NB! A follow-up investigation confirmed that the issue was caused by a misconfiguration on the customer's side.

Update 26th June 2023: This issue was reported to Juniper as 2023-0628-721988 and is being investigated.

Update 10th July 2023: The current hypothesis is that the source-address on different devices was set to the same value.

Update August 2023: We received confirmation that the same IP address was assigned to another device in the customer's network, and this caused the duplicate templates. Sadly, we had no way to tell the two devices apart, as both templates were coming from the same source port, the same MAC address and the same IP address.

AWS Router is a fascinating example of state-of-the-art network equipment

Each IPFIX packet carries information about network traffic (source and destination IPs, protocol, source and destination ports, number of packets and octets) accompanied by metadata from the router itself (source and destination interfaces, source and destination ASNs, VRF IDs). Below you can find an example IPFIX packet decoded by Wireshark.

IPFIX packet example with information about IPv6 traffic

IPFIX uses a rather unusual, low-level binary encoding. It's nothing like JSON; it's closer to Protobuf, as it carries the data and the data definition separately.
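To give a feel for how low-level this is, here is a minimal Python sketch that decodes the fixed 16-byte header starting every IPFIX message, as laid out in RFC 7011: version (always 10), total length, export time, sequence number and observation domain ID, all in network byte order. The class and function names are our own, for illustration only; this is not FastNetMon's collector code.

```python
import struct
from dataclasses import dataclass

# RFC 7011 message header: 16 bytes, network (big-endian) byte order.
IPFIX_HEADER = struct.Struct("!HHIII")
IPFIX_VERSION = 10  # NetFlow v9 uses 9, IPFIX uses 10


@dataclass
class MessageHeader:
    version: int
    length: int                 # total message length in bytes, header included
    export_time: int            # seconds since the UNIX epoch
    sequence_number: int        # data records sent so far, modulo 2**32
    observation_domain_id: int


def parse_message_header(packet: bytes) -> MessageHeader:
    """Decode the fixed header at the start of an IPFIX message."""
    if len(packet) < IPFIX_HEADER.size:
        raise ValueError("packet shorter than IPFIX header")
    header = MessageHeader(*IPFIX_HEADER.unpack_from(packet, 0))
    if header.version != IPFIX_VERSION:
        raise ValueError(f"unexpected version {header.version}")
    if header.length > len(packet):
        raise ValueError("truncated IPFIX message")
    return header


# Example: a header-only message with sequence number 42.
example = struct.pack("!HHIII", 10, 16, 1687824000, 42, 1)
print(parse_message_header(example))
```

There is no self-describing structure here at all: the meaning of every byte comes from its position.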

IPFIX typically runs over UDP (as it does in this case) and uses two types of packets to deliver traffic information:

Data packet
Template packet
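Strictly speaking, RFC 7011 calls these Template Sets and Data Sets, and one UDP message may carry several of them. Each set starts with a 4-byte header holding the Set ID and the set length: Set ID 2 marks a Template Set, 3 an Options Template Set, and any value of 256 or above marks a Data Set encoded with the template of that same ID. A minimal sketch of walking the sets in a message (the function name is hypothetical) could look like this:

```python
import struct
from typing import Iterator, Tuple

SET_HEADER = struct.Struct("!HH")  # Set ID, set length (header included)
TEMPLATE_SET_ID = 2
OPTIONS_TEMPLATE_SET_ID = 3
MIN_DATA_SET_ID = 256  # a Data Set's ID is the template ID used to encode it


def iter_sets(message: bytes, offset: int = 16) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (kind, set_id, body) for each set after the 16-byte message header."""
    while offset + SET_HEADER.size <= len(message):
        set_id, set_length = SET_HEADER.unpack_from(message, offset)
        if set_length < SET_HEADER.size or offset + set_length > len(message):
            raise ValueError(f"malformed set length {set_length} at offset {offset}")
        body = message[offset + SET_HEADER.size: offset + set_length]
        if set_id == TEMPLATE_SET_ID:
            kind = "template"
        elif set_id == OPTIONS_TEMPLATE_SET_ID:
            kind = "options_template"
        elif set_id >= MIN_DATA_SET_ID:
            kind = "data"
        else:
            kind = "reserved"
        yield kind, set_id, body
        offset += set_length
```

A collector has to remember what every Template Set said, because Data Sets carry nothing but a template ID and raw bytes.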

A template packet carries the information required to decode data packets. Basically, it's a C structure definition serialised in binary format. It consists of a list of fields with defined types and lengths. Their order is extremely important, as we read them directly from the data packet one by one using a simple memory copy function. You can find an example template definition below.
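On the wire, each template record is a template ID and a field count, followed by one field specifier per field: a 2-byte information element ID and a 2-byte length. If the top bit of the element ID is set, a 4-byte enterprise number follows. The minimal Python sketch below (function name is ours) parses a Template Set body into an ordered field list; it assumes fixed-length fields only and does not handle variable-length elements (length 65535).

```python
import struct
from typing import Dict, List, Tuple

# (information element ID, field length in bytes, enterprise number or 0)
FieldSpec = Tuple[int, int, int]


def parse_template_set(body: bytes) -> Dict[int, List[FieldSpec]]:
    """Decode all template records in a Template Set body (Set ID 2).

    Returns {template_id: [field specifiers in wire order]}. The order is
    kept exactly, because it defines where each field sits in data records.
    """
    templates: Dict[int, List[FieldSpec]] = {}
    offset = 0
    while offset + 4 <= len(body):
        template_id, field_count = struct.unpack_from("!HH", body, offset)
        if template_id < 256:
            break  # IDs below 256 are reserved: treat the remainder as padding
        offset += 4
        fields: List[FieldSpec] = []
        for _ in range(field_count):
            raw_id, length = struct.unpack_from("!HH", body, offset)
            offset += 4
            enterprise = 0
            if raw_id & 0x8000:  # enterprise bit set: a 4-byte enterprise number follows
                (enterprise,) = struct.unpack_from("!I", body, offset)
                offset += 4
            fields.append((raw_id & 0x7FFF, length, enterprise))
        templates[template_id] = fields
    return templates
```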

When we receive a data packet carrying the actual traffic information, we simply apply this structure and read the data. For example, to get IPV6_SRC_ADDR we read 16 bytes at offset 0; to read the next field, IPV6_DST_ADDR, we shift by 16 bytes and read as many bytes as this field's length.
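The offset arithmetic above translates almost directly into code. Here is a minimal sketch, assuming the template has already been parsed into (element ID, length) pairs in wire order. The name table uses NetFlow v9 style names for the matching IANA element IDs (27 and 28 for the IPv6 addresses, 1 for octets and so on); the function name decode_record is hypothetical.

```python
import ipaddress
from typing import Dict, List, Tuple

# NetFlow v9 style names for the IANA information element IDs used here.
IE_NAMES = {1: "IN_BYTES", 2: "IN_PKTS", 10: "INPUT_SNMP", 14: "OUTPUT_SNMP",
            27: "IPV6_SRC_ADDR", 28: "IPV6_DST_ADDR", 63: "BGP_IPV6_NEXT_HOP"}
IPV6_FIELDS = (27, 28, 63)


def decode_record(template: List[Tuple[int, int]], data: bytes,
                  offset: int = 0) -> Dict[str, object]:
    """Apply a template to one data record: read fields in order,
    advancing the offset by each field's length."""
    record: Dict[str, object] = {}
    for ie_id, length in template:
        chunk = data[offset:offset + length]
        if len(chunk) != length:
            raise ValueError("data record is shorter than the template")
        name = IE_NAMES.get(ie_id, f"IE_{ie_id}")
        if ie_id in IPV6_FIELDS and length == 16:
            record[name] = str(ipaddress.IPv6Address(chunk))
        else:
            # Counters and IDs are unsigned big-endian integers.
            record[name] = int.from_bytes(chunk, "big")
        offset += length
    return record
```

With a template of [(27, 16), (28, 16), (1, 8)], IPV6_SRC_ADDR comes from bytes 0 to 15, IPV6_DST_ADDR from bytes 16 to 31 and IN_BYTES from bytes 32 to 39. Nothing in the data itself says where one field ends and the next begins.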

Now you can see why order matters so much. If any field is misplaced in the template or in the data, we read some other field instead, with exceptionally bad outcomes. For example, we expect the traffic length but instead read a field carrying an interface ID, which can easily be in the hundreds of millions. As a result, we see a huge bandwidth spike and potentially a false positive DDoS attack detection.
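A tiny illustration of that failure mode, with made-up values: the exporter writes a 4-byte ingress interface ID followed by a 4-byte octet counter, but the collector's idea of the layout puts the octet counter first. The same bytes then produce a wildly different "traffic length". The helper name is hypothetical.

```python
import struct

# What the exporter actually encoded: INPUT_SNMP, then IN_BYTES (4 bytes each).
# Both values are invented for illustration.
exported = struct.pack("!II", 500_000_123, 1_500)


def read_octets(record: bytes, octets_offset: int) -> int:
    """Read a 4-byte big-endian octet counter at the offset the template claims."""
    (octets,) = struct.unpack_from("!I", record, octets_offset)
    return octets


correct = read_octets(exported, 4)  # template matches the exporter's layout
wrong = read_octets(exported, 0)    # template mistakenly lists IN_BYTES first
print(correct, wrong)  # 1500 500000123
```

Nothing in the data can flag the mistake: both results are perfectly valid unsigned integers.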

Such low-level data structures are used to reduce the computational power needed to craft them. IPFIX may be partially or completely implemented in hardware. Even in the most common software implementations, compute resources on routers are scarce: routers need to keep their power usage as low as possible, and they have far more important tasks, such as routing itself. Clearly, these protocols can be considered very unsafe, but that's the cost of efficiency and low overhead.

This research started when we received a support request from a customer because the bandwidth for their IPv6 top talkers looked like this:

For comparison, this is what normal output looks like:

FastNetMon's IPFIX implementation exposes a vast number of metrics covering all important parts of the IPFIX collector.

After getting great and prompt feedback from the customer, we noticed very unusual behaviour: the IPFIX template definition had actually changed many times:

It was the very first time we had seen this counter with a non-zero value.
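A counter like this is cheap to maintain. Below is a minimal sketch of a template cache that counts identical re-announcements (routine on UDP, where templates are resent periodically) separately from genuine redefinitions. The class and counter names are hypothetical, not FastNetMon's metric names. Keying on exporter address, observation domain and template ID is our simplification; as the August update shows, two devices sharing one IP address would still land on the same key.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

FieldList = Tuple[Tuple[int, int], ...]  # ((element ID, length), ...) in wire order
TemplateKey = Tuple[str, int, int]       # (exporter IP, observation domain, template ID)


@dataclass
class TemplateCache:
    templates: Dict[TemplateKey, FieldList] = field(default_factory=dict)
    template_updates: int = 0          # same ID re-announced with different content
    template_retransmissions: int = 0  # same ID with identical content

    def add(self, exporter: str, domain_id: int, template_id: int,
            fields: FieldList) -> bool:
        """Store a template; return True if it redefined an existing one."""
        key = (exporter, domain_id, template_id)
        previous = self.templates.get(key)
        if previous is None:
            self.templates[key] = fields
            return False
        if previous == fields:
            self.template_retransmissions += 1
            return False
        self.template_updates += 1
        self.templates[key] = fields  # this sketch simply keeps the newest definition
        return True
```

In a healthy network template_updates should stay at zero for the lifetime of the collector.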

What does the IPFIX RFC (RFC 7011) say about template content updates? It completely bans them:

After getting pcap dumps, we were able to pinpoint the issue and identify slightly different templates sent by the router with the same template ID:

You may notice that the second template has an additional BGP_IPV6_NEXT_HOP field, which shifts every field after it by 16 bytes.
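To make the shift concrete, here is a small sketch with made-up, simplified templates (the real ones from the pcap contain more fields, and the position of the inserted field is assumed). The exporter keeps encoding with the old layout while the collector applies the new one, which has a 16-byte BGP_IPV6_NEXT_HOP (element 63) inserted. IN_BYTES (element 1) moves from offset 32 to offset 48, so the collector reads the first 8 bytes of the next record's source address as an octet counter.

```python
import ipaddress
import struct


def ip6(text: str) -> bytes:
    return ipaddress.IPv6Address(text).packed


# Hypothetical templates as (element ID, length) pairs.
OLD_TEMPLATE = [(27, 16), (28, 16), (1, 8), (2, 8)]            # src, dst, octets, packets
NEW_TEMPLATE = [(27, 16), (28, 16), (63, 16), (1, 8), (2, 8)]  # BGP_IPV6_NEXT_HOP added


def offsets(template):
    """Map element ID -> byte offset inside one data record."""
    result, offset = {}, 0
    for ie_id, length in template:
        result[ie_id] = offset
        offset += length
    return result


# The exporter still encodes two consecutive records with the OLD 48-byte layout.
data_set = ip6("2001:db8::1") + ip6("2001:db8::2") + struct.pack("!QQ", 1500, 1)
data_set += ip6("2001:db8::10") + ip6("2001:db8::20") + struct.pack("!QQ", 9000, 6)

old, new = offsets(OLD_TEMPLATE), offsets(NEW_TEMPLATE)
print(old[1], new[1])  # 32 48
(octets_ok,) = struct.unpack_from("!Q", data_set, old[1])
(octets_bad,) = struct.unpack_from("!Q", data_set, new[1])  # bytes of 2001:db8::10
```

Every field after the inserted one is read from the wrong place, and since the new record length (80 bytes) no longer matches the real one (48 bytes), the following records are misaligned too.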

What was the outcome of this behaviour? All data carried in the fields after BGP_IPV6_NEXT_HOP was corrupted. We did our best to replicate this case with Wireshark and got the following output:

That's a serious number of octets for sure: too much even for a well-saturated 100G link. Duration? Even funnier: it's negative.
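Values like these are physically impossible, which suggests a cheap defensive check. The sketch below is our own heuristic, not something the RFC prescribes: it flags records with a negative duration, or with more octets than a link of an assumed capacity (100 Gbit/s by default) could carry during the flow's lifetime. The Flow fields are hypothetical, and the millisecond timestamps (for example IANA elements 152 and 153) are an assumption; exporters may use other timestamp elements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flow:
    octets: int
    packets: int
    start_ms: int  # assumed: flow start time in milliseconds
    end_ms: int    # assumed: flow end time in milliseconds


def implausibility_reasons(flow: Flow, link_bps: float = 100e9) -> List[str]:
    """Return reasons why a decoded flow cannot be real (empty if it looks sane)."""
    reasons = []
    duration_ms = flow.end_ms - flow.start_ms
    if duration_ms < 0:
        reasons.append(f"negative duration {duration_ms} ms")
    # Use a window of at least one second so very short flows are not rejected.
    window_s = max(duration_ms, 1000) / 1000
    if flow.octets * 8 > link_bps * window_s:
        reasons.append("more octets than the link could carry")
    return reasons
```

A check like this cannot repair corrupted records, but it can keep them out of bandwidth graphs and away from attack detection.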

What actually happened in this case? Instead of using a new template ID, the router simply redefined the old template ID by adding a new field and changing the template length, which led to these decoding issues. After announcing the new definition of the template fields, the router kept encoding data packets using the old template, and we tried to decode them with the new template, which does not match.

What should we do in such cases? The RFC recommends that we stop accepting any data from this router and log a warning message. Sadly, this would lead to a complete loss of visibility for IPv6. In this particular case, the issue did not affect IPv4, and telemetry worked just fine for it.

There is one more corner case that prevents us from strictly following the RFC. A long-running IPFIX collector may observe a router's firmware being upgraded, with the new version introducing a small change to the list of template fields without any change to the template ID. In this case, it is completely safe to update the template content: the router was restarted, so the risk of receiving data packets encoded with the old template is minimal.
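Telling the safe case apart from the dangerous one needs some signal that the exporter restarted. One possible heuristic, which is our assumption and not something the RFC defines, is a large backwards jump of the sequence number carried in every IPFIX message header: it counts exported data records modulo 2**32 and is likely to start over after a reboot. In the minimal sketch below, the function names and the reordering tolerance are hypothetical; "quarantine" stands for what the RFC suggests, namely logging the error and not decoding data for that template.

```python
from typing import Tuple

SEQ_MOD = 2 ** 32
REORDER_WINDOW = 10_000  # hypothetical tolerance for UDP reordering, in records

FieldList = Tuple[Tuple[int, int], ...]


def looks_like_restart(prev_seq: int, new_seq: int) -> bool:
    """Guess whether the exporter restarted, based on a large backwards
    jump of the 32-bit IPFIX sequence number."""
    forward = (new_seq - prev_seq) % SEQ_MOD
    if forward < SEQ_MOD // 2:
        return False  # moved forward, possibly across the 2**32 wrap
    backward = SEQ_MOD - forward
    return backward > REORDER_WINDOW


def template_update_action(prev_seq: int, new_seq: int,
                           old_fields: FieldList, new_fields: FieldList) -> str:
    """Decide what to do when a known template ID is announced again."""
    if old_fields == new_fields:
        return "keep"        # plain retransmission
    if looks_like_restart(prev_seq, new_seq):
        return "replace"     # e.g. firmware upgrade: old data packets are unlikely
    return "quarantine"      # redefinition without a restart: log, stop decoding
```

The heuristic is deliberately conservative and has blind spots: a router restarted shortly after boot may not have a large enough sequence number to jump back from.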

What is the best way for vendors to handle template changes during software upgrades? It may be reasonable to increment the template ID every single time the content of a template changes. Sadly, the template ID field is just 2 bytes long and limited to 65535. It gets even trickier for vendors that allow the list of exported fields to be changed while the router is running: they would have to increment the template ID every time the field list changes. Potentially, we could move to 32- or 64-bit template IDs in the future.
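The numbers are tight: the 16-bit field tops out at 65535, and IDs 0 to 255 are reserved, so an exporter has 65535 - 256 + 1 = 65280 usable template IDs. A hypothetical exporter-side allocator that never redefines an ID, and instead hands out a fresh one for every distinct field list, might look like the sketch below. It is an illustration of the idea, not how any vendor implements it.

```python
from typing import Dict, Tuple

MIN_TEMPLATE_ID = 256    # IDs 0-255 are reserved for Set IDs
MAX_TEMPLATE_ID = 65535  # the template ID field is only 16 bits wide
ID_SPACE = MAX_TEMPLATE_ID - MIN_TEMPLATE_ID + 1

FieldList = Tuple[Tuple[int, int], ...]


class TemplateIdAllocator:
    """Hypothetical allocator: a changed field list always gets a new ID."""

    def __init__(self) -> None:
        self.next_id = MIN_TEMPLATE_ID
        self.by_fields: Dict[FieldList, int] = {}

    def id_for(self, fields: FieldList) -> int:
        existing = self.by_fields.get(fields)
        if existing is not None:
            return existing  # same layout: reuse its ID
        if self.next_id > MAX_TEMPLATE_ID:
            raise RuntimeError(f"all {ID_SPACE} template IDs are used")
        template_id = self.next_id
        self.next_id += 1
        self.by_fields[fields] = template_id
        return template_id
```

Once the space is exhausted, the exporter has no choice but to reuse IDs, which is exactly the situation collectors cannot interpret safely.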

If you know the bug identifier for this issue, or know that it has been fixed, please share it in the comments or on LinkedIn.
