[RFC,iproute2-next,00/11] ip: nexthop: cache nexthops and print routes' nh info

Message ID: 20210929152848.1710552-1-razor@blackwall.org

Nikolay Aleksandrov Sept. 29, 2021, 3:28 p.m. UTC
From: Nikolay Aleksandrov <nikolay@nvidia.com>

Hi,
This set addresses an old ask that we've had for some time: printing
nexthop information while monitoring or dumping routes. The core
problem is that people cannot follow nexthop changes while monitoring
route changes; by the time they check the nexthop it could already be
deleted or updated to something else. To help them out I've added a
nexthop cache which is populated while decoding routes (only used if
-d / show_details is specified) and kept up to date while monitoring.
The nexthop information is printed on its own line starting with the
"nh_info" attribute, and is embedded inside that attribute when
printing JSON. To cache the nexthop entries I parse them into
structures; in order to reuse most of the code, the print helpers have
been altered to rely on such prepared structures. Nexthops are now
always parsed into a structure, even if they won't be cached; that
structure is later used to print the nexthop and is destroyed if it is
not going to be cached. New nexthops (not found in the cache) are
retrieved from the kernel using a private netlink socket so they don't
disrupt an ongoing dump, similar to how interfaces are retrieved and
cached.
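
To make the cache idea concrete, here is a minimal standalone C
sketch; the names, sizes and fields below are illustrative only and
not the actual patch code (the real nh_entry and cache helpers are
defined in the patches):

#include <stdint.h>

#define NH_CACHE_SIZE 1024

struct nh_entry {
	struct nh_entry *next;	/* hash bucket chaining */
	uint32_t id;		/* nexthop id, used as the cache key */
	/* the parsed nhmsg attributes used for printing live here */
};

static struct nh_entry *nh_cache[NH_CACHE_SIZE];

static struct nh_entry *nh_cache_get(uint32_t id)
{
	struct nh_entry *e;

	for (e = nh_cache[id % NH_CACHE_SIZE]; e; e = e->next)
		if (e->id == id)
			return e;

	return NULL;	/* miss: the caller has to ask the kernel */
}

static void nh_cache_add(struct nh_entry *e)
{
	struct nh_entry **head = &nh_cache[e->id % NH_CACHE_SIZE];

	e->next = *head;
	*head = e;
}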

I have tested the set with the kernel forwarding selftests and also by
stressing it with nexthop create/update/delete in loops while monitoring.

Comments are very welcome as usual. :)

Patch breakdown:
Patches 1-2: update current route helpers to take parsed arguments so we
             can directly pass them from the nh_entry structure later
Patch     3: adds the new nh_entry structure and a helper to parse nhmsg
             into it
Patch     4: adds resilient nexthop group structure and a parser for it
Patch     5: converts the current nexthop print code to always parse
             the nhmsg into an nh_entry structure and use it for
             printing
Patch     6: factors out ipnh_get's rtnl talk part and allows using a
             different rt handle for the communication
Patch     7: adds the nexthop cache and helpers to manage it; it uses
             the new __ipnh_get to retrieve nexthops
Patch     8: factors out nh_entry printing into a separate helper called
             __print_nexthop_entry
Patch     9: adds a new helper, print_cache_nexthop_id, which prints
             nexthop information based on its id; if the nexthop is
             not found in the cache it is fetched from the kernel (see
             the sketch after this list)
Patch    10: the new print_cache_nexthop_id helper is used when
             printing routes with show_details (-d) to output detailed
             nexthop information; the format after nh_info is the same
             as ip nexthop show
Patch    11: changes print_nexthop into print_cache_nexthop, which
             always outputs the nexthop information and can also
             update the cache (based on its process_cache argument);
             it's used to keep the cache up to date while monitoring
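
Continuing the sketch above, the lookup-or-fetch flow that patches 7-9
describe would look roughly as follows. Again, this is illustrative
only: fetch_nexthop() stands in for the real __ipnh_get()-based
request sent over a private rtnl handle (so an ongoing dump is not
disturbed), and the fprintf() stands in for __print_nexthop_entry():

#include <stdio.h>

static struct nh_entry *fetch_nexthop(uint32_t id)
{
	/* placeholder: the real code sends the get request via a
	 * dedicated rtnl handle and parses the reply into an entry */
	(void)id;
	return NULL;
}

static int print_cache_nexthop_id(FILE *fp, uint32_t nh_id)
{
	struct nh_entry *e = nh_cache_get(nh_id);

	if (!e) {
		e = fetch_nexthop(nh_id);
		if (!e)
			return -1;
		nh_cache_add(e);	/* remember it for later routes */
	}

	/* stand-in for printing the full prepared entry */
	fprintf(fp, "nh_info id %u ...\n", e->id);
	return 0;
}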

Example outputs (monitor):
[NEXTHOP]id 101 via 169.254.2.22 dev veth2 scope link proto unspec 
[NEXTHOP]id 102 via 169.254.3.23 dev veth4 scope link proto unspec 
[NEXTHOP]id 103 group 101/102 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec 
[ROUTE]unicast 192.0.2.0/24 nhid 203 table 4 proto boot scope global 
	nh_info id 203 group 201/202 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec 
	nexthop via 169.254.2.12 dev veth3 weight 1 
	nexthop via 169.254.3.13 dev veth5 weight 1 

[NEXTHOP]id 204 via fe80:2::12 dev veth3 scope link proto unspec 
[NEXTHOP]id 205 via fe80:3::13 dev veth5 scope link proto unspec 
[NEXTHOP]id 206 group 204/205 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec 
[ROUTE]unicast 2001:db8:1::/64 nhid 206 table 4 proto boot scope global metric 1024 pref medium
	nh_info id 206 group 204/205 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec 
	nexthop via fe80:2::12 dev veth3 weight 1 
	nexthop via fe80:3::13 dev veth5 weight 1 

[NEXTHOP]id 2  encap mpls  200/300 via 10.1.1.1 dev ens20 scope link proto unspec onlink 
[ROUTE]unicast 2.3.4.10 nhid 2 table main proto boot scope global 
	nh_info id 2  encap mpls  200/300 via 10.1.1.1 dev ens20 scope link proto unspec onlink 

JSON:
 {
        "type": "unicast",
        "dst": "198.51.100.0/24",
        "nhid": 103,
        "table": "3",
        "protocol": "boot",
        "scope": "global",
        "flags": [ ],
        "nh_info": {
            "id": 103,
            "group": [ {
                    "id": 101,
                    "weight": 11
                },{
                    "id": 102,
                    "weight": 45
                } ],
            "type": "resilient",
            "resilient_args": {
                "buckets": 512,
                "idle_timer": 0,
                "unbalanced_timer": 0,
                "unbalanced_time": 0
            },
            "scope": "global",
            "protocol": "unspec",
            "flags": [ ]
        },
        "nexthops": [ {
                "gateway": "169.254.2.22",
                "dev": "veth2",
                "weight": 11,
                "flags": [ ]
            },{
                "gateway": "169.254.3.23",
                "dev": "veth4",
                "weight": 45,
                "flags": [ ]
            } ]
  }

Thank you,
 Nik

Nikolay Aleksandrov (11):
  ip: print_rta_if takes ifindex as device argument instead of attribute
  ip: export print_rta_gateway version which outputs prepared gateway
    string
  ip: nexthop: add nh struct and a helper to parse nhmsg into it
  ip: nexthop: parse resilient nexthop group attribute into structure
  ip: nexthop: always parse attributes for printing
  ip: nexthop: pull ipnh_get_id rtnl talk into a helper
  ip: nexthop: add cache helpers
  ip: nexthop: factor out entry printing
  ip: nexthop: add a helper which retrieves and prints cached nh entry
  ip: route: print and cache detailed nexthop information when requested
  ip: nexthop: add print_cache_nexthop which prints and manages the nh
    cache

 ip/ip_common.h |   4 +-
 ip/ipmonitor.c |   3 +-
 ip/ipnexthop.c | 455 +++++++++++++++++++++++++++++++++++++++----------
 ip/iproute.c   |  32 ++--
 ip/nh_common.h |  53 ++++++
 5 files changed, 446 insertions(+), 101 deletions(-)
 create mode 100644 ip/nh_common.h

Comments

David Ahern Sept. 30, 2021, 3:42 a.m. UTC | #1
On 9/29/21 9:28 AM, Nikolay Aleksandrov wrote:
> From: Nikolay Aleksandrov <nikolay@nvidia.com>
> 
> [...]
> 
> Comments are very welcome as usual. :)

Overall it looks fine, and I'm not surprised a cache is needed.

The big comment is to re-order the patches: do all of the refactoring
first to get the code where you need it, and then add what is needed
for the cache.
Nikolay Aleksandrov Sept. 30, 2021, 7:17 a.m. UTC | #2
On 30/09/2021 06:42, David Ahern wrote:
> On 9/29/21 9:28 AM, Nikolay Aleksandrov wrote:
>> From: Nikolay Aleksandrov <nikolay@nvidia.com>
>>
>> [...]
> 
> Overall it looks fine, and I'm not surprised a cache is needed.
> 
> The big comment is to re-order the patches: do all of the refactoring
> first to get the code where you need it, and then add what is needed
> for the cache.
> 

Thanks for the comments. Apart from pairing the add-parse/use-parse
functions in the first few patches, only patch 08 seems out of place;
it is there because it's first needed in patch 09, but I don't mind
pulling it back. All other patches after 06 add the new cache and
print functions and use them in iproute/ipmonitor; there is no
refactoring done in those, so I plan to keep them as they are.

Cheers,
 Nik