[net-next,v2,00/11] devlink: introduce dump selector attr and use it for per-instance dumps

Message ID: 20230720121829.566974-1-jiri@resnulli.us

Message

Jiri Pirko July 20, 2023, 12:18 p.m. UTC
From: Jiri Pirko <jiri@nvidia.com>

Motivation:

For SFs (subfunctions), one devlink instance is created per SF. There
might be thousands of these on a single host. When a user needs to know
the port handle of a specific SF, they have to dump all devlink ports on
the host, which does not scale well.

Solution:

Allow the user to pass a devlink handle alongside the dump command and
dump only objects that belong to the selected devlink instance.

Introduce a new attr, DEVLINK_ATTR_DUMP_SELECTOR, to nest the selection
attributes. This way userspace can use maxattr to tell whether the dump
selector is supported by the kernel.

Assemble a netlink policy for the selector attribute. If the user passes
an attr unknown to the kernel, netlink validation errors out.
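
To make this concrete, here is a minimal sketch of what the nested
selector policy and its one-time parsing could look like. The attribute
names are existing devlink uAPI (plus the new DEVLINK_ATTR_DUMP_SELECTOR
from this series), but the helper name, the exact policy layout and the
header offset are illustrative assumptions, not the patch text:

#include <net/genetlink.h>
#include <net/netlink.h>
#include <uapi/linux/devlink.h>

/* Strict policy for the nested selector: an attribute the kernel does
 * not recognize fails validation instead of being silently ignored.
 */
static const struct nla_policy devlink_dump_selector_nl_policy[DEVLINK_ATTR_MAX + 1] = {
	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
};

/* Hypothetical helper: find the nested selector in the dump request and
 * validate it against the policy above. Using GENL_HDRLEN as the offset
 * assumes a genetlink family without a fixed header, which is the case
 * for devlink.
 */
static int devlink_dump_selector_parse(const struct nlmsghdr *nlh,
				       struct nlattr **tb,
				       struct netlink_ext_ack *extack)
{
	struct nlattr *selector;

	selector = nlmsg_find_attr(nlh, GENL_HDRLEN,
				   DEVLINK_ATTR_DUMP_SELECTOR);
	if (!selector)
		return 0;	/* no selector: dump all instances as before */

	return nla_parse_nested(tb, DEVLINK_ATTR_MAX, selector,
				devlink_dump_selector_nl_policy, extack);
}

With the bus and dev names parsed once, a dumpit() can look up the one
selected devlink instance instead of iterating over all of them.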

Example:
$ devlink port show
auxiliary/mlx5_core.eth.0/65535: type eth netdev eth2 flavour physical port 0 splittable false
auxiliary/mlx5_core.eth.1/131071: type eth netdev eth3 flavour physical port 1 splittable false

$ devlink port show auxiliary/mlx5_core.eth.0
auxiliary/mlx5_core.eth.0/65535: type eth netdev eth2 flavour physical port 0 splittable false

$ devlink port show auxiliary/mlx5_core.eth.1
auxiliary/mlx5_core.eth.1/131071: type eth netdev eth3 flavour physical port 1 splittable false

This is done in patch #10.

Dependency:

DEVLINK_ATTR_DUMP_SELECTOR parsing is best done once, at the beginning
of the dump. Unfortunately, it is not possible to define start() and
done() callbacks for netlink small ops, so all commands that use the
instance iterator for dumpit are converted to split ops. This is done in
patches #1-9.
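
For illustration, here is the rough shape of one converted dump command
as a split op; the callback names below are hypothetical placeholders
for this sketch, not the series' actual symbols:

/* Hypothetical callbacks, prototypes only. */
static int devlink_nl_instance_iter_dump_start(struct netlink_callback *cb);
static int devlink_nl_instance_iter_dumpit(struct sk_buff *skb,
					   struct netlink_callback *cb);
static int devlink_nl_instance_iter_dump_done(struct netlink_callback *cb);

static const struct genl_split_ops devlink_nl_port_get_dump_op = {
	.cmd	= DEVLINK_CMD_PORT_GET,
	/* Unlike genl_small_ops, genl_split_ops carries per-command
	 * start()/done() callbacks, so the selector can be parsed once
	 * in start() and any dump state released in done().
	 */
	.start	= devlink_nl_instance_iter_dump_start,
	.dumpit	= devlink_nl_instance_iter_dumpit,
	.done	= devlink_nl_instance_iter_dump_done,
	.flags	= GENL_CMD_CAP_DUMP,
};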

Extension:

Patch #11 extends the selector with a port index for health reporter
dumping.
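
Assuming the same nested policy shape as in the sketch above, the
extension could amount to accepting one more attribute in the selector
(again an illustrative sketch, not the patch text):

static const struct nla_policy devlink_health_dump_selector_policy[DEVLINK_ATTR_MAX + 1] = {
	[DEVLINK_ATTR_BUS_NAME]   = { .type = NLA_NUL_STRING },
	[DEVLINK_ATTR_DEV_NAME]   = { .type = NLA_NUL_STRING },
	/* narrows the reporter dump to reporters on a single port */
	[DEVLINK_ATTR_PORT_INDEX] = { .type = NLA_U32 },
};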

v1->v2:
- the original single patch (patch #10) was extended into a patchset

Jiri Pirko (11):
  devlink: parse linecard attr in doit() callbacks
  devlink: parse rate attrs in doit() callbacks
  devlink: introduce __devlink_nl_pre_doit() with internal flags as
    function arg
  devlink: convert port get command to split ops
  devlink: convert health reporter get command to split ops
  devlink: convert param get command to split ops
  devlink: convert trap get command to split ops
  devlink: introduce set of macros and use it for split ops definitions
  devlink: convert rest of the iterator dumpit commands to split ops
  devlink: introduce dump selector attr and use it for per-instance
    dumps
  devlink: extend health reporter dump selector by port index

 include/uapi/linux/devlink.h |   2 +
 net/devlink/devl_internal.h  |  42 +++---
 net/devlink/health.c         |  21 ++-
 net/devlink/leftover.c       | 211 ++++++++--------------------
 net/devlink/netlink.c        | 263 ++++++++++++++++++++++++++++++-----
 5 files changed, 333 insertions(+), 206 deletions(-)

Comments

Petr Machata July 20, 2023, 1:55 p.m. UTC | #1
I'll take this through our nightly and will report back tomorrow.
Jiri Pirko July 20, 2023, 2:27 p.m. UTC | #2
Thu, Jul 20, 2023 at 03:55:00PM CEST, petrm@nvidia.com wrote:
>I'll take this through our nightly and will report back tomorrow.

Sure. I ran mlxsw regression with this already, no issues.
Petr Machata July 20, 2023, 2:51 p.m. UTC | #3
Jiri Pirko <jiri@resnulli.us> writes:

> Thu, Jul 20, 2023 at 03:55:00PM CEST, petrm@nvidia.com wrote:
>
>>I'll take this through our nightly and will report back tomorrow.
>
> Sure. I ran mlxsw regression with this already, no issues.

You started it on one machine and it went well for a while. But it's
getting a stream of these splats right now:

[ 4155.564670] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 4155.571093] rcu: 	7-....: (99998 ticks this GP) idle=ac7c/1/0x4000000000000000 softirq=86447/86447 fqs=25001
[ 4155.582077] rcu: 	(t=100015 jiffies g=289809 q=1459 ncpus=8)
[ 4155.588398] CPU: 7 PID: 38940 Comm: ip Not tainted 6.5.0-rc1jiri+ #1
[ 4155.595497] Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 01/06/2019
[ 4155.604915] RIP: 0010:__netlink_lookup+0xca/0x150
[ 4155.610171] Code: 00 00 48 89 c7 48 83 cf 01 48 8b 10 48 83 e2 fe 48 0f 44 d7 f6 c2 01 75 5a 0f b7 4b 16 44 8b 44 24 08 49 89 c9 49 f7 d9 eb 08 <48> 8b 12 f6 c2 01 75 41 4a 8d 34 0a 44 39 86 e8 02 00 00 75 eb 48
[ 4155.631156] RSP: 0018:ffffbea7ca41b760 EFLAGS: 00000213
[ 4155.636992] RAX: ffffa048c25120b0 RBX: ffffa048c01e4000 RCX: 0000000000000400
[ 4155.644964] RDX: ffffa048c6d4b400 RSI: ffffa048c6d4b000 RDI: ffffa048c25120b1
[ 4155.652935] RBP: ffffa048c2512000 R08: 00000000888fe595 R09: fffffffffffffc00
[ 4155.660906] R10: 00000000302e3030 R11: 0000006900030008 R12: ffffa048c9205900
[ 4155.668879] R13: 00000000888fe595 R14: 0000000000000001 R15: ffffa048c01e4000
[ 4155.676850] FS:  00007f2155bcf800(0000) GS:ffffa04c2fdc0000(0000) knlGS:0000000000000000
[ 4155.685890] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4155.692307] CR2: 00000000004e4140 CR3: 000000014c919005 CR4: 00000000003706e0
[ 4155.700279] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4155.708249] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4155.716220] Call Trace:
[ 4155.718948]  <IRQ>
[ 4155.721190]  ? rcu_dump_cpu_stacks+0xea/0x170
[ 4155.726057]  ? rcu_sched_clock_irq+0x53b/0x10b0
[ 4155.731116]  ? update_load_avg+0x54/0x280
[ 4155.735593]  ? notifier_call_chain+0x5a/0xc0
[ 4155.740361]  ? timekeeping_update+0xaf/0x280
[ 4155.745130]  ? timekeeping_advance+0x374/0x590
[ 4155.750093]  ? update_process_times+0x74/0xb0
[ 4155.754957]  ? tick_sched_handle+0x33/0x50
[ 4155.759529]  ? tick_sched_timer+0x6b/0x80
[ 4155.763995]  ? tick_sched_do_timer+0x80/0x80
[ 4155.768762]  ? __hrtimer_run_queues+0x10f/0x2a0
[ 4155.773820]  ? hrtimer_interrupt+0xf8/0x230
[ 4155.778492]  ? __sysvec_apic_timer_interrupt+0x52/0x120
[ 4155.784327]  ? sysvec_apic_timer_interrupt+0x6d/0x90
[ 4155.789874]  </IRQ>
[ 4155.792211]  <TASK>
[ 4155.794549]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 4155.800485]  ? __netlink_lookup+0xca/0x150
[ 4155.805059]  netlink_unicast+0x132/0x390
[ 4155.809437]  rtnl_getlink+0x36d/0x410
[ 4155.813532]  rtnetlink_rcv_msg+0x14f/0x3b0
[ 4155.818106]  ? __alloc_pages+0x17c/0x290
[ 4155.822485]  ? rtnl_calcit.isra.0+0x140/0x140
[ 4155.827348]  netlink_rcv_skb+0x58/0x100
[ 4155.831631]  netlink_unicast+0x23c/0x390
[ 4155.836010]  netlink_sendmsg+0x214/0x470
[ 4155.840390]  ? netlink_unicast+0x390/0x390
[ 4155.844963]  ____sys_sendmsg+0x16a/0x260
[ 4155.849345]  ___sys_sendmsg+0x9a/0xe0
[ 4155.853437]  __sys_sendmsg+0x7a/0xc0
[ 4155.857428]  do_syscall_64+0x38/0x80
[ 4155.861419]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

BTW, while for core patches a pass on any one machine is usually a good
predictor of a full regression pass, that's not always the case. There's
a reason we run on about 15 machines plus simulation. Even if this had
"no issues", there would be value in getting a full regression run.

I'm pulling this from the nightly again.
Jiri Pirko July 25, 2023, 8:07 a.m. UTC | #4
I see that this patchset got moved to "changes requested" in patchwork.
Why exactly? There has been no comment so far. Petr's splat is clearly
not caused by this patchset.

Should I resubmit?

Thanks!

Thu, Jul 20, 2023 at 02:18:18PM CEST, jiri@resnulli.us wrote:
>From: Jiri Pirko <jiri@nvidia.com>
>
>[...]
Paolo Abeni July 25, 2023, 8:36 a.m. UTC | #5
On Tue, 2023-07-25 at 10:07 +0200, Jiri Pirko wrote:
> I see that this patchset got moved to "changes requested" in patchwork.
> Why exactly? There has been no comment so far. Petr's splat is clearly
> not caused by this patchset.

Quickly skimming over the series, I agree the reported splat looks
possibly more related to core netlink and/or rhashtable, but it would
help if you could express your reasoning on the splat itself, possibly
with a decoded back-trace.

Thanks!

Paolo
Jiri Pirko July 25, 2023, 3:29 p.m. UTC | #6
Tue, Jul 25, 2023 at 10:36:31AM CEST, pabeni@redhat.com wrote:
>On Tue, 2023-07-25 at 10:07 +0200, Jiri Pirko wrote:
>> I see that this patchset got moved to "changes requested" in patchwork.
>> Why exactly? There has been no comment so far. Petr's splat is clearly
>> not caused by this patchset.
>
>Quickly skimming over the series, I agree the reported splat looks
>possibly more related to core netlink and/or rhashtable, but it would
>help if you could express your reasoning on the splat itself, possibly
>with a decoded back-trace.
>

Well, since I'm touching only devlink, this is unrelated to this
patchset. I don't see why I would need to care about the splat.
I will repost, if that is ok.


>Thanks!
>
>Paolo
Paolo Abeni July 25, 2023, 4:39 p.m. UTC | #7
On Tue, 2023-07-25 at 17:29 +0200, Jiri Pirko wrote:
> Tue, Jul 25, 2023 at 10:36:31AM CEST, pabeni@redhat.com wrote:
> > On Tue, 2023-07-25 at 10:07 +0200, Jiri Pirko wrote:
> > > I see that this patchset got moved to "changes requested" in patchwork.
> > > Why exactly? There has been no comment so far. Petr's splat is clearly
> > > not caused by this patchset.
> > 
> > Quickly skimming over the series, I agree the reported splat looks
> > possibly more related to core netlink and/or rhashtable, but it would
> > help if you could express your reasoning on the splat itself, possibly
> > with a decoded back-trace.
> > 
> 
> Well, since I'm touching only devlink, this is unrelated to this
> patchset. I don't see why I would need to care about the splat.
> I will repost, if that is ok.

Not needed, the series is again in 'new' status.

Cheers,

Paolo