[0/6] stricter netlink validation

Message ID 20190404065408.5864-1-johannes@sipsolutions.net (mailing list archive)

Johannes Berg April 4, 2019, 6:54 a.m. UTC
Here's a version that has passed build testing ;-)

As mentioned in the RFC postings, this was inspired by talks
between David, Pablo and myself. Pablo is somewhat firmly on
the side of less strict validation, while David and myself
are in the very strict validation camp. If I understand him
correctly, Pablo doesn't mind the strict validation if it is
accompanied by exposing the policy to userspace, but that
isn't something we can do today. I'll work on it later.

What this series does is basically first replace nla_parse()
and all its friends by nla_parse_deprecated(), while making
all of those just inlines around __nla_parse() and friends
with configurable strict checking bits. Three versions exist
after this patchset:
 * liberal           - no bits set
 * deprecated_strict - reject attrs > maxtype
                       reject trailing junk
 * new default       - reject trailing junk
                       reject attrs > maxtype
                       reject policy entries that are NLA_UNSPEC
                       require a policy
                       strictly validate attributes

The NLA_UNSPEC one can be opted in even in existing code with
existing userspace in the future, as policies are updated.
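
As a rough illustration of the layering described above, the following
userspace sketch models one core parser that takes strictness bits, plus
three thin inline wrappers that differ only in the bits they pass down.
All demo_* names, the bit assignments, and the exact-length rule used for
a u32 attribute are assumptions made for this example; they are not taken
from the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strictness bits, one per rule in the list above. */
enum demo_validate {
	DEMO_TRAILING     = 1u << 0, /* reject trailing junk */
	DEMO_MAXTYPE      = 1u << 1, /* reject attrs > maxtype */
	DEMO_UNSPEC       = 1u << 2, /* reject attrs with an unspecified policy entry */
	DEMO_NEED_POLICY  = 1u << 3, /* require a policy */
	DEMO_STRICT_ATTRS = 1u << 4, /* strictly validate attributes */
};

#define DEMO_LIBERAL           0u
#define DEMO_DEPRECATED_STRICT (DEMO_TRAILING | DEMO_MAXTYPE)
#define DEMO_STRICT            (DEMO_DEPRECATED_STRICT | DEMO_UNSPEC | \
				DEMO_NEED_POLICY | DEMO_STRICT_ATTRS)

enum demo_kind { DEMO_KIND_UNSPEC = 0, DEMO_KIND_U32 };

struct demo_attr {
	unsigned int type; /* attribute type */
	size_t len;        /* payload length in bytes */
};

/*
 * Core parser: returns 0 if the message is accepted, -1 if rejected.
 * `policy` has maxtype + 1 entries, or is NULL.
 */
static int demo_parse_core(const struct demo_attr *attrs, size_t n,
			   size_t trailing, const enum demo_kind *policy,
			   unsigned int maxtype, unsigned int validate)
{
	assert(attrs != NULL || n == 0);

	if (!policy && (validate & DEMO_NEED_POLICY))
		return -1;

	for (size_t i = 0; i < n; i++) {
		unsigned int t = attrs[i].type;

		if (t > maxtype) {
			if (validate & DEMO_MAXTYPE)
				return -1;
			continue; /* liberal: unknown attributes are skipped */
		}
		if (!policy)
			continue;
		if (policy[t] == DEMO_KIND_UNSPEC) {
			if (validate & DEMO_UNSPEC)
				return -1;
			continue;
		}
		/* DEMO_KIND_U32: too short is always rejected ... */
		if (attrs[i].len < 4)
			return -1;
		/* ... and strict mode additionally rejects oversized payloads */
		if ((validate & DEMO_STRICT_ATTRS) && attrs[i].len != 4)
			return -1;
	}

	if (trailing && (validate & DEMO_TRAILING))
		return -1;
	return 0;
}

/* The three entry points differ only in the bits they pass down. */
static inline int demo_parse(const struct demo_attr *attrs, size_t n,
			     size_t trailing, const enum demo_kind *policy,
			     unsigned int maxtype)
{
	return demo_parse_core(attrs, n, trailing, policy, maxtype, DEMO_STRICT);
}

static inline int demo_parse_deprecated(const struct demo_attr *attrs, size_t n,
					size_t trailing,
					const enum demo_kind *policy,
					unsigned int maxtype)
{
	return demo_parse_core(attrs, n, trailing, policy, maxtype, DEMO_LIBERAL);
}

static inline int demo_parse_deprecated_strict(const struct demo_attr *attrs,
					       size_t n, size_t trailing,
					       const enum demo_kind *policy,
					       unsigned int maxtype)
{
	return demo_parse_core(attrs, n, trailing, policy, maxtype,
			       DEMO_DEPRECATED_STRICT);
}
```

In this layout a caller selects its strictness purely by the wrapper it
names, so the choice is visible at every call site.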

In addition, infrastructure is added to opt in to the strict
attribute validation even for new attributes added to existing
policies, regardless of the nla_parse() strictness setting
described above, as new attributes should not be a compatibility
issue.

Finally, much of this is plumbed through generic netlink etc.,
and I've included a patch to tag nl80211 with the future attribute
strictness for reference.

johannes

Comments

David Miller April 4, 2019, 5:28 p.m. UTC | #1
From: Johannes Berg <johannes@sipsolutions.net>
Date: Thu,  4 Apr 2019 08:54:02 +0200

> Here's a version that has passed build testing ;-)

:-)

I really like the approach taken here, and done in such a way that
new attributes added get strict checking by default.

I'll let David Ahern et al. have time to review this.

Johannes Berg April 4, 2019, 8:20 p.m. UTC | #2
On Thu, 2019-04-04 at 10:28 -0700, David Miller wrote:
> From: Johannes Berg <johannes@sipsolutions.net>
> Date: Thu,  4 Apr 2019 08:54:02 +0200
> 
> > Here's a version that has passed build testing ;-)
> 
> :-)

Actually it passed more than that - I did test the nl80211 bits etc.,
but I hadn't build-tested everything before so some missing function
renames were caught by the full build testing.

> I really like the approach taken here, and done in such a way that
> new attributes added get strict checking by default.

It's two things really:

 * new commands (aka new instances of nla_parse/nlmsg_parse and friends)
   --> strict checking for everything, including existing attributes
       because we reason that you're writing some new userspace code,
       and even if that might use some existing functionality, which
       might even be wrong, you're going to fix it here

 * new attributes on existing commands (in the policy)
   --> can be set up (with the strict_start_type from patch 4) to be
       strictly checked
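
The second case can be sketched as follows: a per-policy threshold marks
where strict checking starts, so attributes at or above it are strictly
checked even when the parse itself is lenient. This is a userspace model
only. The struct layout, the demo_* names, the use of 0 for "no
threshold", and the lenient/strict length rules are assumptions; only the
strict_start_type name comes from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy entry: just an expected payload length. */
struct demo_policy_entry {
	size_t len; /* expected payload length in bytes */
};

/*
 * Returns true if attribute `type` with payload length `attr_len` is
 * accepted. strict_start_type == 0 means no per-attribute opt-in, so
 * only the parse-wide setting applies.
 */
static bool demo_attr_ok(const struct demo_policy_entry *policy,
			 unsigned int maxtype,
			 unsigned int strict_start_type,
			 bool parse_is_strict,
			 unsigned int type, size_t attr_len)
{
	bool strict;

	assert(type <= maxtype); /* out-of-range types are handled elsewhere */

	/* New commands are strict throughout; otherwise only attributes
	 * at or above the threshold are strictly checked. */
	strict = parse_is_strict ||
		 (strict_start_type != 0 && type >= strict_start_type);

	if (attr_len < policy[type].len)
		return false;                      /* too short: always rejected */
	if (strict && attr_len != policy[type].len)
		return false;                      /* oversized: rejected when strict */
	return true;
}
```

With a threshold of 3, an oversized attribute 1 is still tolerated on a
lenient parse, while the same oversized payload in attribute 3 is
rejected.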

> I'll let David Ahern et al. have time to review this.

Sure.

FWIW, I wasn't really entirely sure I liked doing a cross-tree rename,
but ultimately I felt that we should discourage uses of what I now
called *_deprecated() and *_strict_deprecated() APIs, and having sort of
the "default" names do the thing we believe is right (strict checking)
helps with that - in a sort of 'social engineering' way, people will not
want to type out "_deprecated" all the time ;-)

I do realize that this may be a bit controversial and am certainly open
to other suggestions on this.

Similarly, I engineered the generic netlink stuff in a way that adding
non-strict behaviour needs extra work, so that hopefully new stuff will
not do that extra work.

Also, both of these are then easier to see in reviews, since you can see
"deprecated" in the function names, or "DONT_VALIDATE" in the generic
netlink things.
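
The opt-out polarity described above can be modelled roughly like this:
the validation field holds "don't validate" bits, so an operation that
leaves it zero-initialised gets strict behaviour without doing anything.
The struct, the flag name, and the helper are hypothetical stand-ins for
this sketch, not the generic netlink definitions from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opt-out bit; the extra typing is the point. */
enum demo_op_validate {
	DEMO_DONT_VALIDATE_STRICT = 1u << 0,
};

/* Hypothetical operation descriptor. */
struct demo_op {
	unsigned int cmd;
	unsigned int validate; /* opt-out bits; 0 means fully strict */
};

/* An operation is strict unless it explicitly opted out. */
static bool demo_op_is_strict(const struct demo_op *op)
{
	assert(op != NULL);
	return !(op->validate & DEMO_DONT_VALIDATE_STRICT);
}
```

A new operation written as `{ .cmd = 1 }` is strict by default, whereas a
legacy one has to spell out the opt-out flag, which then stands out in
review.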

johannes

David Ahern April 5, 2019, 2:44 a.m. UTC | #3
On 4/4/19 11:28 AM, David Miller wrote:
> From: Johannes Berg <johannes@sipsolutions.net>
> Date: Thu,  4 Apr 2019 08:54:02 +0200
> 
>> Here's a version that has passed build testing ;-)
> 
> :-)
> 
> I really like the approach taken here, and done in such a way that
> new attributes added get strict checking by default.
> 
> I'll let David Ahern et al. have time to review this.
> 

Hit a compile issue right out of the gate:

$ make O=kbuild/perf -j 24 -s
/home/dsa/kernel-2.git/net/openvswitch/flow_netlink.c: In function
‘validate_and_copy_check_pkt_len’:
/home/dsa/kernel-2.git/net/openvswitch/flow_netlink.c:2887:8: error:
implicit declaration of function ‘nla_parse_deprecated_strict’
[-Werror=implicit-function-declaration]
  err = nla_parse_deprecated_strict(a, OVS_CHECK_PKT_LEN_ATTR_MAX,
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~

You should do an allmodconfig build to check for any others. I disabled
ovs to continue.

Johannes Berg April 5, 2019, 7:09 a.m. UTC | #4
On Thu, 2019-04-04 at 20:44 -0600, David Ahern wrote:
> Hit a compile issue right out of the gate:
> 
> $ make O=kbuild/perf -j 24 -s
> /home/dsa/kernel-2.git/net/openvswitch/flow_netlink.c: In function
> ‘validate_and_copy_check_pkt_len’:
> /home/dsa/kernel-2.git/net/openvswitch/flow_netlink.c:2887:8: error:
> implicit declaration of function ‘nla_parse_deprecated_strict’
> [-Werror=implicit-function-declaration]
>   err = nla_parse_deprecated_strict(a, OVS_CHECK_PKT_LEN_ATTR_MAX,
>         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> You should do an allmodconfig build to check for any others. I disabled
> ovs to continue.

Ugh, yes. The one change I made because I rebased on the latest net-next 
after the build testing ...

Sorry about that, I'll fix it in v2 after more reviews.

johannes

Johannes Berg April 5, 2019, 11:47 a.m. UTC | #5
On Thu, 2019-04-04 at 10:28 -0700, David Miller wrote:

> > Here's a version that has passed build testing ;-)

Umm, I sent out the wrong branch!
(Didn't even realize I had two ... oops)

The generic netlink bits are completely broken here: I was passing a
stack pointer to dump control, which obviously doesn't work.
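
The failure mode described above can be sketched in userspace as follows.
A dump is started in one function, but its callbacks run later, after
that function has returned, so any context they need must outlive the
starting frame. The demo_* types and functions are hypothetical and do
not reproduce the actual generic netlink code or its eventual fix; the
broken variant appears only in a comment and is never executed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-dump context. */
struct demo_dump_info {
	int family_id;
};

/* Hypothetical stand-in for a dump control block. */
struct demo_callback {
	void *data;                             /* must stay valid until done() */
	int (*dump)(struct demo_callback *cb);  /* may run many times, later */
	void (*done)(struct demo_callback *cb); /* runs once, at the end */
};

static int demo_dump(struct demo_callback *cb)
{
	const struct demo_dump_info *info = cb->data;

	assert(info != NULL);
	return info->family_id;
}

static void demo_done(struct demo_callback *cb)
{
	free(cb->data);
	cb->data = NULL;
}

/*
 * BROKEN variant, for comparison only:
 *
 *	struct demo_dump_info info = { .family_id = family_id };
 *	cb->data = &info;   (dangles as soon as demo_dump_start() returns)
 *
 * Working variant: the context lives on the heap and is released in
 * done(), so later dump() calls still see valid memory.
 */
static int demo_dump_start(struct demo_callback *cb, int family_id)
{
	struct demo_dump_info *info = malloc(sizeof(*info));

	if (!info)
		return -1;
	info->family_id = family_id;
	cb->data = info;
	cb->dump = demo_dump;
	cb->done = demo_done;
	return 0;
}
```

In the broken variant, both the later dump() and done() calls would
dereference a pointer into a stack frame that no longer exists.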

I've pushed the right version to the netlink-validation branch of
mac80211-next, as David Ahern requested that I not resend for now.

I've also pushed some very much WIP code to the netlink-policy-export
branch there that exposes the policies to userspace, there at least for
generic netlink now.

johannes

Johannes Berg April 5, 2019, 3:13 p.m. UTC | #6
On Fri, 2019-04-05 at 13:47 +0200, Johannes Berg wrote:
> 
> I've also pushed some very much WIP code to the netlink-policy-export
> branch there that exposes the policies to userspace, there at least for
> generic netlink now.

Seems to more or less work now, userspace gets things like (for
nl80211):

(ID 0x18 is the nl80211 genl family)

	ID: 0x18  policy[0]:attr[1]: type=U32
[...]
	ID: 0x18  policy[0]:attr[87]: type=U32
	ID: 0x18  policy[0]:attr[88]: type=U64
	ID: 0x18  policy[0]:attr[89]: type=U8
	ID: 0x18  policy[0]:attr[90]: type=NESTED
	ID: 0x18  policy[0]:attr[91]: type=BINARY
[...]
	ID: 0x18  policy[0]:attr[270]: type=NESTED policy:1
[...]
	ID: 0x18  policy[0]:attr[273]: type=NESTED policy:2
[...]
	ID: 0x18  policy[1]:attr[1]: type=FLAG
	ID: 0x18  policy[1]:attr[2]: type=BINARY
	ID: 0x18  policy[1]:attr[3]: type=BINARY
[...]
	ID: 0x18  policy[2]:attr[1]: type=REJECT
	ID: 0x18  policy[2]:attr[2]: type=REJECT
	ID: 0x18  policy[2]:attr[3]: type=REJECT
	ID: 0x18  policy[2]:attr[4]: type=REJECT
	ID: 0x18  policy[2]:attr[5]: type=NESTED_ARRAY policy:3
[...]
	ID: 0x18  policy[3]:attr[3]: type=NESTED policy:4

etc.

See nl80211_policy[] in net/wireless/nl80211.c for the original data;
it's unchanged relative to current net-next.


Policy 0 is - by convention - the top-level policy, but once I fix the
recursion issue in validate_nla() it's possible that a nested attribute
refers back to the top-level policy.

There are some bugs, like it generating an almost-empty message when
the type is NLA_UNSPEC rather than eliding it entirely, and I haven't
implemented a bunch of things yet:

                /* TODO advertise range (min/max) */
                /* TODO advertise min/max len */
                /* TODO show reject string if any */

Also, I haven't hooked it up to anything that's not generic netlink, but
the API should be general enough for anyone:

int netlink_policy_dump_start(const struct nla_policy *policy,
                              unsigned int maxtype,
                              unsigned long *state);
bool netlink_policy_dump_loop(unsigned long *state);
int netlink_policy_dump_write(struct sk_buff *skb, unsigned long state);

(*state/state is &cb->args[n]/cb->args[n] for the netlink dump. It will
generate one message per type; that may be overkill, but it lets us
include the potentially long reject string etc. without worrying about
any message size limitations.)
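
One way a resumable dump callback might drive the three prototypes above
is sketched below with userspace mocks. The demo_policy_dump_*()
functions mirror the prototypes but their bodies are invented for this
example: the assumed semantics (state holds a pointer to a cursor, write
advances it, loop frees it once finished) and the demo_* types are
assumptions, not the WIP implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal stand-ins for the kernel types, just enough to compile. */
struct demo_policy { int type; };
struct demo_skb {
	int room; /* messages that still fit into this buffer */
	int msgs; /* messages written so far */
};

/* Assumed cursor; like the kernel, this relies on unsigned long being
 * able to hold a pointer. */
struct demo_cursor {
	const struct demo_policy *policy;
	unsigned int maxtype;
	unsigned int next; /* next attribute type to report */
};

static int demo_policy_dump_start(const struct demo_policy *policy,
				  unsigned int maxtype, unsigned long *state)
{
	struct demo_cursor *c;

	if (!policy)
		return -1;
	c = malloc(sizeof(*c));
	if (!c)
		return -1;
	c->policy = policy;
	c->maxtype = maxtype;
	c->next = 1; /* type 0 is skipped in this sketch */
	*state = (unsigned long)(uintptr_t)c;
	return 0;
}

/* True while entries remain; releases the cursor once finished. */
static bool demo_policy_dump_loop(unsigned long *state)
{
	struct demo_cursor *c = (struct demo_cursor *)(uintptr_t)*state;

	if (c && c->next <= c->maxtype)
		return true;
	free(c);
	*state = 0;
	return false;
}

/* Writes one message for the current entry and advances the cursor. */
static int demo_policy_dump_write(struct demo_skb *skb, unsigned long state)
{
	struct demo_cursor *c = (struct demo_cursor *)(uintptr_t)state;

	if (skb->room == 0)
		return -1; /* buffer full: resume on the next invocation */
	skb->room--;
	skb->msgs++;
	c->next++;
	return 0;
}

/* One invocation of a dump callback; args[0] plays the role of cb->args[n]. */
static int demo_dumpit(struct demo_skb *skb, unsigned long *args)
{
	while (demo_policy_dump_loop(&args[0])) {
		if (demo_policy_dump_write(skb, args[0]) < 0)
			break;
	}
	return skb->msgs;
}
```

Because all progress lives behind the single unsigned long, the callback
can stop when a buffer fills up and pick up at the same entry when it is
invoked again.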

It feels like it's working, and so I'd like to propose formal patches
soon.

Pablo, what do you think? It seems to me that this type of thing would
address most if not all of what you did with the object/bus description
stuff without requiring any new code, since the info is taken straight
from the policy.

johannes

Leon Romanovsky April 8, 2019, 9 a.m. UTC | #7
On Thu, Apr 04, 2019 at 08:54:02AM +0200, Johannes Berg wrote:
> Here's a version that has passed build testing ;-)
>
> As mentioned in the RFC postings, this was inspired by talks
> between David, Pablo and myself. Pablo is somewhat firmly on
> the side of less strict validation, while David and myself
> are in the very strict validation camp. If I understand him
> correctly, Pablo doesn't mind the strict validation if it is
> accompanied by exposing the policy to userspace, but that
> isn't something we can do today. I'll work on it later.
>
> What this series does is basically first replace nla_parse()
> and all its friends by nla_parse_deprecated(), while making
> all of those just inlines around __nla_parse() and friends
> with configurable strict checking bits. Three versions exist
> after this patchset:
>  * liberal           - no bits set
>  * deprecated_strict - reject attrs > maxtype
>                        reject trailing junk
>  * new default       - reject trailing junk
>                        reject attrs > maxtype
>                        reject policy entries that are NLA_UNSPEC
>                        require a policy
>                        strictly validate attributes
>
> The NLA_UNSPEC one can be opted in even in existing code with
> existing userspace in the future, as policies are updated.
>
> In addition, infrastructure is added to opt in to the strict
> attribute validation even for new attributes added to existing
> policies, regardless of the nla_parse() strictness setting
> described above, as new attributes should not be a compatibility
> issue.
>
> Finally, much of this is plumbed through generic netlink etc.,
> and I've included a patch to tag nl80211 with the future attribute
> strictness for reference.
>
> johannes


Hi Johannes,

This series crashes on mlx4 devices with the following kernel panic.

[   92.937629] BUG: unable to handle kernel paging request at 0000000000001023
[   92.940094] #PF error: [normal kernel read fault]
[   92.941731] PGD 80000002291da067 P4D 80000002291da067 PUD 20f295067 PMD 0
[   92.943983] Oops: 0000 [#1] SMP PTI
[   92.945248] CPU: 1 PID: 3976 Comm: devlink Not tainted 5.1.0-rc2-J2742-G9070daeb7d6d #1
[   92.947951] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[   92.950921] RIP: 0010:genl_lock_dumpit+0x10/0xb0
[   92.952502] Code: c7 c7 a0 e6 30 82 e9 ef 96 a7 ff 0f 1f 44 00 00 66
2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 53 48 8b 46 20 48 8b
28 <0f> b6 55 23 f6 c2 02 75 4d 4c 8b 48 08 83 e2 04 4c 8b 5e 08 80 fa
[   92.958146] RSP: 0018:ffffc90002df7c30 EFLAGS: 00010202
[   92.959817] RAX: ffffc90002df7be8 RBX: ffff888231b0e800 RCX: 0000000000000ec0
[   92.962079] RDX: 00000000000000a8 RSI: ffff888231b0eb30 RDI: ffff88823195b400
[   92.964297] RBP: 0000000000001000 R08: 0000000000001ec0 R09: ffffffff81686c01
[   92.966475] R10: ffffea0008c656c0 R11: 0000000000000040 R12: 0000000000001000
[   92.968575] R13: ffff888231b0eb30 R14: 0000000000000000 R15: ffff888230f63700
[   92.970688] FS:  00007fa7e963bb80(0000) GS:ffff888237a80000(0000) knlGS:0000000000000000
[   92.973158] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   92.974895] CR2: 0000000000001023 CR3: 000000020f8fa001 CR4: 00000000003606a0
[   92.976994] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   92.979033] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   92.981030] Call Trace:
[   92.981870]  netlink_dump+0x166/0x390
[   92.982995]  netlink_recvmsg+0x2ef/0x3e0
[   92.984184]  ? copy_msghdr_from_user+0xd5/0x150
[   92.985540]  ___sys_recvmsg+0xf5/0x250
[   92.986685]  ? netlink_sendmsg+0x120/0x3a0
[   92.987905]  ? __sys_sendto+0x10e/0x140
[   92.989077]  ? __sys_recvmsg+0x5b/0xa0
[   92.990205]  __sys_recvmsg+0x5b/0xa0
[   92.991253]  do_syscall_64+0x48/0x100
[   92.992327]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   92.993743] RIP: 0033:0x7fa7e8d48437
[   92.994783] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00
00 00 8b 05 1a f4 2b 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2f 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48
[   92.999640] RSP: 002b:00007ffcee2ae168 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
[   93.001745] RAX: ffffffffffffffda RBX: 0000000000707320 RCX: 00007fa7e8d48437
[   93.003556] RDX: 0000000000000000 RSI: 00007ffcee2ae190 RDI: 0000000000000012
[   93.005383] RBP: 0000000000707260 R08: 00007fa7e900b0e0 R09: 000000000000000c
[   93.007206] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004035e0
[   93.009023] R13: 00007ffcee2ae348 R14: 0000000000000000 R15: 0000000000000000
[   93.010847] Modules linked in: mlx4_en mlx4_ib mlx4_core geneve
ip6_udp_tunnel udp_tunnel bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre
ip_tunnel rdma_ucm ib_uverbs ib_ipoib ib_umad ib_srp scsi_transport_srp
rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm
ib_core [last unloaded: mlx4_core]
[   93.016658] CR2: 0000000000001023
[   93.017489] ---[ end trace 295441d824c2b8ba ]---
[   93.018440] RIP: 0010:genl_lock_dumpit+0x10/0xb0
[   93.019577] Code: c7 c7 a0 e6 30 82 e9 ef 96 a7 ff 0f 1f 44 00 00 66
2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 53 48 8b 46 20 48 8b
28 <0f> b6 55 23 f6 c2 02 75 4d 4c 8b 48 08 83 e2 04 4c 8b 5e 08 80 fa
[   93.023640] RSP: 0018:ffffc90002df7c30 EFLAGS: 00010202
[   93.024836] RAX: ffffc90002df7be8 RBX: ffff888231b0e800 RCX: 0000000000000ec0
[   93.026321] RDX: 00000000000000a8 RSI: ffff888231b0eb30 RDI: ffff88823195b400
[   93.027867] RBP: 0000000000001000 R08: 0000000000001ec0 R09: ffffffff81686c01
[   93.029333] R10: ffffea0008c656c0 R11: 0000000000000040 R12: 0000000000001000
[   93.030744] R13: ffff888231b0eb30 R14: 0000000000000000 R15: ffff888230f63700
[   93.032187] FS:  00007fa7e963bb80(0000) GS:ffff888237a80000(0000) knlGS:0000000000000000
[   93.033881] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   93.035071] CR2: 0000000000001023 CR3: 000000020f8fa001 CR4: 00000000003606a0
[   93.036502] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   93.037898] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   93.052853] BUG: unable to handle kernel paging request at ffffc90002df7be8
[   93.054466] #PF error: [normal kernel read fault]
[   93.055615] PGD 236931067 P4D 236931067 PUD 236934067 PMD 226489067 PTE 0
[   93.057203] Oops: 0000 [#2] SMP PTI
[   93.058069] CPU: 1 PID: 43 Comm: kworker/1:1 Tainted: G      D 5.1.0-rc2-J2742-G9070daeb7d6d #1
[   93.060241] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[   93.062335] Workqueue: events netlink_sock_destruct_work
[   93.063579] RIP: 0010:genl_lock_done+0xf/0x60
[   93.064641] Code: 48 c7 c7 e0 e6 30 82 e9 8f 6f 19 00 0f 1f 44 00 00
66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 53 48 83 ec 08 48 8b 47
20 <48> 8b 28 31 c0 48 83 7d 18 00 74 2f 48 89 fb 48 c7 c7 e0 e6 30 82
[   93.068791] RSP: 0018:ffffc90000173e50 EFLAGS: 00010286
[   93.070042] RAX: ffffc90002df7be8 RBX: ffff888231b0e800 RCX: 0000000000000000
[   93.071695] RDX: 0000000000000000 RSI: ffff888231b0e94c RDI: ffff888231b0eb30
[   93.073296] RBP: ffff888231b0e800 R08: 000073746e657665 R09: 8080808080808080
[   93.074964] R10: ffffc9000006bdf0 R11: fefefefefefefeff R12: ffff888237aa4200
[   93.076566] R13: 0000000000000000 R14: ffff888237aa0380 R15: 0000000000000000
[   93.078209] FS:  0000000000000000(0000) GS:ffff888237a80000(0000) knlGS:0000000000000000
[   93.080107] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   93.081403] CR2: ffffc90002df7be8 CR3: 0000000229700004 CR4: 00000000003606a0
[   93.083006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   93.084590] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   93.086277] Call Trace:
[   93.086924]  netlink_sock_destruct+0x2a/0xa0
[   93.087949]  __sk_destruct+0x24/0x180
[   93.088817]  process_one_work+0x17d/0x3b0
[   93.089835]  worker_thread+0x30/0x370
[   93.090670]  ? process_one_work+0x3b0/0x3b0
[   93.091624]  kthread+0x113/0x130
[   93.092382]  ? kthread_park+0x90/0x90
[   93.093260]  ret_from_fork+0x35/0x40
[   93.094067] Modules linked in: mlx4_en mlx4_ib mlx4_core geneve
ip6_udp_tunnel udp_tunnel bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre
ip_tunnel rdma_ucm ib_uverbs ib_ipoib ib_umad ib_srp scsi_transport_srp
rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm
ib_core [last unloaded: mlx4_core]
[   93.099824] CR2: ffffc90002df7be8
[   93.100718] ---[ end trace 295441d824c2b8bb ]---
[   93.101829] RIP: 0010:genl_lock_dumpit+0x10/0xb0
[   93.102919] Code: c7 c7 a0 e6 30 82 e9 ef 96 a7 ff 0f 1f 44 00 00 66
2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 53 48 8b 46 20 48 8b
28 <0f> b6 55 23 f6 c2 02 75 4d 4c 8b 48 08 83 e2 04 4c 8b 5e 08 80 fa
[   93.107107] RSP: 0018:ffffc90002df7c30 EFLAGS: 00010202
[   93.108382] RAX: ffffc90002df7be8 RBX: ffff888231b0e800 RCX: 0000000000000ec0
[   93.110007] RDX: 00000000000000a8 RSI: ffff888231b0eb30 RDI: ffff88823195b400
[   93.111941] RBP: 0000000000001000 R08: 0000000000001ec0 R09: ffffffff81686c01
[   93.113574] R10: ffffea0008c656c0 R11: 0000000000000040 R12: 0000000000001000
[   93.115220] R13: ffff888231b0eb30 R14: 0000000000000000 R15: ffff888230f63700
[   93.116821] FS:  0000000000000000(0000) GS:ffff888237a80000(0000) knlGS:0000000000000000
[   93.118937] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   93.120592] CR2: ffffc90002df7be8 CR3: 0000000229700004 CR4: 00000000003606a0
[   93.122196] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   93.123791] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   93.528971] BUG: unable to handle kernel paging request at 0000000000001023
[   93.532427] #PF error: [normal kernel read fault]
[   93.534683] PGD 8000000228100067 P4D 8000000228100067 PUD 20f87e067 PMD 0
[   93.537776] Oops: 0000 [#3] SMP PTI
[   93.539379] CPU: 2 PID: 4005 Comm: devlink Tainted: G      D 5.1.0-rc2-J2742-G9070daeb7d6d #1
[   93.543345] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[   93.547167] RIP: 0010:genl_lock_dumpit+0x10/0xb0
[   93.549214] Code: c7 c7 a0 e6 30 82 e9 ef 96 a7 ff 0f 1f 44 00 00 66
2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 53 48 8b 46 20 48 8b
28 <0f> b6 55 23 f6 c2 02 75 4d 4c 8b 48 08 83 e2 04 4c 8b 5e 08 80 fa
[   93.556214] RSP: 0018:ffffc90002e97c30 EFLAGS: 00010202
[   93.558301] RAX: ffffc90002e97be8 RBX: ffff888232bf8800 RCX: 0000000000000ec0
[   93.561059] RDX: 00000000000000a8 RSI: ffff888232bf8b30 RDI: ffff888228cf9700
[   93.563644] RBP: 0000000000001000 R08: 0000000000001ec0 R09: ffffffff81686c01
[   93.566212] R10: ffffea0008a33e40 R11: 0000000000000040 R12: 0000000000001000
[   93.568773] R13: ffff888232bf8b30 R14: 0000000000000000 R15: ffff888225084000
[   93.571347] FS:  00007f1754062b80(0000) GS:ffff888237b00000(0000) knlGS:0000000000000000
[   93.574320] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   93.591673] CR2: 0000000000001023 CR3: 000000020f30e004 CR4: 00000000003606a0
[   93.593943] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   93.596196] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   93.598425] Call Trace:
[   93.599303]  netlink_dump+0x166/0x390
[   93.600509]  netlink_recvmsg+0x2ef/0x3e0
[   93.601792]  ? copy_msghdr_from_user+0xd5/0x150
[   93.603242]  ___sys_recvmsg+0xf5/0x250
[   93.604477]  ? netlink_sendmsg+0x120/0x3a0
[   93.605816]  ? __sys_sendto+0x10e/0x140
[   93.607076]  ? __sys_recvmsg+0x5b/0xa0
[   93.608308]  __sys_recvmsg+0x5b/0xa0
[   93.609503]  do_syscall_64+0x48/0x100
[   93.610649]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   93.612160] RIP: 0033:0x7f175376f437
[   93.613295] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00
00 00 8b 05 1a f4 2b 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2f 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48
[   93.618540] RSP: 002b:00007ffcb7c72218 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
[   93.620790] RAX: ffffffffffffffda RBX: 000000000186d320 RCX: 00007f175376f437
[   93.622768] RDX: 0000000000000000 RSI: 00007ffcb7c72240 RDI: 000000000000000c
[   93.624688] RBP: 000000000186d260 R08: 00007f1753a320e0 R09: 000000000000000c
[   93.626610] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004035e0
[   93.628533] R13: 00007ffcb7c723f8 R14: 0000000000000000 R15: 0000000000000000
[   93.630457] Modules linked in: mlx4_en mlx4_ib mlx4_core geneve
ip6_udp_tunnel udp_tunnel bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre
ip_tunnel rdma_ucm ib_uverbs ib_ipoib ib_umad ib_srp scsi_transport_srp
rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm
ib_core [last unloaded: mlx4_core]
[   93.637391] CR2: 0000000000001023
[   93.638348] ---[ end trace 295441d824c2b8bc ]---
[   93.639610] RIP: 0010:genl_lock_dumpit+0x10/0xb0
[   93.640876] Code: c7 c7 a0 e6 30 82 e9 ef 96 a7 ff 0f 1f 44 00 00 66
2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 53 48 8b 46 20 48 8b
28 <0f> b6 55 23 f6 c2 02 75 4d 4c 8b 48 08 83 e2 04 4c 8b 5e 08 80 fa
[   93.645621] RSP: 0018:ffffc90002df7c30 EFLAGS: 00010202
[   93.646966] RAX: ffffc90002df7be8 RBX: ffff888231b0e800 RCX: 0000000000000ec0
[   93.648733] RDX: 00000000000000a8 RSI: ffff888231b0eb30 RDI: ffff88823195b400
[   93.650510] RBP: 0000000000001000 R08: 0000000000001ec0 R09: ffffffff81686c01
[   93.652283] R10: ffffea0008c656c0 R11: 0000000000000040 R12: 0000000000001000
[   93.654061] R13: ffff888231b0eb30 R14: 0000000000000000 R15: ffff888230f63700
[   93.655803] FS:  00007f1754062b80(0000) GS:ffff888237b00000(0000) knlGS:0000000000000000
[   93.657761] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   93.658951] CR2: 0000000000001023 CR3: 000000020f30e004 CR4: 00000000003606a0
[   93.660431] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   93.661971] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   93.684915] BUG: unable to handle kernel paging request at ffffc90002e97be8
[   93.686561] #PF error: [normal kernel read fault]
[   93.687650] PGD 236931067 P4D 236931067 PUD 236934067 PMD 228084067 PTE 0
[   93.689182] Oops: 0000 [#4] SMP PTI
[   93.690035] CPU: 2 PID: 38 Comm: kworker/2:1 Tainted: G      D 5.1.0-rc2-J2742-G9070daeb7d6d #1
[   93.692162] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[   93.694147] Workqueue: events netlink_sock_destruct_work
[   93.695381] RIP: 0010:genl_lock_done+0xf/0x60
[   93.696387] Code: 48 c7 c7 e0 e6 30 82 e9 8f 6f 19 00 0f 1f 44 00 00
66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 53 48 83 ec 08 48 8b 47
20 <48> 8b 28 31 c0 48 83 7d 18 00 74 2f 48 89 fb 48 c7 c7 e0 e6 30 82
[   93.700436] RSP: 0018:ffffc9000014be50 EFLAGS: 00010286
[   93.701605] RAX: ffffc90002e97be8 RBX: ffff888232bf8800 RCX: 0000000000000000
[   93.703153] RDX: 0000000000000000 RSI: ffff888232bf894c RDI: ffff888232bf8b30
[   93.704712] RBP: ffff888232bf8800 R08: 000073746e657665 R09: 8080808080808080
[   93.706351] R10: ffffc900000c3df0 R11: fefefefefefefeff R12: ffff888237b24200
[   93.707943] R13: 0000000000000000 R14: ffff888237b20380 R15: 0000000000000000
[   93.709567] FS:  0000000000000000(0000) GS:ffff888237b00000(0000) knlGS:0000000000000000
[   93.711426] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   93.712712] CR2: ffffc90002e97be8 CR3: 000000021ce62006 CR4: 00000000003606a0
[   93.714299] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   93.715888] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   93.717426] Call Trace:
[   93.718093]  netlink_sock_destruct+0x2a/0xa0
[   93.719157]  __sk_destruct+0x24/0x180
[   93.720027]  process_one_work+0x17d/0x3b0
[   93.721033]  worker_thread+0x30/0x370
[   93.721946]  ? process_one_work+0x3b0/0x3b0
[   93.722926]  kthread+0x113/0x130
[   93.723753]  ? kthread_park+0x90/0x90
[   93.724606]  ret_from_fork+0x35/0x40
[   93.725494] Modules linked in: mlx4_en mlx4_ib mlx4_core geneve
ip6_udp_tunnel udp_tunnel bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre
ip_tunnel rdma_ucm ib_uverbs ib_ipoib ib_umad ib_srp scsi_transport_srp
rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm
ib_core [last unloaded: mlx4_core]
[   93.731221] CR2: ffffc90002e97be8
[   93.732069] ---[ end trace 295441d824c2b8bd ]---
[   93.733128] RIP: 0010:genl_lock_dumpit+0x10/0xb0
[   93.734186] Code: c7 c7 a0 e6 30 82 e9 ef 96 a7 ff 0f 1f 44 00 00 66
2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 53 48 8b 46 20 48 8b
28 <0f> b6 55 23 f6 c2 02 75 4d 4c 8b 48 08 83 e2 04 4c 8b 5e 08 80 fa
[   93.738319] RSP: 0018:ffffc90002df7c30 EFLAGS: 00010202
[   93.739515] RAX: ffffc90002df7be8 RBX: ffff888231b0e800 RCX: 0000000000000ec0
[   93.741111] RDX: 00000000000000a8 RSI: ffff888231b0eb30 RDI: ffff88823195b400
[   93.742665] RBP: 0000000000001000 R08: 0000000000001ec0 R09: ffffffff81686c01
[   93.744233] R10: ffffea0008c656c0 R11: 0000000000000040 R12: 0000000000001000
[   93.745866] R13: ffff888231b0eb30 R14: 0000000000000000 R15: ffff888230f63700
[   93.747438] FS:  0000000000000000(0000) GS:ffff888237b00000(0000) knlGS:0000000000000000
[   93.749209] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   93.750540] CR2: ffffc90002e97be8 CR3: 000000021ce62006 C00000000000 DR1: 0000000000000000




Johannes Berg April 8, 2019, 9:01 a.m. UTC | #8
> This series crashes on mlx4 devices with the following kernel panic.

Yeah, I know. Like I said elsewhere on the thread, I accidentally sent
out the wrong branch (not realizing I had made two). :-(

This should work:
https://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git/log/?h=netlink-validation

johannes