[net,3/4] bonding: fix xfrm real_dev null pointer dereference

Message ID: 20240816114813.326645-4-razor@blackwall.org
State: Accepted
Commit: f8cde9805981c50d0c029063dc7d82821806fc44
Delegated to: Netdev Maintainers
Series: bonding: fix xfrm offload bugs

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 16 this patch: 16
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           warning  1 maintainers not CCed: bpf@vger.kernel.org
netdev/build_clang              success  Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 16 this patch: 16
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 7 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 13 this patch: 13
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-08-20--09-00 (tests: 712)

Commit Message

Nikolay Aleksandrov Aug. 16, 2024, 11:48 a.m. UTC
We shouldn't set real_dev to NULL because packets can be in transit and
xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
real_dev is set.

 Example trace:
 kernel: BUG: unable to handle page fault for address: 0000000000001030
 kernel: bond0: (slave eni0np1): making interface the new active one
 kernel: #PF: supervisor write access in kernel mode
 kernel: #PF: error_code(0x0002) - not-present page
 kernel: PGD 0 P4D 0
 kernel: Oops: 0002 [#1] PREEMPT SMP
 kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
 kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
 kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
 kernel: Code: e0 0f 0b 48 83 7f 38 00 74 de 0f 0b 48 8b 47 08 48 8b 37 48 8b 78 40 e9 b2 e5 9a d7 66 90 0f 1f 44 00 00 48 8b 86 80 02 00 00 <83> 80 30 10 00 00 01 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f
 kernel: bond0: (slave eni0np1): making interface the new active one
 kernel: RSP: 0018:ffffabde81553b98 EFLAGS: 00010246
 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
 kernel:
 kernel: RAX: 0000000000000000 RBX: ffff9eb404e74900 RCX: ffff9eb403d97c60
 kernel: RDX: ffffffffc090de10 RSI: ffff9eb404e74900 RDI: ffff9eb3c5de9e00
 kernel: RBP: ffff9eb3c0a42000 R08: 0000000000000010 R09: 0000000000000014
 kernel: R10: 7974203030303030 R11: 3030303030303030 R12: 0000000000000000
 kernel: R13: ffff9eb3c5de9e00 R14: ffffabde81553cc8 R15: ffff9eb404c53000
 kernel: FS:  00007f2a77a3ad00(0000) GS:ffff9eb43bd00000(0000) knlGS:0000000000000000
 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: CR2: 0000000000001030 CR3: 00000001122ab000 CR4: 0000000000350ef0
 kernel: bond0: (slave eni0np1): making interface the new active one
 kernel: Call Trace:
 kernel:  <TASK>
 kernel:  ? __die+0x1f/0x60
 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
 kernel:  ? page_fault_oops+0x142/0x4c0
 kernel:  ? do_user_addr_fault+0x65/0x670
 kernel:  ? kvm_read_and_reset_apf_flags+0x3b/0x50
 kernel: bond0: (slave eni0np1): making interface the new active one
 kernel:  ? exc_page_fault+0x7b/0x180
 kernel:  ? asm_exc_page_fault+0x22/0x30
 kernel:  ? nsim_bpf_uninit+0x50/0x50 [netdevsim]
 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
 kernel:  ? nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
 kernel: bond0: (slave eni0np1): making interface the new active one
 kernel:  bond_ipsec_offload_ok+0x7b/0x90 [bonding]
 kernel:  xfrm_output+0x61/0x3b0
 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
 kernel:  ip_push_pending_frames+0x56/0x80

Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
---
 drivers/net/bonding/bond_main.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Hangbin Liu Aug. 19, 2024, 2:54 a.m. UTC | #1
On Fri, Aug 16, 2024 at 02:48:12PM +0300, Nikolay Aleksandrov wrote:
> We shouldn't set real_dev to NULL because packets can be in transit and
> xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
> real_dev is set.
> 
>  Example trace:
>  kernel: BUG: unable to handle page fault for address: 0000000000001030
>  kernel: bond0: (slave eni0np1): making interface the new active one
>  kernel: #PF: supervisor write access in kernel mode
>  kernel: #PF: error_code(0x0002) - not-present page
>  kernel: PGD 0 P4D 0
>  kernel: Oops: 0002 [#1] PREEMPT SMP
>  kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
>  kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
>  kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA

I see that the errors occur during bond_ipsec_add_sa_all, which also
sets ipsec->xs->xso.real_dev = NULL. Should we fix it there?

Thanks
Hangbin
Nikolay Aleksandrov Aug. 19, 2024, 7:34 a.m. UTC | #2
On 19/08/2024 05:54, Hangbin Liu wrote:
> On Fri, Aug 16, 2024 at 02:48:12PM +0300, Nikolay Aleksandrov wrote:
>> We shouldn't set real_dev to NULL because packets can be in transit and
>> xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
>> real_dev is set.
>>
>>  Example trace:
>>  kernel: BUG: unable to handle page fault for address: 0000000000001030
>>  kernel: bond0: (slave eni0np1): making interface the new active one
>>  kernel: #PF: supervisor write access in kernel mode
>>  kernel: #PF: error_code(0x0002) - not-present page
>>  kernel: PGD 0 P4D 0
>>  kernel: Oops: 0002 [#1] PREEMPT SMP
>>  kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
>>  kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
>>  kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
>>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
> 
> I see that the errors occur during bond_ipsec_add_sa_all, which also
> sets ipsec->xs->xso.real_dev = NULL. Should we fix it there?
> 
> Thanks
> Hangbin

Correct, I saw it too, but I left it in place on purpose. I know it can lead to a
similar error, but the fix is more complicated. I don't believe it's correct to
keep real_dev set if the SA add failed, so we need to think about a different way
to sync it. To be fair, in real life it would be harder to hit because
the device must be in a state where the SA add fails even though it supports
xfrm offload. The problem is that real_dev must be set before attempting the SA
add in the first place.

Nikolay Aleksandrov Aug. 19, 2024, 8:25 a.m. UTC | #3
On 19/08/2024 10:34, Nikolay Aleksandrov wrote:
> On 19/08/2024 05:54, Hangbin Liu wrote:
>> On Fri, Aug 16, 2024 at 02:48:12PM +0300, Nikolay Aleksandrov wrote:
>>> We shouldn't set real_dev to NULL because packets can be in transit and
>>> xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
>>> real_dev is set.
>>>
>>>  Example trace:
>>>  kernel: BUG: unable to handle page fault for address: 0000000000001030
>>>  kernel: bond0: (slave eni0np1): making interface the new active one
>>>  kernel: #PF: supervisor write access in kernel mode
>>>  kernel: #PF: error_code(0x0002) - not-present page
>>>  kernel: PGD 0 P4D 0
>>>  kernel: Oops: 0002 [#1] PREEMPT SMP
>>>  kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
>>>  kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
>>>  kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
>>>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>>
>> I see that the errors occur during bond_ipsec_add_sa_all, which also
>> sets ipsec->xs->xso.real_dev = NULL. Should we fix it there?
>>
>> Thanks
>> Hangbin
> 
> Correct, I saw it too, but I left it in place on purpose. I know it can lead to a
> similar error, but the fix is more complicated. I don't believe it's correct to
> keep real_dev set if the SA add failed, so we need to think about a different way
> to sync it. To be fair, in real life it would be harder to hit because
> the device must be in a state where the SA add fails even though it supports
> xfrm offload. The problem is that real_dev must be set before attempting the SA
> add in the first place.
> 

Just FYI, I do have an idea: an additional bit that is set on successful ops,
combined with a call_rcu to wait for a grace period on error. I'll test it
this week and send a patch if it works.

Hangbin Liu Aug. 19, 2024, 8:34 a.m. UTC | #4
On Mon, Aug 19, 2024 at 10:34:16AM +0300, Nikolay Aleksandrov wrote:
> On 19/08/2024 05:54, Hangbin Liu wrote:
> > On Fri, Aug 16, 2024 at 02:48:12PM +0300, Nikolay Aleksandrov wrote:
> >> We shouldn't set real_dev to NULL because packets can be in transit and
> >> xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
> >> real_dev is set.
> >>
> >>  Example trace:
> >>  kernel: BUG: unable to handle page fault for address: 0000000000001030
> >>  kernel: bond0: (slave eni0np1): making interface the new active one
> >>  kernel: #PF: supervisor write access in kernel mode
> >>  kernel: #PF: error_code(0x0002) - not-present page
> >>  kernel: PGD 0 P4D 0
> >>  kernel: Oops: 0002 [#1] PREEMPT SMP
> >>  kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
> >>  kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
> >>  kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
> >>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
> > 
> > I see that the errors occur during bond_ipsec_add_sa_all, which also
> > sets ipsec->xs->xso.real_dev = NULL. Should we fix it there?
> > 
> > Thanks
> > Hangbin
> 
> Correct, I saw it too, but I left it in place on purpose. I know it can lead to a
> similar error, but the fix is more complicated. I don't believe it's correct to
> keep real_dev set if the SA add failed, so we need to think about a different way
> to sync it. To be fair, in real life it would be harder to hit because
> the device must be in a state where the SA add fails even though it supports
> xfrm offload. The problem is that real_dev must be set before attempting the SA
> add in the first place.

Got it, so this time we only fix the delete path.

Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 65ddb71eebcd..f74bacf071fc 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -582,7 +582,6 @@  static void bond_ipsec_del_sa_all(struct bonding *bond)
 		} else {
 			slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
 		}
-		ipsec->xs->xso.real_dev = NULL;
 	}
 	spin_unlock_bh(&bond->ipsec_lock);
 	rcu_read_unlock();