[net] net/tls: Remove the context from the list in tls_device_down

Message ID 20220721091127.3209661-1-maximmi@nvidia.com (mailing list archive)
State Accepted
Commit f6336724a4d4220c89a4ec38bca84b03b178b1a3
Delegated to: Netdev Maintainers
Series [net] net/tls: Remove the context from the list in tls_device_down

Checks

Context                         Check    Description
netdev/tree_selection           success  Clearly marked for net
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/subject_prefix           success  Link
netdev/cover_letter             success  Single patches do not need cover letters
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 22 this patch: 22
netdev/cc_maintainers           success  CCed 9 of 9 maintainers
netdev/build_clang              success  Errors and warnings before: 9 this patch: 9
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 19 this patch: 19
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 14 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Maxim Mikityanskiy July 21, 2022, 9:11 a.m. UTC

tls_device_down takes a reference on all contexts it's going to move to
the degraded state (software fallback). If sk_destruct runs afterwards,
it can reduce the reference counter back to 1 and return early without
destroying the context. Then tls_device_down will release the reference
it took and call tls_device_free_ctx. However, the context will still
stay in tls_device_down_list forever. The list will contain an item,
memory for which is released, making a memory corruption possible.

Fix the above bug by properly removing the context from all lists before
any call to tls_device_free_ctx.

Fixes: 3740651bf7e2 ("tls: Fix context leak on tls_device_down")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 net/tls/tls_device.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
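
The sequence described in the commit message can be replayed in a small userspace model. This is only a sketch under stated assumptions: `struct model_ctx`, `replay_race()` and the list helpers are hypothetical stand-ins, not the kernel's `struct tls_context`, `refcount_t` or `list_head` API. The model is single-threaded, so it illustrates the ordering of events rather than the locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of this is the kernel's code. */
struct list_node {
	struct list_node *prev, *next;
};

struct model_ctx {
	struct list_node list;
	int refcount;
	bool freed;	/* stands in for the context memory being released */
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void list_add_node(struct list_node *node, struct list_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

static void list_del_node(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = node->next = NULL;
}

static bool list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

/* Drops one reference; returns true when it was the last one. */
static bool ref_put(struct model_ctx *ctx)
{
	return --ctx->refcount == 0;
}

/*
 * Replays the sequence from the commit message on a single context.
 * Returns true if the down list still points at the freed context.
 */
static bool replay_race(bool with_fix)
{
	struct list_node down_list;
	struct model_ctx ctx = { .refcount = 1, .freed = false };

	list_init(&down_list);

	/* Device-down path: take a ref and park the context on the down list. */
	ctx.refcount++;
	list_add_node(&ctx.list, &down_list);

	/* Socket destruction: not the last ref, so it returns early. */
	if (ref_put(&ctx))
		return false;	/* not reached in this replay */

	/* Device-down path: release its ref, which is now the last one. */
	if (ref_put(&ctx)) {
		if (with_fix)
			list_del_node(&ctx.list);	/* unlink before freeing */
		ctx.freed = true;
	}

	return ctx.freed && !list_is_empty(&down_list);
}
```

With `with_fix` set to false, `replay_race()` reports that the list still references the freed context. With `with_fix` set to true, the entry is unlinked before the free, mirroring the `list_del()` call the patch adds.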

Comments

Jakub Kicinski July 22, 2022, 10:04 p.m. UTC | #1
On Thu, 21 Jul 2022 12:11:27 +0300 Maxim Mikityanskiy wrote:
> tls_device_down takes a reference on all contexts it's going to move to
> the degraded state (software fallback). If sk_destruct runs afterwards,
> it can reduce the reference counter back to 1 and return early without
> destroying the context. Then tls_device_down will release the reference
> it took and call tls_device_free_ctx. However, the context will still
> stay in tls_device_down_list forever. The list will contain an item,
> memory for which is released, making a memory corruption possible.
> 
> Fix the above bug by properly removing the context from all lists before
> any call to tls_device_free_ctx.

SGTM. The tls_device_down_list has no use, tho, is the plan to remove
it later as a cleanup or your upcoming patches make use of it?

We can delete it now if you don't have a preference, either way the fix
is small.

patchwork-bot+netdevbpf@kernel.org July 24, 2022, 8:50 p.m. UTC | #2
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Thu, 21 Jul 2022 12:11:27 +0300 you wrote:
> tls_device_down takes a reference on all contexts it's going to move to
> the degraded state (software fallback). If sk_destruct runs afterwards,
> it can reduce the reference counter back to 1 and return early without
> destroying the context. Then tls_device_down will release the reference
> it took and call tls_device_free_ctx. However, the context will still
> stay in tls_device_down_list forever. The list will contain an item,
> memory for which is released, making a memory corruption possible.
> 
> [...]

Here is the summary with links:
  - [net] net/tls: Remove the context from the list in tls_device_down
    https://git.kernel.org/netdev/net/c/f6336724a4d4

You are awesome, thank you!

Maxim Mikityanskiy July 25, 2022, 2:35 p.m. UTC | #3
On Fri, 2022-07-22 at 15:04 -0700, Jakub Kicinski wrote:
> On Thu, 21 Jul 2022 12:11:27 +0300 Maxim Mikityanskiy wrote:
> > tls_device_down takes a reference on all contexts it's going to move to
> > the degraded state (software fallback). If sk_destruct runs afterwards,
> > it can reduce the reference counter back to 1 and return early without
> > destroying the context. Then tls_device_down will release the reference
> > it took and call tls_device_free_ctx. However, the context will still
> > stay in tls_device_down_list forever. The list will contain an item,
> > memory for which is released, making a memory corruption possible.
> > 
> > Fix the above bug by properly removing the context from all lists before
> > any call to tls_device_free_ctx.
> 
> SGTM. The tls_device_down_list has no use, tho, is the plan to remove
> it later as a cleanup or your upcoming patches make use of it?

I don't plan to remove it. Right, we never iterate over it, so instead
of moving the context to tls_device_down_list, we can remove it from
list, as long as we check to not remove it second time on destruction.

However, this way we don't gain anything, but lose a debugging
opportunity: for example, when list debugging is enabled, double
list_del will be detected.
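
To illustrate the debugging point above: the kernel's `list_del()` leaves poison values in the removed entry, and list debugging checks for them on the next deletion. The following is a minimal userspace sketch of that idea, not the kernel implementation. `checked_list_del()`, the helper names and the two poison marker objects are hypothetical, and the real code reports the corruption rather than returning a boolean.

```c
#include <assert.h>
#include <stdbool.h>

struct list_node {
	struct list_node *prev, *next;
};

/* Stand-in poison markers (assumption: the kernel uses fixed poison addresses). */
static struct list_node poison_next, poison_prev;

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void list_add_node(struct list_node *node, struct list_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/*
 * Unlinks the node and poisons its pointers.
 * Returns false, leaving the list untouched, if the node was already deleted.
 */
static bool checked_list_del(struct list_node *node)
{
	if (node->next == &poison_next || node->prev == &poison_prev)
		return false;	/* double delete detected */

	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = &poison_next;
	node->prev = &poison_prev;
	return true;
}
```

In this model the first `checked_list_del()` on an entry succeeds and a second call on the same entry is rejected, which is the kind of mistake that would go unnoticed if the entry were never on a list at destruction time.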

So, it doesn't make sense to me to remove this list, but if you still
want to do it, Tariq has a patch for this.

> 
> We can delete it now if you don't have a preference, either way the fix
> is small.

Jakub Kicinski July 25, 2022, 6:37 p.m. UTC | #4
On Mon, 25 Jul 2022 14:35:08 +0000 Maxim Mikityanskiy wrote:
> On Fri, 2022-07-22 at 15:04 -0700, Jakub Kicinski wrote:
> > On Thu, 21 Jul 2022 12:11:27 +0300 Maxim Mikityanskiy wrote:  
> > > tls_device_down takes a reference on all contexts it's going to move to
> > > the degraded state (software fallback). If sk_destruct runs afterwards,
> > > it can reduce the reference counter back to 1 and return early without
> > > destroying the context. Then tls_device_down will release the reference
> > > it took and call tls_device_free_ctx. However, the context will still
> > > stay in tls_device_down_list forever. The list will contain an item,
> > > memory for which is released, making a memory corruption possible.
> > > 
> > > Fix the above bug by properly removing the context from all lists before
> > > any call to tls_device_free_ctx.  
> > 
> > SGTM. The tls_device_down_list has no use, tho, is the plan to remove
> > it later as a cleanup or your upcoming patches make use of it?  
> 
> I don't plan to remove it. Right, we never iterate over it, so instead
> of moving the context to tls_device_down_list, we can remove it from
> list, as long as we check to not remove it second time on destruction.
> 
> However, this way we don't gain anything, but lose a debugging
> opportunity: for example, when list debugging is enabled, double
> list_del will be detected.

I see. I haven't actually checked if list_del_init() would do as well
here.
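
For comparison, a `list_del_init()`-style removal reinitializes the entry so that it points to itself, which makes a repeated removal a silent no-op. The sketch below models that in userspace. `list_del_init_node()` and the other helpers are hypothetical names, and whether this behaviour would suit `tls_device_down` is left open in the thread.

```c
#include <assert.h>

struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void list_add_node(struct list_node *node, struct list_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/* Unlinks the node, then makes it an empty list of its own. */
static void list_del_init_node(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	list_init(node);
}
```

Calling `list_del_init_node()` twice on the same entry leaves both the entry and the original list intact, so a double removal is harmless but also goes undetected.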

> So, it doesn't make sense to me to remove this list, but if you still
> want to do it, Tariq has a patch for this.

Fine either way, thanks for the explanation.

Patch

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 879b9024678e..9975df34d9c2 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -1376,8 +1376,13 @@  static int tls_device_down(struct net_device *netdev)
 		 * by tls_device_free_ctx. rx_conf and tx_conf stay in TLS_HW.
 		 * Now release the ref taken above.
 		 */
-		if (refcount_dec_and_test(&ctx->refcount))
+		if (refcount_dec_and_test(&ctx->refcount)) {
+			/* sk_destruct ran after tls_device_down took a ref, and
+			 * it returned early. Complete the destruction here.
+			 */
+			list_del(&ctx->list);
 			tls_device_free_ctx(ctx);
+		}
 	}
 
 	up_write(&device_offload_lock);