
[net] net/9p: fix issue of list_del corruption in p9_fd_cancel()

Message ID 20221110122606.383352-1-shaozhengchao@huawei.com (mailing list archive)
State Not Applicable
Delegated to: Netdev Maintainers
Headers show
Series [net] net/9p: fix issue of list_del corruption in p9_fd_cancel() | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 1 maintainers not CCed: davem@davemloft.net
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 11 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

shaozhengchao Nov. 10, 2022, 12:26 p.m. UTC
Syz reported the following issue:
kernel BUG at lib/list_debug.c:53!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
RIP: 0010:__list_del_entry_valid.cold+0x5c/0x72
Call Trace:
<TASK>
p9_fd_cancel+0xb1/0x270
p9_client_rpc+0x8ea/0xba0
p9_client_create+0x9c0/0xed0
v9fs_session_init+0x1e0/0x1620
v9fs_mount+0xba/0xb80
legacy_get_tree+0x103/0x200
vfs_get_tree+0x89/0x2d0
path_mount+0x4c0/0x1ac0
__x64_sys_mount+0x33b/0x430
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
</TASK>

The process is as follows:
Thread A:                       Thread B:
p9_poll_workfn()                p9_client_create()
...                                 ...
    p9_conn_cancel()                p9_fd_cancel()
        list_del()                      ...
        ...                             list_del()  // list_del corruption

There is no lock protection when deleting the list entry in
p9_conn_cancel(). After thread A deletes the entry, thread B deletes the
same entry again, causing the list_del corruption.

Fixes: 52f1c45dde91 ("9p: trans_fd/p9_conn_cancel: drop client lock earlier")
Reported-by: syzbot+9b69b8d10ab4a7d88056@syzkaller.appspotmail.com
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
---
v2: set req status when removing list
---
 net/9p/trans_fd.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Dominique Martinet Nov. 10, 2022, 12:51 p.m. UTC | #1
Zhengchao Shao wrote on Thu, Nov 10, 2022 at 08:26:06PM +0800:
> Syz reported the following issue:
> kernel BUG at lib/list_debug.c:53!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> RIP: 0010:__list_del_entry_valid.cold+0x5c/0x72
> Call Trace:
> <TASK>
> p9_fd_cancel+0xb1/0x270
> p9_client_rpc+0x8ea/0xba0
> p9_client_create+0x9c0/0xed0
> v9fs_session_init+0x1e0/0x1620
> v9fs_mount+0xba/0xb80
> legacy_get_tree+0x103/0x200
> vfs_get_tree+0x89/0x2d0
> path_mount+0x4c0/0x1ac0
> __x64_sys_mount+0x33b/0x430
> do_syscall_64+0x35/0x80
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
> </TASK>
> 
> The process is as follows:
> Thread A:                       Thread B:
> p9_poll_workfn()                p9_client_create()
> ...                                 ...
>     p9_conn_cancel()                p9_fd_cancel()
>         list_del()                      ...
>         ...                             list_del()  //list_del
>                                                       corruption
> There is no lock protection when deleting list in p9_conn_cancel(). After
> deleting list in Thread A, thread B will delete the same list again. It
> will cause issue of list_del corruption.

Thanks!

I'd add a couple of lines here describing the actual fix.
Something like this?
---
Setting req->status to REQ_STATUS_ERROR under lock prevents other
cleanup paths from trying to manipulate req_list.
The other thread can safely check req->status because it still holds a
reference to req at this point.
---

With that out of the way, it's a good idea; I didn't remember that
p9_fd_cancel (and cancelled) check for req status before acting on it.
This really feels like whack-a-mole, but I'd say this is one step
better.

Please tell me if you want to send a v2 with your words, or I'll just
pick this up with my suggestion and submit to Linus in a week-ish after
testing. No point in waiting a full cycle for this.


> Fixes: 52f1c45dde91 ("9p: trans_fd/p9_conn_cancel: drop client lock earlier")
> Reported-by: syzbot+9b69b8d10ab4a7d88056@syzkaller.appspotmail.com
> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
> ---
> v2: set req status when removing list

(I don't recall seeing a v1?)

> ---
>  net/9p/trans_fd.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index 56a186768750..bd28e63d7666 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -202,9 +202,11 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
>  
>  	list_for_each_entry_safe(req, rtmp, &m->req_list, req_list) {
>  		list_move(&req->req_list, &cancel_list);
> +		req->status = REQ_STATUS_ERROR;
>  	}
>  	list_for_each_entry_safe(req, rtmp, &m->unsent_req_list, req_list) {
>  		list_move(&req->req_list, &cancel_list);
> +		req->status = REQ_STATUS_ERROR;
>  	}
>  
>  	spin_unlock(&m->req_lock);

--
Dominique
shaozhengchao Nov. 11, 2022, 1:23 a.m. UTC | #2
On 2022/11/10 20:51, asmadeus@codewreck.org wrote:
> Zhengchao Shao wrote on Thu, Nov 10, 2022 at 08:26:06PM +0800:
>> Syz reported the following issue:
>> kernel BUG at lib/list_debug.c:53!
>> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
>> RIP: 0010:__list_del_entry_valid.cold+0x5c/0x72
>> Call Trace:
>> <TASK>
>> p9_fd_cancel+0xb1/0x270
>> p9_client_rpc+0x8ea/0xba0
>> p9_client_create+0x9c0/0xed0
>> v9fs_session_init+0x1e0/0x1620
>> v9fs_mount+0xba/0xb80
>> legacy_get_tree+0x103/0x200
>> vfs_get_tree+0x89/0x2d0
>> path_mount+0x4c0/0x1ac0
>> __x64_sys_mount+0x33b/0x430
>> do_syscall_64+0x35/0x80
>> entry_SYSCALL_64_after_hwframe+0x46/0xb0
>> </TASK>
>>
>> The process is as follows:
>> Thread A:                       Thread B:
>> p9_poll_workfn()                p9_client_create()
>> ...                                 ...
>>      p9_conn_cancel()                p9_fd_cancel()
>>          list_del()                      ...
>>          ...                             list_del()  //list_del
>>                                                        corruption
>> There is no lock protection when deleting list in p9_conn_cancel(). After
>> deleting list in Thread A, thread B will delete the same list again. It
>> will cause issue of list_del corruption.
> 
> Thanks!
> 
> I'd add a couple of lines here describing the actual fix.
> Something like this?
> ---
> Setting req->status to REQ_STATUS_ERROR under lock prevents other
> cleanup paths from trying to manipulate req_list.
> The other thread can safely check req->status because it still holds a
> reference to req at this point.
> ---
> 
> With that out of the way, it's a good idea; I didn't remember that
> p9_fd_cancel (and cancelled) check for req status before acting on it.
> This really feels like whack-a-mole, but I'd say this is one step
> better.
> 
> Please tell me if you want to send a v2 with your words, or I'll just
> pick this up with my suggestion and submit to Linus in a week-ish after
> testing. No point in waiting a full cycle for this.
> 
> 
Hi Dominique:
	Thank you for your review. Your suggestion looks good to me;
please add it when applying. :)
>> Fixes: 52f1c45dde91 ("9p: trans_fd/p9_conn_cancel: drop client lock earlier")
>> Reported-by: syzbot+9b69b8d10ab4a7d88056@syzkaller.appspotmail.com
>> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
>> ---
>> v2: set req status when removing list
> 
> (I don't recall seeing a v1?)
> 
Sorry, please ignore this note.
>> ---
>>   net/9p/trans_fd.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
>> index 56a186768750..bd28e63d7666 100644
>> --- a/net/9p/trans_fd.c
>> +++ b/net/9p/trans_fd.c
>> @@ -202,9 +202,11 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
>>   
>>   	list_for_each_entry_safe(req, rtmp, &m->req_list, req_list) {
>>   		list_move(&req->req_list, &cancel_list);
>> +		req->status = REQ_STATUS_ERROR;
>>   	}
>>   	list_for_each_entry_safe(req, rtmp, &m->unsent_req_list, req_list) {
>>   		list_move(&req->req_list, &cancel_list);
>> +		req->status = REQ_STATUS_ERROR;
>>   	}
>>   
>>   	spin_unlock(&m->req_lock);
> 
> --
> Dominique
Dominique Martinet Nov. 11, 2022, 2:12 a.m. UTC | #3
shaozhengchao wrote on Fri, Nov 11, 2022 at 09:23:12AM +0800:
> > Please tell me if you want to send a v2 with your words, or I'll just
> > pick this up with my suggestion and submit to Linus in a week-ish after
> > testing. No point in waiting a full cycle for this.
>
> 	Thank you for your review. Your suggestion looks good to me, and
> please add your suggestion. :)

I've done quick checks with a tcp server and pushed it for next.

I'll try to remember to send it to Linus mid next-week.

Patch

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 56a186768750..bd28e63d7666 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -202,9 +202,11 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
 
 	list_for_each_entry_safe(req, rtmp, &m->req_list, req_list) {
 		list_move(&req->req_list, &cancel_list);
+		req->status = REQ_STATUS_ERROR;
 	}
 	list_for_each_entry_safe(req, rtmp, &m->unsent_req_list, req_list) {
 		list_move(&req->req_list, &cancel_list);
+		req->status = REQ_STATUS_ERROR;
 	}
 
 	spin_unlock(&m->req_lock);
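
The effect of the two added lines can be sketched in a self-contained
userspace model. The names (req, REQ_STATUS_*, the lock helpers) are
borrowed from net/9p for illustration only; the no-op lock functions
stand in for spin_lock(&m->req_lock), and fd_cancel() paraphrases the
status check that p9_fd_cancel() performs before acting on a request:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of the fix; NOT the kernel source. */

enum req_status { REQ_STATUS_SENT, REQ_STATUS_ERROR, REQ_STATUS_FLSHD };

struct list_node { struct list_node *prev, *next; };

struct req {
	enum req_status status;
	struct list_node req_list;
};

static void req_lock(void)   { /* stands in for spin_lock(&m->req_lock) */ }
static void req_unlock(void) { /* stands in for spin_unlock(&m->req_lock) */ }

static void list_del(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;	/* a second list_del here would corrupt */
}

/* p9_conn_cancel()'s side: unlink the request AND mark it errored while
 * still holding the lock, so every other path sees the new status. */
static void conn_cancel(struct req *req)
{
	req_lock();
	list_del(&req->req_list);
	req->status = REQ_STATUS_ERROR;	/* the two added lines in the patch */
	req_unlock();
}

/* p9_fd_cancel()'s side: check the status under the same lock and only
 * unlink a request that is still live. Returns 1 if it unlinked. */
static int fd_cancel(struct req *req)
{
	int ret = 0;

	req_lock();
	if (req->status == REQ_STATUS_SENT) {
		list_del(&req->req_list);
		req->status = REQ_STATUS_FLSHD;
		ret = 1;
	}
	req_unlock();
	return ret;
}
```

With the status updated inside the same critical section that unlinks
the entry, the loser of the race sees REQ_STATUS_ERROR and skips its
own list_del instead of corrupting the list.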