Message ID | 20221110122606.383352-1-shaozhengchao@huawei.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net/9p: fix issue of list_del corruption in p9_fd_cancel() | expand |
Zhengchao Shao wrote on Thu, Nov 10, 2022 at 08:26:06PM +0800: > Syz reported the following issue: > kernel BUG at lib/list_debug.c:53! > invalid opcode: 0000 [#1] PREEMPT SMP KASAN > RIP: 0010:__list_del_entry_valid.cold+0x5c/0x72 > Call Trace: > <TASK> > p9_fd_cancel+0xb1/0x270 > p9_client_rpc+0x8ea/0xba0 > p9_client_create+0x9c0/0xed0 > v9fs_session_init+0x1e0/0x1620 > v9fs_mount+0xba/0xb80 > legacy_get_tree+0x103/0x200 > vfs_get_tree+0x89/0x2d0 > path_mount+0x4c0/0x1ac0 > __x64_sys_mount+0x33b/0x430 > do_syscall_64+0x35/0x80 > entry_SYSCALL_64_after_hwframe+0x46/0xb0 > </TASK> > > The process is as follows: > Thread A: Thread B: > p9_poll_workfn() p9_client_create() > ... ... > p9_conn_cancel() p9_fd_cancel() > list_del() ... > ... list_del() //list_del > corruption > There is no lock protection when deleting list in p9_conn_cancel(). After > deleting list in Thread A, thread B will delete the same list again. It > will cause issue of list_del corruption. Thanks! I'd add a couple of lines here describing the actual fix. Something like this? --- Setting req->status to REQ_STATUS_ERROR under lock prevents other cleanup paths from trying to manipulate req_list. The other thread can safely check req->status because it still holds a reference to req at this point. --- With that out of the way, it's a good idea; I didn't remember that p9_fd_cancel (and cancelled) check for req status before acting on it. This really feels like whack-a-mole, but I'd say this is one step better. Please tell me if you want to send a v2 with your words, or I'll just pick this up with my suggestion and submit to Linus in a week-ish after testing. No point in waiting a full cycle for this. > Fixes: 52f1c45dde91 ("9p: trans_fd/p9_conn_cancel: drop client lock earlier") > Reported-by: syzbot+9b69b8d10ab4a7d88056@syzkaller.appspotmail.com > Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> > --- > v2: set req status when removing list (I don't recall seeing a v1?) > --- > net/9p/trans_fd.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c > index 56a186768750..bd28e63d7666 100644 > --- a/net/9p/trans_fd.c > +++ b/net/9p/trans_fd.c > @@ -202,9 +202,11 @@ static void p9_conn_cancel(struct p9_conn *m, int err) > > list_for_each_entry_safe(req, rtmp, &m->req_list, req_list) { > list_move(&req->req_list, &cancel_list); > + req->status = REQ_STATUS_ERROR; > } > list_for_each_entry_safe(req, rtmp, &m->unsent_req_list, req_list) { > list_move(&req->req_list, &cancel_list); > + req->status = REQ_STATUS_ERROR; > } > > spin_unlock(&m->req_lock); -- Dominique
On 2022/11/10 20:51, asmadeus@codewreck.org wrote: > Zhengchao Shao wrote on Thu, Nov 10, 2022 at 08:26:06PM +0800: >> Syz reported the following issue: >> kernel BUG at lib/list_debug.c:53! >> invalid opcode: 0000 [#1] PREEMPT SMP KASAN >> RIP: 0010:__list_del_entry_valid.cold+0x5c/0x72 >> Call Trace: >> <TASK> >> p9_fd_cancel+0xb1/0x270 >> p9_client_rpc+0x8ea/0xba0 >> p9_client_create+0x9c0/0xed0 >> v9fs_session_init+0x1e0/0x1620 >> v9fs_mount+0xba/0xb80 >> legacy_get_tree+0x103/0x200 >> vfs_get_tree+0x89/0x2d0 >> path_mount+0x4c0/0x1ac0 >> __x64_sys_mount+0x33b/0x430 >> do_syscall_64+0x35/0x80 >> entry_SYSCALL_64_after_hwframe+0x46/0xb0 >> </TASK> >> >> The process is as follows: >> Thread A: Thread B: >> p9_poll_workfn() p9_client_create() >> ... ... >> p9_conn_cancel() p9_fd_cancel() >> list_del() ... >> ... list_del() //list_del >> corruption >> There is no lock protection when deleting list in p9_conn_cancel(). After >> deleting list in Thread A, thread B will delete the same list again. It >> will cause issue of list_del corruption. > > Thanks! > > I'd add a couple of lines here describing the actual fix. > Something like this? > --- > Setting req->status to REQ_STATUS_ERROR under lock prevents other > cleanup paths from trying to manipulate req_list. > The other thread can safely check req->status because it still holds a > reference to req at this point. > --- > > With that out of the way, it's a good idea; I didn't remember that > p9_fd_cancel (and cancelled) check for req status before acting on it. > This really feels like whack-a-mole, but I'd say this is one step > better. > > Please tell me if you want to send a v2 with your words, or I'll just > pick this up with my suggestion and submit to Linus in a week-ish after > testing. No point in waiting a full cycle for this. > > Hi Dominique: Thank you for your review. Your suggestion looks good to me, and please add your suggestion. :) >> Fixes: 52f1c45dde91 ("9p: trans_fd/p9_conn_cancel: drop client lock earlier") >> Reported-by: syzbot+9b69b8d10ab4a7d88056@syzkaller.appspotmail.com >> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> >> --- >> v2: set req status when removing list > > (I don't recall seeing a v1?) > Sorry, please ignore this notes. >> --- >> net/9p/trans_fd.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c >> index 56a186768750..bd28e63d7666 100644 >> --- a/net/9p/trans_fd.c >> +++ b/net/9p/trans_fd.c >> @@ -202,9 +202,11 @@ static void p9_conn_cancel(struct p9_conn *m, int err) >> >> list_for_each_entry_safe(req, rtmp, &m->req_list, req_list) { >> list_move(&req->req_list, &cancel_list); >> + req->status = REQ_STATUS_ERROR; >> } >> list_for_each_entry_safe(req, rtmp, &m->unsent_req_list, req_list) { >> list_move(&req->req_list, &cancel_list); >> + req->status = REQ_STATUS_ERROR; >> } >> >> spin_unlock(&m->req_lock); > > -- > Dominique
shaozhengchao wrote on Fri, Nov 11, 2022 at 09:23:12AM +0800: > > Please tell me if you want to send a v2 with your words, or I'll just > > pick this up with my suggestion and submit to Linus in a week-ish after > > testing. No point in waiting a full cycle for this. > > Thank you for your review. Your suggestion looks good to me, and > please add your suggestion. :) I've done quick checks with a tcp server and pushed it for next. I'll try to remember to send it to Linus mid next-week.
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c index 56a186768750..bd28e63d7666 100644 --- a/net/9p/trans_fd.c +++ b/net/9p/trans_fd.c @@ -202,9 +202,11 @@ static void p9_conn_cancel(struct p9_conn *m, int err) list_for_each_entry_safe(req, rtmp, &m->req_list, req_list) { list_move(&req->req_list, &cancel_list); + req->status = REQ_STATUS_ERROR; } list_for_each_entry_safe(req, rtmp, &m->unsent_req_list, req_list) { list_move(&req->req_list, &cancel_list); + req->status = REQ_STATUS_ERROR; } spin_unlock(&m->req_lock);
Syz reported the following issue: kernel BUG at lib/list_debug.c:53! invalid opcode: 0000 [#1] PREEMPT SMP KASAN RIP: 0010:__list_del_entry_valid.cold+0x5c/0x72 Call Trace: <TASK> p9_fd_cancel+0xb1/0x270 p9_client_rpc+0x8ea/0xba0 p9_client_create+0x9c0/0xed0 v9fs_session_init+0x1e0/0x1620 v9fs_mount+0xba/0xb80 legacy_get_tree+0x103/0x200 vfs_get_tree+0x89/0x2d0 path_mount+0x4c0/0x1ac0 __x64_sys_mount+0x33b/0x430 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 </TASK> The process is as follows: Thread A: Thread B: p9_poll_workfn() p9_client_create() ... ... p9_conn_cancel() p9_fd_cancel() list_del() ... ... list_del() //list_del corruption There is no lock protection when deleting list in p9_conn_cancel(). After deleting list in Thread A, thread B will delete the same list again. It will cause issue of list_del corruption. Fixes: 52f1c45dde91 ("9p: trans_fd/p9_conn_cancel: drop client lock earlier") Reported-by: syzbot+9b69b8d10ab4a7d88056@syzkaller.appspotmail.com Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> --- v2: set req status when removing list --- net/9p/trans_fd.c | 2 ++ 1 file changed, 2 insertions(+)