Message ID | 20200121124813.13332-1-sunke32@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Series | nbd: add a flush_workqueue in nbd_start_device |
On 1/21/20 7:48 AM, Sun Ke wrote:
> When kzalloc fails, a recv thread may end up trying to
> destroy the workqueue from inside the workqueue.
>
> If num_connections is m (2 < m), and the NO.1 ~ NO.n
> (1 < n < m) kzalloc calls succeed but the NO.(n + 1)
> fails, then nbd_start_device will return -ENOMEM to
> nbd_start_device_ioctl, and nbd_start_device_ioctl
> will return immediately without running flush_workqueue.
> However, we still have n recv threads. If nbd_release
> runs first, the recv threads may have to drop the last
> config_refs and try to destroy the workqueue from
> inside the workqueue.
>
> To fix it, add a flush_workqueue in nbd_start_device.
>
> Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
> Signed-off-by: Sun Ke <sunke32@huawei.com>
> ---
>  drivers/block/nbd.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index b4607dd96185..dd1f8c2c6169 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -1264,7 +1264,12 @@ static int nbd_start_device(struct nbd_device *nbd)
>
>  		args = kzalloc(sizeof(*args), GFP_KERNEL);
>  		if (!args) {
> -			sock_shutdown(nbd);
> +			if (i == 0)
> +				sock_shutdown(nbd);
> +			else {
> +				sock_shutdown(nbd);
> +				flush_workqueue(nbd->recv_workq);
> +			}

Just for readability's sake, why don't we just flush_workqueue()
unconditionally and add a comment so we know why in the future?

Thanks,

Josef
On 1/21/20 7:00 AM, Josef Bacik wrote:
> On 1/21/20 7:48 AM, Sun Ke wrote:
>> When kzalloc fails, a recv thread may end up trying to
>> destroy the workqueue from inside the workqueue.
>>
>> If num_connections is m (2 < m), and the NO.1 ~ NO.n
>> (1 < n < m) kzalloc calls succeed but the NO.(n + 1)
>> fails, then nbd_start_device will return -ENOMEM to
>> nbd_start_device_ioctl, and nbd_start_device_ioctl
>> will return immediately without running flush_workqueue.
>> However, we still have n recv threads. If nbd_release
>> runs first, the recv threads may have to drop the last
>> config_refs and try to destroy the workqueue from
>> inside the workqueue.
>>
>> To fix it, add a flush_workqueue in nbd_start_device.
>>
>> Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
>> Signed-off-by: Sun Ke <sunke32@huawei.com>
>> ---
>>  drivers/block/nbd.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>> index b4607dd96185..dd1f8c2c6169 100644
>> --- a/drivers/block/nbd.c
>> +++ b/drivers/block/nbd.c
>> @@ -1264,7 +1264,12 @@ static int nbd_start_device(struct nbd_device *nbd)
>>
>>  		args = kzalloc(sizeof(*args), GFP_KERNEL);
>>  		if (!args) {
>> -			sock_shutdown(nbd);
>> +			if (i == 0)
>> +				sock_shutdown(nbd);
>> +			else {
>> +				sock_shutdown(nbd);
>> +				flush_workqueue(nbd->recv_workq);
>> +			}
>
> Just for readability's sake, why don't we just flush_workqueue()
> unconditionally and add a comment so we know why in the future?

Or maybe just make it:

	sock_shutdown(nbd);
	if (i)
		flush_workqueue(nbd->recv_workq);

which does the same thing, but is still readable. The current code with
the shutdown duplication is just a bit odd. Needs a comment either way.
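The claim that the shorter form "does the same thing" as the posted hunk can be checked with a small userspace model. This is only a sketch: the flag names and the helpers `path_as_posted`, `path_simplified`, and `paths_agree` are hypothetical stand-ins invented for illustration, and the model records which calls each branch makes rather than invoking any kernel function.

```c
#include <assert.h>

/* Hypothetical flags recording which stand-in calls an error path made. */
enum { DID_SHUTDOWN = 1 << 0, DID_FLUSH = 1 << 1 };

/* Branch shape of the error path as posted in the patch. */
static int path_as_posted(int i)
{
	int did = 0;

	if (i == 0)
		did |= DID_SHUTDOWN;
	else {
		did |= DID_SHUTDOWN;
		did |= DID_FLUSH;
	}
	return did;
}

/* Branch shape of the suggested simplification. */
static int path_simplified(int i)
{
	int did = DID_SHUTDOWN;		/* shutdown happens for every i */

	if (i)
		did |= DID_FLUSH;	/* flush only if workers were queued */
	return did;
}

/* Returns 1 if both shapes make the same calls for every i in [0, n). */
static int paths_agree(int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (path_as_posted(i) != path_simplified(i))
			return 0;
	return 1;
}
```

Under this model both shapes shut the sockets down for every `i` and flush only when `i` is non-zero, so the only difference is the duplicated `sock_shutdown()` call in the posted version.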
On 2020/1/22 5:25, Jens Axboe wrote:
> On 1/21/20 7:00 AM, Josef Bacik wrote:
>> On 1/21/20 7:48 AM, Sun Ke wrote:
>>> When kzalloc fails, a recv thread may end up trying to
>>> destroy the workqueue from inside the workqueue.
>>>
>>> If num_connections is m (2 < m), and the NO.1 ~ NO.n
>>> (1 < n < m) kzalloc calls succeed but the NO.(n + 1)
>>> fails, then nbd_start_device will return -ENOMEM to
>>> nbd_start_device_ioctl, and nbd_start_device_ioctl
>>> will return immediately without running flush_workqueue.
>>> However, we still have n recv threads. If nbd_release
>>> runs first, the recv threads may have to drop the last
>>> config_refs and try to destroy the workqueue from
>>> inside the workqueue.
>>>
>>> To fix it, add a flush_workqueue in nbd_start_device.
>>>
>>> Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
>>> Signed-off-by: Sun Ke <sunke32@huawei.com>
>>> ---
>>>  drivers/block/nbd.c | 7 ++++++-
>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>>> index b4607dd96185..dd1f8c2c6169 100644
>>> --- a/drivers/block/nbd.c
>>> +++ b/drivers/block/nbd.c
>>> @@ -1264,7 +1264,12 @@ static int nbd_start_device(struct nbd_device *nbd)
>>>
>>>  		args = kzalloc(sizeof(*args), GFP_KERNEL);
>>>  		if (!args) {
>>> -			sock_shutdown(nbd);
>>> +			if (i == 0)
>>> +				sock_shutdown(nbd);
>>> +			else {
>>> +				sock_shutdown(nbd);
>>> +				flush_workqueue(nbd->recv_workq);
>>> +			}
>>
>> Just for readability's sake, why don't we just flush_workqueue()
>> unconditionally and add a comment so we know why in the future?
>
> Or maybe just make it:
>
> 	sock_shutdown(nbd);
> 	if (i)
> 		flush_workqueue(nbd->recv_workq);
>
> which does the same thing, but is still readable. The current code with
> the shutdown duplication is just a bit odd. Needs a comment either way.
>

OK, I will improve it in my v2 patch.

Thanks,

Sun Ke
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index b4607dd96185..dd1f8c2c6169 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1264,7 +1264,12 @@ static int nbd_start_device(struct nbd_device *nbd)
 
 		args = kzalloc(sizeof(*args), GFP_KERNEL);
 		if (!args) {
-			sock_shutdown(nbd);
+			if (i == 0)
+				sock_shutdown(nbd);
+			else {
+				sock_shutdown(nbd);
+				flush_workqueue(nbd->recv_workq);
+			}
 			return -ENOMEM;
 		}
 		sk_set_memalloc(config->socks[i]->sock->sk);
When kzalloc fails, a recv thread may end up trying to
destroy the workqueue from inside the workqueue.

If num_connections is m (2 < m), and the NO.1 ~ NO.n
(1 < n < m) kzalloc calls succeed but the NO.(n + 1)
fails, then nbd_start_device will return -ENOMEM to
nbd_start_device_ioctl, and nbd_start_device_ioctl
will return immediately without running flush_workqueue.
However, we still have n recv threads. If nbd_release
runs first, the recv threads may have to drop the last
config_refs and try to destroy the workqueue from
inside the workqueue.

To fix it, add a flush_workqueue in nbd_start_device.

Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
Signed-off-by: Sun Ke <sunke32@huawei.com>
---
 drivers/block/nbd.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
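The ordering problem described in the commit message can be illustrated with a single-threaded userspace model of the reference counting. This is a sketch under stated assumptions, not driver code: `struct model_config`, `model_put`, and `model_failed_start` are hypothetical names, the model assumes one reference held by the opener plus one per queued recv worker, and it simply replays the worst-case ordering instead of racing real threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_config {
	int refs;			/* stands in for config_refs */
	bool destroyed;			/* teardown has run */
	bool destroyed_in_worker;	/* last put came from a recv worker */
};

/* Drop one reference; whoever drops the last one performs the teardown. */
static void model_put(struct model_config *c, bool in_worker)
{
	assert(c->refs > 0);
	if (--c->refs == 0) {
		c->destroyed = true;
		c->destroyed_in_worker = in_worker;
	}
}

/*
 * Replay a failed start with n_workers recv workers already queued.
 * flush_first == true models the fix: the workers finish (and drop
 * their references) before the error is returned, so the release path
 * drops the last reference. flush_first == false models the bad
 * ordering in which the release runs before the workers finish.
 * Returns true if the teardown would run from inside a worker.
 */
static bool model_failed_start(int n_workers, bool flush_first)
{
	struct model_config c = { .refs = 1 + n_workers };
	int i;

	if (flush_first) {
		for (i = 0; i < n_workers; i++)
			model_put(&c, true);	/* workers drain first */
		model_put(&c, false);		/* release drops the last ref */
	} else {
		model_put(&c, false);		/* release runs first */
		for (i = 0; i < n_workers; i++)
			model_put(&c, true);	/* a worker drops the last ref */
	}
	assert(c.destroyed);
	return c.destroyed_in_worker;
}
```

In this model, `model_failed_start(2, false)` reports that the teardown runs from a worker, which corresponds to destroying the workqueue from inside the workqueue, while `model_failed_start(2, true)` leaves the final put to the release path.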