Message ID | 20210128201418.607640-4-rvkagan@yandex-team.ru (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block/nbd: fix crashers in reconnect while migrating | expand |
28.01.2021 23:14, Roman Kagan wrote: > NBD reconnect logic considers the error code from the functions that > read NBD messages to tell if reconnect should be attempted or not: it is > attempted on -EIO, otherwise the client transitions to NBD_CLIENT_QUIT > state (see nbd_channel_error). This error code is propagated from the > primitives like nbd_read. > > The problem, however, is that nbd_read itself turns every error into -1 > rather than -EIO. As a result, if the NBD server happens to die while > sending the message, the client in QEMU receives less data than it > expects, considers it as a fatal error, and wouldn't attempt > reestablishing the connection. > > Fix it by turning every negative return from qio_channel_read_all into > -EIO returned from nbd_read. Apparently that was the original behavior, > but got broken later. Also adjust nbd_readXX to follow. > > Fixes: e6798f06a6 ("nbd: generalize usage of nbd_read") > Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Really looks like a bug in e6798f06a6: it changes error code from -EIO to -1 without any reasoning. > --- > include/block/nbd.h | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/include/block/nbd.h b/include/block/nbd.h > index 4a52a43ef5..5f34d23bb0 100644 > --- a/include/block/nbd.h > +++ b/include/block/nbd.h > @@ -364,7 +364,7 @@ static inline int nbd_read(QIOChannel *ioc, void *buffer, size_t size, > if (desc) { > error_prepend(errp, "Failed to read %s: ", desc); > } > - return -1; > + return ret; > } > > return 0; > @@ -375,8 +375,9 @@ static inline int nbd_read##bits(QIOChannel *ioc, \ > uint##bits##_t *val, \ > const char *desc, Error **errp) \ > { \ > - if (nbd_read(ioc, val, sizeof(*val), desc, errp) < 0) { \ > - return -1; \ > + int ret = nbd_read(ioc, val, sizeof(*val), desc, errp); \ > + if (ret < 0) { \ > + return ret; \ > } \ > *val = be##bits##_to_cpu(*val); \ > return 0; \ >
diff --git a/include/block/nbd.h b/include/block/nbd.h index 4a52a43ef5..5f34d23bb0 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -364,7 +364,7 @@ static inline int nbd_read(QIOChannel *ioc, void *buffer, size_t size, if (desc) { error_prepend(errp, "Failed to read %s: ", desc); } - return -1; + return ret; } return 0; @@ -375,8 +375,9 @@ static inline int nbd_read##bits(QIOChannel *ioc, \ uint##bits##_t *val, \ const char *desc, Error **errp) \ { \ - if (nbd_read(ioc, val, sizeof(*val), desc, errp) < 0) { \ - return -1; \ + int ret = nbd_read(ioc, val, sizeof(*val), desc, errp); \ + if (ret < 0) { \ + return ret; \ } \ *val = be##bits##_to_cpu(*val); \ return 0; \
NBD reconnect logic considers the error code from the functions that read NBD messages to tell if reconnect should be attempted or not: it is attempted on -EIO, otherwise the client transitions to NBD_CLIENT_QUIT state (see nbd_channel_error). This error code is propagated from the primitives like nbd_read. The problem, however, is that nbd_read itself turns every error into -1 rather than -EIO. As a result, if the NBD server happens to die while sending the message, the client in QEMU receives less data than it expects, considers it as a fatal error, and wouldn't attempt reestablishing the connection. Fix it by turning every negative return from qio_channel_read_all into -EIO returned from nbd_read. Apparently that was the original behavior, but got broken later. Also adjust nbd_readXX to follow. Fixes: e6798f06a6 ("nbd: generalize usage of nbd_read") Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru> --- include/block/nbd.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)