Message ID | 20210603133914.79072-1-jlayton@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ceph: decoding error in ceph_update_snap_realm should return -EIO | expand |
On Thu, Jun 3, 2021 at 3:39 PM Jeff Layton <jlayton@kernel.org> wrote: > > Currently ceph_update_snap_realm returns -EINVAL when it hits a decoding > error, which is the wrong error code. -EINVAL implies that the user gave > us a bogus argument to a syscall or something similar. -EIO is more > descriptive when we hit a decoding error. > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > --- > fs/ceph/snap.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c > index d07c1c6ac8fb..f8cac2abab3f 100644 > --- a/fs/ceph/snap.c > +++ b/fs/ceph/snap.c > @@ -807,7 +807,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc, > return 0; > > bad: > - err = -EINVAL; > + err = -EIO; > fail: > if (realm && !IS_ERR(realm)) > ceph_put_snap_realm(mdsc, realm); Hi Jeff, Is this error code propagated anywhere important? The vast majority of functions that have something to do with decoding use EINVAL as a default (usually out-of-bounds) error. I agree that it is totally ambiguous, but EIO doesn't seem to be any better to me. If there is a desire to separate these errors, I think we need to pick something much more distinctive. Thanks, Ilya
On Thu, 2021-06-03 at 15:57 +0200, Ilya Dryomov wrote: > On Thu, Jun 3, 2021 at 3:39 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > Currently ceph_update_snap_realm returns -EINVAL when it hits a decoding > > error, which is the wrong error code. -EINVAL implies that the user gave > > us a bogus argument to a syscall or something similar. -EIO is more > > descriptive when we hit a decoding error. > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > --- > > fs/ceph/snap.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c > > index d07c1c6ac8fb..f8cac2abab3f 100644 > > --- a/fs/ceph/snap.c > > +++ b/fs/ceph/snap.c > > @@ -807,7 +807,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc, > > return 0; > > > > bad: > > - err = -EINVAL; > > + err = -EIO; > > fail: > > if (realm && !IS_ERR(realm)) > > ceph_put_snap_realm(mdsc, realm); > > Hi Jeff, > > Is this error code propagated anywhere important? > > The vast majority of functions that have something to do with decoding > use EINVAL as a default (usually out-of-bounds) error. I agree that it > is totally ambiguous, but EIO doesn't seem to be any better to me. If > there is a desire to separate these errors, I think we need to pick > something much more distinctive. > When I see EINVAL, I automatically wonder what bogus argument I passed in somewhere, so I find that particularly deceptive here where the bug is either from the MDS or we had some sort of low-level socket handling problem. OTOH, you have a good point. The callers universally ignore the error code from this function. Perhaps we ought to just log a pr_warn message or something if the decoding fails here instead?
On Thu, Jun 3, 2021 at 4:02 PM Jeff Layton <jlayton@kernel.org> wrote: > > On Thu, 2021-06-03 at 15:57 +0200, Ilya Dryomov wrote: > > On Thu, Jun 3, 2021 at 3:39 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > > > Currently ceph_update_snap_realm returns -EINVAL when it hits a decoding > > > error, which is the wrong error code. -EINVAL implies that the user gave > > > us a bogus argument to a syscall or something similar. -EIO is more > > > descriptive when we hit a decoding error. > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > --- > > > fs/ceph/snap.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c > > > index d07c1c6ac8fb..f8cac2abab3f 100644 > > > --- a/fs/ceph/snap.c > > > +++ b/fs/ceph/snap.c > > > @@ -807,7 +807,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc, > > > return 0; > > > > > > bad: > > > - err = -EINVAL; > > > + err = -EIO; > > > fail: > > > if (realm && !IS_ERR(realm)) > > > ceph_put_snap_realm(mdsc, realm); > > > > Hi Jeff, > > > > Is this error code propagated anywhere important? > > > > The vast majority of functions that have something to do with decoding > > use EINVAL as a default (usually out-of-bounds) error. I agree that it > > is totally ambiguous, but EIO doesn't seem to be any better to me. If > > there is a desire to separate these errors, I think we need to pick > > something much more distinctive. > > > > When I see EINVAL, I automatically wonder what bogus argument I passed > in somewhere, so I find that particularly deceptive here where the bug > is either from the MDS or we had some sort of low-level socket handling > problem. > > OTOH, you have a good point. The callers universally ignore the error > code from this function. Perhaps we ought to just log a pr_warn message > or something if the decoding fails here instead? There already is one: 793 bad: 794 err = -EINVAL; 795 fail: 796 if (realm && !IS_ERR(realm)) 797 ceph_put_snap_realm(mdsc, realm); 798 if (first_realm) 799 ceph_put_snap_realm(mdsc, first_realm); 800 pr_err("update_snap_trace error %d\n", err); 801 return err; Or do you mean specifically the "bad" label? Thanks, Ilya
On Thu, 2021-06-03 at 16:33 +0200, Ilya Dryomov wrote: > On Thu, Jun 3, 2021 at 4:02 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > On Thu, 2021-06-03 at 15:57 +0200, Ilya Dryomov wrote: > > > On Thu, Jun 3, 2021 at 3:39 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > > > > > Currently ceph_update_snap_realm returns -EINVAL when it hits a decoding > > > > error, which is the wrong error code. -EINVAL implies that the user gave > > > > us a bogus argument to a syscall or something similar. -EIO is more > > > > descriptive when we hit a decoding error. > > > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > --- > > > > fs/ceph/snap.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c > > > > index d07c1c6ac8fb..f8cac2abab3f 100644 > > > > --- a/fs/ceph/snap.c > > > > +++ b/fs/ceph/snap.c > > > > @@ -807,7 +807,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc, > > > > return 0; > > > > > > > > bad: > > > > - err = -EINVAL; > > > > + err = -EIO; > > > > fail: > > > > if (realm && !IS_ERR(realm)) > > > > ceph_put_snap_realm(mdsc, realm); > > > > > > Hi Jeff, > > > > > > Is this error code propagated anywhere important? > > > > > > The vast majority of functions that have something to do with decoding > > > use EINVAL as a default (usually out-of-bounds) error. I agree that it > > > is totally ambiguous, but EIO doesn't seem to be any better to me. If > > > there is a desire to separate these errors, I think we need to pick > > > something much more distinctive. > > > > > > > When I see EINVAL, I automatically wonder what bogus argument I passed > > in somewhere, so I find that particularly deceptive here where the bug > > is either from the MDS or we had some sort of low-level socket handling > > problem. > > > > OTOH, you have a good point. The callers universally ignore the error > > code from this function. Perhaps we ought to just log a pr_warn message > > or something if the decoding fails here instead? > > There already is one: > > 793 bad: > 794 err = -EINVAL; > 795 fail: > 796 if (realm && !IS_ERR(realm)) > 797 ceph_put_snap_realm(mdsc, realm); > 798 if (first_realm) > 799 ceph_put_snap_realm(mdsc, first_realm); > 800 pr_err("update_snap_trace error %d\n", err); > 801 return err; > > Or do you mean specifically the "bad" label? > Well, if we have a distinctive error code there, then we won't need a separate pr_err message or anything. I still think that -EINVAL is not descriptive of the issue though. I suppose if -EIO is too vague, we could use something like -EILSEQ ?
On Thu, Jun 3, 2021 at 4:42 PM Jeff Layton <jlayton@kernel.org> wrote: > > On Thu, 2021-06-03 at 16:33 +0200, Ilya Dryomov wrote: > > On Thu, Jun 3, 2021 at 4:02 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > > > On Thu, 2021-06-03 at 15:57 +0200, Ilya Dryomov wrote: > > > > On Thu, Jun 3, 2021 at 3:39 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > > > > > > > Currently ceph_update_snap_realm returns -EINVAL when it hits a decoding > > > > > error, which is the wrong error code. -EINVAL implies that the user gave > > > > > us a bogus argument to a syscall or something similar. -EIO is more > > > > > descriptive when we hit a decoding error. > > > > > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > > --- > > > > > fs/ceph/snap.c | 2 +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c > > > > > index d07c1c6ac8fb..f8cac2abab3f 100644 > > > > > --- a/fs/ceph/snap.c > > > > > +++ b/fs/ceph/snap.c > > > > > @@ -807,7 +807,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc, > > > > > return 0; > > > > > > > > > > bad: > > > > > - err = -EINVAL; > > > > > + err = -EIO; > > > > > fail: > > > > > if (realm && !IS_ERR(realm)) > > > > > ceph_put_snap_realm(mdsc, realm); > > > > > > > > Hi Jeff, > > > > > > > > Is this error code propagated anywhere important? > > > > > > > > The vast majority of functions that have something to do with decoding > > > > use EINVAL as a default (usually out-of-bounds) error. I agree that it > > > > is totally ambiguous, but EIO doesn't seem to be any better to me. If > > > > there is a desire to separate these errors, I think we need to pick > > > > something much more distinctive. > > > > > > > > > > When I see EINVAL, I automatically wonder what bogus argument I passed > > > in somewhere, so I find that particularly deceptive here where the bug > > > is either from the MDS or we had some sort of low-level socket handling > > > problem. > > > > > > OTOH, you have a good point. The callers universally ignore the error > > > code from this function. Perhaps we ought to just log a pr_warn message > > > or something if the decoding fails here instead? > > > > There already is one: > > > > 793 bad: > > 794 err = -EINVAL; > > 795 fail: > > 796 if (realm && !IS_ERR(realm)) > > 797 ceph_put_snap_realm(mdsc, realm); > > 798 if (first_realm) > > 799 ceph_put_snap_realm(mdsc, first_realm); > > 800 pr_err("update_snap_trace error %d\n", err); > > 801 return err; > > > > Or do you mean specifically the "bad" label? > > > > Well, if we have a distinctive error code there, then we won't need a > separate pr_err message or anything. I still think that -EINVAL is not > descriptive of the issue though. I suppose if -EIO is too vague, we > could use something like -EILSEQ ? In a sense it is an invalid argument because the buffer passed to the decoding function is too short. This is what would lead to EINVAL here and in many other decoding-related functions. EINVAL is the standard error code for "buffer/message too short" in many other APIs. EILSEQ is certainly more distinctive, but I'm not sure it is the "right" error code for this kind of error. Thanks, Ilya
On Thu, 2021-06-03 at 17:19 +0200, Ilya Dryomov wrote: > On Thu, Jun 3, 2021 at 4:42 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > On Thu, 2021-06-03 at 16:33 +0200, Ilya Dryomov wrote: > > > On Thu, Jun 3, 2021 at 4:02 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > > > > > On Thu, 2021-06-03 at 15:57 +0200, Ilya Dryomov wrote: > > > > > On Thu, Jun 3, 2021 at 3:39 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > > > > > > > > > Currently ceph_update_snap_realm returns -EINVAL when it hits a decoding > > > > > > error, which is the wrong error code. -EINVAL implies that the user gave > > > > > > us a bogus argument to a syscall or something similar. -EIO is more > > > > > > descriptive when we hit a decoding error. > > > > > > > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > > > --- > > > > > > fs/ceph/snap.c | 2 +- > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c > > > > > > index d07c1c6ac8fb..f8cac2abab3f 100644 > > > > > > --- a/fs/ceph/snap.c > > > > > > +++ b/fs/ceph/snap.c > > > > > > @@ -807,7 +807,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc, > > > > > > return 0; > > > > > > > > > > > > bad: > > > > > > - err = -EINVAL; > > > > > > + err = -EIO; > > > > > > fail: > > > > > > if (realm && !IS_ERR(realm)) > > > > > > ceph_put_snap_realm(mdsc, realm); > > > > > > > > > > Hi Jeff, > > > > > > > > > > Is this error code propagated anywhere important? > > > > > > > > > > The vast majority of functions that have something to do with decoding > > > > > use EINVAL as a default (usually out-of-bounds) error. I agree that it > > > > > is totally ambiguous, but EIO doesn't seem to be any better to me. If > > > > > there is a desire to separate these errors, I think we need to pick > > > > > something much more distinctive. > > > > > > > > > > > > > When I see EINVAL, I automatically wonder what bogus argument I passed > > > > in somewhere, so I find that particularly deceptive here where the bug > > > > is either from the MDS or we had some sort of low-level socket handling > > > > problem. > > > > > > > > OTOH, you have a good point. The callers universally ignore the error > > > > code from this function. Perhaps we ought to just log a pr_warn message > > > > or something if the decoding fails here instead? > > > > > > There already is one: > > > > > > 793 bad: > > > 794 err = -EINVAL; > > > 795 fail: > > > 796 if (realm && !IS_ERR(realm)) > > > 797 ceph_put_snap_realm(mdsc, realm); > > > 798 if (first_realm) > > > 799 ceph_put_snap_realm(mdsc, first_realm); > > > 800 pr_err("update_snap_trace error %d\n", err); > > > 801 return err; > > > > > > Or do you mean specifically the "bad" label? > > > > > > > Well, if we have a distinctive error code there, then we won't need a > > separate pr_err message or anything. I still think that -EINVAL is not > > descriptive of the issue though. I suppose if -EIO is too vague, we > > could use something like -EILSEQ ? > > In a sense it is an invalid argument because the buffer passed to the > decoding function is too short. This is what would lead to EINVAL here > and in many other decoding-related functions. > > EINVAL is the standard error code for "buffer/message too short" in > many other APIs. EILSEQ is certainly more distinctive, but I'm not > sure it is the "right" error code for this kind of error. > The issue is that almost everywhere else, decoding routines use -EIO for this. This function is a special snowflake. Why? I don't see any justification for it.
On Thu, Jun 3, 2021 at 5:20 PM Jeff Layton <jlayton@kernel.org> wrote: > > On Thu, 2021-06-03 at 17:19 +0200, Ilya Dryomov wrote: > > On Thu, Jun 3, 2021 at 4:42 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > > > On Thu, 2021-06-03 at 16:33 +0200, Ilya Dryomov wrote: > > > > On Thu, Jun 3, 2021 at 4:02 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > > > > > > > On Thu, 2021-06-03 at 15:57 +0200, Ilya Dryomov wrote: > > > > > > On Thu, Jun 3, 2021 at 3:39 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > > > > > > > > > > > Currently ceph_update_snap_realm returns -EINVAL when it hits a decoding > > > > > > > error, which is the wrong error code. -EINVAL implies that the user gave > > > > > > > us a bogus argument to a syscall or something similar. -EIO is more > > > > > > > descriptive when we hit a decoding error. > > > > > > > > > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > > > > --- > > > > > > > fs/ceph/snap.c | 2 +- > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c > > > > > > > index d07c1c6ac8fb..f8cac2abab3f 100644 > > > > > > > --- a/fs/ceph/snap.c > > > > > > > +++ b/fs/ceph/snap.c > > > > > > > @@ -807,7 +807,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc, > > > > > > > return 0; > > > > > > > > > > > > > > bad: > > > > > > > - err = -EINVAL; > > > > > > > + err = -EIO; > > > > > > > fail: > > > > > > > if (realm && !IS_ERR(realm)) > > > > > > > ceph_put_snap_realm(mdsc, realm); > > > > > > > > > > > > Hi Jeff, > > > > > > > > > > > > Is this error code propagated anywhere important? > > > > > > > > > > > > The vast majority of functions that have something to do with decoding > > > > > > use EINVAL as a default (usually out-of-bounds) error. I agree that it > > > > > > is totally ambiguous, but EIO doesn't seem to be any better to me. If > > > > > > there is a desire to separate these errors, I think we need to pick > > > > > > something much more distinctive. > > > > > > > > > > > > > > > > When I see EINVAL, I automatically wonder what bogus argument I passed > > > > > in somewhere, so I find that particularly deceptive here where the bug > > > > > is either from the MDS or we had some sort of low-level socket handling > > > > > problem. > > > > > > > > > > OTOH, you have a good point. The callers universally ignore the error > > > > > code from this function. Perhaps we ought to just log a pr_warn message > > > > > or something if the decoding fails here instead? > > > > > > > > There already is one: > > > > > > > > 793 bad: > > > > 794 err = -EINVAL; > > > > 795 fail: > > > > 796 if (realm && !IS_ERR(realm)) > > > > 797 ceph_put_snap_realm(mdsc, realm); > > > > 798 if (first_realm) > > > > 799 ceph_put_snap_realm(mdsc, first_realm); > > > > 800 pr_err("update_snap_trace error %d\n", err); > > > > 801 return err; > > > > > > > > Or do you mean specifically the "bad" label? > > > > > > > > > > Well, if we have a distinctive error code there, then we won't need a > > > separate pr_err message or anything. I still think that -EINVAL is not > > > descriptive of the issue though. I suppose if -EIO is too vague, we > > > could use something like -EILSEQ ? > > > > In a sense it is an invalid argument because the buffer passed to the > > decoding function is too short. This is what would lead to EINVAL here > > and in many other decoding-related functions. > > > > EINVAL is the standard error code for "buffer/message too short" in > > many other APIs. EILSEQ is certainly more distinctive, but I'm not > > sure it is the "right" error code for this kind of error. > > > > The issue is that almost everywhere else, decoding routines use -EIO for > this. This function is a special snowflake. Why? I don't see any > justification for it. Ah, indeed I see that there is precedent for using EIO for this in fs/ceph (although this particular EINVAL goes back to 2009). I just wanted to keep things consistent with libceph and rbd where it is mostly EINVAL. Sorry for the noise. Thanks, Ilya
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c index d07c1c6ac8fb..f8cac2abab3f 100644 --- a/fs/ceph/snap.c +++ b/fs/ceph/snap.c @@ -807,7 +807,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc, return 0; bad: - err = -EINVAL; + err = -EIO; fail: if (realm && !IS_ERR(realm)) ceph_put_snap_realm(mdsc, realm);
Currently ceph_update_snap_realm returns -EINVAL when it hits a decoding error, which is the wrong error code. -EINVAL implies that the user gave us a bogus argument to a syscall or something similar. -EIO is more descriptive when we hit a decoding error. Signed-off-by: Jeff Layton <jlayton@kernel.org> --- fs/ceph/snap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)