diff mbox series

ceph: decoding error in ceph_update_snap_realm should return -EIO

Message ID 20210603133914.79072-1-jlayton@kernel.org (mailing list archive)
State New, archived
Headers show
Series ceph: decoding error in ceph_update_snap_realm should return -EIO | expand

Commit Message

Jeff Layton June 3, 2021, 1:39 p.m. UTC
Currently ceph_update_snap_realm returns -EINVAL when it hits a decoding
error, which is the wrong error code. -EINVAL implies that the user gave
us a bogus argument to a syscall or something similar. -EIO is more
descriptive when we hit a decoding error.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/snap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ilya Dryomov June 3, 2021, 1:57 p.m. UTC | #1
On Thu, Jun 3, 2021 at 3:39 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> Currently ceph_update_snap_realm returns -EINVAL when it hits a decoding
> error, which is the wrong error code. -EINVAL implies that the user gave
> us a bogus argument to a syscall or something similar. -EIO is more
> descriptive when we hit a decoding error.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/snap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> index d07c1c6ac8fb..f8cac2abab3f 100644
> --- a/fs/ceph/snap.c
> +++ b/fs/ceph/snap.c
> @@ -807,7 +807,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>         return 0;
>
>  bad:
> -       err = -EINVAL;
> +       err = -EIO;
>  fail:
>         if (realm && !IS_ERR(realm))
>                 ceph_put_snap_realm(mdsc, realm);

Hi Jeff,

Is this error code propagated anywhere important?

The vast majority of functions that have something to do with decoding
use EINVAL as a default (usually out-of-bounds) error.  I agree that it
is totally ambiguous, but EIO doesn't seem to be any better to me.  If
there is a desire to separate these errors, I think we need to pick
something much more distinctive.

Thanks,

                Ilya
Jeff Layton June 3, 2021, 2:02 p.m. UTC | #2
On Thu, 2021-06-03 at 15:57 +0200, Ilya Dryomov wrote:
> On Thu, Jun 3, 2021 at 3:39 PM Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > Currently ceph_update_snap_realm returns -EINVAL when it hits a decoding
> > error, which is the wrong error code. -EINVAL implies that the user gave
> > us a bogus argument to a syscall or something similar. -EIO is more
> > descriptive when we hit a decoding error.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/snap.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> > index d07c1c6ac8fb..f8cac2abab3f 100644
> > --- a/fs/ceph/snap.c
> > +++ b/fs/ceph/snap.c
> > @@ -807,7 +807,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
> >         return 0;
> > 
> >  bad:
> > -       err = -EINVAL;
> > +       err = -EIO;
> >  fail:
> >         if (realm && !IS_ERR(realm))
> >                 ceph_put_snap_realm(mdsc, realm);
> 
> Hi Jeff,
> 
> Is this error code propagated anywhere important?
> 
> The vast majority of functions that have something to do with decoding
> use EINVAL as a default (usually out-of-bounds) error.  I agree that it
> is totally ambiguous, but EIO doesn't seem to be any better to me.  If
> there is a desire to separate these errors, I think we need to pick
> something much more distinctive.
> 

When I see EINVAL, I automatically wonder what bogus argument I passed
in somewhere, so I find that particularly deceptive here where the bug
is either from the MDS or we had some sort of low-level socket handling
problem.

OTOH, you have a good point. The callers universally ignore the error
code from this function. Perhaps we ought to just log a pr_warn message
or something if the decoding fails here instead?
Ilya Dryomov June 3, 2021, 2:33 p.m. UTC | #3
On Thu, Jun 3, 2021 at 4:02 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Thu, 2021-06-03 at 15:57 +0200, Ilya Dryomov wrote:
> > On Thu, Jun 3, 2021 at 3:39 PM Jeff Layton <jlayton@kernel.org> wrote:
> > >
> > > Currently ceph_update_snap_realm returns -EINVAL when it hits a decoding
> > > error, which is the wrong error code. -EINVAL implies that the user gave
> > > us a bogus argument to a syscall or something similar. -EIO is more
> > > descriptive when we hit a decoding error.
> > >
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >  fs/ceph/snap.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> > > index d07c1c6ac8fb..f8cac2abab3f 100644
> > > --- a/fs/ceph/snap.c
> > > +++ b/fs/ceph/snap.c
> > > @@ -807,7 +807,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
> > >         return 0;
> > >
> > >  bad:
> > > -       err = -EINVAL;
> > > +       err = -EIO;
> > >  fail:
> > >         if (realm && !IS_ERR(realm))
> > >                 ceph_put_snap_realm(mdsc, realm);
> >
> > Hi Jeff,
> >
> > Is this error code propagated anywhere important?
> >
> > The vast majority of functions that have something to do with decoding
> > use EINVAL as a default (usually out-of-bounds) error.  I agree that it
> > is totally ambiguous, but EIO doesn't seem to be any better to me.  If
> > there is a desire to separate these errors, I think we need to pick
> > something much more distinctive.
> >
>
> When I see EINVAL, I automatically wonder what bogus argument I passed
> in somewhere, so I find that particularly deceptive here where the bug
> is either from the MDS or we had some sort of low-level socket handling
> problem.
>
> OTOH, you have a good point. The callers universally ignore the error
> code from this function. Perhaps we ought to just log a pr_warn message
> or something if the decoding fails here instead?

There already is one:

 793 bad:
 794         err = -EINVAL;
 795 fail:
 796         if (realm && !IS_ERR(realm))
 797                 ceph_put_snap_realm(mdsc, realm);
 798         if (first_realm)
 799                 ceph_put_snap_realm(mdsc, first_realm);
 800         pr_err("update_snap_trace error %d\n", err);
 801         return err;

Or do you mean specifically the "bad" label?

Thanks,

                Ilya
Jeff Layton June 3, 2021, 2:42 p.m. UTC | #4
On Thu, 2021-06-03 at 16:33 +0200, Ilya Dryomov wrote:
> On Thu, Jun 3, 2021 at 4:02 PM Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Thu, 2021-06-03 at 15:57 +0200, Ilya Dryomov wrote:
> > > On Thu, Jun 3, 2021 at 3:39 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > 
> > > > Currently ceph_update_snap_realm returns -EINVAL when it hits a decoding
> > > > error, which is the wrong error code. -EINVAL implies that the user gave
> > > > us a bogus argument to a syscall or something similar. -EIO is more
> > > > descriptive when we hit a decoding error.
> > > > 
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >  fs/ceph/snap.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> > > > index d07c1c6ac8fb..f8cac2abab3f 100644
> > > > --- a/fs/ceph/snap.c
> > > > +++ b/fs/ceph/snap.c
> > > > @@ -807,7 +807,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
> > > >         return 0;
> > > > 
> > > >  bad:
> > > > -       err = -EINVAL;
> > > > +       err = -EIO;
> > > >  fail:
> > > >         if (realm && !IS_ERR(realm))
> > > >                 ceph_put_snap_realm(mdsc, realm);
> > > 
> > > Hi Jeff,
> > > 
> > > Is this error code propagated anywhere important?
> > > 
> > > The vast majority of functions that have something to do with decoding
> > > use EINVAL as a default (usually out-of-bounds) error.  I agree that it
> > > is totally ambiguous, but EIO doesn't seem to be any better to me.  If
> > > there is a desire to separate these errors, I think we need to pick
> > > something much more distinctive.
> > > 
> > 
> > When I see EINVAL, I automatically wonder what bogus argument I passed
> > in somewhere, so I find that particularly deceptive here where the bug
> > is either from the MDS or we had some sort of low-level socket handling
> > problem.
> > 
> > OTOH, you have a good point. The callers universally ignore the error
> > code from this function. Perhaps we ought to just log a pr_warn message
> > or something if the decoding fails here instead?
> 
> There already is one:
> 
>  793 bad:
>  794         err = -EINVAL;
>  795 fail:
>  796         if (realm && !IS_ERR(realm))
>  797                 ceph_put_snap_realm(mdsc, realm);
>  798         if (first_realm)
>  799                 ceph_put_snap_realm(mdsc, first_realm);
>  800         pr_err("update_snap_trace error %d\n", err);
>  801         return err;
> 
> Or do you mean specifically the "bad" label?
> 

Well, if we have a distinctive error code there, then we won't need a
separate pr_err message or anything. I still think that -EINVAL is not
descriptive of the issue though. I suppose if -EIO is too vague, we
could use something like -EILSEQ ?
Ilya Dryomov June 3, 2021, 3:19 p.m. UTC | #5
On Thu, Jun 3, 2021 at 4:42 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Thu, 2021-06-03 at 16:33 +0200, Ilya Dryomov wrote:
> > On Thu, Jun 3, 2021 at 4:02 PM Jeff Layton <jlayton@kernel.org> wrote:
> > >
> > > On Thu, 2021-06-03 at 15:57 +0200, Ilya Dryomov wrote:
> > > > On Thu, Jun 3, 2021 at 3:39 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > >
> > > > > Currently ceph_update_snap_realm returns -EINVAL when it hits a decoding
> > > > > error, which is the wrong error code. -EINVAL implies that the user gave
> > > > > us a bogus argument to a syscall or something similar. -EIO is more
> > > > > descriptive when we hit a decoding error.
> > > > >
> > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > ---
> > > > >  fs/ceph/snap.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> > > > > index d07c1c6ac8fb..f8cac2abab3f 100644
> > > > > --- a/fs/ceph/snap.c
> > > > > +++ b/fs/ceph/snap.c
> > > > > @@ -807,7 +807,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
> > > > >         return 0;
> > > > >
> > > > >  bad:
> > > > > -       err = -EINVAL;
> > > > > +       err = -EIO;
> > > > >  fail:
> > > > >         if (realm && !IS_ERR(realm))
> > > > >                 ceph_put_snap_realm(mdsc, realm);
> > > >
> > > > Hi Jeff,
> > > >
> > > > Is this error code propagated anywhere important?
> > > >
> > > > The vast majority of functions that have something to do with decoding
> > > > use EINVAL as a default (usually out-of-bounds) error.  I agree that it
> > > > is totally ambiguous, but EIO doesn't seem to be any better to me.  If
> > > > there is a desire to separate these errors, I think we need to pick
> > > > something much more distinctive.
> > > >
> > >
> > > When I see EINVAL, I automatically wonder what bogus argument I passed
> > > in somewhere, so I find that particularly deceptive here where the bug
> > > is either from the MDS or we had some sort of low-level socket handling
> > > problem.
> > >
> > > OTOH, you have a good point. The callers universally ignore the error
> > > code from this function. Perhaps we ought to just log a pr_warn message
> > > or something if the decoding fails here instead?
> >
> > There already is one:
> >
> >  793 bad:
> >  794         err = -EINVAL;
> >  795 fail:
> >  796         if (realm && !IS_ERR(realm))
> >  797                 ceph_put_snap_realm(mdsc, realm);
> >  798         if (first_realm)
> >  799                 ceph_put_snap_realm(mdsc, first_realm);
> >  800         pr_err("update_snap_trace error %d\n", err);
> >  801         return err;
> >
> > Or do you mean specifically the "bad" label?
> >
>
> Well, if we have a distinctive error code there, then we won't need a
> separate pr_err message or anything. I still think that -EINVAL is not
> descriptive of the issue though. I suppose if -EIO is too vague, we
> could use something like -EILSEQ ?

In a sense it is an invalid argument because the buffer passed to the
decoding function is too short.  This is what would lead to EINVAL here
and in many other decoding-related functions.

EINVAL is the standard error code for "buffer/message too short" in
many other APIs.  EILSEQ is certainly more distinctive, but I'm not
sure it is the "right" error code for this kind of error.

Thanks,

                Ilya
Jeff Layton June 3, 2021, 3:20 p.m. UTC | #6
On Thu, 2021-06-03 at 17:19 +0200, Ilya Dryomov wrote:
> On Thu, Jun 3, 2021 at 4:42 PM Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Thu, 2021-06-03 at 16:33 +0200, Ilya Dryomov wrote:
> > > On Thu, Jun 3, 2021 at 4:02 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > 
> > > > On Thu, 2021-06-03 at 15:57 +0200, Ilya Dryomov wrote:
> > > > > On Thu, Jun 3, 2021 at 3:39 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > 
> > > > > > Currently ceph_update_snap_realm returns -EINVAL when it hits a decoding
> > > > > > error, which is the wrong error code. -EINVAL implies that the user gave
> > > > > > us a bogus argument to a syscall or something similar. -EIO is more
> > > > > > descriptive when we hit a decoding error.
> > > > > > 
> > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > > ---
> > > > > >  fs/ceph/snap.c | 2 +-
> > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> > > > > > index d07c1c6ac8fb..f8cac2abab3f 100644
> > > > > > --- a/fs/ceph/snap.c
> > > > > > +++ b/fs/ceph/snap.c
> > > > > > @@ -807,7 +807,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
> > > > > >         return 0;
> > > > > > 
> > > > > >  bad:
> > > > > > -       err = -EINVAL;
> > > > > > +       err = -EIO;
> > > > > >  fail:
> > > > > >         if (realm && !IS_ERR(realm))
> > > > > >                 ceph_put_snap_realm(mdsc, realm);
> > > > > 
> > > > > Hi Jeff,
> > > > > 
> > > > > Is this error code propagated anywhere important?
> > > > > 
> > > > > The vast majority of functions that have something to do with decoding
> > > > > use EINVAL as a default (usually out-of-bounds) error.  I agree that it
> > > > > is totally ambiguous, but EIO doesn't seem to be any better to me.  If
> > > > > there is a desire to separate these errors, I think we need to pick
> > > > > something much more distinctive.
> > > > > 
> > > > 
> > > > When I see EINVAL, I automatically wonder what bogus argument I passed
> > > > in somewhere, so I find that particularly deceptive here where the bug
> > > > is either from the MDS or we had some sort of low-level socket handling
> > > > problem.
> > > > 
> > > > OTOH, you have a good point. The callers universally ignore the error
> > > > code from this function. Perhaps we ought to just log a pr_warn message
> > > > or something if the decoding fails here instead?
> > > 
> > > There already is one:
> > > 
> > >  793 bad:
> > >  794         err = -EINVAL;
> > >  795 fail:
> > >  796         if (realm && !IS_ERR(realm))
> > >  797                 ceph_put_snap_realm(mdsc, realm);
> > >  798         if (first_realm)
> > >  799                 ceph_put_snap_realm(mdsc, first_realm);
> > >  800         pr_err("update_snap_trace error %d\n", err);
> > >  801         return err;
> > > 
> > > Or do you mean specifically the "bad" label?
> > > 
> > 
> > Well, if we have a distinctive error code there, then we won't need a
> > separate pr_err message or anything. I still think that -EINVAL is not
> > descriptive of the issue though. I suppose if -EIO is too vague, we
> > could use something like -EILSEQ ?
> 
> In a sense it is an invalid argument because the buffer passed to the
> decoding function is too short.  This is what would lead to EINVAL here
> and in many other decoding-related functions.
> 
> EINVAL is the standard error code for "buffer/message too short" in
> many other APIs.  EILSEQ is certainly more distinctive, but I'm not
> sure it is the "right" error code for this kind of error.
> 

The issue is that almost everywhere else, decoding routines use -EIO for
this. This function is a special snowflake. Why? I don't see any
justification for it.
Ilya Dryomov June 3, 2021, 3:37 p.m. UTC | #7
On Thu, Jun 3, 2021 at 5:20 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Thu, 2021-06-03 at 17:19 +0200, Ilya Dryomov wrote:
> > On Thu, Jun 3, 2021 at 4:42 PM Jeff Layton <jlayton@kernel.org> wrote:
> > >
> > > On Thu, 2021-06-03 at 16:33 +0200, Ilya Dryomov wrote:
> > > > On Thu, Jun 3, 2021 at 4:02 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > >
> > > > > On Thu, 2021-06-03 at 15:57 +0200, Ilya Dryomov wrote:
> > > > > > On Thu, Jun 3, 2021 at 3:39 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > >
> > > > > > > Currently ceph_update_snap_realm returns -EINVAL when it hits a decoding
> > > > > > > error, which is the wrong error code. -EINVAL implies that the user gave
> > > > > > > us a bogus argument to a syscall or something similar. -EIO is more
> > > > > > > descriptive when we hit a decoding error.
> > > > > > >
> > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > > > ---
> > > > > > >  fs/ceph/snap.c | 2 +-
> > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> > > > > > > index d07c1c6ac8fb..f8cac2abab3f 100644
> > > > > > > --- a/fs/ceph/snap.c
> > > > > > > +++ b/fs/ceph/snap.c
> > > > > > > @@ -807,7 +807,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
> > > > > > >         return 0;
> > > > > > >
> > > > > > >  bad:
> > > > > > > -       err = -EINVAL;
> > > > > > > +       err = -EIO;
> > > > > > >  fail:
> > > > > > >         if (realm && !IS_ERR(realm))
> > > > > > >                 ceph_put_snap_realm(mdsc, realm);
> > > > > >
> > > > > > Hi Jeff,
> > > > > >
> > > > > > Is this error code propagated anywhere important?
> > > > > >
> > > > > > The vast majority of functions that have something to do with decoding
> > > > > > use EINVAL as a default (usually out-of-bounds) error.  I agree that it
> > > > > > is totally ambiguous, but EIO doesn't seem to be any better to me.  If
> > > > > > there is a desire to separate these errors, I think we need to pick
> > > > > > something much more distinctive.
> > > > > >
> > > > >
> > > > > When I see EINVAL, I automatically wonder what bogus argument I passed
> > > > > in somewhere, so I find that particularly deceptive here where the bug
> > > > > is either from the MDS or we had some sort of low-level socket handling
> > > > > problem.
> > > > >
> > > > > OTOH, you have a good point. The callers universally ignore the error
> > > > > code from this function. Perhaps we ought to just log a pr_warn message
> > > > > or something if the decoding fails here instead?
> > > >
> > > > There already is one:
> > > >
> > > >  793 bad:
> > > >  794         err = -EINVAL;
> > > >  795 fail:
> > > >  796         if (realm && !IS_ERR(realm))
> > > >  797                 ceph_put_snap_realm(mdsc, realm);
> > > >  798         if (first_realm)
> > > >  799                 ceph_put_snap_realm(mdsc, first_realm);
> > > >  800         pr_err("update_snap_trace error %d\n", err);
> > > >  801         return err;
> > > >
> > > > Or do you mean specifically the "bad" label?
> > > >
> > >
> > > Well, if we have a distinctive error code there, then we won't need a
> > > separate pr_err message or anything. I still think that -EINVAL is not
> > > descriptive of the issue though. I suppose if -EIO is too vague, we
> > > could use something like -EILSEQ ?
> >
> > In a sense it is an invalid argument because the buffer passed to the
> > decoding function is too short.  This is what would lead to EINVAL here
> > and in many other decoding-related functions.
> >
> > EINVAL is the standard error code for "buffer/message too short" in
> > many other APIs.  EILSEQ is certainly more distinctive, but I'm not
> > sure it is the "right" error code for this kind of error.
> >
>
> The issue is that almost everywhere else, decoding routines use -EIO for
> this. This function is a special snowflake. Why? I don't see any
> justification for it.

Ah, indeed I see that there is precedent for using EIO for this in
fs/ceph (although this particular EINVAL goes back to 2009).  I just
wanted to keep things consistent with libceph and rbd where it is
mostly EINVAL.

Sorry for the noise.

Thanks,

                Ilya
diff mbox series

Patch

diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index d07c1c6ac8fb..f8cac2abab3f 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -807,7 +807,7 @@  int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
 	return 0;
 
 bad:
-	err = -EINVAL;
+	err = -EIO;
 fail:
 	if (realm && !IS_ERR(realm))
 		ceph_put_snap_realm(mdsc, realm);