diff mbox

Make __xfs_xattr_put_listen preperly report errors.

Message ID 1471967653-2561-1-git-send-email-asavkov@redhat.com (mailing list archive)
State Accepted, archived
Headers show

Commit Message

Artem Savkov Aug. 23, 2016, 3:54 p.m. UTC
Commit "xfs: only return -errno or success from attr ->put_listent" changes the
returnvalue of __xfs_xattr_put_listen to 0 in case when there is insufficient
space in the buffer assuming that setting context->count to -1 would be enough,
but all of the ->put_listent callers only check seen_enough. This results in
a failed assertion:
XFS: Assertion failed: context->count >= 0, file: fs/xfs/xfs_xattr.c, line: 175
in insufficient buffer size case.

Signed-off-by: Artem Savkov <asavkov@redhat.com>
---
 fs/xfs/xfs_xattr.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Dave Chinner Aug. 24, 2016, 1:55 a.m. UTC | #1
On Tue, Aug 23, 2016 at 05:54:13PM +0200, Artem Savkov wrote:
> Commit "xfs: only return -errno or success from attr ->put_listent" changes the

Please quote commits in --oneline format in changelogs - it makes it
much easier to find the change you are refering to if there is both
a commit ID and the text string in the commit message. (i.e. text
string confirms the commit id is the one you meant to quote).

commit 2a6fba6 ("xfs: only return -errno or success from attr
->put_listent") is the one you are refering to here, right?

> returnvalue of __xfs_xattr_put_listen to 0 in case when there is insufficient
> space in the buffer assuming that setting context->count to -1 would be enough,
> but all of the ->put_listent callers only check seen_enough. This results in
> a failed assertion:
> XFS: Assertion failed: context->count >= 0, file: fs/xfs/xfs_xattr.c, line: 175
> in insufficient buffer size case.

You have a test case? Can you turn it into an xfstest? We really
need regression tests that cover issues like this....

> Signed-off-by: Artem Savkov <asavkov@redhat.com>
> ---
>  fs/xfs/xfs_xattr.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> index ea62245..6290093 100644
> --- a/fs/xfs/xfs_xattr.c
> +++ b/fs/xfs/xfs_xattr.c
> @@ -147,6 +147,7 @@ __xfs_xattr_put_listent(
>  	arraytop = context->count + prefix_len + namelen + 1;
>  	if (arraytop > context->firstu) {
>  		context->count = -1;	/* insufficient space */
> +		context->seen_enough = 1;
>  		return 0;
>  	}
>  	offset = (char *)context->alist + context->count;

Looks sane, though I don't know how to test it yet....

Cheers,

Dave.
Artem Savkov Aug. 24, 2016, 8:08 a.m. UTC | #2
On Wed, Aug 24, 2016 at 11:55:51AM +1000, Dave Chinner wrote:
> On Tue, Aug 23, 2016 at 05:54:13PM +0200, Artem Savkov wrote:
> > Commit "xfs: only return -errno or success from attr ->put_listent" changes the
> 
> Please quote commits in --oneline format in changelogs - it makes it
> much easier to find the change you are refering to if there is both
> a commit ID and the text string in the commit message. (i.e. text
> string confirms the commit id is the one you meant to quote).

Noted, thanks.

> 
> commit 2a6fba6 ("xfs: only return -errno or success from attr
> ->put_listent") is the one you are refering to here, right?

Yes, that is the one.

> > returnvalue of __xfs_xattr_put_listen to 0 in case when there is insufficient
> > space in the buffer assuming that setting context->count to -1 would be enough,
> > but all of the ->put_listent callers only check seen_enough. This results in
> > a failed assertion:
> > XFS: Assertion failed: context->count >= 0, file: fs/xfs/xfs_xattr.c, line: 175
> > in insufficient buffer size case.
> 
> You have a test case? Can you turn it into an xfstest? We really
> need regression tests that cover issues like this....
> 

llistxattr02 test from LTP reliably hits this, I'll see how this can be
ported to xfstest.

> > Signed-off-by: Artem Savkov <asavkov@redhat.com>
> > ---
> >  fs/xfs/xfs_xattr.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> > index ea62245..6290093 100644
> > --- a/fs/xfs/xfs_xattr.c
> > +++ b/fs/xfs/xfs_xattr.c
> > @@ -147,6 +147,7 @@ __xfs_xattr_put_listent(
> >  	arraytop = context->count + prefix_len + namelen + 1;
> >  	if (arraytop > context->firstu) {
> >  		context->count = -1;	/* insufficient space */
> > +		context->seen_enough = 1;
> >  		return 0;
> >  	}
> >  	offset = (char *)context->alist + context->count;
> 
> Looks sane, though I don't know how to test it yet....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Dave Chinner Aug. 25, 2016, 12:24 a.m. UTC | #3
On Wed, Aug 24, 2016 at 10:08:33AM +0200, Artem Savkov wrote:
> On Wed, Aug 24, 2016 at 11:55:51AM +1000, Dave Chinner wrote:
> > On Tue, Aug 23, 2016 at 05:54:13PM +0200, Artem Savkov wrote:
> > > Commit "xfs: only return -errno or success from attr ->put_listent" changes the
> > 
> > Please quote commits in --oneline format in changelogs - it makes it
> > much easier to find the change you are refering to if there is both
> > a commit ID and the text string in the commit message. (i.e. text
> > string confirms the commit id is the one you meant to quote).
> 
> Noted, thanks.
> 
> > 
> > commit 2a6fba6 ("xfs: only return -errno or success from attr
> > ->put_listent") is the one you are refering to here, right?
> 
> Yes, that is the one.
> 
> > > returnvalue of __xfs_xattr_put_listen to 0 in case when there is insufficient
> > > space in the buffer assuming that setting context->count to -1 would be enough,
> > > but all of the ->put_listent callers only check seen_enough. This results in
> > > a failed assertion:
> > > XFS: Assertion failed: context->count >= 0, file: fs/xfs/xfs_xattr.c, line: 175
> > > in insufficient buffer size case.
> > 
> > You have a test case? Can you turn it into an xfstest? We really
> > need regression tests that cover issues like this....
> > 
> 
> llistxattr02 test from LTP reliably hits this, I'll see how this can be
> ported to xfstest.

So, after battling the obtuse, completely useless ltp install
documentation and having to resort to reverse engineering a working
configuration using strace, I finally got this running, I think, on
an XFS filesystem:

/mnt/scratch/ltp$ sudo ./runltp -b /dev/pmem1 -B xfs -z /dev/vdc -Z xfs -q -p -s llistxattr -I 100
....
Summary:
passed   1
failed   0
skipped  0
warnings 0
tst_test.c:756: INFO: Timeout per run is 0h 05m 00s
llistxattr02.c:76: PASS: llistxattr() failed as expected: ERANGE
llistxattr02.c:76: PASS: llistxattr() failed as expected: ENOENT
llistxattr02.c:76: PASS: llistxattr() failed as expected: EFAULT
....

And it doesn't fail. strace output:

24833 lsetxattr("symlink", "security.ltptest", "test", 4, XATTR_CREATE) = 0
24833 llistxattr("symlink", 0x7ffe312356b0, 1) = -1 ERANGE (Numerical result out of range)
24833 llistxattr("", 0x7ffe312356a0, 20) = -1 ENOENT (No such file or directory)
24833 llistxattr(0xffffffffffffffff, 0x7ffe312356a0, 20) = -1 EFAULT (Bad address)

I'm assuming from your description that it is the first one of these
that fails for you as it is the "buffer too small" test case. So,
not as obvious as it first seems - ltp doesn't appear to be a
reliable reproducer of the problem, so we are going to need a custom
test to exercise it....

Cheers,

Dave.
Artem Savkov Aug. 25, 2016, 8:21 a.m. UTC | #4
On Thu, Aug 25, 2016 at 10:24:08AM +1000, Dave Chinner wrote:
> On Wed, Aug 24, 2016 at 10:08:33AM +0200, Artem Savkov wrote:
> > On Wed, Aug 24, 2016 at 11:55:51AM +1000, Dave Chinner wrote:
> > > On Tue, Aug 23, 2016 at 05:54:13PM +0200, Artem Savkov wrote:
> > > > Commit "xfs: only return -errno or success from attr ->put_listent" changes the
> > > 
> > > Please quote commits in --oneline format in changelogs - it makes it
> > > much easier to find the change you are refering to if there is both
> > > a commit ID and the text string in the commit message. (i.e. text
> > > string confirms the commit id is the one you meant to quote).
> > 
> > Noted, thanks.
> > 
> > > 
> > > commit 2a6fba6 ("xfs: only return -errno or success from attr
> > > ->put_listent") is the one you are refering to here, right?
> > 
> > Yes, that is the one.
> > 
> > > > returnvalue of __xfs_xattr_put_listen to 0 in case when there is insufficient
> > > > space in the buffer assuming that setting context->count to -1 would be enough,
> > > > but all of the ->put_listent callers only check seen_enough. This results in
> > > > a failed assertion:
> > > > XFS: Assertion failed: context->count >= 0, file: fs/xfs/xfs_xattr.c, line: 175
> > > > in insufficient buffer size case.
> > > 
> > > You have a test case? Can you turn it into an xfstest? We really
> > > need regression tests that cover issues like this....
> > > 
> > 
> > llistxattr02 test from LTP reliably hits this, I'll see how this can be
> > ported to xfstest.
> 
> So, after battling the obtuse, completely useless ltp install
> documentation and having to resort to reverse engineering a working
> configuration using strace, I finally got this running, I think, on
> an XFS filesystem:
> 
> /mnt/scratch/ltp$ sudo ./runltp -b /dev/pmem1 -B xfs -z /dev/vdc -Z xfs -q -p -s llistxattr -I 100
> ....
> Summary:
> passed   1
> failed   0
> skipped  0
> warnings 0
> tst_test.c:756: INFO: Timeout per run is 0h 05m 00s
> llistxattr02.c:76: PASS: llistxattr() failed as expected: ERANGE
> llistxattr02.c:76: PASS: llistxattr() failed as expected: ENOENT
> llistxattr02.c:76: PASS: llistxattr() failed as expected: EFAULT
> ....
> 
> And it doesn't fail. strace output:
> 
> 24833 lsetxattr("symlink", "security.ltptest", "test", 4, XATTR_CREATE) = 0
> 24833 llistxattr("symlink", 0x7ffe312356b0, 1) = -1 ERANGE (Numerical result out of range)
> 24833 llistxattr("", 0x7ffe312356a0, 20) = -1 ENOENT (No such file or directory)
> 24833 llistxattr(0xffffffffffffffff, 0x7ffe312356a0, 20) = -1 EFAULT (Bad address)
> 
> I'm assuming from your description that it is the first one of these
> that fails for you as it is the "buffer too small" test case. So,
> not as obvious as it first seems - ltp doesn't appear to be a
> reliable reproducer of the problem, so we are going to need a custom
> test to exercise it....

LTP doesn't check dmesg for warnings on it's own, so it is ok for the test
to be marked as "PASSED" since we get ERANGE after all. But you should
be able to see the warnings in dmesg.
I'm attaching the config I'm using. With this config I can repdroduce the
issue with both 4.8-rc3 and next-20160825.
Eric Sandeen Aug. 25, 2016, 9:36 p.m. UTC | #5
On 8/23/16 10:54 AM, Artem Savkov wrote:
> Commit "xfs: only return -errno or success from attr ->put_listent" changes the
> returnvalue of __xfs_xattr_put_listen to 0 in case when there is insufficient
> space in the buffer assuming that setting context->count to -1 would be enough,
> but all of the ->put_listent callers only check seen_enough. This results in
> a failed assertion:
> XFS: Assertion failed: context->count >= 0, file: fs/xfs/xfs_xattr.c, line: 175
> in insufficient buffer size case.

Oof.  I remember spending a lot of time trying to track down all
of these cases.  It seems overly complex and obscure.  :(

I'll try to find time to dig through it again; offhand it doesn't
seem like "count = -1" means anything to anyone except in an
ASSERT check?

Thanks,
-Eric


> Signed-off-by: Artem Savkov <asavkov@redhat.com>
> ---
>  fs/xfs/xfs_xattr.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> index ea62245..6290093 100644
> --- a/fs/xfs/xfs_xattr.c
> +++ b/fs/xfs/xfs_xattr.c
> @@ -147,6 +147,7 @@ __xfs_xattr_put_listent(
>  	arraytop = context->count + prefix_len + namelen + 1;
>  	if (arraytop > context->firstu) {
>  		context->count = -1;	/* insufficient space */
> +		context->seen_enough = 1;
>  		return 0;
>  	}
>  	offset = (char *)context->alist + context->count;
>
Dave Chinner Aug. 25, 2016, 10:42 p.m. UTC | #6
On Thu, Aug 25, 2016 at 10:21:09AM +0200, Artem Savkov wrote:
> On Thu, Aug 25, 2016 at 10:24:08AM +1000, Dave Chinner wrote:
> > On Wed, Aug 24, 2016 at 10:08:33AM +0200, Artem Savkov wrote:
> > > On Wed, Aug 24, 2016 at 11:55:51AM +1000, Dave Chinner wrote:
> > > > On Tue, Aug 23, 2016 at 05:54:13PM +0200, Artem Savkov wrote:
> > > > > Commit "xfs: only return -errno or success from attr ->put_listent" changes the
> > > > 
> > > > Please quote commits in --oneline format in changelogs - it makes it
> > > > much easier to find the change you are refering to if there is both
> > > > a commit ID and the text string in the commit message. (i.e. text
> > > > string confirms the commit id is the one you meant to quote).
> > > 
> > > Noted, thanks.
> > > 
> > > > 
> > > > commit 2a6fba6 ("xfs: only return -errno or success from attr
> > > > ->put_listent") is the one you are refering to here, right?
> > > 
> > > Yes, that is the one.
> > > 
> > > > > returnvalue of __xfs_xattr_put_listen to 0 in case when there is insufficient
> > > > > space in the buffer assuming that setting context->count to -1 would be enough,
> > > > > but all of the ->put_listent callers only check seen_enough. This results in
> > > > > a failed assertion:
> > > > > XFS: Assertion failed: context->count >= 0, file: fs/xfs/xfs_xattr.c, line: 175
> > > > > in insufficient buffer size case.
> > > > 
> > > > You have a test case? Can you turn it into an xfstest? We really
> > > > need regression tests that cover issues like this....
> > > > 
> > > 
> > > llistxattr02 test from LTP reliably hits this, I'll see how this can be
> > > ported to xfstest.
> > 
> > So, after battling the obtuse, completely useless ltp install
> > documentation and having to resort to reverse engineering a working
> > configuration using strace, I finally got this running, I think, on
> > an XFS filesystem:
> > 
> > /mnt/scratch/ltp$ sudo ./runltp -b /dev/pmem1 -B xfs -z /dev/vdc -Z xfs -q -p -s llistxattr -I 100
> > ....
> > Summary:
> > passed   1
> > failed   0
> > skipped  0
> > warnings 0
> > tst_test.c:756: INFO: Timeout per run is 0h 05m 00s
> > llistxattr02.c:76: PASS: llistxattr() failed as expected: ERANGE
> > llistxattr02.c:76: PASS: llistxattr() failed as expected: ENOENT
> > llistxattr02.c:76: PASS: llistxattr() failed as expected: EFAULT
> > ....
> > 
> > And it doesn't fail. strace output:
> > 
> > 24833 lsetxattr("symlink", "security.ltptest", "test", 4, XATTR_CREATE) = 0
> > 24833 llistxattr("symlink", 0x7ffe312356b0, 1) = -1 ERANGE (Numerical result out of range)
> > 24833 llistxattr("", 0x7ffe312356a0, 20) = -1 ENOENT (No such file or directory)
> > 24833 llistxattr(0xffffffffffffffff, 0x7ffe312356a0, 20) = -1 EFAULT (Bad address)
> > 
> > I'm assuming from your description that it is the first one of these
> > that fails for you as it is the "buffer too small" test case. So,
> > not as obvious as it first seems - ltp doesn't appear to be a
> > reliable reproducer of the problem, so we are going to need a custom
> > test to exercise it....
> 
> LTP doesn't check dmesg for warnings on it's own, so it is ok for the test
> to be marked as "PASSED" since we get ERANGE after all. But you should
> be able to see the warnings in dmesg.

Of course. My systems are set up so that any critical message that
comes through dmesg is wall'd to all active sessions, so it doesn't
matter that xfstests or ltp doesn't capture the error, I know it's
happened.

Besides, I always use CONFIG_XFS_DEBUG=y, which means these things
BUG() rather than just print a warning, and that tends to be fatal
in more ways than one...

> I'm attaching the config I'm using. With this config I can repdroduce the
> issue with both 4.8-rc3 and next-20160825.

What I suggest you do is try to recreate the problem manually. I'm
pretty certain I know what is different between our test
environments that is resulting in me not seeing the problem
but, really, we need more people with root cause analysis skills
around here so I don't have to spend time solving every problem that
is reported.

That is, it's all well and good to send a patch to fix a problem,
but if we don't understand the root cause of the problem being
fixed, then we have no idea whether we've fixed the problem fully or
not.  Part of understanding the root cause is finding a reliable
reproducer of the problem, part of it is reading and understanding
the code being fixed, and part of it is understanding the
environment and behaviour that demonstrates the problem.

So when I look at the fix, and see that it doesn't reproduce on my
systems, it's clear that it's either not yet fully understood or
hasn't been fully explained by the person who understands the issue.
These are some of the questions I've asked myself to understand why
we are seeing what we've been seeing:

	- what condition in the unfixed code leads to the ASSERT
	  being tripped?
	- how does the patch prevent that from occurring?
	- at what threshold does the problem trigger (i.e. n=0, n=1,
	  n=2 .... ?)
	- how do the environmental initial conditions affect the
	  test being run?
	- what do security layers automatically store in the inode
	  at creation time?
	- how can we modify the test to always trigger the assert?

I know the answer, and it would take much less time to tell everyone
that it does to write an email like this.  But that means I'll just
have to do the same thing next time, and the next time, and so on.
The more people we have that can think through issues like this and
come to the right conclusion without needing my help, the better off
we'll all be...

Cheers,

Dave.
Artem Savkov Aug. 26, 2016, 8:59 a.m. UTC | #7
On Fri, Aug 26, 2016 at 08:42:15AM +1000, Dave Chinner wrote:
> On Thu, Aug 25, 2016 at 10:21:09AM +0200, Artem Savkov wrote:
> > On Thu, Aug 25, 2016 at 10:24:08AM +1000, Dave Chinner wrote:
> > > On Wed, Aug 24, 2016 at 10:08:33AM +0200, Artem Savkov wrote:
> > > > On Wed, Aug 24, 2016 at 11:55:51AM +1000, Dave Chinner wrote:
> > > > > On Tue, Aug 23, 2016 at 05:54:13PM +0200, Artem Savkov wrote:
> > > > > > Commit "xfs: only return -errno or success from attr ->put_listent" changes the
> > > > > 
> > > > > Please quote commits in --oneline format in changelogs - it makes it
> > > > > much easier to find the change you are refering to if there is both
> > > > > a commit ID and the text string in the commit message. (i.e. text
> > > > > string confirms the commit id is the one you meant to quote).
> > > > 
> > > > Noted, thanks.
> > > > 
> > > > > 
> > > > > commit 2a6fba6 ("xfs: only return -errno or success from attr
> > > > > ->put_listent") is the one you are refering to here, right?
> > > > 
> > > > Yes, that is the one.
> > > > 
> > > > > > returnvalue of __xfs_xattr_put_listen to 0 in case when there is insufficient
> > > > > > space in the buffer assuming that setting context->count to -1 would be enough,
> > > > > > but all of the ->put_listent callers only check seen_enough. This results in
> > > > > > a failed assertion:
> > > > > > XFS: Assertion failed: context->count >= 0, file: fs/xfs/xfs_xattr.c, line: 175
> > > > > > in insufficient buffer size case.
> > > > > 
> > > > > You have a test case? Can you turn it into an xfstest? We really
> > > > > need regression tests that cover issues like this....
> > > > > 
> > > > 
> > > > llistxattr02 test from LTP reliably hits this, I'll see how this can be
> > > > ported to xfstest.
> > > 
> > > So, after battling the obtuse, completely useless ltp install
> > > documentation and having to resort to reverse engineering a working
> > > configuration using strace, I finally got this running, I think, on
> > > an XFS filesystem:
> > > 
> > > /mnt/scratch/ltp$ sudo ./runltp -b /dev/pmem1 -B xfs -z /dev/vdc -Z xfs -q -p -s llistxattr -I 100
> > > ....
> > > Summary:
> > > passed   1
> > > failed   0
> > > skipped  0
> > > warnings 0
> > > tst_test.c:756: INFO: Timeout per run is 0h 05m 00s
> > > llistxattr02.c:76: PASS: llistxattr() failed as expected: ERANGE
> > > llistxattr02.c:76: PASS: llistxattr() failed as expected: ENOENT
> > > llistxattr02.c:76: PASS: llistxattr() failed as expected: EFAULT
> > > ....
> > > 
> > > And it doesn't fail. strace output:
> > > 
> > > 24833 lsetxattr("symlink", "security.ltptest", "test", 4, XATTR_CREATE) = 0
> > > 24833 llistxattr("symlink", 0x7ffe312356b0, 1) = -1 ERANGE (Numerical result out of range)
> > > 24833 llistxattr("", 0x7ffe312356a0, 20) = -1 ENOENT (No such file or directory)
> > > 24833 llistxattr(0xffffffffffffffff, 0x7ffe312356a0, 20) = -1 EFAULT (Bad address)
> > > 
> > > I'm assuming from your description that it is the first one of these
> > > that fails for you as it is the "buffer too small" test case. So,
> > > not as obvious as it first seems - ltp doesn't appear to be a
> > > reliable reproducer of the problem, so we are going to need a custom
> > > test to exercise it....
> > 
> > LTP doesn't check dmesg for warnings on it's own, so it is ok for the test
> > to be marked as "PASSED" since we get ERANGE after all. But you should
> > be able to see the warnings in dmesg.
> 
> Of course. My systems are set up so that any critical message that
> comes through dmesg is wall'd to all active sessions, so it doesn't
> matter that xfstests or ltp doesn't capture the error, I know it's
> happened.
> 
> Besides, I always use CONFIG_XFS_DEBUG=y, which means these things
> BUG() rather than just print a warning, and that tends to be fatal
> in more ways than one...
> 
> > I'm attaching the config I'm using. With this config I can repdroduce the
> > issue with both 4.8-rc3 and next-20160825.
> 
> What I suggest you do is try to recreate the problem manually. I'm
> pretty certain I know what is different between our test
> environments that is resulting in me not seeing the problem
> but, really, we need more people with root cause analysis skills
> around here so I don't have to spend time solving every problem that
> is reported.
> 
> That is, it's all well and good to send a patch to fix a problem,
> but if we don't understand the root cause of the problem being
> fixed, then we have no idea whether we've fixed the problem fully or
> not.  Part of understanding the root cause is finding a reliable
> reproducer of the problem, part of it is reading and understanding
> the code being fixed, and part of it is understanding the
> environment and behaviour that demonstrates the problem.
> 
> So when I look at the fix, and see that it doesn't reproduce on my
> systems, it's clear that it's either not yet fully understood or
> hasn't been fully explained by the person who understands the issue.
> These are some of the questions I've asked myself to understand why
> we are seeing what we've been seeing:
> 
> 	- what condition in the unfixed code leads to the ASSERT
> 	  being tripped?
> 	- how does the patch prevent that from occurring?
> 	- at what threshold does the problem trigger (i.e. n=0, n=1,
> 	  n=2 .... ?)
> 	- how do the environmental initial conditions affect the
> 	  test being run?
> 	- what do security layers automatically store in the inode
> 	  at creation time?
> 	- how can we modify the test to always trigger the assert?
> 
> I know the answer, and it would take much less time to tell everyone
> that it does to write an email like this.  But that means I'll just
> have to do the same thing next time, and the next time, and so on.
> The more people we have that can think through issues like this and
> come to the right conclusion without needing my help, the better off
> we'll all be...

Fair enough.

The problem only shows itself with a minimum of 2 xattrs and only when
the buffer gets depleted before the last one. LTP's llistxattr02 test
only sets one xattr, but on my testsystem "security.selinux" attribute
is automatically added on file creation which allows this bug to be
reproduced. So I would assume that on your systems there are no
automatically created xattrs and thats why you can't reproduce this.

Furthermore if buffersize is such that it is enough to hold the last
xattr's name, but not enough to hold the sum of preceeding xattrs
listxattr won't fail with ERANGE, but will suceed returning that xattr's
name without the first character. The first character end's up
overwriting whatever is stored at (context->alist - 1).

This should also answer Eric's question on whether it only affects the
ASSERT.
Dave Chinner Aug. 28, 2016, 10:55 p.m. UTC | #8
On Fri, Aug 26, 2016 at 10:59:28AM +0200, Artem Savkov wrote:
> On Fri, Aug 26, 2016 at 08:42:15AM +1000, Dave Chinner wrote:
> > So when I look at the fix, and see that it doesn't reproduce on my
> > systems, it's clear that it's either not yet fully understood or
> > hasn't been fully explained by the person who understands the issue.
> > These are some of the questions I've asked myself to understand why
> > we are seeing what we've been seeing:
> > 
> > 	- what condition in the unfixed code leads to the ASSERT
> > 	  being tripped?
> > 	- how does the patch prevent that from occurring?
> > 	- at what threshold does the problem trigger (i.e. n=0, n=1,
> > 	  n=2 .... ?)
> > 	- how do the environmental initial conditions affect the
> > 	  test being run?
> > 	- what do security layers automatically store in the inode
> > 	  at creation time?
> > 	- how can we modify the test to always trigger the assert?
> > 
> > I know the answer, and it would take much less time to tell everyone
> > that it does to write an email like this.  But that means I'll just
> > have to do the same thing next time, and the next time, and so on.
> > The more people we have that can think through issues like this and
> > come to the right conclusion without needing my help, the better off
> > we'll all be...
> 
> Fair enough.
> 
> The problem only shows itself with a minimum of 2 xattrs and only when
> the buffer gets depleted before the last one.

This sentence needs to be in the commit description. :P

> LTP's llistxattr02 test
> only sets one xattr, but on my testsystem "security.selinux" attribute
> is automatically added on file creation which allows this bug to be
> reproduced. So I would assume that on your systems there are no
> automatically created xattrs and thats why you can't reproduce this.

On /some/ of my systems. I have a mix of selinux enabled/disabled
test machines, precisely because of the way always having an
attribute fork in the inode can perturb test results. I happened to
try to reproduce this on a machine that doesn't have selinux
enabled....

> Furthermore if buffersize is such that it is enough to hold the last
> xattr's name, but not enough to hold the sum of preceeding xattrs
> listxattr won't fail with ERANGE, but will suceed returning that xattr's
> name without the first character. The first character end's up
> overwriting whatever is stored at (context->alist - 1).

That should probably also be in the commit description - that
way when we have an idea of what problems it fixes when trying to
match upstream fixes to problems with older kernels (e.g. for distro
kernel backports).

Yes, I know it's a lot to put in a commit message, but in a couple
of years time nobody will remember these details. We regularly have
to work out why something was done 10-15 years ago in the code base,
and having good commit messages makes this a much easier job.
Someone like me will thank you in future for writing a comprehensive
commit message for a relatively simple bug fix....

Cheers,

Dave.
diff mbox

Patch

diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index ea62245..6290093 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -147,6 +147,7 @@  __xfs_xattr_put_listent(
 	arraytop = context->count + prefix_len + namelen + 1;
 	if (arraytop > context->firstu) {
 		context->count = -1;	/* insufficient space */
+		context->seen_enough = 1;
 		return 0;
 	}
 	offset = (char *)context->alist + context->count;