
[RFC] coredump: avoid ext4 auto_da_alloc for core file

Message ID 5cdda475417b2719dced162cce89a283153cb818.1466012020.git.osandov@fb.com (mailing list archive)
State New, archived

Commit Message

Omar Sandoval June 15, 2016, 5:42 p.m. UTC
From: Omar Sandoval <osandov@fb.com>

Someone at Facebook reported that their coredumps were much faster when
using a pipe helper than when dumping directly to a file, which doesn't
make much sense. It turns out that this difference is because in
do_coredump(), we truncate the core file and thus trigger the ext4
auto_da_alloc heuristic. We can't use O_TRUNC because we might bail out
of do_coredump() in certain conditions, so instead, avoid truncating
when the file is already empty. In cases where we're actually
overwriting a core file, this won't help, but the common case will be
much better.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
Hi, Al and Ted,

This is probably the wrong solution to the problem I described in the
commit message. Do you guys have any better ideas? Something like
0eab928221ba ("ext4: Don't treat a truncation of a zero-length file as
replace-via-truncate") would also work, but that apparently wasn't
right, as it was reverted in 5534fb5bb35a ("ext4: Fix the alloc on close
after a truncate hueristic").

Thanks.

 fs/coredump.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Omar Sandoval June 29, 2016, 6:34 p.m. UTC | #1
On Wed, Jun 15, 2016 at 10:42:05AM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Someone at Facebook reported that their coredumps were much faster when
> using a pipe helper than when dumping directly to a file, which doesn't
> make much sense. It turns out that this difference is because in
> do_coredump(), we truncate the core file and thus trigger the ext4
> auto_da_alloc heuristic. We can't use O_TRUNC because we might bail out
> of do_coredump() in certain conditions, so instead, avoid truncating
> when the file is already empty. In cases where we're actually
> overwriting a core file, this won't help, but the common case will be
> much better.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
> Hi, Al and Ted,
> 
> This is probably the wrong solution to the problem I described in the
> commit message. Do you guys have any better ideas? Something like
> 0eab928221ba ("ext4: Don't treat a truncation of a zero-length file as
> replace-via-truncate") would also work, but that apparently wasn't
> right, as it was reverted in 5534fb5bb35a ("ext4: Fix the alloc on close
> after a truncate hueristic").
> 
> Thanks.

Ping, any thoughts on this?

>  fs/coredump.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 281b768000e6..9da7357773f0 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -741,8 +741,10 @@ void do_coredump(const siginfo_t *siginfo)
>  			goto close_fail;
>  		if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
>  			goto close_fail;
> -		if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
> -			goto close_fail;
> +		if (i_size_read(file_inode(cprm.file)) != 0) {
> +			if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
> +				goto close_fail;
> +		}
>  	}
>  
>  	/* get us an unshared descriptor table; almost always a no-op */
> -- 
> 2.8.3
>
Josef Bacik July 5, 2016, 1:42 p.m. UTC | #2
On 06/15/2016 01:42 PM, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
>
> Someone at Facebook reported that their coredumps were much faster when
> using a pipe helper than when dumping directly to a file, which doesn't
> make much sense. It turns out that this difference is because in
> do_coredump(), we truncate the core file and thus trigger the ext4
> auto_da_alloc heuristic. We can't use O_TRUNC because we might bail out
> of do_coredump() in certain conditions, so instead, avoid truncating
> when the file is already empty. In cases where we're actually
> overwriting a core file, this won't help, but the common case will be
> much better.
>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
> Hi, Al and Ted,
>
> This is probably the wrong solution to the problem I described in the
> commit message. Do you guys have any better ideas? Something like
> 0eab928221ba ("ext4: Don't treat a truncation of a zero-length file as
> replace-via-truncate") would also work, but that apparently wasn't
> right, as it was reverted in 5534fb5bb35a ("ext4: Fix the alloc on close
> after a truncate hueristic").
>
> Thanks.
>
>  fs/coredump.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 281b768000e6..9da7357773f0 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -741,8 +741,10 @@ void do_coredump(const siginfo_t *siginfo)
>  			goto close_fail;
>  		if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
>  			goto close_fail;
> -		if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
> -			goto close_fail;
> +		if (i_size_read(file_inode(cprm.file)) != 0) {
> +			if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
> +				goto close_fail;
> +		}
>  	}
>
>  	/* get us an unshared descriptor table; almost always a no-op */
>

Omar, this probably breaks the case where we do fallocate(FALLOC_FL_KEEP_SIZE), 
the i_size will be 0 but there will be blocks to truncate.  Probably want to 
check i_blocks or something.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
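
The case Josef describes can be reproduced from userspace. The following is a minimal sketch, assuming Linux with glibc and a filesystem that supports fallocate(); the helper names prealloc_keep_size(), size_is_zero(), and needs_truncate() are hypothetical and are not part of the patch. It preallocates blocks with FALLOC_FL_KEEP_SIZE so that the reported size stays 0, and shows a stricter "is there anything to truncate" test that also looks at st_blocks, the userspace view of i_blocks.

```c
#define _GNU_SOURCE	/* for fallocate() and FALLOC_FL_KEEP_SIZE */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Preallocate len bytes starting at offset 0 without changing the
 * file size.  Returns 0 on success, -1 if fallocate() fails (for
 * example because the filesystem does not support it).
 */
int prealloc_keep_size(int fd, off_t len)
{
	assert(fd >= 0 && len > 0);
	return fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, len);
}

/* Returns 1 if the file size is 0, 0 if it is not, -1 on error. */
int size_is_zero(int fd)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return -1;
	return st.st_size == 0;
}

/*
 * Hypothetical stricter check along the lines Josef suggests: treat
 * the file as needing a truncate if it has a nonzero size OR any
 * allocated blocks.  Returns 1 or 0, or -1 on error.
 */
int needs_truncate(int fd)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return -1;
	return st.st_size != 0 || st.st_blocks != 0;
}
```

After a successful prealloc_keep_size(fd, 1 << 20) on a freshly created file, size_is_zero(fd) still reports an empty file, which is exactly the situation where a size-only check skips the truncate. Note that st_blocks may also count filesystem metadata on some filesystems, so the stricter check is conservative rather than exact.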
Theodore Ts'o July 5, 2016, 2:37 p.m. UTC | #3
On Tue, Jul 05, 2016 at 09:42:13AM -0400, Josef Bacik wrote:
> > diff --git a/fs/coredump.c b/fs/coredump.c
> > index 281b768000e6..9da7357773f0 100644
> > --- a/fs/coredump.c
> > +++ b/fs/coredump.c
> > @@ -741,8 +741,10 @@ void do_coredump(const siginfo_t *siginfo)
> >  			goto close_fail;
> >  		if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
> >  			goto close_fail;
> > -		if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
> > -			goto close_fail;
> > +		if (i_size_read(file_inode(cprm.file)) != 0) {
> > +			if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
> > +				goto close_fail;
> > +		}
> >  	}
> > 
> >  	/* get us an unshared descriptor table; almost always a no-op */
> > 
> 
> Omar, this probably breaks the case where we do
> fallocate(FALLOC_FL_KEEP_SIZE), the i_size will be 0 but there will be
> blocks to truncate.  Probably want to check i_blocks or something.  Thanks,

Sure, but this is in the coredump code; do we care there?  What are
the odds that someone will have fallocated blocks beyond i_size in a
file named "core"?  And if so, it's not like it's going to make the
coredump invalid or non-useful in any way.

						- Ted
Josef Bacik July 5, 2016, 3:01 p.m. UTC | #4
On 07/05/2016 10:37 AM, Theodore Ts'o wrote:
> On Tue, Jul 05, 2016 at 09:42:13AM -0400, Josef Bacik wrote:
>>> diff --git a/fs/coredump.c b/fs/coredump.c
>>> index 281b768000e6..9da7357773f0 100644
>>> --- a/fs/coredump.c
>>> +++ b/fs/coredump.c
>>> @@ -741,8 +741,10 @@ void do_coredump(const siginfo_t *siginfo)
>>>  			goto close_fail;
>>>  		if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
>>>  			goto close_fail;
>>> -		if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
>>> -			goto close_fail;
>>> +		if (i_size_read(file_inode(cprm.file)) != 0) {
>>> +			if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
>>> +				goto close_fail;
>>> +		}
>>>  	}
>>>
>>>  	/* get us an unshared descriptor table; almost always a no-op */
>>>
>>
>> Omar, this probably breaks the case where we do
>> fallocate(FALLOC_FL_KEEP_SIZE), the i_size will be 0 but there will be
>> blocks to truncate.  Probably want to check i_blocks or something.  Thanks,
>
> Sure, but this is in the coredump code; do we care there?  What are
> the odds that someone will have fallocated blocks beyond i_size in a
> file named "core"?  And if so, it's not like it's going to make the
> coredump invalid or non-useful in any way.

Wow I totally didn't notice this was in coredump.c, I thought it was in ext4 
code because you said it failed regression tests, which I assumed were your ext4 
tests.  Ignore me.  Thanks,

Josef

Theodore Ts'o July 5, 2016, 4:57 p.m. UTC | #5
On Tue, Jul 05, 2016 at 11:01:40AM -0400, Josef Bacik wrote:
> > > Omar, this probably breaks the case where we do
> > > fallocate(FALLOC_FL_KEEP_SIZE), the i_size will be 0 but there will be
> > > blocks to truncate.  Probably want to check i_blocks or something.  Thanks,
> > 
> > Sure, but this is in the coredump code; do we care there?  What are
> > the odds that someone will have fallocated blocks beyond i_size in a
> > file named "core"?  And if so, it's not like it's going to make the
> > coredump invalid or non-useful in any way.
> 
> Wow I totally didn't notice this was in coredump.c, I thought it was in ext4
> code because you said it failed regression tests, which I assumed were your
> ext4 tests.  Ignore me.  Thanks,

Yeah, Omar's original patch was something he described as a "hack" to
the coredump code.  I actually don't think it's that bad, but it does
make sense to have ext4 not enable the "replace-via-truncate" code
when the truncate is a no-op.  It turns out this is a bit tricky,
though, because the places where we set i_size and where we decide to
truncate beyond i_size are separated.  I tried to do something simple
but it didn't quite work right; I'll look into why it didn't work,
hopefully later today.

						- Ted

Patch

diff --git a/fs/coredump.c b/fs/coredump.c
index 281b768000e6..9da7357773f0 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -741,8 +741,10 @@  void do_coredump(const siginfo_t *siginfo)
 			goto close_fail;
 		if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
 			goto close_fail;
-		if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
-			goto close_fail;
+		if (i_size_read(file_inode(cprm.file)) != 0) {
+			if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
+				goto close_fail;
+		}
 	}
 
 	/* get us an unshared descriptor table; almost always a no-op */
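
For readers who want to experiment outside the kernel, the check this patch adds can be approximated in userspace. This is a minimal sketch, not the kernel code: truncate_if_nonempty() and file_size() are hypothetical helpers, fstat() stands in for i_size_read(), and ftruncate() stands in for do_truncate().

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Userspace analogue of the check added to do_coredump(): truncate
 * fd to zero length only if it currently has a nonzero size.
 * Returns 1 if a truncate was issued, 0 if it was skipped because
 * the file was already empty, -1 on error.
 */
int truncate_if_nonempty(int fd)
{
	struct stat st;

	assert(fd >= 0);
	if (fstat(fd, &st) != 0)
		return -1;
	if (st.st_size == 0)
		return 0;	/* already empty: skip the no-op truncate */
	if (ftruncate(fd, 0) != 0)
		return -1;
	return 1;
}

/* Size of the file behind fd, or -1 on error. */
off_t file_size(int fd)
{
	struct stat st;

	return fstat(fd, &st) == 0 ? st.st_size : (off_t)-1;
}
```

A caller would open the target without O_TRUNC and then call truncate_if_nonempty(fd). As the commit message notes, this only helps when the file starts out empty; overwriting an existing non-empty file still issues the truncate.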