diff mbox series

[RFC] libxfs: use FALLOC_FL_ZERO_RANGE in libxfs_device_zero

Message ID 4bc3be27-b09d-a708-f053-6f7240642667@sandeen.net (mailing list archive)
State Superseded
Headers show
Series [RFC] libxfs: use FALLOC_FL_ZERO_RANGE in libxfs_device_zero | expand

Commit Message

Eric Sandeen Feb. 13, 2020, 9:12 p.m. UTC
I had a request from someone who cared about mkfs speed(!)
over a slower network block device to look into using faster
zeroing methods, particularly for the log, during mkfs.xfs.

e2fsprogs already does this, thanks to some guy named Darrick:

/*
 * If we know about ZERO_RANGE, try that before we try PUNCH_HOLE because
 * ZERO_RANGE doesn't unmap preallocated blocks.  We prefer fallocate because
 * it always invalidates page cache, and libext2fs requires that reads after
 * ZERO_RANGE return zeroes.
 */
static int __unix_zeroout(int fd, off_t offset, off_t len)
{
        int ret = -1;

#if defined(HAVE_FALLOCATE) && defined(FALLOC_FL_ZERO_RANGE)
        ret = fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, len);
        if (ret == 0)
                return 0;
#endif
#if defined(HAVE_FALLOCATE) && defined(FALLOC_FL_PUNCH_HOLE) && defined(FALLOC_FL_KEEP_SIZE)
        ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                        offset,  len);
        if (ret == 0)
                return 0;
#endif
        errno = EOPNOTSUPP;
        return ret;
}

and nobody has exploded so far, AFAIK.  :)  So, floating this idea
for xfsprogs.  I'm a little scared of the second #ifdef block above, but
if that's really ok/consistent/safe we could add it too.

The patch moves some defines around too, I could split that up and resend
if this isn't laughed out of the room.

Thanks,
-Eric

=====

libxfs: use FALLOC_FL_ZERO_RANGE in libxfs_device_zero

I had a request from someone who cared about mkfs speed(!)
over a slower network block device to look into using faster
zeroing methods, particularly for the log, during mkfs.

Using FALLOC_FL_ZERO_RANGE is faster in this case than writing
a bunch of zeros across a wire.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

Comments

Dave Chinner Feb. 13, 2020, 11:48 p.m. UTC | #1
On Thu, Feb 13, 2020 at 03:12:24PM -0600, Eric Sandeen wrote:
> I had a request from someone who cared about mkfs speed(!)
> over a slower network block device to look into using faster
> zeroing methods, particularly for the log, during mkfs.xfs.
> 
> e2fsprogs already does this, thanks to some guy named Darrick:
> 
> /*
>  * If we know about ZERO_RANGE, try that before we try PUNCH_HOLE because
>  * ZERO_RANGE doesn't unmap preallocated blocks.  We prefer fallocate because
>  * it always invalidates page cache, and libext2fs requires that reads after
>  * ZERO_RANGE return zeroes.
>  */
> static int __unix_zeroout(int fd, off_t offset, off_t len)
> {
>         int ret = -1;
> 
> #if defined(HAVE_FALLOCATE) && defined(FALLOC_FL_ZERO_RANGE)
>         ret = fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, len);
>         if (ret == 0)
>                 return 0;
> #endif
> #if defined(HAVE_FALLOCATE) && defined(FALLOC_FL_PUNCH_HOLE) && defined(FALLOC_FL_KEEP_SIZE)
>         ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>                         offset,  len);
>         if (ret == 0)
>                 return 0;
> #endif
>         errno = EOPNOTSUPP;
>         return ret;
> }
> 
> and nobody has exploded so far, AFAIK.  :)  So, floating this idea
> for xfsprogs.  I'm a little scared of the second #ifdef block above, but
> if that's really ok/consistent/safe we could add it too.

If FALLOC_FL_PUNCH_HOLE is defined, then FALLOC_FL_KEEP_SIZE is
guaranteed to be defined, so that condition check is somewhat
redundant. See commit 79124f18b335 ("fs: add hole punching to
fallocate")....

> The patch moves some defines around too, I could split that up and resend
> if this isn't laughed out of the room.
> 
> Thanks,
> -Eric
> 
> =====
> 
> libxfs: use FALLOC_FL_ZERO_RANGE in libxfs_device_zero
> 
> I had a request from someone who cared about mkfs speed(!)
> over a slower network block device to look into using faster
> zeroing methods, particularly for the log, during mkfs.
> 
> Using FALLOC_FL_ZERO_RANGE is faster in this case than writing
> a bunch of zeros across a wire.
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> diff --git a/include/linux.h b/include/linux.h
> index 8f3c32b0..425badb5 100644
> --- a/include/linux.h
> +++ b/include/linux.h
> @@ -113,6 +113,26 @@ static __inline__ void platform_uuid_copy(uuid_t *dst, uuid_t *src)
>  	uuid_copy(*dst, *src);
>  }
>  
> +#ifndef FALLOC_FL_PUNCH_HOLE
> +#define FALLOC_FL_PUNCH_HOLE	0x02
> +#endif
> +
> +#ifndef FALLOC_FL_COLLAPSE_RANGE
> +#define FALLOC_FL_COLLAPSE_RANGE 0x08
> +#endif
> +
> +#ifndef FALLOC_FL_ZERO_RANGE
> +#define FALLOC_FL_ZERO_RANGE 0x10
> +#endif
> +
> +#ifndef FALLOC_FL_INSERT_RANGE
> +#define FALLOC_FL_INSERT_RANGE 0x20
> +#endif
> +
> +#ifndef FALLOC_FL_UNSHARE_RANGE
> +#define FALLOC_FL_UNSHARE_RANGE 0x40
> +#endif

These were added to allow xfs_io to test these operations before
there was userspace support for them. I do not think we should
propagate them outside of xfs_io - if they are not supported by the
distro at build time, we shouldn't attempt to use them in mkfs.

i.e. xfs_io is test-enablement code for developers, mkfs.xfs is
production code for users, so different rules kinda exist for
them...

> diff --git a/libxfs/Makefile b/libxfs/Makefile
> index fbcc963a..b4e8864b 100644
> --- a/libxfs/Makefile
> +++ b/libxfs/Makefile
> @@ -105,6 +105,10 @@ CFILES = cache.c \
>  #
>  #LCFLAGS +=
>  
> +ifeq ($(HAVE_FALLOCATE),yes)
> +LCFLAGS += -DHAVE_FALLOCATE
> +endif

HAVE_FALLOCATE comes from an autoconf test. I suspect that this
needs to be made more finegrained, testing for fallocate() features,
not just just whether the syscall exists. And the above should
probably be in include/builddefs.in so that it's available to all
of xfsprogs, not just libxfs code...

> +
>  FCFLAGS = -I.
>  
>  LTLIBS = $(LIBPTHREAD) $(LIBRT)
> diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
> index 0d9d7202..94f63bbf 100644
> --- a/libxfs/rdwr.c
> +++ b/libxfs/rdwr.c
> @@ -4,6 +4,9 @@
>   * All Rights Reserved.
>   */
>  
> +#if defined(HAVE_FALLOCATE)
> +#include <linux/falloc.h>
> +#endif

Urk. That should be in include/linux.h, right?

>  
>  #include "libxfs_priv.h"
>  #include "init.h"
> @@ -60,9 +63,21 @@ int
>  libxfs_device_zero(struct xfs_buftarg *btp, xfs_daddr_t start, uint len)
>  {
>  	xfs_off_t	start_offset, end_offset, offset;
> -	ssize_t		zsize, bytes;
> +	ssize_t		zsize, bytes, len_bytes;
>  	char		*z;
> -	int		fd;
> +	int		ret, fd;
> +
> +	fd = libxfs_device_to_fd(btp->dev);
> +	start_offset = LIBXFS_BBTOOFF64(start);
> +	end_offset = LIBXFS_BBTOOFF64(start + len) - start_offset;
> +
> +#if defined(HAVE_FALLOCATE)
> +	/* try to use special zeroing methods, fall back to writes if needed */
> +	len_bytes = LIBXFS_BBTOOFF64(len);
> +	ret = fallocate(fd, FALLOC_FL_ZERO_RANGE, start_offset, len_bytes);
> +	if (ret == 0)
> +		return 0;
> +#endif

I kinda dislike the "return if success" hidden inside the ifdef -
it's not a code pattern I'd expect to see. This is what I'd tend
to expect in include/linux.h:

#if defined(HAVE_FALLOCATE_ZERO_RANGE)
static inline int
platform_zero_range(
	int		fd,
	xfs_off_t	start_offset,
	xfs_off_t	end_offset)
{
	int		ret;

	ret = fallocate(fd, FALLOC_FL_ZERO_RANGE, start_offset, len_bytes);
	if (!ret)
		return 0;
	return -errno;
}
#else
#define platform_zero_range(fd, s, o)	(true)
#endif

and then the code in libxfs_device_zero() does:

	error = platform_zero_range(fd, start_offset, len_bytes);
	if (!error)
		return 0;

without adding nasty #defines...

Cheers,

Dave.
Eric Sandeen Feb. 13, 2020, 11:57 p.m. UTC | #2
On 2/13/20 5:48 PM, Dave Chinner wrote:
> On Thu, Feb 13, 2020 at 03:12:24PM -0600, Eric Sandeen wrote:
>> I had a request from someone who cared about mkfs speed(!)
>> over a slower network block device to look into using faster
>> zeroing methods, particularly for the log, during mkfs.xfs.
>>
>> e2fsprogs already does this, thanks to some guy named Darrick:
>>
>> /*
>>  * If we know about ZERO_RANGE, try that before we try PUNCH_HOLE because
>>  * ZERO_RANGE doesn't unmap preallocated blocks.  We prefer fallocate because
>>  * it always invalidates page cache, and libext2fs requires that reads after
>>  * ZERO_RANGE return zeroes.
>>  */
>> static int __unix_zeroout(int fd, off_t offset, off_t len)
>> {
>>         int ret = -1;
>>
>> #if defined(HAVE_FALLOCATE) && defined(FALLOC_FL_ZERO_RANGE)
>>         ret = fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, len);
>>         if (ret == 0)
>>                 return 0;
>> #endif
>> #if defined(HAVE_FALLOCATE) && defined(FALLOC_FL_PUNCH_HOLE) && defined(FALLOC_FL_KEEP_SIZE)
>>         ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>>                         offset,  len);
>>         if (ret == 0)
>>                 return 0;
>> #endif
>>         errno = EOPNOTSUPP;
>>         return ret;
>> }
>>
>> and nobody has exploded so far, AFAIK.  :)  So, floating this idea
>> for xfsprogs.  I'm a little scared of the second #ifdef block above, but
>> if that's really ok/consistent/safe we could add it too.
> 
> If FALLOC_FL_PUNCH_HOLE is defined, then FALLOC_FL_KEEP_SIZE is
> guaranteed to be defined, so that condition check is somewhat
> redundant. See commit 79124f18b335 ("fs: add hole punching to
> fallocate")....

Style aside, is that 2nd zeroing method something we want to try?

>> The patch moves some defines around too, I could split that up and resend
>> if this isn't laughed out of the room.
>>
>> Thanks,
>> -Eric
>>
>> =====
>>
>> libxfs: use FALLOC_FL_ZERO_RANGE in libxfs_device_zero
>>
>> I had a request from someone who cared about mkfs speed(!)
>> over a slower network block device to look into using faster
>> zeroing methods, particularly for the log, during mkfs.
>>
>> Using FALLOC_FL_ZERO_RANGE is faster in this case than writing
>> a bunch of zeros across a wire.
>>
>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>> ---
>>
>> diff --git a/include/linux.h b/include/linux.h
>> index 8f3c32b0..425badb5 100644
>> --- a/include/linux.h
>> +++ b/include/linux.h
>> @@ -113,6 +113,26 @@ static __inline__ void platform_uuid_copy(uuid_t *dst, uuid_t *src)
>>  	uuid_copy(*dst, *src);
>>  }
>>  
>> +#ifndef FALLOC_FL_PUNCH_HOLE
>> +#define FALLOC_FL_PUNCH_HOLE	0x02
>> +#endif
>> +
>> +#ifndef FALLOC_FL_COLLAPSE_RANGE
>> +#define FALLOC_FL_COLLAPSE_RANGE 0x08
>> +#endif
>> +
>> +#ifndef FALLOC_FL_ZERO_RANGE
>> +#define FALLOC_FL_ZERO_RANGE 0x10
>> +#endif
>> +
>> +#ifndef FALLOC_FL_INSERT_RANGE
>> +#define FALLOC_FL_INSERT_RANGE 0x20
>> +#endif
>> +
>> +#ifndef FALLOC_FL_UNSHARE_RANGE
>> +#define FALLOC_FL_UNSHARE_RANGE 0x40
>> +#endif
> 
> These were added to allow xfs_io to test these operations before
> there was userspace support for them. I do not think we should
> propagate them outside of xfs_io - if they are not supported by the
> distro at build time, we shouldn't attempt to use them in mkfs.
> 
> i.e. xfs_io is test-enablement code for developers, mkfs.xfs is
> production code for users, so different rules kinda exist for
> them...

Fair enough.  people /could/ use newer xfsprogs on older kernels, but...
those defines are getting pretty old by now in any case.

It's probably not terrible to just fail the build on a system that doesn't
have falloc.h, by now?

>> diff --git a/libxfs/Makefile b/libxfs/Makefile
>> index fbcc963a..b4e8864b 100644
>> --- a/libxfs/Makefile
>> +++ b/libxfs/Makefile
>> @@ -105,6 +105,10 @@ CFILES = cache.c \
>>  #
>>  #LCFLAGS +=
>>  
>> +ifeq ($(HAVE_FALLOCATE),yes)
>> +LCFLAGS += -DHAVE_FALLOCATE
>> +endif
> 
> HAVE_FALLOCATE comes from an autoconf test. I suspect that this
> needs to be made more finegrained, testing for fallocate() features,
> not just just whether the syscall exists. And the above should
> probably be in include/builddefs.in so that it's available to all
> of xfsprogs, not just libxfs code...

But that's testing the kernel on the build host, right?  What's the
point of that?  Or am I misunderstanding?

>> +
>>  FCFLAGS = -I.
>>  
>>  LTLIBS = $(LIBPTHREAD) $(LIBRT)
>> diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
>> index 0d9d7202..94f63bbf 100644
>> --- a/libxfs/rdwr.c
>> +++ b/libxfs/rdwr.c
>> @@ -4,6 +4,9 @@
>>   * All Rights Reserved.
>>   */
>>  
>> +#if defined(HAVE_FALLOCATE)
>> +#include <linux/falloc.h>
>> +#endif
> 
> Urk. That should be in include/linux.h, right?
> 
>>  
>>  #include "libxfs_priv.h"
>>  #include "init.h"
>> @@ -60,9 +63,21 @@ int
>>  libxfs_device_zero(struct xfs_buftarg *btp, xfs_daddr_t start, uint len)
>>  {
>>  	xfs_off_t	start_offset, end_offset, offset;
>> -	ssize_t		zsize, bytes;
>> +	ssize_t		zsize, bytes, len_bytes;
>>  	char		*z;
>> -	int		fd;
>> +	int		ret, fd;
>> +
>> +	fd = libxfs_device_to_fd(btp->dev);
>> +	start_offset = LIBXFS_BBTOOFF64(start);
>> +	end_offset = LIBXFS_BBTOOFF64(start + len) - start_offset;
>> +
>> +#if defined(HAVE_FALLOCATE)
>> +	/* try to use special zeroing methods, fall back to writes if needed */
>> +	len_bytes = LIBXFS_BBTOOFF64(len);
>> +	ret = fallocate(fd, FALLOC_FL_ZERO_RANGE, start_offset, len_bytes);
>> +	if (ret == 0)
>> +		return 0;
>> +#endif
> 
> I kinda dislike the "return if success" hidden inside the ifdef -
> it's not a code pattern I'd expect to see. This is what I'd tend
> to expect in include/linux.h:
> 
> #if defined(HAVE_FALLOCATE_ZERO_RANGE)
> static inline int
> platform_zero_range(
> 	int		fd,
> 	xfs_off_t	start_offset,
> 	xfs_off_t	end_offset)
> {
> 	int		ret;
> 
> 	ret = fallocate(fd, FALLOC_FL_ZERO_RANGE, start_offset, len_bytes);
> 	if (!ret)
> 		return 0;
> 	return -errno;
> }
> #else
> #define platform_zero_range(fd, s, o)	(true)
> #endif
> 
> and then the code in libxfs_device_zero() does:
> 
> 	error = platform_zero_range(fd, start_offset, len_bytes);
> 	if (!error)
> 		return 0;
> 
> without adding nasty #defines...

Yeah that's much better.  Tho I hate adding more platform_* stuff when really
we're down to only one platform but maybe that's a project for another day.

Thanks for the review,
-Eric
Dave Chinner Feb. 14, 2020, 12:25 a.m. UTC | #3
On Thu, Feb 13, 2020 at 05:57:17PM -0600, Eric Sandeen wrote:
> On 2/13/20 5:48 PM, Dave Chinner wrote:
> > On Thu, Feb 13, 2020 at 03:12:24PM -0600, Eric Sandeen wrote:
> >> I had a request from someone who cared about mkfs speed(!)
> >> over a slower network block device to look into using faster
> >> zeroing methods, particularly for the log, during mkfs.xfs.
> >>
> >> e2fsprogs already does this, thanks to some guy named Darrick:
> >>
> >> /*
> >>  * If we know about ZERO_RANGE, try that before we try PUNCH_HOLE because
> >>  * ZERO_RANGE doesn't unmap preallocated blocks.  We prefer fallocate because
> >>  * it always invalidates page cache, and libext2fs requires that reads after
> >>  * ZERO_RANGE return zeroes.
> >>  */
> >> static int __unix_zeroout(int fd, off_t offset, off_t len)
> >> {
> >>         int ret = -1;
> >>
> >> #if defined(HAVE_FALLOCATE) && defined(FALLOC_FL_ZERO_RANGE)
> >>         ret = fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, len);
> >>         if (ret == 0)
> >>                 return 0;
> >> #endif
> >> #if defined(HAVE_FALLOCATE) && defined(FALLOC_FL_PUNCH_HOLE) && defined(FALLOC_FL_KEEP_SIZE)
> >>         ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> >>                         offset,  len);
> >>         if (ret == 0)
> >>                 return 0;
> >> #endif
> >>         errno = EOPNOTSUPP;
> >>         return ret;
> >> }
> >>
> >> and nobody has exploded so far, AFAIK.  :)  So, floating this idea
> >> for xfsprogs.  I'm a little scared of the second #ifdef block above, but
> >> if that's really ok/consistent/safe we could add it too.
> > 
> > If FALLOC_FL_PUNCH_HOLE is defined, then FALLOC_FL_KEEP_SIZE is
> > guaranteed to be defined, so that condition check is somewhat
> > redundant. See commit 79124f18b335 ("fs: add hole punching to
> > fallocate")....
> 
> Style aside, is that 2nd zeroing method something we want to try?

Oh, I misunderstood what you were asking.  I don't think we want to
implement that.

e.g. If it's a pre-allocated image file or block device on
sparse-capable storage being mkfs'd, then the user really doesn't
want it punched out, do they?

And it's for the journal: any ENOSPC error from the block device in
the journal is immediately fatal to the filesystem. Hence I think
we should not punch out the journal blocks that already are allocated
in the storage.

And, really, the space may have already been punched out via
attempting to trim the device (i.e. "-K" mkfs option wasn't set), in
which case punching here is redundant.

Also, libxfs_device_zero() is called by xfs_repair: do we want
xfs_repair to be punching out blocks for the realtime bitmap and
summary storage? I don't think we want that to happen, either.

Fast zeroing, yes. Punching out allocated storage, no.

> >> +#define FALLOC_FL_COLLAPSE_RANGE 0x08
> >> +#endif
> >> +
> >> +#ifndef FALLOC_FL_ZERO_RANGE
> >> +#define FALLOC_FL_ZERO_RANGE 0x10
> >> +#endif
> >> +
> >> +#ifndef FALLOC_FL_INSERT_RANGE
> >> +#define FALLOC_FL_INSERT_RANGE 0x20
> >> +#endif
> >> +
> >> +#ifndef FALLOC_FL_UNSHARE_RANGE
> >> +#define FALLOC_FL_UNSHARE_RANGE 0x40
> >> +#endif
> > 
> > These were added to allow xfs_io to test these operations before
> > there was userspace support for them. I do not think we should
> > propagate them outside of xfs_io - if they are not supported by the
> > distro at build time, we shouldn't attempt to use them in mkfs.
> > 
> > i.e. xfs_io is test-enablement code for developers, mkfs.xfs is
> > production code for users, so different rules kinda exist for
> > them...
> 
> Fair enough.  people /could/ use newer xfsprogs on older kernels, but...
> those defines are getting pretty old by now in any case.

*nod*

> It's probably not terrible to just fail the build on a system that doesn't
> have falloc.h, by now?

That might be going a bit far. fallocate() in most cases is not
critical to the correct functioning of the production tools, so if
it's not present it shouldn't break anything.

> >> diff --git a/libxfs/Makefile b/libxfs/Makefile
> >> index fbcc963a..b4e8864b 100644
> >> --- a/libxfs/Makefile
> >> +++ b/libxfs/Makefile
> >> @@ -105,6 +105,10 @@ CFILES = cache.c \
> >>  #
> >>  #LCFLAGS +=
> >>  
> >> +ifeq ($(HAVE_FALLOCATE),yes)
> >> +LCFLAGS += -DHAVE_FALLOCATE
> >> +endif
> > 
> > HAVE_FALLOCATE comes from an autoconf test. I suspect that this
> > needs to be made more finegrained, testing for fallocate() features,
> > not just just whether the syscall exists. And the above should
> > probably be in include/builddefs.in so that it's available to all
> > of xfsprogs, not just libxfs code...
> 
> But that's testing the kernel on the build host, right?  What's the
> point of that?  Or am I misunderstanding?

Who builds their distro binaries on a userspace that doesn't contain
the headers and libaries the distro is going to ship with?

That's the whole point of "build-root" infrastructure, right? i.e.
build against what you are going to ship, not what is installed on
the build machine....

> >> +	fd = libxfs_device_to_fd(btp->dev);
> >> +	start_offset = LIBXFS_BBTOOFF64(start);
> >> +	end_offset = LIBXFS_BBTOOFF64(start + len) - start_offset;
> >> +
> >> +#if defined(HAVE_FALLOCATE)
> >> +	/* try to use special zeroing methods, fall back to writes if needed */
> >> +	len_bytes = LIBXFS_BBTOOFF64(len);
> >> +	ret = fallocate(fd, FALLOC_FL_ZERO_RANGE, start_offset, len_bytes);
> >> +	if (ret == 0)
> >> +		return 0;
> >> +#endif
> > 
> > I kinda dislike the "return if success" hidden inside the ifdef -
> > it's not a code pattern I'd expect to see. This is what I'd tend
> > to expect in include/linux.h:
> > 
> > #if defined(HAVE_FALLOCATE_ZERO_RANGE)
> > static inline int
> > platform_zero_range(
> > 	int		fd,
> > 	xfs_off_t	start_offset,
> > 	xfs_off_t	end_offset)
> > {
> > 	int		ret;
> > 
> > 	ret = fallocate(fd, FALLOC_FL_ZERO_RANGE, start_offset, len_bytes);
> > 	if (!ret)
> > 		return 0;
> > 	return -errno;
> > }
> > #else
> > #define platform_zero_range(fd, s, o)	(true)
> > #endif
> > 
> > and then the code in libxfs_device_zero() does:
> > 
> > 	error = platform_zero_range(fd, start_offset, len_bytes);
> > 	if (!error)
> > 		return 0;
> > 
> > without adding nasty #defines...
> 
> Yeah that's much better.  Tho I hate adding more platform_* stuff when really
> we're down to only one platform but maybe that's a project for another day.

There's more than one linux "platform" we have to support, and this
abstraction makes it easy to include new functionality in a clean
and autoconf detection friendly way... :)

Cheers,

Dave.
diff mbox series

Patch

diff --git a/include/linux.h b/include/linux.h
index 8f3c32b0..425badb5 100644
--- a/include/linux.h
+++ b/include/linux.h
@@ -113,6 +113,26 @@  static __inline__ void platform_uuid_copy(uuid_t *dst, uuid_t *src)
 	uuid_copy(*dst, *src);
 }
 
+#ifndef FALLOC_FL_PUNCH_HOLE
+#define FALLOC_FL_PUNCH_HOLE	0x02
+#endif
+
+#ifndef FALLOC_FL_COLLAPSE_RANGE
+#define FALLOC_FL_COLLAPSE_RANGE 0x08
+#endif
+
+#ifndef FALLOC_FL_ZERO_RANGE
+#define FALLOC_FL_ZERO_RANGE 0x10
+#endif
+
+#ifndef FALLOC_FL_INSERT_RANGE
+#define FALLOC_FL_INSERT_RANGE 0x20
+#endif
+
+#ifndef FALLOC_FL_UNSHARE_RANGE
+#define FALLOC_FL_UNSHARE_RANGE 0x40
+#endif
+
 #ifndef BLKDISCARD
 #define BLKDISCARD	_IO(0x12,119)
 #endif
diff --git a/io/prealloc.c b/io/prealloc.c
index 6d452354..0b4efc45 100644
--- a/io/prealloc.c
+++ b/io/prealloc.c
@@ -12,26 +12,6 @@ 
 #include "init.h"
 #include "io.h"
 
-#ifndef FALLOC_FL_PUNCH_HOLE
-#define FALLOC_FL_PUNCH_HOLE	0x02
-#endif
-
-#ifndef FALLOC_FL_COLLAPSE_RANGE
-#define FALLOC_FL_COLLAPSE_RANGE 0x08
-#endif
-
-#ifndef FALLOC_FL_ZERO_RANGE
-#define FALLOC_FL_ZERO_RANGE 0x10
-#endif
-
-#ifndef FALLOC_FL_INSERT_RANGE
-#define FALLOC_FL_INSERT_RANGE 0x20
-#endif
-
-#ifndef FALLOC_FL_UNSHARE_RANGE
-#define FALLOC_FL_UNSHARE_RANGE 0x40
-#endif
-
 static cmdinfo_t allocsp_cmd;
 static cmdinfo_t freesp_cmd;
 static cmdinfo_t resvsp_cmd;
diff --git a/libxfs/Makefile b/libxfs/Makefile
index fbcc963a..b4e8864b 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -105,6 +105,10 @@  CFILES = cache.c \
 #
 #LCFLAGS +=
 
+ifeq ($(HAVE_FALLOCATE),yes)
+LCFLAGS += -DHAVE_FALLOCATE
+endif
+
 FCFLAGS = -I.
 
 LTLIBS = $(LIBPTHREAD) $(LIBRT)
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 0d9d7202..94f63bbf 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -4,6 +4,9 @@ 
  * All Rights Reserved.
  */
 
+#if defined(HAVE_FALLOCATE)
+#include <linux/falloc.h>
+#endif
 
 #include "libxfs_priv.h"
 #include "init.h"
@@ -60,9 +63,21 @@  int
 libxfs_device_zero(struct xfs_buftarg *btp, xfs_daddr_t start, uint len)
 {
 	xfs_off_t	start_offset, end_offset, offset;
-	ssize_t		zsize, bytes;
+	ssize_t		zsize, bytes, len_bytes;
 	char		*z;
-	int		fd;
+	int		ret, fd;
+
+	fd = libxfs_device_to_fd(btp->dev);
+	start_offset = LIBXFS_BBTOOFF64(start);
+	end_offset = LIBXFS_BBTOOFF64(start + len) - start_offset;
+
+#if defined(HAVE_FALLOCATE)
+	/* try to use special zeroing methods, fall back to writes if needed */
+	len_bytes = LIBXFS_BBTOOFF64(len);
+	ret = fallocate(fd, FALLOC_FL_ZERO_RANGE, start_offset, len_bytes);
+	if (ret == 0)
+		return 0;
+#endif
 
 	zsize = min(BDSTRAT_SIZE, BBTOB(len));
 	if ((z = memalign(libxfs_device_alignment(), zsize)) == NULL) {
@@ -73,9 +88,6 @@  libxfs_device_zero(struct xfs_buftarg *btp, xfs_daddr_t start, uint len)
 	}
 	memset(z, 0, zsize);
 
-	fd = libxfs_device_to_fd(btp->dev);
-	start_offset = LIBXFS_BBTOOFF64(start);
-
 	if ((lseek(fd, start_offset, SEEK_SET)) < 0) {
 		fprintf(stderr, _("%s: %s seek to offset %llu failed: %s\n"),
 			progname, __FUNCTION__,
@@ -83,7 +95,6 @@  libxfs_device_zero(struct xfs_buftarg *btp, xfs_daddr_t start, uint len)
 		exit(1);
 	}
 
-	end_offset = LIBXFS_BBTOOFF64(start + len) - start_offset;
 	for (offset = 0; offset < end_offset; ) {
 		bytes = min((ssize_t)(end_offset - offset), zsize);
 		if ((bytes = write(fd, z, bytes)) < 0) {