diff mbox series

[3/6] xfs: xfs_itruncate_extents has no extent count limitation

Message ID 20210527045202.1155628-4-david@fromorbit.com (mailing list archive)
State New
Headers show
Series xfs: bunmapi needs updating for deferred freeing | expand

Commit Message

Dave Chinner May 27, 2021, 4:51 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

Ever since we moved to freeing of extents by deferred operations,
we've already freed extents via individual transactions. Hence the
only limitation of how many extents we can mark for freeing in a
single xfs_bunmapi() call bound only by how many deferrals we want
to queue.

That is xfs_bunmapi() doesn't actually do any AG based extent
freeing, so there's no actually transaction reservation used up by
calling bunmapi with a large count of extents to be freed. RT
extents have always been freed directly by bunmapi, but that doesn't
require modification of large number of blocks as there are no
btrees to split.

Some callers of xfs_bunmapi assume that the extent count being freed
is bound by geometry (e.g. directories) and these can ask bunmapi to
free up to 64 extents in a single call. These functions just work as
tehy stand, so there's no reason for truncate to have a limit of
just two extents per bunmapi call anymore.

Increase XFS_ITRUNC_MAX_EXTENTS to 64 to match the number of extents
that can be deferred in a single loop to match what the directory
code already uses.

For realtime inodes, where xfs_bunmapi() directly frees extents,
leave the limit at 2 extents per loop as this is all the space that
the transaction reservation will cover.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_inode.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Chandan Babu R May 31, 2021, 12:55 p.m. UTC | #1
On 27 May 2021 at 10:21, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Ever since we moved to freeing of extents by deferred operations,
> we've already freed extents via individual transactions. Hence the
> only limitation of how many extents we can mark for freeing in a
> single xfs_bunmapi() call bound only by how many deferrals we want
> to queue.
>
> That is xfs_bunmapi() doesn't actually do any AG based extent
> freeing, so there's no actually transaction reservation used up by
> calling bunmapi with a large count of extents to be freed. RT
> extents have always been freed directly by bunmapi, but that doesn't
> require modification of large number of blocks as there are no
> btrees to split.
>
> Some callers of xfs_bunmapi assume that the extent count being freed
> is bound by geometry (e.g. directories) and these can ask bunmapi to
> free up to 64 extents in a single call. These functions just work as
> tehy stand, so there's no reason for truncate to have a limit of
> just two extents per bunmapi call anymore.
>
> Increase XFS_ITRUNC_MAX_EXTENTS to 64 to match the number of extents
> that can be deferred in a single loop to match what the directory
> code already uses.
>
> For realtime inodes, where xfs_bunmapi() directly frees extents,
> leave the limit at 2 extents per loop as this is all the space that
> the transaction reservation will cover.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_inode.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 0369eb22c1bb..db220eaa34b8 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -40,9 +40,18 @@ kmem_zone_t *xfs_inode_zone;
>
>  /*
>   * Used in xfs_itruncate_extents().  This is the maximum number of extents
> - * freed from a file in a single transaction.
> + * we will unmap and defer for freeing in a single call to xfs_bunmapi().
> + * Realtime inodes directly free extents in xfs_bunmapi(), so are bound
> + * by transaction reservation size to 2 extents.
>   */
> -#define	XFS_ITRUNC_MAX_EXTENTS	2
> +static inline int
> +xfs_itrunc_max_extents(
> +	struct xfs_inode	*ip)
> +{
> +	if (XFS_IS_REALTIME_INODE(ip))
> +		return 2;
> +	return 64;
> +}
>
>  STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *);
>  STATIC int xfs_iunlink_remove(struct xfs_trans *, struct xfs_inode *);
> @@ -1402,7 +1411,7 @@ xfs_itruncate_extents_flags(
>  	while (unmap_len > 0) {
>  		ASSERT(tp->t_firstblock == NULLFSBLOCK);
>  		error = __xfs_bunmapi(tp, ip, first_unmap_block, &unmap_len,
> -				flags, XFS_ITRUNC_MAX_EXTENTS);
> +				flags, xfs_itrunc_max_extents(ip));
>  		if (error)
>  			goto out;

The list of free extent items at xfs_defer_pending->dfp_work could
now contain XFS_EFI_MAX_FAST_EXTENTS (i.e. 16) entries in the worst case.

For a single transaction, xfs_calc_itruncate_reservation() reserves space for
logging only 4 extents (i.e. 4 exts * 2 trees * (2 * max depth - 1) * block
size). But with the above change, a single transaction can now free upto 16
extents. Wouldn't this overflow the reserved log space?

--
chandan
Chandan Babu R May 31, 2021, 1:05 p.m. UTC | #2
On 31 May 2021 at 18:25, Chandan Babu R wrote:
> On 27 May 2021 at 10:21, Dave Chinner wrote:
>> From: Dave Chinner <dchinner@redhat.com>
>>
>> Ever since we moved to freeing of extents by deferred operations,
>> we've already freed extents via individual transactions. Hence the
>> only limitation of how many extents we can mark for freeing in a
>> single xfs_bunmapi() call bound only by how many deferrals we want
>> to queue.
>>
>> That is xfs_bunmapi() doesn't actually do any AG based extent
>> freeing, so there's no actually transaction reservation used up by
>> calling bunmapi with a large count of extents to be freed. RT
>> extents have always been freed directly by bunmapi, but that doesn't
>> require modification of large number of blocks as there are no
>> btrees to split.
>>
>> Some callers of xfs_bunmapi assume that the extent count being freed
>> is bound by geometry (e.g. directories) and these can ask bunmapi to
>> free up to 64 extents in a single call. These functions just work as
>> tehy stand, so there's no reason for truncate to have a limit of
>> just two extents per bunmapi call anymore.
>>
>> Increase XFS_ITRUNC_MAX_EXTENTS to 64 to match the number of extents
>> that can be deferred in a single loop to match what the directory
>> code already uses.
>>
>> For realtime inodes, where xfs_bunmapi() directly frees extents,
>> leave the limit at 2 extents per loop as this is all the space that
>> the transaction reservation will cover.
>>
>> Signed-off-by: Dave Chinner <dchinner@redhat.com>
>> ---
>>  fs/xfs/xfs_inode.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> index 0369eb22c1bb..db220eaa34b8 100644
>> --- a/fs/xfs/xfs_inode.c
>> +++ b/fs/xfs/xfs_inode.c
>> @@ -40,9 +40,18 @@ kmem_zone_t *xfs_inode_zone;
>>
>>  /*
>>   * Used in xfs_itruncate_extents().  This is the maximum number of extents
>> - * freed from a file in a single transaction.
>> + * we will unmap and defer for freeing in a single call to xfs_bunmapi().
>> + * Realtime inodes directly free extents in xfs_bunmapi(), so are bound
>> + * by transaction reservation size to 2 extents.
>>   */
>> -#define	XFS_ITRUNC_MAX_EXTENTS	2
>> +static inline int
>> +xfs_itrunc_max_extents(
>> +	struct xfs_inode	*ip)
>> +{
>> +	if (XFS_IS_REALTIME_INODE(ip))
>> +		return 2;
>> +	return 64;
>> +}
>>
>>  STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *);
>>  STATIC int xfs_iunlink_remove(struct xfs_trans *, struct xfs_inode *);
>> @@ -1402,7 +1411,7 @@ xfs_itruncate_extents_flags(
>>  	while (unmap_len > 0) {
>>  		ASSERT(tp->t_firstblock == NULLFSBLOCK);
>>  		error = __xfs_bunmapi(tp, ip, first_unmap_block, &unmap_len,
>> -				flags, XFS_ITRUNC_MAX_EXTENTS);
>> +				flags, xfs_itrunc_max_extents(ip));
>>  		if (error)
>>  			goto out;
>
> The list of free extent items at xfs_defer_pending->dfp_work could
> now contain XFS_EFI_MAX_FAST_EXTENTS (i.e. 16) entries in the worst case.
>
> For a single transaction, xfs_calc_itruncate_reservation() reserves space for
> logging only 4 extents (i.e. 4 exts * 2 trees * (2 * max depth - 1) * block
> size).

... Sorry, I meant to say "xfs_calc_itruncate_reservation() reserves log space
required for freeing 4 extents ..."

> But with the above change, a single transaction can now free upto 16
> extents. Wouldn't this overflow the reserved log space?
Dave Chinner May 31, 2021, 11:28 p.m. UTC | #3
On Mon, May 31, 2021 at 06:35:40PM +0530, Chandan Babu R wrote:
> On 31 May 2021 at 18:25, Chandan Babu R wrote:
> > On 27 May 2021 at 10:21, Dave Chinner wrote:
> >> From: Dave Chinner <dchinner@redhat.com>
> >>
> >> Ever since we moved to freeing of extents by deferred operations,
> >> we've already freed extents via individual transactions. Hence the
> >> only limitation of how many extents we can mark for freeing in a
> >> single xfs_bunmapi() call bound only by how many deferrals we want
> >> to queue.
> >>
> >> That is xfs_bunmapi() doesn't actually do any AG based extent
> >> freeing, so there's no actually transaction reservation used up by
> >> calling bunmapi with a large count of extents to be freed. RT
> >> extents have always been freed directly by bunmapi, but that doesn't
> >> require modification of large number of blocks as there are no
> >> btrees to split.
> >>
> >> Some callers of xfs_bunmapi assume that the extent count being freed
> >> is bound by geometry (e.g. directories) and these can ask bunmapi to
> >> free up to 64 extents in a single call. These functions just work as
> >> tehy stand, so there's no reason for truncate to have a limit of
> >> just two extents per bunmapi call anymore.
> >>
> >> Increase XFS_ITRUNC_MAX_EXTENTS to 64 to match the number of extents
> >> that can be deferred in a single loop to match what the directory
> >> code already uses.
> >>
> >> For realtime inodes, where xfs_bunmapi() directly frees extents,
> >> leave the limit at 2 extents per loop as this is all the space that
> >> the transaction reservation will cover.
> >>
> >> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> >> ---
> >>  fs/xfs/xfs_inode.c | 15 ++++++++++++---
> >>  1 file changed, 12 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> >> index 0369eb22c1bb..db220eaa34b8 100644
> >> --- a/fs/xfs/xfs_inode.c
> >> +++ b/fs/xfs/xfs_inode.c
> >> @@ -40,9 +40,18 @@ kmem_zone_t *xfs_inode_zone;
> >>
> >>  /*
> >>   * Used in xfs_itruncate_extents().  This is the maximum number of extents
> >> - * freed from a file in a single transaction.
> >> + * we will unmap and defer for freeing in a single call to xfs_bunmapi().
> >> + * Realtime inodes directly free extents in xfs_bunmapi(), so are bound
> >> + * by transaction reservation size to 2 extents.
> >>   */
> >> -#define	XFS_ITRUNC_MAX_EXTENTS	2
> >> +static inline int
> >> +xfs_itrunc_max_extents(
> >> +	struct xfs_inode	*ip)
> >> +{
> >> +	if (XFS_IS_REALTIME_INODE(ip))
> >> +		return 2;
> >> +	return 64;
> >> +}
> >>
> >>  STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *);
> >>  STATIC int xfs_iunlink_remove(struct xfs_trans *, struct xfs_inode *);
> >> @@ -1402,7 +1411,7 @@ xfs_itruncate_extents_flags(
> >>  	while (unmap_len > 0) {
> >>  		ASSERT(tp->t_firstblock == NULLFSBLOCK);
> >>  		error = __xfs_bunmapi(tp, ip, first_unmap_block, &unmap_len,
> >> -				flags, XFS_ITRUNC_MAX_EXTENTS);
> >> +				flags, xfs_itrunc_max_extents(ip));
> >>  		if (error)
> >>  			goto out;
> >
> > The list of free extent items at xfs_defer_pending->dfp_work could
> > now contain XFS_EFI_MAX_FAST_EXTENTS (i.e. 16) entries in the worst case.

Yes, but we do exactly this when freeing a large fragmented directly
block. That is, we ask xfs_bunmapi to unmap a 64kB range regardless
of how many extents map that range.

IOWs, the limitation in extent count placed in
xfs_itruncate_extents() doesn't actually address the underlying
problem - it's just a band-aid that has been placed over the easy to
trigger transaction overrun symptom that has always been present in
the underlying extent freeing code.

> > For a single transaction, xfs_calc_itruncate_reservation() reserves space for
> > logging only 4 extents (i.e. 4 exts * 2 trees * (2 * max depth - 1) * block
> > size).
> 
> ... Sorry, I meant to say "xfs_calc_itruncate_reservation() reserves log space
> required for freeing 4 extents ..."

My point exactly - the code has always had a mismatch between
reservations and what we can stuff into an EFI.  EFIs are unbound in
size, so having a fixed "4 extents per EFI" reservation limit has
never made any real sense given that unmaping a 64kB directory block
on a 1kb block size filesystem has a worst case of freeing 64
extents in a single transaction. As I said above, the limitations
placed on xfs_itruncate_extents is largely a hack because it doesn't
address other avenues to the same overruns...

This "4 extents per transaction" reservation makes even less sense
now that extents are freed by defer ops, not by the transaction that
unmaps them. We've completely decoupled extent freeing from the
higher level code that unmaps them, and so now the freeing
transactions are independent of the high level code that runs the
freeing operations.

IOWs, having a reservation big enough to free a single extent is all
we should need now as we can break the extent freeing up into
individual transactions and still have them complete atomically even
after a crash. That's where I'm trying to get to here, but it's
clear that I need to refine the code a bit further such that we only
allow individual EFIs to queue up and free multiple extents within
an AG to be freed in the same transaction....

FWIW, I suspect that the right thing to do here is make use of the
xfs_defer_finish_one() mechanism for relogging the remaining intents
in work list while it is processing that list. All we need is for
xfs_extent_free_finish_item() to return -EAGAIN instead of running
the free transaction, and we will log the completed EFDs and the
remaining EFIs to be run and roll the transaction.

Hence we can control exactly when we roll the extent freeing
transaction (e.g. when the next extent to free is in a different AG
to the one we've just been freeing extents in) and as a result we
can define a fixed transaction reservation for extent freeing that
works in every situation without having to care about arbitrary
"how many extents can we unmap without overrun" concerns.

That's the goal here - fixed, small reservation for extent freeing
that is decoupled from and independent of the number of extents that
need to be freed atomically by the high level operation...

Hence I think that a minor amount of rework will allow the EFI code
to log large numbers of extents to be freed whilst still processing
them within single AG free space tree modification reservation
bounds...

Cheers,

Dave.
Chandan Babu R June 1, 2021, 6:42 a.m. UTC | #4
On 01 Jun 2021 at 04:58, Dave Chinner wrote:
> On Mon, May 31, 2021 at 06:35:40PM +0530, Chandan Babu R wrote:
>> On 31 May 2021 at 18:25, Chandan Babu R wrote:
>> > On 27 May 2021 at 10:21, Dave Chinner wrote:
>> >> From: Dave Chinner <dchinner@redhat.com>
>> >>
>> >> Ever since we moved to freeing of extents by deferred operations,
>> >> we've already freed extents via individual transactions. Hence the
>> >> only limitation of how many extents we can mark for freeing in a
>> >> single xfs_bunmapi() call bound only by how many deferrals we want
>> >> to queue.
>> >>
>> >> That is xfs_bunmapi() doesn't actually do any AG based extent
>> >> freeing, so there's no actually transaction reservation used up by
>> >> calling bunmapi with a large count of extents to be freed. RT
>> >> extents have always been freed directly by bunmapi, but that doesn't
>> >> require modification of large number of blocks as there are no
>> >> btrees to split.
>> >>
>> >> Some callers of xfs_bunmapi assume that the extent count being freed
>> >> is bound by geometry (e.g. directories) and these can ask bunmapi to
>> >> free up to 64 extents in a single call. These functions just work as
>> >> tehy stand, so there's no reason for truncate to have a limit of
>> >> just two extents per bunmapi call anymore.
>> >>
>> >> Increase XFS_ITRUNC_MAX_EXTENTS to 64 to match the number of extents
>> >> that can be deferred in a single loop to match what the directory
>> >> code already uses.
>> >>
>> >> For realtime inodes, where xfs_bunmapi() directly frees extents,
>> >> leave the limit at 2 extents per loop as this is all the space that
>> >> the transaction reservation will cover.
>> >>
>> >> Signed-off-by: Dave Chinner <dchinner@redhat.com>
>> >> ---
>> >>  fs/xfs/xfs_inode.c | 15 ++++++++++++---
>> >>  1 file changed, 12 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> >> index 0369eb22c1bb..db220eaa34b8 100644
>> >> --- a/fs/xfs/xfs_inode.c
>> >> +++ b/fs/xfs/xfs_inode.c
>> >> @@ -40,9 +40,18 @@ kmem_zone_t *xfs_inode_zone;
>> >>
>> >>  /*
>> >>   * Used in xfs_itruncate_extents().  This is the maximum number of extents
>> >> - * freed from a file in a single transaction.
>> >> + * we will unmap and defer for freeing in a single call to xfs_bunmapi().
>> >> + * Realtime inodes directly free extents in xfs_bunmapi(), so are bound
>> >> + * by transaction reservation size to 2 extents.
>> >>   */
>> >> -#define	XFS_ITRUNC_MAX_EXTENTS	2
>> >> +static inline int
>> >> +xfs_itrunc_max_extents(
>> >> +	struct xfs_inode	*ip)
>> >> +{
>> >> +	if (XFS_IS_REALTIME_INODE(ip))
>> >> +		return 2;
>> >> +	return 64;
>> >> +}
>> >>
>> >>  STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *);
>> >>  STATIC int xfs_iunlink_remove(struct xfs_trans *, struct xfs_inode *);
>> >> @@ -1402,7 +1411,7 @@ xfs_itruncate_extents_flags(
>> >>  	while (unmap_len > 0) {
>> >>  		ASSERT(tp->t_firstblock == NULLFSBLOCK);
>> >>  		error = __xfs_bunmapi(tp, ip, first_unmap_block, &unmap_len,
>> >> -				flags, XFS_ITRUNC_MAX_EXTENTS);
>> >> +				flags, xfs_itrunc_max_extents(ip));
>> >>  		if (error)
>> >>  			goto out;
>> >
>> > The list of free extent items at xfs_defer_pending->dfp_work could
>> > now contain XFS_EFI_MAX_FAST_EXTENTS (i.e. 16) entries in the worst case.
>
> Yes, but we do exactly this when freeing a large fragmented directly
> block. That is, we ask xfs_bunmapi to unmap a 64kB range regardless
> of how many extents map that range.
>
> IOWs, the limitation in extent count placed in
> xfs_itruncate_extents() doesn't actually address the underlying
> problem - it's just a band-aid that has been placed over the easy to
> trigger transaction overrun symptom that has always been present in
> the underlying extent freeing code.
>
>> > For a single transaction, xfs_calc_itruncate_reservation() reserves space for
>> > logging only 4 extents (i.e. 4 exts * 2 trees * (2 * max depth - 1) * block
>> > size).
>>
>> ... Sorry, I meant to say "xfs_calc_itruncate_reservation() reserves log space
>> required for freeing 4 extents ..."
>
> My point exactly - the code has always had a mismatch between
> reservations and what we can stuff into an EFI.  EFIs are unbound in
> size, so having a fixed "4 extents per EFI" reservation limit has
> never made any real sense given that unmaping a 64kB directory block
> on a 1kb block size filesystem has a worst case of freeing 64
> extents in a single transaction. As I said above, the limitations
> placed on xfs_itruncate_extents is largely a hack because it doesn't
> address other avenues to the same overruns...
>

Ok. Thanks for the explanation.

> This "4 extents per transaction" reservation makes even less sense
> now that extents are freed by defer ops, not by the transaction that
> unmaps them. We've completely decoupled extent freeing from the
> higher level code that unmaps them, and so now the freeing
> transactions are independent of the high level code that runs the
> freeing operations.
>
> IOWs, having a reservation big enough to free a single extent is all
> we should need now as we can break the extent freeing up into
> individual transactions and still have them complete atomically even
> after a crash. That's where I'm trying to get to here, but it's
> clear that I need to refine the code a bit further such that we only
> allow individual EFIs to queue up and free multiple extents within
> an AG to be freed in the same transaction....

Ok. That should limit the number of CNTBT/BNOBT blocks being logged by a
transaction which is freeing extents.

>
> FWIW, I suspect that the right thing to do here is make use of the
> xfs_defer_finish_one() mechanism for relogging the remaining intents
> in work list while it is processing that list. All we need is for
> xfs_extent_free_finish_item() to return -EAGAIN instead of running
> the free transaction, and we will log the completed EFDs and the
> remaining EFIs to be run and roll the transaction.
>
> Hence we can control exactly when we roll the extent freeing
> transaction (e.g. when the next extent to free is in a different AG
> to the one we've just been freeing extents in) and as a result we
> can define a fixed transaction reservation for extent freeing that
> works in every situation without having to care about arbitrary
> "how many extents can we unmap without overrun" concerns.
>

That sounds perfect.

> That's the goal here - fixed, small reservation for extent freeing
> that is decoupled from and independent of the number of extents that
> need to be freed atomically by the high level operation...
>
> Hence I think that a minor amount of rework will allow the EFI code
> to log large numbers of extents to be freed whilst still processing
> them within single AG free space tree modification reservation
> bounds...


--
chandan
diff mbox series

Patch

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 0369eb22c1bb..db220eaa34b8 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -40,9 +40,18 @@  kmem_zone_t *xfs_inode_zone;
 
 /*
  * Used in xfs_itruncate_extents().  This is the maximum number of extents
- * freed from a file in a single transaction.
+ * we will unmap and defer for freeing in a single call to xfs_bunmapi().
+ * Realtime inodes directly free extents in xfs_bunmapi(), so are bound
+ * by transaction reservation size to 2 extents.
  */
-#define	XFS_ITRUNC_MAX_EXTENTS	2
+static inline int
+xfs_itrunc_max_extents(
+	struct xfs_inode	*ip)
+{
+	if (XFS_IS_REALTIME_INODE(ip))
+		return 2;
+	return 64;
+}
 
 STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *);
 STATIC int xfs_iunlink_remove(struct xfs_trans *, struct xfs_inode *);
@@ -1402,7 +1411,7 @@  xfs_itruncate_extents_flags(
 	while (unmap_len > 0) {
 		ASSERT(tp->t_firstblock == NULLFSBLOCK);
 		error = __xfs_bunmapi(tp, ip, first_unmap_block, &unmap_len,
-				flags, XFS_ITRUNC_MAX_EXTENTS);
+				flags, xfs_itrunc_max_extents(ip));
 		if (error)
 			goto out;