[5/5] xfs: reduce buffer log item shadow allocations

Message ID 20210128044154.806715-6-david@fromorbit.com (mailing list archive)
State Superseded
Series: xfs: various log stuff...

Commit Message

Dave Chinner Jan. 28, 2021, 4:41 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

When we modify btrees repeatedly, we regularly increase the size of
the logged region by a single chunk at a time (per transaction
commit). This results in the CIL formatting code having to
reallocate the log vector buffer every time the buffer dirty region
grows. Hence over a typical 4kB btree buffer, we might grow the log
vector 4096/128 = 32x over a short period where we repeatedly add
or remove records to/from the buffer over a series of running
transactions. This means we are doing 32 memory allocations and frees
over this time during a performance critical path in the journal.
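
As an illustration of the arithmetic, here is a hypothetical
user-space simulation (not kernel code) of that allocation pattern,
using the real 128 byte XFS_BLF_CHUNK dirty region granularity and
the 512 byte rounding this patch introduces further below:

	/*
	 * Hypothetical simulation: a 4kB buffer dirtied one 128 byte
	 * chunk at a time, one chunk per transaction commit. Without
	 * rounding, every growth step changes the requested size and
	 * forces a reallocation (32 in total); rounding the request
	 * up to 512 bytes cuts that to 8.
	 */
	#include <stdio.h>

	#define XFS_BLF_CHUNK	128	/* real dirty region granularity */
	#define ROUNDING	512	/* granularity this patch adds */
	#define round_up(x, y)	((((x) + (y) - 1) / (y)) * (y))

	int main(void)
	{
		int dirty, allocated, reallocs;

		/* no rounding: one reallocation per chunk of growth */
		allocated = reallocs = 0;
		for (dirty = XFS_BLF_CHUNK; dirty <= 4096; dirty += XFS_BLF_CHUNK) {
			if (dirty > allocated) {
				allocated = dirty;
				reallocs++;
			}
		}
		printf("no rounding:   %d reallocations\n", reallocs);	/* 32 */

		/* with rounding: reallocate only at each 512 byte boundary */
		allocated = reallocs = 0;
		for (dirty = XFS_BLF_CHUNK; dirty <= 4096; dirty += XFS_BLF_CHUNK) {
			int want = round_up(dirty, ROUNDING);

			if (want > allocated) {
				allocated = want;
				reallocs++;
			}
		}
		printf("with rounding: %d reallocations\n", reallocs);	/* 8 */
		return 0;
	}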

The amount of space tracked in the CIL for the object is calculated
during the ->iop_format() call for the buffer log item, but the
buffer memory allocated for it is calculated by the ->iop_size()
call. The size callout determines the size of the buffer, the format
call determines the space used in the buffer.
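
For reference, the two callouts in question are (from struct
xfs_item_ops in fs/xfs/xfs_trans.h, abridged):

	/*
	 * ->iop_size() reports how big a log vector buffer to allocate;
	 * ->iop_format() later fills that buffer and hence determines
	 * the space actually used and accounted to the CIL.
	 */
	void	(*iop_size)(struct xfs_log_item *lip, int *nvecs, int *nbytes);
	void	(*iop_format)(struct xfs_log_item *lip, struct xfs_log_vec *lv);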

Hence we can oversize the buffer space required in the size
calculation without impacting the amount of space used and accounted
to the CIL for the changes being logged. This allows us to reduce
the number of allocations by rounding up the buffer size to allow
for future growth. This can save a substantial amount of CPU time in
this path:

-   46.52%     2.02%  [kernel]                  [k] xfs_log_commit_cil
   - 44.49% xfs_log_commit_cil
      - 30.78% _raw_spin_lock
         - 30.75% do_raw_spin_lock
              30.27% __pv_queued_spin_lock_slowpath

(oh, ouch!)
....
      - 1.05% kmem_alloc_large
         - 1.02% kmem_alloc
              0.94% __kmalloc

This overhead here is what this patch is aimed at. After:

      - 0.76% kmem_alloc_large
         - 0.75% kmem_alloc
              0.70% __kmalloc


Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf_item.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Comments

Brian Foster Jan. 28, 2021, 4:54 p.m. UTC | #1
On Thu, Jan 28, 2021 at 03:41:54PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we modify btrees repeatedly, we regularly increase the size of
> the logged region by a single chunk at a time (per transaction
> commit). This results in the CIL formatting code having to
> reallocate the log vector buffer every time the buffer dirty region
> grows. Hence over a typical 4kB btree buffer, we might grow the log
> vector 4096/128 = 32x over a short period where we repeatedly add
> or remove records to/from the buffer over a series of running
> transactions. This means we are doing 32 memory allocations and frees
> over this time during a performance critical path in the journal.
> 
> The amount of space tracked in the CIL for the object is calculated
> during the ->iop_format() call for the buffer log item, but the
> buffer memory allocated for it is calculated by the ->iop_size()
> call. The size callout determines the size of the buffer, the format
> call determines the space used in the buffer.
> 
> Hence we can oversize the buffer space required in the size
> calculation without impacting the amount of space used and accounted
> to the CIL for the changes being logged. This allows us to reduce
> the number of allocations by rounding up the buffer size to allow
> for future growth. This can save a substantial amount of CPU time in
> this path:
> 
> -   46.52%     2.02%  [kernel]                  [k] xfs_log_commit_cil
>    - 44.49% xfs_log_commit_cil
>       - 30.78% _raw_spin_lock
>          - 30.75% do_raw_spin_lock
>               30.27% __pv_queued_spin_lock_slowpath
> 
> (oh, ouch!)
> ....
>       - 1.05% kmem_alloc_large
>          - 1.02% kmem_alloc
>               0.94% __kmalloc
> 
> This overhead here is what this patch is aimed at. After:
> 
>       - 0.76% kmem_alloc_large
>          - 0.75% kmem_alloc
>               0.70% __kmalloc
> 
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_buf_item.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 17960b1ce5ef..0628a65d9c55 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
...
> @@ -181,10 +182,18 @@ xfs_buf_item_size(
>  	 * count for the extra buf log format structure that will need to be
>  	 * written.
>  	 */
> +	bytes = 0;
>  	for (i = 0; i < bip->bli_format_count; i++) {
>  		xfs_buf_item_size_segment(bip, &bip->bli_formats[i],
> -					  nvecs, nbytes);
> +					  nvecs, &bytes);
>  	}
> +
> +	/*
> +	 * Round up the buffer size required to minimise the number of memory
> +	 * allocations that need to be done as this item grows when relogged by
> +	 * repeated modifications.
> +	 */
> +	*nbytes = round_up(bytes, 512);

If nbytes starts out as zero anyways, what's the need for the new
variable? Otherwise looks reasonable.

Brian

>  	trace_xfs_buf_item_size(bip);
>  }
>  
> -- 
> 2.28.0
>
Dave Chinner Jan. 28, 2021, 9:58 p.m. UTC | #2
On Thu, Jan 28, 2021 at 11:54:35AM -0500, Brian Foster wrote:
> On Thu, Jan 28, 2021 at 03:41:54PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When we modify btrees repeatedly, we regularly increase the size of
> > the logged region by a single chunk at a time (per transaction
> > commit). This results in the CIL formatting code having to
> > reallocate the log vector buffer every time the buffer dirty region
> > grows. Hence over a typical 4kB btree buffer, we might grow the log
> > vector 4096/128 = 32x over a short period where we repeatedly add
> > or remove records to/from the buffer over a series of running
> > transactions. This means we are doing 32 memory allocations and frees
> > over this time during a performance critical path in the journal.
> > 
> > The amount of space tracked in the CIL for the object is calculated
> > during the ->iop_format() call for the buffer log item, but the
> > buffer memory allocated for it is calculated by the ->iop_size()
> > call. The size callout determines the size of the buffer, the format
> > call determines the space used in the buffer.
> > 
> > Hence we can oversize the buffer space required in the size
> > calculation without impacting the amount of space used and accounted
> > to the CIL for the changes being logged. This allows us to reduce
> > the number of allocations by rounding up the buffer size to allow
> > for future growth. This can save a substantial amount of CPU time in
> > this path:
> > 
> > -   46.52%     2.02%  [kernel]                  [k] xfs_log_commit_cil
> >    - 44.49% xfs_log_commit_cil
> >       - 30.78% _raw_spin_lock
> >          - 30.75% do_raw_spin_lock
> >               30.27% __pv_queued_spin_lock_slowpath
> > 
> > (oh, ouch!)
> > ....
> >       - 1.05% kmem_alloc_large
> >          - 1.02% kmem_alloc
> >               0.94% __kmalloc
> > 
> > This overhead here is what this patch is aimed at. After:
> > 
> >       - 0.76% kmem_alloc_large
> >          - 0.75% kmem_alloc
> >               0.70% __kmalloc
> > 
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_buf_item.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> > index 17960b1ce5ef..0628a65d9c55 100644
> > --- a/fs/xfs/xfs_buf_item.c
> > +++ b/fs/xfs/xfs_buf_item.c
> ...
> > @@ -181,10 +182,18 @@ xfs_buf_item_size(
> >  	 * count for the extra buf log format structure that will need to be
> >  	 * written.
> >  	 */
> > +	bytes = 0;
> >  	for (i = 0; i < bip->bli_format_count; i++) {
> >  		xfs_buf_item_size_segment(bip, &bip->bli_formats[i],
> > -					  nvecs, nbytes);
> > +					  nvecs, &bytes);
> >  	}
> > +
> > +	/*
> > +	 * Round up the buffer size required to minimise the number of memory
> > +	 * allocations that need to be done as this item grows when relogged by
> > +	 * repeated modifications.
> > +	 */
> > +	*nbytes = round_up(bytes, 512);
> 
> If nbytes starts out as zero anyways, what's the need for the new
> variable? Otherwise looks reasonable.

Personal preference. It makes the code clear that we are returning
just the size of this item, not blindly adding it to an external
variable.

Just another example of an API wart that is a holdover from ancient
code from before the days of delayed logging. ->iop_size is always
called with nvecs = nbytes = 0, so it only ever returns the
size/vecs the item will use. The ancient code passed running count
variables to these functions, not "always initialised to zero"
variables. We should really clean that up across the entire
interface at some point...
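
To make that calling convention concrete, here is a simplified,
hypothetical sketch of the caller side (the real caller is the CIL
shadow buffer sizing code):

	/*
	 * Hypothetical sketch: the counters are always freshly zeroed,
	 * so each ->iop_size() callout effectively returns this item's
	 * totals rather than adding to a running count.
	 */
	static void example_get_item_size(struct xfs_log_item *lip)
	{
		int	niovecs = 0;	/* always starts at zero... */
		int	nbytes = 0;	/* ...so "add to" really means "return" */

		lip->li_ops->iop_size(lip, &niovecs, &nbytes);

		/* nbytes now includes the round_up() headroom from this patch */
	}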

Cheers,

Dave.
Chandan Babu R Feb. 2, 2021, 12:01 p.m. UTC | #3
On 28 Jan 2021 at 10:11, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> When we modify btrees repeatedly, we regularly increase the size of
> the logged region by a single chunk at a time (per transaction
> commit). This results in the CIL formatting code having to
> reallocate the log vector buffer every time the buffer dirty region
> grows. Hence over a typical 4kB btree buffer, we might grow the log
> vector 4096/128 = 32x over a short period where we repeatedly add
> or remove records to/from the buffer over a series of running
> transactions. This means we are doing 32 memory allocations and frees
> over this time during a performance critical path in the journal.
>
> The amount of space tracked in the CIL for the object is calculated
> during the ->iop_format() call for the buffer log item, but the
> buffer memory allocated for it is calculated by the ->iop_size()
> call. The size callout determines the size of the buffer, the format
> call determines the space used in the buffer.
>
> Hence we can oversize the buffer space required in the size
> calculation without impacting the amount of space used and accounted
> to the CIL for the changes being logged. This allows us to reduce
> the number of allocations by rounding up the buffer size to allow
> for future growth. This can save a substantial amount of CPU time in
> this path:
>
> -   46.52%     2.02%  [kernel]                  [k] xfs_log_commit_cil
>    - 44.49% xfs_log_commit_cil
>       - 30.78% _raw_spin_lock
>          - 30.75% do_raw_spin_lock
>               30.27% __pv_queued_spin_lock_slowpath
>
> (oh, ouch!)
> ....
>       - 1.05% kmem_alloc_large
>          - 1.02% kmem_alloc
>               0.94% __kmalloc
>
> This overhead here is what this patch is aimed at. After:
>
>       - 0.76% kmem_alloc_large                                                                                                                                      ▒
>          - 0.75% kmem_alloc                                                                                                                                         ▒
>               0.70% __kmalloc                                                                                                                                       ▒

Apart from the trailing whitespace above, the changes look good to me.

Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_buf_item.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 17960b1ce5ef..0628a65d9c55 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -142,6 +142,7 @@ xfs_buf_item_size(
>  {
>  	struct xfs_buf_log_item	*bip = BUF_ITEM(lip);
>  	int			i;
> +	int			bytes;
>  
>  	ASSERT(atomic_read(&bip->bli_refcount) > 0);
>  	if (bip->bli_flags & XFS_BLI_STALE) {
> @@ -173,7 +174,7 @@ xfs_buf_item_size(
>  	}
>  
>  	/*
> -	 * the vector count is based on the number of buffer vectors we have
> +	 * The vector count is based on the number of buffer vectors we have
>  	 * dirty bits in. This will only be greater than one when we have a
>  	 * compound buffer with more than one segment dirty. Hence for compound
>  	 * buffers we need to track which segment the dirty bits correspond to,
> @@ -181,10 +182,18 @@ xfs_buf_item_size(
>  	 * count for the extra buf log format structure that will need to be
>  	 * written.
>  	 */
> +	bytes = 0;
>  	for (i = 0; i < bip->bli_format_count; i++) {
>  		xfs_buf_item_size_segment(bip, &bip->bli_formats[i],
> -					  nvecs, nbytes);
> +					  nvecs, &bytes);
>  	}
> +
> +	/*
> +	 * Round up the buffer size required to minimise the number of memory
> +	 * allocations that need to be done as this item grows when relogged by
> +	 * repeated modifications.
> +	 */
> +	*nbytes = round_up(bytes, 512);
>  	trace_xfs_buf_item_size(bip);
>  }

Patch

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 17960b1ce5ef..0628a65d9c55 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -142,6 +142,7 @@ xfs_buf_item_size(
 {
 	struct xfs_buf_log_item	*bip = BUF_ITEM(lip);
 	int			i;
+	int			bytes;
 
 	ASSERT(atomic_read(&bip->bli_refcount) > 0);
 	if (bip->bli_flags & XFS_BLI_STALE) {
@@ -173,7 +174,7 @@ xfs_buf_item_size(
 	}
 
 	/*
-	 * the vector count is based on the number of buffer vectors we have
+	 * The vector count is based on the number of buffer vectors we have
 	 * dirty bits in. This will only be greater than one when we have a
 	 * compound buffer with more than one segment dirty. Hence for compound
 	 * buffers we need to track which segment the dirty bits correspond to,
@@ -181,10 +182,18 @@ xfs_buf_item_size(
 	 * count for the extra buf log format structure that will need to be
 	 * written.
 	 */
+	bytes = 0;
 	for (i = 0; i < bip->bli_format_count; i++) {
 		xfs_buf_item_size_segment(bip, &bip->bli_formats[i],
-					  nvecs, nbytes);
+					  nvecs, &bytes);
 	}
+
+	/*
+	 * Round up the buffer size required to minimise the number of memory
+	 * allocations that need to be done as this item grows when relogged by
+	 * repeated modifications.
+	 */
+	*nbytes = round_up(bytes, 512);
 	trace_xfs_buf_item_size(bip);
 }