btrfs: do not evaluate the expression with !CONFIG_BTRFS_ASSERT
diff mbox series

Message ID 20200724164147.39925-1-josef@toxicpanda.com
State New
Headers show
Series
  • btrfs: do not evaluate the expression with !CONFIG_BTRFS_ASSERT
Related show

Commit Message

Josef Bacik July 24, 2020, 4:41 p.m. UTC
While investigating a performance issue I noticed that turning off
CONFIG_BTRFS_ASSERT had no effect in what I was seeing in perf,
specifically check_setget_bounds() was around 5% for this workload.
Upon investigation I realized that I made a mistake when I added
ASSERT(), I would still evaluate the expression, but simply ignore the
result.

This is useless, and has a marked impact on performance.  This
microbenchmark is the watered down version of an application that is
experiencing performance issues, and does renames and creates over and
over again.  Doing these operations 200k times without this patch takes
13 seconds on my machine.  With this patch it takes 7 seconds.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h        | 2 +-
 fs/btrfs/struct-funcs.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

Comments

Darrick J. Wong July 24, 2020, 4:57 p.m. UTC | #1
On Fri, Jul 24, 2020 at 12:41:47PM -0400, Josef Bacik wrote:
> While investigating a performance issue I noticed that turning off
> CONFIG_BTRFS_ASSERT had no effect in what I was seeing in perf,
> specifically check_setget_bounds() was around 5% for this workload.
> Upon investigation I realized that I made a mistake when I added
> ASSERT(), I would still evaluate the expression, but simply ignore the
> result.
> 
> This is useless, and has a marked impact on performance.  This
> microbenchmark is the watered down version of an application that is
> experiencing performance issues, and does renames and creates over and
> over again.  Doing these operations 200k times without this patch takes
> 13 seconds on my machine.  With this patch it takes 7 seconds.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

lolz,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/btrfs/ctree.h        | 2 +-
>  fs/btrfs/struct-funcs.c | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 9c7e466f27a9..b0fe8cca7e86 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3238,7 +3238,7 @@ static inline void assertfail(const char *expr, const char *file, int line)
>  
>  #else
>  static inline void assertfail(const char *expr, const char* file, int line) { }
> -#define ASSERT(expr)	(void)(expr)
> +#define ASSERT(expr)	((void)0)
>  #endif
>  
>  /*
> diff --git a/fs/btrfs/struct-funcs.c b/fs/btrfs/struct-funcs.c
> index 079b059818e9..f44dc1207792 100644
> --- a/fs/btrfs/struct-funcs.c
> +++ b/fs/btrfs/struct-funcs.c
> @@ -17,6 +17,7 @@ static inline void put_unaligned_le8(u8 val, void *p)
>         *(u8 *)p = val;
>  }
>  
> +#ifdef CONFIG_BTRFS_ASSERT
>  static bool check_setget_bounds(const struct extent_buffer *eb,
>  				const void *ptr, unsigned off, int size)
>  {
> @@ -37,6 +38,7 @@ static bool check_setget_bounds(const struct extent_buffer *eb,
>  
>  	return true;
>  }
> +#endif
>  
>  /*
>   * Macro templates that define helpers to read/write extent buffer data of a
> -- 
> 2.24.1
>
Johannes Thumshirn July 27, 2020, 8:32 a.m. UTC | #2
On 24/07/2020 18:41, Josef Bacik wrote:
> While investigating a performance issue I noticed that turning off
> CONFIG_BTRFS_ASSERT had no effect in what I was seeing in perf,
> specifically check_setget_bounds() was around 5% for this workload.
> Upon investigation I realized that I made a mistake when I added
> ASSERT(), I would still evaluate the expression, but simply ignore the
> result.
> 
> This is useless, and has a marked impact on performance.  This
> microbenchmark is the watered down version of an application that is
> experiencing performance issues, and does renames and creates over and
> over again.  Doing these operations 200k times without this patch takes
> 13 seconds on my machine.  With this patch it takes 7 seconds.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
David Sterba July 27, 2020, 4:55 p.m. UTC | #3
On Fri, Jul 24, 2020 at 12:41:47PM -0400, Josef Bacik wrote:
> While investigating a performance issue I noticed that turning off
> CONFIG_BTRFS_ASSERT had no effect in what I was seeing in perf,
> specifically check_setget_bounds() was around 5% for this workload.

Can you please share the perf profile and .config?  I find it hard to
believe that check_setget_bounds would be taking 5% overall. Also you
said that this was with integrity-checker compiled in so this kind of
invalidates any performance claims.

I've been watching perf top for various debugging and release builds for
some time and this one makes it to top 5 but never #1 or #2.

The function compiles to a few instructions and the hot path is
correctly predicted by compiler, so I'm really curious what's so special
about the workload that it needs to call it in 1/20th of overall time.

> Upon investigation I realized that I made a mistake when I added
> ASSERT(), I would still evaluate the expression, but simply ignore the
> result.

Vast majority of the assert expressions are simple expressions without
side effects, but compiler still generates the code. In most cases it's
a few no-op movs, so this leaves the impact on the function calls.

Making the assert a true no-op saves some asm code and gains some
performance, but I don't want to remove the check_setget_bounds calls as
it's another line of defence against random in-memory corruptions.

Given that it's called deep inside many functions, it would be
impractical to add checking of each call. Instead, we can set a bit and
do a delayed abort in case it's found. I have that as a prototype and
will post it later.

> This is useless, and has a marked impact on performance.  This
> microbenchmark is the watered down version of an application that is
> experiencing performance issues, and does renames and creates over and
> over again.  Doing these operations 200k times without this patch takes
> 13 seconds on my machine.  With this patch it takes 7 seconds.

Do you have that as a script?
Josef Bacik July 27, 2020, 5:27 p.m. UTC | #4
On 7/27/20 12:55 PM, David Sterba wrote:
> On Fri, Jul 24, 2020 at 12:41:47PM -0400, Josef Bacik wrote:
>> While investigating a performance issue I noticed that turning off
>> CONFIG_BTRFS_ASSERT had no effect in what I was seeing in perf,
>> specifically check_setget_bounds() was around 5% for this workload.
> 
> Can you please share the perf profile and .config?  I find it hard to
> believe that check_setget_bounds would be taking 5% overall. Also you
> said that this was with integrity-checker compiled in so this kind of
> invalidates any performance claims.
> 

How?  It was a straight A/B test, do you think I'm making this up?

> I've been watching perf top for various debugging and release builds for
> some time and this one makes it to top 5 but never #1 or #2.
> 
> The function compiles to a few instructions and the hot path is
> correctly predicted by compiler, so I'm really curious what's so special
> about the workload that it needs to call it in 1/20th of overall time.
> 
>> Upon investigation I realized that I made a mistake when I added
>> ASSERT(), I would still evaluate the expression, but simply ignore the
>> result.
> 
> Vast majority of the assert expressions are simple expressions without
> side effects, but compiler still generates the code. In most cases it's
> a few no-op movs, so this leaves the impact on the function calls.
> 
> Making the assert a true no-op saves some asm code and gains some
> performance, but I don't want to remove the check_setget_bounds calls as
> it's another line of defence against random in-memory corruptions.
> 
> Given that it's called deep inside many functions, it would be
> impractical to add checking of each call. Instead, we can set a bit and
> do a delayed abort in case it's found. I have that as a prototype and
> will post it later.

Then make it configurable, because with ECC memory the performance overhead 
isn't worth it.

> 
>> This is useless, and has a marked impact on performance.  This
>> microbenchmark is the watered down version of an application that is
>> experiencing performance issues, and does renames and creates over and
>> over again.  Doing these operations 200k times without this patch takes
>> 13 seconds on my machine.  With this patch it takes 7 seconds.
> 
> Do you have that as a script?
> 

Yeah, you can find it here.  It was written by somebody internally to illustrate 
an issue they're seeing with their application.

https://paste.centos.org/view/f01126bf

Thanks,

Josef
David Sterba July 30, 2020, 11:09 a.m. UTC | #5
On Mon, Jul 27, 2020 at 01:27:12PM -0400, Josef Bacik wrote:
> On 7/27/20 12:55 PM, David Sterba wrote:
> > On Fri, Jul 24, 2020 at 12:41:47PM -0400, Josef Bacik wrote:
> >> While investigating a performance issue I noticed that turning off
> >> CONFIG_BTRFS_ASSERT had no effect in what I was seeing in perf,
> >> specifically check_setget_bounds() was around 5% for this workload.
> > 
> > Can you please share the perf profile and .config?  I find it hard to
> > believe that check_setget_bounds would be taking 5% overall. Also you
> > said that this was with integrity-checker compiled in so this kind of
> > invalidates any performance claims.
> > 
> 
> How?  It was a straight A/B test, do you think I'm making this up?

No, I want to understand it and reproduce eventually to see if deleting
the check is the only option to get the percents back.

> > Vast majority of the assert expressions are simple expressions without
> > side effects, but compiler still generates the code. In most cases it's
> > a few no-op movs, so this leaves the impact on the function calls.
> > 
> > Making the assert a true no-op saves some asm code and gains some
> > performance, but I don't want to remove the check_setget_bounds calls as
> > it's another line of defence against random in-memory corruptions.
> > 
> > Given that it's called deep inside many functions, it would be
> > impractical to add checking of each call. Instead, we can set a bit and
> > do a delayed abort in case it's found. I have that as a prototype and
> > will post it later.
> 
> Then make it configurable, because with ECC memory the performance overhead 
> isn't worth it.

ECC would protect against bitflips but what I had in mind were memory
corruptins caused by code bugs (use after free, page reference count
bugs). I've analyzed enough crash dumps to realize that cheap early
checks can save a lot of trouble later.

Regarding the configurability, that would cover wider range of
performance/safety trade offs, like the pre-write/post-read checks and
others. That sounds like a specific need for a deployment that can
afford that.

> >> This is useless, and has a marked impact on performance.  This
> >> microbenchmark is the watered down version of an application that is
> >> experiencing performance issues, and does renames and creates over and
> >> over again.  Doing these operations 200k times without this patch takes
> >> 13 seconds on my machine.  With this patch it takes 7 seconds.
> > 
> > Do you have that as a script?
> 
> Yeah, you can find it here.  It was written by somebody internally to illustrate 
> an issue they're seeing with their application.
> 
> https://paste.centos.org/view/f01126bf

Thanks, that helped.

The tool is set to do 2M operations, I switched it to 200k. The backing
device was a 4G file in /dev/shm, mkfs -d single -m single. I did a few
rounds with the following results:

1. baseline, asserts on, setget check on

run time:		46s
run time with perf:	48s

2. asserts on, comment out setget check

run time:		44s
run time with perf:	47s

So this is confirms the 5% difference

3. asserts on, optimized seget check

run time:		44s
run time with perf:	47s

The optimizations are reducing the number of ifs to 1 and inlining the
hot path. Low-level stuff, gets the performance back. Patch below.

4. asserts off, no setget check

run time:		44s
run time with perf:	45s

This verifies that asserts other than the setget check have negligible
impact on performance and it's not harmful to keep them on.


Analysis where the performance is lost:

* check_setget_bounds is short function, but it's still a function call,
  changing the flow of instructions and given how many times it's
  called the overhead adds up

* there are two conditions, one to check if the range is
  completely outside (member_offset > eb->len) or partially inside
  (member_offset + size > eb->len)


The range checks were present in map_private_extent_buffer,that got
replaced by check_setget_bounds:

    if (start + min_len > eb->len) {
            WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, wanted %lu %lu\n",
                   eb->start, eb->len, start, min_len);
            return -EINVAL;
    }

This is the interesting one, two conditions are just for reporting
purposes and this one is enough to detect the overflow. So first step is
to reduce that to

+        if (unlikely(member_offset + size > eb->len)) {
+                btrfs_warn(eb->fs_info,
+        "bad eb member %s: ptr 0x%lx start %llu member offset %lu size %d",
+                        (member_offset > eb->len ? "start" : "end"),
+                        (unsigned long)ptr, eb->start, member_offset, size);
+                return false;
+        }

Whether it's partial or complete overflow can be determined at the time
of message print. So we have the same as before, with just one
condition.

Removing the function call is straightforward, when the function is
static inline and separates the reporting to a normal function:

static inline void check_setget_bounds(const struct extent_buffer *eb,
                                const void *ptr, unsigned off, int size)
{
        const unsigned long member_offset = (unsigned long)ptr + off;

        if (unlikely(member_offset + size > eb->len))
               report_setget_bounds(eb, ptr, off, size);
}

This generates efficient assembly code and does not affect performance
noticeably. Further, report_setget_bounds could do some heavier
operations like saving the bogus values, setting a bit to filesystem for
a delayed transaction abort.

Patch
diff mbox series

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 9c7e466f27a9..b0fe8cca7e86 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3238,7 +3238,7 @@  static inline void assertfail(const char *expr, const char *file, int line)
 
 #else
 static inline void assertfail(const char *expr, const char* file, int line) { }
-#define ASSERT(expr)	(void)(expr)
+#define ASSERT(expr)	((void)0)
 #endif
 
 /*
diff --git a/fs/btrfs/struct-funcs.c b/fs/btrfs/struct-funcs.c
index 079b059818e9..f44dc1207792 100644
--- a/fs/btrfs/struct-funcs.c
+++ b/fs/btrfs/struct-funcs.c
@@ -17,6 +17,7 @@  static inline void put_unaligned_le8(u8 val, void *p)
        *(u8 *)p = val;
 }
 
+#ifdef CONFIG_BTRFS_ASSERT
 static bool check_setget_bounds(const struct extent_buffer *eb,
 				const void *ptr, unsigned off, int size)
 {
@@ -37,6 +38,7 @@  static bool check_setget_bounds(const struct extent_buffer *eb,
 
 	return true;
 }
+#endif
 
 /*
  * Macro templates that define helpers to read/write extent buffer data of a