diff mbox series

btrfs-progs: align btrfs receive buffer to enable fast CRC

Message ID 20201226214606.49241-1-shngmao@gmail.com (mailing list archive)
State New, archived
Headers show
Series btrfs-progs: align btrfs receive buffer to enable fast CRC | expand

Commit Message

Sheng Mao Dec. 26, 2020, 9:46 p.m. UTC
From: Sheng Mao <shngmao@gmail.com>

To use optimized CRC implemention, the input buffer must be
unsigned long aligned. btrfs receive calculates checksum based on
read_buf, including btrfs_cmd_header (with zero-ed CRC field)
and command content. GCC attribute is added to both struct
btrfs_send_stream and read_buf to make sure read_buf is allocated
with proper alignment.

Issue: #324
Signed-off-by: Sheng Mao <shngmao@gmail.com>
---
 common/send-stream.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Wang Yugui Jan. 3, 2021, 2:53 a.m. UTC | #1
Hi, Sheng

> From: Sheng Mao <shngmao@gmail.com>
> 
> To use optimized CRC implemention, the input buffer must be
> unsigned long aligned. btrfs receive calculates checksum based on
> read_buf, including btrfs_cmd_header (with zero-ed CRC field)
> and command content. GCC attribute is added to both struct
> btrfs_send_stream and read_buf to make sure read_buf is allocated
> with proper alignment.
> 
> Issue: #324
> Signed-off-by: Sheng Mao <shngmao@gmail.com>
> ---
>  common/send-stream.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/common/send-stream.c b/common/send-stream.c
> index 69d75168..b13aa16e 100644
> --- a/common/send-stream.c
> +++ b/common/send-stream.c
> @@ -26,7 +26,8 @@
>  
>  struct btrfs_send_stream {
>  	int fd;
> -	char read_buf[BTRFS_SEND_BUF_SIZE];
> +	char read_buf[BTRFS_SEND_BUF_SIZE]
> +		__attribute__((aligned(sizeof(unsigned long))));
>  
>  	int cmd;

Can we move 'int cmd' to before 'char read_buf' too?

There is a hole between 'int fd' and 'char read_buf' after the
addiatioin of '__attribute__((aligned(sizeof(unsigned long))))'
in x86_64.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2021/01/03

>  	struct btrfs_cmd_header *cmd_hdr;
> @@ -41,7 +42,7 @@ struct btrfs_send_stream {
>  
>  	struct btrfs_send_ops *ops;
>  	void *user;
> -};
> +} __attribute__((aligned(sizeof(unsigned long))));
>  
>  /*
>   * Read len bytes to buf.
> -- 
> 2.29.2
David Sterba Jan. 6, 2021, 12:16 p.m. UTC | #2
On Sat, Dec 26, 2020 at 02:46:06PM -0700, shngmao@gmail.com wrote:
> From: Sheng Mao <shngmao@gmail.com>
> 
> To use optimized CRC implemention, the input buffer must be
> unsigned long aligned. btrfs receive calculates checksum based on
> read_buf, including btrfs_cmd_header (with zero-ed CRC field)
> and command content. GCC attribute is added to both struct
> btrfs_send_stream and read_buf to make sure read_buf is allocated
> with proper alignment.
> 
> Issue: #324

The issue has a lot of interesting debugging and performance information
so I'd put something to the changelog as well.

The alignment fixup sounds correct, though I'd push it a bit further and
move read_buf to the beginning of the structure and align the whole
structure to 64 bytes, as this is the common cache line size. This could
potentially help too with memcpy.
Sheng Mao Jan. 8, 2021, 1:58 a.m. UTC | #3
Alignment to cache line is a great idea. I guess compiler can issue
faster memcpy version.

Thank you!

On Wed, Jan 06, 2021 at 01:16:35PM +0100, David Sterba wrote:
> 
> On Sat, Dec 26, 2020 at 02:46:06PM -0700, shngmao@gmail.com wrote:
> > From: Sheng Mao <shngmao@gmail.com>
> > 
> > To use optimized CRC implemention, the input buffer must be
> > unsigned long aligned. btrfs receive calculates checksum based on
> > read_buf, including btrfs_cmd_header (with zero-ed CRC field)
> > and command content. GCC attribute is added to both struct
> > btrfs_send_stream and read_buf to make sure read_buf is allocated
> > with proper alignment.
> > 
> > Issue: #324
> 
> The issue has a lot of interesting debugging and performance information
> so I'd put something to the changelog as well.
> 
> The alignment fixup sounds correct, though I'd push it a bit further and
> move read_buf to the beginning of the structure and align the whole
> structure to 64 bytes, as this is the common cache line size. This could
> potentially help too with memcpy.
David Sterba Jan. 14, 2021, 6:37 p.m. UTC | #4
On Sat, Dec 26, 2020 at 02:46:06PM -0700, shngmao@gmail.com wrote:
> From: Sheng Mao <shngmao@gmail.com>
> 
> To use optimized CRC implemention, the input buffer must be
> unsigned long aligned. btrfs receive calculates checksum based on
> read_buf, including btrfs_cmd_header (with zero-ed CRC field)
> and command content. GCC attribute is added to both struct
> btrfs_send_stream and read_buf to make sure read_buf is allocated
> with proper alignment.
> 
> Issue: #324
> Signed-off-by: Sheng Mao <shngmao@gmail.com>

Fix added to devel, with the mentined changes: read_buf is at the
begninning and whole structure 64 bytes aligned. Thanks.
diff mbox series

Patch

diff --git a/common/send-stream.c b/common/send-stream.c
index 69d75168..b13aa16e 100644
--- a/common/send-stream.c
+++ b/common/send-stream.c
@@ -26,7 +26,8 @@ 
 
 struct btrfs_send_stream {
 	int fd;
-	char read_buf[BTRFS_SEND_BUF_SIZE];
+	char read_buf[BTRFS_SEND_BUF_SIZE]
+		__attribute__((aligned(sizeof(unsigned long))));
 
 	int cmd;
 	struct btrfs_cmd_header *cmd_hdr;
@@ -41,7 +42,7 @@  struct btrfs_send_stream {
 
 	struct btrfs_send_ops *ops;
 	void *user;
-};
+} __attribute__((aligned(sizeof(unsigned long))));
 
 /*
  * Read len bytes to buf.