btrfs: reorder struct btrfs_key for better alignment
diff mbox series

Message ID 20190618141514.17322-1-dsterba@suse.com
State New
Headers show
Series
  • btrfs: reorder struct btrfs_key for better alignment
Related show

Commit Message

David Sterba June 18, 2019, 2:15 p.m. UTC
We don't use the plain key for any on-disk operations so there's no
requirement for the member order. As the offset is a u64 that should be
on an 8byte aligned address, this can generate ineffective code on
strict alignment architectures and can potentially hurt even on others
(cross-cacheline access).

The resulting asm code on x86_64 only differes in the offset, no significant
change in size of the object size.

The alignment of the structure is unchanged.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 include/uapi/linux/btrfs_tree.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Nikolay Borisov June 18, 2019, 2:20 p.m. UTC | #1
On 18.06.19 г. 17:15 ч., David Sterba wrote:
> We don't use the plain key for any on-disk operations so there's no
> requirement for the member order. As the offset is a u64 that should be
> on an 8byte aligned address, this can generate ineffective code on
> strict alignment architectures and can potentially hurt even on others
> (cross-cacheline access).
> 
> The resulting asm code on x86_64 only differes in the offset, no significant
> change in size of the object size.
> 
> The alignment of the structure is unchanged.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>


> ---
>  include/uapi/linux/btrfs_tree.h | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
> index aff1356c2bb8..9ca7adcf3b7f 100644
> --- a/include/uapi/linux/btrfs_tree.h
> +++ b/include/uapi/linux/btrfs_tree.h
> @@ -342,10 +342,17 @@ struct btrfs_disk_key {
>  	__le64 offset;
>  } __attribute__ ((__packed__));
>  
> +/*
> + * NOTE: this structure does not match the on-disk format of key and must be
> + * converted with the right helpers. The btrfs_key is for in-memory use and the
> + * members are reordered for better alignment. It's still packed as it's never
> + * used in arrays and the extra alignment would consume stack space in
> + * functions.
> + */

Since packing is only used as a "space optimisation" as far as I
understand it, I think the comment above btrfs_disk_key could be
modified such that the last sentence is perhahps removed?

>  struct btrfs_key {
>  	__u64 objectid;
> -	__u8 type;
>  	__u64 offset;
> +	__u8 type;
>  } __attribute__ ((__packed__));
>  
>  struct btrfs_dev_item {
>
Qu Wenruo June 19, 2019, 1:37 a.m. UTC | #2
On 2019/6/18 下午10:15, David Sterba wrote:
> We don't use the plain key for any on-disk operations so there's no
> requirement for the member order. As the offset is a u64 that should be
> on an 8byte aligned address, this can generate ineffective code on
> strict alignment architectures and can potentially hurt even on others
> (cross-cacheline access).
> 
> The resulting asm code on x86_64 only differes in the offset, no significant
> change in size of the object size.
> 
> The alignment of the structure is unchanged.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  include/uapi/linux/btrfs_tree.h | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
> index aff1356c2bb8..9ca7adcf3b7f 100644
> --- a/include/uapi/linux/btrfs_tree.h
> +++ b/include/uapi/linux/btrfs_tree.h
> @@ -342,10 +342,17 @@ struct btrfs_disk_key {
>  	__le64 offset;
>  } __attribute__ ((__packed__));
>  
> +/*
> + * NOTE: this structure does not match the on-disk format of key and must be
> + * converted with the right helpers. The btrfs_key is for in-memory use and the
> + * members are reordered for better alignment. It's still packed as it's never
> + * used in arrays and the extra alignment would consume stack space in
> + * functions.
> + */
>  struct btrfs_key {
>  	__u64 objectid;
> -	__u8 type;
>  	__u64 offset;
> +	__u8 type;
>  } __attribute__ ((__packed__));

And why not remove the packed attribute?

Thanks,
Qu
>  
>  struct btrfs_dev_item {
>
David Sterba June 19, 2019, 1:39 p.m. UTC | #3
On Wed, Jun 19, 2019 at 09:37:39AM +0800, Qu Wenruo wrote:
> >  struct btrfs_key {
> >  	__u64 objectid;
> > -	__u8 type;
> >  	__u64 offset;
> > +	__u8 type;
> >  } __attribute__ ((__packed__));
> 
> And why not remove the packed attribute?

Because of this (stack usage changes):

do_relocation                                                      +8 (304 -> 312)
btrfs_orphan_cleanup                                              +16 (128 -> 144)
btrfs_init_new_device                                             -16 (288 -> 272)
get_subvol_name_from_objectid                                      +8 (136 -> 144)
log_conflicting_inodes                                             +8 (184 -> 192)
btrfs_read_chunk_tree                                             +16 (128 -> 144)
btrfs_recover_relocation                                           +8 (136 -> 144)
scrub_stripe                                                      +16 (536 -> 552)
btrfs_clone                                                       +16 (352 -> 368)
free_space_next_bitmap                                             +8 (80 -> 88)
btrfs_find_orphan_roots                                           +16 (112 -> 128)
find_free_dev_extent_start                                         +8 (160 -> 168)
btrfs_lookup_extent_info                                           +8 (160 -> 168)
btrfs_find_item                                                    +8 (80 -> 88)
btrfs_create_pending_block_groups                                  +8 (120 -> 128)
btrfs_read_block_groups                                           -24 (176 -> 152)
__remove_from_free_space_tree                                      +8 (120 -> 128)
check_committed_ref                                                +8 (104 -> 112)
replay_one_buffer                                                  +8 (152 -> 160)
fixup_inode_link_counts                                            +8 (96 -> 104)
btrfs_del_csums                                                    +8 (192 -> 200)
btrfs_search_path_in_tree_user                                    +16 (192 -> 208)
btrfs_lookup_dentry                                                +8 (168 -> 176)
__add_tree_block                                                   +8 (120 -> 128)
__btrfs_balance                                                    +8 (264 -> 272)
__lookup_free_space_inode                                         +16 (112 -> 128)
__readahead_hook                                                  +16 (168 -> 184)
btrfs_get_parent                                                   +8 (96 -> 104)
btrfs_run_dev_replace                                              +8 (88 -> 96)
add_qgroup_item                                                    +8 (80 -> 88)
merge_reloc_root                                                  +16 (240 -> 256)
link_to_fixup_dir                                                  +8 (80 -> 88)
btrfs_insert_orphan_item                                           +8 (64 -> 72)
btrfs_delete_delayed_items                                         +8 (184 -> 192)
btrfs_read_qgroup_config                                           +8 (136 -> 144)
is_extent_unchanged                                                +8 (152 -> 160)
btrfs_recover_log_trees                                           +24 (192 -> 216)
btrfs_create_free_space_tree                                       +8 (136 -> 144)
send_subvol                                                        +8 (136 -> 144)
btrfs_compare_trees                                               +24 (176 -> 200)
get_first_ref                                                      +8 (136 -> 144)
qgroup_trace_new_subtree_blocks                                    +8 (176 -> 184)
qgroup_trace_extent_swap                                          +16 (192 -> 208)
generic_bin_search                                                 +8 (160 -> 168)
replay_one_name                                                    +8 (152 -> 160)
btrfs_print_tree                                                   +8 (160 -> 168)
__create_free_space_inode                                          +8 (112 -> 120)
copy_items                                                        +24 (328 -> 352)
walk_down_reloc_tree                                               +8 (160 -> 168)
btrfs_find_next_key                                                +8 (184 -> 192)
btrfs_del_inode_ref                                                +8 (144 -> 152)
btrfs_check_node                                                  +16 (144 -> 160)
find_next_devid                                                    +8 (88 -> 96)
btrfs_new_inode                                                    +8 (184 -> 192)
__add_to_free_space_tree                                          +16 (136 -> 152)
__btrfs_run_delayed_refs                                           +8 (168 -> 176)
btrfs_qgroup_trace_subtree                                         +8 (224 -> 232)
btrfs_lookup_csums_range                                           +8 (184 -> 192)
__btrfs_update_delayed_inode                                       +8 (112 -> 120)
lookup_inline_extent_backref                                       +8 (184 -> 192)
find_data_references                                               +8 (152 -> 160)
replay_dir_deletes                                                +16 (168 -> 184)
check_leaf                                                         +8 (168 -> 176)
btrfs_ioctl_get_subvol_info                                       +16 (128 -> 144)
btrfs_insert_inode_ref                                             +8 (152 -> 160)
btrfs_finish_sprout                                                +8 (152 -> 160)
build_backref_tree                                                 +8 (272 -> 280)
insert_extent_data_ref                                             +8 (96 -> 104)
send_clone                                                        -24 (160 -> 136)
insert_tree_block_ref                                              +8 (56 -> 64)
read_node_slot                                                     +8 (88 -> 96)
btrfs_csum_file_blocks                                             +8 (168 -> 176)
replace_path                                                      +16 (352 -> 368)
changed_inode                                                     +16 (144 -> 160)
process_all_refs                                                  +16 (112 -> 128)
btrfs_find_root                                                   +16 (144 -> 160)
reada_walk_down                                                    +8 (176 -> 184)
btrfs_insert_xattr_item                                            +8 (136 -> 144)
maybe_send_hole                                                    +8 (152 -> 160)
btrfs_uuid_scan_kthread                                            +8 (520 -> 528)
modify_free_space_bitmap                                           +8 (192 -> 200)
lookup_extent_data_ref                                             +8 (144 -> 152)
btrfs_run_delayed_refs_for_head                                    +8 (208 -> 216)
btrfs_search_dir_index_item                                        +8 (112 -> 120)
btrfs_prev_leaf                                                   +16 (88 -> 104)
may_destroy_subvol                                                 +8 (88 -> 96)
convert_free_space_to_extents                                      +8 (176 -> 184)
btrfs_quota_enable                                                 +8 (128 -> 136)
remove_block_group_free_space                                      +8 (120 -> 128)
__btrfs_drop_extents                                              +24 (352 -> 376)
btrfs_finish_chunk_alloc                                          +16 (192 -> 208)
log_dir_items                                                     +16 (192 -> 208)
btrfs_log_changed_extents                                          +8 (240 -> 248)
update_cache_item                                                  +8 (112 -> 120)
extent_from_logical                                                +8 (104 -> 112)
btrfs_drop_snapshot                                                +8 (176 -> 184)
btrfs_realloc_node                                                 +8 (208 -> 216)
did_create_dir                                                     +8 (88 -> 96)
add_qgroup_relation_item                                           +8 (80 -> 88)
find_dir_range                                                     +8 (112 -> 120)
btrfs_verify_level_key                                             +8 (128 -> 136)
do_walk_down                                                       +8 (336 -> 344)
get_last_extent                                                    +8 (88 -> 96)
add_keyed_refs                                                     +8 (168 -> 176)
read_block_for_search                                              +8 (168 -> 176)
can_rmdir                                                          +8 (120 -> 128)
btrfs_insert_empty_inode                                           +8 (40 -> 48)
btrfs_add_root_ref                                                 +8 (144 -> 152)
__btrfs_free_extent                                                +8 (216 -> 224)
add_all_parents                                                    +8 (160 -> 168)
btrfs_search_path_in_tree                                          +8 (120 -> 128)
add_new_free_space_info                                            +8 (72 -> 80)
btrfs_set_inode_index                                              +8 (96 -> 104)
btrfs_search_forward                                               +8 (128 -> 136)
insert_balance_item                                                +8 (232 -> 240)
btrfs_find_one_extref                                              +8 (104 -> 112)
scrub_raid56_parity                                                +8 (400 -> 408)
btrfs_real_readdir                                                 +8 (192 -> 200)
btrfs_log_inode_parent                                             +8 (264 -> 272)
btrfs_listxattr                                                    +8 (168 -> 176)
convert_free_space_to_bitmaps                                      +8 (168 -> 176)
btrfs_verify_dev_extents                                           +8 (152 -> 160)
find_next_extent                                                   +8 (120 -> 128)
btrfs_set_item_key_safe                                            +8 (160 -> 168)
btrfs_mark_extent_written                                         +16 (296 -> 312)
replay_xattr_deletes                                               +8 (192 -> 200)
caching_kthread                                                    +8 (128 -> 136)
btrfs_remove_chunk                                                 +8 (184 -> 192)
scrub_print_warning_inode                                          +8 (208 -> 216)
caching_thread                                                     +8 (184 -> 192)
log_new_dir_dentries                                              +16 (216 -> 232)
drop_objectid_items                                               +16 (112 -> 128)
insert_dir_log_key                                                 +8 (56 -> 64)
iterate_dir_item                                                   +8 (176 -> 184)
btrfs_lookup_csum                                                  +8 (104 -> 112)
relocate_tree_blocks                                               +8 (168 -> 176)
btrfs_build_ref_tree                                               -8 (168 -> 160)
btrfs_find_highest_objectid                                        +8 (80 -> 88)
btrfs_insert_dir_item                                              +8 (136 -> 144)
walk_down_log_tree                                                 +8 (176 -> 184)
btrfs_insert_file_extent                                           +8 (112 -> 120)
setup_leaf_for_split                                               +8 (120 -> 128)
btrfs_shrink_device                                                +8 (176 -> 184)

LOST (0):

NEW (544):
	btrfs_relocate_sys_chunks
	find_first_block_group
	add_delayed_refs
	process_leaf

LOST/NEW DELTA:     +544
PRE/POST DELTA:    +1840
Qu Wenruo June 19, 2019, 1:50 p.m. UTC | #4
On 2019/6/19 下午9:39, David Sterba wrote:
> On Wed, Jun 19, 2019 at 09:37:39AM +0800, Qu Wenruo wrote:
>>>  struct btrfs_key {
>>>  	__u64 objectid;
>>> -	__u8 type;
>>>  	__u64 offset;
>>> +	__u8 type;
>>>  } __attribute__ ((__packed__));
>>
>> And why not remove the packed attribute?
> 
> Because of this (stack usage changes):

That's expected as long as we're using btrfs_key on stack.

But if we're using btrfs_key on stack and follow the packed feature,
then adjacent on stack memory is not accessed aligned, which could cause
(unobvious) performance drop.

If the unaligned memory access is really causing some performance even
on stack memory, then I'd say the bump in stack memory usage is acceptable.

If not, then the idea of default -Waddress-of-packed-member makes no sense.

Thanks,
Qu

> 
> do_relocation                                                      +8 (304 -> 312)
> btrfs_orphan_cleanup                                              +16 (128 -> 144)
> btrfs_init_new_device                                             -16 (288 -> 272)
> get_subvol_name_from_objectid                                      +8 (136 -> 144)
> log_conflicting_inodes                                             +8 (184 -> 192)
> btrfs_read_chunk_tree                                             +16 (128 -> 144)
> btrfs_recover_relocation                                           +8 (136 -> 144)
> scrub_stripe                                                      +16 (536 -> 552)
> btrfs_clone                                                       +16 (352 -> 368)
> free_space_next_bitmap                                             +8 (80 -> 88)
> btrfs_find_orphan_roots                                           +16 (112 -> 128)
> find_free_dev_extent_start                                         +8 (160 -> 168)
> btrfs_lookup_extent_info                                           +8 (160 -> 168)
> btrfs_find_item                                                    +8 (80 -> 88)
> btrfs_create_pending_block_groups                                  +8 (120 -> 128)
> btrfs_read_block_groups                                           -24 (176 -> 152)
> __remove_from_free_space_tree                                      +8 (120 -> 128)
> check_committed_ref                                                +8 (104 -> 112)
> replay_one_buffer                                                  +8 (152 -> 160)
> fixup_inode_link_counts                                            +8 (96 -> 104)
> btrfs_del_csums                                                    +8 (192 -> 200)
> btrfs_search_path_in_tree_user                                    +16 (192 -> 208)
> btrfs_lookup_dentry                                                +8 (168 -> 176)
> __add_tree_block                                                   +8 (120 -> 128)
> __btrfs_balance                                                    +8 (264 -> 272)
> __lookup_free_space_inode                                         +16 (112 -> 128)
> __readahead_hook                                                  +16 (168 -> 184)
> btrfs_get_parent                                                   +8 (96 -> 104)
> btrfs_run_dev_replace                                              +8 (88 -> 96)
> add_qgroup_item                                                    +8 (80 -> 88)
> merge_reloc_root                                                  +16 (240 -> 256)
> link_to_fixup_dir                                                  +8 (80 -> 88)
> btrfs_insert_orphan_item                                           +8 (64 -> 72)
> btrfs_delete_delayed_items                                         +8 (184 -> 192)
> btrfs_read_qgroup_config                                           +8 (136 -> 144)
> is_extent_unchanged                                                +8 (152 -> 160)
> btrfs_recover_log_trees                                           +24 (192 -> 216)
> btrfs_create_free_space_tree                                       +8 (136 -> 144)
> send_subvol                                                        +8 (136 -> 144)
> btrfs_compare_trees                                               +24 (176 -> 200)
> get_first_ref                                                      +8 (136 -> 144)
> qgroup_trace_new_subtree_blocks                                    +8 (176 -> 184)
> qgroup_trace_extent_swap                                          +16 (192 -> 208)
> generic_bin_search                                                 +8 (160 -> 168)
> replay_one_name                                                    +8 (152 -> 160)
> btrfs_print_tree                                                   +8 (160 -> 168)
> __create_free_space_inode                                          +8 (112 -> 120)
> copy_items                                                        +24 (328 -> 352)
> walk_down_reloc_tree                                               +8 (160 -> 168)
> btrfs_find_next_key                                                +8 (184 -> 192)
> btrfs_del_inode_ref                                                +8 (144 -> 152)
> btrfs_check_node                                                  +16 (144 -> 160)
> find_next_devid                                                    +8 (88 -> 96)
> btrfs_new_inode                                                    +8 (184 -> 192)
> __add_to_free_space_tree                                          +16 (136 -> 152)
> __btrfs_run_delayed_refs                                           +8 (168 -> 176)
> btrfs_qgroup_trace_subtree                                         +8 (224 -> 232)
> btrfs_lookup_csums_range                                           +8 (184 -> 192)
> __btrfs_update_delayed_inode                                       +8 (112 -> 120)
> lookup_inline_extent_backref                                       +8 (184 -> 192)
> find_data_references                                               +8 (152 -> 160)
> replay_dir_deletes                                                +16 (168 -> 184)
> check_leaf                                                         +8 (168 -> 176)
> btrfs_ioctl_get_subvol_info                                       +16 (128 -> 144)
> btrfs_insert_inode_ref                                             +8 (152 -> 160)
> btrfs_finish_sprout                                                +8 (152 -> 160)
> build_backref_tree                                                 +8 (272 -> 280)
> insert_extent_data_ref                                             +8 (96 -> 104)
> send_clone                                                        -24 (160 -> 136)
> insert_tree_block_ref                                              +8 (56 -> 64)
> read_node_slot                                                     +8 (88 -> 96)
> btrfs_csum_file_blocks                                             +8 (168 -> 176)
> replace_path                                                      +16 (352 -> 368)
> changed_inode                                                     +16 (144 -> 160)
> process_all_refs                                                  +16 (112 -> 128)
> btrfs_find_root                                                   +16 (144 -> 160)
> reada_walk_down                                                    +8 (176 -> 184)
> btrfs_insert_xattr_item                                            +8 (136 -> 144)
> maybe_send_hole                                                    +8 (152 -> 160)
> btrfs_uuid_scan_kthread                                            +8 (520 -> 528)
> modify_free_space_bitmap                                           +8 (192 -> 200)
> lookup_extent_data_ref                                             +8 (144 -> 152)
> btrfs_run_delayed_refs_for_head                                    +8 (208 -> 216)
> btrfs_search_dir_index_item                                        +8 (112 -> 120)
> btrfs_prev_leaf                                                   +16 (88 -> 104)
> may_destroy_subvol                                                 +8 (88 -> 96)
> convert_free_space_to_extents                                      +8 (176 -> 184)
> btrfs_quota_enable                                                 +8 (128 -> 136)
> remove_block_group_free_space                                      +8 (120 -> 128)
> __btrfs_drop_extents                                              +24 (352 -> 376)
> btrfs_finish_chunk_alloc                                          +16 (192 -> 208)
> log_dir_items                                                     +16 (192 -> 208)
> btrfs_log_changed_extents                                          +8 (240 -> 248)
> update_cache_item                                                  +8 (112 -> 120)
> extent_from_logical                                                +8 (104 -> 112)
> btrfs_drop_snapshot                                                +8 (176 -> 184)
> btrfs_realloc_node                                                 +8 (208 -> 216)
> did_create_dir                                                     +8 (88 -> 96)
> add_qgroup_relation_item                                           +8 (80 -> 88)
> find_dir_range                                                     +8 (112 -> 120)
> btrfs_verify_level_key                                             +8 (128 -> 136)
> do_walk_down                                                       +8 (336 -> 344)
> get_last_extent                                                    +8 (88 -> 96)
> add_keyed_refs                                                     +8 (168 -> 176)
> read_block_for_search                                              +8 (168 -> 176)
> can_rmdir                                                          +8 (120 -> 128)
> btrfs_insert_empty_inode                                           +8 (40 -> 48)
> btrfs_add_root_ref                                                 +8 (144 -> 152)
> __btrfs_free_extent                                                +8 (216 -> 224)
> add_all_parents                                                    +8 (160 -> 168)
> btrfs_search_path_in_tree                                          +8 (120 -> 128)
> add_new_free_space_info                                            +8 (72 -> 80)
> btrfs_set_inode_index                                              +8 (96 -> 104)
> btrfs_search_forward                                               +8 (128 -> 136)
> insert_balance_item                                                +8 (232 -> 240)
> btrfs_find_one_extref                                              +8 (104 -> 112)
> scrub_raid56_parity                                                +8 (400 -> 408)
> btrfs_real_readdir                                                 +8 (192 -> 200)
> btrfs_log_inode_parent                                             +8 (264 -> 272)
> btrfs_listxattr                                                    +8 (168 -> 176)
> convert_free_space_to_bitmaps                                      +8 (168 -> 176)
> btrfs_verify_dev_extents                                           +8 (152 -> 160)
> find_next_extent                                                   +8 (120 -> 128)
> btrfs_set_item_key_safe                                            +8 (160 -> 168)
> btrfs_mark_extent_written                                         +16 (296 -> 312)
> replay_xattr_deletes                                               +8 (192 -> 200)
> caching_kthread                                                    +8 (128 -> 136)
> btrfs_remove_chunk                                                 +8 (184 -> 192)
> scrub_print_warning_inode                                          +8 (208 -> 216)
> caching_thread                                                     +8 (184 -> 192)
> log_new_dir_dentries                                              +16 (216 -> 232)
> drop_objectid_items                                               +16 (112 -> 128)
> insert_dir_log_key                                                 +8 (56 -> 64)
> iterate_dir_item                                                   +8 (176 -> 184)
> btrfs_lookup_csum                                                  +8 (104 -> 112)
> relocate_tree_blocks                                               +8 (168 -> 176)
> btrfs_build_ref_tree                                               -8 (168 -> 160)
> btrfs_find_highest_objectid                                        +8 (80 -> 88)
> btrfs_insert_dir_item                                              +8 (136 -> 144)
> walk_down_log_tree                                                 +8 (176 -> 184)
> btrfs_insert_file_extent                                           +8 (112 -> 120)
> setup_leaf_for_split                                               +8 (120 -> 128)
> btrfs_shrink_device                                                +8 (176 -> 184)
> 
> LOST (0):
> 
> NEW (544):
> 	btrfs_relocate_sys_chunks
> 	find_first_block_group
> 	add_delayed_refs
> 	process_leaf
> 
> LOST/NEW DELTA:     +544
> PRE/POST DELTA:    +1840
>
David Sterba June 19, 2019, 2:14 p.m. UTC | #5
On Wed, Jun 19, 2019 at 09:50:58PM +0800, Qu Wenruo wrote:
> 
> 
> On 2019/6/19 下午9:39, David Sterba wrote:
> > On Wed, Jun 19, 2019 at 09:37:39AM +0800, Qu Wenruo wrote:
> >>>  struct btrfs_key {
> >>>  	__u64 objectid;
> >>> -	__u8 type;
> >>>  	__u64 offset;
> >>> +	__u8 type;
> >>>  } __attribute__ ((__packed__));
> >>
> >> And why not remove the packed attribute?
> > 
> > Because of this (stack usage changes):
> 
> That's expected as long as we're using btrfs_key on stack.

And that's in many functions.

> But if we're using btrfs_key on stack and follow the packed feature,
> then adjacent on stack memory is not accessed aligned, which could cause
> (unobvious) performance drop.

No, the placement of local variables is up to the compiler and does not
necessarily follow the same order as teh definition, some of the
variables can be completely optimized out etc. And the types must be
accessed according to the type alignments, so one packed structure does
not cause a disaster.

> If the unaligned memory access is really causing some performance even
> on stack memory, then I'd say the bump in stack memory usage is acceptable.

Unaligned access overhead depends on architecture, and there's almost
none on intel, so the packed key makes no difference regarding speed.

I consider the bump in stack consumption high enough to stick to the
packing, given that it's still ok from the performance POV.

> If not, then the idea of default -Waddress-of-packed-member makes no sense.

Yeah, I start to think that it's causing more harm than good.

Patch
diff mbox series

diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index aff1356c2bb8..9ca7adcf3b7f 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -342,10 +342,17 @@  struct btrfs_disk_key {
 	__le64 offset;
 } __attribute__ ((__packed__));
 
+/*
+ * NOTE: this structure does not match the on-disk format of key and must be
+ * converted with the right helpers. The btrfs_key is for in-memory use and the
+ * members are reordered for better alignment. It's still packed as it's never
+ * used in arrays and the extra alignment would consume stack space in
+ * functions.
+ */
 struct btrfs_key {
 	__u64 objectid;
-	__u8 type;
 	__u64 offset;
+	__u8 type;
 } __attribute__ ((__packed__));
 
 struct btrfs_dev_item {