Message ID | 20231218082758.247831-1-sumang@marvell.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] octeontx2-af: Fix marking couple of structure as __packed | expand |
On 12/18/2023 12:27 AM, Suman Ghosh wrote: > Couple of structures was not marked as __packed which may have some > performance implication. This patch fixes the same and mark them as > __packed. Not sure I follow why lack of __packed would have performance implications? I get that __packed is important to ensure layout is correct or to ensure the whole structure has the right size rather than unexpected gaps. I'd guess maybe because the structures size would include padding without __packed, leading to a lot of gaps when combining several structures together... I did test on my system with pahole, and even without __packed, I don't get any gaps in the npc_lt_def_cfg structure: > struct npc_lt_def_cfg { > struct npc_lt_def rx_ol2; /* 0 3 */ > struct npc_lt_def rx_oip4; /* 3 3 */ > struct npc_lt_def rx_iip4; /* 6 3 */ > struct npc_lt_def rx_oip6; /* 9 3 */ > struct npc_lt_def rx_iip6; /* 12 3 */ > struct npc_lt_def rx_otcp; /* 15 3 */ > struct npc_lt_def rx_itcp; /* 18 3 */ > struct npc_lt_def rx_oudp; /* 21 3 */ > struct npc_lt_def rx_iudp; /* 24 3 */ > struct npc_lt_def rx_osctp; /* 27 3 */ > struct npc_lt_def rx_isctp; /* 30 3 */ > struct npc_lt_def_ipsec rx_ipsec[2]; /* 33 10 */ > struct npc_lt_def pck_ol2; /* 43 3 */ > struct npc_lt_def pck_oip4; /* 46 3 */ > struct npc_lt_def pck_oip6; /* 49 3 */ > struct npc_lt_def pck_iip4; /* 52 3 */ > struct npc_lt_def_apad rx_apad0; /* 55 4 */ > struct npc_lt_def_apad rx_apad1; /* 59 4 */ > struct npc_lt_def_color ovlan; /* 63 5 */ > /* --- cacheline 1 boundary (64 bytes) was 4 bytes ago --- */ > struct npc_lt_def_color ivlan; /* 68 5 */ > struct npc_lt_def_color rx_gen0_color; /* 73 5 */ > struct npc_lt_def_color rx_gen1_color; /* 78 5 */ > struct npc_lt_def_et rx_et[2]; /* 83 10 */ > > /* size: 93, cachelines: 2, members: 23 */ > /* last cacheline: 29 bytes */ > }; However that may not be true across all compilers etc. Also all the other structures are __packed. Makes sense. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> > > Fixes: 42006910b5ea ("octeontx2-af: cleanup KPU config data") > Signed-off-by: Suman Ghosh <sumang@marvell.com> > --- > drivers/net/ethernet/marvell/octeontx2/af/npc.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/marvell/octeontx2/af/npc.h b/drivers/net/ethernet/marvell/octeontx2/af/npc.h > index ab3e39eef2eb..8c0732c9a7ee 100644 > --- a/drivers/net/ethernet/marvell/octeontx2/af/npc.h > +++ b/drivers/net/ethernet/marvell/octeontx2/af/npc.h > @@ -528,7 +528,7 @@ struct npc_lt_def { > u8 ltype_mask; > u8 ltype_match; > u8 lid; > -}; > +} __packed; > > struct npc_lt_def_ipsec { > u8 ltype_mask; > @@ -536,7 +536,7 @@ struct npc_lt_def_ipsec { > u8 lid; > u8 spi_offset; > u8 spi_nz; > -}; > +} __packed; > > struct npc_lt_def_apad { > u8 ltype_mask;
>Not sure I follow why lack of __packed would have performance >implications? I get that __packed is important to ensure layout is >correct or to ensure the whole structure has the right size rather than >unexpected gaps. I'd guess maybe because the structures size would >include padding without __packed, leading to a lot of gaps when >combining several structures together... > >I did test on my system with pahole, and even without __packed, I don't >get any gaps in the npc_lt_def_cfg structure: > > >> struct npc_lt_def_cfg { >> struct npc_lt_def rx_ol2; /* 0 >3 */ >> struct npc_lt_def rx_oip4; /* 3 >3 */ >> struct npc_lt_def rx_iip4; /* 6 >3 */ >> struct npc_lt_def rx_oip6; /* 9 >3 */ >> struct npc_lt_def rx_iip6; /* 12 >3 */ >> struct npc_lt_def rx_otcp; /* 15 >3 */ >> struct npc_lt_def rx_itcp; /* 18 >3 */ >> struct npc_lt_def rx_oudp; /* 21 >3 */ >> struct npc_lt_def rx_iudp; /* 24 >3 */ >> struct npc_lt_def rx_osctp; /* 27 >3 */ >> struct npc_lt_def rx_isctp; /* 30 >3 */ >> struct npc_lt_def_ipsec rx_ipsec[2]; /* 33 >10 */ >> struct npc_lt_def pck_ol2; /* 43 >3 */ >> struct npc_lt_def pck_oip4; /* 46 >3 */ >> struct npc_lt_def pck_oip6; /* 49 >3 */ >> struct npc_lt_def pck_iip4; /* 52 >3 */ >> struct npc_lt_def_apad rx_apad0; /* 55 >4 */ >> struct npc_lt_def_apad rx_apad1; /* 59 >4 */ >> struct npc_lt_def_color ovlan; /* 63 >5 */ >> /* --- cacheline 1 boundary (64 bytes) was 4 bytes ago --- */ >> struct npc_lt_def_color ivlan; /* 68 >5 */ >> struct npc_lt_def_color rx_gen0_color; /* 73 >5 */ >> struct npc_lt_def_color rx_gen1_color; /* 78 >5 */ >> struct npc_lt_def_et rx_et[2]; /* 83 >10 */ >> >> /* size: 93, cachelines: 2, members: 23 */ >> /* last cacheline: 29 bytes */ }; > > >However that may not be true across all compilers etc. Also all the >other structures are __packed. Makes sense. > >Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> > [Suman] I agree, "having performance impact" is a wrong statement. I will update the same in v2.
From: Jacob Keller > Sent: 18 December 2023 20:44 > > On 12/18/2023 12:27 AM, Suman Ghosh wrote: > > Couple of structures was not marked as __packed which may have some > > performance implication. This patch fixes the same and mark them as > > __packed. > > Not sure I follow why lack of __packed would have performance > implications? I get that __packed is important to ensure layout is > correct or to ensure the whole structure has the right size rather than > unexpected gaps. I'd guess maybe because the structures size would > include padding without __packed, leading to a lot of gaps when > combining several structures together... > > I did test on my system with pahole, and even without __packed, I don't > get any gaps in the npc_lt_def_cfg structure: > > > > struct npc_lt_def_cfg { > > struct npc_lt_def rx_ol2; /* 0 3 */ > > struct npc_lt_def rx_oip4; /* 3 3 */ > > struct npc_lt_def rx_iip4; /* 6 3 */ > > struct npc_lt_def rx_oip6; /* 9 3 */ > > struct npc_lt_def rx_iip6; /* 12 3 */ > > struct npc_lt_def rx_otcp; /* 15 3 */ > > struct npc_lt_def rx_itcp; /* 18 3 */ > > struct npc_lt_def rx_oudp; /* 21 3 */ > > struct npc_lt_def rx_iudp; /* 24 3 */ > > struct npc_lt_def rx_osctp; /* 27 3 */ > > struct npc_lt_def rx_isctp; /* 30 3 */ > > struct npc_lt_def_ipsec rx_ipsec[2]; /* 33 10 */ > > struct npc_lt_def pck_ol2; /* 43 3 */ > > struct npc_lt_def pck_oip4; /* 46 3 */ > > struct npc_lt_def pck_oip6; /* 49 3 */ > > struct npc_lt_def pck_iip4; /* 52 3 */ > > struct npc_lt_def_apad rx_apad0; /* 55 4 */ > > struct npc_lt_def_apad rx_apad1; /* 59 4 */ > > struct npc_lt_def_color ovlan; /* 63 5 */ > > /* --- cacheline 1 boundary (64 bytes) was 4 bytes ago --- */ > > struct npc_lt_def_color ivlan; /* 68 5 */ > > struct npc_lt_def_color rx_gen0_color; /* 73 5 */ > > struct npc_lt_def_color rx_gen1_color; /* 78 5 */ > > struct npc_lt_def_et rx_et[2]; /* 83 10 */ > > > > /* size: 93, cachelines: 2, members: 23 */ > > /* last cacheline: 29 bytes */ > > }; > > > However that may not be true across all compilers etc. Also all the > other structures are __packed. Makes sense. Or not - maybe all the __packed should be removed instead! Unless these structures (or any others) appear in 'messages' which get transferred between systems they really shouldn't be __packed. And a 93 byte 'message' with all those fields seems rather odd. The above breakdown seems to imply everything is 'unsigned char' so the __packed makes no difference. Using __packed requires the compiler generate byte loads/store with shifts (etc) on many architectures and should really be avoided unless it is absolutely needed for binary compatibility. Even then if the problem is a 64bit field that only needs to be 32bit aligned (as is common for some compat32 code) then the 64bit fields should be marked as being 32bit aligned. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On 12/19/2023 7:26 AM, David Laight wrote: > From: Jacob Keller >> Sent: 18 December 2023 20:44 >> >> On 12/18/2023 12:27 AM, Suman Ghosh wrote: >>> Couple of structures was not marked as __packed which may have some >>> performance implication. This patch fixes the same and mark them as >>> __packed. >> >> Not sure I follow why lack of __packed would have performance >> implications? I get that __packed is important to ensure layout is >> correct or to ensure the whole structure has the right size rather than >> unexpected gaps. I'd guess maybe because the structures size would >> include padding without __packed, leading to a lot of gaps when >> combining several structures together... >> >> I did test on my system with pahole, and even without __packed, I don't >> get any gaps in the npc_lt_def_cfg structure: >> >> >>> struct npc_lt_def_cfg { >>> struct npc_lt_def rx_ol2; /* 0 3 */ >>> struct npc_lt_def rx_oip4; /* 3 3 */ >>> struct npc_lt_def rx_iip4; /* 6 3 */ >>> struct npc_lt_def rx_oip6; /* 9 3 */ >>> struct npc_lt_def rx_iip6; /* 12 3 */ >>> struct npc_lt_def rx_otcp; /* 15 3 */ >>> struct npc_lt_def rx_itcp; /* 18 3 */ >>> struct npc_lt_def rx_oudp; /* 21 3 */ >>> struct npc_lt_def rx_iudp; /* 24 3 */ >>> struct npc_lt_def rx_osctp; /* 27 3 */ >>> struct npc_lt_def rx_isctp; /* 30 3 */ >>> struct npc_lt_def_ipsec rx_ipsec[2]; /* 33 10 */ >>> struct npc_lt_def pck_ol2; /* 43 3 */ >>> struct npc_lt_def pck_oip4; /* 46 3 */ >>> struct npc_lt_def pck_oip6; /* 49 3 */ >>> struct npc_lt_def pck_iip4; /* 52 3 */ >>> struct npc_lt_def_apad rx_apad0; /* 55 4 */ >>> struct npc_lt_def_apad rx_apad1; /* 59 4 */ >>> struct npc_lt_def_color ovlan; /* 63 5 */ >>> /* --- cacheline 1 boundary (64 bytes) was 4 bytes ago --- */ >>> struct npc_lt_def_color ivlan; /* 68 5 */ >>> struct npc_lt_def_color rx_gen0_color; /* 73 5 */ >>> struct npc_lt_def_color rx_gen1_color; /* 78 5 */ >>> struct npc_lt_def_et rx_et[2]; /* 83 10 */ >>> >>> /* size: 93, cachelines: 2, members: 23 */ >>> /* last cacheline: 29 bytes */ >>> }; >> >> >> However that may not be true across all compilers etc. Also all the >> other structures are __packed. Makes sense. > > Or not - maybe all the __packed should be removed instead! > > Unless these structures (or any others) appear in 'messages' which > get transferred between systems they really shouldn't be __packed. > And a 93 byte 'message' with all those fields seems rather odd. > > The above breakdown seems to imply everything is 'unsigned char' > so the __packed makes no difference. > > Using __packed requires the compiler generate byte loads/store > with shifts (etc) on many architectures and should really be avoided > unless it is absolutely needed for binary compatibility. > > Even then if the problem is a 64bit field that only needs to be > 32bit aligned (as is common for some compat32 code) then the 64bit > fields should be marked as being 32bit aligned. > > David > Right. Typically packed is only required when dealing with something where the exact binary layout matters (i.e. copying to/from hardware or across systems in such a way that the layout might change with different compilers/arch). If that isn't how this structure is used, then ya, removing __packed seems reasonable. And at least for one system I see no difference in the actual generated layout, making __packed redundant. However, its not clear to me at a glance how this structure is used and whether it really is copied between places where binary compatibility is a requirement. > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales)
>>> >>> However that may not be true across all compilers etc. Also all the >>> other structures are __packed. Makes sense. >> >> Or not - maybe all the __packed should be removed instead! >> >> Unless these structures (or any others) appear in 'messages' which get >> transferred between systems they really shouldn't be __packed. >> And a 93 byte 'message' with all those fields seems rather odd. >> >> The above breakdown seems to imply everything is 'unsigned char' >> so the __packed makes no difference. >> >> Using __packed requires the compiler generate byte loads/store with >> shifts (etc) on many architectures and should really be avoided unless >> it is absolutely needed for binary compatibility. >> >> Even then if the problem is a 64bit field that only needs to be 32bit >> aligned (as is common for some compat32 code) then the 64bit fields >> should be marked as being 32bit aligned. >> >> David >> >Right. Typically packed is only required when dealing with something >where the exact binary layout matters (i.e. copying to/from hardware or >across systems in such a way that the layout might change with different >compilers/arch). > >If that isn't how this structure is used, then ya, removing __packed >seems reasonable. And at least for one system I see no difference in the >actual generated layout, making __packed redundant. > >However, its not clear to me at a glance how this structure is used and >whether it really is copied between places where binary compatibility is >a requirement. > >> - >> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, >> MK1 1PT, UK Registration No: 1397386 (Wales) [Suman] Yes, these structures are copied from firmware. It is needed to inform kernel about some parsing information required by hardware. That is the reason structures are packed and these two were missed.
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/npc.h b/drivers/net/ethernet/marvell/octeontx2/af/npc.h index ab3e39eef2eb..8c0732c9a7ee 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/npc.h +++ b/drivers/net/ethernet/marvell/octeontx2/af/npc.h @@ -528,7 +528,7 @@ struct npc_lt_def { u8 ltype_mask; u8 ltype_match; u8 lid; -}; +} __packed; struct npc_lt_def_ipsec { u8 ltype_mask; @@ -536,7 +536,7 @@ struct npc_lt_def_ipsec { u8 lid; u8 spi_offset; u8 spi_nz; -}; +} __packed; struct npc_lt_def_apad { u8 ltype_mask;
Couple of structures was not marked as __packed which may have some performance implication. This patch fixes the same and mark them as __packed. Fixes: 42006910b5ea ("octeontx2-af: cleanup KPU config data") Signed-off-by: Suman Ghosh <sumang@marvell.com> --- drivers/net/ethernet/marvell/octeontx2/af/npc.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)