mbox series

[v2,net-next,0/7] Cleanup to main DSA structures

Message ID 20220105132141.2648876-1-vladimir.oltean@nxp.com (mailing list archive)
Headers show
Series Cleanup to main DSA structures | expand

Message

Vladimir Oltean Jan. 5, 2022, 1:21 p.m. UTC
This series contains changes that do the following:

- struct dsa_port reduced from 576 to 544 bytes, and first cache line a
  bit better organized
- struct dsa_switch from 160 to 136 bytes, and first cache line a bit
  better organized
- struct dsa_switch_tree from 112 to 104 bytes, and first cache line a
  bit better organized

No changes compared to v1, just split into a separate patch set.

Vladimir Oltean (7):
  net: dsa: move dsa_port :: stp_state near dsa_port :: mac
  net: dsa: merge all bools of struct dsa_port into a single u8
  net: dsa: move dsa_port :: type near dsa_port :: index
  net: dsa: merge all bools of struct dsa_switch into a single u32
  net: dsa: make dsa_switch :: num_ports an unsigned int
  net: dsa: move dsa_switch_tree :: ports and lags to first cache line
  net: dsa: combine two holes in struct dsa_switch_tree

 include/net/dsa.h | 146 +++++++++++++++++++++++++---------------------
 net/dsa/dsa2.c    |   2 +-
 2 files changed, 81 insertions(+), 67 deletions(-)

Comments

Vladimir Oltean Jan. 5, 2022, 2:28 p.m. UTC | #1
On Wed, Jan 05, 2022 at 03:21:34PM +0200, Vladimir Oltean wrote:
> This series contains changes that do the following:
> 
> - struct dsa_port reduced from 576 to 544 bytes, and first cache line a
>   bit better organized
> - struct dsa_switch from 160 to 136 bytes, and first cache line a bit
>   better organized
> - struct dsa_switch_tree from 112 to 104 bytes, and first cache line a
>   bit better organized
> 
> No changes compared to v1, just split into a separate patch set.
> 
> Vladimir Oltean (7):
>   net: dsa: move dsa_port :: stp_state near dsa_port :: mac
>   net: dsa: merge all bools of struct dsa_port into a single u8
>   net: dsa: move dsa_port :: type near dsa_port :: index
>   net: dsa: merge all bools of struct dsa_switch into a single u32
>   net: dsa: make dsa_switch :: num_ports an unsigned int
>   net: dsa: move dsa_switch_tree :: ports and lags to first cache line
>   net: dsa: combine two holes in struct dsa_switch_tree
> 
>  include/net/dsa.h | 146 +++++++++++++++++++++++++---------------------
>  net/dsa/dsa2.c    |   2 +-
>  2 files changed, 81 insertions(+), 67 deletions(-)
> 
> -- 
> 2.25.1
>

Let's keep this version for review only (RFC). For the final version I
just figured that I can use this syntax:

	u8			vlan_filtering:1;

	/* Managed by DSA on user ports and by drivers on CPU and DSA ports */
	u8			learning:1;

	u8			lag_tx_enabled:1;

	u8			devlink_port_setup:1;

	u8			setup:1;

instead of this syntax:

	u8			vlan_filtering:1,
				/* Managed by DSA on user ports and by
				 * drivers on CPU and DSA ports
				 */
				learning:1,
				lag_tx_enabled:1,
				devlink_port_setup:1,
				setup:1;

which is what I'm going to prefer.
Florian Fainelli Jan. 5, 2022, 6:37 p.m. UTC | #2
On 1/5/22 6:28 AM, Vladimir Oltean wrote:
> On Wed, Jan 05, 2022 at 03:21:34PM +0200, Vladimir Oltean wrote:
>> This series contains changes that do the following:
>>
>> - struct dsa_port reduced from 576 to 544 bytes, and first cache line a
>>   bit better organized
>> - struct dsa_switch from 160 to 136 bytes, and first cache line a bit
>>   better organized
>> - struct dsa_switch_tree from 112 to 104 bytes, and first cache line a
>>   bit better organized
>>
>> No changes compared to v1, just split into a separate patch set.
>>
>> Vladimir Oltean (7):
>>   net: dsa: move dsa_port :: stp_state near dsa_port :: mac
>>   net: dsa: merge all bools of struct dsa_port into a single u8
>>   net: dsa: move dsa_port :: type near dsa_port :: index
>>   net: dsa: merge all bools of struct dsa_switch into a single u32
>>   net: dsa: make dsa_switch :: num_ports an unsigned int
>>   net: dsa: move dsa_switch_tree :: ports and lags to first cache line
>>   net: dsa: combine two holes in struct dsa_switch_tree
>>
>>  include/net/dsa.h | 146 +++++++++++++++++++++++++---------------------
>>  net/dsa/dsa2.c    |   2 +-
>>  2 files changed, 81 insertions(+), 67 deletions(-)
>>
>> -- 
>> 2.25.1
>>
> 
> Let's keep this version for review only (RFC). For the final version I
> just figured that I can use this syntax:
> 
> 	u8			vlan_filtering:1;
> 
> 	/* Managed by DSA on user ports and by drivers on CPU and DSA ports */
> 	u8			learning:1;
> 
> 	u8			lag_tx_enabled:1;
> 
> 	u8			devlink_port_setup:1;
> 
> 	u8			setup:1;
> 
> instead of this syntax:
> 
> 	u8			vlan_filtering:1,
> 				/* Managed by DSA on user ports and by
> 				 * drivers on CPU and DSA ports
> 				 */
> 				learning:1,
> 				lag_tx_enabled:1,
> 				devlink_port_setup:1,
> 				setup:1;
> 
> which is what I'm going to prefer.

Yes this is indeed more readable. Thanks!
Florian Fainelli Jan. 5, 2022, 6:39 p.m. UTC | #3
On 1/5/22 5:21 AM, Vladimir Oltean wrote:
> This series contains changes that do the following:
> 
> - struct dsa_port reduced from 576 to 544 bytes, and first cache line a
>   bit better organized
> - struct dsa_switch from 160 to 136 bytes, and first cache line a bit
>   better organized
> - struct dsa_switch_tree from 112 to 104 bytes, and first cache line a
>   bit better organized
> 
> No changes compared to v1, just split into a separate patch set.

This is all looking good to me. I suppose we could possibly swap the
'nh' and 'tag_ops' member since dst->tag_ops is used in
dsa_tag_generic_flow_dissect() which is a fast path, what do you think?
Vladimir Oltean Jan. 5, 2022, 6:59 p.m. UTC | #4
On Wed, Jan 05, 2022 at 10:39:04AM -0800, Florian Fainelli wrote:
> On 1/5/22 5:21 AM, Vladimir Oltean wrote:
> > This series contains changes that do the following:
> >
> > - struct dsa_port reduced from 576 to 544 bytes, and first cache line a
> >   bit better organized
> > - struct dsa_switch from 160 to 136 bytes, and first cache line a bit
> >   better organized
> > - struct dsa_switch_tree from 112 to 104 bytes, and first cache line a
> >   bit better organized
> >
> > No changes compared to v1, just split into a separate patch set.
>
> This is all looking good to me. I suppose we could possibly swap the
> 'nh' and 'tag_ops' member since dst->tag_ops is used in
> dsa_tag_generic_flow_dissect() which is a fast path, what do you think?

pahole is telling me that dst->tag_ops is in the first cache line on
both arm64 and arm32. Are you saying that it's better for it to take
dst->nh's place?

[/opt/arm64-linux] $ pahole -C dsa_switch_tree net/dsa/slave.o
struct dsa_switch_tree {
        struct list_head           list;                 /*     0    16 */
        struct list_head           ports;                /*    16    16 */
        struct raw_notifier_head   nh;                   /*    32     8 */
        unsigned int               index;                /*    40     4 */
        struct kref                refcount;             /*    44     4 */
        struct net_device * *      lags;                 /*    48     8 */
        const struct dsa_device_ops  * tag_ops;          /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        enum dsa_tag_protocol      default_proto;        /*    64     4 */
        bool                       setup;                /*    68     1 */

        /* XXX 3 bytes hole, try to pack */

        struct dsa_platform_data * pd;                   /*    72     8 */
        struct list_head           rtable;               /*    80    16 */
        unsigned int               lags_len;             /*    96     4 */
        unsigned int               last_switch;          /*   100     4 */

        /* size: 104, cachelines: 2, members: 13 */
        /* sum members: 101, holes: 1, sum holes: 3 */
        /* last cacheline: 40 bytes */
};

[/opt/arm-linux] $ pahole -C dsa_switch_tree net/dsa/slave.o
struct dsa_switch_tree {
        struct list_head           list;                 /*     0     8 */
        struct list_head           ports;                /*     8     8 */
        struct raw_notifier_head   nh;                   /*    16     4 */
        unsigned int               index;                /*    20     4 */
        struct kref                refcount;             /*    24     4 */
        struct net_device * *      lags;                 /*    28     4 */
        const struct dsa_device_ops  * tag_ops;          /*    32     4 */
        enum dsa_tag_protocol      default_proto;        /*    36     4 */
        bool                       setup;                /*    40     1 */

        /* XXX 3 bytes hole, try to pack */

        struct dsa_platform_data * pd;                   /*    44     4 */
        struct list_head           rtable;               /*    48     8 */
        unsigned int               lags_len;             /*    56     4 */
        unsigned int               last_switch;          /*    60     4 */

        /* size: 64, cachelines: 1, members: 13 */
        /* sum members: 61, holes: 1, sum holes: 3 */
};
Florian Fainelli Jan. 5, 2022, 7:04 p.m. UTC | #5
On 1/5/22 10:59 AM, Vladimir Oltean wrote:
> On Wed, Jan 05, 2022 at 10:39:04AM -0800, Florian Fainelli wrote:
>> On 1/5/22 5:21 AM, Vladimir Oltean wrote:
>>> This series contains changes that do the following:
>>>
>>> - struct dsa_port reduced from 576 to 544 bytes, and first cache line a
>>>   bit better organized
>>> - struct dsa_switch from 160 to 136 bytes, and first cache line a bit
>>>   better organized
>>> - struct dsa_switch_tree from 112 to 104 bytes, and first cache line a
>>>   bit better organized
>>>
>>> No changes compared to v1, just split into a separate patch set.
>>
>> This is all looking good to me. I suppose we could possibly swap the
>> 'nh' and 'tag_ops' member since dst->tag_ops is used in
>> dsa_tag_generic_flow_dissect() which is a fast path, what do you think?
> 
> pahole is telling me that dst->tag_ops is in the first cache line on
> both arm64 and arm32. Are you saying that it's better for it to take
> dst->nh's place?

Sorry it stuck in my head somehow based upon patch 7's pahole output
that tag_ops was still in the second cache line when the after clearly
shows that it moved to the first cache line. No need to add additional
changes then, thanks!
Vladimir Oltean Jan. 5, 2022, 7:22 p.m. UTC | #6
On Wed, Jan 05, 2022 at 11:04:24AM -0800, Florian Fainelli wrote:
> On 1/5/22 10:59 AM, Vladimir Oltean wrote:
> > On Wed, Jan 05, 2022 at 10:39:04AM -0800, Florian Fainelli wrote:
> >> On 1/5/22 5:21 AM, Vladimir Oltean wrote:
> >>> This series contains changes that do the following:
> >>>
> >>> - struct dsa_port reduced from 576 to 544 bytes, and first cache line a
> >>>   bit better organized
> >>> - struct dsa_switch from 160 to 136 bytes, and first cache line a bit
> >>>   better organized
> >>> - struct dsa_switch_tree from 112 to 104 bytes, and first cache line a
> >>>   bit better organized
> >>>
> >>> No changes compared to v1, just split into a separate patch set.
> >>
> >> This is all looking good to me. I suppose we could possibly swap the
> >> 'nh' and 'tag_ops' member since dst->tag_ops is used in
> >> dsa_tag_generic_flow_dissect() which is a fast path, what do you think?
> > 
> > pahole is telling me that dst->tag_ops is in the first cache line on
> > both arm64 and arm32. Are you saying that it's better for it to take
> > dst->nh's place?
> 
> Sorry it stuck in my head somehow based upon patch 7's pahole output
> that tag_ops was still in the second cache line when the after clearly
> shows that it moved to the first cache line. No need to add additional
> changes then, thanks!

Yes, that output is a bit verbose and small details are easy to miss
(adding --expand_types makes it even worse). Happened to me quite a few
times during the session...

I'll prepare a v3 once we figure out what would suffice in terms of
warning other developers about the lack of bit field atomicity.