mbox series

[net-next,v4,0/5] net: mtip: Add support for MTIP imx287 L2 switch driver

Message ID 20250407145157.3626463-1-lukma@denx.de (mailing list archive)
Headers show
Series net: mtip: Add support for MTIP imx287 L2 switch driver | expand

Message

Lukasz Majewski April 7, 2025, 2:51 p.m. UTC
This patch series adds support for More Than IP's L2 switch driver embedded
in some NXP's SoCs. This one has been tested on imx287, but is also available
in the vf610.

In the past there has been performed some attempts to upstream this driver:

1. The 4.19-cip based one [1]
2. DSA based one for 5.12 [2] - i.e. the switch itself was treat as a DSA switch
   with NO tag appended.
3. The extension for FEC driver for 5.12 [3] - the trick here was to fully reuse
   FEC when the in-HW switching is disabled. When bridge offloading is enabled,
   the driver uses already configured MAC and PHY to also configure PHY.

All three approaches were not accepted as eligible for upstreaming.

The driver from this series has floowing features:

1. It is fully separated from fec_main - i.e. can be used interchangeable
   with it. To be more specific - one can build them as modules and
   if required switch between them when e.g. bridge offloading is required.

   To be more specific:
        - Use FEC_MAIN: When one needs support for two ETH ports with separate
          uDMAs used for both and bridging can be realized in SW.

        - Use MTIPL2SW: When it is enough to support two ports with only uDMA0
          attached to switch and bridging shall be offloaded to HW. 

2. This driver uses MTIP's L2 switch internal VLAN feature to provide port
   separation at boot time. Port separation is disabled when bridging is
   required.

3. Example usage:
        Configuration:
        ip link set lan0 up; sleep 1;
        ip link set lan1 up; sleep 1;
        ip link add name br0 type bridge;
        ip link set br0 up; sleep 1;
        ip link set lan0 master br0;
        ip link set lan1 master br0;
        bridge link;
        ip addr add 192.168.2.17/24 dev br0;
        ping -c 5 192.168.2.222

        Removal:
        ip link set br0 down;
        ip link delete br0 type bridge;
        ip link set dev lan1 down
        ip link set dev lan0 down

4. Limitations:
        - Driver enables and disables switch operation with learning and ageing.
        - Missing is the advanced configuration (e.g. adding entries to FBD). This is
          on purpose, as up till now we didn't had consensus about how the driver
          shall be added to Linux.

Links:
[1] - https://github.com/lmajewski/linux-imx28-l2switch/commits/master
[2] - https://github.com/lmajewski/linux-imx28-l2switch/tree/imx28-v5.12-L2-upstream-RFC_v1
[3] - https://source.denx.de/linux/linux-imx28-l2switch/-/tree/imx28-v5.12-L2-upstream-switchdev-RFC_v1?ref_type=heads

Lukasz Majewski (5):
  dt-bindings: net: Add MTIP L2 switch description
  ARM: dts: nxp: mxs: Adjust the imx28.dtsi L2 switch description
  ARM: dts: nxp: mxs: Adjust XEA board's DTS to support L2 switch
  net: mtip: The L2 switch driver for imx287
  ARM: mxs_defconfig: Enable CONFIG_FEC_MTIP_L2SW to support MTIP L2
    switch

 .../bindings/net/nxp,imx28-mtip-switch.yaml   |  126 ++
 MAINTAINERS                                   |    7 +
 arch/arm/boot/dts/nxp/mxs/imx28-xea.dts       |   54 +
 arch/arm/boot/dts/nxp/mxs/imx28.dtsi          |    8 +-
 arch/arm/configs/mxs_defconfig                |   14 +-
 drivers/net/ethernet/freescale/Kconfig        |    1 +
 drivers/net/ethernet/freescale/Makefile       |    1 +
 drivers/net/ethernet/freescale/mtipsw/Kconfig |   13 +
 .../net/ethernet/freescale/mtipsw/Makefile    |    3 +
 .../net/ethernet/freescale/mtipsw/mtipl2sw.c  | 1970 +++++++++++++++++
 .../net/ethernet/freescale/mtipsw/mtipl2sw.h  |  782 +++++++
 .../ethernet/freescale/mtipsw/mtipl2sw_br.c   |  122 +
 .../ethernet/freescale/mtipsw/mtipl2sw_mgnt.c |  449 ++++
 13 files changed, 3537 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/nxp,imx28-mtip-switch.yaml
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/Kconfig
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/Makefile
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw.h
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw_br.c
 create mode 100644 drivers/net/ethernet/freescale/mtipsw/mtipl2sw_mgnt.c

Comments

Simon Horman April 8, 2025, 3:14 p.m. UTC | #1
On Mon, Apr 07, 2025 at 04:51:56PM +0200, Lukasz Majewski wrote:
> This patch series provides support for More Than IP L2 switch embedded
> in the imx287 SoC.
> 
> This is a two port switch (placed between uDMA[01] and MAC-NET[01]),
> which can be used for offloading the network traffic.
> 
> It can be used interchangeably with current FEC driver - to be more
> specific: one can use either of it, depending on the requirements.
> 
> The biggest difference is the usage of DMA - when FEC is used, separate
> DMAs are available for each ENET-MAC block.
> However, with switch enabled - only the DMA0 is used to send/receive data
> to/form switch (and then switch sends them to respecitive ports).
> 
> Signed-off-by: Lukasz Majewski <lukma@denx.de>

Hi Lukasz,

This is not a complete review, but I did spend a bit of time
looking over this and have provided some feedback on
things I noticed below.

...

> diff --git a/drivers/net/ethernet/freescale/mtipsw/Kconfig b/drivers/net/ethernet/freescale/mtipsw/Kconfig
> new file mode 100644
> index 000000000000..450ff734a321
> --- /dev/null
> +++ b/drivers/net/ethernet/freescale/mtipsw/Kconfig
> @@ -0,0 +1,13 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config FEC_MTIP_L2SW
> +	tristate "MoreThanIP L2 switch support to FEC driver"
> +	depends on OF
> +	depends on NET_SWITCHDEV
> +	depends on BRIDGE
> +	depends on ARCH_MXS || ARCH_MXC || COMPILE_TEST
> +	help
> +	  This enables support for the MoreThan IP L2 switch on i.MX
> +	  SoCs (e.g. iMX28, vf610). It offloads bridging to this IP block's
> +	  hardware and allows switch management with standard Linux tools.
> +	  This switch driver can be used interchangeable with the already
> +	  available FEC driver, depending on the use case's requirments.

nit: requirements

Flagged by checkpatch.pl --codespell

...

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c

...

> +static void mtip_enet_init(struct switch_enet_private *fep, int port)
> +{
> +	void __iomem *enet_addr = fep->enet_addr;
> +	u32 mii_speed, holdtime, tmp;

I think it would be best to avoid variable names like tmp which have little
meaning. Although still rather generic, perhaps reg would be more
appropriate. Or better still something relating to the name the register,
say rcr.

> +
> +	if (port == 2)
> +		enet_addr += MCF_ESW_ENET_PORT_OFFSET;
> +
> +	tmp = MCF_FEC_RCR_PROM | MCF_FEC_RCR_MII_MODE |
> +		MCF_FEC_RCR_MAX_FL(1522);
> +
> +	if (fep->phy_interface[port - 1]  == PHY_INTERFACE_MODE_RMII)
> +		tmp |= MCF_FEC_RCR_RMII_MODE;
> +
> +	writel(tmp, enet_addr + MCF_FEC_RCR);
> +
> +	/* TCR */
> +	writel(MCF_FEC_TCR_FDEN, enet_addr + MCF_FEC_TCR);
> +
> +	/* ECR */
> +	writel(MCF_FEC_ECR_ETHER_EN, enet_addr + MCF_FEC_ECR);
> +
> +	/* Set MII speed to 2.5 MHz
> +	 */
> +	mii_speed = DIV_ROUND_UP(clk_get_rate(fep->clk_ipg), 5000000);
> +	mii_speed--;
> +
> +	/* The i.MX28 and i.MX6 types have another filed in the MSCR (aka
> +	 * MII_SPEED) register that defines the MDIO output hold time. Earlier
> +	 * versions are RAZ there, so just ignore the difference and write the
> +	 * register always.
> +	 * The minimal hold time according to IEE802.3 (clause 22) is 10 ns.
> +	 * HOLDTIME + 1 is the number of clk cycles the fec is holding the
> +	 * output.
> +	 * The HOLDTIME bitfield takes values between 0 and 7 (inclusive).
> +	 * Given that ceil(clkrate / 5000000) <= 64, the calculation for
> +	 * holdtime cannot result in a value greater than 3.
> +	 */
> +	holdtime = DIV_ROUND_UP(clk_get_rate(fep->clk_ipg), 100000000) - 1;
> +
> +	fep->phy_speed = mii_speed << 1 | holdtime << 8;
> +
> +	writel(fep->phy_speed, enet_addr + MCF_FEC_MSCR);
> +}
> +
> +static int mtip_setup_mac(struct net_device *dev)
> +{
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> +	struct switch_enet_private *fep = priv->fep;
> +	unsigned char *iap, tmpaddr[ETH_ALEN];

Maybe mac_addr instead of tmpaddr.

> +
> +	/* Use MAC address from DTS */
> +	iap = &fep->mac[priv->portnum - 1][0];
> +
> +	/* Use MAC address set by bootloader */
> +	if (!is_valid_ether_addr(iap)) {
> +		*((unsigned long *)&tmpaddr[0]) =
> +			be32_to_cpu(readl(fep->enet_addr + MCF_FEC_PALR));
> +		*((unsigned short *)&tmpaddr[4]) =
> +			be16_to_cpu(readl(fep->enet_addr +
> +					  MCF_FEC_PAUR) >> 16);

* Above, and elsewhere in this patch unsigned long seems to be
  used for 32 bit values. But unsigned long can be 64 bits wide.

  I would suggest using u32, u16, and friends throughout this
  patch where an integer has a specific number of bits.

* readl returns a 32-bit value in host byte order.
  But the above assumes it returns a big endian value.

  This does not seem correct.

* The point immediately above aside, the assignment of
  host byte order values to the byte-array tmpaddr
  seems to assume an endianness (little endian?).

  It should work on either endian.

> +		iap = &tmpaddr[0];
> +	}
> +
> +	/* Use random MAC address */
> +	if (!is_valid_ether_addr(iap)) {
> +		eth_hw_addr_random(dev);
> +		dev_info(&fep->pdev->dev, "Using random MAC address: %pM\n",
> +			 dev->dev_addr);
> +		iap = (unsigned char *)dev->dev_addr;
> +	}
> +
> +	/* Adjust MAC if using macaddr (and increment if needed) */
> +	eth_hw_addr_gen(dev, iap, priv->portnum - 1);
> +
> +	return 0;
> +}
> +
> +/**
> + * crc8_calc - calculate CRC for MAC storage
> + *
> + * @pmacaddress: A 6-byte array with the MAC address. The first byte is
> + *               the first byte transmitted.
> + *
> + * Calculate Galois Field Arithmetic CRC for Polynom x^8+x^2+x+1.
> + * It omits the final shift in of 8 zeroes a "normal" CRC would do
> + * (getting the remainder).
> + *
> + *  Examples (hexadecimal values):<br>
> + *   10-11-12-13-14-15  => CRC=0xc2
> + *   10-11-cc-dd-ee-00  => CRC=0xe6
> + *
> + * Return: The 8-bit CRC in bits 7:0
> + */
> +static int crc8_calc(unsigned char *pmacaddress)

Can lib/crc8.c:crc8() be used here?

> +{
> +	int byt; /* byte index */
> +	int bit; /* bit index */
> +	int crc = 0x12;
> +	int inval;
> +
> +	for (byt = 0; byt < 6; byt++) {
> +		inval = (((int)pmacaddress[byt]) & 0xff);
> +		/* shift bit 0 to bit 8 so all our bits
> +		 * travel through bit 8
> +		 * (simplifies below calc)
> +		 */
> +		inval <<= 8;
> +
> +		for (bit = 0; bit < 8; bit++) {
> +			/* next input bit comes into d7 after shift */
> +			crc |= inval & 0x100;
> +			if (crc & 0x01)
> +				/* before shift  */
> +				crc ^= 0x1c0;
> +
> +			crc >>= 1;
> +			inval >>= 1;
> +		}
> +	}
> +	/* upper bits are clean as we shifted in zeroes! */
> +	return crc;
> +}
> +
> +static void mtip_read_atable(struct switch_enet_private *fep, int index,
> +			     unsigned long *read_lo, unsigned long *read_hi)
> +{
> +	unsigned long atable_base = (unsigned long)fep->hwentry;
> +
> +	*read_lo = readl((const void *)atable_base + (index << 3));
> +	*read_hi = readl((const void *)atable_base + (index << 3) + 4);
> +}

It is unclear why hwentry, which is a pointer, is being cast to an
integer and then back to a pointer. I see pointer arithmetic, but
that can operate on pointers just as well as integers, without making
assumptions about how wide pointers are with respect to longs.

And in any case, can't the types be used to directly access the
offsets needed like this?

	atable = fep->hwentry.mtip_table64b_entry;

	*read_lo = readl(&atable[index].lo);
	*read_hi = readl(&atable[index].hi);

Also, and perhaps more importantly, readl expects to be passed
a pointer to __iomem. But the appropriate annotations seem
to be missing (forcing them with a cast is not advisable here IMHO).

Please do run sparse over your patches to iron out __iomem
(and endian) issues.

> +
> +static void mtip_write_atable(struct switch_enet_private *fep, int index,
> +			      unsigned long write_lo, unsigned long write_hi)
> +{
> +	unsigned long atable_base = (unsigned long)fep->hwentry;
> +
> +	writel(write_lo, (void *)atable_base + (index << 3));
> +	writel(write_hi, (void *)atable_base + (index << 3) + 4);

Likewise here.

> +}

...

> +/* Clear complete MAC Look Up Table */
> +void mtip_clear_atable(struct switch_enet_private *fep)
> +{
> +	int index;
> +
> +	for (index = 0; index < 2048; index++)
> +		mtip_write_atable(fep, index, 0, 0);
> +}
> +
> +/**
> + * mtip_update_atable_static - Update switch static address table
> + *
> + * @mac_addr: Pointer to the array containing MAC address to
> + *            be put as static entry
> + * @port:     Port bitmask numbers to be added in static entry,
> + *            valid values are 1-7
> + * @priority: The priority for the static entry in table

@fep should also be documented here.

Flagged by ./scripts/kernel-doc -none
and W=1 builds.

> + *
> + * Updates MAC address lookup table with a static entry.
> + *
> + * Searches if the MAC address is already there in the block and replaces
> + * the older entry with the new one. If MAC address is not there then puts
> + * a new entry in the first empty slot available in the block.
> + *
> + * Return: 0 for a successful update else -ENOSPC when no slot available
> + */
> +static int mtip_update_atable_static(unsigned char *mac_addr, unsigned int port,
> +				     unsigned int priority,
> +				     struct switch_enet_private *fep)

...

> +/* During a receive, the cur_rx points to the current incoming buffer.
> + * When we update through the ring, if the next incoming buffer has
> + * not been given to the system, we just set the empty indicator,
> + * effectively tossing the packet.
> + */
> +static int mtip_switch_rx(struct net_device *dev, int budget, int *port)
> +{
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> +	struct switch_enet_private *fep = priv->fep;
> +	struct switch_t *fecp = fep->hwp;
> +	unsigned short status, pkt_len;
> +	struct net_device *pndev;
> +	u8 *data, rx_port = 0xFF;
> +	struct ethhdr *eth_hdr;
> +	int pkt_received = 0;
> +	struct sk_buff *skb;
> +	unsigned long flags;
> +	struct cbd_t *bdp;
> +
> +	spin_lock_irqsave(&fep->hw_lock, flags);
> +
> +	/* First, grab all of the stats for the incoming packet.
> +	 * These get messed up if we get called due to a busy condition.
> +	 */
> +	bdp = fep->cur_rx;
> +
> +	while (!((status = bdp->cbd_sc) & BD_ENET_RX_EMPTY)) {

...

> +	} /* while (!((status = bdp->cbd_sc) & BD_ENET_RX_EMPTY)) */
> +
> +	writel(bdp, &fep->cur_rx);

I'm confused buy this.

At the top of this function, bdp is assigned using:

	bdp = fep->cur_rx;

But here writel() is used to assign bdp to &fep->cur_rx.
Which assumes that bdp is a 32-bit little endian value.
But it is a pointer in host byte order which may be wide than 32-bits.

On x86_64 int is 32-bits while pointers are 64 bits.
W=1 builds with gcc 14.2.0 flag this problem like this:


.../mtipl2sw.c:1108:9: error: incompatible pointer to integer conversion passing 'struct cbd_t *' to parameter of type 'unsigned int' [-Wint-conversion]
 1108 |         writel(bdp, &fep->cur_rx);
      |                ^~~


This also assumes that &fep->cur_rx is a pointer to __iomem,
but that does not seem to be the case.

> +	spin_unlock_irqrestore(&fep->hw_lock, flags);
> +
> +	return pkt_received;
> +}

...

> +static int __init mtip_switch_dma_init(struct switch_enet_private *fep)
> +{
> +	struct cbd_t *bdp, *cbd_base;
> +	int ret, i;
> +
> +	/* Check mask of the streaming and coherent API */
> +	ret = dma_set_mask_and_coherent(&fep->pdev->dev, DMA_BIT_MASK(32));
> +	if (ret < 0) {
> +		dev_err(&fep->pdev->dev, "No suitable DMA available\n");
> +		return ret;
> +	}
> +
> +	/* Allocate memory for buffer descriptors */
> +	cbd_base = dma_alloc_coherent(&fep->pdev->dev, PAGE_SIZE, &fep->bd_dma,
> +				      GFP_KERNEL);
> +	if (!cbd_base)
> +		return -ENOMEM;
> +
> +	/* Set receive and transmit descriptor base */
> +	fep->rx_bd_base = cbd_base;
> +	fep->tx_bd_base = cbd_base + RX_RING_SIZE;
> +
> +	/* Initialize the receive buffer descriptors */
> +	bdp = fep->rx_bd_base;
> +	for (i = 0; i < RX_RING_SIZE; i++) {
> +		bdp->cbd_sc = 0;
> +		bdp++;
> +	}
> +
> +	/* Set the last buffer to wrap */
> +	bdp--;
> +	bdp->cbd_sc |= BD_SC_WRAP;
> +
> +	/* ...and the same for transmmit */

nit: transmit

> +	bdp = fep->tx_bd_base;
> +	for (i = 0; i < TX_RING_SIZE; i++) {
> +		/* Initialize the BD for every fragment in the page */
> +		bdp->cbd_sc = 0;
> +		bdp->cbd_bufaddr = 0;
> +		bdp++;
> +	}
> +
> +	/* Set the last buffer to wrap */
> +	bdp--;
> +	bdp->cbd_sc |= BD_SC_WRAP;
> +
> +	return 0;
> +}
> +
> +static void mtip_ndev_cleanup(struct switch_enet_private *fep)
> +{
> +	int i;
> +
> +	for (i = 0; i < SWITCH_EPORT_NUMBER; i++) {
> +		if (fep->ndev[i]) {
> +			unregister_netdev(fep->ndev[i]);
> +			free_netdev(fep->ndev[i]);
> +		}
> +	}
> +}

...

> +static const struct of_device_id mtipl2_of_match[] = {
> +	{ .compatible = "nxp,imx28-mtip-switch",
> +	  .data = &mtip_imx28_l2switch_info},
> +	{ /* sentinel */ }
> +}

There should be a trailing ';' on the line above.

...

> +struct  addr_table64b_entry {

One space after struct is enough.

> +	unsigned int lo;  /* lower 32 bits */
> +	unsigned int hi;  /* upper 32 bits */
> +};
> +
> +struct  mtip_addr_table_t {

I think you can drop the '_t' from the name of this struct.
We know it is a type :)

> +	struct addr_table64b_entry  mtip_table64b_entry[2048];

One space is enough after addr_table64b_entry.
And in general, unless the aim is to align field names
(not here because there is only one field :)

> +};
> +
> +#define MCF_ESW_LOOKUP_MEM_OFFSET      0x4000
> +#define MCF_ESW_ENET_PORT_OFFSET      0x4000
> +#define ENET_SWI_PHYS_ADDR_OFFSET	0x8000

Ditto.

...

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_br.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_br.c

> new file mode 100644
> index 000000000000..0b76a60858a5
> --- /dev/null
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_br.c
> @@ -0,0 +1,122 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + *  L2 switch Controller driver for MTIP block - bridge network interface
> + *
> + *  Copyright (C) 2025 DENX Software Engineering GmbH
> + *  Lukasz Majewski <lukma@denx.de>
> + */
> +
> +#include <linux/netdevice.h>
> +#include <linux/etherdevice.h>
> +#include <linux/platform_device.h>
> +
> +#include "mtipl2sw.h"

Blank line here please.

> +static int mtip_ndev_port_link(struct net_device *ndev,
> +			       struct net_device *br_ndev,
> +			       struct netlink_ext_ack *extack)
> +{
> +	struct mtip_ndev_priv *priv = netdev_priv(ndev), *other_priv;
> +	struct switch_enet_private *fep = priv->fep;
> +	struct net_device *other_ndev;
> +
> +	/* Check if one port of MTIP switch is already bridged */
> +	if (fep->br_members && !fep->br_offload) {
> +		/* Get the second bridge ndev */
> +		other_ndev = fep->ndev[fep->br_members - 1];
> +		other_priv = netdev_priv(other_ndev);
> +		if (other_priv->master_dev != br_ndev) {
> +			NL_SET_ERR_MSG_MOD(extack,
> +					   "L2 offloading only possible for the same bridge!");
> +			return notifier_from_errno(-EOPNOTSUPP);
> +		}
> +
> +		fep->br_offload = 1;
> +		mtip_switch_dis_port_separation(fep);
> +		mtip_clear_atable(fep);
> +	}
> +
> +	if (!priv->master_dev)
> +		priv->master_dev = br_ndev;
> +
> +	fep->br_members |= BIT(priv->portnum - 1);
> +
> +	dev_dbg(&ndev->dev,
> +		"%s: ndev: %s br: %s fep: 0x%x members: 0x%x offload: %d\n",
> +		__func__, ndev->name,  br_ndev->name, (unsigned int)fep,

Perhaps it would be best to use %p as the format specifier for fep
and not cast it.

On x86_64 int is 32-bits while pointers are 64 bits.
W=1 builds with gcc 14.2.0 flag this problem like this:

.../mtipl2sw_br.c:45:55: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   45 |                 __func__, ndev->name,  br_ndev->name, (unsigned int)fep,
      |                                                       ^

> +		fep->br_members, fep->br_offload);
> +
> +	return NOTIFY_DONE;
> +}
> +
> +static void mtip_netdevice_port_unlink(struct net_device *ndev)
> +{
> +	struct mtip_ndev_priv *priv = netdev_priv(ndev);
> +	struct switch_enet_private *fep = priv->fep;
> +
> +	dev_dbg(&ndev->dev, "%s: ndev: %s members: 0x%x\n", __func__,
> +		ndev->name, fep->br_members);
> +
> +	fep->br_members &= ~BIT(priv->portnum - 1);
> +	priv->master_dev = NULL;
> +
> +	if (!fep->br_members) {
> +		fep->br_offload = 0;
> +		mtip_switch_en_port_separation(fep);
> +		mtip_clear_atable(fep);
> +	}
> +}

...
Lukasz Majewski April 9, 2025, 2:28 p.m. UTC | #2
Hi Simon,

> On Mon, Apr 07, 2025 at 04:51:56PM +0200, Lukasz Majewski wrote:
> > This patch series provides support for More Than IP L2 switch
> > embedded in the imx287 SoC.
> > 
> > This is a two port switch (placed between uDMA[01] and MAC-NET[01]),
> > which can be used for offloading the network traffic.
> > 
> > It can be used interchangeably with current FEC driver - to be more
> > specific: one can use either of it, depending on the requirements.
> > 
> > The biggest difference is the usage of DMA - when FEC is used,
> > separate DMAs are available for each ENET-MAC block.
> > However, with switch enabled - only the DMA0 is used to
> > send/receive data to/form switch (and then switch sends them to
> > respecitive ports).
> > 
> > Signed-off-by: Lukasz Majewski <lukma@denx.de>  
> 
> Hi Lukasz,
> 
> This is not a complete review, but I did spend a bit of time
> looking over this and have provided some feedback on
> things I noticed below.
> 

Thanks for your feedback.

> ...
> 
> > diff --git a/drivers/net/ethernet/freescale/mtipsw/Kconfig
> > b/drivers/net/ethernet/freescale/mtipsw/Kconfig new file mode 100644
> > index 000000000000..450ff734a321
> > --- /dev/null
> > +++ b/drivers/net/ethernet/freescale/mtipsw/Kconfig
> > @@ -0,0 +1,13 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +config FEC_MTIP_L2SW
> > +	tristate "MoreThanIP L2 switch support to FEC driver"
> > +	depends on OF
> > +	depends on NET_SWITCHDEV
> > +	depends on BRIDGE
> > +	depends on ARCH_MXS || ARCH_MXC || COMPILE_TEST
> > +	help
> > +	  This enables support for the MoreThan IP L2 switch on
> > i.MX
> > +	  SoCs (e.g. iMX28, vf610). It offloads bridging to this
> > IP block's
> > +	  hardware and allows switch management with standard
> > Linux tools.
> > +	  This switch driver can be used interchangeable with the
> > already
> > +	  available FEC driver, depending on the use case's
> > requirments.  
> 
> nit: requirements
> 
> Flagged by checkpatch.pl --codespell
> 

Ok.

> ...
> 
> > diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> > b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c  
> 
> ...
> 
> > +static void mtip_enet_init(struct switch_enet_private *fep, int
> > port) +{
> > +	void __iomem *enet_addr = fep->enet_addr;
> > +	u32 mii_speed, holdtime, tmp;  
> 
> I think it would be best to avoid variable names like tmp which have
> little meaning. Although still rather generic, perhaps reg would be
> more appropriate. Or better still something relating to the name the
> register, say rcr.

Ok, I will use reg/rcr instead of tmp.

> 
> > +
> > +	if (port == 2)
> > +		enet_addr += MCF_ESW_ENET_PORT_OFFSET;
> > +
> > +	tmp = MCF_FEC_RCR_PROM | MCF_FEC_RCR_MII_MODE |
> > +		MCF_FEC_RCR_MAX_FL(1522);
> > +
> > +	if (fep->phy_interface[port - 1]  ==
> > PHY_INTERFACE_MODE_RMII)
> > +		tmp |= MCF_FEC_RCR_RMII_MODE;
> > +
> > +	writel(tmp, enet_addr + MCF_FEC_RCR);
> > +
> > +	/* TCR */
> > +	writel(MCF_FEC_TCR_FDEN, enet_addr + MCF_FEC_TCR);
> > +
> > +	/* ECR */
> > +	writel(MCF_FEC_ECR_ETHER_EN, enet_addr + MCF_FEC_ECR);
> > +
> > +	/* Set MII speed to 2.5 MHz
> > +	 */
> > +	mii_speed = DIV_ROUND_UP(clk_get_rate(fep->clk_ipg),
> > 5000000);
> > +	mii_speed--;
> > +
> > +	/* The i.MX28 and i.MX6 types have another filed in the
> > MSCR (aka
> > +	 * MII_SPEED) register that defines the MDIO output hold
> > time. Earlier
> > +	 * versions are RAZ there, so just ignore the difference
> > and write the
> > +	 * register always.
> > +	 * The minimal hold time according to IEE802.3 (clause 22)
> > is 10 ns.
> > +	 * HOLDTIME + 1 is the number of clk cycles the fec is
> > holding the
> > +	 * output.
> > +	 * The HOLDTIME bitfield takes values between 0 and 7
> > (inclusive).
> > +	 * Given that ceil(clkrate / 5000000) <= 64, the
> > calculation for
> > +	 * holdtime cannot result in a value greater than 3.
> > +	 */
> > +	holdtime = DIV_ROUND_UP(clk_get_rate(fep->clk_ipg),
> > 100000000) - 1; +
> > +	fep->phy_speed = mii_speed << 1 | holdtime << 8;
> > +
> > +	writel(fep->phy_speed, enet_addr + MCF_FEC_MSCR);
> > +}
> > +
> > +static int mtip_setup_mac(struct net_device *dev)
> > +{
> > +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> > +	struct switch_enet_private *fep = priv->fep;
> > +	unsigned char *iap, tmpaddr[ETH_ALEN];  
> 
> Maybe mac_addr instead of tmpaddr.

Ok.

> 
> > +
> > +	/* Use MAC address from DTS */
> > +	iap = &fep->mac[priv->portnum - 1][0];
> > +
> > +	/* Use MAC address set by bootloader */
> > +	if (!is_valid_ether_addr(iap)) {
> > +		*((unsigned long *)&tmpaddr[0]) =
> > +			be32_to_cpu(readl(fep->enet_addr +
> > MCF_FEC_PALR));
> > +		*((unsigned short *)&tmpaddr[4]) =
> > +			be16_to_cpu(readl(fep->enet_addr +
> > +					  MCF_FEC_PAUR) >> 16);  
> 
> * Above, and elsewhere in this patch unsigned long seems to be
>   used for 32 bit values. But unsigned long can be 64 bits wide.
> 

As fair as I know - from the outset - this driver had some implicit
assumption to be used on 32 bit SoCs (imx28, vf610).

As a result the unsigned long would be 32 bits.

On the other hand - the documentation clearly says that registers in
this IP block implementation are 32 bit wide.

>   I would suggest using u32, u16, and friends throughout this
>   patch where an integer has a specific number of bits.
> 

I think that values, which directly readl()/writel() values from the
registers shall be u32. However, more generic code could use int/
unsigned int though.

> * readl returns a 32-bit value in host byte order.
>   But the above assumes it returns a big endian value.
> 
>   This does not seem correct.
> 

Please consult similar implementation from fec_main.c:
https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/net/ethernet/freescale/fec_main.c#L2044

I would use the same approach as in the above link.

> * The point immediately above aside, the assignment of
>   host byte order values to the byte-array tmpaddr
>   seems to assume an endianness (little endian?).
> 
>   It should work on either endian.
> 

I guess that implementation from above link is the correct one.

> > +		iap = &tmpaddr[0];
> > +	}
> > +
> > +	/* Use random MAC address */
> > +	if (!is_valid_ether_addr(iap)) {
> > +		eth_hw_addr_random(dev);
> > +		dev_info(&fep->pdev->dev, "Using random MAC
> > address: %pM\n",
> > +			 dev->dev_addr);
> > +		iap = (unsigned char *)dev->dev_addr;
> > +	}
> > +
> > +	/* Adjust MAC if using macaddr (and increment if needed) */
> > +	eth_hw_addr_gen(dev, iap, priv->portnum - 1);
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > + * crc8_calc - calculate CRC for MAC storage
> > + *
> > + * @pmacaddress: A 6-byte array with the MAC address. The first
> > byte is
> > + *               the first byte transmitted.
> > + *
> > + * Calculate Galois Field Arithmetic CRC for Polynom x^8+x^2+x+1.
> > + * It omits the final shift in of 8 zeroes a "normal" CRC would do
> > + * (getting the remainder).
> > + *
> > + *  Examples (hexadecimal values):<br>
> > + *   10-11-12-13-14-15  => CRC=0xc2
> > + *   10-11-cc-dd-ee-00  => CRC=0xe6
> > + *
> > + * Return: The 8-bit CRC in bits 7:0
> > + */
> > +static int crc8_calc(unsigned char *pmacaddress)  
> 
> Can lib/crc8.c:crc8() be used here?

This seems a bit problematic, as the above code is taken from the
vendor driver.

The documentation states - that the hash uses CRC-8 with following
polynomial:

x^8 + x^2 + x + 1 (0x07), which shall correspond to available in Linux:

static unsigned char mac1[] = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15};
crc8_populate_msb(mtipl2sw_crc8_table, 0x07);
crc8(mtipl2sw_crc8_table, mac1, sizeof(mac1), CRC8_INIT_VALUE);

However, those results don't match (either with 0xFF and 0x00 as the
initial value).

I've also searched the Internet to compare different CRC
implementations used in field:
https://crccalc.com/?crc=0x10 0x11 0x12 0x13 0x14
0x15&method=CRC-8&datatype=hex&outtype=hex

they also doesn't match results from this code.

Hence, the question - which implementation is right?

Vendor developer could have known a bit more than we do, so I would
prefer to keep this code as is - especially that switch internally may
use this crc8 calculation to find proper "slot/index" in the atable.

> 
> > +{
> > +	int byt; /* byte index */
> > +	int bit; /* bit index */
> > +	int crc = 0x12;
> > +	int inval;
> > +
> > +	for (byt = 0; byt < 6; byt++) {
> > +		inval = (((int)pmacaddress[byt]) & 0xff);
> > +		/* shift bit 0 to bit 8 so all our bits
> > +		 * travel through bit 8
> > +		 * (simplifies below calc)
> > +		 */
> > +		inval <<= 8;
> > +
> > +		for (bit = 0; bit < 8; bit++) {
> > +			/* next input bit comes into d7 after
> > shift */
> > +			crc |= inval & 0x100;
> > +			if (crc & 0x01)
> > +				/* before shift  */
> > +				crc ^= 0x1c0;
> > +
> > +			crc >>= 1;
> > +			inval >>= 1;
> > +		}
> > +	}
> > +	/* upper bits are clean as we shifted in zeroes! */
> > +	return crc;
> > +}
> > +
> > +static void mtip_read_atable(struct switch_enet_private *fep, int
> > index,
> > +			     unsigned long *read_lo, unsigned long
> > *read_hi) +{
> > +	unsigned long atable_base = (unsigned long)fep->hwentry;
> > +
> > +	*read_lo = readl((const void *)atable_base + (index << 3));
> > +	*read_hi = readl((const void *)atable_base + (index << 3)
> > + 4); +}  
> 
> It is unclear why hwentry, which is a pointer, is being cast to an
> integer and then back to a pointer. I see pointer arithmetic, but
> that can operate on pointers just as well as integers, without making
> assumptions about how wide pointers are with respect to longs.
> 
> And in any case, can't the types be used to directly access the
> offsets needed like this?
> 
> 	atable = fep->hwentry.mtip_table64b_entry;
> 
> 	*read_lo = readl(&atable[index].lo);
> 	*read_hi = readl(&atable[index].hi);
> 

The code as is seems to be OK.

The (atable) memory structure is as follows:

1. You can store 2048 MAC addresses (2x32 bit each).

2. Memory from point 1 is addressed as follows:
	2.1 -> from MAC address the CRC8 is calculated (0x00 - 0xFF).
	This is the 'index' in the original code.
	2.2 -> as it may happen that for two different MAC address the
	same CRC8 is calculated (i.e. 'index' is the same), each
	'index' can store 8 entries for MAC addresses (and it is
	searched in a linear way if needed).

IMHO, the index above shall be multiplied by 8.

> Also, and perhaps more importantly, readl expects to be passed
> a pointer to __iomem. But the appropriate annotations seem
> to be missing (forcing them with a cast is not advisable here IMHO).
> 

I think that the code below:
unsigned long atable_base = (unsigned long)fep->hwentry;

could be replaced with
void __iomem *atable_base = fep->hwentry;

and the (index << 3) with (index * ATABLE_ENTRY_PER_SLOT)

> Please do run sparse over your patches to iron out __iomem
> (and endian) issues.
> 

Ok, I will run the make C=1 when compiling.

> > +
> > +static void mtip_write_atable(struct switch_enet_private *fep, int
> > index,
> > +			      unsigned long write_lo, unsigned
> > long write_hi) +{
> > +	unsigned long atable_base = (unsigned long)fep->hwentry;
> > +
> > +	writel(write_lo, (void *)atable_base + (index << 3));
> > +	writel(write_hi, (void *)atable_base + (index << 3) + 4);  
> 
> Likewise here.

Please see the above comment.

> 
> > +}  
> 
> ...
> 
> > +/* Clear complete MAC Look Up Table */
> > +void mtip_clear_atable(struct switch_enet_private *fep)
> > +{
> > +	int index;
> > +
> > +	for (index = 0; index < 2048; index++)
> > +		mtip_write_atable(fep, index, 0, 0);
> > +}
> > +
> > +/**
> > + * mtip_update_atable_static - Update switch static address table
> > + *
> > + * @mac_addr: Pointer to the array containing MAC address to
> > + *            be put as static entry
> > + * @port:     Port bitmask numbers to be added in static entry,
> > + *            valid values are 1-7
> > + * @priority: The priority for the static entry in table  
> 
> @fep should also be documented here.
> 

OK.

> Flagged by ./scripts/kernel-doc -none
> and W=1 builds.
> 
> > + *
> > + * Updates MAC address lookup table with a static entry.
> > + *
> > + * Searches if the MAC address is already there in the block and
> > replaces
> > + * the older entry with the new one. If MAC address is not there
> > then puts
> > + * a new entry in the first empty slot available in the block.
> > + *
> > + * Return: 0 for a successful update else -ENOSPC when no slot
> > available
> > + */
> > +static int mtip_update_atable_static(unsigned char *mac_addr,
> > unsigned int port,
> > +				     unsigned int priority,
> > +				     struct switch_enet_private
> > *fep)  
> 
> ...
> 
> > +/* During a receive, the cur_rx points to the current incoming
> > buffer.
> > + * When we update through the ring, if the next incoming buffer has
> > + * not been given to the system, we just set the empty indicator,
> > + * effectively tossing the packet.
> > + */
> > +static int mtip_switch_rx(struct net_device *dev, int budget, int
> > *port) +{
> > +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> > +	struct switch_enet_private *fep = priv->fep;
> > +	struct switch_t *fecp = fep->hwp;
> > +	unsigned short status, pkt_len;
> > +	struct net_device *pndev;
> > +	u8 *data, rx_port = 0xFF;
> > +	struct ethhdr *eth_hdr;
> > +	int pkt_received = 0;
> > +	struct sk_buff *skb;
> > +	unsigned long flags;
> > +	struct cbd_t *bdp;
> > +
> > +	spin_lock_irqsave(&fep->hw_lock, flags);
> > +
> > +	/* First, grab all of the stats for the incoming packet.
> > +	 * These get messed up if we get called due to a busy
> > condition.
> > +	 */
> > +	bdp = fep->cur_rx;
> > +
> > +	while (!((status = bdp->cbd_sc) & BD_ENET_RX_EMPTY)) {  
> 
> ...
> 
> > +	} /* while (!((status = bdp->cbd_sc) & BD_ENET_RX_EMPTY))
> > */ +
> > +	writel(bdp, &fep->cur_rx);  
> 
> I'm confused buy this.
> 
> At the top of this function, bdp is assigned using:
> 
> 	bdp = fep->cur_rx;
> 
> But here writel() is used to assign bdp to &fep->cur_rx.
> Which assumes that bdp is a 32-bit little endian value.
> But it is a pointer in host byte order which may be wide than 32-bits.
> 
> On x86_64 int is 32-bits while pointers are 64 bits.

I will cross compile with make W=1 and C=1 to find all problematic
places.

> W=1 builds with gcc 14.2.0 flag this problem like this:
> 
> 
> .../mtipl2sw.c:1108:9: error: incompatible pointer to integer
> conversion passing 'struct cbd_t *' to parameter of type 'unsigned
> int' [-Wint-conversion] 1108 |         writel(bdp, &fep->cur_rx); |
>              ^~~
> 
> 
> This also assumes that &fep->cur_rx is a pointer to __iomem,
> but that does not seem to be the case.

The writel(bdp, &fep->cur_rx); shall be just replaced with fep->cur_rx
= bdp.

Thanks for spotting.

> 
> > +	spin_unlock_irqrestore(&fep->hw_lock, flags);
> > +
> > +	return pkt_received;
> > +}  
> 
> ...
> 
> > +static int __init mtip_switch_dma_init(struct switch_enet_private
> > *fep) +{
> > +	struct cbd_t *bdp, *cbd_base;
> > +	int ret, i;
> > +
> > +	/* Check mask of the streaming and coherent API */
> > +	ret = dma_set_mask_and_coherent(&fep->pdev->dev,
> > DMA_BIT_MASK(32));
> > +	if (ret < 0) {
> > +		dev_err(&fep->pdev->dev, "No suitable DMA
> > available\n");
> > +		return ret;
> > +	}
> > +
> > +	/* Allocate memory for buffer descriptors */
> > +	cbd_base = dma_alloc_coherent(&fep->pdev->dev, PAGE_SIZE,
> > &fep->bd_dma,
> > +				      GFP_KERNEL);
> > +	if (!cbd_base)
> > +		return -ENOMEM;
> > +
> > +	/* Set receive and transmit descriptor base */
> > +	fep->rx_bd_base = cbd_base;
> > +	fep->tx_bd_base = cbd_base + RX_RING_SIZE;
> > +
> > +	/* Initialize the receive buffer descriptors */
> > +	bdp = fep->rx_bd_base;
> > +	for (i = 0; i < RX_RING_SIZE; i++) {
> > +		bdp->cbd_sc = 0;
> > +		bdp++;
> > +	}
> > +
> > +	/* Set the last buffer to wrap */
> > +	bdp--;
> > +	bdp->cbd_sc |= BD_SC_WRAP;
> > +
> > +	/* ...and the same for transmmit */  
> 
> nit: transmit

Ok.

> 
> > +	bdp = fep->tx_bd_base;
> > +	for (i = 0; i < TX_RING_SIZE; i++) {
> > +		/* Initialize the BD for every fragment in the
> > page */
> > +		bdp->cbd_sc = 0;
> > +		bdp->cbd_bufaddr = 0;
> > +		bdp++;
> > +	}
> > +
> > +	/* Set the last buffer to wrap */
> > +	bdp--;
> > +	bdp->cbd_sc |= BD_SC_WRAP;
> > +
> > +	return 0;
> > +}
> > +
> > +static void mtip_ndev_cleanup(struct switch_enet_private *fep)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < SWITCH_EPORT_NUMBER; i++) {
> > +		if (fep->ndev[i]) {
> > +			unregister_netdev(fep->ndev[i]);
> > +			free_netdev(fep->ndev[i]);
> > +		}
> > +	}
> > +}  
> 
> ...
> 
> > +static const struct of_device_id mtipl2_of_match[] = {
> > +	{ .compatible = "nxp,imx28-mtip-switch",
> > +	  .data = &mtip_imx28_l2switch_info},
> > +	{ /* sentinel */ }
> > +}  
> 
> There should be a trailing ';' on the line above.
> 

+1

> ...
> 
> > +struct  addr_table64b_entry {  
> 
> One space after struct is enough.
> 
> > +	unsigned int lo;  /* lower 32 bits */
> > +	unsigned int hi;  /* upper 32 bits */
> > +};
> > +
> > +struct  mtip_addr_table_t {  
> 
> I think you can drop the '_t' from the name of this struct.
> We know it is a type :)

Ok.

> 
> > +	struct addr_table64b_entry  mtip_table64b_entry[2048];  
> 
> One space is enough after addr_table64b_entry.
> And in general, unless the aim is to align field names
> (not here because there is only one field :)
> 
> > +};
> > +
> > +#define MCF_ESW_LOOKUP_MEM_OFFSET      0x4000
> > +#define MCF_ESW_ENET_PORT_OFFSET      0x4000
> > +#define ENET_SWI_PHYS_ADDR_OFFSET	0x8000  
> 
> Ditto.
> 

+1

> ...
> 
> > diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_br.c
> > b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_br.c  
> 
> > new file mode 100644
> > index 000000000000..0b76a60858a5
> > --- /dev/null
> > +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw_br.c
> > @@ -0,0 +1,122 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + *  L2 switch Controller driver for MTIP block - bridge network
> > interface
> > + *
> > + *  Copyright (C) 2025 DENX Software Engineering GmbH
> > + *  Lukasz Majewski <lukma@denx.de>
> > + */
> > +
> > +#include <linux/netdevice.h>
> > +#include <linux/etherdevice.h>
> > +#include <linux/platform_device.h>
> > +
> > +#include "mtipl2sw.h"  
> 
> Blank line here please.
> 

Ok.

> > +static int mtip_ndev_port_link(struct net_device *ndev,
> > +			       struct net_device *br_ndev,
> > +			       struct netlink_ext_ack *extack)
> > +{
> > +	struct mtip_ndev_priv *priv = netdev_priv(ndev),
> > *other_priv;
> > +	struct switch_enet_private *fep = priv->fep;
> > +	struct net_device *other_ndev;
> > +
> > +	/* Check if one port of MTIP switch is already bridged */
> > +	if (fep->br_members && !fep->br_offload) {
> > +		/* Get the second bridge ndev */
> > +		other_ndev = fep->ndev[fep->br_members - 1];
> > +		other_priv = netdev_priv(other_ndev);
> > +		if (other_priv->master_dev != br_ndev) {
> > +			NL_SET_ERR_MSG_MOD(extack,
> > +					   "L2 offloading only
> > possible for the same bridge!");
> > +			return notifier_from_errno(-EOPNOTSUPP);
> > +		}
> > +
> > +		fep->br_offload = 1;
> > +		mtip_switch_dis_port_separation(fep);
> > +		mtip_clear_atable(fep);
> > +	}
> > +
> > +	if (!priv->master_dev)
> > +		priv->master_dev = br_ndev;
> > +
> > +	fep->br_members |= BIT(priv->portnum - 1);
> > +
> > +	dev_dbg(&ndev->dev,
> > +		"%s: ndev: %s br: %s fep: 0x%x members: 0x%x
> > offload: %d\n",
> > +		__func__, ndev->name,  br_ndev->name, (unsigned
> > int)fep,  
> 
> Perhaps it would be best to use %p as the format specifier for fep
> and not cast it.
> 
> On x86_64 int is 32-bits while pointers are 64 bits.
> W=1 builds with gcc 14.2.0 flag this problem like this:
> 
> .../mtipl2sw_br.c:45:55: warning: cast from pointer to integer of
> different size [-Wpointer-to-int-cast] 45 |                 __func__,
> ndev->name,  br_ndev->name, (unsigned int)fep, |
>                                  ^

Ok.

> 
> > +		fep->br_members, fep->br_offload);
> > +
> > +	return NOTIFY_DONE;
> > +}
> > +
> > +static void mtip_netdevice_port_unlink(struct net_device *ndev)
> > +{
> > +	struct mtip_ndev_priv *priv = netdev_priv(ndev);
> > +	struct switch_enet_private *fep = priv->fep;
> > +
> > +	dev_dbg(&ndev->dev, "%s: ndev: %s members: 0x%x\n",
> > __func__,
> > +		ndev->name, fep->br_members);
> > +
> > +	fep->br_members &= ~BIT(priv->portnum - 1);
> > +	priv->master_dev = NULL;
> > +
> > +	if (!fep->br_members) {
> > +		fep->br_offload = 0;
> > +		mtip_switch_en_port_separation(fep);
> > +		mtip_clear_atable(fep);
> > +	}
> > +}  
> 
> ...
> 




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Andrew Lunn April 9, 2025, 9:26 p.m. UTC | #3
> +static int mtip_ndev_port_link(struct net_device *ndev,
> +			       struct net_device *br_ndev,
> +			       struct netlink_ext_ack *extack)
> +{
> +	struct mtip_ndev_priv *priv = netdev_priv(ndev), *other_priv;
> +	struct switch_enet_private *fep = priv->fep;
> +	struct net_device *other_ndev;
> +
> +	/* Check if one port of MTIP switch is already bridged */
> +	if (fep->br_members && !fep->br_offload) {
> +		/* Get the second bridge ndev */
> +		other_ndev = fep->ndev[fep->br_members - 1];
> +		other_priv = netdev_priv(other_ndev);
> +		if (other_priv->master_dev != br_ndev) {
> +			NL_SET_ERR_MSG_MOD(extack,
> +					   "L2 offloading only possible for the same bridge!");
> +			return notifier_from_errno(-EOPNOTSUPP);

This is not an error condition as such. The switch cannot do it, so
-EOPNOTSUPP is correct, but the Linux software bridge can, and it
will. So there is no need to use NL_SET_ERR_MSG_MOD().


> +		}
> +
> +		fep->br_offload = 1;
> +		mtip_switch_dis_port_separation(fep);
> +		mtip_clear_atable(fep);
> +	}
> +
> +	if (!priv->master_dev)
> +		priv->master_dev = br_ndev;
> +
> +	fep->br_members |= BIT(priv->portnum - 1);
> +
> +	dev_dbg(&ndev->dev,
> +		"%s: ndev: %s br: %s fep: 0x%x members: 0x%x offload: %d\n",
> +		__func__, ndev->name,  br_ndev->name, (unsigned int)fep,
> +		fep->br_members, fep->br_offload);
> +
> +	return NOTIFY_DONE;
> +}
> +
> +static void mtip_netdevice_port_unlink(struct net_device *ndev)
> +{
> +	struct mtip_ndev_priv *priv = netdev_priv(ndev);
> +	struct switch_enet_private *fep = priv->fep;
> +
> +	dev_dbg(&ndev->dev, "%s: ndev: %s members: 0x%x\n", __func__,
> +		ndev->name, fep->br_members);
> +
> +	fep->br_members &= ~BIT(priv->portnum - 1);
> +	priv->master_dev = NULL;
> +
> +	if (!fep->br_members) {
> +		fep->br_offload = 0;
> +		mtip_switch_en_port_separation(fep);
> +		mtip_clear_atable(fep);
> +	}

This does not look quite correct. So you disable port separation once
both ports are a member of the same bridge. So you should enable port
separation as soon as one leaves. So if fep->br_members has 0 or 1
bits set, you need to enable port separation.

	Andrew
Lukasz Majewski April 10, 2025, 7:35 a.m. UTC | #4
Hi Andrew,

> > +static int mtip_ndev_port_link(struct net_device *ndev,
> > +			       struct net_device *br_ndev,
> > +			       struct netlink_ext_ack *extack)
> > +{
> > +	struct mtip_ndev_priv *priv = netdev_priv(ndev),
> > *other_priv;
> > +	struct switch_enet_private *fep = priv->fep;
> > +	struct net_device *other_ndev;
> > +
> > +	/* Check if one port of MTIP switch is already bridged */
> > +	if (fep->br_members && !fep->br_offload) {
> > +		/* Get the second bridge ndev */
> > +		other_ndev = fep->ndev[fep->br_members - 1];
> > +		other_priv = netdev_priv(other_ndev);
> > +		if (other_priv->master_dev != br_ndev) {
> > +			NL_SET_ERR_MSG_MOD(extack,
> > +					   "L2 offloading only
> > possible for the same bridge!");
> > +			return notifier_from_errno(-EOPNOTSUPP);  
> 
> This is not an error condition as such. The switch cannot do it, so
> -EOPNOTSUPP is correct,

Ok, so the:

return notifier_from_errno(-EOPNOTSUPP);

is correct.


> but the Linux software bridge can, and it
> will. So there is no need to use NL_SET_ERR_MSG_MOD().
> 

This NL_SET_ERR_MSG_MOD() only is relevant for two ports managed by
MTIP L2 switch driver.

As a result - other bridges created by other drivers will not follow
this execution path.

Considering the above - I would keep this message - it is informative
for the potential user (similar approach has been taken when HSR driver
was added for KSZ9477 switch).

> 
> > +		}
> > +
> > +		fep->br_offload = 1;
> > +		mtip_switch_dis_port_separation(fep);
> > +		mtip_clear_atable(fep);
> > +	}
> > +
> > +	if (!priv->master_dev)
> > +		priv->master_dev = br_ndev;
> > +
> > +	fep->br_members |= BIT(priv->portnum - 1);
> > +
> > +	dev_dbg(&ndev->dev,
> > +		"%s: ndev: %s br: %s fep: 0x%x members: 0x%x
> > offload: %d\n",
> > +		__func__, ndev->name,  br_ndev->name, (unsigned
> > int)fep,
> > +		fep->br_members, fep->br_offload);
> > +
> > +	return NOTIFY_DONE;
> > +}
> > +
> > +static void mtip_netdevice_port_unlink(struct net_device *ndev)
> > +{
> > +	struct mtip_ndev_priv *priv = netdev_priv(ndev);
> > +	struct switch_enet_private *fep = priv->fep;
> > +
> > +	dev_dbg(&ndev->dev, "%s: ndev: %s members: 0x%x\n",
> > __func__,
> > +		ndev->name, fep->br_members);
> > +
> > +	fep->br_members &= ~BIT(priv->portnum - 1);
> > +	priv->master_dev = NULL;
> > +
> > +	if (!fep->br_members) {
> > +		fep->br_offload = 0;
> > +		mtip_switch_en_port_separation(fep);
> > +		mtip_clear_atable(fep);
> > +	}  
> 
> This does not look quite correct. So you disable port separation once
> both ports are a member of the same bridge. So you should enable port
> separation as soon as one leaves. So if fep->br_members has 0 or 1
> bits set, you need to enable port separation.

Yes, the if (!fep->br_members)

shall be replaced with:

if (fep->br_members && fep->br_offload)

to avoid situation when disabling second port would re-enable
separation and clear atable.

> 
> 	Andrew




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de