[RFC,platform-next,0/8] platform: mellanox: Introduce initial chassis management support for modular Ethernet system

Message ID 20210203173622.5845-1-vadimp@nvidia.com (mailing list archive)

Message

Vadim Pasternak Feb. 3, 2021, 5:36 p.m. UTC
Add initial chassis management support for the Nvidia MSN4800 modular
Ethernet switch system, which provides a high-performance switching
solution for Enterprise Data Centers (EDC), for building Ethernet-based
clusters, High-Performance Computing (HPC) and embedded environments.

This system can be equipped with different types of replaceable line
cards and a management board. The first system flavor will support the
MSN4800-C16 line card, which is equipped with Lattice CPLD devices for
system and ASIC control, one Nvidia FPGA for gearbox (PHY) management,
four Nvidia gearboxes for port control, 16x100GbE QSFP28 ports, and
various devices for electrical control.

The system is equipped with eight slots for line cards, four slots for
power supplies and six slots for fans. It can be fully populated or
configured with as few as one line card. The line cards are
hot-pluggable.
In the future, when more line card flavors become available (for
example line cards with 8x200GbE ports, with 4x400GbE ports, or smart
cards for offloading), any type of line card can be inserted into any
slot.
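
As a rough illustration only (this is not code from the series), the
slot model described above can be sketched in plain userspace C. The
slot counts come from the description; the enum values, the struct and
the helper functions are hypothetical names invented for this sketch,
and slots are numbered from 0 here purely for convenience.

```c
#include <assert.h>
#include <stddef.h>

/* Slot counts as stated in the cover letter. */
#define LC_SLOTS  8	/* line card slots */
#define PSU_SLOTS 4	/* power supply slots (not modeled further) */
#define FAN_SLOTS 6	/* fan slots (not modeled further) */

/* Hypothetical card types; only the C16 flavor exists initially. */
enum lc_type { LC_NONE = 0, LC_C16, LC_8X200G, LC_4X400G, LC_SMART };

/* Hypothetical chassis state: one entry per line card slot. */
struct chassis {
	enum lc_type lc[LC_SLOTS];
};

/* Hot-insert a card: any type may go into any free slot. */
int chassis_insert(struct chassis *ch, size_t slot, enum lc_type type)
{
	assert(ch != NULL);
	if (slot >= LC_SLOTS || type == LC_NONE || ch->lc[slot] != LC_NONE)
		return -1;
	ch->lc[slot] = type;
	return 0;
}

/* Hot-remove a card; fails if the slot is invalid or already empty. */
int chassis_remove(struct chassis *ch, size_t slot)
{
	assert(ch != NULL);
	if (slot >= LC_SLOTS || ch->lc[slot] == LC_NONE)
		return -1;
	ch->lc[slot] = LC_NONE;
	return 0;
}

/* Count populated line card slots (0 up to LC_SLOTS). */
size_t chassis_populated(const struct chassis *ch)
{
	size_t i, n = 0;

	assert(ch != NULL);
	for (i = 0; i < LC_SLOTS; i++)
		if (ch->lc[i] != LC_NONE)
			n++;
	return n;
}
```

A caller would zero-initialize a struct chassis and then call
chassis_insert() and chassis_remove() as cards come and go. The sketch
deliberately accepts any card type in any slot, matching the stated
design goal.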

The system is based on the Nvidia Spectrum-3 ASIC. The switch is 4U
high and fits a standard rack.

The upcoming card generations are expected to include:
- Line cards with 8x200GbE QSFP28 Ethernet ports.
- Line cards with 4x400GbE QSFP-DD Ethernet ports.
- Smart cards equipped with an Nvidia ARM CPU for offloading and for
  fast access to storage (EBoF).
- Fabric cards for interconnection.

The patch set contains:
Patch #1 - adds new types for modular system support.
Patch #2 - adds support for the modular system equipped with
		replaceable line cards.
Patches #3 & #8 - add documentation.
Patches #4 & #6 - extend the hotplug device operation logic for
		modular system support.
Patch #5 - extends the number of hwmon attributes for the mlxreg-io
		driver, since the modular system introduces more attributes.
Patch #7 - introduces initial support for Mellanox line card devices.

Vadim Pasternak (8):
  platform_data/mlxreg: Add new types to support for modular systems
  platform/x86: mlx-platform: Add initial support for new modular system
  Documentation/ABI: Add new attributes for mlxreg-io sysfs interfaces
  platform/mellanox: mlxreg-hotplug: Extend logic for hotplug devices
    operations
  platform/mellanox: mlxreg-io: Extend number of hwmon attributes
  platform/mellanox: mlxreg-hotplug: Add line card event callbacks
    support for modular system
  platform/mellanox: mlxreg-lc: Add initial support for Mellanox line
    card devices
  Documentation/ABI: Add new line card attributes for mlxreg-io sysfs
    interfaces

 Documentation/ABI/stable/sysfs-driver-mlxreg-io |  195 +++
 drivers/platform/mellanox/Kconfig               |   12 +
 drivers/platform/mellanox/Makefile              |    1 +
 drivers/platform/mellanox/mlxreg-hotplug.c      |  120 +-
 drivers/platform/mellanox/mlxreg-io.c           |    2 +-
 drivers/platform/mellanox/mlxreg-lc.c           |  807 ++++++++++
 drivers/platform/x86/mlx-platform.c             | 1817 ++++++++++++++++++++---
 include/linux/platform_data/mlxreg.h            |   61 +
 8 files changed, 2785 insertions(+), 230 deletions(-)
 create mode 100644 drivers/platform/mellanox/mlxreg-lc.c

Comments

Hans de Goede Feb. 15, 2021, 2:40 p.m. UTC | #1
Hi Vadim,

Is there a specific reason why this series is RFC?  Typically that
indicates the code is not yet ready for merging and normally the
cover-letter indicates why the series is RFC.

The hardware this is for is pretty specialized, so I'm mostly just going
to trust that you know what you are doing here.

I see that this has not been reviewed by any of the other Mellanox people,
it would be good if you could get someone else from your team to review
this series and give their Reviewed-by once they are happy with it.

Regards,

Hans




Vadim Pasternak Feb. 15, 2021, 3:01 p.m. UTC | #2
Hi Hans,

> -----Original Message-----
> From: Hans de Goede <hdegoede@redhat.com>
> Sent: Monday, February 15, 2021 4:41 PM
> To: Vadim Pasternak <vadimp@nvidia.com>; andy@infradead.org
> Cc: platform-driver-x86@vger.kernel.org
> Subject: Re: [PATCH RFC platform-next 0/8] platform: mellanox: Introduce
> initial chassis management support for modular Ethernet system
> 
> Hi Vadim,
> 
> Is there a specific reason why this series is RFC?  Typically that indicates the
> code is not yet ready for merging and normally the cover-letter indicates why
> the series is RFC.

Sorry, I forgot to mention why this is an RFC. I don't have real
hardware yet; it is arriving in about 1.5 months.
The code has been tested on a hardware simulation setup.

My intention was to get some feedback, since a modular system with
replaceable line cards is something new in the kernel.

> 
> The hardware this is for is pretty specialized, so I'm mostly just going to trust
> that you know what you are doing here.
> 
> I see that this has not been reviewed by any of the other Mellanox people, it
> would be good if you could get someone else from your team to review this
> series and give their Reviewed-by once they are happy with it.

Sure.

Hans de Goede March 4, 2021, 10:32 a.m. UTC | #3
Hi,

On 2/15/21 4:01 PM, Vadim Pasternak wrote:
> Hi Hans,
> 
>> -----Original Message-----
>> From: Hans de Goede <hdegoede@redhat.com>
>> Sent: Monday, February 15, 2021 4:41 PM
>> To: Vadim Pasternak <vadimp@nvidia.com>; andy@infradead.org
>> Cc: platform-driver-x86@vger.kernel.org
>> Subject: Re: [PATCH RFC platform-next 0/8] platform: mellanox: Introduce
>> initial chassis management support for modular Ethernet system
>>
>> Hi Vadim,
>>
>> Is there a specific reason why this series is RFC?  Typically that indicates the
>> code is not yet ready for merging and normally the cover-letter indicates why
>> the series is RFC.
> 
> Sorry, I forgot to mention why this is an RFC. I don't have real
> hardware yet; it is arriving in about 1.5 months.
> The code has been tested on a hardware simulation setup.

Ah, I see. Not having been tested on real hw is a good reason for the
series to be an RFC series :)

> My intention was to get some feedback, since a modular system with
> replaceable line cards is something new in the kernel.

So my main concern here would be any new userspace API being added.

AFAICT, from quickly scanning all 8 patches, all of the new userspace
API is sysfs based and is documented/specified in patches 3 + 8,
so I will reply to those separately.

Please let me know if I've missed any other new userspace API bits.

I'm less worried about the APIs between the various kernel parts, since
we can always refactor those later.

Regards,

Hans



