
[rdma-next,v5,00/12] RDMA/efa: Elastic Fabric Adapter (EFA) driver

Message ID: 1555330841-7904-1-git-send-email-galpress@amazon.com

Message

Gal Pressman April 15, 2019, 12:20 p.m. UTC
Hello all,
The following v5 patchset introduces the Elastic Fabric Adapter (EFA) driver,
which was previously announced by Amazon.

EFA is a networking adapter designed to support user space network
communication, initially offered in the Amazon EC2 environment. The first
release of EFA supports datagram send/receive operations and does not support
connection-oriented or read/write operations.

EFA supports Unreliable Datagrams (UD) as well as a new unordered Scalable
Reliable Datagram (SRD) protocol. SRD provides reliable datagrams and more
complete error handling than typical Reliable Datagram (RD) transports but,
unlike RD, it supports neither ordering nor segmentation.

The EFA reliable datagram transport provides reliable out-of-order delivery,
transparently utilizing multiple network paths to reduce network tail
latency. Its interface is similar to UD; in particular, it supports
message sizes up to the MTU, with error handling extended to support reliable
communication. More information regarding SRD can be found at [1].

Kernel verbs and in-kernel services are not supported initially, but are
planned for future releases.

EFA-enabled EC2 instances have two devices allocated, one for ENA (netdev)
and one for EFA. The two are separate PCI devices with no in-kernel
communication between them.

This patchset also introduces RDMA subsystem ibdev_* print helpers, which
should be used by the other new drivers currently under review (irdma, siw)
and, over time, by all drivers in the subsystem. The print format is similar
to that of the netdev_* helpers.

A PR for the rdma-core provider was sent:
https://github.com/linux-rdma/rdma-core/pull/475

Thanks to everyone who took the time to review our previous submissions (Jason,
Doug, Sean, Dennis, Leon, Christoph, Parav, Sagi, Steve, Shiraz); it is much
appreciated.

Issues addressed in v5:
* Adapt to subsystem verbs API changes
* Remove unnecessary 'do ... while' in ibdev_dbg (Jason)
* Use a non-macro implementation for ibdev_dbg (Jason)
* Use for_each_sg_dma_page() in umem iterations (Jason)
* Remove unused enum value EFA_ADMIN_START_CMD_RANGE (Leon)
* Don't assume the sg element offset is zero (Jason)
* Cherry-picked Shiraz's new ib_umem_find_single_pg_size() work but encountered
  some issues; debugging with Shiraz off-list. Will convert to his work once
  these are solved.

Issues addressed in v4:
* Add RDMA subsystem ibdev_* printk helpers (Leon, Jason)
* Use xarray for mmap keys (Jason)
* Use module_pci_driver macro (Jason)
* Remove redundant cast in efa_remove (Jason)
* Avoid unnecessary use of pci_get_drvdata (Jason)
* Remove unnecessary admin queue sizes macros (Leon)
* Remove EFA_DEVICE_RUNNING bit (Leon, Jason)
* Remove incorrect comment in efa_com_validate_version (Leon)
* Keep lists sorted (Jason)
* Keep efa_com_dev as part of efa_dev instead of allocating it (Jason)

Issues addressed in v3:
* Use new rdma_udata_to_drv_context API
* Adapt to new core ucontext allocations
* Remove EFA transport/protocol/node type and use unspecified instead (Leon, Jason)
* Replace stats lock with atomic variables (Leon, Jason, Steve)
* Remove vertical alignment from structs (Steve)
* Remove license text from ABI file (Leon)
* Undefine macro when it's no longer used (Steve)
* Fix kdoc formatting (Steve)
* Remove unneeded lock from reg read destroy flow (Steve)
* Prefer {} initializations over memset (Leon)
* Remove highmem WARN_ON_ONCE (Steve)
* ib_alloc_device returns NULL in case of error (Leon)
* Remove redundant check from remove device flow (Leon)
* Remove redundant zero assignments after memsets
* Remove unnecessary WARN_ON_ONCEs from create QP verbs (Steve)
* Remove redundant memsets (Steve, Shiraz)
* Change error prints in all non-privileged flows to debug level (Steve, Leon, Jason, Shiraz)
* Remove likely/unlikelys from control path (Leon, Jason)
* Fix wrong PAGE_SIZE usage in reg MR indirect flow (Jason)
* Use decimal array size in ABI file (Steve)
* Remove redundant comments (Steve, Shiraz)
* Change efa_verbs.c to GPL-2.0 OR Linux-OpenIB license (Leon, Jason)
* Replace WARN in admin completion processing with WARN_ONCE (Steve)

Major issues addressed in v2:
* Userspace libibverbs provider is implemented and attached for review.
* Respect the atomic requirement of create/destroy AH flows using the new
  sleepable flag [2].
* Change link layer from Ethernet to Unspecified (Proprietary EC2 link layer).
* Use RDMA mmap API.
* Coherent DMA memory is no longer mapped to userspace; streaming DMA
  mappings are used instead.
* Introduce alloc/dealloc PD admin commands; PDs are now backed by an object on
  the device. This removes the bitmap used for PD number allocations.
* Addressed the mmap lifetime issues:
  Each ucontext now uses a new User Access Region (UAR) abstraction.
  Objects which are tied to a specific UAR will not be allocated to a different
  user until the UAR is deallocated (on application exit).
  DMA memory will be unmapped when the QP/CQ is destroyed, but the buffers will
  remain allocated until application exit.
  The mmap entries now remain valid until application exit, and the same mmap
  key can be mapped more than once.
* SRD QP type is now a driver QP type (previously IB_QPT_SRD).
* Match UD QP InfiniBand semantics, including the 40-byte offset, state
  transitions, QKey validation, etc.
* Move AH reference counts to the device (previously in the driver).
  When more than one AH is created with the same GID, the same device resource
  is used internally. Instead of keeping the reference count in the driver (and
  issuing only one create AH command), each AH creation is now passed on to the
  device (accompanied by the PD number).
  This allows for future optimizations for AHs that are no longer used by a
  specific PD.
* Removed all stub functions, which marks the EFA driver as a non-kverbs
  provider [3].
* Replace all pr_* prints with dev_* prints

[1] https://github.com/amzn/rdma-core/wiki/SRD
[2] https://patchwork.kernel.org/cover/10725727/
[3] https://patchwork.kernel.org/cover/10775039/

Thanks,
Gal

Gal Pressman (12):
  RDMA/core: Introduce RDMA subsystem ibdev_* print functions
  RDMA: Add EFA related definitions
  RDMA/efa: Add EFA device definitions
  RDMA/efa: Add the efa.h header file
  RDMA/efa: Add the efa_com.h file
  RDMA/efa: Add the com service API definitions
  RDMA/efa: Add the ABI definitions
  RDMA/efa: Implement functions that submit and complete admin commands
  RDMA/efa: Add common command handlers
  RDMA/efa: Add EFA verbs implementation
  RDMA/efa: Add the efa module
  RDMA/efa: Add driver to Kconfig/Makefile

 MAINTAINERS                                     |    9 +
 drivers/infiniband/Kconfig                      |    1 +
 drivers/infiniband/core/device.c                |   60 +
 drivers/infiniband/core/sysfs.c                 |    1 +
 drivers/infiniband/core/verbs.c                 |    2 +
 drivers/infiniband/hw/Makefile                  |    1 +
 drivers/infiniband/hw/efa/Kconfig               |   15 +
 drivers/infiniband/hw/efa/Makefile              |    9 +
 drivers/infiniband/hw/efa/efa.h                 |  163 ++
 drivers/infiniband/hw/efa/efa_admin_cmds_defs.h |  794 ++++++++++
 drivers/infiniband/hw/efa/efa_admin_defs.h      |  136 ++
 drivers/infiniband/hw/efa/efa_com.c             | 1162 ++++++++++++++
 drivers/infiniband/hw/efa/efa_com.h             |  144 ++
 drivers/infiniband/hw/efa/efa_com_cmd.c         |  692 ++++++++
 drivers/infiniband/hw/efa/efa_com_cmd.h         |  270 ++++
 drivers/infiniband/hw/efa/efa_common_defs.h     |   18 +
 drivers/infiniband/hw/efa/efa_main.c            |  533 +++++++
 drivers/infiniband/hw/efa/efa_regs_defs.h       |  113 ++
 drivers/infiniband/hw/efa/efa_verbs.c           | 1924 +++++++++++++++++++++++
 include/linux/dynamic_debug.h                   |   11 +
 include/rdma/ib_verbs.h                         |   34 +-
 include/uapi/rdma/efa-abi.h                     |  101 ++
 include/uapi/rdma/rdma_user_ioctl_cmds.h        |    1 +
 lib/dynamic_debug.c                             |   40 +
 24 files changed, 6233 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/hw/efa/Kconfig
 create mode 100644 drivers/infiniband/hw/efa/Makefile
 create mode 100644 drivers/infiniband/hw/efa/efa.h
 create mode 100644 drivers/infiniband/hw/efa/efa_admin_cmds_defs.h
 create mode 100644 drivers/infiniband/hw/efa/efa_admin_defs.h
 create mode 100644 drivers/infiniband/hw/efa/efa_com.c
 create mode 100644 drivers/infiniband/hw/efa/efa_com.h
 create mode 100644 drivers/infiniband/hw/efa/efa_com_cmd.c
 create mode 100644 drivers/infiniband/hw/efa/efa_com_cmd.h
 create mode 100644 drivers/infiniband/hw/efa/efa_common_defs.h
 create mode 100644 drivers/infiniband/hw/efa/efa_main.c
 create mode 100644 drivers/infiniband/hw/efa/efa_regs_defs.h
 create mode 100644 drivers/infiniband/hw/efa/efa_verbs.c
 create mode 100644 include/uapi/rdma/efa-abi.h

Comments

Gal Pressman April 22, 2019, 12:59 p.m. UTC
On 15-Apr-19 15:20, Gal Pressman wrote:
> [...]

Hi,
The only comment so far was to remove the two BUG_ONs in the first patch. Can
this patch be modified when applied, or should I resubmit the whole series?

Thanks