
[for-next,v11,00/11] Elastic RDMA Adapter (ERDMA) driver

Message ID 20220615015227.65686-1-chengyou@linux.alibaba.com (mailing list archive)

Message

Cheng Xu June 15, 2022, 1:52 a.m. UTC
Hello all,

This v11 patch set introduces the Elastic RDMA Adapter (ERDMA) driver.
ERDMA was released by Alibaba at Apsara Conference 2021. The PR for the
ERDMA userspace provider has already been created [1].

ERDMA enables large-scale RDMA acceleration in the Alibaba ECS
environment and is initially offered on g7re instances. It can
significantly improve the efficiency of large-scale distributed
computing and communication, and it expands dynamically with the cluster
scale of Alibaba Cloud.

ERDMA is an RDMA networking adapter based on the Alibaba MOC hardware. It
works in the VPC network environment (overlay network) and uses the iWarp
transport protocol. ERDMA supports reliable connection (RC), and both
kernel space and user space verbs. We already support HPC/AI applications
with libfabric, NoF, and some other internal verbs libraries, such as
xrdma, epsl, etc.

For an ECS instance with RDMA enabled, our MOC hardware generates two
kinds of PCI devices: one for ERDMA, and one for the original net device
(virtio-net). They are separate PCI devices.

Fixed issues in v11:
- Return -EIO when CMDQ response has error status.
- Eliminate static checker warnings.
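
To illustrate the first v11 item, here is a minimal userspace sketch of
mapping a command queue completion status to an errno-style return value.
The struct layout and the names cmdq_completion and
cmdq_completion_to_errno are hypothetical and chosen for illustration
only; the real erdma definitions are in the patches, not in this cover
letter. The sketch also assumes that a status of 0 means success.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical completion record for a command queue request.
 * Assumption: status == 0 means success, any other value is a
 * device-reported error.
 */
struct cmdq_completion {
	uint8_t status;
	uint64_t resp0;
};

/*
 * Translate a completion into a kernel-style return value:
 * 0 on success, -EIO when the response carries an error status.
 */
int cmdq_completion_to_errno(const struct cmdq_completion *comp)
{
	if (comp->status != 0)
		return -EIO;

	return 0;
}
```

A caller would propagate the negative value unchanged, so an error
reported by the device is not mistaken for a successful command.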

Fixed issues in v10:
- Remove unneeded semicolon in erdma_qp.c reported by Abcci Robot.
- Remove duplicated include in erdma_cm.c reported by Abcci Robot.
- Fix return value check in erdma_alloc_ucontext() reported by Hulk
  Robot.
- Sort the include headers.

Fixed issues or changes in v9:
- Refactor the implementation of netdev bind flow in erdma.
- Remove the modification of iw_query_port due to the refactor.

Fixed issues or changes in v8:
- Sort the source order in drivers/infiniband/Kconfig.
- Remove !CPU_BIG_ENDIAN in our Kconfig, and fix warnings reported by
  sparse.
- Remove rdma_link_ops implementation in erdma. Instead, we implement a
  workqueue to handle the link operation after registering erdma device
  successfully.

Changes in v7:
- Fix a wrong address calculation for doorbell records in
  erdma_create_qp.
- Fix a race condition when reporting the IW_CM_EVENT_CONNECT_REQUEST
  event in cm.
- Add the mmap_free implementation, which was missing in earlier versions.
- Remove unnecessary reference to erdma_dev in erdma_ucontext.

Changes in v6:
- Rebase to the latest for-next code, and solve the compilation issues.

Fixed issues or changes in v5:
- Rename the reserved fields of structure definitions to improve
  readability.
- Remove some magic numbers and unnecessary initializations.
- Fix some coding style format issues.
- Fix some typos in comments.
- No casting in the assignment if the function's returned pointer is
  "void *".
- Re-write the polling functions (cmdq cq, verbs cq, aeq and ceq), which
  all check the valid bit in order to get the next valid QE. The new
  implementation is simpler. Thanks to Wenpeng.
- Fix an issue reported by kernel test robot.
- Some minor changes in code (such as removing SRQ definitions since we do
  not support it yet).
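
The valid-bit polling mentioned in the v5 list can be sketched in
userspace as follows. This is only an illustration of the general
technique, not the erdma code: the names ring, queue_entry, ring_produce
and ring_next_valid are hypothetical, and ring_produce stands in for the
hardware. It assumes a power-of-two ring depth and a producer that flips
the valid bit value on every pass over the ring, so the consumer can tell
new entries from stale ones without a shared producer index.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_DEPTH 4u /* assumption: must be a power of two */

struct queue_entry {
	uint8_t valid; /* phase bit written by the producer */
	uint32_t payload;
};

struct ring {
	struct queue_entry entries[RING_DEPTH];
	uint32_t ci; /* free-running consumer index */
	uint32_t pi; /* free-running producer index (hardware stand-in) */
};

/* Valid-bit value that marks a fresh entry in the pass idx belongs to. */
static uint8_t phase_for(uint32_t idx)
{
	return (idx & RING_DEPTH) ? 0 : 1;
}

/* Stand-in for the device; it does not check for ring overflow. */
void ring_produce(struct ring *r, uint32_t payload)
{
	struct queue_entry *e = &r->entries[r->pi & (RING_DEPTH - 1)];

	e->payload = payload;
	e->valid = phase_for(r->pi);
	r->pi++;
}

/*
 * Return the next valid entry, or NULL if the slot at ci still holds a
 * stale entry. A real driver also needs a read barrier between checking
 * the valid bit and reading the payload; that is omitted here.
 */
struct queue_entry *ring_next_valid(struct ring *r)
{
	struct queue_entry *e = &r->entries[r->ci & (RING_DEPTH - 1)];

	if (e->valid != phase_for(r->ci))
		return NULL;

	r->ci++;
	return e;
}
```

Because the ring starts zeroed and the first pass writes 1, an empty
ring yields NULL. After a wrap the expected value flips to 0, so entries
left over from the previous pass are not consumed twice.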

Fixed issues in v4:
- Fix some typos.
- Use __GFP_ZERO flags in dma_alloc_coherent, instead of memset after
  buffer allocation.
- Use one single polling function for AEQ and CEQ; previously there were
  two.
- Fix wrong iov_num when calling kernel_sendmsg.
- Add necessary comments in erdma_cm.
- Remove duplicated check in MPA processing function.
- Always return 0 in erdma_query_port.
- Directly return error code instead of assigning "ret", and then returning
  "ret" in init_kernel_qp.

Fixed issues or changes in v3:
- Change the column limit from 100 to 80 characters.
- Remove unnecessary field or structure definitions in erdma.h.
- Use exact types (bool, unsigned int) instead of "int" in erdma_dev.
- Make the ibdev and the pci device have the same lifecycle. ERDMA will
  remain in an invalid port state until bound to the corresponding netdev.
- ib_core: allow query_port when netdev is NULL for iWarp device.
- Move large inline functions in erdma.h to .c files.
- Use dev_{info, warn, err} or ibdev_{info, warn, err} instead of
  pr_{info, warn, err} function calls.
- Remove print function calls in userspace-triggered paths.
- Add necessary comments in CM part.
- Remove unused entries in map_cqe_opcode[] table.
- Use rdma_is_kernel_res instead of self-definitions.
- Remove unused resources counter in erdma_dev.
- Use pgprot_device instead of pgprot_noncached in erdma_mmap.
- Remove disassociate_ucontext interface implementation.

Fixed issues in v2:
- No "extern" on function declarations.
- No inline functions in .c files, no void casting for functions with
  return values.
- Rewrite the code (mainly CM and CM related parts), which was originally
  based on an old siw version, on top of siw's newest kernel version.
- Remove debugfs.
- Fix issues reported by kernel test robot.
- Use RDMA_NLDEV_CMD_NEWLINK instead of binding in net notifiers.

[1] https://github.com/linux-rdma/rdma-core/pull/1126

Thanks,
Cheng Xu

Cheng Xu (11):
  RDMA: Add ERDMA to rdma_driver_id definition
  RDMA/erdma: Add the hardware related definitions
  RDMA/erdma: Add main include file
  RDMA/erdma: Add cmdq implementation
  RDMA/erdma: Add event queue implementation
  RDMA/erdma: Add verbs header file
  RDMA/erdma: Add verbs implementation
  RDMA/erdma: Add connection management (CM) support
  RDMA/erdma: Add the erdma module
  RDMA/erdma: Add the ABI definitions
  RDMA/erdma: Add driver to kernel build environment

 MAINTAINERS                               |    8 +
 drivers/infiniband/Kconfig                |   15 +-
 drivers/infiniband/hw/Makefile            |    1 +
 drivers/infiniband/hw/erdma/Kconfig       |   12 +
 drivers/infiniband/hw/erdma/Makefile      |    4 +
 drivers/infiniband/hw/erdma/erdma.h       |  287 ++++
 drivers/infiniband/hw/erdma/erdma_cm.c    | 1430 ++++++++++++++++++++
 drivers/infiniband/hw/erdma/erdma_cm.h    |  167 +++
 drivers/infiniband/hw/erdma/erdma_cmdq.c  |  498 +++++++
 drivers/infiniband/hw/erdma/erdma_cq.c    |  205 +++
 drivers/infiniband/hw/erdma/erdma_eq.c    |  329 +++++
 drivers/infiniband/hw/erdma/erdma_hw.h    |  508 +++++++
 drivers/infiniband/hw/erdma/erdma_main.c  |  630 +++++++++
 drivers/infiniband/hw/erdma/erdma_qp.c    |  566 ++++++++
 drivers/infiniband/hw/erdma/erdma_verbs.c | 1460 +++++++++++++++++++++
 drivers/infiniband/hw/erdma/erdma_verbs.h |  342 +++++
 include/uapi/rdma/erdma-abi.h             |   49 +
 include/uapi/rdma/ib_user_ioctl_verbs.h   |    1 +
 18 files changed, 6505 insertions(+), 7 deletions(-)
 create mode 100644 drivers/infiniband/hw/erdma/Kconfig
 create mode 100644 drivers/infiniband/hw/erdma/Makefile
 create mode 100644 drivers/infiniband/hw/erdma/erdma.h
 create mode 100644 drivers/infiniband/hw/erdma/erdma_cm.c
 create mode 100644 drivers/infiniband/hw/erdma/erdma_cm.h
 create mode 100644 drivers/infiniband/hw/erdma/erdma_cmdq.c
 create mode 100644 drivers/infiniband/hw/erdma/erdma_cq.c
 create mode 100644 drivers/infiniband/hw/erdma/erdma_eq.c
 create mode 100644 drivers/infiniband/hw/erdma/erdma_hw.h
 create mode 100644 drivers/infiniband/hw/erdma/erdma_main.c
 create mode 100644 drivers/infiniband/hw/erdma/erdma_qp.c
 create mode 100644 drivers/infiniband/hw/erdma/erdma_verbs.c
 create mode 100644 drivers/infiniband/hw/erdma/erdma_verbs.h
 create mode 100644 include/uapi/rdma/erdma-abi.h

Comments

Jason Gunthorpe June 24, 2022, 7:17 p.m. UTC | #1
On Wed, Jun 15, 2022 at 09:52:16AM +0800, Cheng Xu wrote:
> Hello all,
> 
> This v11 patch set introduces the Elastic RDMA Adapter (ERDMA) driver.
> ERDMA was released by Alibaba at Apsara Conference 2021. The PR for the
> ERDMA userspace provider has already been created [1].
> 
> ERDMA enables large-scale RDMA acceleration in the Alibaba ECS
> environment and is initially offered on g7re instances. It can
> significantly improve the efficiency of large-scale distributed
> computing and communication, and it expands dynamically with the cluster
> scale of Alibaba Cloud.
> 
> ERDMA is an RDMA networking adapter based on the Alibaba MOC hardware. It
> works in the VPC network environment (overlay network) and uses the iWarp
> transport protocol. ERDMA supports reliable connection (RC), and both
> kernel space and user space verbs. We already support HPC/AI applications
> with libfabric, NoF, and some other internal verbs libraries, such as
> xrdma, epsl, etc.
> 
> For an ECS instance with RDMA enabled, our MOC hardware generates two
> kinds of PCI devices: one for ERDMA, and one for the original net device
> (virtio-net). They are separate PCI devices.
> 
> Fixed issues in v11:
> - Return -EIO when CMDQ response has error status.
> - Eliminate static checker warnings.

I updated the linux-next branch

Thanks,
Jason