[rdma-core,1/5] Common infrastructure for auto loading rdma modules

Message ID 1500929067-1583-2-git-send-email-jgunthorpe@obsidianresearch.com (mailing list archive)
State Superseded

Commit Message

Jason Gunthorpe July 24, 2017, 8:44 p.m. UTC
This is inspired by the similar approach in the redhat directory but
takes a more general approach relying on udev and systemd to do the
actual work fully dynamically instead of a oneshot shell script.

Loading is split into two cases:
 1) Loading RDMA support modules when RDMA capable hardware is installed.
    This is only needed for ethernet devices which do not load their RDMA
    support modules via request_module in the kernel.

    udev is used to detect when an ethernet device controlled by a specific
    module is hot plugged and then udev directly loads the RDMA module.

 2) Loading RDMA ULP support when RDMA hardware is installed.
    This is done by having udev detect when RDMA hardware is installed and
    then cause systemd to load a list of modules from config files in
    /etc/rdma/modules/

    The user can customize these files to select which ULP modules should be
    loaded.

This broadly replaces the redhat/rdma.conf scheme.

In all cases users can prevent a module from being auto-loaded on their
system by blacklisting it in a file in /etc/modprobe.d/

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 CMakeLists.txt                            |  1 +
 Documentation/udev.md                     | 83 +++++++++++++++++++++++++++++++
 debian/rdma-core.install                  |  9 ++++
 kernel-boot/CMakeLists.txt                | 24 +++++++++
 kernel-boot/modules/infiniband.conf       | 12 +++++
 kernel-boot/modules/iwarp.conf            |  2 +
 kernel-boot/modules/opa.conf              | 10 ++++
 kernel-boot/modules/rdma.conf             | 21 ++++++++
 kernel-boot/modules/roce.conf             |  2 +
 kernel-boot/rdma-description.rules        | 41 +++++++++++++++
 kernel-boot/rdma-hw-modules.rules         | 39 +++++++++++++++
 kernel-boot/rdma-load-modules@.service.in | 15 ++++++
 kernel-boot/rdma-ulp-modules.rules        | 11 ++++
 redhat/rdma-core.spec                     | 11 +++-
 14 files changed, 280 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/udev.md
 create mode 100644 kernel-boot/CMakeLists.txt
 create mode 100644 kernel-boot/modules/infiniband.conf
 create mode 100644 kernel-boot/modules/iwarp.conf
 create mode 100644 kernel-boot/modules/opa.conf
 create mode 100644 kernel-boot/modules/rdma.conf
 create mode 100644 kernel-boot/modules/roce.conf
 create mode 100644 kernel-boot/rdma-description.rules
 create mode 100644 kernel-boot/rdma-hw-modules.rules
 create mode 100644 kernel-boot/rdma-load-modules@.service.in
 create mode 100644 kernel-boot/rdma-ulp-modules.rules
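
The second loading case in the commit message (udev detects RDMA hardware, then systemd loads a module list) can be sketched as a small Python model. This is only an illustration of the selection logic in rdma-ulp-modules.rules and of the `%I.conf` path in rdma-load-modules@.service.in. In the patch itself udev and systemd do this work. The function names `wanted_units` and `config_path` are hypothetical, and `/etc` is assumed as the value of `CMAKE_INSTALL_SYSCONFDIR`.

```python
# Illustrative model only: in the patch this logic lives in udev rules and a
# systemd template unit. Function names are hypothetical, and "/etc" is an
# assumed value for CMAKE_INSTALL_SYSCONFDIR.

# udev property -> unit instance name, in the order used by
# rdma-ulp-modules.rules.
PROTOCOL_INSTANCES = [
    ("ID_RDMA_INFINIBAND", "infiniband"),
    ("ID_RDMA_IWARP", "iwarp"),
    ("ID_RDMA_OPA", "opa"),
    ("ID_RDMA_ROCE", "roce"),
]


def wanted_units(properties):
    """Return the rdma-load-modules@ units wanted for one infiniband device."""
    # The generic "rdma" instance is wanted for every RDMA device.
    instances = ["rdma"]
    for key, instance in PROTOCOL_INSTANCES:
        if properties.get(key) == "1":
            instances.append(instance)
    return ["rdma-load-modules@%s.service" % name for name in instances]


def config_path(unit, sysconfdir="/etc"):
    """Return the module list file a unit instance would load (%I.conf)."""
    instance = unit.split("@", 1)[1].rsplit(".service", 1)[0]
    return "%s/rdma/modules/%s.conf" % (sysconfdir, instance)
```

Under this model, a device whose PCI driver is tagged with both `ID_RDMA_INFINIBAND` and `ID_RDMA_ROCE` (as mlx4_core and mlx5_core are in rdma-description.rules) would want the rdma, infiniband and roce instances, each reading its own file under /etc/rdma/modules/.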

Comments

Bart Van Assche July 25, 2017, 5:15 p.m. UTC | #1
On Mon, 2017-07-24 at 14:44 -0600, Jason Gunthorpe wrote:
> +This is to avoid exposing systems not using RDMA from having RDMA enabled, for
> +instance if a system has a multi-protocol ethernet adaptor, but is only using
> +the net stack interface.

adaptor -> adapter ?

> +Finally udev will cause systemd to start RDMA specific daemons like
> +srp_deamon, rdma-ndd and iwpmd. These starts are linked to the detection of
> +the first RDMA hardware, and the daemons internally handle hot plug events for
> +other hardware.

Please change srp_deamon into srp_daemon such that the spelling matches the name
of the executable.

> +
> +## Hot Plug compatible services
> +
> +RDMA using services need to have device specific systemd dependencies in their
> +unit files, either created by hand by the admin or by using udev rules.

"RDMA using services" -> "Services using RDMA" ?

> +++ b/kernel-boot/modules/infiniband.conf
> @@ -0,0 +1,12 @@
> +# These modules are loaded by the system if any InfiniBand device is installed
> +# Infiniband over IP netdevice

Please spell "InfiniBand" consistently in the above comment.

> +ib_ipoib
> +
> +# Access to fabric management SMPs and GMPs from userspace.
> +ib_umad
> +
> +# SCSI Remote Protocol target support
> +# ib_srpt
> +
> +# ib_ucm provides the obsolete /dev/infiniband/ucm0
> +# ib_ucm

If ib_iser is loaded by default, should ib_srp also be loaded by default if the
appropriate hardware is present? I don't think that there are fewer SRP users
than iSER users.

> diff --git a/kernel-boot/rdma-description.rules b/kernel-boot/rdma-description.rules
> [ ... ]
> +# Hardware that supports RoCE
> +DRIVERS=="be2net", ENV{ID_RDMA_ROCE}="1"
> +DRIVERS=="bnxt_en", ENV{ID_RDMA_ROCE}="1"
> +DRIVERS=="hns", ENV{ID_RDMA_ROCE}="1"
> +DRIVERS=="i40e", ENV{ID_RDMA_ROCE}="1"
> +DRIVERS=="mlx4_core", ENV{ID_RDMA_ROCE}="1"
> +DRIVERS=="mlx5_core", ENV{ID_RDMA_ROCE}="1"
> +DRIVERS=="qede", ENV{ID_RDMA_ROCE}="1"

Should the "rdma_rxe" driver be added to this list?

> +ENV{ID_NET_DRIVER}=="mlx4_en", RUN{builtin}+="kmod load mlx4_ib"
> +ENV{ID_NET_DRIVER}=="mlx5_core", RUN{builtin}+="kmod load mlx5_ib"

Why this inconsistency between mlx4 and mlx5? Additionally, if these rules are
added, shouldn't the request_module() calls be removed from the mlx4 and mlx5 core
drivers?

Anyway, nice work!

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe July 25, 2017, 5:39 p.m. UTC | #2
On Tue, Jul 25, 2017 at 05:15:18PM +0000, Bart Van Assche wrote:

Got all the spelling errors, thanks

> > +ib_ipoib
> > +
> > +# Access to fabric management SMPs and GMPs from userspace.
> > +ib_umad
> > +
> > +# SCSI Remote Protocol target support
> > +# ib_srpt
> > +
> > +# ib_ucm provides the obsolete /dev/infiniband/ucm0
> > +# ib_ucm
> 
> If ib_iser is loaded by default, should ib_srp also be loaded by default if the
> appropriate hardware is present? I don't think that there are fewer SRP users
> than iSER users.

ib_srp now autoloads when srp_daemon is started, which AFAIK, is a
requirement to use srp? Since it autoloads it does not need to be
specified in this file any more.

Generally as things learn to autoload they should be removed from
these files, and people should be really thinking hard about getting
things to autoload as it solves hard boot time ordering problems and
races.

iser is left uncommented only because that is historically what RH has
done by default. 

> > diff --git a/kernel-boot/rdma-description.rules b/kernel-boot/rdma-description.rules
> > [ ... ]
> > +# Hardware that supports RoCE
> > +DRIVERS=="be2net", ENV{ID_RDMA_ROCE}="1"
> > +DRIVERS=="bnxt_en", ENV{ID_RDMA_ROCE}="1"
> > +DRIVERS=="hns", ENV{ID_RDMA_ROCE}="1"
> > +DRIVERS=="i40e", ENV{ID_RDMA_ROCE}="1"
> > +DRIVERS=="mlx4_core", ENV{ID_RDMA_ROCE}="1"
> > +DRIVERS=="mlx5_core", ENV{ID_RDMA_ROCE}="1"
> > +DRIVERS=="qede", ENV{ID_RDMA_ROCE}="1"
> 
> Should the "rdma_rxe" driver be added to this list?

No.. I need to figure out something else for rxe. We have no way in
RDMA to determine the driver binding to the RDMA device (eg like
ethtool -i), the above is detecting which driver is bound to the PCI
device that the RDMA device is a child of.

Since rxe doesn't work that way, it cannot use this detection method.

What I'd like is proper kernel support to determine ID_RDMA_*, which I
think we can ultimately implement in Leon's rdma netlink.

> > +ENV{ID_NET_DRIVER}=="mlx4_en", RUN{builtin}+="kmod load mlx4_ib"
> > +ENV{ID_NET_DRIVER}=="mlx5_core", RUN{builtin}+="kmod load mlx5_ib"
> 
> Why this inconsistency between mlx4 and mlx5?

This is matching the net driver name (eg struct ethtool_drvinfo ->
driver) reported by each driver. As far as I can tell these are the names
Mellanox chose to report here. I think it reflects that in mlx5 the
mlx5_core module provides the netdevice support, while in mlx4 it is
in the mlx4_en module.

> Additionally, if these rules are added, shouldn't the
> request_module() calls be removed from the mlx4 and mlx5 core
> drivers?

This approach still relies on those request_modules to support pure-IB
versions of Mellanox cards. I would think the RoCE related
request_modules should ultimately be removed though.

Jason
Dennis Dalessandro July 26, 2017, 1:48 p.m. UTC | #3
On 7/24/2017 4:44 PM, Jason Gunthorpe wrote:
> diff --git a/kernel-boot/modules/opa.conf b/kernel-boot/modules/opa.conf
> new file mode 100644
> index 00000000000000..b9bc9f1f0146af
> --- /dev/null
> +++ b/kernel-boot/modules/opa.conf
> @@ -0,0 +1,10 @@
> +# These modules are loaded by the system if any OmniPath Architecture device
> +# is installed
> +# Infiniband over IP netdevice
> +ib_ipoib

Do we need to have rdmavt listed here too?

> +# Hardware that supports InfiniBand
> +DRIVERS=="mlx4_core", ENV{ID_RDMA_INFINIBAND}="1"
> +DRIVERS=="mlx5_core", ENV{ID_RDMA_INFINIBAND}="1"
> +DRIVERS=="qib", ENV{ID_RDMA_INFINIBAND}="1"
> +
> +# Hardware that supports OPA
> +DRIVERS=="hfi1verbs", ENV{ID_RDMA_IWARP}="1"

Why is this called IWARP?

-Denny
Jason Gunthorpe July 26, 2017, 4:04 p.m. UTC | #4
On Wed, Jul 26, 2017 at 09:48:04AM -0400, Dennis Dalessandro wrote:
> On 7/24/2017 4:44 PM, Jason Gunthorpe wrote:
> >diff --git a/kernel-boot/modules/opa.conf b/kernel-boot/modules/opa.conf
> >new file mode 100644
> >index 00000000000000..b9bc9f1f0146af
> >+++ b/kernel-boot/modules/opa.conf
> >@@ -0,0 +1,10 @@
> >+# These modules are loaded by the system if any OmniPath Architecture device
> >+# is installed
> >+# Infiniband over IP netdevice
> >+ib_ipoib
> 
> Do we need to have rdmavt listed here too?

Don't think so, isn't it an internal library? It will autoload when
hfi1 autoloads.


> >+# Hardware that supports InfiniBand
> >+DRIVERS=="mlx4_core", ENV{ID_RDMA_INFINIBAND}="1"
> >+DRIVERS=="mlx5_core", ENV{ID_RDMA_INFINIBAND}="1"
> >+DRIVERS=="qib", ENV{ID_RDMA_INFINIBAND}="1"
> >+
> >+# Hardware that supports OPA
> >+DRIVERS=="hfi1verbs", ENV{ID_RDMA_IWARP}="1"
> 
> > Why is this called IWARP?

Oops, thanks, copy&paste error

Jason
Jason Gunthorpe July 27, 2017, 10:18 p.m. UTC | #5
On Tue, Jul 25, 2017 at 11:39:47AM -0600, Jason Gunthorpe wrote:
> > > +# Hardware that supports RoCE
> > > +DRIVERS=="be2net", ENV{ID_RDMA_ROCE}="1"
> > > +DRIVERS=="bnxt_en", ENV{ID_RDMA_ROCE}="1"
> > > +DRIVERS=="hns", ENV{ID_RDMA_ROCE}="1"
> > > +DRIVERS=="i40e", ENV{ID_RDMA_ROCE}="1"
> > > +DRIVERS=="mlx4_core", ENV{ID_RDMA_ROCE}="1"
> > > +DRIVERS=="mlx5_core", ENV{ID_RDMA_ROCE}="1"
> > > +DRIVERS=="qede", ENV{ID_RDMA_ROCE}="1"
> > 
> > Should the "rdma_rxe" driver be added to this list?
> 
> No.. I need to figure out something else for rxe. We have no way in
> RDMA to determine the driver binding to the RDMA device (eg like

This looks like a good bet for now:

DEVPATH=="*/infiniband/rxe*", ATTR{parent}=="*", ENV{ID_RDMA_ROCE}="1"

Now that I'm looking at rxe I notice that srp_daemon starts
automatically for roce ports.

I assume this is not desired?

This is going to be a bit harder to fix..

Jason
Bart Van Assche July 27, 2017, 10:28 p.m. UTC | #6
On Thu, 2017-07-27 at 16:18 -0600, Jason Gunthorpe wrote:
> On Tue, Jul 25, 2017 at 11:39:47AM -0600, Jason Gunthorpe wrote:
> > > > +# Hardware that supports RoCE
> > > > +DRIVERS=="be2net", ENV{ID_RDMA_ROCE}="1"
> > > > +DRIVERS=="bnxt_en", ENV{ID_RDMA_ROCE}="1"
> > > > +DRIVERS=="hns", ENV{ID_RDMA_ROCE}="1"
> > > > +DRIVERS=="i40e", ENV{ID_RDMA_ROCE}="1"
> > > > +DRIVERS=="mlx4_core", ENV{ID_RDMA_ROCE}="1"
> > > > +DRIVERS=="mlx5_core", ENV{ID_RDMA_ROCE}="1"
> > > > +DRIVERS=="qede", ENV{ID_RDMA_ROCE}="1"
> > > 
> > > Should the "rdma_rxe" driver be added to this list?
> > 
> > No.. I need to figure out something else for rxe. We have no way in
> > RDMA to determine the driver binding to the RDMA device (eg like
> 
> This looks like a good bet for now:
> 
> DEVPATH=="*/infiniband/rxe*", ATTR{parent}=="*", ENV{ID_RDMA_ROCE}="1"
> 
> Now that I'm looking at rxe I notice that srp_daemon starts
> automatically for roce ports.
> 
> I assume this is not desired?

Hello Jason,

The srp_daemon communicates with the IB SA to discover SRP target ports. As
you know there is no SA in a RoCE network. I think it is harmless to start
srp_daemon against a RoCE port but I don't think this makes sense.

Bart.
Jason Gunthorpe July 27, 2017, 10:38 p.m. UTC | #7
On Thu, Jul 27, 2017 at 10:28:59PM +0000, Bart Van Assche wrote:

> The srp_daemon communicates with the IB SA to discover SRP target ports. As
> you know there is no SA in a RoCE network. I think it is harmless to start
> srp_daemon against a RoCE port but I don't think this makes sense.

I agree, it looks like it stops here:

 Jul 27 22:14:34 ib9 srp_daemon[258]: SM LID is 0, maybe no opensm is running

and seems harmless. Let us leave it like this for now.

Jason

Patch

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 16196205035f61..a03d8da31cbc5d 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -398,6 +398,7 @@  configure_file("${BUILDLIB}/config.h.in" "${BUILD_INCLUDE}/config.h" ESCAPE_QUOT
 add_subdirectory(ccan)
 add_subdirectory(util)
 add_subdirectory(Documentation)
+add_subdirectory(kernel-boot)
 # Libraries
 add_subdirectory(libibumad)
 add_subdirectory(libibumad/man)
diff --git a/Documentation/udev.md b/Documentation/udev.md
new file mode 100644
index 00000000000000..e30e6007344fba
--- /dev/null
+++ b/Documentation/udev.md
@@ -0,0 +1,83 @@ 
+# Kernel Module Loading
+
+The RDMA subsystem relies on the kernel, udev and systemd to load modules on
+demand when RDMA hardware is present. The RDMA subsystem is unique in that we
+do not load the optional RDMA hardware modules unless the system has the
+rdma-core package installed.
+
+This is to avoid exposing systems not using RDMA from having RDMA enabled, for
+instance if a system has a multi-protocol ethernet adaptor, but is only using
+the net stack interface.
+
+## Boot ordering with systemd
+
+systemd assumes everything is hot pluggable and runs in an event driven
+manner.  When working with RDMA devices we are firstly concerned with when the
+physical hardware has its module loaded into the kernel.
+
+This can happen in several spots along the bootup:
+
+ - From the initrd or built into the kernel. If hardware modules are present
+   in the initrd then they are loaded into the kernel before booting the
+   system. This is done largely synchronously with the boot process.
+
+ - From udev when it auto detects PCI hardware or otherwise.
+   This happens asynchronously in the boot process, systemd does not wait for
+   udev to finish loading modules before it continues on.
+
+   This path makes it very likely the system will experience a RDMA 'hot plug'
+   scenario.
+
+ - From systemd's fixed module loader systemd-modules-load.service, eg from
+   the list in /etc/modules-load.d/. In this case the modules load happens
+   synchronously within systemd and it will hold off sysinit.target until
+   modules are loaded
+
+Once the hardware module is loaded it may be necessary to load a protocol
+module, eg to enable RDMA support on an ethernet device.
+
+This is triggered automatically by udev rules that match the master devices
+and load the protocol module with udev's module loader. This happens
+asynchronously to the rest of the systemd startup.
+
+Once a RDMA device is created by the kernel then udev will cause systemd to
+schedule ULP module loading services (eg rdma-load-modules@.service) specific
+to the plugged hardware. If sysinit.target has not yet been passed then these
+loaders will defer sysinit.target until they complete, otherwise this is a hot
+plug event and things will load asynchronously to the boot up process.
+
+Finally udev will cause systemd to start RDMA specific daemons like
+srp_deamon, rdma-ndd and iwpmd. These starts are linked to the detection of
+the first RDMA hardware, and the daemons internally handle hot plug events for
+other hardware.
+
+## Hot Plug compatible services
+
+RDMA using services need to have device specific systemd dependencies in their
+unit files, either created by hand by the admin or by using udev rules.
+
+For instance, a service that uses /dev/infiniband/umad0 requires:
+
+```
+After=dev-infiniband-umad0.device
+BindsTo=dev-infiniband-umad0.device
+```
+
+Which will ensure the service will not run until the required umad device
+appears.
+
+This is similar to how systemd handles mounting filesystems and configuring
+ethernet devices.
+
+## Interaction with legacy non-hotplug services
+
+Services that cannot handle hot plug must be ordered after
+systemd-udev-settle.service, which will wait for udev to complete loading
+modules and scheduling systemd services. This ensures that all RDMA hardware
+present at boot is setup before proceeding to run the legacy service.
+
+Admins using legacy services can also place their RDMA hardware modules (eg
+mlx4_ib) directly in /etc/modules-load.d/ or in their initrd which will cause
+systemd to defer passing to sysinit.target until all RDMA hardware is set up.
+This is usually sufficient for legacy services and is probably the default
+behavior in many configurations.
diff --git a/debian/rdma-core.install b/debian/rdma-core.install
index c19d766fc0f48e..0bda539494189a 100644
--- a/debian/rdma-core.install
+++ b/debian/rdma-core.install
@@ -1,7 +1,16 @@ 
 etc/modprobe.d/mlx4.conf
 etc/modprobe.d/truescale.conf
+etc/rdma/modules/infiniband.conf
+etc/rdma/modules/iwarp.conf
+etc/rdma/modules/opa.conf
+etc/rdma/modules/rdma.conf
+etc/rdma/modules/roce.conf
+lib/systemd/system/rdma-load-modules@.service
 lib/systemd/system/rdma-ndd.service
 lib/udev/rules.d/60-rdma-ndd.rules
+lib/udev/rules.d/75-rdma-description.rules
+lib/udev/rules.d/90-rdma-hw-modules.rules
+lib/udev/rules.d/90-rdma-ulp-modules.rules
 usr/bin/rxe_cfg
 usr/lib/truescale-serdes.cmds
 usr/sbin/rdma-ndd
diff --git a/kernel-boot/CMakeLists.txt b/kernel-boot/CMakeLists.txt
new file mode 100644
index 00000000000000..0d4a2aec1c6a94
--- /dev/null
+++ b/kernel-boot/CMakeLists.txt
@@ -0,0 +1,24 @@ 
+rdma_subst_install(FILES rdma-load-modules@.service.in
+  DESTINATION "${CMAKE_INSTALL_SYSTEMD_SERVICEDIR}"
+  RENAME rdma-load-modules@.service
+  PERMISSIONS OWNER_WRITE OWNER_READ GROUP_READ WORLD_READ)
+
+install(FILES
+  modules/infiniband.conf
+  modules/iwarp.conf
+  modules/opa.conf
+  modules/rdma.conf
+  modules/roce.conf
+  DESTINATION "${CMAKE_INSTALL_SYSCONFDIR}/rdma/modules")
+
+install(FILES "rdma-description.rules"
+  RENAME "75-rdma-description.rules"
+  DESTINATION "${CMAKE_INSTALL_UDEV_RULESDIR}")
+
+install(FILES "rdma-hw-modules.rules"
+  RENAME "90-rdma-hw-modules.rules"
+  DESTINATION "${CMAKE_INSTALL_UDEV_RULESDIR}")
+
+install(FILES "rdma-ulp-modules.rules"
+  RENAME "90-rdma-ulp-modules.rules"
+  DESTINATION "${CMAKE_INSTALL_UDEV_RULESDIR}")
diff --git a/kernel-boot/modules/infiniband.conf b/kernel-boot/modules/infiniband.conf
new file mode 100644
index 00000000000000..f247d840b7d1bd
--- /dev/null
+++ b/kernel-boot/modules/infiniband.conf
@@ -0,0 +1,12 @@ 
+# These modules are loaded by the system if any InfiniBand device is installed
+# Infiniband over IP netdevice
+ib_ipoib
+
+# Access to fabric management SMPs and GMPs from userspace.
+ib_umad
+
+# SCSI Remote Protocol target support
+# ib_srpt
+
+# ib_ucm provides the obsolete /dev/infiniband/ucm0
+# ib_ucm
diff --git a/kernel-boot/modules/iwarp.conf b/kernel-boot/modules/iwarp.conf
new file mode 100644
index 00000000000000..882146e41ee2ba
--- /dev/null
+++ b/kernel-boot/modules/iwarp.conf
@@ -0,0 +1,2 @@ 
+# These modules are loaded by the system if any iWarp device is installed
+iw_cm
diff --git a/kernel-boot/modules/opa.conf b/kernel-boot/modules/opa.conf
new file mode 100644
index 00000000000000..b9bc9f1f0146af
--- /dev/null
+++ b/kernel-boot/modules/opa.conf
@@ -0,0 +1,10 @@ 
+# These modules are loaded by the system if any OmniPath Architecture device
+# is installed
+# Infiniband over IP netdevice
+ib_ipoib
+
+# Access to fabric management SMPs and GMPs from userspace.
+ib_umad
+
+# Omnipath Ethernet Virtual NIC netdevice
+opa_vnic
diff --git a/kernel-boot/modules/rdma.conf b/kernel-boot/modules/rdma.conf
new file mode 100644
index 00000000000000..2d342dd82f7db0
--- /dev/null
+++ b/kernel-boot/modules/rdma.conf
@@ -0,0 +1,21 @@ 
+# These modules are loaded by the system if any RDMA device is installed
+# iSCSI over RDMA client support
+ib_iser
+
+# iSCSI over RDMA target support
+# ib_isert
+
+# User access to RDMA verbs (supports libibverbs)
+ib_uverbs
+
+# User access to RDMA connection management (supports librdmacm)
+rdma_ucm
+
+# RDS over RDMA support
+# rds_rdma
+
+# NFS over RDMA client support
+xprtrdma
+
+# NFS over RDMA server support
+svcrdma
diff --git a/kernel-boot/modules/roce.conf b/kernel-boot/modules/roce.conf
new file mode 100644
index 00000000000000..8e4927ce26f043
--- /dev/null
+++ b/kernel-boot/modules/roce.conf
@@ -0,0 +1,2 @@ 
+# These modules are loaded by the system if any RDMA over Converged Ethernet
+# device is installed
diff --git a/kernel-boot/rdma-description.rules b/kernel-boot/rdma-description.rules
new file mode 100644
index 00000000000000..dc25db07219cac
--- /dev/null
+++ b/kernel-boot/rdma-description.rules
@@ -0,0 +1,41 @@ 
+# This is a version of net-description.rules for /sys/class/infiniband devices
+
+ACTION=="remove", GOTO="rdma_description_end"
+SUBSYSTEM!="infiniband", GOTO="rdma_description_end"
+
+# NOTE: DRIVERS searches up the sysfs path to find the driver that is bound to
+# the PCI/etc device that the RDMA device is linked to. This is not the kernel
+# driver that is supplying the RDMA device (eg as seen in ID_NET_DRIVER)
+
+# FIXME: with kernel support we could actually detect the protocols the RDMA
+# driver itself supports, this is a work around for lack of that support.
+# In future we could do this with a udev IMPORT{program} helper program
+# that extracted the ID information from the RDMA netlink.
+
+# Hardware that supports InfiniBand
+DRIVERS=="mlx4_core", ENV{ID_RDMA_INFINIBAND}="1"
+DRIVERS=="mlx5_core", ENV{ID_RDMA_INFINIBAND}="1"
+DRIVERS=="qib", ENV{ID_RDMA_INFINIBAND}="1"
+
+# Hardware that supports OPA
+DRIVERS=="hfi1verbs", ENV{ID_RDMA_IWARP}="1"
+
+# Hardware that supports iWarp
+DRIVERS=="cxgb3", ENV{ID_RDMA_IWARP}="1"
+DRIVERS=="cxgb4", ENV{ID_RDMA_IWARP}="1"
+
+# Hardware that supports RoCE
+DRIVERS=="be2net", ENV{ID_RDMA_ROCE}="1"
+DRIVERS=="bnxt_en", ENV{ID_RDMA_ROCE}="1"
+DRIVERS=="hns", ENV{ID_RDMA_ROCE}="1"
+DRIVERS=="i40e", ENV{ID_RDMA_ROCE}="1"
+DRIVERS=="mlx4_core", ENV{ID_RDMA_ROCE}="1"
+DRIVERS=="mlx5_core", ENV{ID_RDMA_ROCE}="1"
+DRIVERS=="qede", ENV{ID_RDMA_ROCE}="1"
+
+# Setup the usual ID information so that systemd will display a sane name for
+# the RDMA device units.
+SUBSYSTEMS=="pci", ENV{ID_BUS}="pci", ENV{ID_VENDOR_ID}="$attr{vendor}", ENV{ID_MODEL_ID}="$attr{device}"
+SUBSYSTEMS=="pci", IMPORT{builtin}="hwdb --subsystem=pci"
+
+LABEL="rdma_description_end"
diff --git a/kernel-boot/rdma-hw-modules.rules b/kernel-boot/rdma-hw-modules.rules
new file mode 100644
index 00000000000000..dde0ab8dacacab
--- /dev/null
+++ b/kernel-boot/rdma-hw-modules.rules
@@ -0,0 +1,39 @@ 
+ACTION=="remove", GOTO="rdma_hw_modules_end"
+SUBSYSTEM!="net", GOTO="rdma_hw_modules_end"
+
+# Automatically load RDMA specific kernel modules when a multi-function device is installed
+
+# These drivers autoload an ethernet driver based on hardware detection and
+# need userspace to load the module that has their RDMA component to turn on
+# RDMA.
+ENV{ID_NET_DRIVER}=="be2net", RUN{builtin}+="kmod load ocrdma"
+ENV{ID_NET_DRIVER}=="bnxt_en", RUN{builtin}+="kmod load bnxt_re"
+ENV{ID_NET_DRIVER}=="cxgb3", RUN{builtin}+="kmod load iw_cxgb3"
+ENV{ID_NET_DRIVER}=="cxgb4", RUN{builtin}+="kmod load iw_cxgb4"
+ENV{ID_NET_DRIVER}=="hns", RUN{builtin}+="kmod load hns_roce"
+ENV{ID_NET_DRIVER}=="i40e", RUN{builtin}+="kmod load i40iw"
+ENV{ID_NET_DRIVER}=="mlx4_en", RUN{builtin}+="kmod load mlx4_ib"
+ENV{ID_NET_DRIVER}=="mlx5_core", RUN{builtin}+="kmod load mlx5_ib"
+ENV{ID_NET_DRIVER}=="qede", RUN{builtin}+="kmod load qedr"
+
+# The user must explicitly load these modules via /etc/modules-load.d/ or otherwise
+# rxe
+
+# When in IB mode the kernel PCI core module autoloads the protocol modules
+# for these providers
+# mlx4
+# mlx5
+
+# enic no longer has a userspace verbs driver, this rule should probably be
+# owned by libfabric
+ENV{ID_NET_DRIVER}=="enic", RUN{builtin}+="kmod load usnic_verbs"
+
+# These providers are single function and autoload RDMA automatically based on
+# PCI probing
+# hfi1verbs
+# ipathverbs
+# mthca
+# vmw_pvrdma
+# nes
+
+LABEL="rdma_hw_modules_end"
diff --git a/kernel-boot/rdma-load-modules@.service.in b/kernel-boot/rdma-load-modules@.service.in
new file mode 100644
index 00000000000000..b35a493ebf230b
--- /dev/null
+++ b/kernel-boot/rdma-load-modules@.service.in
@@ -0,0 +1,15 @@ 
+[Unit]
+Description=Load RDMA modules from @CMAKE_INSTALL_SYSCONFDIR@/rdma/modules/%I.conf
+DefaultDependencies=no
+Conflicts=shutdown.target
+# network-pre.target is to support distro network setup scripts that run after
+# systemd-modules-load.service but before sysinit.target, eg a classic network
+# setup script.
+Before=sysinit.target shutdown.target network-pre.target
+ConditionCapability=CAP_SYS_MODULE
+
+[Service]
+Type=oneshot
+RemainAfterExit=yes
+ExecStart=/lib/systemd/systemd-modules-load @CMAKE_INSTALL_SYSCONFDIR@/rdma/modules/%I.conf
+TimeoutSec=90s
diff --git a/kernel-boot/rdma-ulp-modules.rules b/kernel-boot/rdma-ulp-modules.rules
new file mode 100644
index 00000000000000..c090700c754b19
--- /dev/null
+++ b/kernel-boot/rdma-ulp-modules.rules
@@ -0,0 +1,11 @@ 
+ACTION=="remove", GOTO="rdma_ulp_modules_end"
+SUBSYSTEM!="infiniband", GOTO="rdma_ulp_modules_end"
+
+# Automatically load general RDMA ULP modules when RDMA hardware is installed
+TAG+="systemd", ENV{SYSTEMD_WANTS}+="rdma-load-modules@rdma.service"
+TAG+="systemd", ENV{ID_RDMA_INFINIBAND}=="1", ENV{SYSTEMD_WANTS}+="rdma-load-modules@infiniband.service"
+TAG+="systemd", ENV{ID_RDMA_IWARP}=="1", ENV{SYSTEMD_WANTS}+="rdma-load-modules@iwarp.service"
+TAG+="systemd", ENV{ID_RDMA_OPA}=="1", ENV{SYSTEMD_WANTS}+="rdma-load-modules@opa.service"
+TAG+="systemd", ENV{ID_RDMA_ROCE}=="1", ENV{SYSTEMD_WANTS}+="rdma-load-modules@roce.service"
+
+LABEL="rdma_ulp_modules_end"
diff --git a/redhat/rdma-core.spec b/redhat/rdma-core.spec
index 2565290e72c04e..612cf5c808091c 100644
--- a/redhat/rdma-core.spec
+++ b/redhat/rdma-core.spec
@@ -320,17 +320,26 @@  rm -rf %{buildroot}/%{_initrddir}/
 %doc %{_docdir}/%{name}-%{version}/README.md
 %doc %{_docdir}/%{name}-%{version}/rxe.md
 %config(noreplace) %{_sysconfdir}/rdma/mlx4.conf
+%config(noreplace) %{_sysconfdir}/rdma/modules/infiniband.conf
+%config(noreplace) %{_sysconfdir}/rdma/modules/iwarp.conf
+%config(noreplace) %{_sysconfdir}/rdma/modules/opa.conf
+%config(noreplace) %{_sysconfdir}/rdma/modules/rdma.conf
+%config(noreplace) %{_sysconfdir}/rdma/modules/roce.conf
 %config(noreplace) %{_sysconfdir}/rdma/rdma.conf
 %config(noreplace) %{_sysconfdir}/rdma/sriov-vfs
 %config(noreplace) %{_sysconfdir}/udev/rules.d/*
 %config(noreplace) %{_sysconfdir}/modprobe.d/mlx4.conf
 %config(noreplace) %{_sysconfdir}/modprobe.d/truescale.conf
 %{_sysconfdir}/sysconfig/network-scripts/*
+%{_unitdir}/rdma-load-modules@.service
 %{_unitdir}/rdma.service
 %dir %{dracutlibdir}/modules.d/05rdma
 %{dracutlibdir}/modules.d/05rdma/module-setup.sh
-%{_udevrulesdir}/98-rdma.rules
 %{_udevrulesdir}/60-rdma-ndd.rules
+%{_udevrulesdir}/75-rdma-description.rules
+%{_udevrulesdir}/90-rdma-hw-modules.rules
+%{_udevrulesdir}/90-rdma-ulp-modules.rules
+%{_udevrulesdir}/98-rdma.rules
 %{sysmodprobedir}/libmlx4.conf
 %{sysmodprobedir}/cxgb3.conf
 %{sysmodprobedir}/cxgb4.conf
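
The module lists added under kernel-boot/modules/ are passed to systemd-modules-load, so they follow the modules-load.d format: one module name per line, with blank lines and lines starting with `#` or `;` ignored. As a rough sketch of what that means for the files in this patch, the Python below extracts the modules that would actually be loaded from the proposed infiniband.conf. The helper name `active_modules` is hypothetical and not part of rdma-core.

```python
def active_modules(text):
    """Return module names from a modules-load.d style list."""
    modules = []
    for line in text.splitlines():
        line = line.strip()
        # Blank lines and comment lines ('#' or ';') do not name a module.
        if not line or line[0] in "#;":
            continue
        modules.append(line)
    return modules


# Contents of kernel-boot/modules/infiniband.conf as proposed in this patch.
INFINIBAND_CONF = """\
# These modules are loaded by the system if any InfiniBand device is installed
# Infiniband over IP netdevice
ib_ipoib

# Access to fabric management SMPs and GMPs from userspace.
ib_umad

# SCSI Remote Protocol target support
# ib_srpt

# ib_ucm provides the obsolete /dev/infiniband/ucm0
# ib_ucm
"""

print(active_modules(INFINIBAND_CONF))
```

Only ib_ipoib and ib_umad are active in this file. ib_srpt and ib_ucm stay commented out until an admin enables them, which is the customization point the commit message describes.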