new file mode 100644
@@ -0,0 +1,146 @@
+Linux RDMA devices and their sysfs entries
+------------------------------------------
+
+1. Background
+--------------
+RDMA networking devices support at least three link or transport layers:
+(a) InfiniBand
+(b) RoCE
+(c) iWarp
+
+These networking devices provide kernel bypass for sending/receiving
+data to/from the network.
+
+These devices are used in various modes, together with other protocols,
+for connection establishment and/or data transfer. For example:
+(a) rdmacm for connection establishment and verbs for data transfer.
+(b) tcp/ip for connection establishment and verbs for data transfer.
+
+Additionally, rdma devices can be shared among multiple net namespaces.
+
+As the stack matures, it is also desirable to support rdma devices that
+belong to a single net namespace.
+
+sysfs entries are heavily used for device discovery, statistics and network
+addresses in the rdma stack.
+
+Therefore, to minimize the impact on backward compatibility for these 3
+transports while providing a forward-looking method, the following sysfs
+isolation approach is taken.
+
+2. Design
+----------
+
+For every rdma ib_device, core code creates an ib_core_device in every
+net namespace to give the appearance that the rdma device is present
+in all net namespaces.
+Each ib_core_device owns the sysfs entries in its net namespace.
+
+All ib_core_devices point to a single owner ib_device through their owner
+pointer.
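+
+As a rough illustration of the ownership relation described above, the
+following userspace C sketch models it with simplified stand-in structures.
+It is not the kernel code: the field layout, the embedded coredev member,
+and the helpers init_device() and add_compat_dev() are assumptions made
+for this example only.
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the kernel definitions. */
struct net {
	int id;
};

struct ib_device;

struct ib_core_device {
	struct net *net;         /* namespace that shows this entry */
	struct ib_device *owner; /* the one real device behind it */
};

struct ib_device {
	const char *name;
	struct ib_core_device coredev; /* entry embedded for init_net */
};

/* Set up the embedded entry so that it points back at its own device. */
static void init_device(struct ib_device *dev, struct net *init_net)
{
	assert(dev && init_net);
	dev->coredev.net = init_net;
	dev->coredev.owner = dev;
}

/* Create a compat entry that exposes 'dev' in namespace 'net'. */
static struct ib_core_device *add_compat_dev(struct ib_device *dev,
					     struct net *net)
{
	struct ib_core_device *cdev = malloc(sizeof(*cdev));

	if (!cdev)
		return NULL;
	cdev->net = net;
	cdev->owner = dev;
	return cdev;
}
```

+
+In this sketch a caller would run init_device() once per device and
+add_compat_dev() once for every other net namespace; every entry created
+this way has its owner set to the same device.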
+
+2.1 Shared rdma ib_device view in different net namespaces
+-----------------------------------------------------------
+
+ ib_core_device (net_ns_1)
+ +--------------+
+ |              |
+ |    device    |
+ | +----------+ |
+ | |          | |
+ | |          | |
+ | |          | |
+ | +----------+ |                         (init_net)
+ | *net         |                         ib_device
+ | *owner-------------------------+------>+--------------------+<--+
+ +--------------+                 |       |                    |   |
+                                  |       |  ib_core_device    |   |
+                                  |       |  +--------------+  |   |
+                                  |       |  |              |  |   |
+                                  |       |  |    device    |  |   |
+                                  |       |  | +----------+ |  |   |
+ ib_core_device (net_ns_2)        |       |  | |          | |  |   |
+ +--------------+                 |       |  | |          | |  |   |
+ |              |                 |       |  | |          | |  |   |
+ |    device    |                 |       |  | +----------+ |  |   |
+ | +----------+ |                 |       |  | *net         |  |   |
+ | |          | |                 |       |  | *owner--------------+
+ | |          | |                 |       |  +--------------+  |
+ | |          | |                 |       +--------------------+
+ | +----------+ |                 |
+ | *net         |                 |
+ | *owner-------------------------+
+ +--------------+
+
+2.2 rdma ib_device bound to a net namespace (in future)
+--------------------------------------------------------
+
+In this mode, when an rdma device is bound to a net namespace, all compat
+sysfs entries will be removed. The sysfs entries will reside only in the
+single net namespace to which the device is bound.
+This gives a one-to-one mapping and isolates each device to its owning
+net namespace.
+
+(net_ns_1)
+ib_device
++--------------------+
+|                    |
+|                    |
+|  ib_core_device    |
+|  +--------------+  |
+|  |              |  |
+|  |    device    |  |
+|  | +----------+ |  |
+|  | |          | |  |
+|  | |          | |  |
+|  | |          | |  |
+|  | +----------+ |  |
+|  |              |  |
+|  | *net         |  |
+|  | *owner       |  |
+|  +--------------+  |
++--------------------+
+
+2.3 Locking scheme
+-------------------
+Three locks provide synchronization between five operations:
+(a) device addition using ib_register_device()
+(b) device removal using ib_unregister_device()
+(c) net namespace addition using the _init_net() notifier
+(d) net namespace removal using the _exit_net() notifier
+(e) device renaming using a netlink command
+
+Any of the above operations can happen in parallel.
+A few interesting combinations to consider are:
+1. init_net() and register_device() both trying to add compat devices
+2. exit_net() and unregister_device() both trying to remove compat devices
+3. renaming compat devices while init_net() or exit_net() is running
+
+Net namespaces are tracked in an xarray, each identified by a unique id.
+Operations on this xarray are protected by rdma_net_rwsem.
+The same id is used when adding the compat device for a given rdma device.
+
+The compat devices of a given ib device are maintained in a per-device
+xarray. This xarray is used because two paths, the net ns notifiers and the
+device life cycle routines, both attempt to add compat devices. Such work
+is protected by the per-device compat_rw_mutex.
+
+The lock sequence below ensures that whichever path sees the device
+adds/removes the compat devices for the given net namespace(s).
+
+        cpu-0                          cpu-1
+        -----                          -----
+init_net()/exit_net()          reg_dev()/unreg_dev()
+
+       lock_N                         lock_D
+        [..]                           [..]
+      unlock_N                         [..]
+                                     unlock_D
+
+                                      lock_N
+                                       [..]
+       lock_D                        unlock_N
+        [..]
+      unlock_D
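+
+The sequence above can be approximated in userspace with the C sketch
+below. It only illustrates the ordering and makes several assumptions:
+plain pthread mutexes replace the kernel locks, fixed-size arrays replace
+the xarrays, a single compat_lock stands in for the per-device compat
+mutex, and the names sketch_init_net(), sketch_reg_dev() and add_compat()
+are hypothetical. Each path publishes its own object under one lock,
+drops it, and only then takes the other lock to walk the other list, so
+the two locks are never held together.
+

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_NETS 4
#define MAX_DEVS 4

/* Stand-ins for the kernel locks; names follow the diagram above. */
static pthread_mutex_t lock_N = PTHREAD_MUTEX_INITIALIZER; /* guards nets[] */
static pthread_mutex_t lock_D = PTHREAD_MUTEX_INITIALIZER; /* guards devs[] */
/* Assumption: one mutex stands in for the per-device compat mutex. */
static pthread_mutex_t compat_lock = PTHREAD_MUTEX_INITIALIZER;

static bool nets[MAX_NETS];             /* known net namespaces, by id */
static bool devs[MAX_DEVS];             /* registered devices, by index */
static bool compat[MAX_DEVS][MAX_NETS]; /* compat device present? */

/* Idempotent: either path may get here first for a (dev, net) pair. */
static void add_compat(int dev, int net)
{
	assert(dev >= 0 && dev < MAX_DEVS && net >= 0 && net < MAX_NETS);
	pthread_mutex_lock(&compat_lock);
	if (!compat[dev][net])
		compat[dev][net] = true;
	pthread_mutex_unlock(&compat_lock);
}

/* cpu-0 path: publish the namespace, then walk the devices. */
static void sketch_init_net(int net)
{
	pthread_mutex_lock(&lock_N);
	nets[net] = true;
	pthread_mutex_unlock(&lock_N);

	pthread_mutex_lock(&lock_D);
	for (int d = 0; d < MAX_DEVS; d++)
		if (devs[d])
			add_compat(d, net);
	pthread_mutex_unlock(&lock_D);
}

/* cpu-1 path: publish the device, then walk the namespaces. */
static void sketch_reg_dev(int dev)
{
	pthread_mutex_lock(&lock_D);
	devs[dev] = true;
	pthread_mutex_unlock(&lock_D);

	pthread_mutex_lock(&lock_N);
	for (int n = 0; n < MAX_NETS; n++)
		if (nets[n])
			add_compat(dev, n);
	pthread_mutex_unlock(&lock_N);
}
```

+
+In this model, if the namespace walk misses a device, that device was
+published later, so its own walk runs after the namespace was published
+and adds the compat device instead. Removal is not modelled here.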