From patchwork Wed Feb 13 17:23:09 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 10810463
X-Patchwork-Delegate: jgg@ziepe.ca
From: Leon Romanovsky
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Parav Pandit
Subject: [PATCH rdma-next 7/8] RDMA/core: Add Documentation for ib_core_device
Date: Wed, 13 Feb 2019 19:23:09 +0200
Message-Id: <20190213172310.1681-8-leon@kernel.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190213172310.1681-1-leon@kernel.org>
References: <20190213172310.1681-1-leon@kernel.org>
X-Mailing-List: linux-rdma@vger.kernel.org

From: Parav Pandit

Describe ib_core_device, ib_device association and their existence
in net namespaces for backward compatibility, and locking scheme.

Signed-off-by: Parav Pandit
Signed-off-by: Leon Romanovsky
---
 Documentation/infiniband/core_devices.txt | 146 ++++++++++++++++++++++
 1 file changed, 146 insertions(+)
 create mode 100644 Documentation/infiniband/core_devices.txt

diff --git a/Documentation/infiniband/core_devices.txt b/Documentation/infiniband/core_devices.txt
new file mode 100644
index 000000000000..34f7d5cea54f
--- /dev/null
+++ b/Documentation/infiniband/core_devices.txt
@@ -0,0 +1,146 @@
+Linux RDMA devices and their sysfs entries
+------------------------------------------
+
+1. Background
+--------------
+RDMA networking devices support at least 3 link or transport layers:
+(a) InfiniBand
+(b) RoCE
+(c) iWARP
+
+These networking devices provide kernel bypass for sending/receiving
+data to/from the network.
+
+There are various modes in which these devices are used along with
+other protocols for connection establishment and/or for data transfer,
+such as:
+(a) rdmacm for connection establishment and verbs for data transfer.
+(b) tcp/ip for connection establishment and verbs for data transfer.
+
+Additionally, rdma devices can be shared among multiple net namespaces.
+
+It is also desirable to have per net namespace rdma devices as the
+stack matures.
+
+sysfs entries are heavily used for device discovery, statistics and network
+addresses in the rdma stack.
+
+Therefore, to have minimal impact on backward compatibility for these 3
+transports and to provide a forward-looking method, the following sysfs
+isolation approach is taken.
+
+2. Design
+----------
+
+For every rdma ib_device, core code creates an ib_core_device in every
+net namespace to give the appearance that the rdma device is present
+in all net namespaces.
+Each ib_core_device owns the sysfs entries in its net namespace.
+
+All ib_core_devices point to one owner ib_device through the owner pointer.
+
+2.1 Shared rdma ib_device view in different net namespaces
+-----------------------------------------------------------
+
+  ib_core_device (net_ns_1)
+  +--------------+
+  |              |
+  |    device    |
+  | +----------+ |
+  | |          | |
+  | |          | |
+  | |          | |
+  | +----------+ |                              (init_net)
+  | *net         |                              ib_device
+  | *owner------------------------+------>+--------------------+<--+
+  +--------------+                |       |                    |   |
+                                  |       |   ib_core_device   |   |
+                                  |       |  +--------------+  |   |
+                                  |       |  |              |  |   |
+                                  |       |  |    device    |  |   |
+                                  |       |  | +----------+ |  |   |
+  ib_core_device (net_ns_2)       |       |  | |          | |  |   |
+  +--------------+                |       |  | |          | |  |   |
+  |              |                |       |  | |          | |  |   |
+  |    device    |                |       |  | +----------+ |  |   |
+  | +----------+ |                |       |  | *net         |  |   |
+  | |          | |                |       |  | *owner--------------+
+  | |          | |                |       |  +--------------+  |
+  | |          | |                |       +--------------------+
+  | +----------+ |                |
+  | *net         |                |
+  | *owner------------------------+
+  +--------------+
+
+2.2 rdma ib_device bound to a net namespace (in future)
+--------------------------------------------------------
+
+In this mode, when an rdma device is bound to a net namespace, all compat
+sysfs entries will be removed. sysfs entries will then reside only in the
+net namespace to which the device is bound.
+This gives a one-to-one mapping and isolates devices
+to their owning net namespace.
+
+(net_ns_1)
+ib_device
++--------------------+
+|                    |
+|                    |
+|   ib_core_device   |
+|  +--------------+  |
+|  |              |  |
+|  |    device    |  |
+|  | +----------+ |  |
+|  | |          | |  |
+|  | |          | |  |
+|  | |          | |  |
+|  | +----------+ |  |
+|  |              |  |
+|  | *net         |  |
+|  | *owner       |  |
+|  +--------------+  |
++--------------------+
+
+2.3 Locking scheme
+--------------------------------------------------------
+There are three locks involved to provide synchronization between five
+operations.
+These five operations are:
+(a) device addition using ib_register_device()
+(b) device removal using ib_unregister_device()
+(c) net namespace addition using _init_net() notifier
+(d) net namespace removal using _exit_net() notifier
+(e) device renaming using a netlink command
+
+Each of the above operations can happen in parallel.
+A few interesting combinations to consider are:
+1. init_net() and register_device() trying to add compat devices
+2. exit_net() and unregister_device() trying to remove compat devices
+3. renaming compat devices while doing init_net() or exit_net().
+
+Net namespaces are identified using a unique id in an xarray.
+Operations on this xarray are protected by rdma_net_rwsem.
+The same id is used when adding a compat device for a given rdma device.
+
+Compat devices of a given ib device are maintained in a per device xarray.
+This xarray is used because two paths, net ns notifiers and device life cycle
+routines, both attempt to add compat devices. Such work is protected by the per
+device compat_rw_mutex.
+
+The lock sequence below ensures that whoever sees the device adds/removes compat
+devices for the given net namespace(s).
+
+    cpu-0                        cpu-1
+    -----                        -----
+init_net()/exit_net()        reg_dev()/unreg_dev()
+
+    lock_N                       lock_D
+     [..]                         [..]
+    unlock_N                      [..]
+                                 unlock_D
+
+                                 lock_N
+                                  [..]
+    lock_D                       unlock_N
+     [..]
+    unlock_D