
[v5,00/25] RTRS (former IBTRS) rdma transport library and the corresponding RNBD (former IBNBD) rdma network block device

Message ID 20191220155109.8959-1-jinpuwang@gmail.com (mailing list archive)

Message

Jinpu Wang Dec. 20, 2019, 3:50 p.m. UTC
Hi all,

here is v5 of the RTRS (formerly IBTRS) RDMA transport library and the
corresponding RNBD (formerly IBNBD) RDMA network block device.

The main changes are the following:
1. Fix the security problem pointed out by Jason.
2. Implement code-style/readability/API/etc. suggestions by Bart van Assche.
3. Rename IBTRS and IBNBD to RTRS and RNBD, respectively.
4. Remove fileio mode support from rnbd-srv.

The main functional change is a fix for the security problem pointed out by
Jason and discussed both on the mailing list and during the last LPC RDMA MC 2019.
On the server side, RTRS now invalidates each RDMA buffer before handing it
over to the RNBD server and, in turn, to the block layer. A new rkey is generated
and registered for the buffer after it returns from the block layer and the RNBD
server. The new rkey is sent back to the client along with the IO result.
This procedure is the default behaviour of the driver. The invalidation and
registration on each IO causes a performance drop of up to 20%. A user of the
driver may choose to load the modules with this mechanism switched off
(always_invalidate=N) if they understand and can accept the risk of a malicious
client being able to corrupt the memory of a server it is connected to. This might
be a reasonable option in a scenario where all the clients and all the servers
are located within a secure datacenter.
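
To illustrate the idea, here is a minimal sketch of the per-IO rkey lifecycle
when always_invalidate=Y. This is not the actual RTRS code: only the ib_* verbs
are real kernel APIs, while struct buf_ctx and the buf_* helpers below are
hypothetical.

/*
 * Simplified sketch of the per-IO rkey lifecycle with always_invalidate=Y.
 * Only the ib_* calls are real kernel APIs; everything else is hypothetical.
 */
#include <linux/module.h>
#include <linux/scatterlist.h>
#include <linux/sizes.h>
#include <rdma/ib_verbs.h>

static bool always_invalidate = true;
module_param(always_invalidate, bool, 0444);
MODULE_PARM_DESC(always_invalidate,
		 "Invalidate and re-register each RDMA buffer per IO (default: Y)");

struct buf_ctx {			/* hypothetical per-buffer context */
	struct ib_qp *qp;
	struct ib_mr *mr;
};

/* 1. Invalidate the rkey before the buffer is handed to the block layer. */
static int buf_invalidate(struct buf_ctx *ctx)
{
	struct ib_send_wr wr = {
		.opcode		    = IB_WR_LOCAL_INV,
		.send_flags	    = IB_SEND_SIGNALED,
		.ex.invalidate_rkey = ctx->mr->rkey,
	};

	return ib_post_send(ctx->qp, &wr, NULL);
}

/* 2. After the IO has completed, register a fresh rkey for the buffer. */
static int buf_reregister(struct buf_ctx *ctx, struct scatterlist *sg,
			  int sg_cnt)
{
	struct ib_reg_wr rwr = {};
	int nr;

	nr = ib_map_mr_sg(ctx->mr, sg, sg_cnt, NULL, SZ_4K);
	if (nr < sg_cnt)
		return nr < 0 ? nr : -EINVAL;

	/* The new rkey is what gets sent back to the client with the IO result. */
	ib_update_fast_reg_key(ctx->mr, ib_inc_rkey(ctx->mr->rkey));

	rwr.wr.opcode = IB_WR_REG_MR;
	rwr.mr	      = ctx->mr;
	rwr.key	      = ctx->mr->rkey;
	rwr.access    = IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;

	return ib_post_send(ctx->qp, &rwr.wr, NULL);
}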

Huge thanks to Bart van Assche for the very detailed review of both RNBD and
RTRS. His reviews included suggestions for style fixes, better readability and
documentation, code simplifications, and the elimination of deprecated APIs,
among many other things.

The transport library and the network block device using it have been renamed to
RTRS and RNBD, respectively, to reflect the fact that they are based on the RDMA
subsystem and are not bound to InfiniBand only.

Fileio mode support in rnbd-server is not very efficient, as pointed out by Bart,
and a loop device can be used in between if there is a need, hence the fileio
mode support has been removed.


 Introduction
 -------------

RTRS (RDMA Transport) is a reliable high-speed transport library
which allows for establishing a connection between client and server
machines via RDMA. It is based on RDMA-CM, so it is also expected to support
RoCE and iWARP, although we have mainly tested it in an InfiniBand environment.
It is optimized to transfer (read/write) IO blocks in the sense that it follows
the BIO semantics of providing the possibility to either write data from a
scatter-gather list to the remote side or to request ("read") a data transfer
from the remote side into a given set of buffers.
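
As an illustration of these read/write semantics only, a client-facing
interface could be sketched as below. This is a hypothetical sketch, not the
real public interface (which is the rtrs.h header added by this series); all
names here are made up.

/* Hypothetical interface sketch -- the real API is in
 * drivers/infiniband/ulp/rtrs/rtrs.h of this series. */
#include <linux/scatterlist.h>
#include <linux/types.h>

struct xfer_session;			/* opaque, established via RDMA-CM */

typedef void (*xfer_done_t)(void *priv, int err);

/*
 * WRITE: transfer the data described by @sg from the local scatter-gather
 * list to the remote side, together with a small user message (e.g. an
 * RNBD IO header); @done fires once the server has confirmed the IO.
 */
int xfer_write(struct xfer_session *sess, const void *usr_msg, size_t msg_len,
	       struct scatterlist *sg, unsigned int sg_cnt,
	       xfer_done_t done, void *priv);

/*
 * READ: request a data transfer from the remote side into the buffers
 * described by @sg; @done fires once the data and the IO result arrive.
 */
int xfer_read(struct xfer_session *sess, const void *usr_msg, size_t msg_len,
	      struct scatterlist *sg, unsigned int sg_cnt,
	      xfer_done_t done, void *priv);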

RTRS is multipath capable and provides I/O fail-over and load-balancing
functionality: in RTRS terminology, an RTRS path is a set of RDMA connections,
and a particular path is selected according to the load-balancing policy.
RTRS can also be used by other components and is not bound to RNBD.

RNBD (RDMA Network Block Device) is a pair of kernel modules
(client and server) that allow for remote access of a block device on
the server over the RTRS protocol. After being mapped, the remote block
devices can be accessed on the client side as local block devices.
Internally RNBD uses RTRS as an RDMA transport library.

Commits for the kernel can be found here:
   https://github.com/ionos-enterprise/ibnbd/commits/linux-5.5-rc2-ibnbd-v5
The out-of-tree modules are here:
   https://github.com/ionos-enterprise/ibnbd

LPC 2019 presentation:
  https://linuxplumbersconf.org/event/4/contributions/367/attachments/331/555/LPC_2019_RMDA_MC_IBNBD_IBTRS_Upstreaming.pdf

Performance results for the v5.5-rc1 kernel
  link: https://github.com/ionos-enterprise/ibnbd/tree/develop/performance/v5-v5.5-rc1

Jack Wang (25):
  sysfs: export sysfs_remove_file_self()
  rtrs: public interface header to establish RDMA connections
  rtrs: private headers with rtrs protocol structs and helpers
  rtrs: core: lib functions shared between client and server modules
  rtrs: client: private header with client structs and functions
  rtrs: client: main functionality
  rtrs: client: statistics functions
  rtrs: client: sysfs interface functions
  rtrs: server: private header with server structs and functions
  rtrs: server: main functionality
  rtrs: server: statistics functions
  rtrs: server: sysfs interface functions
  rtrs: include client and server modules into kernel compilation
  rtrs: a bit of documentation
  rnbd: private headers with rnbd protocol structs and helpers
  rnbd: client: private header with client structs and functions
  rnbd: client: main functionality
  rnbd: client: sysfs interface functions
  rnbd: server: private header with server structs and functions
  rnbd: server: main functionality
  rnbd: server: functionality for IO submission to file or block dev
  rnbd: server: sysfs interface functions
  rnbd: include client and server modules into kernel compilation
  rnbd: a bit of documentation
  MAINTAINERS: Add maintainers for RNBD/RTRS modules

 Documentation/ABI/testing/sysfs-block-rnbd    |   51 +
 .../ABI/testing/sysfs-class-rnbd-client       |  117 +
 .../ABI/testing/sysfs-class-rnbd-server       |   57 +
 .../ABI/testing/sysfs-class-rtrs-client       |  190 ++
 .../ABI/testing/sysfs-class-rtrs-server       |   81 +
 MAINTAINERS                                   |   14 +
 drivers/block/Kconfig                         |    2 +
 drivers/block/Makefile                        |    1 +
 drivers/block/rnbd/Kconfig                    |   28 +
 drivers/block/rnbd/Makefile                   |   17 +
 drivers/block/rnbd/README                     |   80 +
 drivers/block/rnbd/rnbd-clt-sysfs.c           |  659 ++++
 drivers/block/rnbd/rnbd-clt.c                 | 1761 ++++++++++
 drivers/block/rnbd/rnbd-clt.h                 |  169 +
 drivers/block/rnbd/rnbd-common.c              |   43 +
 drivers/block/rnbd/rnbd-log.h                 |   61 +
 drivers/block/rnbd/rnbd-proto.h               |  325 ++
 drivers/block/rnbd/rnbd-srv-dev.c             |  162 +
 drivers/block/rnbd/rnbd-srv-dev.h             |  130 +
 drivers/block/rnbd/rnbd-srv-sysfs.c           |  234 ++
 drivers/block/rnbd/rnbd-srv.c                 |  882 +++++
 drivers/block/rnbd/rnbd-srv.h                 |   99 +
 drivers/infiniband/Kconfig                    |    1 +
 drivers/infiniband/ulp/Makefile               |    1 +
 drivers/infiniband/ulp/rtrs/Kconfig           |   27 +
 drivers/infiniband/ulp/rtrs/Makefile          |   17 +
 drivers/infiniband/ulp/rtrs/README            |  137 +
 drivers/infiniband/ulp/rtrs/rtrs-clt-stats.c  |  453 +++
 drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c  |  519 +++
 drivers/infiniband/ulp/rtrs/rtrs-clt.c        | 2952 +++++++++++++++++
 drivers/infiniband/ulp/rtrs/rtrs-clt.h        |  314 ++
 drivers/infiniband/ulp/rtrs/rtrs-log.h        |   50 +
 drivers/infiniband/ulp/rtrs/rtrs-pri.h        |  426 +++
 drivers/infiniband/ulp/rtrs/rtrs-srv-stats.c  |  109 +
 drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c  |  315 ++
 drivers/infiniband/ulp/rtrs/rtrs-srv.c        | 2187 ++++++++++++
 drivers/infiniband/ulp/rtrs/rtrs-srv.h        |  159 +
 drivers/infiniband/ulp/rtrs/rtrs.c            |  646 ++++
 drivers/infiniband/ulp/rtrs/rtrs.h            |  334 ++
 fs/sysfs/file.c                               |    1 +
 40 files changed, 13811 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-block-rnbd
 create mode 100644 Documentation/ABI/testing/sysfs-class-rnbd-client
 create mode 100644 Documentation/ABI/testing/sysfs-class-rnbd-server
 create mode 100644 Documentation/ABI/testing/sysfs-class-rtrs-client
 create mode 100644 Documentation/ABI/testing/sysfs-class-rtrs-server
 create mode 100644 drivers/block/rnbd/Kconfig
 create mode 100644 drivers/block/rnbd/Makefile
 create mode 100644 drivers/block/rnbd/README
 create mode 100644 drivers/block/rnbd/rnbd-clt-sysfs.c
 create mode 100644 drivers/block/rnbd/rnbd-clt.c
 create mode 100644 drivers/block/rnbd/rnbd-clt.h
 create mode 100644 drivers/block/rnbd/rnbd-common.c
 create mode 100644 drivers/block/rnbd/rnbd-log.h
 create mode 100644 drivers/block/rnbd/rnbd-proto.h
 create mode 100644 drivers/block/rnbd/rnbd-srv-dev.c
 create mode 100644 drivers/block/rnbd/rnbd-srv-dev.h
 create mode 100644 drivers/block/rnbd/rnbd-srv-sysfs.c
 create mode 100644 drivers/block/rnbd/rnbd-srv.c
 create mode 100644 drivers/block/rnbd/rnbd-srv.h
 create mode 100644 drivers/infiniband/ulp/rtrs/Kconfig
 create mode 100644 drivers/infiniband/ulp/rtrs/Makefile
 create mode 100644 drivers/infiniband/ulp/rtrs/README
 create mode 100644 drivers/infiniband/ulp/rtrs/rtrs-clt-stats.c
 create mode 100644 drivers/infiniband/ulp/rtrs/rtrs-clt-sysfs.c
 create mode 100644 drivers/infiniband/ulp/rtrs/rtrs-clt.c
 create mode 100644 drivers/infiniband/ulp/rtrs/rtrs-clt.h
 create mode 100644 drivers/infiniband/ulp/rtrs/rtrs-log.h
 create mode 100644 drivers/infiniband/ulp/rtrs/rtrs-pri.h
 create mode 100644 drivers/infiniband/ulp/rtrs/rtrs-srv-stats.c
 create mode 100644 drivers/infiniband/ulp/rtrs/rtrs-srv-sysfs.c
 create mode 100644 drivers/infiniband/ulp/rtrs/rtrs-srv.c
 create mode 100644 drivers/infiniband/ulp/rtrs/rtrs-srv.h
 create mode 100644 drivers/infiniband/ulp/rtrs/rtrs.c
 create mode 100644 drivers/infiniband/ulp/rtrs/rtrs.h

Comments

Leon Romanovsky Dec. 21, 2019, 10:17 a.m. UTC | #1
On Fri, Dec 20, 2019 at 04:50:44PM +0100, Jack Wang wrote:
> Hi all,
>
> here is V5 of the RTRS (former IBTRS) rdma transport library and the
> corresponding RNBD (former IBNBD) rdma network block device.
>
> Main changes are the following:
> 1. Fix the security problem pointed out by Jason
> 2. Implement code-style/readability/API/etc suggestions by Bart van Assche
> 3. Rename IBTRS and IBNBD to RTRS and RNBD accordingly
> 4. Fileio mode support in rnbd-srv has been removed.
>
> The main functional change is a fix for the security problem pointed out by
> Jason and discussed both on the mailing list and during the last LPC RDMA MC 2019.
> On the server side we now invalidate in RTRS each rdma buffer before we hand it
> over to RNBD server and in turn to the block layer. A new rkey is generated and
> registered for the buffer after it returns back from the block layer and RNBD
> server. The new rkey is sent back to the client along with the IO result.
> The procedure is the default behaviour of the driver. This invalidation and
> registration on each IO causes performance drop of up to 20%. A user of the
> driver may choose to load the modules with this mechanism switched off
> (always_invalidate=N), if he understands and can take the risk of a malicious
> client being able to corrupt memory of a server it is connected to. This might
> be a reasonable option in a scenario where all the clients and all the servers
> are located within a secure datacenter.
>
> Huge thanks to Bart van Assche for the very detailed review of both RNBD and
> RTRS. These included suggestions for style fixes, better readability and
> documentation, code simplifications, eliminating usage of deprecated APIs,
> too many to name.
>
> The transport library and the network block device using it have been renamed to
> RTRS and RNBD accordingly in order to reflect the fact that they are based on
> the rdma subsystem and not bound to InfiniBand only.
>
> Fileio mode support in rnbd-server is not so efficent as pointed out by Bart,
> and we can use loop device in between if there is need, hence we just
> removed the fileio mode support.

Thanks for pushing the code forward.
Jason Gunthorpe Jan. 2, 2020, 6:18 p.m. UTC | #2
On Fri, Dec 20, 2019 at 04:50:44PM +0100, Jack Wang wrote:
> Hi all,
> 
> here is V5 of the RTRS (former IBTRS) rdma transport library and the 
> corresponding RNBD (former IBNBD) rdma network block device.
> 
> Main changes are the following:
> 1. Fix the security problem pointed out by Jason
> 2. Implement code-style/readability/API/etc suggestions by Bart van Assche
> 3. Rename IBTRS and IBNBD to RTRS and RNBD accordingly
> 4. Fileio mode support in rnbd-srv has been removed.
> 
> The main functional change is a fix for the security problem pointed out by 
> Jason and discussed both on the mailing list and during the last LPC RDMA MC 2019.
> On the server side we now invalidate in RTRS each rdma buffer before we hand it
> over to RNBD server and in turn to the block layer. A new rkey is generated and
> registered for the buffer after it returns back from the block layer and RNBD
> server. The new rkey is sent back to the client along with the IO result.
> The procedure is the default behaviour of the driver. This invalidation and
> registration on each IO causes performance drop of up to 20%. A user of the
> driver may choose to load the modules with this mechanism switched
> off

So, how does this compare now to nvme over fabrics?

I recall there were questions why we needed yet another RDMA block
transport?

Jason
Jinpu Wang Jan. 3, 2020, 12:39 p.m. UTC | #3
On Thu, Jan 2, 2020 at 7:19 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Dec 20, 2019 at 04:50:44PM +0100, Jack Wang wrote:
> > Hi all,
> >
> > here is V5 of the RTRS (former IBTRS) rdma transport library and the
> > corresponding RNBD (former IBNBD) rdma network block device.
> >
> > Main changes are the following:
> > 1. Fix the security problem pointed out by Jason
> > 2. Implement code-style/readability/API/etc suggestions by Bart van Assche
> > 3. Rename IBTRS and IBNBD to RTRS and RNBD accordingly
> > 4. Fileio mode support in rnbd-srv has been removed.
> >
> > The main functional change is a fix for the security problem pointed out by
> > Jason and discussed both on the mailing list and during the last LPC RDMA MC 2019.
> > On the server side we now invalidate in RTRS each rdma buffer before we hand it
> > over to RNBD server and in turn to the block layer. A new rkey is generated and
> > registered for the buffer after it returns back from the block layer and RNBD
> > server. The new rkey is sent back to the client along with the IO result.
> > The procedure is the default behaviour of the driver. This invalidation and
> > registration on each IO causes performance drop of up to 20%. A user of the
> > driver may choose to load the modules with this mechanism switched
> > off
>
> So, how does this compare now to nvme over fabrics?
>
> I recall there were questiosn why we needed yet another RDMA block
> transport?
>
> Jason
Performance results for the v5.5-rc1 kernel are here:
  link: https://github.com/ionos-enterprise/ibnbd/tree/develop/performance/v5-v5.5-rc1

For some workloads RNBD is faster, for some workloads NVMeoF is faster.
Bart Van Assche Jan. 3, 2020, 4:28 p.m. UTC | #4
On 1/3/20 4:39 AM, Jinpu Wang wrote:
> Performance results for the v5.5-rc1 kernel are here:
>    link: https://github.com/ionos-enterprise/ibnbd/tree/develop/performance/v5-v5.5-rc1
> 
> Some workloads RNBD are faster, some workloads NVMeoF are faster.

Thank you for having shared these graphs.

Do the graphs in RNBD-SinglePath.pdf show that NVMeOF achieves similar 
or higher IOPS, higher bandwidth and lower latency than RNBD for 
workloads with a block size of 4 KB and also for mixed workloads with 
less than 20 disks, whether or not invalidation is enabled for RNBD?

Is it already clear why NVMeOF performance drops if the number of disks 
is above 25? Is that perhaps caused by contention on the block layer tag 
allocator because multiple NVMe namespaces share a tag set? Can that 
contention be avoided by increasing the NVMeOF queue depth further?

Thanks,

Bart.
Jinpu Wang Jan. 6, 2020, 5:07 p.m. UTC | #5
On Fri, Jan 3, 2020 at 5:29 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 1/3/20 4:39 AM, Jinpu Wang wrote:
> > Performance results for the v5.5-rc1 kernel are here:
> >    link: https://github.com/ionos-enterprise/ibnbd/tree/develop/performance/v5-v5.5-rc1
> >
> > Some workloads RNBD are faster, some workloads NVMeoF are faster.
>
> Thank you for having shared these graphs.
>
> Do the graphs in RNBD-SinglePath.pdf show that NVMeOF achieves similar
> or higher IOPS, higher bandwidth and lower latency than RNBD for
> workloads with a block size of 4 KB and also for mixed workloads with
> less than 20 disks, whether or not invalidation is enabled for RNBD?
Hi Bart,

Yes, that's the result on one pair of servers with ConnectX-4 HCAs. I did
another test on two other servers with ConnectX-5 HCAs, and the results are
quite different. We will double-check the performance results also on the old
machines and will share new results later.


>
> Is it already clear why NVMeOF performance drops if the number of disks
> is above 25? Is that perhaps caused by contention on the block layer tag
> allocator because multiple NVMe namespaces share a tag set? Can that
> contention be avoided by increasing the NVMeOF queue depth further?
Not yet, will check.
>
> Thanks,
>
> Bart.
>
>
Thanks
Jinpu Wang Jan. 7, 2020, 10:56 a.m. UTC | #6
On Mon, Jan 6, 2020 at 6:07 PM Jinpu Wang <jinpu.wang@cloud.ionos.com> wrote:
>
> On Fri, Jan 3, 2020 at 5:29 PM Bart Van Assche <bvanassche@acm.org> wrote:
> >
> > On 1/3/20 4:39 AM, Jinpu Wang wrote:
> > > Performance results for the v5.5-rc1 kernel are here:
> > >    link: https://github.com/ionos-enterprise/ibnbd/tree/develop/performance/v5-v5.5-rc1
> > >
> > > Some workloads RNBD are faster, some workloads NVMeoF are faster.
> >
> > Thank you for having shared these graphs.
> >
> > Do the graphs in RNBD-SinglePath.pdf show that NVMeOF achieves similar
> > or higher IOPS, higher bandwidth and lower latency than RNBD for
> > workloads with a block size of 4 KB and also for mixed workloads with
> > less than 20 disks, whether or not invalidation is enabled for RNBD?
> Hi Bart,
>
> Yes, that's the result on one pair of Server with Connect X4 HCA, I
> did another test on another
> 2 servers with Connect X5 HCA, results are quite different, we will
> double-check the
> performance results also on old machines, will share new results later.
>
Here are the results with a ConnectX-5 HCA (MT4119) + Intel(R) Xeon(R) Gold
6130 CPU @ 2.10GHz. Sorry, no graphs for now; I will prepare them for the
next round.

 disks    4k nvme dual  4k nvme single    4k rnbd dual  4k rnbd single  4k rnbd-inv dual  4k rnbd-inv single
    x1   251637.436256   254312.068793   270886.311369   260934.306569     218632.336766       190800.519948
    x2   460894.610539   463925.907409   496318.068193   466374.862514      418960.30397       372848.815118
    x3   603263.673633    605004.49955   675073.892611   614552.144786     586223.077692       524221.977802
    x4   731648.935106   733174.482552   850245.575442   743062.493751     744380.361964       656861.813819
    x5   827732.326767   827444.855514  1026939.306069   840515.548445     897801.719828       762707.329267
    x6   876705.329467     873963.0037  1142399.960004    876974.70253    1037773.522648       834073.892611
    x7   893808.719128   893268.073193  1239282.471753   892728.027197    1135570.742926       871336.966303
    x8   906589.741026   905938.006199  1287178.964207   906189.381062      1225040.9959       895292.070793
    x9    912048.09519   912400.259974  1386211.878812   913885.311469    1302472.964884       910176.282372
   x10   915566.243376   915602.739726   1442959.70403   916288.871113    1350296.325879        914966.40336
   x11   917116.188381   916905.809419  1418574.942506   916264.373563    1370438.698083       915255.874413
   x12   915852.814719   917710.128987  1423534.546545   916348.386452    1352357.364264       914966.656684
   x13    919042.69573   918819.536093  1429697.830217   917631.036896    1378083.824558       916519.161677
   x14    920000.49995    920031.59684  1443317.268273   917562.843716     1395023.56936       918935.706429
   x15   920160.883912   920367.363264  1445306.425863   918278.472153    1440776.944611       916352.265921
   x16   920652.869426   920673.832617  1454705.229477   917902.948526      1455708.2501       918198.001998
   x17   916892.310769   916883.623275  1491071.841948   918936.706329    1436507.428457       917182.934132
   x18   917247.775222   917762.523748  1612129.058036   918546.835949    1488716.583417       919521.095781
   x19   920084.791521   920349.930014   1615690.87821   915371.496958     1406747.32954       918347.248577
   x20   922339.232154   922208.058388  1591415.958404   917922.631526    1343806.744488       918903.393214
   x21   923141.771646   923297.040592  1581547.169811   919371.025795    1342568.406347       919157.752944
   x22   923063.787243   924072.385523  1574972.846162   919173.713143    1340318.639673       920577.995403
   x23   923549.490102   924643.471306  1573597.440256   918705.060385    1333047.771337       917469.027431
   x24   925584.483103   925224.955009  1578143.485651    921744.15117    1321494.708466       920001.498352
   x25   926165.366927   926842.031594  1579288.271173   921845.392764    1319902.568202       920448.830702
   x26   926852.729454   927382.123575  1585351.318945    922669.59912    1325670.791338       919137.796847
   x27   928196.260748   928093.981204  1581427.557244   921379.155436    1325009.972078       919017.550858
   x28   929330.433913   929606.778644  1581726.527347   924325.834833    1331721.074174       919557.761373
   x29   929885.522895   929876.924615  1578317.436513   922977.240966     1333612.86783       921094.386736
   x30   930635.972805   930599.520144  1583537.946205   922746.107784    1333446.651362       922821.171531
Bart Van Assche Jan. 16, 2020, 4:41 p.m. UTC | #7
On 1/7/20 2:56 AM, Jinpu Wang wrote:
> On Mon, Jan 6, 2020 at 6:07 PM Jinpu Wang <jinpu.wang@cloud.ionos.com> wrote:
>>
>> On Fri, Jan 3, 2020 at 5:29 PM Bart Van Assche <bvanassche@acm.org> wrote:
>>>
>>> On 1/3/20 4:39 AM, Jinpu Wang wrote:
>>>> Performance results for the v5.5-rc1 kernel are here:
>>>>     link: https://github.com/ionos-enterprise/ibnbd/tree/develop/performance/v5-v5.5-rc1
>>>>
>>>> Some workloads RNBD are faster, some workloads NVMeoF are faster.
>>>
>>> Thank you for having shared these graphs.
>>>
>>> Do the graphs in RNBD-SinglePath.pdf show that NVMeOF achieves similar
>>> or higher IOPS, higher bandwidth and lower latency than RNBD for
>>> workloads with a block size of 4 KB and also for mixed workloads with
>>> less than 20 disks, whether or not invalidation is enabled for RNBD?
>> Hi Bart,
>>
>> Yes, that's the result on one pair of Server with Connect X4 HCA, I
>> did another test on another
>> 2 servers with Connect X5 HCA, results are quite different, we will
>> double-check the
>> performance results also on old machines, will share new results later.
>>
> here are the results with ConnectX5 HCA MT4119 + Intel(R) Xeon(R) Gold
> 6130 CPU @ 2.10GHz, sorry no graph for now,
> will prepare the next round.
> 
>   disks   4k nvme dual  4k nvme single    4k rnbd dual  4k rnbd single
> 4k rnbd-inv dual  4k rnbd-inv single
>     x1  251637.436256   254312.068793   270886.311369   260934.306569
>    218632.336766       190800.519948
>     x2  460894.610539   463925.907409   496318.068193   466374.862514
>     418960.30397       372848.815118
>     x3  603263.673633    605004.49955   675073.892611   614552.144786
>    586223.077692       524221.977802
>     x4  731648.935106   733174.482552   850245.575442   743062.493751
>    744380.361964       656861.813819
>     x5  827732.326767   827444.855514  1026939.306069   840515.548445
>    897801.719828       762707.329267
>     x6  876705.329467     873963.0037  1142399.960004    876974.70253
>   1037773.522648       834073.892611
>     x7  893808.719128   893268.073193  1239282.471753   892728.027197
>   1135570.742926       871336.966303
>     x8  906589.741026   905938.006199  1287178.964207   906189.381062
>     1225040.9959       895292.070793
>     x9   912048.09519   912400.259974  1386211.878812   913885.311469
>   1302472.964884       910176.282372
>    x10  915566.243376   915602.739726   1442959.70403   916288.871113
>   1350296.325879        914966.40336
>    x11  917116.188381   916905.809419  1418574.942506   916264.373563
>   1370438.698083       915255.874413
>    x12  915852.814719   917710.128987  1423534.546545   916348.386452
>   1352357.364264       914966.656684
>    x13   919042.69573   918819.536093  1429697.830217   917631.036896
>   1378083.824558       916519.161677
>    x14   920000.49995    920031.59684  1443317.268273   917562.843716
>    1395023.56936       918935.706429
>    x15  920160.883912   920367.363264  1445306.425863   918278.472153
>   1440776.944611       916352.265921
>    x16  920652.869426   920673.832617  1454705.229477   917902.948526
>     1455708.2501       918198.001998
>    x17  916892.310769   916883.623275  1491071.841948   918936.706329
>   1436507.428457       917182.934132
>    x18  917247.775222   917762.523748  1612129.058036   918546.835949
>   1488716.583417       919521.095781
>    x19  920084.791521   920349.930014   1615690.87821   915371.496958
>    1406747.32954       918347.248577
>    x20  922339.232154   922208.058388  1591415.958404   917922.631526
>   1343806.744488       918903.393214
>    x21  923141.771646   923297.040592  1581547.169811   919371.025795
>   1342568.406347       919157.752944
>    x22  923063.787243   924072.385523  1574972.846162   919173.713143
>   1340318.639673       920577.995403
>    x23  923549.490102   924643.471306  1573597.440256   918705.060385
>   1333047.771337       917469.027431
>    x24  925584.483103   925224.955009  1578143.485651    921744.15117
>   1321494.708466       920001.498352
>    x25  926165.366927   926842.031594  1579288.271173   921845.392764
>   1319902.568202       920448.830702
>    x26  926852.729454   927382.123575  1585351.318945    922669.59912
>   1325670.791338       919137.796847
>    x27  928196.260748   928093.981204  1581427.557244   921379.155436
>   1325009.972078       919017.550858
>    x28  929330.433913   929606.778644  1581726.527347   924325.834833
>   1331721.074174       919557.761373
>    x29  929885.522895   929876.924615  1578317.436513   922977.240966
>    1333612.86783       921094.386736
>    x30  930635.972805   930599.520144  1583537.946205   922746.107784
>   1333446.651362       922821.171531

Hi Jack,

What does "dual" mean? What explains the big difference between the NVMe 
and RNBD results for the "dual" columns?

Thanks,

Bart.
Jinpu Wang Jan. 16, 2020, 4:46 p.m. UTC | #8
On Thu, Jan 16, 2020 at 5:41 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 1/7/20 2:56 AM, Jinpu Wang wrote:
> > On Mon, Jan 6, 2020 at 6:07 PM Jinpu Wang <jinpu.wang@cloud.ionos.com> wrote:
> >>
> >> On Fri, Jan 3, 2020 at 5:29 PM Bart Van Assche <bvanassche@acm.org> wrote:
> >>>
> >>> On 1/3/20 4:39 AM, Jinpu Wang wrote:
> >>>> Performance results for the v5.5-rc1 kernel are here:
> >>>>     link: https://github.com/ionos-enterprise/ibnbd/tree/develop/performance/v5-v5.5-rc1
> >>>>
> >>>> Some workloads RNBD are faster, some workloads NVMeoF are faster.
> >>>
> >>> Thank you for having shared these graphs.
> >>>
> >>> Do the graphs in RNBD-SinglePath.pdf show that NVMeOF achieves similar
> >>> or higher IOPS, higher bandwidth and lower latency than RNBD for
> >>> workloads with a block size of 4 KB and also for mixed workloads with
> >>> less than 20 disks, whether or not invalidation is enabled for RNBD?
> >> Hi Bart,
> >>
> >> Yes, that's the result on one pair of Server with Connect X4 HCA, I
> >> did another test on another
> >> 2 servers with Connect X5 HCA, results are quite different, we will
> >> double-check the
> >> performance results also on old machines, will share new results later.
> >>
> > here are the results with ConnectX5 HCA MT4119 + Intel(R) Xeon(R) Gold
> > 6130 CPU @ 2.10GHz, sorry no graph for now,
> > will prepare the next round.
> >
> >   disks   4k nvme dual  4k nvme single    4k rnbd dual  4k rnbd single
> > 4k rnbd-inv dual  4k rnbd-inv single
> >     x1  251637.436256   254312.068793   270886.311369   260934.306569
> >    218632.336766       190800.519948
> >     x2  460894.610539   463925.907409   496318.068193   466374.862514
> >     418960.30397       372848.815118
> >     x3  603263.673633    605004.49955   675073.892611   614552.144786
> >    586223.077692       524221.977802
> >     x4  731648.935106   733174.482552   850245.575442   743062.493751
> >    744380.361964       656861.813819
> >     x5  827732.326767   827444.855514  1026939.306069   840515.548445
> >    897801.719828       762707.329267
> >     x6  876705.329467     873963.0037  1142399.960004    876974.70253
> >   1037773.522648       834073.892611
> >     x7  893808.719128   893268.073193  1239282.471753   892728.027197
> >   1135570.742926       871336.966303
> >     x8  906589.741026   905938.006199  1287178.964207   906189.381062
> >     1225040.9959       895292.070793
> >     x9   912048.09519   912400.259974  1386211.878812   913885.311469
> >   1302472.964884       910176.282372
> >    x10  915566.243376   915602.739726   1442959.70403   916288.871113
> >   1350296.325879        914966.40336
> >    x11  917116.188381   916905.809419  1418574.942506   916264.373563
> >   1370438.698083       915255.874413
> >    x12  915852.814719   917710.128987  1423534.546545   916348.386452
> >   1352357.364264       914966.656684
> >    x13   919042.69573   918819.536093  1429697.830217   917631.036896
> >   1378083.824558       916519.161677
> >    x14   920000.49995    920031.59684  1443317.268273   917562.843716
> >    1395023.56936       918935.706429
> >    x15  920160.883912   920367.363264  1445306.425863   918278.472153
> >   1440776.944611       916352.265921
> >    x16  920652.869426   920673.832617  1454705.229477   917902.948526
> >     1455708.2501       918198.001998
> >    x17  916892.310769   916883.623275  1491071.841948   918936.706329
> >   1436507.428457       917182.934132
> >    x18  917247.775222   917762.523748  1612129.058036   918546.835949
> >   1488716.583417       919521.095781
> >    x19  920084.791521   920349.930014   1615690.87821   915371.496958
> >    1406747.32954       918347.248577
> >    x20  922339.232154   922208.058388  1591415.958404   917922.631526
> >   1343806.744488       918903.393214
> >    x21  923141.771646   923297.040592  1581547.169811   919371.025795
> >   1342568.406347       919157.752944
> >    x22  923063.787243   924072.385523  1574972.846162   919173.713143
> >   1340318.639673       920577.995403
> >    x23  923549.490102   924643.471306  1573597.440256   918705.060385
> >   1333047.771337       917469.027431
> >    x24  925584.483103   925224.955009  1578143.485651    921744.15117
> >   1321494.708466       920001.498352
> >    x25  926165.366927   926842.031594  1579288.271173   921845.392764
> >   1319902.568202       920448.830702
> >    x26  926852.729454   927382.123575  1585351.318945    922669.59912
> >   1325670.791338       919137.796847
> >    x27  928196.260748   928093.981204  1581427.557244   921379.155436
> >   1325009.972078       919017.550858
> >    x28  929330.433913   929606.778644  1581726.527347   924325.834833
> >   1331721.074174       919557.761373
> >    x29  929885.522895   929876.924615  1578317.436513   922977.240966
> >    1333612.86783       921094.386736
> >    x30  930635.972805   930599.520144  1583537.946205   922746.107784
> >   1333446.651362       922821.171531
>
> Hi Jack,
>
> What does "dual" mean? What explains the big difference between the NVMe
> and RNBD results for the "dual" columns?
I just sent out v7, also with graphs; it's here:
https://github.com/ionos-enterprise/ibnbd/tree/develop/performance/v7-v5.5-rc2

Measurements were performed in single-path (-single) and multipath (-dual)
settings. For the multipath measurements two different policies have been
tested:
- RNBD - min-inflight (choose the path with the minimum number of inflight requests)
- NVMEoF - numa iopolicy
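
For reference, the idea behind the min-inflight policy can be sketched like
this. It is an illustrative sketch only, not the actual RTRS path-selection
code; struct mp_path and select_min_inflight() are hypothetical names.

/* Illustrative sketch of the min-inflight idea. */
#include <linux/atomic.h>
#include <linux/kernel.h>
#include <linux/list.h>

struct mp_path {			/* hypothetical path descriptor */
	struct list_head entry;
	atomic_t inflight;
};

/* Pick the path with the fewest requests currently in flight. */
static struct mp_path *select_min_inflight(struct list_head *paths)
{
	struct mp_path *p, *best = NULL;
	int min = INT_MAX;

	list_for_each_entry(p, paths, entry) {
		int n = atomic_read(&p->inflight);

		if (n < min) {
			min = n;
			best = p;
		}
	}
	if (best)
		atomic_inc(&best->inflight);	/* dropped again on IO completion */
	return best;
}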


>
> Thanks,
>
> Bart.
>
>
Thanks