Message ID | 20190219100903.15408-1-bmt@zurich.ibm.com (mailing list archive) |
---|---|
Headers | show |
Series | SIW: Request for Comments | expand |
On Tue, Feb 19, 2019 at 11:08:50AM +0100, Bernard Metzler wrote: > This patch set contributes version 5 of the SoftiWarp > driver, as originally introduced to the list Oct 6th, 2017. > SoftiWarp (siw) implements the iWarp RDMA protocol over > kernel TCP sockets. The driver integrates with the > linux-rdma framework. > > In response to the various helpful feedback, we fixed (besides > other small fixes) the following issues: > > 1. The debugfs logic got completely removed. There is no > debugfs entry for siw anymore. We will interate with rdmatool > in the future. > > 2. The driver adheres to the device and object management as > recently proposed by Jason Gunthorpe. This includes the new > protection domain management. > > 3. An entry to linux MAINTAINERS for siw was added. > > 4. The packet length at MPA protocol level got restricted > to physical MTU size: to interoperate with hardware iWarp > devices such as Chelsio T5/T6, GSO usage got unselected. > It can be selcted with a global parameter 'gso_seg_limit'. > Setting 'gso_seg_limit = 0' makes full use of GSO if advertised > by the TCP socket. This almost doubles bandwidth for bulk data > transfers in a pure siw <-> siw setting, but does not interoperate > wth some hardware RNIC's. > > 5. An 'SPDX-License-Identifier' got added to all files > > 6. The user interface as defined in include/uapi/rdma/siwa_user.h > got cleaned up. > > 7. Useless redefinition of functions for kernel reference count > reading got removed. > > > The driver continues to make use of rdma-netdev extensions to > add and delete links, as proposed by Steve Wise. > > The driver continues to rely on the iWarp port mapper extensions > for software RDMA devices, as proposed by Steve Wise. > > We maintain a snapshot of the current code at > https://github.com/zrlio/softiwarp-for-linux-rdma.git > within branch 'siw-for-rdma-next-v5'. > This branch includes the latest netlink and portmapper patches from > Steve as well as the latest device and object management > from Jason. > > The matching siw user library is maintained at > https://github.com/zrlio/softiwarp-user-for-linux-rdma.git. > It is based on rdma-core, and extended with Steve's patches > to both rdma netlink and portmapper. The relevant branch > name is 'siw-for-rdma-next'. > > Since siw may generate work completion notifications from a > kthread context, performance of kernel siw clients like NVMeF > depends on the following pending kernel patch (included in > https://github.com/zrlio/softiwarp-for-linux-rdma.git): > https://www.spinics.net/lists/linux-rdma/msg75081.html > > As always, we'd highly appreciate your code review. Thanks > very much for your time! > > Bernard Bernard, Can you please post your fixed series so we will be able to continue review? Thanks
-----"Leon Romanovsky" <leon@kernel.org> wrote: ----- >To: "Bernard Metzler" <bmt@zurich.ibm.com> >From: "Leon Romanovsky" <leon@kernel.org> >Date: 02/28/2019 01:33PM >Cc: linux-rdma@vger.kernel.org >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > >On Tue, Feb 19, 2019 at 11:08:50AM +0100, Bernard Metzler wrote: >> This patch set contributes version 5 of the SoftiWarp >> driver, as originally introduced to the list Oct 6th, 2017. >> SoftiWarp (siw) implements the iWarp RDMA protocol over >> kernel TCP sockets. The driver integrates with the >> linux-rdma framework. >> >> In response to the various helpful feedback, we fixed (besides >> other small fixes) the following issues: >> >> 1. The debugfs logic got completely removed. There is no >> debugfs entry for siw anymore. We will interate with rdmatool >> in the future. >> >> 2. The driver adheres to the device and object management as >> recently proposed by Jason Gunthorpe. This includes the new >> protection domain management. >> >> 3. An entry to linux MAINTAINERS for siw was added. >> >> 4. The packet length at MPA protocol level got restricted >> to physical MTU size: to interoperate with hardware iWarp >> devices such as Chelsio T5/T6, GSO usage got unselected. >> It can be selcted with a global parameter 'gso_seg_limit'. >> Setting 'gso_seg_limit = 0' makes full use of GSO if advertised >> by the TCP socket. This almost doubles bandwidth for bulk data >> transfers in a pure siw <-> siw setting, but does not >interoperate >> wth some hardware RNIC's. >> >> 5. An 'SPDX-License-Identifier' got added to all files >> >> 6. The user interface as defined in include/uapi/rdma/siwa_user.h >> got cleaned up. >> >> 7. Useless redefinition of functions for kernel reference count >> reading got removed. >> >> >> The driver continues to make use of rdma-netdev extensions to >> add and delete links, as proposed by Steve Wise. >> >> The driver continues to rely on the iWarp port mapper extensions >> for software RDMA devices, as proposed by Steve Wise. >> >> We maintain a snapshot of the current code at >> https://github.com/zrlio/softiwarp-for-linux-rdma.git >> within branch 'siw-for-rdma-next-v5'. >> This branch includes the latest netlink and portmapper patches from >> Steve as well as the latest device and object management >> from Jason. >> >> The matching siw user library is maintained at >> https://github.com/zrlio/softiwarp-user-for-linux-rdma.git. >> It is based on rdma-core, and extended with Steve's patches >> to both rdma netlink and portmapper. The relevant branch >> name is 'siw-for-rdma-next'. >> >> Since siw may generate work completion notifications from a >> kthread context, performance of kernel siw clients like NVMeF >> depends on the following pending kernel patch (included in >> https://github.com/zrlio/softiwarp-for-linux-rdma.git): >> https://www.spinics.net/lists/linux-rdma/msg75081.html >> >> As always, we'd highly appreciate your code review. Thanks >> very much for your time! >> >> Bernard > >Bernard, > >Can you please post your fixed series so we will be able to continue >review? > Leon, I do my best as time permits to bring along a next version end of this week. I try to follow all advises, even if I end up in battles between applying clang-format and checkpatch, which yields partially contradicting results, which I then manually fix, hoping it still makes us all happy. I hope we then focus on functional/correctness, and keep style aside a little: I will have it clang-format formatted applying styles from .clang-format of the kernel plus fixes of that silly output to match with kernels checkpatch). Not to mention the potential result of checkpatch, sparse, clang or other tools applied to the given rdma-core files and other drivers. ;) Best, Bernard.
On Thu, Feb 28, 2019 at 02:28:41PM +0000, Bernard Metzler wrote: > -----"Leon Romanovsky" <leon@kernel.org> wrote: ----- > > >To: "Bernard Metzler" <bmt@zurich.ibm.com> > >From: "Leon Romanovsky" <leon@kernel.org> > >Date: 02/28/2019 01:33PM > >Cc: linux-rdma@vger.kernel.org > >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > > > >On Tue, Feb 19, 2019 at 11:08:50AM +0100, Bernard Metzler wrote: > >> This patch set contributes version 5 of the SoftiWarp > >> driver, as originally introduced to the list Oct 6th, 2017. > >> SoftiWarp (siw) implements the iWarp RDMA protocol over > >> kernel TCP sockets. The driver integrates with the > >> linux-rdma framework. > >> > >> In response to the various helpful feedback, we fixed (besides > >> other small fixes) the following issues: > >> > >> 1. The debugfs logic got completely removed. There is no > >> debugfs entry for siw anymore. We will interate with rdmatool > >> in the future. > >> > >> 2. The driver adheres to the device and object management as > >> recently proposed by Jason Gunthorpe. This includes the new > >> protection domain management. > >> > >> 3. An entry to linux MAINTAINERS for siw was added. > >> > >> 4. The packet length at MPA protocol level got restricted > >> to physical MTU size: to interoperate with hardware iWarp > >> devices such as Chelsio T5/T6, GSO usage got unselected. > >> It can be selcted with a global parameter 'gso_seg_limit'. > >> Setting 'gso_seg_limit = 0' makes full use of GSO if advertised > >> by the TCP socket. This almost doubles bandwidth for bulk data > >> transfers in a pure siw <-> siw setting, but does not > >interoperate > >> wth some hardware RNIC's. > >> > >> 5. An 'SPDX-License-Identifier' got added to all files > >> > >> 6. The user interface as defined in include/uapi/rdma/siwa_user.h > >> got cleaned up. > >> > >> 7. Useless redefinition of functions for kernel reference count > >> reading got removed. > >> > >> > >> The driver continues to make use of rdma-netdev extensions to > >> add and delete links, as proposed by Steve Wise. > >> > >> The driver continues to rely on the iWarp port mapper extensions > >> for software RDMA devices, as proposed by Steve Wise. > >> > >> We maintain a snapshot of the current code at > >> https://github.com/zrlio/softiwarp-for-linux-rdma.git > >> within branch 'siw-for-rdma-next-v5'. > >> This branch includes the latest netlink and portmapper patches from > >> Steve as well as the latest device and object management > >> from Jason. > >> > >> The matching siw user library is maintained at > >> https://github.com/zrlio/softiwarp-user-for-linux-rdma.git. > >> It is based on rdma-core, and extended with Steve's patches > >> to both rdma netlink and portmapper. The relevant branch > >> name is 'siw-for-rdma-next'. > >> > >> Since siw may generate work completion notifications from a > >> kthread context, performance of kernel siw clients like NVMeF > >> depends on the following pending kernel patch (included in > >> https://github.com/zrlio/softiwarp-for-linux-rdma.git): > >> https://www.spinics.net/lists/linux-rdma/msg75081.html > >> > >> As always, we'd highly appreciate your code review. Thanks > >> very much for your time! > >> > >> Bernard > > > >Bernard, > > > >Can you please post your fixed series so we will be able to continue > >review? > > > > Leon, > > I do my best as time permits to bring along a next version > end of this week. > I try to follow all advises, even if I end up in battles > between applying clang-format and checkpatch, which > yields partially contradicting results, which I then manually > fix, hoping it still makes us all happy. Very strange, I'm running all my patches through clang-formatter and saw only one difference between final result and checkpatch expectation - space in the loops. That error is ignored nowadays. > > I hope we then focus on functional/correctness, and keep > style aside a little: I will have it clang-format formatted > applying styles from .clang-format of the kernel plus fixes of > that silly output to match with kernels checkpatch). Not to > mention the potential result of checkpatch, sparse, clang > or other tools applied to the given rdma-core files and other > drivers. ;) This sounds like a tedious task, but it actually catches a lot of errors, this is why it is better to see them fixed before actual reviewing. > > > Best, > Bernard. >
--- Bernard Metzler, PhD Tech. Leader High Performance I/O, Principal Research Staff IBM Zurich Research Laboratory Saeumerstrasse 4 CH-8803 Rueschlikon, Switzerland +41 44 724 8605 -----"Leon Romanovsky" <leon@kernel.org> wrote: ----- >To: "Bernard Metzler" <BMT@zurich.ibm.com> >From: "Leon Romanovsky" <leon@kernel.org> >Date: 02/28/2019 08:23PM >Cc: linux-rdma@vger.kernel.org >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > >On Thu, Feb 28, 2019 at 02:28:41PM +0000, Bernard Metzler wrote: >> -----"Leon Romanovsky" <leon@kernel.org> wrote: ----- >> >> >To: "Bernard Metzler" <bmt@zurich.ibm.com> >> >From: "Leon Romanovsky" <leon@kernel.org> >> >Date: 02/28/2019 01:33PM >> >Cc: linux-rdma@vger.kernel.org >> >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments >> > >> >On Tue, Feb 19, 2019 at 11:08:50AM +0100, Bernard Metzler wrote: >> >> This patch set contributes version 5 of the SoftiWarp >> >> driver, as originally introduced to the list Oct 6th, 2017. >> >> SoftiWarp (siw) implements the iWarp RDMA protocol over >> >> kernel TCP sockets. The driver integrates with the >> >> linux-rdma framework. >> >> >> >> In response to the various helpful feedback, we fixed (besides >> >> other small fixes) the following issues: >> >> >> >> 1. The debugfs logic got completely removed. There is no >> >> debugfs entry for siw anymore. We will interate with rdmatool >> >> in the future. >> >> >> >> 2. The driver adheres to the device and object management as >> >> recently proposed by Jason Gunthorpe. This includes the new >> >> protection domain management. >> >> >> >> 3. An entry to linux MAINTAINERS for siw was added. >> >> >> >> 4. The packet length at MPA protocol level got restricted >> >> to physical MTU size: to interoperate with hardware iWarp >> >> devices such as Chelsio T5/T6, GSO usage got unselected. >> >> It can be selcted with a global parameter 'gso_seg_limit'. >> >> Setting 'gso_seg_limit = 0' makes full use of GSO if >advertised >> >> by the TCP socket. This almost doubles bandwidth for bulk >data >> >> transfers in a pure siw <-> siw setting, but does not >> >interoperate >> >> wth some hardware RNIC's. >> >> >> >> 5. An 'SPDX-License-Identifier' got added to all files >> >> >> >> 6. The user interface as defined in >include/uapi/rdma/siwa_user.h >> >> got cleaned up. >> >> >> >> 7. Useless redefinition of functions for kernel reference count >> >> reading got removed. >> >> >> >> >> >> The driver continues to make use of rdma-netdev extensions to >> >> add and delete links, as proposed by Steve Wise. >> >> >> >> The driver continues to rely on the iWarp port mapper extensions >> >> for software RDMA devices, as proposed by Steve Wise. >> >> >> >> We maintain a snapshot of the current code at >> >> https://github.com/zrlio/softiwarp-for-linux-rdma.git >> >> within branch 'siw-for-rdma-next-v5'. >> >> This branch includes the latest netlink and portmapper patches >from >> >> Steve as well as the latest device and object management >> >> from Jason. >> >> >> >> The matching siw user library is maintained at >> >> https://github.com/zrlio/softiwarp-user-for-linux-rdma.git. >> >> It is based on rdma-core, and extended with Steve's patches >> >> to both rdma netlink and portmapper. The relevant branch >> >> name is 'siw-for-rdma-next'. >> >> >> >> Since siw may generate work completion notifications from a >> >> kthread context, performance of kernel siw clients like NVMeF >> >> depends on the following pending kernel patch (included in >> >> https://github.com/zrlio/softiwarp-for-linux-rdma.git): >> >> https://www.spinics.net/lists/linux-rdma/msg75081.html >> >> >> >> As always, we'd highly appreciate your code review. Thanks >> >> very much for your time! >> >> >> >> Bernard >> > >> >Bernard, >> > >> >Can you please post your fixed series so we will be able to >continue >> >review? >> > >> >> Leon, >> >> I do my best as time permits to bring along a next version >> end of this week. >> I try to follow all advises, even if I end up in battles >> between applying clang-format and checkpatch, which >> yields partially contradicting results, which I then manually >> fix, hoping it still makes us all happy. > >Very strange, I'm running all my patches through clang-formatter >and saw only one difference between final result and checkpatch >expectation - space in the loops. That error is ignored nowadays. > Yes, but I really ran into that. clang-formt even sometimes inserts me white spaces at the beginning of a line (within enum definitions), between '*' and variable name, between function name and opening parenthesis '(', it even inserts goto labels... Is there any style file available which fits our needs (I tried without explicit style, and -style=file (which picks the .clang-format from the root of the kernel sources)? Many thanks for your help, Bernard. >> >> I hope we then focus on functional/correctness, and keep >> style aside a little: I will have it clang-format formatted >> applying styles from .clang-format of the kernel plus fixes of >> that silly output to match with kernels checkpatch). Not to >> mention the potential result of checkpatch, sparse, clang >> or other tools applied to the given rdma-core files and other >> drivers. ;) > >This sounds like a tedious task, but it actually catches a lot of >errors, this is why it is better to see them fixed before actual >reviewing. > >> >> >> Best, >> Bernard. >> > [attachment "signature.asc" removed by Bernard Metzler/Zurich/IBM]
On Fri, Mar 01, 2019 at 09:16:32AM +0000, Bernard Metzler wrote: > > > --- > Bernard Metzler, PhD > Tech. Leader High Performance I/O, Principal Research Staff > IBM Zurich Research Laboratory > Saeumerstrasse 4 > CH-8803 Rueschlikon, Switzerland > +41 44 724 8605 > > -----"Leon Romanovsky" <leon@kernel.org> wrote: ----- > > >To: "Bernard Metzler" <BMT@zurich.ibm.com> > >From: "Leon Romanovsky" <leon@kernel.org> > >Date: 02/28/2019 08:23PM > >Cc: linux-rdma@vger.kernel.org > >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > > > >On Thu, Feb 28, 2019 at 02:28:41PM +0000, Bernard Metzler wrote: > >> -----"Leon Romanovsky" <leon@kernel.org> wrote: ----- > >> > >> >To: "Bernard Metzler" <bmt@zurich.ibm.com> > >> >From: "Leon Romanovsky" <leon@kernel.org> > >> >Date: 02/28/2019 01:33PM > >> >Cc: linux-rdma@vger.kernel.org > >> >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > >> > > >> >On Tue, Feb 19, 2019 at 11:08:50AM +0100, Bernard Metzler wrote: > >> >> This patch set contributes version 5 of the SoftiWarp > >> >> driver, as originally introduced to the list Oct 6th, 2017. > >> >> SoftiWarp (siw) implements the iWarp RDMA protocol over > >> >> kernel TCP sockets. The driver integrates with the > >> >> linux-rdma framework. > >> >> > >> >> In response to the various helpful feedback, we fixed (besides > >> >> other small fixes) the following issues: > >> >> > >> >> 1. The debugfs logic got completely removed. There is no > >> >> debugfs entry for siw anymore. We will interate with rdmatool > >> >> in the future. > >> >> > >> >> 2. The driver adheres to the device and object management as > >> >> recently proposed by Jason Gunthorpe. This includes the new > >> >> protection domain management. > >> >> > >> >> 3. An entry to linux MAINTAINERS for siw was added. > >> >> > >> >> 4. The packet length at MPA protocol level got restricted > >> >> to physical MTU size: to interoperate with hardware iWarp > >> >> devices such as Chelsio T5/T6, GSO usage got unselected. > >> >> It can be selcted with a global parameter 'gso_seg_limit'. > >> >> Setting 'gso_seg_limit = 0' makes full use of GSO if > >advertised > >> >> by the TCP socket. This almost doubles bandwidth for bulk > >data > >> >> transfers in a pure siw <-> siw setting, but does not > >> >interoperate > >> >> wth some hardware RNIC's. > >> >> > >> >> 5. An 'SPDX-License-Identifier' got added to all files > >> >> > >> >> 6. The user interface as defined in > >include/uapi/rdma/siwa_user.h > >> >> got cleaned up. > >> >> > >> >> 7. Useless redefinition of functions for kernel reference count > >> >> reading got removed. > >> >> > >> >> > >> >> The driver continues to make use of rdma-netdev extensions to > >> >> add and delete links, as proposed by Steve Wise. > >> >> > >> >> The driver continues to rely on the iWarp port mapper extensions > >> >> for software RDMA devices, as proposed by Steve Wise. > >> >> > >> >> We maintain a snapshot of the current code at > >> >> https://github.com/zrlio/softiwarp-for-linux-rdma.git > >> >> within branch 'siw-for-rdma-next-v5'. > >> >> This branch includes the latest netlink and portmapper patches > >from > >> >> Steve as well as the latest device and object management > >> >> from Jason. > >> >> > >> >> The matching siw user library is maintained at > >> >> https://github.com/zrlio/softiwarp-user-for-linux-rdma.git. > >> >> It is based on rdma-core, and extended with Steve's patches > >> >> to both rdma netlink and portmapper. The relevant branch > >> >> name is 'siw-for-rdma-next'. > >> >> > >> >> Since siw may generate work completion notifications from a > >> >> kthread context, performance of kernel siw clients like NVMeF > >> >> depends on the following pending kernel patch (included in > >> >> https://github.com/zrlio/softiwarp-for-linux-rdma.git): > >> >> https://www.spinics.net/lists/linux-rdma/msg75081.html > >> >> > >> >> As always, we'd highly appreciate your code review. Thanks > >> >> very much for your time! > >> >> > >> >> Bernard > >> > > >> >Bernard, > >> > > >> >Can you please post your fixed series so we will be able to > >continue > >> >review? > >> > > >> > >> Leon, > >> > >> I do my best as time permits to bring along a next version > >> end of this week. > >> I try to follow all advises, even if I end up in battles > >> between applying clang-format and checkpatch, which > >> yields partially contradicting results, which I then manually > >> fix, hoping it still makes us all happy. > > > >Very strange, I'm running all my patches through clang-formatter > >and saw only one difference between final result and checkpatch > >expectation - space in the loops. That error is ignored nowadays. > > > Yes, but I really ran into that. clang-formt even sometimes inserts > me white spaces at the beginning of a line (within enum definitions), > between '*' and variable name, between function name and opening > parenthesis '(', it even inserts goto labels... Is there any style > file available which fits our needs (I tried without explicit style, > and -style=file (which picks the .clang-format from the root of > the kernel sources)? For me it worked almost out of the box. I installed clang together with git-clang-format, downloaded clang-format.py from the web and put this snippet into .vimrc: " Format line in INSERT mode or buffer in VISUAL mode to clang format map <C-F> :pyf $HOME/.vim/clang-format.py<cr> imap <C-F> <c-o>:pyf $HOME/.vim/clang-format.py<cr> Now Ctrl-F formats me the source code according to kernel .clang-format file. Thanks > > Many thanks for your help, > Bernard. > > >> > >> I hope we then focus on functional/correctness, and keep > >> style aside a little: I will have it clang-format formatted > >> applying styles from .clang-format of the kernel plus fixes of > >> that silly output to match with kernels checkpatch). Not to > >> mention the potential result of checkpatch, sparse, clang > >> or other tools applied to the given rdma-core files and other > >> drivers. ;) > > > >This sounds like a tedious task, but it actually catches a lot of > >errors, this is why it is better to see them fixed before actual > >reviewing. > > > >> > >> > >> Best, > >> Bernard. > >> > > > [attachment "signature.asc" removed by Bernard Metzler/Zurich/IBM] >
-----linux-rdma-owner@vger.kernel.org wrote: ----- >To: "Bernard Metzler" <BMT@zurich.ibm.com> >From: "Leon Romanovsky" >Sent by: linux-rdma-owner@vger.kernel.org >Date: 03/03/2019 01:43PM >Cc: linux-rdma@vger.kernel.org >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > >On Fri, Mar 01, 2019 at 09:16:32AM +0000, Bernard Metzler wrote: >> >> >> --- >> Bernard Metzler, PhD >> Tech. Leader High Performance I/O, Principal Research Staff >> IBM Zurich Research Laboratory >> Saeumerstrasse 4 >> CH-8803 Rueschlikon, Switzerland >> +41 44 724 8605 >> >> -----"Leon Romanovsky" <leon@kernel.org> wrote: ----- >> >> >To: "Bernard Metzler" <BMT@zurich.ibm.com> >> >From: "Leon Romanovsky" <leon@kernel.org> >> >Date: 02/28/2019 08:23PM >> >Cc: linux-rdma@vger.kernel.org >> >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments >> > >> >On Thu, Feb 28, 2019 at 02:28:41PM +0000, Bernard Metzler wrote: >> >> -----"Leon Romanovsky" <leon@kernel.org> wrote: ----- >> >> >> >> >To: "Bernard Metzler" <bmt@zurich.ibm.com> >> >> >From: "Leon Romanovsky" <leon@kernel.org> >> >> >Date: 02/28/2019 01:33PM >> >> >Cc: linux-rdma@vger.kernel.org >> >> >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments >> >> > >> >> >On Tue, Feb 19, 2019 at 11:08:50AM +0100, Bernard Metzler >wrote: >> >> >> This patch set contributes version 5 of the SoftiWarp >> >> >> driver, as originally introduced to the list Oct 6th, 2017. >> >> >> SoftiWarp (siw) implements the iWarp RDMA protocol over >> >> >> kernel TCP sockets. The driver integrates with the >> >> >> linux-rdma framework. >> >> >> >> >> >> In response to the various helpful feedback, we fixed >(besides >> >> >> other small fixes) the following issues: >> >> >> >> >> >> 1. The debugfs logic got completely removed. There is no >> >> >> debugfs entry for siw anymore. We will interate with >rdmatool >> >> >> in the future. >> >> >> >> >> >> 2. The driver adheres to the device and object management as >> >> >> recently proposed by Jason Gunthorpe. This includes the >new >> >> >> protection domain management. >> >> >> >> >> >> 3. An entry to linux MAINTAINERS for siw was added. >> >> >> >> >> >> 4. The packet length at MPA protocol level got restricted >> >> >> to physical MTU size: to interoperate with hardware iWarp >> >> >> devices such as Chelsio T5/T6, GSO usage got unselected. >> >> >> It can be selcted with a global parameter 'gso_seg_limit'. >> >> >> Setting 'gso_seg_limit = 0' makes full use of GSO if >> >advertised >> >> >> by the TCP socket. This almost doubles bandwidth for bulk >> >data >> >> >> transfers in a pure siw <-> siw setting, but does not >> >> >interoperate >> >> >> wth some hardware RNIC's. >> >> >> >> >> >> 5. An 'SPDX-License-Identifier' got added to all files >> >> >> >> >> >> 6. The user interface as defined in >> >include/uapi/rdma/siwa_user.h >> >> >> got cleaned up. >> >> >> >> >> >> 7. Useless redefinition of functions for kernel reference >count >> >> >> reading got removed. >> >> >> >> >> >> >> >> >> The driver continues to make use of rdma-netdev extensions to >> >> >> add and delete links, as proposed by Steve Wise. >> >> >> >> >> >> The driver continues to rely on the iWarp port mapper >extensions >> >> >> for software RDMA devices, as proposed by Steve Wise. >> >> >> >> >> >> We maintain a snapshot of the current code at >> >> >> https://github.com/zrlio/softiwarp-for-linux-rdma.git >> >> >> within branch 'siw-for-rdma-next-v5'. >> >> >> This branch includes the latest netlink and portmapper >patches >> >from >> >> >> Steve as well as the latest device and object management >> >> >> from Jason. >> >> >> >> >> >> The matching siw user library is maintained at >> >> >> https://github.com/zrlio/softiwarp-user-for-linux-rdma.git. >> >> >> It is based on rdma-core, and extended with Steve's patches >> >> >> to both rdma netlink and portmapper. The relevant branch >> >> >> name is 'siw-for-rdma-next'. >> >> >> >> >> >> Since siw may generate work completion notifications from a >> >> >> kthread context, performance of kernel siw clients like NVMeF >> >> >> depends on the following pending kernel patch (included in >> >> >> https://github.com/zrlio/softiwarp-for-linux-rdma.git): >> >> >> https://www.spinics.net/lists/linux-rdma/msg75081.html >> >> >> >> >> >> As always, we'd highly appreciate your code review. Thanks >> >> >> very much for your time! >> >> >> >> >> >> Bernard >> >> > >> >> >Bernard, >> >> > >> >> >Can you please post your fixed series so we will be able to >> >continue >> >> >review? >> >> > >> >> >> >> Leon, >> >> >> >> I do my best as time permits to bring along a next version >> >> end of this week. >> >> I try to follow all advises, even if I end up in battles >> >> between applying clang-format and checkpatch, which >> >> yields partially contradicting results, which I then manually >> >> fix, hoping it still makes us all happy. >> > >> >Very strange, I'm running all my patches through clang-formatter >> >and saw only one difference between final result and checkpatch >> >expectation - space in the loops. That error is ignored nowadays. >> > >> Yes, but I really ran into that. clang-formt even sometimes inserts >> me white spaces at the beginning of a line (within enum >definitions), >> between '*' and variable name, between function name and opening >> parenthesis '(', it even inserts goto labels... Is there any style >> file available which fits our needs (I tried without explicit >style, >> and -style=file (which picks the .clang-format from the root of >> the kernel sources)? > >For me it worked almost out of the box. >I installed clang together with git-clang-format, downloaded >clang-format.py from the web and put this snippet into .vimrc: > " Format line in INSERT mode or buffer in VISUAL mode to clang >format > map <C-F> :pyf $HOME/.vim/clang-format.py<cr> > imap <C-F> <c-o>:pyf $HOME/.vim/clang-format.py<cr> > >Now Ctrl-F formats me the source code according to kernel >.clang-format >file. Thanks Leon, Let me try that. I hope next time it will avoid the mess I got. Thanks Bernard. > >> >> Many thanks for your help, >> Bernard. >> >> >> >> >> I hope we then focus on functional/correctness, and keep >> >> style aside a little: I will have it clang-format formatted >> >> applying styles from .clang-format of the kernel plus fixes of >> >> that silly output to match with kernels checkpatch). Not to >> >> mention the potential result of checkpatch, sparse, clang >> >> or other tools applied to the given rdma-core files and other >> >> drivers. ;) >> > >> >This sounds like a tedious task, but it actually catches a lot of >> >errors, this is why it is better to see them fixed before actual >> >reviewing. >> > >> >> >> >> >> >> Best, >> >> Bernard. >> >> >> > >> [attachment "signature.asc" removed by Bernard Metzler/Zurich/IBM] >> > [attachment "signature.asc" removed by Bernard Metzler/Zurich/IBM]
Hi Bernard, > -----Original Message----- > From: linux-rdma-owner@vger.kernel.org <linux-rdma- > owner@vger.kernel.org> On Behalf Of Bernard Metzler > Sent: Tuesday, February 19, 2019 4:09 AM > To: linux-rdma@vger.kernel.org > Cc: Bernard Metzler <bmt@zurich.ibm.com> > Subject: [PATCH v5 00/13] SIW: Request for Comments > > This patch set contributes version 5 of the SoftiWarp driver, as originally > introduced to the list Oct 6th, 2017. > SoftiWarp (siw) implements the iWarp RDMA protocol over kernel TCP > sockets. The driver integrates with the linux-rdma framework. > Great to see another software driver after rdma_rxe. rdma_rxe is improved incrementally by several people. rdma_rxe and siw have lot in common as below. 1. qp, mr, cq, srq resource allocation 2. state machine handling for qp, mr, srq 3. user interface for post_send, recv, cq poll, notification 4. user interface for resource creation qp, cq, mr, srq etc resources 5. data path handling for invalidate, send with invalidate etc 6. netlink life cycle commands for iwarp rdma devices What differs between them is - skb processing (roce) vs sockets (iwarp). This part of the transport layer is different between them. Hence please create transport callbacks to do transport specific work. So please reuse the rdma_rxe driver, refactor it to accommodate the need to iwarp.
Hi Steve, From: Steve Wise <larrystevenwise@gmail.com> Sent: Sunday, March 10, 2019 1:17 PM To: Parav Pandit <parav@mellanox.com> Cc: Bernard Metzler <bmt@zurich.ibm.com>; linux-rdma@vger.kernel.org; Leon Romanovsky <leonro@mellanox.com>; Yuval Shaia <yuval.shaia@oracle.com> Subject: Re: [PATCH v5 00/13] SIW: Request for Comments Hey Parav, On Sun, Mar 10, 2019 at 12:45 PM Parav Pandit <mailto:parav@mellanox.com> wrote: Hi Bernard, > -----Original Message----- > From: mailto:linux-rdma-owner@vger.kernel.org <linux-rdma- > mailto:owner@vger.kernel.org> On Behalf Of Bernard Metzler > Sent: Tuesday, February 19, 2019 4:09 AM > To: mailto:linux-rdma@vger.kernel.org > Cc: Bernard Metzler <mailto:bmt@zurich.ibm.com> > Subject: [PATCH v5 00/13] SIW: Request for Comments > > This patch set contributes version 5 of the SoftiWarp driver, as originally > introduced to the list Oct 6th, 2017. > SoftiWarp (siw) implements the iWarp RDMA protocol over kernel TCP > sockets. The driver integrates with the linux-rdma framework. > Great to see another software driver after rdma_rxe. rdma_rxe is improved incrementally by several people. rdma_rxe and siw have lot in common as below. 1. qp, mr, cq, srq resource allocation 2. state machine handling for qp, mr, srq The iwarp qp state machine is different. 3. user interface for post_send, recv, cq poll, notification > There are semantic differences as well. Two I can think of are the meaning behind a send completion for IB vs iWARP, and the > requirement that sink MRs for an iwarp rdma read must use an rkey with REMOTE_WRITE. [Parav] These are internals to take care at transport level. Doesn't deserve a different driver for it. >> 4. user interface for resource creation qp, cq, mr, srq etc resources > The user interface is already common for all rdma drivers. ib_create_qp(), ib_create_cq(), etc.. [Parav] user interface at code level between user and kernel level. siw needs to reuse rdma_rxe library too, instead of creating new user library and kernel code for handling all of it. > 5. data path handling for invalidate, send with invalidate etc > 6. netlink life cycle commands for iwarp rdma devices >> What differs between them is - skb processing (roce) vs sockets (iwarp). > And the fact that each siw connection is its own socket and tcp connection, vs a single tunneled UDP socket for all RoCE "connections". [Parav] This is transport internals, that should be treated at lowest level. This part of the transport layer is different between them. Hence please create transport callbacks to do transport specific work. So please reuse the rdma_rxe driver, refactor it to accommodate the need to iwarp. > I don't agree with this. Soft iWARP is different enough to let it stand on its own, certainly for the first inclusion into Linux. [Parav] It doesn't matter first or second inclusion. It is second inclusion of a sw rdma driver that can possibly leverage existing rdma rxe. > Also, the integration with the rdma-cm would have to be refactored, i think, since there are two very different paths through the CM for > RoCE vs iWARP. [Parav] So that CM handling can be different. QP, MR states are still handled in common way. Rdma cm path is just additional pieces, and its fine to include it in rdma_rxe when iwarp interfaces are created using netlink. modify_qp_is_ok() had link layer check that Kamal removed, and that can be pulled back easily. > I can see the two sharing some of the data structures and methods, but to ask this now basically asks for a rewrite of both rxe and siw, and > that is unreasonable, in my opinion. [Parav] why it is unreasonable? Leon already quoted " it is much more easier and exciting to write something new instead of fixing already existing piece of code. Luckily enough, kernel community doesn't allow new code without proving that old code is not possible to fix." Though it was on different thread, it equally applies here to reuse and refactor existing driver. So refactoring rdma_rxe should be possible. Yuval also likes to enhance rxe, so you might get help from him as well to do so. Not doing refactor is not a good idea. rdmavt is good example sharing common code between two drivers. > Let siw evolve incrementally, as rxe has. Sure, it can involve incrementally inside the rxe. Some of the in-reply is messed up. I tried but couldn't fix all the email text. Steve, Please fix the gmail client for text-only replies.
Hi Parav, I understand your point of having some common driver infrastructure between rxe and siw. I remember I was even proposing rxe folks looking into that when rxe emerged after siw was already prototyped and open sourced (yes, siw was first out there). There was not much interest from rxe folks at that point in time, which is OK and understood (always cool to write something new and better from scratch, who wants iWarp, etc.). At the current stage of the project, it is rather contra-productive to completely refactor siw to match with rxe, since I do not have the resources to rip apart the whole thing, and I am not yet convinced it makes sense. If one asks for merging these days, one asks at first for a delay in getting siw accepted. Over time, it might indeed be a good thing to merge parts. As said, I'd still like to investigate that. But let the sometimes intentionally different concepts of these two drivers co-exist for some time out there, and let's draw conclusions of what to make common later. And let's be clear, lots of things between RoCE and iWarp are very different. Not only wire protocol and lower interfaces, but also semantically. These things will stay different by specification, and an efficient implementation of those things will likely still go it's own ways. Thanks Bernard. -----"Parav Pandit" <parav@mellanox.com> wrote: ----- >To: "Steve Wise" <larrystevenwise@gmail.com> >From: "Parav Pandit" <parav@mellanox.com> >Date: 03/10/2019 08:06PM >Cc: "Bernard Metzler" <bmt@zurich.ibm.com>, >"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>, "Leon >Romanovsky" <leonro@mellanox.com>, "Yuval Shaia" ><yuval.shaia@oracle.com> >Subject: RE: [PATCH v5 00/13] SIW: Request for Comments > >Hi Steve, > >From: Steve Wise <larrystevenwise@gmail.com> >Sent: Sunday, March 10, 2019 1:17 PM >To: Parav Pandit <parav@mellanox.com> >Cc: Bernard Metzler <bmt@zurich.ibm.com>; linux-rdma@vger.kernel.org; >Leon Romanovsky <leonro@mellanox.com>; Yuval Shaia ><yuval.shaia@oracle.com> >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > >Hey Parav, > >On Sun, Mar 10, 2019 at 12:45 PM Parav Pandit ><mailto:parav@mellanox.com> wrote: >Hi Bernard, > >> -----Original Message----- >> From: mailto:linux-rdma-owner@vger.kernel.org <linux-rdma- >> mailto:owner@vger.kernel.org> On Behalf Of Bernard Metzler >> Sent: Tuesday, February 19, 2019 4:09 AM >> To: mailto:linux-rdma@vger.kernel.org >> Cc: Bernard Metzler <mailto:bmt@zurich.ibm.com> >> Subject: [PATCH v5 00/13] SIW: Request for Comments >> >> This patch set contributes version 5 of the SoftiWarp driver, as >originally >> introduced to the list Oct 6th, 2017. >> SoftiWarp (siw) implements the iWarp RDMA protocol over kernel TCP >> sockets. The driver integrates with the linux-rdma framework. >> > >Great to see another software driver after rdma_rxe. >rdma_rxe is improved incrementally by several people. > >rdma_rxe and siw have lot in common as below. >1. qp, mr, cq, srq resource allocation >2. state machine handling for qp, mr, srq > >The iwarp qp state machine is different. > >3. user interface for post_send, recv, cq poll, notification > >> There are semantic differences as well. Two I can think of are the >meaning behind a send completion for IB vs iWARP, and the >> requirement that sink MRs for an iwarp rdma read must use an rkey >with REMOTE_WRITE. > >[Parav] These are internals to take care at transport level. Doesn't >deserve a different driver for it. > >>> 4. user interface for resource creation qp, cq, mr, srq etc >resources >> The user interface is already common for all rdma drivers. >ib_create_qp(), ib_create_cq(), etc.. > >[Parav] user interface at code level between user and kernel level. >siw needs to reuse rdma_rxe library too, instead of creating new user >library and kernel code for handling all of it. > >> 5. data path handling for invalidate, send with invalidate etc >> 6. netlink life cycle commands for iwarp rdma devices > >>> What differs between them is - skb processing (roce) vs sockets >(iwarp). > >> And the fact that each siw connection is its own socket and tcp >connection, vs a single tunneled UDP socket for all RoCE >"connections". >[Parav] This is transport internals, that should be treated at lowest >level. >This part of the transport layer is different between them. >Hence please create transport callbacks to do transport specific >work. > >So please reuse the rdma_rxe driver, refactor it to accommodate the >need to iwarp. > >> I don't agree with this. Soft iWARP is different enough to let it >stand on its own, certainly for the first inclusion into Linux. >[Parav] It doesn't matter first or second inclusion. >It is second inclusion of a sw rdma driver that can possibly leverage >existing rdma rxe. > > > Also, the integration with the rdma-cm would have to be >refactored, i think, since there are two very different paths through >the CM for >> RoCE vs iWARP. >[Parav] So that CM handling can be different. QP, MR states are still >handled in common way. >Rdma cm path is just additional pieces, and its fine to include it in >rdma_rxe when iwarp interfaces are created using netlink. >modify_qp_is_ok() had link layer check that Kamal removed, and that >can be pulled back easily. > >> I can see the two sharing some of the data structures and methods, >but to ask this now basically asks for a rewrite of both rxe and siw, >and > that is unreasonable, in my opinion. > >[Parav] why it is unreasonable? >Leon already quoted " it is much more easier and exciting to write >something new instead of fixing already existing piece of code. >Luckily enough, kernel community doesn't allow new code without >proving that old code is not possible to fix." >Though it was on different thread, it equally applies here to reuse >and refactor existing driver. > >So refactoring rdma_rxe should be possible. >Yuval also likes to enhance rxe, so you might get help from him as >well to do so. > >Not doing refactor is not a good idea. >rdmavt is good example sharing common code between two drivers. > >> Let siw evolve incrementally, as rxe has. >Sure, it can involve incrementally inside the rxe. > >Some of the in-reply is messed up. I tried but couldn't fix all the >email text. >Steve, >Please fix the gmail client for text-only replies. >
Hi Bernard, > -----Original Message----- > From: Bernard Metzler <BMT@zurich.ibm.com> > Sent: Sunday, March 10, 2019 4:41 PM > To: Parav Pandit <parav@mellanox.com> > Cc: Steve Wise <larrystevenwise@gmail.com>; linux-rdma@vger.kernel.org; > Leon Romanovsky <leonro@mellanox.com>; Yuval Shaia > <yuval.shaia@oracle.com> > Subject: RE: [PATCH v5 00/13] SIW: Request for Comments > > Hi Parav, > > I understand your point of having some common driver infrastructure > between rxe and siw. I remember I was even proposing rxe folks looking into > that when rxe emerged after siw was already prototyped and open sourced > (yes, siw was first out there). There was not much interest from rxe folks at > that point in time, which is OK and understood (always cool to write > something new and better from scratch, who wants iWarp, etc.). > I didn't follow that part of the history, but it clearly proves the point that it was started incorrectly. With all the fancy code for doorbells, with mmap() call for QP, CQ, ground reality is, it doesn't even pass a developer's silly ib_write_bw today. And at that test shows 2Gbps bw on 64Gb RAM with 48 core cpus! > At the current stage of the project, it is rather contra-productive to > completely refactor siw to match with rxe, since I do not have the resources > to rip apart the whole thing, and I am not yet convinced it makes sense. > But that seems to be right approach to have modular code and reuse existing pieces. Leon, Yuval and others resonate that too. If existing pieces are inferior or mis fit, their technical details has to be called out. rdmavt improved and used library model to share code between two drivers. > If one asks for merging these days, one asks at first for a delay in getting siw > accepted. > Yuval and Leon wants to improve rxe in general and have desire to reuse rxe. They might be able to help you. Please check with them. > Over time, it might indeed be a good thing to merge parts. It should happen from beginning. > As said, I'd still like to investigate that. That will be very helpful. > But let the sometimes intentionally different concepts of these two drivers > co-exist for some time out there, and let's draw conclusions of what to make > common later. > It won't happen later once it is in tree. :-) > And let's be clear, lots of things between RoCE and iWarp are very different. All rest of the core components such as ib_core, addr, rdmacm are able to handle it two different transports. Cavium or qedr driver is dual protocol driver that support RoCE and iWarp both in single driver. > Not only wire protocol and lower interfaces, but also semantically. These > things will stay different by specification, and an efficient implementation of > those things will likely still go it's own ways. > We would like to understand technically why rxe is mis fit for it, and why it cannot be improved. Lack of resources is probably not good reason. > Thanks > Bernard. > -----"Parav Pandit" <parav@mellanox.com> wrote: ----- > > >To: "Steve Wise" <larrystevenwise@gmail.com> > >From: "Parav Pandit" <parav@mellanox.com> > >Date: 03/10/2019 08:06PM > >Cc: "Bernard Metzler" <bmt@zurich.ibm.com>, > >"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>, "Leon > >Romanovsky" <leonro@mellanox.com>, "Yuval Shaia" > ><yuval.shaia@oracle.com> > >Subject: RE: [PATCH v5 00/13] SIW: Request for Comments > > > >Hi Steve, > > > >From: Steve Wise <larrystevenwise@gmail.com> > >Sent: Sunday, March 10, 2019 1:17 PM > >To: Parav Pandit <parav@mellanox.com> > >Cc: Bernard Metzler <bmt@zurich.ibm.com>; linux-rdma@vger.kernel.org; > >Leon Romanovsky <leonro@mellanox.com>; Yuval Shaia > ><yuval.shaia@oracle.com> > >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > > > >Hey Parav, > > > >On Sun, Mar 10, 2019 at 12:45 PM Parav Pandit > ><mailto:parav@mellanox.com> wrote: > >Hi Bernard, > > > >> -----Original Message----- > >> From: mailto:linux-rdma-owner@vger.kernel.org <linux-rdma- > >> mailto:owner@vger.kernel.org> On Behalf Of Bernard Metzler > >> Sent: Tuesday, February 19, 2019 4:09 AM > >> To: mailto:linux-rdma@vger.kernel.org > >> Cc: Bernard Metzler <mailto:bmt@zurich.ibm.com> > >> Subject: [PATCH v5 00/13] SIW: Request for Comments > >> > >> This patch set contributes version 5 of the SoftiWarp driver, as > >originally > >> introduced to the list Oct 6th, 2017. > >> SoftiWarp (siw) implements the iWarp RDMA protocol over kernel TCP > >> sockets. The driver integrates with the linux-rdma framework. > >> > > > >Great to see another software driver after rdma_rxe. > >rdma_rxe is improved incrementally by several people. > > > >rdma_rxe and siw have lot in common as below. > >1. qp, mr, cq, srq resource allocation > >2. state machine handling for qp, mr, srq > > > >The iwarp qp state machine is different. > > > >3. user interface for post_send, recv, cq poll, notification > > > >> There are semantic differences as well. Two I can think of are the > >meaning behind a send completion for IB vs iWARP, and the > >> requirement that sink MRs for an iwarp rdma read must use an rkey > >with REMOTE_WRITE. > > > >[Parav] These are internals to take care at transport level. Doesn't > >deserve a different driver for it. > > > >>> 4. user interface for resource creation qp, cq, mr, srq etc > >resources > >> The user interface is already common for all rdma drivers. > >ib_create_qp(), ib_create_cq(), etc.. > > > >[Parav] user interface at code level between user and kernel level. > >siw needs to reuse rdma_rxe library too, instead of creating new user > >library and kernel code for handling all of it. > > > >> 5. data path handling for invalidate, send with invalidate etc 6. > >> netlink life cycle commands for iwarp rdma devices > > > >>> What differs between them is - skb processing (roce) vs sockets > >(iwarp). > > > >> And the fact that each siw connection is its own socket and tcp > >connection, vs a single tunneled UDP socket for all RoCE "connections". > >[Parav] This is transport internals, that should be treated at lowest > >level. > >This part of the transport layer is different between them. > >Hence please create transport callbacks to do transport specific work. > > > >So please reuse the rdma_rxe driver, refactor it to accommodate the > >need to iwarp. > > > >> I don't agree with this. Soft iWARP is different enough to let it > >stand on its own, certainly for the first inclusion into Linux. > >[Parav] It doesn't matter first or second inclusion. > >It is second inclusion of a sw rdma driver that can possibly leverage > >existing rdma rxe. > > > > > Also, the integration with the rdma-cm would have to be > >refactored, i think, since there are two very different paths through > >the CM for > >> RoCE vs iWARP. > >[Parav] So that CM handling can be different. QP, MR states are still > >handled in common way. > >Rdma cm path is just additional pieces, and its fine to include it in > >rdma_rxe when iwarp interfaces are created using netlink. > >modify_qp_is_ok() had link layer check that Kamal removed, and that can > >be pulled back easily. > > > >> I can see the two sharing some of the data structures and methods, > >but to ask this now basically asks for a rewrite of both rxe and siw, > >and > that is unreasonable, in my opinion. > > > >[Parav] why it is unreasonable? > >Leon already quoted " it is much more easier and exciting to write > >something new instead of fixing already existing piece of code. > >Luckily enough, kernel community doesn't allow new code without proving > >that old code is not possible to fix." > >Though it was on different thread, it equally applies here to reuse and > >refactor existing driver. > > > >So refactoring rdma_rxe should be possible. > >Yuval also likes to enhance rxe, so you might get help from him as well > >to do so. > > > >Not doing refactor is not a good idea. > >rdmavt is good example sharing common code between two drivers. > > > >> Let siw evolve incrementally, as rxe has. > >Sure, it can involve incrementally inside the rxe. > > > >Some of the in-reply is messed up. I tried but couldn't fix all the > >email text. > >Steve, > >Please fix the gmail client for text-only replies. > >
Hey Parav- > On Mar 10, 2019, at 6:01 PM, Parav Pandit <parav@mellanox.com> wrote: > > Hi Bernard, > > >> -----Original Message----- >> From: Bernard Metzler <BMT@zurich.ibm.com> >> Sent: Sunday, March 10, 2019 4:41 PM >> To: Parav Pandit <parav@mellanox.com> >> Cc: Steve Wise <larrystevenwise@gmail.com>; linux-rdma@vger.kernel.org; >> Leon Romanovsky <leonro@mellanox.com>; Yuval Shaia >> <yuval.shaia@oracle.com> >> Subject: RE: [PATCH v5 00/13] SIW: Request for Comments >> >> Hi Parav, >> >> I understand your point of having some common driver infrastructure >> between rxe and siw. I remember I was even proposing rxe folks looking into >> that when rxe emerged after siw was already prototyped and open sourced >> (yes, siw was first out there). There was not much interest from rxe folks at >> that point in time, which is OK and understood (always cool to write >> something new and better from scratch, who wants iWarp, etc.). >> > I didn't follow that part of the history, but it clearly proves the point that it was started incorrectly. > With all the fancy code for doorbells, with mmap() call for QP, CQ, ground reality is, it doesn't even pass a developer's silly ib_write_bw today. > And at that test shows 2Gbps bw on 64Gb RAM with 48 core cpus! > >> At the current stage of the project, it is rather contra-productive to >> completely refactor siw to match with rxe, since I do not have the resources >> to rip apart the whole thing, and I am not yet convinced it makes sense. >> > But that seems to be right approach to have modular code and reuse existing pieces. > Leon, Yuval and others resonate that too. I agree that code sharing can be valuable. However... > If existing pieces are inferior or mis fit, their technical details has to be called out. > rdmavt improved and used library model to share code between two drivers. > >> If one asks for merging these days, one asks at first for a delay in getting siw >> accepted. >> > Yuval and Leon wants to improve rxe in general and have desire to reuse rxe. > They might be able to help you. Please check with them. > >> Over time, it might indeed be a good thing to merge parts. > It should happen from beginning. I disagree that merging siw should wait for code de-duplication with rxe. The current siw is coming close to being merge-ready. I would like to see it in the kernel as soon as possible. RoCE, hard or soft, requires DCE in order to be fully useful, and thus it will never be possible to have a software RoCE driver that can operate via any Ethernet driver on any network. That is not the case with iWARP, which relies on TCP's built-in congestion avoidance, which is truly routable and available everywhere. The ULPs need to have a robust generic in-tree software driver as soon as we can get it to enable development, testing, and deployment in environments that have no physical RDMA device. siw is the fastest way to get that. >> As said, I'd still like to investigate that. > That will be very helpful. > >> But let the sometimes intentionally different concepts of these two drivers >> co-exist for some time out there, and let's draw conclusions of what to make >> common later. >> > It won't happen later once it is in tree. :-) There is nothing stopping code de-duplication once siw is in the kernel. The most obvious counter-example of your claim is the in-kernel ULPs which have adopted new core APIs such as ib_alloc_cq long after the ULPs were accepted and merged. >> And let's be clear, lots of things between RoCE and iWarp are very different. > All rest of the core components such as ib_core, addr, rdmacm are able to handle it two different transports. > Cavium or qedr driver is dual protocol driver that support RoCE and iWarp both in single driver. > >> Not only wire protocol and lower interfaces, but also semantically. These >> things will stay different by specification, and an efficient implementation of >> those things will likely still go it's own ways. >> > We would like to understand technically why rxe is mis fit for it, and why it cannot be improved. > Lack of resources is probably not good reason. Actually I find lack of resources to be a compelling argument. Before siw is merged, Bernard and his team are the only ones who can do this work, and they are not yet familiar with the rxe code base. After merging, anyone in the Linux RDMA community can help with it, including you, the person who is asking for this extra effort. It is reasonable to request that common code be refactored. It is neither fair nor productive to gate merging siw on that work. Why force Bernard to bear the burden of fixing rxe? Let's get siw merged, and then _everyone_ can have a say where it is valuable to share code. My two cents... -- Chuck Lever
Hi Chuck, > -----Original Message----- > From: Chuck Lever <chuck.lever@oracle.com> > Sent: Sunday, March 10, 2019 5:30 PM > To: Parav Pandit <parav@mellanox.com> > Cc: Bernard Metzler <BMT@zurich.ibm.com>; Steve Wise > <larrystevenwise@gmail.com>; linux-rdma@vger.kernel.org; Leon > Romanovsky <leonro@mellanox.com>; Yuval Shaia > <yuval.shaia@oracle.com> > Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > > Hey Parav- > > > On Mar 10, 2019, at 6:01 PM, Parav Pandit <parav@mellanox.com> wrote: > > > > Hi Bernard, > > > > > >> -----Original Message----- > >> From: Bernard Metzler <BMT@zurich.ibm.com> > >> Sent: Sunday, March 10, 2019 4:41 PM > >> To: Parav Pandit <parav@mellanox.com> > >> Cc: Steve Wise <larrystevenwise@gmail.com>; > >> linux-rdma@vger.kernel.org; Leon Romanovsky <leonro@mellanox.com>; > >> Yuval Shaia <yuval.shaia@oracle.com> > >> Subject: RE: [PATCH v5 00/13] SIW: Request for Comments > >> > >> Hi Parav, > >> > >> I understand your point of having some common driver infrastructure > >> between rxe and siw. I remember I was even proposing rxe folks > >> looking into that when rxe emerged after siw was already prototyped > >> and open sourced (yes, siw was first out there). There was not much > >> interest from rxe folks at that point in time, which is OK and > >> understood (always cool to write something new and better from scratch, > who wants iWarp, etc.). > >> > > I didn't follow that part of the history, but it clearly proves the point that it > was started incorrectly. > > With all the fancy code for doorbells, with mmap() call for QP, CQ, ground > reality is, it doesn't even pass a developer's silly ib_write_bw today. > > And at that test shows 2Gbps bw on 64Gb RAM with 48 core cpus! > > > >> At the current stage of the project, it is rather contra-productive > >> to completely refactor siw to match with rxe, since I do not have the > >> resources to rip apart the whole thing, and I am not yet convinced it > makes sense. > >> > > But that seems to be right approach to have modular code and reuse > existing pieces. > > Leon, Yuval and others resonate that too. > > I agree that code sharing can be valuable. However... > > > > If existing pieces are inferior or mis fit, their technical details has to be > called out. > > rdmavt improved and used library model to share code between two > drivers. > > > >> If one asks for merging these days, one asks at first for a delay in > >> getting siw accepted. > >> > > Yuval and Leon wants to improve rxe in general and have desire to reuse > rxe. > > They might be able to help you. Please check with them. > > > >> Over time, it might indeed be a good thing to merge parts. > > It should happen from beginning. > > I disagree that merging siw should wait for code de-duplication with rxe. > > The current siw is coming close to being merge-ready. I would like to see it in > the kernel as soon as possible. RoCE, hard or soft, requires DCE in order to > be fully useful, and thus it will never be possible to have a software RoCE > driver that can operate via any Ethernet driver on any network. That is not > the case with iWARP, which relies on TCP's built-in congestion avoidance, > which is truly routable and available everywhere. Soft roce driver faces soft cpu lockup in single system at 2Gbps bandwidth with just two QPs (send and receive). :-) Going across a system is brave world. But Leon and Yuval likes to see it enhanced/used in general instead of new driver. And therefore, it extends to siw efforts too. > > The ULPs need to have a robust generic in-tree software driver as soon as we > can get it to enable development, testing, and deployment in environments > that have no physical RDMA device. siw is the fastest way to get that. > > I totally agree to your point. In case if you didn't notice, a loopback driver [1] that outperforms soft_roce by 25x in performance and it is more stable. :-) > >> As said, I'd still like to investigate that. > > That will be very helpful. > > > >> But let the sometimes intentionally different concepts of these two > >> drivers co-exist for some time out there, and let's draw conclusions > >> of what to make common later. > >> > > It won't happen later once it is in tree. :-) > > There is nothing stopping code de-duplication once siw is in the kernel. > > The most obvious counter-example of your claim is the in-kernel ULPs which > have adopted new core APIs such as ib_alloc_cq long after the ULPs were > accepted and merged. > It happened because of active maintainers like you, Sagi and many others. > > >> And let's be clear, lots of things between RoCE and iWarp are very > different. > > All rest of the core components such as ib_core, addr, rdmacm are able to > handle it two different transports. > > Cavium or qedr driver is dual protocol driver that support RoCE and iWarp > both in single driver. > > > >> Not only wire protocol and lower interfaces, but also semantically. > >> These things will stay different by specification, and an efficient > >> implementation of those things will likely still go it's own ways. > >> > > We would like to understand technically why rxe is mis fit for it, and why it > cannot be improved. > > Lack of resources is probably not good reason. > > Actually I find lack of resources to be a compelling argument. Before siw is > merged, Bernard and his team are the only ones who can do this work, and > they are not yet familiar with the rxe code base. > I want Leon to explicitly ACK this point and want to listen to him what he has to say. Leon? Is this compelling reason for siw? And therefore for loopback driver too? > After merging, anyone in the Linux RDMA community can help with it, > including you, the person who is asking for this extra effort. > I asked because Leon is asking that for another simplified driver proposal. Most ULPs maintainers and users would agree that rxe driver is not able to satisfy their needs (stability, performance). And your feedback is obviously important here. > It is reasonable to request that common code be refactored. It is neither fair > nor productive to gate merging siw on that work. I do not intent to gate merging siw work by any means. I want community to reach consensus on - why should we merge a new driver even though technically its feasible to share a code with existing in-tree driver. And therefore, that same exact reason won't be limited to just siw driver. > Why force Bernard to bear the burden of fixing rxe? As per Leon, rxe seems stable enough that doesn't need fixing. I infer that from his suggestion to enhance rxe instead of a new loopback driver for roce and IB. So Bernard won't need to fix it based on Leon's and Yuval's comments. And if rxe is not good enough, it speaks for itself. Leon? Yuval? If Leon and Yuval didn't ask to enhance soft_roce driver, my proposal is: 1. siw to provide a transport agnostic user verbs like [2] And improve performance incrementally with core extensions. This will be similar to Jen's io_uring work to provide vendor independent io cmd and cmpl queues for block ios. Currently rxe and siw creates their own rings etc. 2. loopback driver for RoCE and IB to reuse [2]. 3. soft_roce maintainer (?) to decide if they want to reuse code from siw or loopback [1] https://patchwork.kernel.org/patch/10831261/ [2] https://patchwork.kernel.org/patch/10831251/ > > Let's get siw merged, and then _everyone_ can have a say where it is > valuable to share code. > Let's apply that to other driver proposed too?
On Sun, Mar 10, 2019 at 11:44:52PM +0000, Parav Pandit wrote: > Hi Chuck, > > > -----Original Message----- > > From: Chuck Lever <chuck.lever@oracle.com> > > Sent: Sunday, March 10, 2019 5:30 PM > > To: Parav Pandit <parav@mellanox.com> > > Cc: Bernard Metzler <BMT@zurich.ibm.com>; Steve Wise > > <larrystevenwise@gmail.com>; linux-rdma@vger.kernel.org; Leon > > Romanovsky <leonro@mellanox.com>; Yuval Shaia > > <yuval.shaia@oracle.com> > > Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > > > > Hey Parav- > > > > > On Mar 10, 2019, at 6:01 PM, Parav Pandit <parav@mellanox.com> wrote: > > > > > > Hi Bernard, > > > > > > > > >> -----Original Message----- > > >> From: Bernard Metzler <BMT@zurich.ibm.com> > > >> Sent: Sunday, March 10, 2019 4:41 PM > > >> To: Parav Pandit <parav@mellanox.com> > > >> Cc: Steve Wise <larrystevenwise@gmail.com>; > > >> linux-rdma@vger.kernel.org; Leon Romanovsky <leonro@mellanox.com>; > > >> Yuval Shaia <yuval.shaia@oracle.com> > > >> Subject: RE: [PATCH v5 00/13] SIW: Request for Comments > > >> > > >> Hi Parav, > > >> > > >> I understand your point of having some common driver infrastructure > > >> between rxe and siw. I remember I was even proposing rxe folks > > >> looking into that when rxe emerged after siw was already prototyped > > >> and open sourced (yes, siw was first out there). There was not much > > >> interest from rxe folks at that point in time, which is OK and > > >> understood (always cool to write something new and better from scratch, > > who wants iWarp, etc.). > > >> > > > I didn't follow that part of the history, but it clearly proves the point that it > > was started incorrectly. > > > With all the fancy code for doorbells, with mmap() call for QP, CQ, ground > > reality is, it doesn't even pass a developer's silly ib_write_bw today. > > > And at that test shows 2Gbps bw on 64Gb RAM with 48 core cpus! > > > > > >> At the current stage of the project, it is rather contra-productive > > >> to completely refactor siw to match with rxe, since I do not have the > > >> resources to rip apart the whole thing, and I am not yet convinced it > > makes sense. > > >> > > > But that seems to be right approach to have modular code and reuse > > existing pieces. > > > Leon, Yuval and others resonate that too. > > > > I agree that code sharing can be valuable. However... > > > > > > > If existing pieces are inferior or mis fit, their technical details has to be > > called out. > > > rdmavt improved and used library model to share code between two > > drivers. > > > > > >> If one asks for merging these days, one asks at first for a delay in > > >> getting siw accepted. > > >> > > > Yuval and Leon wants to improve rxe in general and have desire to reuse > > rxe. > > > They might be able to help you. Please check with them. > > > > > >> Over time, it might indeed be a good thing to merge parts. > > > It should happen from beginning. > > > > I disagree that merging siw should wait for code de-duplication with rxe. > > > > The current siw is coming close to being merge-ready. I would like to see it in > > the kernel as soon as possible. RoCE, hard or soft, requires DCE in order to > > be fully useful, and thus it will never be possible to have a software RoCE > > driver that can operate via any Ethernet driver on any network. That is not > > the case with iWARP, which relies on TCP's built-in congestion avoidance, > > which is truly routable and available everywhere. > > Soft roce driver faces soft cpu lockup in single system at 2Gbps bandwidth with just two QPs (send and receive). > :-) > Going across a system is brave world. > > But Leon and Yuval likes to see it enhanced/used in general instead of new driver. > And therefore, it extends to siw efforts too. > > > > > The ULPs need to have a robust generic in-tree software driver as soon as we > > can get it to enable development, testing, and deployment in environments > > that have no physical RDMA device. siw is the fastest way to get that. > > > > > I totally agree to your point. > In case if you didn't notice, a loopback driver [1] that outperforms soft_roce by 25x in performance and it is more stable. :-) > > > >> As said, I'd still like to investigate that. > > > That will be very helpful. > > > > > >> But let the sometimes intentionally different concepts of these two > > >> drivers co-exist for some time out there, and let's draw conclusions > > >> of what to make common later. > > >> > > > It won't happen later once it is in tree. :-) > > > > There is nothing stopping code de-duplication once siw is in the kernel. > > > > The most obvious counter-example of your claim is the in-kernel ULPs which > > have adopted new core APIs such as ib_alloc_cq long after the ULPs were > > accepted and merged. > > > It happened because of active maintainers like you, Sagi and many others. > > > > > >> And let's be clear, lots of things between RoCE and iWarp are very > > different. > > > All rest of the core components such as ib_core, addr, rdmacm are able to > > handle it two different transports. > > > Cavium or qedr driver is dual protocol driver that support RoCE and iWarp > > both in single driver. > > > > > >> Not only wire protocol and lower interfaces, but also semantically. > > >> These things will stay different by specification, and an efficient > > >> implementation of those things will likely still go it's own ways. > > >> > > > We would like to understand technically why rxe is mis fit for it, and why it > > cannot be improved. > > > Lack of resources is probably not good reason. > > > > Actually I find lack of resources to be a compelling argument. Before siw is > > merged, Bernard and his team are the only ones who can do this work, and > > they are not yet familiar with the rxe code base. > > > I want Leon to explicitly ACK this point and want to listen to him what he has to say. > Leon? > Is this compelling reason for siw? > And therefore for loopback driver too? > > > After merging, anyone in the Linux RDMA community can help with it, > > including you, the person who is asking for this extra effort. > > > I asked because Leon is asking that for another simplified driver proposal. > Most ULPs maintainers and users would agree that rxe driver is not able to satisfy their needs > (stability, performance). > And your feedback is obviously important here. > > > It is reasonable to request that common code be refactored. It is neither fair > > nor productive to gate merging siw on that work. > > I do not intent to gate merging siw work by any means. > I want community to reach consensus on - why should we merge a new driver even though technically its feasible to share a code with existing in-tree driver. > > And therefore, that same exact reason won't be limited to just siw driver. > > > Why force Bernard to bear the burden of fixing rxe? > > As per Leon, rxe seems stable enough that doesn't need fixing. > I infer that from his suggestion to enhance rxe instead of a new loopback driver for roce and IB. > So Bernard won't need to fix it based on Leon's and Yuval's comments. > > And if rxe is not good enough, it speaks for itself. > Leon? Yuval? > > If Leon and Yuval didn't ask to enhance soft_roce driver, my proposal is: > > 1. siw to provide a transport agnostic user verbs like [2] > And improve performance incrementally with core extensions. > This will be similar to Jen's io_uring work to provide vendor independent io cmd and cmpl queues for block ios. > Currently rxe and siw creates their own rings etc. > > 2. loopback driver for RoCE and IB to reuse [2]. > 3. soft_roce maintainer (?) to decide if they want to reuse code from siw or loopback > > [1] https://patchwork.kernel.org/patch/10831261/ > [2] https://patchwork.kernel.org/patch/10831251/ > > > > > Let's get siw merged, and then _everyone_ can have a say where it is > > valuable to share code. > > > Let's apply that to other driver proposed too? Parav, Main difference between loopback driver and SIW/RXE that the latest ones implement full transports and first is sub-feature of SIW/RXE. At some point of time, you will need to extend loopback to support both iWARP and RoCE. Thanks > >
On Tue, Feb 19, 2019 at 11:08:50AM +0100, Bernard Metzler wrote: > This patch set contributes version 5 of the SoftiWarp > driver, as originally introduced to the list Oct 6th, 2017. > SoftiWarp (siw) implements the iWarp RDMA protocol over > kernel TCP sockets. The driver integrates with the > linux-rdma framework. > > In response to the various helpful feedback, we fixed (besides > other small fixes) the following issues: What I'd like to hear is that the uapi is designed properly in this driver from a security perspective. 1) Kernel can only read-once any memory under control of user space to avoid execution integrity security problems 2) Userspace never provides data that is unsafe, ie MAC addresses, IP addresses, VLAN #s, etc. Anything that goes in a L2/L3 header of a packet is a security problem 3) We don't have bugs like rxe has where the netdev side is assuming lifetimes of IB objects that are not guaranteed - ie qps, ib_devs, etc can be destroyed async to netdev stuff if userspace tries hard enough. Finally, I want to hear from other people that this driver actually works: - Chuck, does it run NFS work loads without crashing? - Sagi, does it run NVMe workloads. - Does it pass the various user space rdma-core ping test commands? - Can it pass verbs fabtests from libfabric? etc Jason
> On Mar 11, 2019, at 10:13 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Tue, Feb 19, 2019 at 11:08:50AM +0100, Bernard Metzler wrote: >> This patch set contributes version 5 of the SoftiWarp >> driver, as originally introduced to the list Oct 6th, 2017. >> SoftiWarp (siw) implements the iWarp RDMA protocol over >> kernel TCP sockets. The driver integrates with the >> linux-rdma framework. >> >> In response to the various helpful feedback, we fixed (besides >> other small fixes) the following issues: > > What I'd like to hear is that the uapi is designed properly in this > driver from a security perspective. > > 1) Kernel can only read-once any memory under control of user space to > avoid execution integrity security problems > > 2) Userspace never provides data that is unsafe, ie MAC addresses, IP > addresses, VLAN #s, etc. Anything that goes in a L2/L3 header of a > packet is a security problem > > 3) We don't have bugs like rxe has where the netdev side is assuming > lifetimes of IB objects that are not guaranteed - ie qps, ib_devs, > etc can be destroyed async to netdev stuff if userspace tries > hard enough. > > Finally, I want to hear from other people that this driver actually > works: > > - Chuck, does it run NFS work loads without crashing? > - Sagi, does it run NVMe workloads. > - Does it pass the various user space rdma-core ping test commands? > - Can it pass verbs fabtests from libfabric? Is someone planning to perform basic interop testing with h/w iWARP implementations? Not a high bar, just due diligence. -- Chuck Lever
-----"Jason Gunthorpe" <jgg@ziepe.ca> wrote: ----- >To: "Bernard Metzler" <bmt@zurich.ibm.com> >From: "Jason Gunthorpe" <jgg@ziepe.ca> >Date: 03/11/2019 03:14PM >Cc: linux-rdma@vger.kernel.org >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > >On Tue, Feb 19, 2019 at 11:08:50AM +0100, Bernard Metzler wrote: >> This patch set contributes version 5 of the SoftiWarp >> driver, as originally introduced to the list Oct 6th, 2017. >> SoftiWarp (siw) implements the iWarp RDMA protocol over >> kernel TCP sockets. The driver integrates with the >> linux-rdma framework. >> >> In response to the various helpful feedback, we fixed (besides >> other small fixes) the following issues: > >What I'd like to hear is that the uapi is designed properly in this >driver from a security perspective. > >1) Kernel can only read-once any memory under control of user space >to > avoid execution integrity security problems Right, I reworked relevant pieces accordingly. RFC v6 will have it. > >2) Userspace never provides data that is unsafe, ie MAC addresses, IP > addresses, VLAN #s, etc. Anything that goes in a L2/L3 header of a > packet is a security problem siw adheres to that. we randomize values for user visible information during creation: memory reservation keys, QP and CQ ID's. > >3) We don't have bugs like rxe has where the netdev side is assuming > lifetimes of IB objects that are not guaranteed - ie qps, ib_devs, > etc can be destroyed async to netdev stuff if userspace tries > hard enough. > I hope I tried hard enough. >Finally, I want to hear from other people that this driver actually >works: > >- Chuck, does it run NFS work loads without crashing? >- Sagi, does it run NVMe workloads. >- Does it pass the various user space rdma-core ping test commands? >- Can it pass verbs fabtests from libfabric? > >etc > >Jason > scaling might be another topic. We tried with up to 8k nodes running Spectrum Scale (aka GPFS) and IOR on top of it, which helped to solve related issues. We did not test with larger setups though. Thanks Bernard. >
-----"Chuck Lever" <chuck.lever@oracle.com> wrote: ----- >To: "Jason Gunthorpe" <jgg@ziepe.ca> >From: "Chuck Lever" <chuck.lever@oracle.com> >Date: 03/11/2019 03:24PM >Cc: "Bernard Metzler" <bmt@zurich.ibm.com>, >linux-rdma@vger.kernel.org >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > > >> On Mar 11, 2019, at 10:13 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote: >> >> On Tue, Feb 19, 2019 at 11:08:50AM +0100, Bernard Metzler wrote: >>> This patch set contributes version 5 of the SoftiWarp >>> driver, as originally introduced to the list Oct 6th, 2017. >>> SoftiWarp (siw) implements the iWarp RDMA protocol over >>> kernel TCP sockets. The driver integrates with the >>> linux-rdma framework. >>> >>> In response to the various helpful feedback, we fixed (besides >>> other small fixes) the following issues: >> >> What I'd like to hear is that the uapi is designed properly in this >> driver from a security perspective. >> >> 1) Kernel can only read-once any memory under control of user space >to >> avoid execution integrity security problems >> >> 2) Userspace never provides data that is unsafe, ie MAC addresses, >IP >> addresses, VLAN #s, etc. Anything that goes in a L2/L3 header of >a >> packet is a security problem >> >> 3) We don't have bugs like rxe has where the netdev side is >assuming >> lifetimes of IB objects that are not guaranteed - ie qps, >ib_devs, >> etc can be destroyed async to netdev stuff if userspace tries >> hard enough. >> >> Finally, I want to hear from other people that this driver actually >> works: >> >> - Chuck, does it run NFS work loads without crashing? >> - Sagi, does it run NVMe workloads. >> - Does it pass the various user space rdma-core ping test commands? >> - Can it pass verbs fabtests from libfabric? > >Is someone planning to perform basic interop testing with h/w iWARP >implementations? Not a high bar, just due diligence. > > Well I did that myself, but third party results are obviously more relevant here ;) I tested with some generations of Chelsio adapters (T3 - T6). Since some hardware does not support larger than MTU frames, we disabled using GSO for now.
On 3/10/2019 7:44 PM, Parav Pandit wrote: > Hi Chuck, > >> -----Original Message----- >> From: Chuck Lever <chuck.lever@oracle.com> >> Sent: Sunday, March 10, 2019 5:30 PM >> To: Parav Pandit <parav@mellanox.com> >> Cc: Bernard Metzler <BMT@zurich.ibm.com>; Steve Wise >> <larrystevenwise@gmail.com>; linux-rdma@vger.kernel.org; Leon >> Romanovsky <leonro@mellanox.com>; Yuval Shaia >> <yuval.shaia@oracle.com> >> Subject: Re: [PATCH v5 00/13] SIW: Request for Comments >> >> Hey Parav- >> >>> On Mar 10, 2019, at 6:01 PM, Parav Pandit <parav@mellanox.com> wrote: >>> >>> Hi Bernard, >>> >>> >>>> -----Original Message----- >>>> From: Bernard Metzler <BMT@zurich.ibm.com> >>>> Sent: Sunday, March 10, 2019 4:41 PM >>>> To: Parav Pandit <parav@mellanox.com> >>>> Cc: Steve Wise <larrystevenwise@gmail.com>; >>>> linux-rdma@vger.kernel.org; Leon Romanovsky <leonro@mellanox.com>; >>>> Yuval Shaia <yuval.shaia@oracle.com> >>>> Subject: RE: [PATCH v5 00/13] SIW: Request for Comments >>>> >>>> Hi Parav, >>>> >>>> I understand your point of having some common driver infrastructure >>>> between rxe and siw. I remember I was even proposing rxe folks >>>> looking into that when rxe emerged after siw was already prototyped >>>> and open sourced (yes, siw was first out there). There was not much >>>> interest from rxe folks at that point in time, which is OK and >>>> understood (always cool to write something new and better from scratch, >> who wants iWarp, etc.). >>>> >>> I didn't follow that part of the history, but it clearly proves the point that it >> was started incorrectly. >>> With all the fancy code for doorbells, with mmap() call for QP, CQ, ground >> reality is, it doesn't even pass a developer's silly ib_write_bw today. >>> And at that test shows 2Gbps bw on 64Gb RAM with 48 core cpus! >>> >>>> At the current stage of the project, it is rather contra-productive >>>> to completely refactor siw to match with rxe, since I do not have the >>>> resources to rip apart the whole thing, and I am not yet convinced it >> makes sense. >>>> >>> But that seems to be right approach to have modular code and reuse >> existing pieces. >>> Leon, Yuval and others resonate that too. >> >> I agree that code sharing can be valuable. However... >> >> >>> If existing pieces are inferior or mis fit, their technical details has to be >> called out. >>> rdmavt improved and used library model to share code between two >> drivers. >>> >>>> If one asks for merging these days, one asks at first for a delay in >>>> getting siw accepted. >>>> >>> Yuval and Leon wants to improve rxe in general and have desire to reuse >> rxe. >>> They might be able to help you. Please check with them. >>> >>>> Over time, it might indeed be a good thing to merge parts. >>> It should happen from beginning. >> >> I disagree that merging siw should wait for code de-duplication with rxe. >> >> The current siw is coming close to being merge-ready. I would like to see it in >> the kernel as soon as possible. RoCE, hard or soft, requires DCE in order to >> be fully useful, and thus it will never be possible to have a software RoCE >> driver that can operate via any Ethernet driver on any network. That is not >> the case with iWARP, which relies on TCP's built-in congestion avoidance, >> which is truly routable and available everywhere. > > Soft roce driver faces soft cpu lockup in single system at 2Gbps bandwidth with just two QPs (send and receive). > :-) > Going across a system is brave world. > > But Leon and Yuval likes to see it enhanced/used in general instead of new driver. > And therefore, it extends to siw efforts too. > >> >> The ULPs need to have a robust generic in-tree software driver as soon as we >> can get it to enable development, testing, and deployment in environments >> that have no physical RDMA device. siw is the fastest way to get that. >> >> > I totally agree to your point. > In case if you didn't notice, a loopback driver [1] that outperforms soft_roce by 25x in performance and it is more stable. :-) > >>>> As said, I'd still like to investigate that. >>> That will be very helpful. >>> >>>> But let the sometimes intentionally different concepts of these two >>>> drivers co-exist for some time out there, and let's draw conclusions >>>> of what to make common later. >>>> >>> It won't happen later once it is in tree. :-) >> >> There is nothing stopping code de-duplication once siw is in the kernel. >> >> The most obvious counter-example of your claim is the in-kernel ULPs which >> have adopted new core APIs such as ib_alloc_cq long after the ULPs were >> accepted and merged. >> > It happened because of active maintainers like you, Sagi and many others. > >> >>>> And let's be clear, lots of things between RoCE and iWarp are very >> different. >>> All rest of the core components such as ib_core, addr, rdmacm are able to >> handle it two different transports. >>> Cavium or qedr driver is dual protocol driver that support RoCE and iWarp >> both in single driver. >>> >>>> Not only wire protocol and lower interfaces, but also semantically. >>>> These things will stay different by specification, and an efficient >>>> implementation of those things will likely still go it's own ways. >>>> >>> We would like to understand technically why rxe is mis fit for it, and why it >> cannot be improved. >>> Lack of resources is probably not good reason. >> >> Actually I find lack of resources to be a compelling argument. Before siw is >> merged, Bernard and his team are the only ones who can do this work, and >> they are not yet familiar with the rxe code base. >> > I want Leon to explicitly ACK this point and want to listen to him what he has to say. > Leon? > Is this compelling reason for siw? > And therefore for loopback driver too? > >> After merging, anyone in the Linux RDMA community can help with it, >> including you, the person who is asking for this extra effort. >> > I asked because Leon is asking that for another simplified driver proposal. > Most ULPs maintainers and users would agree that rxe driver is not able to satisfy their needs > (stability, performance). > And your feedback is obviously important here. > >> It is reasonable to request that common code be refactored. It is neither fair >> nor productive to gate merging siw on that work. > > I do not intent to gate merging siw work by any means. > I want community to reach consensus on - why should we merge a new driver even though technically its feasible to share a code with existing in-tree driver. > > And therefore, that same exact reason won't be limited to just siw driver. > >> Why force Bernard to bear the burden of fixing rxe? > > As per Leon, rxe seems stable enough that doesn't need fixing. > I infer that from his suggestion to enhance rxe instead of a new loopback driver for roce and IB. > So Bernard won't need to fix it based on Leon's and Yuval's comments. > > And if rxe is not good enough, it speaks for itself. > Leon? Yuval? > > If Leon and Yuval didn't ask to enhance soft_roce driver, my proposal is: > > 1. siw to provide a transport agnostic user verbs like [2] > And improve performance incrementally with core extensions. > This will be similar to Jen's io_uring work to provide vendor independent io cmd and cmpl queues for block ios. > Currently rxe and siw creates their own rings etc. > > 2. loopback driver for RoCE and IB to reuse [2]. > 3. soft_roce maintainer (?) to decide if they want to reuse code from siw or loopback > > [1] https://patchwork.kernel.org/patch/10831261/ > [2] https://patchwork.kernel.org/patch/10831251/ > >> >> Let's get siw merged, and then _everyone_ can have a say where it is >> valuable to share code. >> > Let's apply that to other driver proposed too? My 2 cents... If things compile and move, why not allow it into the tree, maybe on a provisional basis. If it stagnates we kick it out. In the past we used the staging tree but I don't think anyone wants to go down that road again. We don't need a big formal process. The bar is still higher than staging, in that code needs to be checkpatch, sparse, etc clean. While stuff sits in developers branches it just means its harder (not impossible though) for everyone to be involved. -Denny
On Tue, Mar 12, 2019 at 10:39:04AM -0400, Dennis Dalessandro wrote: > On 3/10/2019 7:44 PM, Parav Pandit wrote: > > Hi Chuck, > > > > > -----Original Message----- > > > From: Chuck Lever <chuck.lever@oracle.com> > > > Sent: Sunday, March 10, 2019 5:30 PM > > > To: Parav Pandit <parav@mellanox.com> > > > Cc: Bernard Metzler <BMT@zurich.ibm.com>; Steve Wise > > > <larrystevenwise@gmail.com>; linux-rdma@vger.kernel.org; Leon > > > Romanovsky <leonro@mellanox.com>; Yuval Shaia > > > <yuval.shaia@oracle.com> > > > Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > > > > > > Hey Parav- > > > > > > > On Mar 10, 2019, at 6:01 PM, Parav Pandit <parav@mellanox.com> wrote: > > > > > > > > Hi Bernard, > > > > > > > > > > > > > -----Original Message----- > > > > > From: Bernard Metzler <BMT@zurich.ibm.com> > > > > > Sent: Sunday, March 10, 2019 4:41 PM > > > > > To: Parav Pandit <parav@mellanox.com> > > > > > Cc: Steve Wise <larrystevenwise@gmail.com>; > > > > > linux-rdma@vger.kernel.org; Leon Romanovsky <leonro@mellanox.com>; > > > > > Yuval Shaia <yuval.shaia@oracle.com> > > > > > Subject: RE: [PATCH v5 00/13] SIW: Request for Comments > > > > > > > > > > Hi Parav, > > > > > > > > > > I understand your point of having some common driver infrastructure > > > > > between rxe and siw. I remember I was even proposing rxe folks > > > > > looking into that when rxe emerged after siw was already prototyped > > > > > and open sourced (yes, siw was first out there). There was not much > > > > > interest from rxe folks at that point in time, which is OK and > > > > > understood (always cool to write something new and better from scratch, > > > who wants iWarp, etc.). > > > > > > > > > I didn't follow that part of the history, but it clearly proves the point that it > > > was started incorrectly. > > > > With all the fancy code for doorbells, with mmap() call for QP, CQ, ground > > > reality is, it doesn't even pass a developer's silly ib_write_bw today. > > > > And at that test shows 2Gbps bw on 64Gb RAM with 48 core cpus! > > > > > > > > > At the current stage of the project, it is rather contra-productive > > > > > to completely refactor siw to match with rxe, since I do not have the > > > > > resources to rip apart the whole thing, and I am not yet convinced it > > > makes sense. > > > > > > > > > But that seems to be right approach to have modular code and reuse > > > existing pieces. > > > > Leon, Yuval and others resonate that too. > > > > > > I agree that code sharing can be valuable. However... > > > > > > > > > > If existing pieces are inferior or mis fit, their technical details has to be > > > called out. > > > > rdmavt improved and used library model to share code between two > > > drivers. > > > > > > > > > If one asks for merging these days, one asks at first for a delay in > > > > > getting siw accepted. > > > > > > > > > Yuval and Leon wants to improve rxe in general and have desire to reuse > > > rxe. > > > > They might be able to help you. Please check with them. > > > > > > > > > Over time, it might indeed be a good thing to merge parts. > > > > It should happen from beginning. > > > > > > I disagree that merging siw should wait for code de-duplication with rxe. > > > > > > The current siw is coming close to being merge-ready. I would like to see it in > > > the kernel as soon as possible. RoCE, hard or soft, requires DCE in order to > > > be fully useful, and thus it will never be possible to have a software RoCE > > > driver that can operate via any Ethernet driver on any network. That is not > > > the case with iWARP, which relies on TCP's built-in congestion avoidance, > > > which is truly routable and available everywhere. > > > > Soft roce driver faces soft cpu lockup in single system at 2Gbps bandwidth with just two QPs (send and receive). > > :-) > > Going across a system is brave world. > > > > But Leon and Yuval likes to see it enhanced/used in general instead of new driver. > > And therefore, it extends to siw efforts too. > > > > > > > > The ULPs need to have a robust generic in-tree software driver as soon as we > > > can get it to enable development, testing, and deployment in environments > > > that have no physical RDMA device. siw is the fastest way to get that. > > > > > > > > I totally agree to your point. > > In case if you didn't notice, a loopback driver [1] that outperforms soft_roce by 25x in performance and it is more stable. :-) > > > > > > > As said, I'd still like to investigate that. > > > > That will be very helpful. > > > > > > > > > But let the sometimes intentionally different concepts of these two > > > > > drivers co-exist for some time out there, and let's draw conclusions > > > > > of what to make common later. > > > > > > > > > It won't happen later once it is in tree. :-) > > > > > > There is nothing stopping code de-duplication once siw is in the kernel. > > > > > > The most obvious counter-example of your claim is the in-kernel ULPs which > > > have adopted new core APIs such as ib_alloc_cq long after the ULPs were > > > accepted and merged. > > > > > It happened because of active maintainers like you, Sagi and many others. > > > > > > > > > > And let's be clear, lots of things between RoCE and iWarp are very > > > different. > > > > All rest of the core components such as ib_core, addr, rdmacm are able to > > > handle it two different transports. > > > > Cavium or qedr driver is dual protocol driver that support RoCE and iWarp > > > both in single driver. > > > > > > > > > Not only wire protocol and lower interfaces, but also semantically. > > > > > These things will stay different by specification, and an efficient > > > > > implementation of those things will likely still go it's own ways. > > > > > > > > > We would like to understand technically why rxe is mis fit for it, and why it > > > cannot be improved. > > > > Lack of resources is probably not good reason. > > > > > > Actually I find lack of resources to be a compelling argument. Before siw is > > > merged, Bernard and his team are the only ones who can do this work, and > > > they are not yet familiar with the rxe code base. > > > > > I want Leon to explicitly ACK this point and want to listen to him what he has to say. > > Leon? > > Is this compelling reason for siw? > > And therefore for loopback driver too? > > > > > After merging, anyone in the Linux RDMA community can help with it, > > > including you, the person who is asking for this extra effort. > > > > > I asked because Leon is asking that for another simplified driver proposal. > > Most ULPs maintainers and users would agree that rxe driver is not able to satisfy their needs > > (stability, performance). > > And your feedback is obviously important here. > > > > > It is reasonable to request that common code be refactored. It is neither fair > > > nor productive to gate merging siw on that work. > > > > I do not intent to gate merging siw work by any means. > > I want community to reach consensus on - why should we merge a new driver even though technically its feasible to share a code with existing in-tree driver. > > > > And therefore, that same exact reason won't be limited to just siw driver. > > > > > Why force Bernard to bear the burden of fixing rxe? > > > > As per Leon, rxe seems stable enough that doesn't need fixing. > > I infer that from his suggestion to enhance rxe instead of a new loopback driver for roce and IB. > > So Bernard won't need to fix it based on Leon's and Yuval's comments. > > > > And if rxe is not good enough, it speaks for itself. > > Leon? Yuval? > > > > If Leon and Yuval didn't ask to enhance soft_roce driver, my proposal is: > > > > 1. siw to provide a transport agnostic user verbs like [2] > > And improve performance incrementally with core extensions. > > This will be similar to Jen's io_uring work to provide vendor independent io cmd and cmpl queues for block ios. > > Currently rxe and siw creates their own rings etc. > > > > 2. loopback driver for RoCE and IB to reuse [2]. > > 3. soft_roce maintainer (?) to decide if they want to reuse code from siw or loopback > > > > [1] https://patchwork.kernel.org/patch/10831261/ > > [2] https://patchwork.kernel.org/patch/10831251/ > > > > > > > > Let's get siw merged, and then _everyone_ can have a say where it is > > > valuable to share code. > > > > > Let's apply that to other driver proposed too? > > My 2 cents... > > If things compile and move, why not allow it into the tree. Because upstream != trash can. We want to see involvement, commitment and in some extent development roadmap. Thanks > > >
Hi Leon, > -----Original Message----- > From: Leon Romanovsky > Sent: Tuesday, March 12, 2019 9:42 AM > To: Dennis Dalessandro <dennis.dalessandro@intel.com> > Cc: Parav Pandit <parav@mellanox.com>; Chuck Lever > <chuck.lever@oracle.com>; Bernard Metzler <BMT@zurich.ibm.com>; Steve > Wise <larrystevenwise@gmail.com>; linux-rdma@vger.kernel.org; Yuval > Shaia <yuval.shaia@oracle.com> > Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > > On Tue, Mar 12, 2019 at 10:39:04AM -0400, Dennis Dalessandro wrote: > > On 3/10/2019 7:44 PM, Parav Pandit wrote: > > > Hi Chuck, > > > > > > > -----Original Message----- > > > > From: Chuck Lever <chuck.lever@oracle.com> > > > > Sent: Sunday, March 10, 2019 5:30 PM > > > > To: Parav Pandit <parav@mellanox.com> > > > > Cc: Bernard Metzler <BMT@zurich.ibm.com>; Steve Wise > > > > <larrystevenwise@gmail.com>; linux-rdma@vger.kernel.org; Leon > > > > Romanovsky <leonro@mellanox.com>; Yuval Shaia > > > > <yuval.shaia@oracle.com> > > > > Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > > > > > > > > Hey Parav- > > > > > > > > > On Mar 10, 2019, at 6:01 PM, Parav Pandit <parav@mellanox.com> > wrote: > > > > > > > > > > Hi Bernard, > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > From: Bernard Metzler <BMT@zurich.ibm.com> > > > > > > Sent: Sunday, March 10, 2019 4:41 PM > > > > > > To: Parav Pandit <parav@mellanox.com> > > > > > > Cc: Steve Wise <larrystevenwise@gmail.com>; > > > > > > linux-rdma@vger.kernel.org; Leon Romanovsky > > > > > > <leonro@mellanox.com>; Yuval Shaia <yuval.shaia@oracle.com> > > > > > > Subject: RE: [PATCH v5 00/13] SIW: Request for Comments > > > > > > > > > > > > Hi Parav, > > > > > > > > > > > > I understand your point of having some common driver > > > > > > infrastructure between rxe and siw. I remember I was even > > > > > > proposing rxe folks looking into that when rxe emerged after > > > > > > siw was already prototyped and open sourced (yes, siw was > > > > > > first out there). There was not much interest from rxe folks > > > > > > at that point in time, which is OK and understood (always cool > > > > > > to write something new and better from scratch, > > > > who wants iWarp, etc.). > > > > > > > > > > > I didn't follow that part of the history, but it clearly proves > > > > > the point that it > > > > was started incorrectly. > > > > > With all the fancy code for doorbells, with mmap() call for QP, > > > > > CQ, ground > > > > reality is, it doesn't even pass a developer's silly ib_write_bw today. > > > > > And at that test shows 2Gbps bw on 64Gb RAM with 48 core cpus! > > > > > > > > > > > At the current stage of the project, it is rather > > > > > > contra-productive to completely refactor siw to match with > > > > > > rxe, since I do not have the resources to rip apart the whole > > > > > > thing, and I am not yet convinced it > > > > makes sense. > > > > > > > > > > > But that seems to be right approach to have modular code and > > > > > reuse > > > > existing pieces. > > > > > Leon, Yuval and others resonate that too. > > > > > > > > I agree that code sharing can be valuable. However... > > > > > > > > > > > > > If existing pieces are inferior or mis fit, their technical > > > > > details has to be > > > > called out. > > > > > rdmavt improved and used library model to share code between two > > > > drivers. > > > > > > > > > > > If one asks for merging these days, one asks at first for a > > > > > > delay in getting siw accepted. > > > > > > > > > > > Yuval and Leon wants to improve rxe in general and have desire > > > > > to reuse > > > > rxe. > > > > > They might be able to help you. Please check with them. > > > > > > > > > > > Over time, it might indeed be a good thing to merge parts. > > > > > It should happen from beginning. > > > > > > > > I disagree that merging siw should wait for code de-duplication with > rxe. > > > > > > > > The current siw is coming close to being merge-ready. I would like > > > > to see it in the kernel as soon as possible. RoCE, hard or soft, > > > > requires DCE in order to be fully useful, and thus it will never > > > > be possible to have a software RoCE driver that can operate via > > > > any Ethernet driver on any network. That is not the case with > > > > iWARP, which relies on TCP's built-in congestion avoidance, which is > truly routable and available everywhere. > > > > > > Soft roce driver faces soft cpu lockup in single system at 2Gbps > bandwidth with just two QPs (send and receive). > > > :-) > > > Going across a system is brave world. > > > > > > But Leon and Yuval likes to see it enhanced/used in general instead of > new driver. > > > And therefore, it extends to siw efforts too. > > > > > > > > > > > The ULPs need to have a robust generic in-tree software driver as > > > > soon as we can get it to enable development, testing, and > > > > deployment in environments that have no physical RDMA device. siw is > the fastest way to get that. > > > > > > > > > > > I totally agree to your point. > > > In case if you didn't notice, a loopback driver [1] that outperforms > > > soft_roce by 25x in performance and it is more stable. :-) > > > > > > > > > As said, I'd still like to investigate that. > > > > > That will be very helpful. > > > > > > > > > > > But let the sometimes intentionally different concepts of > > > > > > these two drivers co-exist for some time out there, and let's > > > > > > draw conclusions of what to make common later. > > > > > > > > > > > It won't happen later once it is in tree. :-) > > > > > > > > There is nothing stopping code de-duplication once siw is in the kernel. > > > > > > > > The most obvious counter-example of your claim is the in-kernel > > > > ULPs which have adopted new core APIs such as ib_alloc_cq long > > > > after the ULPs were accepted and merged. > > > > > > > It happened because of active maintainers like you, Sagi and many > others. > > > > > > > > > > > > > And let's be clear, lots of things between RoCE and iWarp are > > > > > > very > > > > different. > > > > > All rest of the core components such as ib_core, addr, rdmacm > > > > > are able to > > > > handle it two different transports. > > > > > Cavium or qedr driver is dual protocol driver that support RoCE > > > > > and iWarp > > > > both in single driver. > > > > > > > > > > > Not only wire protocol and lower interfaces, but also semantically. > > > > > > These things will stay different by specification, and an > > > > > > efficient implementation of those things will likely still go it's own > ways. > > > > > > > > > > > We would like to understand technically why rxe is mis fit for > > > > > it, and why it > > > > cannot be improved. > > > > > Lack of resources is probably not good reason. > > > > > > > > Actually I find lack of resources to be a compelling argument. > > > > Before siw is merged, Bernard and his team are the only ones who > > > > can do this work, and they are not yet familiar with the rxe code base. > > > > > > > I want Leon to explicitly ACK this point and want to listen to him what he > has to say. > > > Leon? > > > Is this compelling reason for siw? > > > And therefore for loopback driver too? > > > > > > > After merging, anyone in the Linux RDMA community can help with > > > > it, including you, the person who is asking for this extra effort. > > > > > > > I asked because Leon is asking that for another simplified driver > proposal. > > > Most ULPs maintainers and users would agree that rxe driver is not > > > able to satisfy their needs (stability, performance). > > > And your feedback is obviously important here. > > > > > > > It is reasonable to request that common code be refactored. It is > > > > neither fair nor productive to gate merging siw on that work. > > > > > > I do not intent to gate merging siw work by any means. > > > I want community to reach consensus on - why should we merge a new > driver even though technically its feasible to share a code with existing in- > tree driver. > > > > > > And therefore, that same exact reason won't be limited to just siw driver. > > > > > > > Why force Bernard to bear the burden of fixing rxe? > > > > > > As per Leon, rxe seems stable enough that doesn't need fixing. > > > I infer that from his suggestion to enhance rxe instead of a new loopback > driver for roce and IB. > > > So Bernard won't need to fix it based on Leon's and Yuval's comments. > > > > > > And if rxe is not good enough, it speaks for itself. > > > Leon? Yuval? > > > > > > If Leon and Yuval didn't ask to enhance soft_roce driver, my proposal is: > > > > > > 1. siw to provide a transport agnostic user verbs like [2] And > > > improve performance incrementally with core extensions. > > > This will be similar to Jen's io_uring work to provide vendor independent > io cmd and cmpl queues for block ios. > > > Currently rxe and siw creates their own rings etc. > > > > > > 2. loopback driver for RoCE and IB to reuse [2]. > > > 3. soft_roce maintainer (?) to decide if they want to reuse code > > > from siw or loopback > > > > > > [1] https://patchwork.kernel.org/patch/10831261/ > > > [2] https://patchwork.kernel.org/patch/10831251/ > > > > > > > > > > > Let's get siw merged, and then _everyone_ can have a say where it > > > > is valuable to share code. > > > > > > > Let's apply that to other driver proposed too? > > > > My 2 cents... > > > > If things compile and move, why not allow it into the tree. > > Because upstream != trash can. We want to see involvement, commitment > and in some extent development roadmap. > I assume this and Dennis's comment is for loopback driver. Commitment: Since I had to travel, let's have the internal meeting for commitment on 3/18. We must have stable email in MAINTAINER's file who will actually provide timely fixes and review incoming enhancements. Roadmap: I see following minimum features in initial version. 1. Instead of experimental patches, stable version of patches RC QP, GSI QP, reg_mr, FR_MR, send_with_invalidate, invalidate support. After first merge, since user explicitly requested below two items, 2. bind mw, xrc qp. above with initially RoCEv2 transport with two GID entries, one for IPv4, one for IPv6. 3. Expose loopback device immutable property so that core doesn't create compat devices. 4. IB support I haven't figured out yet on how to support QP-0. Configuration of link layer or other params via configfs similar to evolving drivers/block/null_blk_main.c. I wanted to change link type via devlink which support this, but sadly devlink doesn't allow device registration which doesn't have bus. Netlink command to this driver for driver parameter, but don't know value of it, and your inputs would be nice. 5. If there is actual user request to support iWarp loopback by extending ibdev->ibcm callbacks or other things. Haven't done the design of this yet. Minimal tests: 1. perftest (bw, lat), rping with -a and infinite commands 2. nvme fabrics with ext4 mount with 64 qps using fio 3. rdma_resource_lat tests time based tests to flush out memory leaks etc 4. mpirun basic benchmarks All on more than single cpu system. Other tests that you or others think are good addition. If ULP maintainers provide some scripts.. Its long time I did srp or nfs-rdma tinkering.
On Wed, Mar 13, 2019 at 06:04:40AM +0000, Parav Pandit wrote: > Hi Leon, > > > -----Original Message----- > > From: Leon Romanovsky > > Sent: Tuesday, March 12, 2019 9:42 AM > > To: Dennis Dalessandro <dennis.dalessandro@intel.com> > > Cc: Parav Pandit <parav@mellanox.com>; Chuck Lever > > <chuck.lever@oracle.com>; Bernard Metzler <BMT@zurich.ibm.com>; Steve > > Wise <larrystevenwise@gmail.com>; linux-rdma@vger.kernel.org; Yuval > > Shaia <yuval.shaia@oracle.com> > > Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > > > > On Tue, Mar 12, 2019 at 10:39:04AM -0400, Dennis Dalessandro wrote: > > > On 3/10/2019 7:44 PM, Parav Pandit wrote: > > > > Hi Chuck, > > > > > > > > > -----Original Message----- > > > > > From: Chuck Lever <chuck.lever@oracle.com> > > > > > Sent: Sunday, March 10, 2019 5:30 PM > > > > > To: Parav Pandit <parav@mellanox.com> > > > > > Cc: Bernard Metzler <BMT@zurich.ibm.com>; Steve Wise > > > > > <larrystevenwise@gmail.com>; linux-rdma@vger.kernel.org; Leon > > > > > Romanovsky <leonro@mellanox.com>; Yuval Shaia > > > > > <yuval.shaia@oracle.com> > > > > > Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > > > > > > > > > > Hey Parav- > > > > > > > > > > > On Mar 10, 2019, at 6:01 PM, Parav Pandit <parav@mellanox.com> > > wrote: > > > > > > > > > > > > Hi Bernard, > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > From: Bernard Metzler <BMT@zurich.ibm.com> > > > > > > > Sent: Sunday, March 10, 2019 4:41 PM > > > > > > > To: Parav Pandit <parav@mellanox.com> > > > > > > > Cc: Steve Wise <larrystevenwise@gmail.com>; > > > > > > > linux-rdma@vger.kernel.org; Leon Romanovsky > > > > > > > <leonro@mellanox.com>; Yuval Shaia <yuval.shaia@oracle.com> > > > > > > > Subject: RE: [PATCH v5 00/13] SIW: Request for Comments > > > > > > > > > > > > > > Hi Parav, > > > > > > > > > > > > > > I understand your point of having some common driver > > > > > > > infrastructure between rxe and siw. I remember I was even > > > > > > > proposing rxe folks looking into that when rxe emerged after > > > > > > > siw was already prototyped and open sourced (yes, siw was > > > > > > > first out there). There was not much interest from rxe folks > > > > > > > at that point in time, which is OK and understood (always cool > > > > > > > to write something new and better from scratch, > > > > > who wants iWarp, etc.). > > > > > > > > > > > > > I didn't follow that part of the history, but it clearly proves > > > > > > the point that it > > > > > was started incorrectly. > > > > > > With all the fancy code for doorbells, with mmap() call for QP, > > > > > > CQ, ground > > > > > reality is, it doesn't even pass a developer's silly ib_write_bw today. > > > > > > And at that test shows 2Gbps bw on 64Gb RAM with 48 core cpus! > > > > > > > > > > > > > At the current stage of the project, it is rather > > > > > > > contra-productive to completely refactor siw to match with > > > > > > > rxe, since I do not have the resources to rip apart the whole > > > > > > > thing, and I am not yet convinced it > > > > > makes sense. > > > > > > > > > > > > > But that seems to be right approach to have modular code and > > > > > > reuse > > > > > existing pieces. > > > > > > Leon, Yuval and others resonate that too. > > > > > > > > > > I agree that code sharing can be valuable. However... > > > > > > > > > > > > > > > > If existing pieces are inferior or mis fit, their technical > > > > > > details has to be > > > > > called out. > > > > > > rdmavt improved and used library model to share code between two > > > > > drivers. > > > > > > > > > > > > > If one asks for merging these days, one asks at first for a > > > > > > > delay in getting siw accepted. > > > > > > > > > > > > > Yuval and Leon wants to improve rxe in general and have desire > > > > > > to reuse > > > > > rxe. > > > > > > They might be able to help you. Please check with them. > > > > > > > > > > > > > Over time, it might indeed be a good thing to merge parts. > > > > > > It should happen from beginning. > > > > > > > > > > I disagree that merging siw should wait for code de-duplication with > > rxe. > > > > > > > > > > The current siw is coming close to being merge-ready. I would like > > > > > to see it in the kernel as soon as possible. RoCE, hard or soft, > > > > > requires DCE in order to be fully useful, and thus it will never > > > > > be possible to have a software RoCE driver that can operate via > > > > > any Ethernet driver on any network. That is not the case with > > > > > iWARP, which relies on TCP's built-in congestion avoidance, which is > > truly routable and available everywhere. > > > > > > > > Soft roce driver faces soft cpu lockup in single system at 2Gbps > > bandwidth with just two QPs (send and receive). > > > > :-) > > > > Going across a system is brave world. > > > > > > > > But Leon and Yuval likes to see it enhanced/used in general instead of > > new driver. > > > > And therefore, it extends to siw efforts too. > > > > > > > > > > > > > > The ULPs need to have a robust generic in-tree software driver as > > > > > soon as we can get it to enable development, testing, and > > > > > deployment in environments that have no physical RDMA device. siw is > > the fastest way to get that. > > > > > > > > > > > > > > I totally agree to your point. > > > > In case if you didn't notice, a loopback driver [1] that outperforms > > > > soft_roce by 25x in performance and it is more stable. :-) > > > > > > > > > > > As said, I'd still like to investigate that. > > > > > > That will be very helpful. > > > > > > > > > > > > > But let the sometimes intentionally different concepts of > > > > > > > these two drivers co-exist for some time out there, and let's > > > > > > > draw conclusions of what to make common later. > > > > > > > > > > > > > It won't happen later once it is in tree. :-) > > > > > > > > > > There is nothing stopping code de-duplication once siw is in the kernel. > > > > > > > > > > The most obvious counter-example of your claim is the in-kernel > > > > > ULPs which have adopted new core APIs such as ib_alloc_cq long > > > > > after the ULPs were accepted and merged. > > > > > > > > > It happened because of active maintainers like you, Sagi and many > > others. > > > > > > > > > > > > > > > > And let's be clear, lots of things between RoCE and iWarp are > > > > > > > very > > > > > different. > > > > > > All rest of the core components such as ib_core, addr, rdmacm > > > > > > are able to > > > > > handle it two different transports. > > > > > > Cavium or qedr driver is dual protocol driver that support RoCE > > > > > > and iWarp > > > > > both in single driver. > > > > > > > > > > > > > Not only wire protocol and lower interfaces, but also semantically. > > > > > > > These things will stay different by specification, and an > > > > > > > efficient implementation of those things will likely still go it's own > > ways. > > > > > > > > > > > > > We would like to understand technically why rxe is mis fit for > > > > > > it, and why it > > > > > cannot be improved. > > > > > > Lack of resources is probably not good reason. > > > > > > > > > > Actually I find lack of resources to be a compelling argument. > > > > > Before siw is merged, Bernard and his team are the only ones who > > > > > can do this work, and they are not yet familiar with the rxe code base. > > > > > > > > > I want Leon to explicitly ACK this point and want to listen to him what he > > has to say. > > > > Leon? > > > > Is this compelling reason for siw? > > > > And therefore for loopback driver too? > > > > > > > > > After merging, anyone in the Linux RDMA community can help with > > > > > it, including you, the person who is asking for this extra effort. > > > > > > > > > I asked because Leon is asking that for another simplified driver > > proposal. > > > > Most ULPs maintainers and users would agree that rxe driver is not > > > > able to satisfy their needs (stability, performance). > > > > And your feedback is obviously important here. > > > > > > > > > It is reasonable to request that common code be refactored. It is > > > > > neither fair nor productive to gate merging siw on that work. > > > > > > > > I do not intent to gate merging siw work by any means. > > > > I want community to reach consensus on - why should we merge a new > > driver even though technically its feasible to share a code with existing in- > > tree driver. > > > > > > > > And therefore, that same exact reason won't be limited to just siw driver. > > > > > > > > > Why force Bernard to bear the burden of fixing rxe? > > > > > > > > As per Leon, rxe seems stable enough that doesn't need fixing. > > > > I infer that from his suggestion to enhance rxe instead of a new loopback > > driver for roce and IB. > > > > So Bernard won't need to fix it based on Leon's and Yuval's comments. > > > > > > > > And if rxe is not good enough, it speaks for itself. > > > > Leon? Yuval? > > > > > > > > If Leon and Yuval didn't ask to enhance soft_roce driver, my proposal is: > > > > > > > > 1. siw to provide a transport agnostic user verbs like [2] And > > > > improve performance incrementally with core extensions. > > > > This will be similar to Jen's io_uring work to provide vendor independent > > io cmd and cmpl queues for block ios. > > > > Currently rxe and siw creates their own rings etc. > > > > > > > > 2. loopback driver for RoCE and IB to reuse [2]. > > > > 3. soft_roce maintainer (?) to decide if they want to reuse code > > > > from siw or loopback > > > > > > > > [1] https://patchwork.kernel.org/patch/10831261/ > > > > [2] https://patchwork.kernel.org/patch/10831251/ > > > > > > > > > > > > > > Let's get siw merged, and then _everyone_ can have a say where it > > > > > is valuable to share code. > > > > > > > > > Let's apply that to other driver proposed too? > > > > > > My 2 cents... > > > > > > If things compile and move, why not allow it into the tree. > > > > Because upstream != trash can. We want to see involvement, commitment > > and in some extent development roadmap. > > > I assume this and Dennis's comment is for loopback driver. > > Commitment: > Since I had to travel, let's have the internal meeting for commitment on 3/18. > We must have stable email in MAINTAINER's file who will actually provide timely fixes and review incoming enhancements. Please take it offline. Mellanox employees should refrain from public discussions of commitments and schedules. All below should be in RXE/SIW. Thanks > > Roadmap: > I see following minimum features in initial version. > 1. Instead of experimental patches, stable version of patches > RC QP, GSI QP, reg_mr, FR_MR, send_with_invalidate, invalidate support. > > After first merge, since user explicitly requested below two items, > 2. bind mw, xrc qp. > > above with initially RoCEv2 transport with two GID entries, one for IPv4, one for IPv6. > > 3. Expose loopback device immutable property so that core doesn't create compat devices. > > 4. IB support > I haven't figured out yet on how to support QP-0. > Configuration of link layer or other params via configfs similar to evolving drivers/block/null_blk_main.c. > I wanted to change link type via devlink which support this, but sadly devlink doesn't allow device registration which doesn't have bus. > Netlink command to this driver for driver parameter, but don't know value of it, and your inputs would be nice. > > 5. If there is actual user request to support iWarp loopback by extending ibdev->ibcm callbacks or other things. > Haven't done the design of this yet. > > Minimal tests: > 1. perftest (bw, lat), rping with -a and infinite commands > 2. nvme fabrics with ext4 mount with 64 qps using fio > 3. rdma_resource_lat tests time based tests to flush out memory leaks etc > 4. mpirun basic benchmarks > > All on more than single cpu system. > Other tests that you or others think are good addition. > If ULP maintainers provide some scripts.. Its long time I did srp or nfs-rdma tinkering.
On 3/12/2019 12:42 PM, Leon Romanovsky wrote: > On Tue, Mar 12, 2019 at 10:39:04AM -0400, Dennis Dalessandro wrote: >> On 3/10/2019 7:44 PM, Parav Pandit wrote: >>> Hi Chuck, >>> >>>> -----Original Message----- >>>> From: Chuck Lever <chuck.lever@oracle.com> >>>> Sent: Sunday, March 10, 2019 5:30 PM >>>> To: Parav Pandit <parav@mellanox.com> >>>> Cc: Bernard Metzler <BMT@zurich.ibm.com>; Steve Wise >>>> <larrystevenwise@gmail.com>; linux-rdma@vger.kernel.org; Leon >>>> Romanovsky <leonro@mellanox.com>; Yuval Shaia >>>> <yuval.shaia@oracle.com> >>>> Subject: Re: [PATCH v5 00/13] SIW: Request for Comments >>>> >>>> Hey Parav- >>>> >>>>> On Mar 10, 2019, at 6:01 PM, Parav Pandit <parav@mellanox.com> wrote: >>>>> >>>>> Hi Bernard, >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: Bernard Metzler <BMT@zurich.ibm.com> >>>>>> Sent: Sunday, March 10, 2019 4:41 PM >>>>>> To: Parav Pandit <parav@mellanox.com> >>>>>> Cc: Steve Wise <larrystevenwise@gmail.com>; >>>>>> linux-rdma@vger.kernel.org; Leon Romanovsky <leonro@mellanox.com>; >>>>>> Yuval Shaia <yuval.shaia@oracle.com> >>>>>> Subject: RE: [PATCH v5 00/13] SIW: Request for Comments >>>>>> >>>>>> Hi Parav, >>>>>> >>>>>> I understand your point of having some common driver infrastructure >>>>>> between rxe and siw. I remember I was even proposing rxe folks >>>>>> looking into that when rxe emerged after siw was already prototyped >>>>>> and open sourced (yes, siw was first out there). There was not much >>>>>> interest from rxe folks at that point in time, which is OK and >>>>>> understood (always cool to write something new and better from scratch, >>>> who wants iWarp, etc.). >>>>>> >>>>> I didn't follow that part of the history, but it clearly proves the point that it >>>> was started incorrectly. >>>>> With all the fancy code for doorbells, with mmap() call for QP, CQ, ground >>>> reality is, it doesn't even pass a developer's silly ib_write_bw today. >>>>> And at that test shows 2Gbps bw on 64Gb RAM with 48 core cpus! >>>>> >>>>>> At the current stage of the project, it is rather contra-productive >>>>>> to completely refactor siw to match with rxe, since I do not have the >>>>>> resources to rip apart the whole thing, and I am not yet convinced it >>>> makes sense. >>>>>> >>>>> But that seems to be right approach to have modular code and reuse >>>> existing pieces. >>>>> Leon, Yuval and others resonate that too. >>>> >>>> I agree that code sharing can be valuable. However... >>>> >>>> >>>>> If existing pieces are inferior or mis fit, their technical details has to be >>>> called out. >>>>> rdmavt improved and used library model to share code between two >>>> drivers. >>>>> >>>>>> If one asks for merging these days, one asks at first for a delay in >>>>>> getting siw accepted. >>>>>> >>>>> Yuval and Leon wants to improve rxe in general and have desire to reuse >>>> rxe. >>>>> They might be able to help you. Please check with them. >>>>> >>>>>> Over time, it might indeed be a good thing to merge parts. >>>>> It should happen from beginning. >>>> >>>> I disagree that merging siw should wait for code de-duplication with rxe. >>>> >>>> The current siw is coming close to being merge-ready. I would like to see it in >>>> the kernel as soon as possible. RoCE, hard or soft, requires DCE in order to >>>> be fully useful, and thus it will never be possible to have a software RoCE >>>> driver that can operate via any Ethernet driver on any network. That is not >>>> the case with iWARP, which relies on TCP's built-in congestion avoidance, >>>> which is truly routable and available everywhere. >>> >>> Soft roce driver faces soft cpu lockup in single system at 2Gbps bandwidth with just two QPs (send and receive). >>> :-) >>> Going across a system is brave world. >>> >>> But Leon and Yuval likes to see it enhanced/used in general instead of new driver. >>> And therefore, it extends to siw efforts too. >>> >>>> >>>> The ULPs need to have a robust generic in-tree software driver as soon as we >>>> can get it to enable development, testing, and deployment in environments >>>> that have no physical RDMA device. siw is the fastest way to get that. >>>> >>>> >>> I totally agree to your point. >>> In case if you didn't notice, a loopback driver [1] that outperforms soft_roce by 25x in performance and it is more stable. :-) >>> >>>>>> As said, I'd still like to investigate that. >>>>> That will be very helpful. >>>>> >>>>>> But let the sometimes intentionally different concepts of these two >>>>>> drivers co-exist for some time out there, and let's draw conclusions >>>>>> of what to make common later. >>>>>> >>>>> It won't happen later once it is in tree. :-) >>>> >>>> There is nothing stopping code de-duplication once siw is in the kernel. >>>> >>>> The most obvious counter-example of your claim is the in-kernel ULPs which >>>> have adopted new core APIs such as ib_alloc_cq long after the ULPs were >>>> accepted and merged. >>>> >>> It happened because of active maintainers like you, Sagi and many others. >>> >>>> >>>>>> And let's be clear, lots of things between RoCE and iWarp are very >>>> different. >>>>> All rest of the core components such as ib_core, addr, rdmacm are able to >>>> handle it two different transports. >>>>> Cavium or qedr driver is dual protocol driver that support RoCE and iWarp >>>> both in single driver. >>>>> >>>>>> Not only wire protocol and lower interfaces, but also semantically. >>>>>> These things will stay different by specification, and an efficient >>>>>> implementation of those things will likely still go it's own ways. >>>>>> >>>>> We would like to understand technically why rxe is mis fit for it, and why it >>>> cannot be improved. >>>>> Lack of resources is probably not good reason. >>>> >>>> Actually I find lack of resources to be a compelling argument. Before siw is >>>> merged, Bernard and his team are the only ones who can do this work, and >>>> they are not yet familiar with the rxe code base. >>>> >>> I want Leon to explicitly ACK this point and want to listen to him what he has to say. >>> Leon? >>> Is this compelling reason for siw? >>> And therefore for loopback driver too? >>> >>>> After merging, anyone in the Linux RDMA community can help with it, >>>> including you, the person who is asking for this extra effort. >>>> >>> I asked because Leon is asking that for another simplified driver proposal. >>> Most ULPs maintainers and users would agree that rxe driver is not able to satisfy their needs >>> (stability, performance). >>> And your feedback is obviously important here. >>> >>>> It is reasonable to request that common code be refactored. It is neither fair >>>> nor productive to gate merging siw on that work. >>> >>> I do not intent to gate merging siw work by any means. >>> I want community to reach consensus on - why should we merge a new driver even though technically its feasible to share a code with existing in-tree driver. >>> >>> And therefore, that same exact reason won't be limited to just siw driver. >>> >>>> Why force Bernard to bear the burden of fixing rxe? >>> >>> As per Leon, rxe seems stable enough that doesn't need fixing. >>> I infer that from his suggestion to enhance rxe instead of a new loopback driver for roce and IB. >>> So Bernard won't need to fix it based on Leon's and Yuval's comments. >>> >>> And if rxe is not good enough, it speaks for itself. >>> Leon? Yuval? >>> >>> If Leon and Yuval didn't ask to enhance soft_roce driver, my proposal is: >>> >>> 1. siw to provide a transport agnostic user verbs like [2] >>> And improve performance incrementally with core extensions. >>> This will be similar to Jen's io_uring work to provide vendor independent io cmd and cmpl queues for block ios. >>> Currently rxe and siw creates their own rings etc. >>> >>> 2. loopback driver for RoCE and IB to reuse [2]. >>> 3. soft_roce maintainer (?) to decide if they want to reuse code from siw or loopback >>> >>> [1] https://patchwork.kernel.org/patch/10831261/ >>> [2] https://patchwork.kernel.org/patch/10831251/ >>> >>>> >>>> Let's get siw merged, and then _everyone_ can have a say where it is >>>> valuable to share code. >>>> >>> Let's apply that to other driver proposed too? >> >> My 2 cents... >> >> If things compile and move, why not allow it into the tree. > > Because upstream != trash can. We want to see involvement, commitment > and in some extent development roadmap. First off I never suggested that, not in the slightest. Secondly it's not fair to label either siw or loopback drivers as trash. They both are cleanly (arguably) done, they both function. Can they run NVMEoF? If so then according to Jason [1] that is meeting the unambiguous low bar. To me that means they are supporting enough to be considered a serious driver and brought into the sub system for further development. [1] https://marc.info/?l=linux-rdma&m=154403516727177&w=2 -Denny
-----"Dennis Dalessandro" <dennis.dalessandro@intel.com> wrote: ----- >To: "Leon Romanovsky" <leonro@mellanox.com> >From: "Dennis Dalessandro" <dennis.dalessandro@intel.com> >Date: 03/13/2019 04:31PM >Cc: "Parav Pandit" <parav@mellanox.com>, "Chuck Lever" ><chuck.lever@oracle.com>, "Bernard Metzler" <BMT@zurich.ibm.com>, >"Steve Wise" <larrystevenwise@gmail.com>, >"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>, "Yuval >Shaia" <yuval.shaia@oracle.com> >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > >On 3/12/2019 12:42 PM, Leon Romanovsky wrote: >> On Tue, Mar 12, 2019 at 10:39:04AM -0400, Dennis Dalessandro wrote: >>> On 3/10/2019 7:44 PM, Parav Pandit wrote: >>>> Hi Chuck, >>>> >>>>> -----Original Message----- >>>>> From: Chuck Lever <chuck.lever@oracle.com> >>>>> Sent: Sunday, March 10, 2019 5:30 PM >>>>> To: Parav Pandit <parav@mellanox.com> >>>>> Cc: Bernard Metzler <BMT@zurich.ibm.com>; Steve Wise >>>>> <larrystevenwise@gmail.com>; linux-rdma@vger.kernel.org; Leon >>>>> Romanovsky <leonro@mellanox.com>; Yuval Shaia >>>>> <yuval.shaia@oracle.com> >>>>> Subject: Re: [PATCH v5 00/13] SIW: Request for Comments >>>>> >>>>> Hey Parav- >>>>> >>>>>> On Mar 10, 2019, at 6:01 PM, Parav Pandit <parav@mellanox.com> >wrote: >>>>>> >>>>>> Hi Bernard, >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Bernard Metzler <BMT@zurich.ibm.com> >>>>>>> Sent: Sunday, March 10, 2019 4:41 PM >>>>>>> To: Parav Pandit <parav@mellanox.com> >>>>>>> Cc: Steve Wise <larrystevenwise@gmail.com>; >>>>>>> linux-rdma@vger.kernel.org; Leon Romanovsky ><leonro@mellanox.com>; >>>>>>> Yuval Shaia <yuval.shaia@oracle.com> >>>>>>> Subject: RE: [PATCH v5 00/13] SIW: Request for Comments >>>>>>> >>>>>>> Hi Parav, >>>>>>> >>>>>>> I understand your point of having some common driver >infrastructure >>>>>>> between rxe and siw. I remember I was even proposing rxe folks >>>>>>> looking into that when rxe emerged after siw was already >prototyped >>>>>>> and open sourced (yes, siw was first out there). There was not >much >>>>>>> interest from rxe folks at that point in time, which is OK and >>>>>>> understood (always cool to write something new and better from >scratch, >>>>> who wants iWarp, etc.). >>>>>>> >>>>>> I didn't follow that part of the history, but it clearly proves >the point that it >>>>> was started incorrectly. >>>>>> With all the fancy code for doorbells, with mmap() call for QP, >CQ, ground >>>>> reality is, it doesn't even pass a developer's silly ib_write_bw >today. >>>>>> And at that test shows 2Gbps bw on 64Gb RAM with 48 core cpus! >>>>>> >>>>>>> At the current stage of the project, it is rather >contra-productive >>>>>>> to completely refactor siw to match with rxe, since I do not >have the >>>>>>> resources to rip apart the whole thing, and I am not yet >convinced it >>>>> makes sense. >>>>>>> >>>>>> But that seems to be right approach to have modular code and >reuse >>>>> existing pieces. >>>>>> Leon, Yuval and others resonate that too. >>>>> >>>>> I agree that code sharing can be valuable. However... >>>>> >>>>> >>>>>> If existing pieces are inferior or mis fit, their technical >details has to be >>>>> called out. >>>>>> rdmavt improved and used library model to share code between >two >>>>> drivers. >>>>>> >>>>>>> If one asks for merging these days, one asks at first for a >delay in >>>>>>> getting siw accepted. >>>>>>> >>>>>> Yuval and Leon wants to improve rxe in general and have desire >to reuse >>>>> rxe. >>>>>> They might be able to help you. Please check with them. >>>>>> >>>>>>> Over time, it might indeed be a good thing to merge parts. >>>>>> It should happen from beginning. >>>>> >>>>> I disagree that merging siw should wait for code de-duplication >with rxe. >>>>> >>>>> The current siw is coming close to being merge-ready. I would >like to see it in >>>>> the kernel as soon as possible. RoCE, hard or soft, requires DCE >in order to >>>>> be fully useful, and thus it will never be possible to have a >software RoCE >>>>> driver that can operate via any Ethernet driver on any network. >That is not >>>>> the case with iWARP, which relies on TCP's built-in congestion >avoidance, >>>>> which is truly routable and available everywhere. >>>> >>>> Soft roce driver faces soft cpu lockup in single system at 2Gbps >bandwidth with just two QPs (send and receive). >>>> :-) >>>> Going across a system is brave world. >>>> >>>> But Leon and Yuval likes to see it enhanced/used in general >instead of new driver. >>>> And therefore, it extends to siw efforts too. >>>> >>>>> >>>>> The ULPs need to have a robust generic in-tree software driver >as soon as we >>>>> can get it to enable development, testing, and deployment in >environments >>>>> that have no physical RDMA device. siw is the fastest way to get >that. >>>>> >>>>> >>>> I totally agree to your point. >>>> In case if you didn't notice, a loopback driver [1] that >outperforms soft_roce by 25x in performance and it is more stable. >:-) >>>> >>>>>>> As said, I'd still like to investigate that. >>>>>> That will be very helpful. >>>>>> >>>>>>> But let the sometimes intentionally different concepts of >these two >>>>>>> drivers co-exist for some time out there, and let's draw >conclusions >>>>>>> of what to make common later. >>>>>>> >>>>>> It won't happen later once it is in tree. :-) >>>>> >>>>> There is nothing stopping code de-duplication once siw is in the >kernel. >>>>> >>>>> The most obvious counter-example of your claim is the in-kernel >ULPs which >>>>> have adopted new core APIs such as ib_alloc_cq long after the >ULPs were >>>>> accepted and merged. >>>>> >>>> It happened because of active maintainers like you, Sagi and many >others. >>>> >>>>> >>>>>>> And let's be clear, lots of things between RoCE and iWarp are >very >>>>> different. >>>>>> All rest of the core components such as ib_core, addr, rdmacm >are able to >>>>> handle it two different transports. >>>>>> Cavium or qedr driver is dual protocol driver that support RoCE >and iWarp >>>>> both in single driver. >>>>>> >>>>>>> Not only wire protocol and lower interfaces, but also >semantically. >>>>>>> These things will stay different by specification, and an >efficient >>>>>>> implementation of those things will likely still go it's own >ways. >>>>>>> >>>>>> We would like to understand technically why rxe is mis fit for >it, and why it >>>>> cannot be improved. >>>>>> Lack of resources is probably not good reason. >>>>> >>>>> Actually I find lack of resources to be a compelling argument. >Before siw is >>>>> merged, Bernard and his team are the only ones who can do this >work, and >>>>> they are not yet familiar with the rxe code base. >>>>> >>>> I want Leon to explicitly ACK this point and want to listen to >him what he has to say. >>>> Leon? >>>> Is this compelling reason for siw? >>>> And therefore for loopback driver too? >>>> >>>>> After merging, anyone in the Linux RDMA community can help with >it, >>>>> including you, the person who is asking for this extra effort. >>>>> >>>> I asked because Leon is asking that for another simplified driver >proposal. >>>> Most ULPs maintainers and users would agree that rxe driver is >not able to satisfy their needs >>>> (stability, performance). >>>> And your feedback is obviously important here. >>>> >>>>> It is reasonable to request that common code be refactored. It >is neither fair >>>>> nor productive to gate merging siw on that work. >>>> >>>> I do not intent to gate merging siw work by any means. >>>> I want community to reach consensus on - why should we merge a >new driver even though technically its feasible to share a code with >existing in-tree driver. >>>> >>>> And therefore, that same exact reason won't be limited to just >siw driver. >>>> >>>>> Why force Bernard to bear the burden of fixing rxe? >>>> >>>> As per Leon, rxe seems stable enough that doesn't need fixing. >>>> I infer that from his suggestion to enhance rxe instead of a new >loopback driver for roce and IB. >>>> So Bernard won't need to fix it based on Leon's and Yuval's >comments. >>>> >>>> And if rxe is not good enough, it speaks for itself. >>>> Leon? Yuval? >>>> >>>> If Leon and Yuval didn't ask to enhance soft_roce driver, my >proposal is: >>>> >>>> 1. siw to provide a transport agnostic user verbs like [2] >>>> And improve performance incrementally with core extensions. >>>> This will be similar to Jen's io_uring work to provide vendor >independent io cmd and cmpl queues for block ios. >>>> Currently rxe and siw creates their own rings etc. >>>> >>>> 2. loopback driver for RoCE and IB to reuse [2]. >>>> 3. soft_roce maintainer (?) to decide if they want to reuse code >from siw or loopback >>>> >>>> [1] >https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel >.org_patch_10831261_&d=DwIC-g&c=jf_iaSHvJObTbx-siA1ZOg&r=2TaYXQ0T-r8Z >O1PP1alNwU_QJcRRLfmYTAgd3QCvqSc&m=_08EePmMUq466MpLUdKBkqpiAQwZO6-v3vQ >V8lJGy8Q&s=PNoLlcMS2y11ULloi6Tk8J1q2ccevH1tuvXbZ_gA16Y&e= >>>> [2] >https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel >.org_patch_10831251_&d=DwIC-g&c=jf_iaSHvJObTbx-siA1ZOg&r=2TaYXQ0T-r8Z >O1PP1alNwU_QJcRRLfmYTAgd3QCvqSc&m=_08EePmMUq466MpLUdKBkqpiAQwZO6-v3vQ >V8lJGy8Q&s=5q96F0D8jXv81QEhEfvz90wh6CxdS6TFbqOSz8YzPnw&e= >>>> >>>>> >>>>> Let's get siw merged, and then _everyone_ can have a say where >it is >>>>> valuable to share code. >>>>> >>>> Let's apply that to other driver proposed too? >>> >>> My 2 cents... >>> >>> If things compile and move, why not allow it into the tree. >> >> Because upstream != trash can. We want to see involvement, >commitment >> and in some extent development roadmap. > >First off I never suggested that, not in the slightest. Secondly it's > >not fair to label either siw or loopback drivers as trash. They both >are >cleanly (arguably) done, they both function. > >Can they run NVMEoF? If so then according to Jason [1] that is >meeting >the unambiguous low bar. To me that means they are supporting enough >to >be considered a serious driver and brought into the sub system for >further development. > I speak for myself and for siw development only here. I do this for years as a side project. We used that driver a lot in the past for RDMA application development, it is used in classes at ETH Zurich university, there are even some companies out there using it for their business. I am committed to further development and maintenance of software based RDMA. I know sometimes it takes to long before I response/fix. Like these days where I am completely consumed by my daily business. It is not really laziness, but constant lack of time to pet my pet project... One reason I am very much interested in getting it upstream is, to better align it with a moving target: the RDMA mid-layer. I spent lots of time with fixing the stand alone siw open source project for all that changes over the years. I want to be better integrated with the RDMA development community. Yes, and I hope for a code quality improvement as well. Based on community feedback, patches. I think this humble piece of code is not to bad to get improved by folks who want the functionality it provides. Thanks Bernard.
On Wed, Mar 13, 2019 at 04:17:22PM +0000, Bernard Metzler wrote: > -----"Dennis Dalessandro" <dennis.dalessandro@intel.com> wrote: ----- > > >To: "Leon Romanovsky" <leonro@mellanox.com> > >From: "Dennis Dalessandro" <dennis.dalessandro@intel.com> > >Date: 03/13/2019 04:31PM > >Cc: "Parav Pandit" <parav@mellanox.com>, "Chuck Lever" > ><chuck.lever@oracle.com>, "Bernard Metzler" <BMT@zurich.ibm.com>, > >"Steve Wise" <larrystevenwise@gmail.com>, > >"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>, "Yuval > >Shaia" <yuval.shaia@oracle.com> > >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > > > >On 3/12/2019 12:42 PM, Leon Romanovsky wrote: > >> On Tue, Mar 12, 2019 at 10:39:04AM -0400, Dennis Dalessandro wrote: > >>> On 3/10/2019 7:44 PM, Parav Pandit wrote: > >>>> Hi Chuck, > >>>> > >>>>> -----Original Message----- > >>>>> From: Chuck Lever <chuck.lever@oracle.com> > >>>>> Sent: Sunday, March 10, 2019 5:30 PM > >>>>> To: Parav Pandit <parav@mellanox.com> > >>>>> Cc: Bernard Metzler <BMT@zurich.ibm.com>; Steve Wise > >>>>> <larrystevenwise@gmail.com>; linux-rdma@vger.kernel.org; Leon > >>>>> Romanovsky <leonro@mellanox.com>; Yuval Shaia > >>>>> <yuval.shaia@oracle.com> > >>>>> Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > >>>>> > >>>>> Hey Parav- > >>>>> > >>>>>> On Mar 10, 2019, at 6:01 PM, Parav Pandit <parav@mellanox.com> > >wrote: > >>>>>> > >>>>>> Hi Bernard, > >>>>>> > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Bernard Metzler <BMT@zurich.ibm.com> > >>>>>>> Sent: Sunday, March 10, 2019 4:41 PM > >>>>>>> To: Parav Pandit <parav@mellanox.com> > >>>>>>> Cc: Steve Wise <larrystevenwise@gmail.com>; > >>>>>>> linux-rdma@vger.kernel.org; Leon Romanovsky > ><leonro@mellanox.com>; > >>>>>>> Yuval Shaia <yuval.shaia@oracle.com> > >>>>>>> Subject: RE: [PATCH v5 00/13] SIW: Request for Comments > >>>>>>> > >>>>>>> Hi Parav, > >>>>>>> > >>>>>>> I understand your point of having some common driver > >infrastructure > >>>>>>> between rxe and siw. I remember I was even proposing rxe folks > >>>>>>> looking into that when rxe emerged after siw was already > >prototyped > >>>>>>> and open sourced (yes, siw was first out there). There was not > >much > >>>>>>> interest from rxe folks at that point in time, which is OK and > >>>>>>> understood (always cool to write something new and better from > >scratch, > >>>>> who wants iWarp, etc.). > >>>>>>> > >>>>>> I didn't follow that part of the history, but it clearly proves > >the point that it > >>>>> was started incorrectly. > >>>>>> With all the fancy code for doorbells, with mmap() call for QP, > >CQ, ground > >>>>> reality is, it doesn't even pass a developer's silly ib_write_bw > >today. > >>>>>> And at that test shows 2Gbps bw on 64Gb RAM with 48 core cpus! > >>>>>> > >>>>>>> At the current stage of the project, it is rather > >contra-productive > >>>>>>> to completely refactor siw to match with rxe, since I do not > >have the > >>>>>>> resources to rip apart the whole thing, and I am not yet > >convinced it > >>>>> makes sense. > >>>>>>> > >>>>>> But that seems to be right approach to have modular code and > >reuse > >>>>> existing pieces. > >>>>>> Leon, Yuval and others resonate that too. > >>>>> > >>>>> I agree that code sharing can be valuable. However... > >>>>> > >>>>> > >>>>>> If existing pieces are inferior or mis fit, their technical > >details has to be > >>>>> called out. > >>>>>> rdmavt improved and used library model to share code between > >two > >>>>> drivers. > >>>>>> > >>>>>>> If one asks for merging these days, one asks at first for a > >delay in > >>>>>>> getting siw accepted. > >>>>>>> > >>>>>> Yuval and Leon wants to improve rxe in general and have desire > >to reuse > >>>>> rxe. > >>>>>> They might be able to help you. Please check with them. > >>>>>> > >>>>>>> Over time, it might indeed be a good thing to merge parts. > >>>>>> It should happen from beginning. > >>>>> > >>>>> I disagree that merging siw should wait for code de-duplication > >with rxe. > >>>>> > >>>>> The current siw is coming close to being merge-ready. I would > >like to see it in > >>>>> the kernel as soon as possible. RoCE, hard or soft, requires DCE > >in order to > >>>>> be fully useful, and thus it will never be possible to have a > >software RoCE > >>>>> driver that can operate via any Ethernet driver on any network. > >That is not > >>>>> the case with iWARP, which relies on TCP's built-in congestion > >avoidance, > >>>>> which is truly routable and available everywhere. > >>>> > >>>> Soft roce driver faces soft cpu lockup in single system at 2Gbps > >bandwidth with just two QPs (send and receive). > >>>> :-) > >>>> Going across a system is brave world. > >>>> > >>>> But Leon and Yuval likes to see it enhanced/used in general > >instead of new driver. > >>>> And therefore, it extends to siw efforts too. > >>>> > >>>>> > >>>>> The ULPs need to have a robust generic in-tree software driver > >as soon as we > >>>>> can get it to enable development, testing, and deployment in > >environments > >>>>> that have no physical RDMA device. siw is the fastest way to get > >that. > >>>>> > >>>>> > >>>> I totally agree to your point. > >>>> In case if you didn't notice, a loopback driver [1] that > >outperforms soft_roce by 25x in performance and it is more stable. > >:-) > >>>> > >>>>>>> As said, I'd still like to investigate that. > >>>>>> That will be very helpful. > >>>>>> > >>>>>>> But let the sometimes intentionally different concepts of > >these two > >>>>>>> drivers co-exist for some time out there, and let's draw > >conclusions > >>>>>>> of what to make common later. > >>>>>>> > >>>>>> It won't happen later once it is in tree. :-) > >>>>> > >>>>> There is nothing stopping code de-duplication once siw is in the > >kernel. > >>>>> > >>>>> The most obvious counter-example of your claim is the in-kernel > >ULPs which > >>>>> have adopted new core APIs such as ib_alloc_cq long after the > >ULPs were > >>>>> accepted and merged. > >>>>> > >>>> It happened because of active maintainers like you, Sagi and many > >others. > >>>> > >>>>> > >>>>>>> And let's be clear, lots of things between RoCE and iWarp are > >very > >>>>> different. > >>>>>> All rest of the core components such as ib_core, addr, rdmacm > >are able to > >>>>> handle it two different transports. > >>>>>> Cavium or qedr driver is dual protocol driver that support RoCE > >and iWarp > >>>>> both in single driver. > >>>>>> > >>>>>>> Not only wire protocol and lower interfaces, but also > >semantically. > >>>>>>> These things will stay different by specification, and an > >efficient > >>>>>>> implementation of those things will likely still go it's own > >ways. > >>>>>>> > >>>>>> We would like to understand technically why rxe is mis fit for > >it, and why it > >>>>> cannot be improved. > >>>>>> Lack of resources is probably not good reason. > >>>>> > >>>>> Actually I find lack of resources to be a compelling argument. > >Before siw is > >>>>> merged, Bernard and his team are the only ones who can do this > >work, and > >>>>> they are not yet familiar with the rxe code base. > >>>>> > >>>> I want Leon to explicitly ACK this point and want to listen to > >him what he has to say. > >>>> Leon? > >>>> Is this compelling reason for siw? > >>>> And therefore for loopback driver too? > >>>> > >>>>> After merging, anyone in the Linux RDMA community can help with > >it, > >>>>> including you, the person who is asking for this extra effort. > >>>>> > >>>> I asked because Leon is asking that for another simplified driver > >proposal. > >>>> Most ULPs maintainers and users would agree that rxe driver is > >not able to satisfy their needs > >>>> (stability, performance). > >>>> And your feedback is obviously important here. > >>>> > >>>>> It is reasonable to request that common code be refactored. It > >is neither fair > >>>>> nor productive to gate merging siw on that work. > >>>> > >>>> I do not intent to gate merging siw work by any means. > >>>> I want community to reach consensus on - why should we merge a > >new driver even though technically its feasible to share a code with > >existing in-tree driver. > >>>> > >>>> And therefore, that same exact reason won't be limited to just > >siw driver. > >>>> > >>>>> Why force Bernard to bear the burden of fixing rxe? > >>>> > >>>> As per Leon, rxe seems stable enough that doesn't need fixing. > >>>> I infer that from his suggestion to enhance rxe instead of a new > >loopback driver for roce and IB. > >>>> So Bernard won't need to fix it based on Leon's and Yuval's > >comments. > >>>> > >>>> And if rxe is not good enough, it speaks for itself. > >>>> Leon? Yuval? > >>>> > >>>> If Leon and Yuval didn't ask to enhance soft_roce driver, my > >proposal is: > >>>> > >>>> 1. siw to provide a transport agnostic user verbs like [2] > >>>> And improve performance incrementally with core extensions. > >>>> This will be similar to Jen's io_uring work to provide vendor > >independent io cmd and cmpl queues for block ios. > >>>> Currently rxe and siw creates their own rings etc. > >>>> > >>>> 2. loopback driver for RoCE and IB to reuse [2]. > >>>> 3. soft_roce maintainer (?) to decide if they want to reuse code > >from siw or loopback > >>>> > >>>> [1] > >https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel > >.org_patch_10831261_&d=DwIC-g&c=jf_iaSHvJObTbx-siA1ZOg&r=2TaYXQ0T-r8Z > >O1PP1alNwU_QJcRRLfmYTAgd3QCvqSc&m=_08EePmMUq466MpLUdKBkqpiAQwZO6-v3vQ > >V8lJGy8Q&s=PNoLlcMS2y11ULloi6Tk8J1q2ccevH1tuvXbZ_gA16Y&e= > >>>> [2] > >https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel > >.org_patch_10831251_&d=DwIC-g&c=jf_iaSHvJObTbx-siA1ZOg&r=2TaYXQ0T-r8Z > >O1PP1alNwU_QJcRRLfmYTAgd3QCvqSc&m=_08EePmMUq466MpLUdKBkqpiAQwZO6-v3vQ > >V8lJGy8Q&s=5q96F0D8jXv81QEhEfvz90wh6CxdS6TFbqOSz8YzPnw&e= > >>>> > >>>>> > >>>>> Let's get siw merged, and then _everyone_ can have a say where > >it is > >>>>> valuable to share code. > >>>>> > >>>> Let's apply that to other driver proposed too? > >>> > >>> My 2 cents... > >>> > >>> If things compile and move, why not allow it into the tree. > >> > >> Because upstream != trash can. We want to see involvement, > >commitment > >> and in some extent development roadmap. > > > >First off I never suggested that, not in the slightest. Secondly it's > > > >not fair to label either siw or loopback drivers as trash. They both > >are > >cleanly (arguably) done, they both function. > > > >Can they run NVMEoF? If so then according to Jason [1] that is > >meeting > >the unambiguous low bar. To me that means they are supporting enough > >to > >be considered a serious driver and brought into the sub system for > >further development. > > > > I speak for myself and for siw development only here. I do this for > years as a side project. We used that driver a lot in the past for > RDMA application development, it is used in classes at ETH Zurich > university, there are even some companies out there using it for their > business. > > I am committed to further development and maintenance of software > based RDMA. I know sometimes it takes to long before I response/fix. > Like these days where I am completely consumed by my daily business. > It is not really laziness, but constant lack of time to pet my > pet project... One reason I am very much interested in getting it > upstream is, to better align it with a moving target: the RDMA > mid-layer. I spent lots of time with fixing the stand alone siw open > source project for all that changes over the years. I want to be > better integrated with the RDMA development community. > Yes, and I hope for a code quality improvement as well. Based on community > feedback, patches. I think this humble piece of code is not to bad to > get improved by folks who want the functionality it provides. Guys, I didn't mean anything bad about siw or loopback. Just responded on the Dennis's sentence "If things compile and move, why not allow it into the tree." I'm pretty convinced that SIW will be part of next kernel release. Sorry if I was too rude. Thanks > > Thanks > Bernard. >
-----"Leon Romanovsky" <leonro@mellanox.com> wrote: ----- >To: "Bernard Metzler" <BMT@zurich.ibm.com> >From: "Leon Romanovsky" <leonro@mellanox.com> >Date: 03/13/2019 05:25PM >Cc: "Dennis Dalessandro" <dennis.dalessandro@intel.com>, "Parav >Pandit" <parav@mellanox.com>, "Chuck Lever" <chuck.lever@oracle.com>, >"Steve Wise" <larrystevenwise@gmail.com>, >"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>, "Yuval >Shaia" <yuval.shaia@oracle.com> >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > >On Wed, Mar 13, 2019 at 04:17:22PM +0000, Bernard Metzler wrote: >> -----"Dennis Dalessandro" <dennis.dalessandro@intel.com> wrote: >----- >> >> >To: "Leon Romanovsky" <leonro@mellanox.com> >> >From: "Dennis Dalessandro" <dennis.dalessandro@intel.com> >> >Date: 03/13/2019 04:31PM >> >Cc: "Parav Pandit" <parav@mellanox.com>, "Chuck Lever" >> ><chuck.lever@oracle.com>, "Bernard Metzler" <BMT@zurich.ibm.com>, >> >"Steve Wise" <larrystevenwise@gmail.com>, >> >"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>, "Yuval >> >Shaia" <yuval.shaia@oracle.com> >> >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments >> > >> >On 3/12/2019 12:42 PM, Leon Romanovsky wrote: >> >> On Tue, Mar 12, 2019 at 10:39:04AM -0400, Dennis Dalessandro >wrote: >> >>> On 3/10/2019 7:44 PM, Parav Pandit wrote: >> >>>> Hi Chuck, >> >>>> >> >>>>> -----Original Message----- >> >>>>> From: Chuck Lever <chuck.lever@oracle.com> >> >>>>> Sent: Sunday, March 10, 2019 5:30 PM >> >>>>> To: Parav Pandit <parav@mellanox.com> >> >>>>> Cc: Bernard Metzler <BMT@zurich.ibm.com>; Steve Wise >> >>>>> <larrystevenwise@gmail.com>; linux-rdma@vger.kernel.org; Leon >> >>>>> Romanovsky <leonro@mellanox.com>; Yuval Shaia >> >>>>> <yuval.shaia@oracle.com> >> >>>>> Subject: Re: [PATCH v5 00/13] SIW: Request for Comments >> >>>>> >> >>>>> Hey Parav- >> >>>>> >> >>>>>> On Mar 10, 2019, at 6:01 PM, Parav Pandit ><parav@mellanox.com> >> >wrote: >> >>>>>> >> >>>>>> Hi Bernard, >> >>>>>> >> >>>>>> >> >>>>>>> -----Original Message----- >> >>>>>>> From: Bernard Metzler <BMT@zurich.ibm.com> >> >>>>>>> Sent: Sunday, March 10, 2019 4:41 PM >> >>>>>>> To: Parav Pandit <parav@mellanox.com> >> >>>>>>> Cc: Steve Wise <larrystevenwise@gmail.com>; >> >>>>>>> linux-rdma@vger.kernel.org; Leon Romanovsky >> ><leonro@mellanox.com>; >> >>>>>>> Yuval Shaia <yuval.shaia@oracle.com> >> >>>>>>> Subject: RE: [PATCH v5 00/13] SIW: Request for Comments >> >>>>>>> >> >>>>>>> Hi Parav, >> >>>>>>> >> >>>>>>> I understand your point of having some common driver >> >infrastructure >> >>>>>>> between rxe and siw. I remember I was even proposing rxe >folks >> >>>>>>> looking into that when rxe emerged after siw was already >> >prototyped >> >>>>>>> and open sourced (yes, siw was first out there). There was >not >> >much >> >>>>>>> interest from rxe folks at that point in time, which is OK >and >> >>>>>>> understood (always cool to write something new and better >from >> >scratch, >> >>>>> who wants iWarp, etc.). >> >>>>>>> >> >>>>>> I didn't follow that part of the history, but it clearly >proves >> >the point that it >> >>>>> was started incorrectly. >> >>>>>> With all the fancy code for doorbells, with mmap() call for >QP, >> >CQ, ground >> >>>>> reality is, it doesn't even pass a developer's silly >ib_write_bw >> >today. >> >>>>>> And at that test shows 2Gbps bw on 64Gb RAM with 48 core >cpus! >> >>>>>> >> >>>>>>> At the current stage of the project, it is rather >> >contra-productive >> >>>>>>> to completely refactor siw to match with rxe, since I do >not >> >have the >> >>>>>>> resources to rip apart the whole thing, and I am not yet >> >convinced it >> >>>>> makes sense. >> >>>>>>> >> >>>>>> But that seems to be right approach to have modular code and >> >reuse >> >>>>> existing pieces. >> >>>>>> Leon, Yuval and others resonate that too. >> >>>>> >> >>>>> I agree that code sharing can be valuable. However... >> >>>>> >> >>>>> >> >>>>>> If existing pieces are inferior or mis fit, their technical >> >details has to be >> >>>>> called out. >> >>>>>> rdmavt improved and used library model to share code between >> >two >> >>>>> drivers. >> >>>>>> >> >>>>>>> If one asks for merging these days, one asks at first for a >> >delay in >> >>>>>>> getting siw accepted. >> >>>>>>> >> >>>>>> Yuval and Leon wants to improve rxe in general and have >desire >> >to reuse >> >>>>> rxe. >> >>>>>> They might be able to help you. Please check with them. >> >>>>>> >> >>>>>>> Over time, it might indeed be a good thing to merge parts. >> >>>>>> It should happen from beginning. >> >>>>> >> >>>>> I disagree that merging siw should wait for code >de-duplication >> >with rxe. >> >>>>> >> >>>>> The current siw is coming close to being merge-ready. I would >> >like to see it in >> >>>>> the kernel as soon as possible. RoCE, hard or soft, requires >DCE >> >in order to >> >>>>> be fully useful, and thus it will never be possible to have a >> >software RoCE >> >>>>> driver that can operate via any Ethernet driver on any >network. >> >That is not >> >>>>> the case with iWARP, which relies on TCP's built-in >congestion >> >avoidance, >> >>>>> which is truly routable and available everywhere. >> >>>> >> >>>> Soft roce driver faces soft cpu lockup in single system at >2Gbps >> >bandwidth with just two QPs (send and receive). >> >>>> :-) >> >>>> Going across a system is brave world. >> >>>> >> >>>> But Leon and Yuval likes to see it enhanced/used in general >> >instead of new driver. >> >>>> And therefore, it extends to siw efforts too. >> >>>> >> >>>>> >> >>>>> The ULPs need to have a robust generic in-tree software >driver >> >as soon as we >> >>>>> can get it to enable development, testing, and deployment in >> >environments >> >>>>> that have no physical RDMA device. siw is the fastest way to >get >> >that. >> >>>>> >> >>>>> >> >>>> I totally agree to your point. >> >>>> In case if you didn't notice, a loopback driver [1] that >> >outperforms soft_roce by 25x in performance and it is more stable. >> >:-) >> >>>> >> >>>>>>> As said, I'd still like to investigate that. >> >>>>>> That will be very helpful. >> >>>>>> >> >>>>>>> But let the sometimes intentionally different concepts of >> >these two >> >>>>>>> drivers co-exist for some time out there, and let's draw >> >conclusions >> >>>>>>> of what to make common later. >> >>>>>>> >> >>>>>> It won't happen later once it is in tree. :-) >> >>>>> >> >>>>> There is nothing stopping code de-duplication once siw is in >the >> >kernel. >> >>>>> >> >>>>> The most obvious counter-example of your claim is the >in-kernel >> >ULPs which >> >>>>> have adopted new core APIs such as ib_alloc_cq long after the >> >ULPs were >> >>>>> accepted and merged. >> >>>>> >> >>>> It happened because of active maintainers like you, Sagi and >many >> >others. >> >>>> >> >>>>> >> >>>>>>> And let's be clear, lots of things between RoCE and iWarp >are >> >very >> >>>>> different. >> >>>>>> All rest of the core components such as ib_core, addr, >rdmacm >> >are able to >> >>>>> handle it two different transports. >> >>>>>> Cavium or qedr driver is dual protocol driver that support >RoCE >> >and iWarp >> >>>>> both in single driver. >> >>>>>> >> >>>>>>> Not only wire protocol and lower interfaces, but also >> >semantically. >> >>>>>>> These things will stay different by specification, and an >> >efficient >> >>>>>>> implementation of those things will likely still go it's >own >> >ways. >> >>>>>>> >> >>>>>> We would like to understand technically why rxe is mis fit >for >> >it, and why it >> >>>>> cannot be improved. >> >>>>>> Lack of resources is probably not good reason. >> >>>>> >> >>>>> Actually I find lack of resources to be a compelling >argument. >> >Before siw is >> >>>>> merged, Bernard and his team are the only ones who can do >this >> >work, and >> >>>>> they are not yet familiar with the rxe code base. >> >>>>> >> >>>> I want Leon to explicitly ACK this point and want to listen to >> >him what he has to say. >> >>>> Leon? >> >>>> Is this compelling reason for siw? >> >>>> And therefore for loopback driver too? >> >>>> >> >>>>> After merging, anyone in the Linux RDMA community can help >with >> >it, >> >>>>> including you, the person who is asking for this extra >effort. >> >>>>> >> >>>> I asked because Leon is asking that for another simplified >driver >> >proposal. >> >>>> Most ULPs maintainers and users would agree that rxe driver is >> >not able to satisfy their needs >> >>>> (stability, performance). >> >>>> And your feedback is obviously important here. >> >>>> >> >>>>> It is reasonable to request that common code be refactored. >It >> >is neither fair >> >>>>> nor productive to gate merging siw on that work. >> >>>> >> >>>> I do not intent to gate merging siw work by any means. >> >>>> I want community to reach consensus on - why should we merge a >> >new driver even though technically its feasible to share a code >with >> >existing in-tree driver. >> >>>> >> >>>> And therefore, that same exact reason won't be limited to just >> >siw driver. >> >>>> >> >>>>> Why force Bernard to bear the burden of fixing rxe? >> >>>> >> >>>> As per Leon, rxe seems stable enough that doesn't need fixing. >> >>>> I infer that from his suggestion to enhance rxe instead of a >new >> >loopback driver for roce and IB. >> >>>> So Bernard won't need to fix it based on Leon's and Yuval's >> >comments. >> >>>> >> >>>> And if rxe is not good enough, it speaks for itself. >> >>>> Leon? Yuval? >> >>>> >> >>>> If Leon and Yuval didn't ask to enhance soft_roce driver, my >> >proposal is: >> >>>> >> >>>> 1. siw to provide a transport agnostic user verbs like [2] >> >>>> And improve performance incrementally with core extensions. >> >>>> This will be similar to Jen's io_uring work to provide vendor >> >independent io cmd and cmpl queues for block ios. >> >>>> Currently rxe and siw creates their own rings etc. >> >>>> >> >>>> 2. loopback driver for RoCE and IB to reuse [2]. >> >>>> 3. soft_roce maintainer (?) to decide if they want to reuse >code >> >from siw or loopback >> >>>> >> >>>> [1] >> >>https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kerne >l >> >>.org_patch_10831261_&d=DwIC-g&c=jf_iaSHvJObTbx-siA1ZOg&r=2TaYXQ0T-r8 >Z >> >>O1PP1alNwU_QJcRRLfmYTAgd3QCvqSc&m=_08EePmMUq466MpLUdKBkqpiAQwZO6-v3v >Q >> >V8lJGy8Q&s=PNoLlcMS2y11ULloi6Tk8J1q2ccevH1tuvXbZ_gA16Y&e= >> >>>> [2] >> >>https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kerne >l >> >>.org_patch_10831251_&d=DwIC-g&c=jf_iaSHvJObTbx-siA1ZOg&r=2TaYXQ0T-r8 >Z >> >>O1PP1alNwU_QJcRRLfmYTAgd3QCvqSc&m=_08EePmMUq466MpLUdKBkqpiAQwZO6-v3v >Q >> >V8lJGy8Q&s=5q96F0D8jXv81QEhEfvz90wh6CxdS6TFbqOSz8YzPnw&e= >> >>>> >> >>>>> >> >>>>> Let's get siw merged, and then _everyone_ can have a say >where >> >it is >> >>>>> valuable to share code. >> >>>>> >> >>>> Let's apply that to other driver proposed too? >> >>> >> >>> My 2 cents... >> >>> >> >>> If things compile and move, why not allow it into the tree. >> >> >> >> Because upstream != trash can. We want to see involvement, >> >commitment >> >> and in some extent development roadmap. >> > >> >First off I never suggested that, not in the slightest. Secondly >it's >> > >> >not fair to label either siw or loopback drivers as trash. They >both >> >are >> >cleanly (arguably) done, they both function. >> > >> >Can they run NVMEoF? If so then according to Jason [1] that is >> >meeting >> >the unambiguous low bar. To me that means they are supporting >enough >> >to >> >be considered a serious driver and brought into the sub system for >> >further development. >> > >> >> I speak for myself and for siw development only here. I do this for >> years as a side project. We used that driver a lot in the past for >> RDMA application development, it is used in classes at ETH Zurich >> university, there are even some companies out there using it for >their >> business. >> >> I am committed to further development and maintenance of software >> based RDMA. I know sometimes it takes to long before I >response/fix. >> Like these days where I am completely consumed by my daily >business. >> It is not really laziness, but constant lack of time to pet my >> pet project... One reason I am very much interested in getting it >> upstream is, to better align it with a moving target: the RDMA >> mid-layer. I spent lots of time with fixing the stand alone siw >open >> source project for all that changes over the years. I want to be >> better integrated with the RDMA development community. >> Yes, and I hope for a code quality improvement as well. Based on >community >> feedback, patches. I think this humble piece of code is not to bad >to >> get improved by folks who want the functionality it provides. > >Guys, > >I didn't mean anything bad about siw or loopback. Just responded on >the >Dennis's sentence "If things compile and move, why not allow it into >the tree." > >I'm pretty convinced that SIW will be part of next kernel release. > >Sorry if I was too rude. > >Thanks > Hi Leon, this is very much appreciated. And I completely agree with a demand for high code quality. Maybe even especially true for software which is not bound to certain hardware - where one could assume constant maintenance interest over some longer time period. Best regards, Bernard.
On Wed, Mar 13, 2019 at 11:31:41AM -0400, Dennis Dalessandro wrote: > Can they run NVMEoF? If so then according to Jason [1] that is meeting the > unambiguous low bar. .. to be considered as part of the RDMA subsystem :) That was not a bar to show sufficient testing/quality/whatever. At plumbgers we agreed the biggest problem with rxe is all the bugs. How can we decide if siw has less bugs than rxe? Jason
-----"Jason Gunthorpe" <jgg@ziepe.ca> wrote: ----- >To: "Dennis Dalessandro" <dennis.dalessandro@intel.com> >From: "Jason Gunthorpe" <jgg@ziepe.ca> >Date: 03/13/2019 07:13PM >Cc: "Leon Romanovsky" <leonro@mellanox.com>, "Parav Pandit" ><parav@mellanox.com>, "Chuck Lever" <chuck.lever@oracle.com>, >"Bernard Metzler" <BMT@zurich.ibm.com>, "Steve Wise" ><larrystevenwise@gmail.com>, "linux-rdma@vger.kernel.org" ><linux-rdma@vger.kernel.org>, "Yuval Shaia" <yuval.shaia@oracle.com> >Subject: Re: [PATCH v5 00/13] SIW: Request for Comments > >On Wed, Mar 13, 2019 at 11:31:41AM -0400, Dennis Dalessandro wrote: > >> Can they run NVMEoF? If so then according to Jason [1] that is >meeting the >> unambiguous low bar. > >.. to be considered as part of the RDMA subsystem :) That was not a >bar to show sufficient testing/quality/whatever. > >At plumbgers we agreed the biggest problem with rxe is all the >bugs. How can we decide if siw has less bugs than rxe? > >Jason > Let me rebase to for-next from git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git and send a new version with the fixes? Hope to find time tomorrow. This may make it easier to test. Thanks Bernard.