Message ID | 20170529082423.1180-1-leon@kernel.org (mailing list archive) |
---|---|
State | Rejected |
Headers | show |
On Mon, May 29, 2017 at 11:24:23AM +0300, Leon Romanovsky wrote: > From: Leon Romanovsky <leonro@melalnox.com> > > The commit cea05eadded0 ("IB/core: Add flow control to the portmapper netlink calls") > changed netlink to be blocked for all RDMA clients. This workaround > worked perfectly for portmapper, but is not correct for the whole > NETLINK_RDMA family. > > The request/response should always be blocking and asynchronous > notification should always be non-blocking. > > It is library and user-space application to chose how to handle recvmgs, > as an example see nl_recvmsgs() and nl_socket_set_nonblocking() calls of > libnl library. > > Send timeout is not needed too and can be configured with SO_SNDTIMEO socket > option. > > This reverts commit cea05eadded0d4eb59f7be6e1f1560eb6bfde2bf. > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com> This commit was accepted with much feedback from you. https://patchwork.kernel.org/patch/9235137/ Not clear from your description to determine what your trying to fix. Simply reverting a commit that solves a real problem is not acceptable. Please come up with a solution without breaking portmapper. Shiraz -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 30, 2017 at 04:24:31PM -0500, Shiraz Saleem wrote: > On Mon, May 29, 2017 at 11:24:23AM +0300, Leon Romanovsky wrote: > > From: Leon Romanovsky <leonro@melalnox.com> > > > > The commit cea05eadded0 ("IB/core: Add flow control to the portmapper netlink calls") > > changed netlink to be blocked for all RDMA clients. This workaround > > worked perfectly for portmapper, but is not correct for the whole > > NETLINK_RDMA family. > > > > The request/response should always be blocking and asynchronous > > notification should always be non-blocking. > > > > It is library and user-space application to chose how to handle recvmgs, > > as an example see nl_recvmsgs() and nl_socket_set_nonblocking() calls of > > libnl library. > > > > Send timeout is not needed too and can be configured with SO_SNDTIMEO socket > > option. > > > > This reverts commit cea05eadded0d4eb59f7be6e1f1560eb6bfde2bf. > > > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com> > > This commit was accepted with much feedback from you. > > https://patchwork.kernel.org/patch/9235137/ > > Not clear from your description to determine what your trying to fix. > > Simply reverting a commit that solves a real problem is not acceptable. > > Please come up with a solution without breaking portmapper. 1. It was cosmetic comments, I didn't give my reviewed-by tag, so please don't put blame on me, 2. The commit cea05eadde broke NETLINK semantics for whole RDMA. 3. The commit cea05eadde made libnl library (basic block of user-space part of netlink) to work incorrectly and not according to _blocking/_nonblocking semantics. 4. Reverting is a common practice in Linux kernel. Patches are not carved in stones. 5. I proposed a solution -> go and fix your user space program. Thanks > > Shiraz
On Wed, May 31, 2017 at 07:04:37AM +0300, Leon Romanovsky wrote: > On Tue, May 30, 2017 at 04:24:31PM -0500, Shiraz Saleem wrote: > > On Mon, May 29, 2017 at 11:24:23AM +0300, Leon Romanovsky wrote: > > > From: Leon Romanovsky <leonro@melalnox.com> > > > > > > The commit cea05eadded0 ("IB/core: Add flow control to the portmapper netlink calls") > > > changed netlink to be blocked for all RDMA clients. This workaround > > > worked perfectly for portmapper, but is not correct for the whole > > > NETLINK_RDMA family. > > > > > > The request/response should always be blocking and asynchronous > > > notification should always be non-blocking. > > > > > > It is library and user-space application to chose how to handle recvmgs, > > > as an example see nl_recvmsgs() and nl_socket_set_nonblocking() calls of > > > libnl library. > > > > > > Send timeout is not needed too and can be configured with SO_SNDTIMEO socket > > > option. > > > > > > This reverts commit cea05eadded0d4eb59f7be6e1f1560eb6bfde2bf. > > > > > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com> > > > > This commit was accepted with much feedback from you. > > > > https://patchwork.kernel.org/patch/9235137/ > > > > Not clear from your description to determine what your trying to fix. > > > > Simply reverting a commit that solves a real problem is not acceptable. > > > > Please come up with a solution without breaking portmapper. > > 1. It was cosmetic comments, I didn't give my reviewed-by tag, so please > don't put blame on me, We are not blaming you. You had a lot of input on the original thread including, "One is move from non-blocking mode to blocking mode which is fine enough after justification was added" > 2. The commit cea05eadde broke NETLINK semantics for whole RDMA. How did it break? What's the issue? > 3. The commit cea05eadde made libnl library (basic block of user-space part of netlink) > to work incorrectly and not according to _blocking/_nonblocking semantics. How? Is libnl calling ibnl_unicast? As far we can understand ibnl_unicast is only called by portmapper kernel code. > 4. Reverting is a common practice in Linux kernel. Patches are not > carved in stones. Reverting a patch that's introduced during RC cycle is fine, introducing regression is NOT and that is what you are doing by simply proposing to revert this patch. Reverting this patch will introduce a REGRESSION error with respect to port mapping functionality for all iWARP vendors. > 5. I proposed a solution -> go and fix your user space program. This is a kernel patch you are trying to revert, you are breaking existing kernel functionality. Nothing to do with user space. Bottom line, come up with a solution that will address both port mapper functionality and your issue. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > 4. Reverting is a common practice in Linux kernel. Patches are not > > carved in stones. > > Reverting a patch that's introduced during RC cycle is fine, introducing > regression is NOT and that is what you are doing by simply proposing to revert > this patch. Reverting this patch will introduce a REGRESSION error with respect to > port mapping functionality for all iWARP vendors. > > > 5. I proposed a solution -> go and fix your user space program. > This is a kernel patch you are trying to revert, you are breaking existing > kernel functionality. Nothing to do with user space. > > Bottom line, come up with a solution that will address both port mapper > functionality and your issue. Hey Leon, I would hate to introduce an iwarp regression vs solving your issue w/o reverting. If I recall, the flow control is need so as to not drop messages flowing from the kernel to the iwpmd. I'm not sure how that could be fixed in the user daemon? Steve. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 31, 2017 at 12:46:22PM -0500, Steve Wise wrote: > > > > > 4. Reverting is a common practice in Linux kernel. Patches are not > > > carved in stones. > > > > Reverting a patch that's introduced during RC cycle is fine, introducing > > regression is NOT and that is what you are doing by simply proposing to revert > > this patch. Reverting this patch will introduce a REGRESSION error with > respect to > > port mapping functionality for all iWARP vendors. > > > > > 5. I proposed a solution -> go and fix your user space program. > > This is a kernel patch you are trying to revert, you are breaking existing > > kernel functionality. Nothing to do with user space. > > > > Bottom line, come up with a solution that will address both port mapper > > functionality and your issue. > > Hey Leon, > > I would hate to introduce an iwarp regression vs solving your issue w/o > reverting. If I recall, the flow control is need so as to not drop messages > flowing from the kernel to the iwpmd. I'm not sure how that could be fixed in > the user daemon? Assume that netlink is unreliable and set socket timeout. Thanks > > Steve. > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Hey Leon, > > > > I would hate to introduce an iwarp regression vs solving your issue w/o > > reverting. If I recall, the flow control is need so as to not drop messages > > flowing from the kernel to the iwpmd. I'm not sure how that could be fixed in > > the user daemon? > > Assume that netlink is unreliable and set socket timeout. > > Thanks It seems that netlink is becoming more and more a message API between kernel and user. Wouldn't everyone benefit from make it reliable? And I think we can certainly provide blocking/reliable mode as an option to be used only by those subsystems that require it. Steve -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2017-05-31 at 13:34 -0500, Steve Wise wrote: > > > I would hate to introduce an iwarp regression vs solving your issue w/o > > > reverting. If I recall, the flow control is need so as to not drop messages > > > flowing from the kernel to the iwpmd. I'm not sure how that could be fixed > > > in the user daemon? > > > > Assume that netlink is unreliable and set socket timeout. > > It seems that netlink is becoming more and more a message API between kernel and > user. Wouldn't everyone benefit from make it reliable? And I think we can > certainly provide blocking/reliable mode as an option to be used only by those > subsystems that require it. Sorry Steve but I'm not sure that's a good idea. Netlink uses fixed size buffers for sending messages to user space. Making netlink reliable means making the kernel wait if the buffer is full. That would introduce the possibility of a kernel deadlock if the user space application hangs. I don't think we want that. BTW, are you aware netlink reports buffer overflows to user space through the ENOBUFS error code? Any user space software that reads from a netlink socket should check the error code returned by recv() and resynchronize the state if recv() fails with ENOBUFS. I'm writing here "should" because there are several examples of user space software that doesn't do this correctly. Bart.-- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2017-05-31 at 12:42 -0500, Shiraz Saleem wrote: > > 5. I proposed a solution -> go and fix your user space program. > > This is a kernel patch you are trying to revert, you are breaking existing > kernel functionality. Nothing to do with user space. > > Bottom line, come up with a solution that will address both port mapper > functionality and your issue. Hello Shiraz, Sorry that this means additional work for you, but I agree with Leon that user space software should not assume that netlink sockets are a reliable communication mechanism. Bart.-- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 31, 2017 at 12:42:45PM -0500, Shiraz Saleem wrote: > On Wed, May 31, 2017 at 07:04:37AM +0300, Leon Romanovsky wrote: > > On Tue, May 30, 2017 at 04:24:31PM -0500, Shiraz Saleem wrote: > > > On Mon, May 29, 2017 at 11:24:23AM +0300, Leon Romanovsky wrote: > > > > From: Leon Romanovsky <leonro@melalnox.com> > > > > > > > 3. The commit cea05eadde made libnl library (basic block of user-space part of netlink) > > to work incorrectly and not according to _blocking/_nonblocking semantics. > > How? Is libnl calling ibnl_unicast? As far we can understand ibnl_unicast is only called > by portmapper kernel code. Yes, libnl is calling to ibnl_unicast() and this is why it brought my attention to regression caused by commit which I'm reverting. Bottom-up flow: ibnl_unicast send_nlmsg_done iwpm_send_mapinfo iwpm_mapping_info_cb [RDMA_NL_IWPM_MAPINFO] = {.dump = iwpm_mapping_info_cb} ibnl_rcv_msg netlink_rcv_skb(skb, &ibnl_rcv_msg); ----- libnl: nl_recvmsgs() > > > 4. Reverting is a common practice in Linux kernel. Patches are not > > carved in stones. > > Reverting a patch that's introduced during RC cycle is fine, introducing > regression is NOT and that is what you are doing by simply proposing to revert > this patch. Reverting this patch will introduce a REGRESSION error with respect to > port mapping functionality for all iWARP vendors. Interesting and how did all these iWARP vendors survive before your patch? > > > 5. I proposed a solution -> go and fix your user space program. > This is a kernel patch you are trying to revert, you are breaking existing > kernel functionality. Nothing to do with user space. Sorry, but it is not the case. The user space is broken. Thanks
On Thu, Jun 01, 2017 at 07:10:22AM +0300, Leon Romanovsky wrote: > On Wed, May 31, 2017 at 12:42:45PM -0500, Shiraz Saleem wrote: > > On Wed, May 31, 2017 at 07:04:37AM +0300, Leon Romanovsky wrote: > > > On Tue, May 30, 2017 at 04:24:31PM -0500, Shiraz Saleem wrote: > > > > On Mon, May 29, 2017 at 11:24:23AM +0300, Leon Romanovsky wrote: > > > > > From: Leon Romanovsky <leonro@melalnox.com> > > > > > > > > > > 3. The commit cea05eadde made libnl library (basic block of user-space part of netlink) > > > to work incorrectly and not according to _blocking/_nonblocking semantics. > > > > How? Is libnl calling ibnl_unicast? As far we can understand ibnl_unicast is only called > > by portmapper kernel code. > > Yes, libnl is calling to ibnl_unicast() and this is why it brought my > attention to regression caused by commit which I'm reverting. > > Bottom-up flow: > ibnl_unicast > send_nlmsg_done > iwpm_send_mapinfo > iwpm_mapping_info_cb > [RDMA_NL_IWPM_MAPINFO] = {.dump = iwpm_mapping_info_cb} > ibnl_rcv_msg > netlink_rcv_skb(skb, &ibnl_rcv_msg); > ----- > libnl: > nl_recvmsgs() Assuming your stack trace is correct, it is showing the normal use case of ibnl_unicate() by portmapper from the iWARP side. I'm still waiting on evidence of this patch breaking rest of RDMA as you so claimed. > > > 4. Reverting is a common practice in Linux kernel. Patches are not > > > carved in stones. > > > > Reverting a patch that's introduced during RC cycle is fine, introducing > > regression is NOT and that is what you are doing by simply proposing to revert > > this patch. Reverting this patch will introduce a REGRESSION error with respect to > > port mapping functionality for all iWARP vendors. > > Interesting and how did all these iWARP vendors survive before your > patch? We are all still here. :-) But if we let you simply revert a patch that fixes portmapper for all iWARP vendors then we may be not. > > > 5. I proposed a solution -> go and fix your user space program. > > This is a kernel patch you are trying to revert, you are breaking existing > > kernel functionality. Nothing to do with user space. Here is a better solution. Post a patch to the kernel that will not introduce a regression and fix whatever the probelm you think there is, then I will personally review the patch. Chien -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 31, 2017 at 02:10:31PM -0600, Bart Van Assche wrote: > On Wed, 2017-05-31 at 12:42 -0500, Shiraz Saleem wrote: > > > 5. I proposed a solution -> go and fix your user space program. > > > > This is a kernel patch you are trying to revert, you are breaking existing > > kernel functionality. Nothing to do with user space. > > > > Bottom line, come up with a solution that will address both port mapper > > functionality and your issue. > > Hello Shiraz, > > Sorry that this means additional work for you, but I agree with Leon that > user space software should not assume that netlink sockets are a reliable > communication mechanism. Hi Bart - Thank you for your response. The original problem was that ibnl_unicast, which is used to send nl messages from portmapper kernel space to user-space, would occasionally and momentarily fail under stress. We could have retried the call for a certain amount of time, but since netlink_unicast has a nonblock/block parameter, we chose to use the blocking option with a timeout. So we thought we did account for deadlocks with this timeout. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2017-06-02 at 11:28 -0500, Shiraz Saleem wrote: > On Wed, May 31, 2017 at 02:10:31PM -0600, Bart Van Assche wrote: > > On Wed, 2017-05-31 at 12:42 -0500, Shiraz Saleem wrote: > > > > 5. I proposed a solution -> go and fix your user space program. > > > > > > This is a kernel patch you are trying to revert, you are breaking existing > > > kernel functionality. Nothing to do with user space. > > > > > > Bottom line, come up with a solution that will address both port mapper > > > functionality and your issue. > > > > Hello Shiraz, > > > > Sorry that this means additional work for you, but I agree with Leon that > > user space software should not assume that netlink sockets are a reliable > > communication mechanism. > > Hi Bart - Thank you for your response. > > The original problem was that ibnl_unicast, which is used to send nl messages from > portmapper kernel space to user-space, would occasionally and momentarily fail under stress. > We could have retried the call for a certain amount of time, but since netlink_unicast has a > nonblock/block parameter, we chose to use the blocking option with a timeout. So we thought we > did account for deadlocks with this timeout. Hello Shiraz, According to what I see in iwpmd/iwpmd.conf the default value for the iwpmd netlink socket receive buffer size is 419430400 bytes. So I'm surprised that sending netlink messages occasionally fails since the receive buffer is so large? Bart.
On Fri, Jun 02, 2017 at 11:21:03AM -0500, Chien Tin Tung wrote: > On Thu, Jun 01, 2017 at 07:10:22AM +0300, Leon Romanovsky wrote: > > On Wed, May 31, 2017 at 12:42:45PM -0500, Shiraz Saleem wrote: > > > On Wed, May 31, 2017 at 07:04:37AM +0300, Leon Romanovsky wrote: > > > > On Tue, May 30, 2017 at 04:24:31PM -0500, Shiraz Saleem wrote: > > > > > On Mon, May 29, 2017 at 11:24:23AM +0300, Leon Romanovsky wrote: > > > > > > From: Leon Romanovsky <leonro@melalnox.com> > > > > > > > > > > > > > 3. The commit cea05eadde made libnl library (basic block of user-space part of netlink) > > > > to work incorrectly and not according to _blocking/_nonblocking semantics. > > > > > > How? Is libnl calling ibnl_unicast? As far we can understand ibnl_unicast is only called > > > by portmapper kernel code. > > > > Yes, libnl is calling to ibnl_unicast() and this is why it brought my > > attention to regression caused by commit which I'm reverting. > > > > Bottom-up flow: > > ibnl_unicast > > send_nlmsg_done > > iwpm_send_mapinfo > > iwpm_mapping_info_cb > > [RDMA_NL_IWPM_MAPINFO] = {.dump = iwpm_mapping_info_cb} > > ibnl_rcv_msg > > netlink_rcv_skb(skb, &ibnl_rcv_msg); > > ----- > > libnl: > > nl_recvmsgs() > > Assuming your stack trace is correct, it is showing the normal use case of ibnl_unicate() > by portmapper from the iWARP side. I'm still waiting on evidence of this patch breaking > rest of RDMA as you so claimed. I think that I understand the differences between our views on the subject. The main difference is if iWARP portmapper is part of large ecosystem (my view) or some standalone feature (your view). I see iWARP as an integral part of IB core and I do expect that all consumers of NETLINK_RDMA will be complied to libnl library and work in similar manner to whole network stack. I do expect that newcomer won't learn in hard way that his non-blocking calls to netdev NETLINKs and NETLINK_RDMA work except for iWARP portmapper part. > > > > > 4. Reverting is a common practice in Linux kernel. Patches are not > > > > carved in stones. > > > > > > Reverting a patch that's introduced during RC cycle is fine, introducing > > > regression is NOT and that is what you are doing by simply proposing to revert > > > this patch. Reverting this patch will introduce a REGRESSION error with respect to > > > port mapping functionality for all iWARP vendors. > > > > Interesting and how did all these iWARP vendors survive before your > > patch? > > We are all still here. :-) But if we let you simply revert a patch that fixes portmapper > for all iWARP vendors then we may be not. It is not really fixing but hiding, and it is not related to vendors which want or don't want, it is related to the community and to the right infrastructure for everyone, so everyone will benefit from it. > > > > > 5. I proposed a solution -> go and fix your user space program. > > > This is a kernel patch you are trying to revert, you are breaking existing > > > kernel functionality. Nothing to do with user space. > > Here is a better solution. Post a patch to the kernel that will not introduce a regression > and fix whatever the probelm you think there is, then I will personally review the patch. You got proposal, which is not related to kernel. > > Chien
On Fri, Jun 02, 2017 at 11:28:49AM -0500, Shiraz Saleem wrote: > On Wed, May 31, 2017 at 02:10:31PM -0600, Bart Van Assche wrote: > > On Wed, 2017-05-31 at 12:42 -0500, Shiraz Saleem wrote: > > > > 5. I proposed a solution -> go and fix your user space program. > > > > > > This is a kernel patch you are trying to revert, you are breaking existing > > > kernel functionality. Nothing to do with user space. > > > > > > Bottom line, come up with a solution that will address both port mapper > > > functionality and your issue. > > > > Hello Shiraz, > > > > Sorry that this means additional work for you, but I agree with Leon that > > user space software should not assume that netlink sockets are a reliable > > communication mechanism. > > Hi Bart - Thank you for your response. > > The original problem was that ibnl_unicast, which is used to send nl messages from > portmapper kernel space to user-space, would occasionally and momentarily fail under stress. > We could have retried the call for a certain amount of time, but since netlink_unicast has a > nonblock/block parameter, we chose to use the blocking option with a timeout. So we thought we > did account for deadlocks with this timeout. Not really, you just reduced the chances. In very large scale, you will have a very large chances of such deadlocks. Thanks > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
> I think that I understand the differences between our views on the subject. The main > difference is if iWARP portmapper is part of large ecosystem (my view) or some > standalone feature (your view). Please do not speak for me. Portmapper is a part of the ecosystem but the only user of it is on the iWARP side. It is no different from iw_cm which is only used from iWARP side. > I see iWARP as an integral part of IB core and I do expect that all consumers of > NETLINK_RDMA will be complied to libnl library and work in similar manner to whole > network stack. I do expect that newcomer won't learn in hard way that > his non-blocking calls to netdev NETLINKs and NETLINK_RDMA work except > for iWARP portmapper part. You are trying to generalize portmapper into something which it is not to justify your argument. Please stop this line of argument. It won't work on me. > > > > > 4. Reverting is a common practice in Linux kernel. Patches are not > > > > > carved in stones. > > > > > > > > Reverting a patch that's introduced during RC cycle is fine, introducing > > > > regression is NOT and that is what you are doing by simply proposing to revert > > > > this patch. Reverting this patch will introduce a REGRESSION error with respect to > > > > port mapping functionality for all iWARP vendors. > > > > > > Interesting and how did all these iWARP vendors survive before your > > > patch? > > > > We are all still here. :-) But if we let you simply revert a patch that fixes portmapper > > for all iWARP vendors then we may be not. > > It is not really fixing but hiding, and it is not related to vendors which want > or don't want, it is related to the community and to the right infrastructure for > everyone, so everyone will benefit from it. Again, over generalization. > > > > > 5. I proposed a solution -> go and fix your user space program. > > > > This is a kernel patch you are trying to revert, you are breaking existing > > > > kernel functionality. Nothing to do with user space. > > > > Here is a better solution. Post a patch to the kernel that will not introduce a regression > > and fix whatever the probelm you think there is, then I will personally review the patch. > > You got proposal, which is not related to kernel. Still waiting on a usable patch from you, no more unsubstantiated arguments from you. Chien -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Sun, Jun 04, 2017 at 08:36:35AM +0300, Leon Romanovsky wrote: > On Fri, Jun 02, 2017 at 11:28:49AM -0500, Shiraz Saleem wrote: > > On Wed, May 31, 2017 at 02:10:31PM -0600, Bart Van Assche wrote: > > > On Wed, 2017-05-31 at 12:42 -0500, Shiraz Saleem wrote: > > > > > 5. I proposed a solution -> go and fix your user space program. > > > > > > > > This is a kernel patch you are trying to revert, you are breaking existing > > > > kernel functionality. Nothing to do with user space. > > > > > > > > Bottom line, come up with a solution that will address both port mapper > > > > functionality and your issue. > > > > > > Hello Shiraz, > > > > > > Sorry that this means additional work for you, but I agree with Leon that > > > user space software should not assume that netlink sockets are a reliable > > > communication mechanism. > > > > Hi Bart - Thank you for your response. > > > > The original problem was that ibnl_unicast, which is used to send nl messages from > > portmapper kernel space to user-space, would occasionally and momentarily fail under stress. > > We could have retried the call for a certain amount of time, but since netlink_unicast has a > > nonblock/block parameter, we chose to use the blocking option with a timeout. So we thought we > > did account for deadlocks with this timeout. > > Not really, you just reduced the chances. In very large scale, you will > have a very large chances of such deadlocks. Please stop using the word deadlock until you can prove that the deadlock exists with the timeout in place. Chien -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Jun 04, 2017 at 09:23:13PM -0500, Chien Tin Tung wrote: > Sun, Jun 04, 2017 at 08:36:35AM +0300, Leon Romanovsky wrote: > > On Fri, Jun 02, 2017 at 11:28:49AM -0500, Shiraz Saleem wrote: > > > On Wed, May 31, 2017 at 02:10:31PM -0600, Bart Van Assche wrote: > > > > On Wed, 2017-05-31 at 12:42 -0500, Shiraz Saleem wrote: > > > > > > 5. I proposed a solution -> go and fix your user space program. > > > > > > > > > > This is a kernel patch you are trying to revert, you are breaking existing > > > > > kernel functionality. Nothing to do with user space. > > > > > > > > > > Bottom line, come up with a solution that will address both port mapper > > > > > functionality and your issue. > > > > > > > > Hello Shiraz, > > > > > > > > Sorry that this means additional work for you, but I agree with Leon that > > > > user space software should not assume that netlink sockets are a reliable > > > > communication mechanism. > > > > > > Hi Bart - Thank you for your response. > > > > > > The original problem was that ibnl_unicast, which is used to send nl messages from > > > portmapper kernel space to user-space, would occasionally and momentarily fail under stress. > > > We could have retried the call for a certain amount of time, but since netlink_unicast has a > > > nonblock/block parameter, we chose to use the blocking option with a timeout. So we thought we > > > did account for deadlocks with this timeout. > > > > Not really, you just reduced the chances. In very large scale, you will > > have a very large chances of such deadlocks. > > Please stop using the word deadlock until you can prove that the deadlock exists with the timeout > in place. Can you please post the whole list of forbidden words? It will be great to have it accompanied with technical response to my and Bart's claims, and to summarize it, it is very simple: "netlink receive should be non-blocking and asynchronous". It will allow for all libnl functions, like nl_socket_set_nonblocking, nl_recvmsgsa and nl_wait_for_ack, work correctly with ALL NETLINK_RDMA clients. Thanks > > Chien > >
Mon, Jun 05, 2017 at 07:00:30AM +0300, Leon Romanovsky wrote: > On Sun, Jun 04, 2017 at 09:23:13PM -0500, Chien Tin Tung wrote: > > Sun, Jun 04, 2017 at 08:36:35AM +0300, Leon Romanovsky wrote: > > > On Fri, Jun 02, 2017 at 11:28:49AM -0500, Shiraz Saleem wrote: > > > > On Wed, May 31, 2017 at 02:10:31PM -0600, Bart Van Assche wrote: > > > > > On Wed, 2017-05-31 at 12:42 -0500, Shiraz Saleem wrote: > > > > > > > 5. I proposed a solution -> go and fix your user space program. > > > > > > > > > > > > This is a kernel patch you are trying to revert, you are breaking existing > > > > > > kernel functionality. Nothing to do with user space. > > > > > > > > > > > > Bottom line, come up with a solution that will address both port mapper > > > > > > functionality and your issue. > > > > > > > > > > Hello Shiraz, > > > > > > > > > > Sorry that this means additional work for you, but I agree with Leon that > > > > > user space software should not assume that netlink sockets are a reliable > > > > > communication mechanism. > > > > > > > > Hi Bart - Thank you for your response. > > > > > > > > The original problem was that ibnl_unicast, which is used to send nl messages from > > > > portmapper kernel space to user-space, would occasionally and momentarily fail under stress. > > > > We could have retried the call for a certain amount of time, but since netlink_unicast has a > > > > nonblock/block parameter, we chose to use the blocking option with a timeout. So we thought we > > > > did account for deadlocks with this timeout. > > > > > > Not really, you just reduced the chances. In very large scale, you will > > > have a very large chances of such deadlocks. > > > > Please stop using the word deadlock until you can prove that the deadlock exists with the timeout > > in place. > > Can you please post the whole list of forbidden words? It will be great to > have it accompanied with technical response to my and Bart's claims, and > to summarize it, it is very simple: "netlink receive should be > non-blocking and asynchronous". Non-blocking and asynchronous is not deadlock, please get it right. Again, provide proof of deadlock, until then refrain from using that word in support of your argument. Which you still have not proved there is a problem. Chien -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
You jump around in this thread so much it is hard for a sane person to follow so I will attempt to summarize what has taken place. You try to revert a patch that fixed a real problem in portmapper, claiming: 1) Code in question impacts the whole RDMA subsystem. which is false by my multiple replies on this. 2) There is a deadlock. Which is false. I'm still asking for proof. 3) you want netlink receive to be non-blocking and asynchronous what does that have to do with the non-existing deadlock? Asnwer is you can't when there isn't one. If you want it, create a patch for it instead of creating a regression with a lazy revert. I will continue this discussion if you answer directly to any of those points. If you choose to dance around the subject and claim falsehood, you will only damage your own creditibility on the list. I hope you take that to heart. Chien On Sun, Jun 04, 2017 at 11:20:07PM -0500, Chien Tin Tung wrote: > Mon, Jun 05, 2017 at 07:00:30AM +0300, Leon Romanovsky wrote: > > On Sun, Jun 04, 2017 at 09:23:13PM -0500, Chien Tin Tung wrote: > > > Sun, Jun 04, 2017 at 08:36:35AM +0300, Leon Romanovsky wrote: > > > > On Fri, Jun 02, 2017 at 11:28:49AM -0500, Shiraz Saleem wrote: > > > > > On Wed, May 31, 2017 at 02:10:31PM -0600, Bart Van Assche wrote: > > > > > > On Wed, 2017-05-31 at 12:42 -0500, Shiraz Saleem wrote: > > > > > > > > 5. I proposed a solution -> go and fix your user space program. > > > > > > > > > > > > > > This is a kernel patch you are trying to revert, you are breaking existing > > > > > > > kernel functionality. Nothing to do with user space. > > > > > > > > > > > > > > Bottom line, come up with a solution that will address both port mapper > > > > > > > functionality and your issue. > > > > > > > > > > > > Hello Shiraz, > > > > > > > > > > > > Sorry that this means additional work for you, but I agree with Leon that > > > > > > user space software should not assume that netlink sockets are a reliable > > > > > > communication mechanism. > > > > > > > > > > Hi Bart - Thank you for your response. > > > > > > > > > > The original problem was that ibnl_unicast, which is used to send nl messages from > > > > > portmapper kernel space to user-space, would occasionally and momentarily fail under stress. > > > > > We could have retried the call for a certain amount of time, but since netlink_unicast has a > > > > > nonblock/block parameter, we chose to use the blocking option with a timeout. So we thought we > > > > > did account for deadlocks with this timeout. > > > > > > > > Not really, you just reduced the chances. In very large scale, you will > > > > have a very large chances of such deadlocks. > > > > > > Please stop using the word deadlock until you can prove that the deadlock exists with the timeout > > > in place. > > > > Can you please post the whole list of forbidden words? It will be great to > > have it accompanied with technical response to my and Bart's claims, and > > to summarize it, it is very simple: "netlink receive should be > > non-blocking and asynchronous". > > Non-blocking and asynchronous is not deadlock, please get it right. Again, provide proof of > deadlock, until then refrain from using that word in support of your argument. Which you still > have not proved there is a problem. > > Chien > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Jun 04, 2017 at 11:50:43PM -0500, Chien Tin Tung wrote: > You jump around in this thread so much it is hard for a sane person to follow > so I will attempt to summarize what has taken place. > > You try to revert a patch that fixed a real problem in portmapper, claiming: > > 1) Code in question impacts the whole RDMA subsystem. > which is false by my multiple replies on this. > 2) There is a deadlock. > Which is false. I'm still asking for proof. > 3) you want netlink receive to be non-blocking and asynchronous > what does that have to do with the non-existing deadlock? > Asnwer is you can't when there isn't one. > If you want it, create a patch for it instead of creating a > regression with a lazy revert. > > I will continue this discussion if you answer directly to any of those > points. If you choose to dance around the subject and claim falsehood, you > will only damage your own creditibility on the list. I hope you take that > to heart. OK, I got your point. It is worthless discussion. FYI, ibnl_unicast holds global lock for whole NETLINK_RDMA static void ibnl_rcv(struct sk_buff *skb) { mutex_lock(&ibnl_mutex); ibnl_rcv_reply_skb(skb); netlink_rcv_skb(skb, &ibnl_rcv_msg); mutex_unlock(&ibnl_mutex); } I'll wait for a maintainer's decision on the proposed patch. Thanks > > Chien > > On Sun, Jun 04, 2017 at 11:20:07PM -0500, Chien Tin Tung wrote: > > Mon, Jun 05, 2017 at 07:00:30AM +0300, Leon Romanovsky wrote: > > > On Sun, Jun 04, 2017 at 09:23:13PM -0500, Chien Tin Tung wrote: > > > > Sun, Jun 04, 2017 at 08:36:35AM +0300, Leon Romanovsky wrote: > > > > > On Fri, Jun 02, 2017 at 11:28:49AM -0500, Shiraz Saleem wrote: > > > > > > On Wed, May 31, 2017 at 02:10:31PM -0600, Bart Van Assche wrote: > > > > > > > On Wed, 2017-05-31 at 12:42 -0500, Shiraz Saleem wrote: > > > > > > > > > 5. I proposed a solution -> go and fix your user space program. > > > > > > > > > > > > > > > > This is a kernel patch you are trying to revert, you are breaking existing > > > > > > > > kernel functionality. Nothing to do with user space. > > > > > > > > > > > > > > > > Bottom line, come up with a solution that will address both port mapper > > > > > > > > functionality and your issue. > > > > > > > > > > > > > > Hello Shiraz, > > > > > > > > > > > > > > Sorry that this means additional work for you, but I agree with Leon that > > > > > > > user space software should not assume that netlink sockets are a reliable > > > > > > > communication mechanism. > > > > > > > > > > > > Hi Bart - Thank you for your response. > > > > > > > > > > > > The original problem was that ibnl_unicast, which is used to send nl messages from > > > > > > portmapper kernel space to user-space, would occasionally and momentarily fail under stress. > > > > > > We could have retried the call for a certain amount of time, but since netlink_unicast has a > > > > > > nonblock/block parameter, we chose to use the blocking option with a timeout. So we thought we > > > > > > did account for deadlocks with this timeout. > > > > > > > > > > Not really, you just reduced the chances. In very large scale, you will > > > > > have a very large chances of such deadlocks. > > > > > > > > Please stop using the word deadlock until you can prove that the deadlock exists with the timeout > > > > in place. > > > > > > Can you please post the whole list of forbidden words? It will be great to > > > have it accompanied with technical response to my and Bart's claims, and > > > to summarize it, it is very simple: "netlink receive should be > > > non-blocking and asynchronous". > > > > Non-blocking and asynchronous is not deadlock, please get it right. Again, provide proof of > > deadlock, until then refrain from using that word in support of your argument. Which you still > > have not proved there is a problem. > > > > Chien > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2017-06-02 at 11:28 -0500, Shiraz Saleem wrote: > On Wed, May 31, 2017 at 02:10:31PM -0600, Bart Van Assche wrote: > > > > On Wed, 2017-05-31 at 12:42 -0500, Shiraz Saleem wrote: > > > > > > > > > > > 5. I proposed a solution -> go and fix your user space program. > > > > > > This is a kernel patch you are trying to revert, you are breaking > > > existing > > > kernel functionality. Nothing to do with user space. > > > > > > Bottom line, come up with a solution that will address both port > > > mapper > > > functionality and your issue. > > > > Hello Shiraz, > > > > Sorry that this means additional work for you, but I agree with > > Leon that > > user space software should not assume that netlink sockets are a > > reliable > > communication mechanism. > > Hi Bart - Thank you for your response. > > The original problem was that ibnl_unicast, which is used to send nl > messages from > portmapper kernel space to user-space, would occasionally and > momentarily fail under stress. > We could have retried the call for a certain amount of time, but > since netlink_unicast has a > nonblock/block parameter, we chose to use the blocking option with a > timeout. So we thought we > did account for deadlocks with this timeout. Just to be clear, replacing a non-blocking and occasionally dropping function with a blocking function but with a timeout does not actually solve the problem, it merely moves the goal post out. It is entirely possible that you will have the problem again given sufficient load.
On Sun, 2017-06-04 at 08:36 +0300, Leon Romanovsky wrote: > On Fri, Jun 02, 2017 at 11:28:49AM -0500, Shiraz Saleem wrote: > > > > On Wed, May 31, 2017 at 02:10:31PM -0600, Bart Van Assche wrote: > > > > > > On Wed, 2017-05-31 at 12:42 -0500, Shiraz Saleem wrote: > > > > > > > > > > > > > > 5. I proposed a solution -> go and fix your user space > > > > > program. > > > > > > > > This is a kernel patch you are trying to revert, you are > > > > breaking existing > > > > kernel functionality. Nothing to do with user space. > > > > > > > > Bottom line, come up with a solution that will address both > > > > port mapper > > > > functionality and your issue. > > > > > > Hello Shiraz, > > > > > > Sorry that this means additional work for you, but I agree with > > > Leon that > > > user space software should not assume that netlink sockets are a > > > reliable > > > communication mechanism. > > > > Hi Bart - Thank you for your response. > > > > The original problem was that ibnl_unicast, which is used to send > > nl messages from > > portmapper kernel space to user-space, would occasionally and > > momentarily fail under stress. > > We could have retried the call for a certain amount of time, but > > since netlink_unicast has a > > nonblock/block parameter, we chose to use the blocking option with > > a timeout. So we thought we > > did account for deadlocks with this timeout. > > Not really, you just reduced the chances. In very large scale, you > will > have a very large chances of such deadlocks. Since it has a timeout, it's not a deadlock, it's a state consistency failure. Once it happens, the user space and kernel space port maps will no longer be in sync.
On Sun, 2017-06-04 at 21:23 -0500, Chien Tin Tung wrote: > Sun, Jun 04, 2017 at 08:36:35AM +0300, Leon Romanovsky wrote: > > > > On Fri, Jun 02, 2017 at 11:28:49AM -0500, Shiraz Saleem wrote: > > > > > > On Wed, May 31, 2017 at 02:10:31PM -0600, Bart Van Assche wrote: > > > > > > > > On Wed, 2017-05-31 at 12:42 -0500, Shiraz Saleem wrote: > > > > > > > > > > > > > > > > > 5. I proposed a solution -> go and fix your user space > > > > > > program. > > > > > > > > > > This is a kernel patch you are trying to revert, you are > > > > > breaking existing > > > > > kernel functionality. Nothing to do with user space. > > > > > > > > > > Bottom line, come up with a solution that will address both > > > > > port mapper > > > > > functionality and your issue. > > > > > > > > Hello Shiraz, > > > > > > > > Sorry that this means additional work for you, but I agree with > > > > Leon that > > > > user space software should not assume that netlink sockets are > > > > a reliable > > > > communication mechanism. > > > > > > Hi Bart - Thank you for your response. > > > > > > The original problem was that ibnl_unicast, which is used to send > > > nl messages from > > > portmapper kernel space to user-space, would occasionally and > > > momentarily fail under stress. > > > We could have retried the call for a certain amount of time, but > > > since netlink_unicast has a > > > nonblock/block parameter, we chose to use the blocking option > > > with a timeout. So we thought we > > > did account for deadlocks with this timeout. > > > > Not really, you just reduced the chances. In very large scale, you > > will > > have a very large chances of such deadlocks. > > Please stop using the word deadlock until you can prove that the > deadlock exists with the timeout > in place. He doesn't need to use the word deadlock for you to know that if you have a non-blocking function that is failing under load, and then you replace it with a blocking function but with a timeout, then it can also fail under load, and therefore you have not really solved the problem.
On Wed, 2017-05-31 at 12:42 -0500, Shiraz Saleem wrote: > On Wed, May 31, 2017 at 07:04:37AM +0300, Leon Romanovsky wrote: > > > > 2. The commit cea05eadde broke NETLINK semantics for whole RDMA. > > How did it break? What's the issue? As I understand it, and I'll let Leon correct me if I'm wrong, the rest of the linux kernel's NETLINK interfaces all have a standard they follow: When you open a netlink socket, it is by default asynchronous and non- blocking. In order to have it any other way, you must issue the standard libnl commands to switch the socket to the other behavior. Leon's argument is that because of the problem iw_portmapper was having, instead of adding the 3 or 4 lines to iw_portmapper so that it would issue the necessary and typical calls to libnl to change the socket semantics, this patch changed the default symantics for the RDMA subsystems netlink sockets. The point, then, is that this is not normal accepted practice for solving a problem. Normally, you would fix the user space application. If someone is used to programming network side netlink interfaces, and they then get a new job and end up programming rdma side netlink interfaces, they are going to find this difference confusing, to say the least. We will have broken the normal netlink API for the RDMA subsystem in the name of not fixing a broken user space application. Does that summarize your argument Leon? FWIW, I would take this issue very seriously. Having a default netlink socket that behaves one way if it happens to belong to the network subsystem and a default netlink socket that behaves another way if it belongs to the RDMA subsystem is just flat wrong. We should never create an ambiguous API like that. And if we broke it and made it ambiguous, then it absolutely needs fixed. > > > > 3. The commit cea05eadde made libnl library (basic block of user- > > space part of netlink) > > to work incorrectly and not according to _blocking/_nonblocking > > semantics. > > How? Is libnl calling ibnl_unicast? As far we can understand > ibnl_unicast is only called > by portmapper kernel code. Let's not forget that Leon is working on a new tool, similar to the ip command from iproute2, for configuring the rdma subsystem. Feedback has already established that it needs to be done all via netlink. So, the RDMA subsystem's current use of netlink is but a drop in a bucket compared to what it will be and where it's going. It is highly likely that Leon stumbled across this issue as he was working on this tool and started needing to deal with the RDMA subsystem's netlink support. > > 4. Reverting is a common practice in Linux kernel. Patches are not > > carved in stones. > > Reverting a patch that's introduced during RC cycle is fine, > introducing > regression is NOT and that is what you are doing by simply proposing > to revert > this patch. Reverting this patch will introduce a REGRESSION error > with respect to > port mapping functionality for all iWARP vendors. Reverting buggy patches, however, is not considered a regression, but a bug fix. This is a bit of a special case in that, if Leon's arguments are right, this is both a bug fix and a regression. It means that there might need to be a flag day to push out a new, fixed iw_portmapper to people, followed by the revert. But just because buggy code is in use does not mean it gets to stay in use. > > > > 5. I proposed a solution -> go and fix your user space program. > This is a kernel patch you are trying to revert, you are breaking > existing > kernel functionality. Nothing to do with user space. It absolutely has to do with user space. If this is an issue that should have never been fixed via kernel space in the first place and should have been resolved in user space all along by making iw_portmapper treat the RDMA netlink socket just like any network netlink socket would need to be treated, then Leon is right, iw_portmapper needs fixed and this needs reverted. > Bottom line, come up with a solution that will address both port > mapper > functionality and your issue. No. If he's right (and I plan to investigate whether he is) that the normal netlink socket semantic is asynchronous and non-blocking by default, and you normally use libnl commands to migrate the socket to blocking/synchronous once the socket is opened, then this patch will eventually be reverted and you absolutely need to fix iw_portmapper.
On Mon, 2017-06-05 at 09:03 +0300, Leon Romanovsky wrote: > On Sun, Jun 04, 2017 at 11:50:43PM -0500, Chien Tin Tung wrote: > > > > You jump around in this thread so much it is hard for a sane person > > to follow > > so I will attempt to summarize what has taken place. > > > > You try to revert a patch that fixed a real problem in portmapper, > > claiming: > > > > 1) Code in question impacts the whole RDMA subsystem. > > which is false by my multiple replies on this. I haven't seen this as false. Right now the rdma subsystem's netlink usage is pretty small. In the future it will be much larger. Changing the default semantics of a newly opened RDMA netlink socket does in fact impact all of the RDMA subsystem's current and future netlink sockets. So, it *does* impact the whole RDMA subsystem, at least as far as their netlink sockets are concerned. > > 2) There is a deadlock. > > Which is false. I'm still asking for proof. I'm not sure of the deadlock, but as I already pointed out, your fix is not a guaranteed fix. It never was. It just moved the goal post of how hard you would have to push the subsystem before it would fail. > > 3) you want netlink receive to be non-blocking and asynchronous > > what does that have to do with the non-existing deadlock? Nothing, it's a separate point and need not be tied to a deadlock. It stands on its own merits. > > Asnwer is you can't when there isn't one. > > If you want it, create a patch for it instead of creating a > > regression with a lazy revert. If the original patch broke our semantics so that we have created an ambiguous API for netlink sockets in general, which I will investigate, then this patch is buggy, whether it fixed a problem for you or not, and will have to be reverted eventually. We can coordinate with doing the revert after you've fixed iw_portmapper, but an ambiguous API will not be allowed to remain. See my other emails in this thread. > > I will continue this discussion if you answer directly to any of > > those > > points. If you choose to dance around the subject and claim > > falsehood, you > > will only damage your own creditibility on the list. I hope you > > take that > > to heart. Leon is not a native english speaker, as I assume you aren't. I would suggest that it might be more about failure to understand each other and failure to get at the root of the issue than anything that would legitimately do damage to anyone's credibility. You might wish to calm down and read my posts in this thread. I hope I was able to clarify things a bit. > OK, I got your point. It is worthless discussion. > > FYI, ibnl_unicast holds global lock for whole NETLINK_RDMA > static void ibnl_rcv(struct sk_buff *skb) > { > mutex_lock(&ibnl_mutex); > ibnl_rcv_reply_skb(skb); > netlink_rcv_skb(skb, &ibnl_rcv_msg); > mutex_unlock(&ibnl_mutex); > } > > I'll wait for a maintainer's decision on the proposed patch. I should be able to finish my investigation by tomorrow (I have two appointments today that will make this a short day for me, unfortunately :-/).
On Mon, Jun 05, 2017 at 10:55:41AM -0400, Doug Ledford wrote: > On Wed, 2017-05-31 at 12:42 -0500, Shiraz Saleem wrote: > > On Wed, May 31, 2017 at 07:04:37AM +0300, Leon Romanovsky wrote: > > > > > > 2. The commit cea05eadde broke NETLINK semantics for whole RDMA. > > > > How did it break? What's the issue? > > As I understand it, and I'll let Leon correct me if I'm wrong, the rest > of the linux kernel's NETLINK interfaces all have a standard they > follow: > > When you open a netlink socket, it is by default asynchronous and non- > blocking. In order to have it any other way, you must issue the > standard libnl commands to switch the socket to the other behavior. > Leon's argument is that because of the problem iw_portmapper was > having, instead of adding the 3 or 4 lines to iw_portmapper so that it > would issue the necessary and typical calls to libnl to change the > socket semantics, this patch changed the default symantics for the RDMA > subsystems netlink sockets. > > The point, then, is that this is not normal accepted practice for > solving a problem. Normally, you would fix the user space application. > If someone is used to programming network side netlink interfaces, and > they then get a new job and end up programming rdma side netlink > interfaces, they are going to find this difference confusing, to say > the least. We will have broken the normal netlink API for the RDMA > subsystem in the name of not fixing a broken user space application. > > Does that summarize your argument Leon? Thanks Doug, And to add, it is not only NETLINK_RDMA behaviour vs. all other NETLINKs. It is inconsistency inside NETLINK_RDMA family too. iWARP portmapper is blocking, while rest is non-blocking. > > FWIW, I would take this issue very seriously. Having a default netlink > socket that behaves one way if it happens to belong to the network > subsystem and a default netlink socket that behaves another way if it > belongs to the RDMA subsystem is just flat wrong. We should never > create an ambiguous API like that. And if we broke it and made it > ambiguous, then it absolutely needs fixed. > > > > > > > 3. The commit cea05eadde made libnl library (basic block of user- > > > space part of netlink) > > > to work incorrectly and not according to _blocking/_nonblocking > > > semantics. > > > > How? Is libnl calling ibnl_unicast? As far we can understand > > ibnl_unicast is only called > > by portmapper kernel code. > > Let's not forget that Leon is working on a new tool, similar to the ip > command from iproute2, for configuring the rdma subsystem. Feedback > has already established that it needs to be done all via netlink. So, > the RDMA subsystem's current use of netlink is but a drop in a bucket > compared to what it will be and where it's going. It is highly likely > that Leon stumbled across this issue as he was working on this tool and > started needing to deal with the RDMA subsystem's netlink support. Right, And I started from refactoring and cleaning the code. https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=topic/rdmatool-rfc-v1 Currently work in progress and I'm trying to minimize "the damage" as much as possible, but future work will include delete of a lot of dead and useless code in iwarp (the whole registration logic is complete trash and can be replaced by very small number of lines), introduce nla_policy to support properly annotated netlink attributes, get rid of custom functions to build messages and many more. > > > > 4. Reverting is a common practice in Linux kernel. Patches are not > > > carved in stones. > > > > Reverting a patch that's introduced during RC cycle is fine, > > introducing > > regression is NOT and that is what you are doing by simply proposing > > to revert > > this patch. Reverting this patch will introduce a REGRESSION error > > with respect to > > port mapping functionality for all iWARP vendors. > > Reverting buggy patches, however, is not considered a regression, but a > bug fix. This is a bit of a special case in that, if Leon's arguments > are right, this is both a bug fix and a regression. It means that > there might need to be a flag day to push out a new, fixed > iw_portmapper to people, followed by the revert. But just because > buggy code is in use does not mean it gets to stay in use. > > > > > > > 5. I proposed a solution -> go and fix your user space program. > > This is a kernel patch you are trying to revert, you are breaking > > existing > > kernel functionality. Nothing to do with user space. > > It absolutely has to do with user space. If this is an issue that > should have never been fixed via kernel space in the first place and > should have been resolved in user space all along by making > iw_portmapper treat the RDMA netlink socket just like any network > netlink socket would need to be treated, then Leon is right, > iw_portmapper needs fixed and this needs reverted. > > > Bottom line, come up with a solution that will address both port > > mapper > > functionality and your issue. > > No. If he's right (and I plan to investigate whether he is) that the > normal netlink socket semantic is asynchronous and non-blocking by > default, and you normally use libnl commands to migrate the socket to > blocking/synchronous once the socket is opened, then this patch will > eventually be reverted and you absolutely need to fix iw_portmapper. > > -- > Doug Ledford <dledford@redhat.com> > GPG KeyID: B826A3330E572FDD > > Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD >
On Mon, Jun 05, 2017 at 11:08:16AM -0400, Doug Ledford wrote: > On Mon, 2017-06-05 at 09:03 +0300, Leon Romanovsky wrote: > > On Sun, Jun 04, 2017 at 11:50:43PM -0500, Chien Tin Tung wrote: > Leon is not a native english speaker, as I assume you aren't. I would > suggest that it might be more about failure to understand each other > and failure to get at the root of the issue than anything that would > legitimately do damage to anyone's credibility. You might wish to calm > down and read my posts in this thread. I hope I was able to clarify > things a bit. Alright, this is my attempt at world peace. I started with a long email but decided to keep it short. Please ask for details if you have questions and remember to keep an open mind. Here are some links for those interested in understanding the background leading to this thread: * Tatyana Nikolova's original commit 30dc5e63d6a5 which introduced ibnl_unicast() * Leon's comment to Mustafa's original proposal - https://patchwork.kernel.org/patch/9235137/ * Mustafa Ismail's accepted patch, cea05eadded0d, to ibnl_unicast() Netlink is fundamentally sockets. You can substitute the word socket for Netlink when you try to get a sense of Netlink functionality. In the RDMA subsystem, we use Netlink to communicate between user-space and kernel. Current controversy surrounds ibnl_unicast(). It is used to send NL message from the kernel to the user and the NL flow can be triggered from both user-space and kernel code. ibnl_unicast() uses netlink_unicast with the following signature: int netlink_unicast(struct sock *ssk, struct sk_buff *skb, u32 portid, int nonblock) The last parameter, nonblock, is important. It allows the caller to control the call to wait and retry on failure, provided a timeout value is set, or not block even with a timeout value set. The actual mechanism is very elegant and can be used to solve the issue at hand. Let me go back to the problem Mustafa tried to solve. Specifically it is an issue with the call to iwpm_add_mapping() failing. That function sends a NL message from the kernel to user-space portmapper daemon using the kernel NL socket. Mustafa originally proposed to have two ibnl_unicast functions, one blocking and the other one non-blocking. It was shot down by Leon. In hindsight, we should of pushed ahead and implemented those two functions. The blocking function for iwpm_add_mapping() and the non-blocking function for any NL interaction from user-space. This solution can still be implemented solving everyone's problem and bring world peace. :-) Chien -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c index b784055423c8..fcc9702efd38 100644 --- a/drivers/infiniband/core/netlink.c +++ b/drivers/infiniband/core/netlink.c @@ -230,10 +230,7 @@ static void ibnl_rcv(struct sk_buff *skb) int ibnl_unicast(struct sk_buff *skb, struct nlmsghdr *nlh, __u32 pid) { - int err; - - err = netlink_unicast(nls, skb, pid, 0); - return (err < 0) ? err : 0; + return nlmsg_unicast(nls, skb, pid); } EXPORT_SYMBOL(ibnl_unicast); @@ -256,7 +253,6 @@ int __init ibnl_init(void) return -ENOMEM; } - nls->sk_sndtimeo = 10 * HZ; return 0; }