Message ID | 20170510072627.12060-1-hch@lst.de (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Wed, 2017-05-10 at 09:26 +0200, Christoph Hellwig wrote: > The driver has a lot of quality issues due to the lack of RDMA-side > review, and explicitly bypasses APIs to register all memory once a > connection is made, and thus allows remote access to memoery. > > Mark it as broken until at least that part is fixed. Since this is the only way to get the BROKEN marker in the v4.11 stable kernel series: Acked-by: Bart Van Assche <bart.vanassche@sandisk.com>-- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Dave, this patch has not been superceeded by anything, can you explain why it has been marked as such in patchworks? On Thu, May 11, 2017 at 02:57:43PM +0000, Bart Van Assche wrote: > On Wed, 2017-05-10 at 09:26 +0200, Christoph Hellwig wrote: > > The driver has a lot of quality issues due to the lack of RDMA-side > > review, and explicitly bypasses APIs to register all memory once a > > connection is made, and thus allows remote access to memoery. > > > > Mark it as broken until at least that part is fixed. > > Since this is the only way to get the BROKEN marker in the v4.11 stable > kernel series: > > Acked-by: Bart Van Assche <bart.vanassche@sandisk.com>---end quoted text--- -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Christoph Hellwig <hch@lst.de> Date: Sun, 14 May 2017 07:58:48 +0200 > this patch has not been superceeded by anything, can you explain why > it has been marked as such in patchworks? I think you're being overbearing by requiring this to be marked BROKEN and I would like you to explore other ways with the authors to fix whatever perceived problems you think SMC has. You claim that this is somehow "urgent" is false. You can ask distributions to disable SMC or whatever in the short term if it reallly, truly, bothers you. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 2017-05-14 at 11:51 -0400, David Miller wrote: > From: Christoph Hellwig <hch@lst.de> > Date: Sun, 14 May 2017 07:58:48 +0200 > > > this patch has not been superceeded by anything, can you explain why > > it has been marked as such in patchworks? > > I think you're being overbearing by requiring this to be marked BROKEN > and I would like you to explore other ways with the authors to fix > whatever perceived problems you think SMC has. > > You claim that this is somehow "urgent" is false. You can ask > distributions to disable SMC or whatever in the short term if it > reallly, truly, bothers you. Hello Dave, There is agreement that the user-space API for using the SMC protocol must be changed, namely by dropping AF_SMC and by making applications use the SMC protocol through socket(AF_INET..., SOCK_STREAM, ...). What is your plan to avoid that applications start using and depending on AF_SMC? Thanks, Bart.
From: Bart Van Assche <Bart.VanAssche@sandisk.com> Date: Sun, 14 May 2017 19:08:50 +0000 > What is your plan to avoid that applications start using and > depending on AF_SMC? The API is out there already so we are out of luck, and neither you nor I nor anyone else can "stop" this from happening. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Dave, > -----Original Message----- > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma- > owner@vger.kernel.org] On Behalf Of David Miller > Sent: Sunday, May 14, 2017 7:44 PM > To: Bart.VanAssche@sandisk.com > Cc: hch@lst.de; netdev@vger.kernel.org; linux-rdma@vger.kernel.org; > stable@vger.kernel.org; ubraun@linux.vnet.ibm.com > Subject: Re: [PATCH] net/smc: mark as BROKEN due to remote memory > exposure > > From: Bart Van Assche <Bart.VanAssche@sandisk.com> > Date: Sun, 14 May 2017 19:08:50 +0000 > > > What is your plan to avoid that applications start using and depending > > on AF_SMC? > status = socket(AF_SMC, field, IPPROT_TCP); Here, - AF_SMC actually means AF_INET IPv4 addresses! - IPPROTO_TCP means TCP and RDMA both when socket is AF_SMC. - When creating socket addresses, use AF_INET based addresses. - When invoking bind(), listen(), connect() APIs, use AF_INET addresses instead. - Supporting IPv6 is TBD with AF_SMC sockets. - At user level get_addrinfo will continue to return AF_INET addresses. Such explanation for socket APIs doesn't sound correct. The primary motivation for SMC protocol was to simplify the applications and library to make use of RDMA. This kind of API is against such simplicity and creates more confusion. RFC only gives example and doesn't asks to create new socket family. I can provide more data, but a simple grep in get_addrinfo() and friend functions in user space has heavy dependence on AF_INET and AF_INET6. > The API is out there already so we are out of luck, and neither you nor I nor > anyone else can "stop" this from happening. I think it is still not too late to fix this API. SMC is released in v4.11 very recently. v4.12 is still not out. Given the limitation of protocol being RoCEv1 only, we might not have many users whose applications will stop functioning. (Which will anyway won't work for RoCEv2, and IPv6 addresses). I propose, (a) AF_SMC socket 43 can be marked reserved in future kernel versions to avoid use. (b) New protocol family that represents TCP and RDMA protocol, may be named IPPROTO_SMC even though it is not a protocol in IP header. We can possibly target to have this fix in 4.13 kernel timeframe. > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the > body of a message to majordomo@vger.kernel.org More majordomo info > at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Dave, >> this patch has not been superceeded by anything, can you explain why >> it has been marked as such in patchworks? > > I think you're being overbearing by requiring this to be marked BROKEN > and I would like you to explore other ways with the authors to fix > whatever perceived problems you think SMC has. Well, its not one's opinion, its a real problem. To be fair, this security breach existed in other RDMA based storage protocols for years in the past, but we cleaned it up completely. Our assumption is that *if* the user is willingly choosing to expose its entire physical address space to remote access to get some performance boost that's fine, but to have the kernel expose it by default, without even letting the user to control it is plain irresponsible. IMHO we should *not* repeat the mistakes of the past and set a higher bar for RDMA protocol implementations. > You claim that this is somehow "urgent" is false. You can ask > distributions to disable SMC or whatever in the short term if it > reallly, truly, bothers you. I doubt that is sufficient given that not all implementations out there rely on distros. I'm afraid one time is too much in this case. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, May 14, 2017 at 11:51:16AM -0400, David Miller wrote: > From: Christoph Hellwig <hch@lst.de> > Date: Sun, 14 May 2017 07:58:48 +0200 > > > this patch has not been superceeded by anything, can you explain why > > it has been marked as such in patchworks? > > I think you're being overbearing by requiring this to be marked BROKEN > and I would like you to explore other ways with the authors to fix > whatever perceived problems you think SMC has. Hi Dave, Christoph proposed very clear way to remove BROKEN tag, by providing secure environment by default to all users. All ULPs in RDMA and lustre in staging [1] are working in secure mode by default and I don't understand why SMC-R should be different. From my experience such "BROKEN" tag will help for authors to get proper priority from their managers to fix it in a first place, before they are moved to anything else. > > You claim that this is somehow "urgent" is false. You can ask > distributions to disable SMC or whatever in the short term if it > reallly, truly, bothers you. It is not distributions only. For example, RDMA stack is released by different providers. Mellanox OFED [2] is one of them and it is based on upstream kernel. Thanks [1] e0ccf5d085ab ("staging: lustre: ko2iblnd: Adapt to the removal of ib_get_dma_mr()") [2] http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Linus, I've added you to this thread. A quick synopsis: Dave sent you the net/smc driver for 4.11. Even though it lives in net/smc, it is, for the most part, a net<->rdma translator and so it is as much an RDMA driver as anything else. And upon review, the rdma community does not believe either the spec/rfc or the driver are the right way to engineer this particular technology, and the implementation leaves much to be desired. On Sun, 2017-05-14 at 20:44 -0400, David Miller wrote: > From: Bart Van Assche <Bart.VanAssche@sandisk.com> > Date: Sun, 14 May 2017 19:08:50 +0000 > > > What is your plan to avoid that applications start using and > > depending on AF_SMC? > > The API is out there already so we are out of luck, and neither > you nor I nor anyone else can "stop" this from happening. That's not true at all. There's nothing that says we can't revert this now before it goes any further. It's only been in two kernels, I'm positive it hasn't landed in any distros yet, and it can go back to being something people can add on the side. Futher, the "standard" this is based on is not a real standard, it's just a publication and has not been through a standard track. I wouldn't consider this "out there already" until there is a standard that has gone through the standard track. Regardless though, I'm rather purturbed about this entire thing. If you are right that because this got into 4.11, it's now a done deal, then the fact that this went through 4 review cycles on netdev@ that, as I understand it, spanned roughly one years time, and not one single person bothered to note that this was as much an RDMA driver as anything else, and not one person bothered to note that linux-rdma@ was not on the Cc: list, and not one person told the submitters that they needed to include linux-rdma@ on the Cc: list of these submissions, and you took it without any review comments from any RDMA people in the course of a year, or an ack from me to show that the RDMA portion of this had at least been given some sort of review, was a collosal fuckup of cross tree maintainer cooperation. The SMC driver makes several mistakes that people tried to avoid with previous RDMA standards, it only supports one out of the five possible link layers (iWARP, IB, OPA, RoCEv1, RoCEv2), it uses a highly discouraged and deprecated technique for memory registration that is considered horribly insecure (handing the keys to the castle to anyone who connects to the machine, aka, the entire memory space is registered with one key and that key is given to remote connections, so they can read any bit of kernel memory they want as opposed to whatever we tell them to read), and the design as articuled in the published rfc seems incomplete for dealing with any of the other link layers, indicating that this should have probably stayed out until the rfc was discussed and updated to address the shortcomings obviously present in the current rfc. With all of these issues outstanding against it, I hope you can see why I think the way I do about you taking it without ever consulting any of the RDMA community. But that leaves us with the question of what to do moving forward. Probably the number one concern is that this protocol chose to create a new AF as opposed to reusing the IPv4 and IPv6 address families and adding an option similar to SCTP for enabling the new protocol on a specific socket. The concern is that we have means of addressing all of the link layers the RDMA subsystem supports using IPv4 or IPv6 (sort of...it's possible to have IB or OPA without IPoIB, which leaves them without an IPv4 or IPv6 address, in which case the rdmacm can use native GUIDs to resolve the other side, but that only works for verbs connections, in the case of TCP connections, we always require IPoIB to be present, and so IPv4 or IPv6 is always sufficient). In the end, switching this protocol to use AF_INET and AF_INET6 and a protocol option to enable SMC may be what we need to do. That, of course, changes the user space API. So, are we truly locked in at this point? I would suggest that, since this is only present in 4.11 and 4.12, and I'm sure this has not landed in any distros as of yet (except maybe something like Fedora rawhide), we can submit a patch to both the current kernel and the 4.11 stable to set this code as CONFIG_EXPERIMENTAL and mark the API as possibly going to undergo change. Then let the RDMA community work with IBM to get this properly fixed so that this is a reasonable RDMA driver and not something the community is ready to immediately trash, and only after we've got it whipped into shape and the RDMA community is satisfied it is a reasonable driver that can continue to work with future planned RDMA subsystem updates and across various link layers, we remove the EXPERIMENTAL marker and freeze the API for user space.
From: Doug Ledford <dledford@redhat.com> Date: Tue, 16 May 2017 11:57:04 -0400 > Regardless though, I'm rather purturbed about this entire thing. If > you are right that because this got into 4.11, it's now a done deal, > then the fact that this went through 4 review cycles on netdev@ that, > as I understand it, spanned roughly one years time, and not one single > person bothered to note that this was as much an RDMA driver as > anything else, and not one person bothered to note that linux-rdma@ was > not on the Cc: list, and not one person told the submitters that they > needed to include linux-rdma@ on the Cc: list of these submissions, and > you took it without any review comments from any RDMA people in the > course of a year, or an ack from me to show that the RDMA portion of > this had at least been given some sort of review, was a collosal fuckup > of cross tree maintainer cooperation. We rely on people from various areas of expertiece to contribute to patch review on netdev and give appropriate feedback. If you actually look through the history, I made many semantic reviews of the SMC changes, and kept pushing back. And in fact I did this several times, making them go through several revisions, in the hopes that someone would review more of the meat and substance of the patch set. Nobody do this for over a year. I can't push back on people with silly coding style and small semantic issues forever. And I think I made a serious effort to keep the patches getting posted over and over again to make sure they got more exposure. I think it's unsettling that there are no RDMA experts, or at least people remotely knowledgable about this "networking" technology, subscribed to netdev and taking a cursory look at pactches that might be relevant and effect that technology either directly or indirectly. So there is a lot of blame to go around. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 16, 2017 at 12:29:23PM -0400, David Miller wrote: > I can't push back on people with silly coding style and small semantic > issues forever. And I think I made a serious effort to keep the > patches getting posted over and over again to make sure they got more > exposure. You can tell them to go to linux-rdma. I'm sending people to the right mailing list all the time. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Christoph Hellwig <hch@lst.de> Date: Tue, 16 May 2017 18:30:41 +0200 > On Tue, May 16, 2017 at 12:29:23PM -0400, David Miller wrote: >> I can't push back on people with silly coding style and small semantic >> issues forever. And I think I made a serious effort to keep the >> patches getting posted over and over again to make sure they got more >> exposure. > > You can tell them to go to linux-rdma. I'm sending people to the right > mailing list all the time. That doesn't cover things that don't directly touch the RDMA code or infiniband infrastructure. There should have been RDMA people on netdev who saw this thing and cried wolf, and they would have had about an entire year, and about 8 instances of this series being posted in which to do so. It's not like this got posted once or twice and went in with zero review. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 16, 2017 at 12:33:08PM -0400, David Miller wrote: > That doesn't cover things that don't directly touch the RDMA code or > infiniband infrastructure. > > There should have been RDMA people on netdev who saw this thing and > cried wolf, and they would have had about an entire year, and about > 8 instances of this series being posted in which to do so. > > It's not like this got posted once or twice and went in with zero > review. None outside of netdev. Really, if you get rdma patches include linux-rdma. Or at least linux-kernel where I will usually catch this sort of stuff. netdev is too high traffic and too far away from my area of work to watch it. But for example when I wrote my first RDMA ULP I did subsribe to linux-rdma, started extensive discussions and fixed up lots of core code. There is no excuse for other people to not even try. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2017-05-16 at 18:30 +0200, Christoph Hellwig wrote: > On Tue, May 16, 2017 at 12:29:23PM -0400, David Miller wrote: > > > > I can't push back on people with silly coding style and small > > semantic > > issues forever. And I think I made a serious effort to keep the > > patches getting posted over and over again to make sure they got > > more > > exposure. > > You can tell them to go to linux-rdma. I'm sending people to the > right > mailing list all the time. Indeed. Every single time a patch comes into linux-rdma that touches things in net/ or include/net, unless it is exceedingly minor, I check the To:/Cc: lines on the email and if netdev@ isn't included, or in the case of complex/tricky items, you aren't directly Cc:ed, then I specifically tell them to include netdev@ and/or you. I've even had things like a 12 patch series that buried three netdev@ appropriate patches at different points in the series and told the submitter to move all of the netdev@ related patches to the front and submit them to netdev@ so they can be reviewed as a group before I would move on to the others. It's just what you do. I've always considered that part of my job.
From: Doug Ledford <dledford@redhat.com> Date: Tue, 16 May 2017 12:36:01 -0400 > On Tue, 2017-05-16 at 18:30 +0200, Christoph Hellwig wrote: >> On Tue, May 16, 2017 at 12:29:23PM -0400, David Miller wrote: >> > >> > I can't push back on people with silly coding style and small >> > semantic >> > issues forever. And I think I made a serious effort to keep the >> > patches getting posted over and over again to make sure they got >> > more >> > exposure. >> >> You can tell them to go to linux-rdma. I'm sending people to the >> right >> mailing list all the time. > > Indeed. Every single time a patch comes into linux-rdma that touches > things in net/ or include/net, unless it is exceedingly minor, I check > the To:/Cc: lines on the email and if netdev@ isn't included, or in the > case of complex/tricky items, you aren't directly Cc:ed, then I > specifically tell them to include netdev@ and/or you. I've even had > things like a 12 patch series that buried three netdev@ appropriate > patches at different points in the series and told the submitter to > move all of the netdev@ related patches to the front and submit them to > netdev@ so they can be reviewed as a group before I would move on to > the others. It's just what you do. I've always considered that part > of my job. To be quite honest it wasn't exceedingly clear, even to me, that this had such implications or was directly a RDMA thing. From my perspective while reviewing I saw a patch series adding it's own protocol stack living inside of it's own directory under net/ And, if even one RDMA/infiniband person said to me "you really shouldn't apply this" then I would have dropped it on the spot. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2017-05-16 at 12:29 -0400, David Miller wrote: > From: Doug Ledford <dledford@redhat.com> > Date: Tue, 16 May 2017 11:57:04 -0400 > > > Regardless though, I'm rather purturbed about this entire thing. > If > > you are right that because this got into 4.11, it's now a done > deal, > > then the fact that this went through 4 review cycles on netdev@ > that, > > as I understand it, spanned roughly one years time, and not one > single > > person bothered to note that this was as much an RDMA driver as > > anything else, and not one person bothered to note that linux-rdma@ > was > > not on the Cc: list, and not one person told the submitters that > they > > needed to include linux-rdma@ on the Cc: list of these submissions, > and > > you took it without any review comments from any RDMA people in the > > course of a year, or an ack from me to show that the RDMA portion > of > > this had at least been given some sort of review, was a collosal > fuckup > > of cross tree maintainer cooperation. > > We rely on people from various areas of expertiece to contribute to > patch review on netdev and give appropriate feedback. > > If you actually look through the history, I made many semantic > reviews > of the SMC changes, and kept pushing back. > > And in fact I did this several times, making them go through several > revisions, in the hopes that someone would review more of the meat > and > substance of the patch set. If you want to walk to the mailbox, you walk to the mailbox, you don't walk to the grocery store, to the gym, and never even go to the mailbox. Likewise, if you want review from RDMA experts, most (maybe even all) don't subscribe to netdev@ because it's too high traffic, you don't waste time on silly semantic pushbacks, you send a single email that says "Please get review from linux-rdma@, thank you." Don't beat around the bush, be direct and get it over with. That's exactly what I do for all netdev@ related patches coming to linux-rdma@ without a proper Cc: to netdev@. > Nobody do this for over a year. > > I can't push back on people with silly coding style and small > semantic > issues forever. And I think I made a serious effort to keep the > patches getting posted over and over again to make sure they got more > exposure. > > I think it's unsettling that there are no RDMA experts, or at least > people remotely knowledgable about this "networking" technology, > subscribed to netdev and taking a cursory look at pactches that might > be relevant and effect that technology either directly or indirectly. > > So there is a lot of blame to go around. Fine, allocate blame however, you like. What I want to actually settle is how we are going to move forward. You didn't respond to that part of my email. Your thoughts?
From: Doug Ledford <dledford@redhat.com> Date: Tue, 16 May 2017 12:42:43 -0400 > On Tue, 2017-05-16 at 12:29 -0400, David Miller wrote: >> From: Doug Ledford <dledford@redhat.com> >> Date: Tue, 16 May 2017 11:57:04 -0400 >> >> > Regardless though, I'm rather purturbed about this entire thing. >> If >> > you are right that because this got into 4.11, it's now a done >> deal, >> > then the fact that this went through 4 review cycles on netdev@ >> that, >> > as I understand it, spanned roughly one years time, and not one >> single >> > person bothered to note that this was as much an RDMA driver as >> > anything else, and not one person bothered to note that linux-rdma@ >> was >> > not on the Cc: list, and not one person told the submitters that >> they >> > needed to include linux-rdma@ on the Cc: list of these submissions, >> and >> > you took it without any review comments from any RDMA people in the >> > course of a year, or an ack from me to show that the RDMA portion >> of >> > this had at least been given some sort of review, was a collosal >> fuckup >> > of cross tree maintainer cooperation. >> >> We rely on people from various areas of expertiece to contribute to >> patch review on netdev and give appropriate feedback. >> >> If you actually look through the history, I made many semantic >> reviews >> of the SMC changes, and kept pushing back. >> >> And in fact I did this several times, making them go through several >> revisions, in the hopes that someone would review more of the meat >> and >> substance of the patch set. > > If you want to walk to the mailbox, you walk to the mailbox, you don't > walk to the grocery store, to the gym, and never even go to the > mailbox. Likewise, if you want review from RDMA experts, most (maybe > even all) don't subscribe to netdev@ because it's too high traffic, you > don't waste time on silly semantic pushbacks, you send a single email > that says "Please get review from linux-rdma@, thank you." Don't beat > around the bush, be direct and get it over with. That's exactly what I > do for all netdev@ related patches coming to linux-rdma@ without a > proper Cc: to netdev@. Read my other email, it wasn't %100 clear to me that this was so strictly RDMA related. And I kept pushing back with semantic changes in part because it wasn't clear. So as far as I was concerned I was not necessarily going to the wrong store, in fact I wasn't sure what store to go to. And none of the thousands of subscribers to netdev intuit'd this either. Maybe there is a reason for that. Furthermore, if netdev is too much traffic for one or two RDMA people to just casually subscribe to on the off chance that a situationm like this comes up, guess what it is like for me who has to read and review pretty much every single posting that is placed there? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2017-05-16 at 12:41 -0400, David Miller wrote: > From: Doug Ledford <dledford@redhat.com> > Date: Tue, 16 May 2017 12:36:01 -0400 > > > On Tue, 2017-05-16 at 18:30 +0200, Christoph Hellwig wrote: > >> On Tue, May 16, 2017 at 12:29:23PM -0400, David Miller wrote: > >> > > >> > I can't push back on people with silly coding style and small > >> > semantic > >> > issues forever. And I think I made a serious effort to keep the > >> > patches getting posted over and over again to make sure they got > >> > more > >> > exposure. > >> > >> You can tell them to go to linux-rdma. I'm sending people to the > >> right > >> mailing list all the time. > > > > Indeed. Every single time a patch comes into linux-rdma that > touches > > things in net/ or include/net, unless it is exceedingly minor, I > check > > the To:/Cc: lines on the email and if netdev@ isn't included, or in > the > > case of complex/tricky items, you aren't directly Cc:ed, then I > > specifically tell them to include netdev@ and/or you. I've even > had > > things like a 12 patch series that buried three netdev@ appropriate > > patches at different points in the series and told the submitter to > > move all of the netdev@ related patches to the front and submit > them to > > netdev@ so they can be reviewed as a group before I would move on > to > > the others. It's just what you do. I've always considered that > part > > of my job. > > To be quite honest it wasn't exceedingly clear, even to me, that this > had such implications or was directly a RDMA thing. From my > perspective while reviewing I saw a patch series adding it's own > protocol stack living inside of it's own directory under net/ OK. Fair enough. That implies to me that you probably dove right into the patches and skimmed the cover letter (http://marc.info/?l=linux-s39 0&m=148397751211964&w=2), because when I read the cover letter, it seems pretty clear to me that this is very much RDMA related. In any case, there are already two other code bases that live under net/ but also involve RDMA heavily: nfs and rds. This is a third now. So, for future reference, just like the RDMA related nfs/rds patches already get Cc:ed to linux-rdma@, these probably should be too. > And, if even one RDMA/infiniband person said to me "you really > shouldn't apply this" then I would have dropped it on the spot. I'm glad to hear that. I wish we had managed to get together on this sooner.
On Tue, 2017-05-16 at 12:49 -0400, David Miller wrote: > From: Doug Ledford <dledford@redhat.com> > Date: Tue, 16 May 2017 12:42:43 -0400 > > > On Tue, 2017-05-16 at 12:29 -0400, David Miller wrote: > >> From: Doug Ledford <dledford@redhat.com> > >> Date: Tue, 16 May 2017 11:57:04 -0400 > >> > >> > Regardless though, I'm rather purturbed about this entire thing. > >> If > >> > you are right that because this got into 4.11, it's now a done > >> deal, > >> > then the fact that this went through 4 review cycles on netdev@ > >> that, > >> > as I understand it, spanned roughly one years time, and not one > >> single > >> > person bothered to note that this was as much an RDMA driver as > >> > anything else, and not one person bothered to note that linux- > rdma@ > >> was > >> > not on the Cc: list, and not one person told the submitters that > >> they > >> > needed to include linux-rdma@ on the Cc: list of these > submissions, > >> and > >> > you took it without any review comments from any RDMA people in > the > >> > course of a year, or an ack from me to show that the RDMA > portion > >> of > >> > this had at least been given some sort of review, was a collosal > >> fuckup > >> > of cross tree maintainer cooperation. > >> > >> We rely on people from various areas of expertiece to contribute > to > >> patch review on netdev and give appropriate feedback. > >> > >> If you actually look through the history, I made many semantic > >> reviews > >> of the SMC changes, and kept pushing back. > >> > >> And in fact I did this several times, making them go through > several > >> revisions, in the hopes that someone would review more of the meat > >> and > >> substance of the patch set. > > > > If you want to walk to the mailbox, you walk to the mailbox, you > don't > > walk to the grocery store, to the gym, and never even go to the > > mailbox. Likewise, if you want review from RDMA experts, most > (maybe > > even all) don't subscribe to netdev@ because it's too high traffic, > you > > don't waste time on silly semantic pushbacks, you send a single > email > > that says "Please get review from linux-rdma@, thank you." Don't > beat > > around the bush, be direct and get it over with. That's exactly > what I > > do for all netdev@ related patches coming to linux-rdma@ without a > > proper Cc: to netdev@. > > Read my other email, it wasn't %100 clear to me that this was so > strictly RDMA related. And I kept pushing back with semantic changes > in part because it wasn't clear. See my other email. I think in the fact that you are overloaded on netdev@ you skimmed the cover letter, which is the one thing that would have made things clear. > So as far as I was concerned I was not necessarily going to the wrong > store, in fact I wasn't sure what store to go to. > > And none of the thousands of subscribers to netdev intuit'd this > either. Maybe there is a reason for that. Yeah, it's an overloaded list would be my guess. I imagine no one can keep up with it fully in truth. Maybe it's time to split some of it out into sublists? netdev-core/netdev-ethernet/netdev-packet? I don't know, but I know I can't keep up with it. That you do as well as you do is probably only because you know how to not spend too much time on any one thing. I'm sure you have people you know are submitting important work that effects the overall net subsystem and you focus on their stuff, but ancillary things probably only get half attention at double speed in order to allow you to make it through the day. > Furthermore, if netdev is too much traffic for one or two RDMA people > to just casually subscribe to on the off chance that a situationm > like > this comes up, guess what it is like for me who has to read and > review > pretty much every single posting that is placed there? I get it. I don't have the time to both be responsible for linux-rdma and follow netdev. I can already tell you what would happen if I tried. Eventually netdev would get so far behind I would just mass delete and start over. So that doesn't solve the problem. Anyway, we're just talking out what happened, when what we really need to focus on is moving forward. Again, your thoughts on marking SMC EXPERIMENTAL until it's fixed up and unfreezing the API in case we need to adjust it to work on different link layers?
From: Doug Ledford <dledford@redhat.com> Date: Tue, 16 May 2017 13:20:44 -0400 > Anyway, we're just talking out what happened, when what we really need > to focus on is moving forward. Again, your thoughts on marking SMC > EXPERIMENTAL until it's fixed up and unfreezing the API in case we need > to adjust it to work on different link layers? Something like: http://patchwork.ozlabs.org/patch/762803/ with the addition of the EXPERIMENTAL dependency? Sure. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2017-05-16 at 13:36 -0400, David Miller wrote: > From: Doug Ledford <dledford@redhat.com> > Date: Tue, 16 May 2017 13:20:44 -0400 > > > Anyway, we're just talking out what happened, when what we really > need > > to focus on is moving forward. Again, your thoughts on marking SMC > > EXPERIMENTAL until it's fixed up and unfreezing the API in case we > need > > to adjust it to work on different link layers? > > Something like: > > http://patchwork.ozlabs.org/patch/762803/ > > with the addition of the EXPERIMENTAL dependency? > > Sure. Perfect. I assume you'll submit it since it's in your patchworks?
From: Doug Ledford <dledford@redhat.com> Date: Tue, 16 May 2017 14:03:22 -0400 > On Tue, 2017-05-16 at 13:36 -0400, David Miller wrote: >> From: Doug Ledford <dledford@redhat.com> >> Date: Tue, 16 May 2017 13:20:44 -0400 >> >> > Anyway, we're just talking out what happened, when what we really >> need >> > to focus on is moving forward. Again, your thoughts on marking SMC >> > EXPERIMENTAL until it's fixed up and unfreezing the API in case we >> need >> > to adjust it to work on different link layers? >> >> Something like: >> >> http://patchwork.ozlabs.org/patch/762803/ >> >> with the addition of the EXPERIMENTAL dependency? >> >> Sure. > > Perfect. I assume you'll submit it since it's in your patchworks? Ok I applied the patch referenced above, but we don't actually have an EXPERIMENTAL symbol. The closest thing we have is BROKEN and even in this situation that's a bit harsh. I also applied the patch which converts SMC over to using IB_PD_UNSAFE_GLOBAL_RKEY. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2017-05-16 at 14:52 -0400, David Miller wrote: > From: Doug Ledford <dledford@redhat.com> > Date: Tue, 16 May 2017 14:03:22 -0400 > > > On Tue, 2017-05-16 at 13:36 -0400, David Miller wrote: > >> From: Doug Ledford <dledford@redhat.com> > >> Date: Tue, 16 May 2017 13:20:44 -0400 > >> > >> > Anyway, we're just talking out what happened, when what we > really > >> need > >> > to focus on is moving forward. Again, your thoughts on marking > SMC > >> > EXPERIMENTAL until it's fixed up and unfreezing the API in case > we > >> need > >> > to adjust it to work on different link layers? > >> > >> Something like: > >> > >> http://patchwork.ozlabs.org/patch/762803/ > >> > >> with the addition of the EXPERIMENTAL dependency? > >> > >> Sure. > > > > Perfect. I assume you'll submit it since it's in your patchworks? > > Ok I applied the patch referenced above, but we don't actually have > an EXPERIMENTAL symbol. The closest thing we have is BROKEN and > even in this situation that's a bit harsh. I hadn't realized EXPERIMENTAL was gone. Which is too bad, because that's entirely appropriate in this case, and would have had the desired side effect of keeping it out of any non-cutting edge distros and warning people of possible API changes. With EXPERIMENTAL gone, the closest thing we have is drivers/staging, since that tends to imply some of the same consequences. I know you think BROKEN is overly harsh, but I'm not sure we should just do nothing. How about we take a few days to let some of the RDMA people closely review the 143 page (egads!) rfc (http://www.rfc-editor.org/info/rfc7609) to see if we think it can be fixed to use multiple link layers with the existing API in place or if it will require something other than AF_SMC. If we need to break API, then I think we should either fix it ASAP and send that fix to the 4.11 stable series (which probably violates the normative stable patch size/scope) or if the fix will take longer than this kernel cycle, then move it to staging both here and in 4.11 stable, and fix it there and then move it back. Something like that would prevent the kind of API flappage we ought not do....
On Tue, 2017-05-16 at 15:28 -0400, Doug Ledford wrote: > I hadn't realized EXPERIMENTAL was gone. Which is too bad, because > that's entirely appropriate in this case, and would have had the > desired side effect of keeping it out of any non-cutting edge distros > and warning people of possible API changes. With EXPERIMENTAL gone, > the closest thing we have is drivers/staging, since that tends to > imply > some of the same consequences. I know you think BROKEN is overly > harsh, but I'm not sure we should just do nothing. How about we take > a > few days to let some of the RDMA people closely review the 143 page > (egads!) rfc (http://www.rfc-editor.org/info/rfc7609) to see if we > think it can be fixed to use multiple link layers with the existing > API > in place or if it will require something other than AF_SMC. If we > need > to break API, then I think we should either fix it ASAP and send that > fix to the 4.11 stable series (which probably violates the normative > stable patch size/scope) or if the fix will take longer than this > kernel cycle, then move it to staging both here and in 4.11 stable, > and > fix it there and then move it back. Something like that would > prevent > the kind of API flappage we ought not do.... So, I've skimmed the entire RFC, focusing on things were I needed to. Here's my take on it: It would have been better with AF_INET/AF_INET6 and an option to enable SMC than AF_SMC. The first implementation simply assumes AF_INET in the presence of AF_SMC. When IPv6 support is added, some sort of guessing logic will have to be put in place to try and determine if an AF_SMC address is actually AF_INET or AF_INET6 since we won't have a guaranteed way of telling. Apps can use struct sockaddr_storage as their normal element to stick the address into, and could rely on the kernel to interpret it properly based on the AF_INET/AF_INET6 differentiation, and this breaks that. The RFC gives *some* thought to adding IPv6 in the future, but not a lot. It may be that the answer is that in the future, IPv6 support is enabled by making the IPv6 API be AF_INET6 + setsockopt(SMC) or the equivalent. If that's the case, then I would suggest making the later API specifically call out AF_INET + setsockopt(SMC) be identical to AF_SMC. The protocol included a version header in the negotation messages. This is good as it allows us to almost immediately start work on version 2 that fixes the shortcomings of version 1 while maintaining back compatibility. After reading the RFC, I can see why they only support one link layer: RoCE. The issue here is that they punted on the hard issue of doing any sort of work to determine if security restrictions on the TCP connection should also be applied to the RDMA connection. The RFC basically says "the RoCE link needs to have the same physical restrictions (vlan) as the TCP/IP link so that the security limitations are the same" because they don't do anything to check them essentially. For v2 of the protocol, and for different link layer support, this is no longer sufficient, so there will be significant work to determine the security context of specific TCP connections and then make sure that they meet the security context of the RDMA links allowed. Additionally, the same is likely to be true in terms of SELinux options. The addition of the IB/OPA link layers will throw a bit of a monkey wrench in things because the SELinux control over those links is slightly different than a normal TCP/IP SELinux control. For instance, you might create rules about processes and ports to make sure that the httpd daemon can only access specific ports on TCP/IP, but on IB/OPA you would need to create process and P_Key rules as IB/OPA don't have the same port level semantics, and it's the P_Key on communications that is enforced network wide, including in the switches, so controlling what P_Key a process can send/receive on is your best way to limit what a process can do. Translation from one to the other might be rather difficult to do in any sort of automated fashion. There might have to be some additional work to get this to properly account items for the RDMA control group elements that were recently taken into the kernel. Finally, the RDMA subsystem is in the process of switching to structured option processing similar to how netlink does it. For version 2 of this protocol, since it will be interacting with the RDMA core in many ways, will be simpler if it switches the on-wire negotiation packets to follow the same methods as that will allow reuse of code that it will have to have for properly dealing with the RDMA subsystem in the future. So, I'm fine with it being left as is since you accepted the patch that makes note of the memory registration insecurity in the Kconfig text. The fact that this is a versioned protocol means that we can fix the things we see as being wrong without having to have it all right from the very start, it can be done incrementally.
SGkgRG91ZywNCg0KPiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBsaW51eC1y ZG1hLW93bmVyQHZnZXIua2VybmVsLm9yZyBbbWFpbHRvOmxpbnV4LXJkbWEtDQo+IG93bmVyQHZn ZXIua2VybmVsLm9yZ10gT24gQmVoYWxmIE9mIERvdWcgTGVkZm9yZA0KPiBTZW50OiBXZWRuZXNk YXksIE1heSAxNywgMjAxNyAzOjM4IFBNDQo+IFRvOiBEYXZpZCBNaWxsZXIgPGRhdmVtQGRhdmVt bG9mdC5uZXQ+DQo+IENjOiBCYXJ0LlZhbkFzc2NoZUBzYW5kaXNrLmNvbTsgdG9ydmFsZHNAbGlu dXgtZm91bmRhdGlvbi5vcmc7DQo+IGhjaEBsc3QuZGU7IG5ldGRldkB2Z2VyLmtlcm5lbC5vcmc7 IGxpbnV4LXJkbWFAdmdlci5rZXJuZWwub3JnOw0KPiBzdGFibGVAdmdlci5rZXJuZWwub3JnOyB1 YnJhdW5AbGludXgudm5ldC5pYm0uY29tDQo+IFN1YmplY3Q6IFJlOiBbUEFUQ0hdIG5ldC9zbWM6 IG1hcmsgYXMgQlJPS0VOIGR1ZSB0byByZW1vdGUgbWVtb3J5DQo+IGV4cG9zdXJlDQo+IA0KPiBP biBUdWUsIDIwMTctMDUtMTYgYXQgMTU6MjggLTA0MDAsIERvdWcgTGVkZm9yZCB3cm90ZToNCj4g PiBJIGhhZG4ndCByZWFsaXplZCBFWFBFUklNRU5UQUwgd2FzIGdvbmUuIMKgV2hpY2ggaXMgdG9v IGJhZCwgYmVjYXVzZQ0KPiA+IHRoYXQncyBlbnRpcmVseSBhcHByb3ByaWF0ZSBpbiB0aGlzIGNh c2UsIGFuZCB3b3VsZCBoYXZlIGhhZCB0aGUNCj4gPiBkZXNpcmVkIHNpZGUgZWZmZWN0IG9mIGtl ZXBpbmcgaXQgb3V0IG9mIGFueSBub24tY3V0dGluZyBlZGdlIGRpc3Ryb3MNCj4gPiBhbmQgd2Fy bmluZyBwZW9wbGUgb2YgcG9zc2libGUgQVBJIGNoYW5nZXMuIMKgV2l0aCBFWFBFUklNRU5UQUwg Z29uZSwNCj4gPiB0aGUgY2xvc2VzdCB0aGluZyB3ZSBoYXZlIGlzIGRyaXZlcnMvc3RhZ2luZywg c2luY2UgdGhhdCB0ZW5kcyB0bw0KPiA+IGltcGx5IHNvbWUgb2YgdGhlIHNhbWUgY29uc2VxdWVu Y2VzLiDCoEkga25vdyB5b3UgdGhpbmsgQlJPS0VOIGlzDQo+ID4gb3Zlcmx5IGhhcnNoLCBidXQg SSdtIG5vdCBzdXJlIHdlIHNob3VsZCBqdXN0IGRvIG5vdGhpbmcuIMKgSG93IGFib3V0DQo+ID4g d2UgdGFrZSBhIGZldyBkYXlzIHRvIGxldCBzb21lIG9mIHRoZSBSRE1BIHBlb3BsZSBjbG9zZWx5 IHJldmlldyB0aGUNCj4gPiAxNDMgcGFnZQ0KPiA+IChlZ2FkcyEpIHJmYyAoaHR0cDovL3d3dy5y ZmMtZWRpdG9yLm9yZy9pbmZvL3JmYzc2MDkpIHRvIHNlZSBpZiB3ZQ0KPiA+IHRoaW5rIGl0IGNh biBiZSBmaXhlZCB0byB1c2UgbXVsdGlwbGUgbGluayBsYXllcnMgd2l0aCB0aGUgZXhpc3RpbmcN Cj4gPiBBUEkgaW4gcGxhY2Ugb3IgaWYgaXQgd2lsbCByZXF1aXJlIHNvbWV0aGluZyBvdGhlciB0 aGFuIEFGX1NNQy4gwqBJZiB3ZQ0KPiA+IG5lZWQgdG8gYnJlYWsgQVBJLCB0aGVuIEkgdGhpbmsg d2Ugc2hvdWxkIGVpdGhlciBmaXggaXQgQVNBUCBhbmQgc2VuZA0KPiA+IHRoYXQgZml4IHRvIHRo ZSA0LjExIHN0YWJsZSBzZXJpZXMgKHdoaWNoIHByb2JhYmx5IHZpb2xhdGVzIHRoZQ0KPiA+IG5v cm1hdGl2ZSBzdGFibGUgcGF0Y2ggc2l6ZS9zY29wZSkgb3IgaWYgdGhlIGZpeCB3aWxsIHRha2Ug bG9uZ2VyIHRoYW4NCj4gPiB0aGlzIGtlcm5lbCBjeWNsZSwgdGhlbiBtb3ZlIGl0IHRvIHN0YWdp bmcgYm90aCBoZXJlIGFuZCBpbiA0LjExDQo+ID4gc3RhYmxlLCBhbmQgZml4IGl0IHRoZXJlIGFu ZCB0aGVuIG1vdmUgaXQgYmFjay4gwqBTb21ldGhpbmcgbGlrZSB0aGF0DQo+ID4gd291bGQgcHJl dmVudCB0aGUga2luZCBvZiBBUEkgZmxhcHBhZ2Ugd2Ugb3VnaHQgbm90IGRvLi4uLg0KPiANCj4g U28sIEkndmUgc2tpbW1lZCB0aGUgZW50aXJlIFJGQywgZm9jdXNpbmcgb24gdGhpbmdzIHdlcmUg SSBuZWVkZWQgdG8uDQo+IMKgSGVyZSdzIG15IHRha2Ugb24gaXQ6DQo+IA0KPiBJdCB3b3VsZCBo YXZlIGJlZW4gYmV0dGVyIHdpdGggQUZfSU5FVC9BRl9JTkVUNiBhbmQgYW4gb3B0aW9uIHRvIGVu YWJsZQ0KPiBTTUMgdGhhbiBBRl9TTUMuIMKgVGhlIGZpcnN0IGltcGxlbWVudGF0aW9uIHNpbXBs eSBhc3N1bWVzIEFGX0lORVQgaW4NCj4gdGhlIHByZXNlbmNlIG9mIEFGX1NNQy4gwqBXaGVuIElQ djYgc3VwcG9ydCBpcyBhZGRlZCwgc29tZSBzb3J0IG9mDQo+IGd1ZXNzaW5nIGxvZ2ljIHdpbGwg aGF2ZSB0byBiZSBwdXQgaW4gcGxhY2UgdG8gdHJ5IGFuZCBkZXRlcm1pbmUgaWYgYW4NCj4gQUZf U01DIGFkZHJlc3MgaXMgYWN0dWFsbHkgQUZfSU5FVCBvciBBRl9JTkVUNiBzaW5jZSB3ZSB3b24n dCBoYXZlIGENCj4gZ3VhcmFudGVlZCB3YXkgb2YgdGVsbGluZy4gwqBBcHBzIGNhbiB1c2Ugc3Ry dWN0IHNvY2thZGRyX3N0b3JhZ2UgYXMgdGhlaXINCj4gbm9ybWFsIGVsZW1lbnQgdG8gc3RpY2sg dGhlIGFkZHJlc3MgaW50bywgYW5kIGNvdWxkIHJlbHkgb24gdGhlIGtlcm5lbCB0bw0KPiBpbnRl cnByZXQgaXQgcHJvcGVybHkgYmFzZWQgb24gdGhlIEFGX0lORVQvQUZfSU5FVDYgZGlmZmVyZW50 aWF0aW9uLCBhbmQNCj4gdGhpcyBicmVha3MgdGhhdC4gwqBUaGUgUkZDIGdpdmVzICpzb21lKiB0 aG91Z2h0IHRvIGFkZGluZyBJUHY2IGluIHRoZQ0KPiBmdXR1cmUsIGJ1dCBub3QgYSBsb3QuIMKg SXQgbWF5IGJlIHRoYXQgdGhlIGFuc3dlciBpcyB0aGF0IGluIHRoZSBmdXR1cmUsIElQdjYNCj4g c3VwcG9ydCBpcyBlbmFibGVkIGJ5IG1ha2luZyB0aGUgSVB2NiBBUEkgYmUNCj4gQUZfSU5FVDYg KyBzZXRzb2Nrb3B0KFNNQykgb3IgdGhlIGVxdWl2YWxlbnQuIMKgSWYgdGhhdCdzIHRoZSBjYXNl LCB0aGVuIEkNCj4gd291bGQgc3VnZ2VzdCBtYWtpbmcgdGhlIGxhdGVyIEFQSSBzcGVjaWZpY2Fs bHkgY2FsbCBvdXQgQUZfSU5FVCArDQo+IHNldHNvY2tvcHQoU01DKSBiZSBpZGVudGljYWwgdG8g QUZfU01DLg0KPiANCg0KV2hhdCBhcmUgdGhlIHNob3J0Y29taW5ncyBpbiBteSBwcm9wb3NhbCBb MV0gd2hpY2ggSSBhbSByZWl0ZXJhdGluZyBiZWxvdy4NCkJhcnQgYWxzbyBzdWdnZXN0ZWQgdG8g ZGVmaW5lIG5ldyBzdHJlYW0gcHJvdG9jb2wgZm9yIFNNQyBzaW1pbGFyIHRvIFNDVFAuDQoNCihh KSBhZGRyZXNzIGZhbWlseSBzaG91bGQgYmUgZWl0aGVyIEFGX0lORVQgb3IgQUZfSU5FVDYNCihi KSBzb2NrZXQoKSBBUEkgdG8gaW50cm9kdWNlIG5ldyBwcm90b2NvbCBhcyBQUk9UT19TTUMgZm9y IGluIHNvY2tldCgpIHN5c3RlbSBjYWxsLg0KDQpXaXRoIHRoaXMgdGhlcmUgaXMgbm8gYWRkaXRp b25hbCBzZXRzb2Nrb3AoKSBuZWVkZWQuDQoNCldpdGggdGhpcyAtIHVzZXIgc3BhY2UgYXBwbGlj YXRpb25zLCBkbyBnZXRhZGRyaW5mbygpIHdpdGggaGludCBhcw0KaGludHMuYWlfZmFtaWx5ICA9 IEFGX0lORVQ7DQpoaW50cy5haV9zb2NrdHlwZSA9IFNPQ0tfU1RSRUFNOw0KDQpnZXRhZGRyaW5m bygpIHJldHVybnMgYmFjayBib3RoIHRoZSBwcm90b2NvbHMgVENQIGFuZCBTTUMuDQpGYW1vdXMg ZGF0YWJhc2UgYXBwbGljYXRpb24gc3VjaCBhcyBSZWRpcyBjbGllbnQgaXRlcmF0ZXMgb3ZlciBh bGwgZW50cmllcyBvZiBnZXRhZGRyaW5mbygpIGFuZCBlc3RhYmxpc2hlcyBjb25uZWN0aW9uIHRv IHNlcnZlcnMuDQoNClRoZXJlIGFyZSBmZXcgYWR2YW50YWdlcyBvZiB0aGlzIGludGVyZmFjZS4N CjEuIE5vIGNoYW5nZSBpbiBhbnkgbWFrZWZpbGUgb2YgYXBwbGljYXRpb25zIG5lZWRlZCAtIHVu bGVzcyB1c2VyIHdhbnRzIHRvIHNwZWNpZnkgZXhwbGljaXRseSB0aGF0IGl0IHdhbnRzIHRvIGZv cmNlIFNNQyBwcm90b2NvbC4NCjIuIE5vIG5lZWQgdG8gZG8gTERfTElCUkFSWV9QUkVMT0FELCAo d2hpY2ggd29uJ3Qgd29yayBhbnl3YXkgYmVjYXVzZSBiaW5kKCkgY29ubmVjdCgpIG9mIFNNQyBj aGVja3MgZm9yIEFGX0lORVQpLg0KMy4gTm8gbWFqb3IgY2hhbmdlcyB0byBnbGliYyB0byBwcm9j ZXNzIEFGX1NNQyBkaWZmZXJlbnRseQ0KZ2xpYmMgcmVmZXJlbmNlcyBJUFBST1RPX1RDUCBhdCAy MiBwbGFjZXMuIChjb21wYXJlIHRvIEFGX0lORVQgYXQgMTQwKyBwbGFjZXMpLg0KQSBsb3Qgc2lt cGxlciB0ZXN0IG1hdHJpeCBmb3IgZ2xpYmMgZm9yIG5ldyBwcm90b2NvbA0KNS4gTm8gbmVlZCB0 byByZWNvbXBpbGUgYXBwbGljYXRpb25zLCBhcyBsb25nIGFzIGdldGFkZHJpbmZvIHJldHVybnMg YWxsIHN0cmVhbWluZyBwcm90b2NvbHMgKFRDUCwgU01DKQ0KNi4gZm9yIGFwcGxpY2F0aW9ucyB0 byBtYWtlIHVzZSBvZiBzZXRzb2Nrb3B0KCkgaXQgbmVlZHMgYW5vdGhlciBrbm9iIGFuZCBoaW50 IGZyb20gb3RoZXIgcGxhY2VzLCB3aGljaCBjYW4gYmUgYXZvaWRlZCBiZWNhdXNlIFNNQyBUQ1Ag b3B0aW9uIG5lZ290aWF0ZXMgd2l0aCByZW1vdGUgZW5kDQoNCkFuZCByZXByZXNlbnRpbmcgbmV3 IHByb3RvY29sIGFzIG5ldyBwcm90b2NvbCBmb3IgYSBnaXZlbiBhZGRyZXNzIGZhbWlseSBhcHBl YXJzIGNvcnJlY3QsIGNvbXBhcmUgdG8gc2V0dGluZyBzb2NrZXQgb3B0aW9ucy4NCg0KVG9vbHMg bGlrZSBDUklVIG9yIHNpbWlsYXIgdG9vbCBtaWdodCBmaW5kIGEgcmFjZSBjb25kaXRpb25zIC0g d2hlbiBpdCBxdWVyaWVzIHNvY2tldCBvcHRpb24sIFNNQyB3YXMgbm90IHNldCwgYnV0IGxhdGVy IG9uIFNNQyB3YXMgc2V0LCBhbmQgZG9lcyBpbmNvcnJlY3QgaGFuZGxpbmcuDQpTZXR0aW5nIHNv Y2tldCgpIHdpdGggU01DIHByb3RvY29sIG1ha2VzIGl0IGVhc2llciB0byB1bmRlcnN0YW5kIGlu IHRoaXMgYXJlYSBhcyB3ZWxsLg0KDQpJIGhhdmUgYWRkaXRpb25hbCBwcm9wb3NhbCBmb3IgbGlu ayBncm91cHMsIHJlc291cmNlIGNyZWF0aW9uIGFyZWEuIEkgd2lsbCB0YWtlIHRoYXQgdXAgYWZ0 ZXIgdGhpcyBkaXNjdXNzaW9uLg0KDQpbMV0gaHR0cHM6Ly9wYXRjaHdvcmsua2VybmVsLm9yZy9w YXRjaC85NzE5Mzc1Lw0KDQo+IFRoZSBwcm90b2NvbCBpbmNsdWRlZCBhIHZlcnNpb24gaGVhZGVy IGluIHRoZSBuZWdvdGF0aW9uIG1lc3NhZ2VzLg0KPiDCoFRoaXMgaXMgZ29vZCBhcyBpdCBhbGxv d3MgdXMgdG8gYWxtb3N0IGltbWVkaWF0ZWx5IHN0YXJ0IHdvcmsgb24gdmVyc2lvbiAyDQo+IHRo YXQgZml4ZXMgdGhlIHNob3J0Y29taW5ncyBvZiB2ZXJzaW9uIDEgd2hpbGUgbWFpbnRhaW5pbmcg YmFjaw0KPiBjb21wYXRpYmlsaXR5Lg0KPiANCj4gQWZ0ZXIgcmVhZGluZyB0aGUgUkZDLCBJIGNh biBzZWUgd2h5IHRoZXkgb25seSBzdXBwb3J0IG9uZSBsaW5rIGxheWVyOg0KPiBSb0NFLiDCoFRo ZSBpc3N1ZSBoZXJlIGlzIHRoYXQgdGhleSBwdW50ZWQgb24gdGhlIGhhcmQgaXNzdWUgb2YgZG9p bmcgYW55IHNvcnQNCj4gb2Ygd29yayB0byBkZXRlcm1pbmUgaWYgc2VjdXJpdHkgcmVzdHJpY3Rp b25zIG9uIHRoZSBUQ1AgY29ubmVjdGlvbiBzaG91bGQNCj4gYWxzbyBiZSBhcHBsaWVkIHRvIHRo ZSBSRE1BIGNvbm5lY3Rpb24uIMKgVGhlIFJGQyBiYXNpY2FsbHkgc2F5cyAidGhlIFJvQ0UNCj4g bGluayBuZWVkcyB0byBoYXZlIHRoZSBzYW1lIHBoeXNpY2FsIHJlc3RyaWN0aW9ucyAodmxhbikg YXMgdGhlIFRDUC9JUCBsaW5rIHNvDQo+IHRoYXQgdGhlIHNlY3VyaXR5IGxpbWl0YXRpb25zIGFy ZSB0aGUgc2FtZSIgYmVjYXVzZSB0aGV5IGRvbid0IGRvIGFueXRoaW5nDQo+IHRvIGNoZWNrIHRo ZW0gZXNzZW50aWFsbHkuDQo+IMKgRm9yIHYyIG9mIHRoZSBwcm90b2NvbCwgYW5kIGZvciBkaWZm ZXJlbnQgbGluayBsYXllciBzdXBwb3J0LCB0aGlzIGlzIG5vIGxvbmdlcg0KPiBzdWZmaWNpZW50 LCBzbyB0aGVyZSB3aWxsIGJlIHNpZ25pZmljYW50IHdvcmsgdG8gZGV0ZXJtaW5lIHRoZSBzZWN1 cml0eSBjb250ZXh0DQo+IG9mIHNwZWNpZmljIFRDUCBjb25uZWN0aW9ucyBhbmQgdGhlbiBtYWtl IHN1cmUgdGhhdCB0aGV5IG1lZXQgdGhlIHNlY3VyaXR5DQo+IGNvbnRleHQgb2YgdGhlIFJETUEg bGlua3MgYWxsb3dlZC4NCj4gDQo+IEFkZGl0aW9uYWxseSwgdGhlIHNhbWUgaXMgbGlrZWx5IHRv IGJlIHRydWUgaW4gdGVybXMgb2YgU0VMaW51eCBvcHRpb25zLiDCoFRoZQ0KPiBhZGRpdGlvbiBv ZiB0aGUgSUIvT1BBIGxpbmsgbGF5ZXJzIHdpbGwgdGhyb3cgYSBiaXQgb2YgYSBtb25rZXkgd3Jl bmNoIGluDQo+IHRoaW5ncyBiZWNhdXNlIHRoZSBTRUxpbnV4IGNvbnRyb2wgb3ZlciB0aG9zZSBs aW5rcyBpcyBzbGlnaHRseSBkaWZmZXJlbnQgdGhhbg0KPiBhIG5vcm1hbCBUQ1AvSVAgU0VMaW51 eCBjb250cm9sLiDCoEZvciBpbnN0YW5jZSwgeW91IG1pZ2h0IGNyZWF0ZSBydWxlcyBhYm91dA0K PiBwcm9jZXNzZXMgYW5kIHBvcnRzIHRvIG1ha2Ugc3VyZSB0aGF0IHRoZSBodHRwZCBkYWVtb24g Y2FuIG9ubHkgYWNjZXNzDQo+IHNwZWNpZmljIHBvcnRzIG9uIFRDUC9JUCwgYnV0IG9uIElCL09Q QSB5b3Ugd291bGQgbmVlZCB0byBjcmVhdGUgcHJvY2Vzcw0KPiBhbmQgUF9LZXkgcnVsZXMgYXMg SUIvT1BBIGRvbid0IGhhdmUgdGhlIHNhbWUgcG9ydCBsZXZlbCBzZW1hbnRpY3MsIGFuZA0KPiBp dCdzIHRoZSBQX0tleSBvbiBjb21tdW5pY2F0aW9ucyB0aGF0IGlzIGVuZm9yY2VkIG5ldHdvcmsg d2lkZSwgaW5jbHVkaW5nDQo+IGluIHRoZSBzd2l0Y2hlcywgc28gY29udHJvbGxpbmcgd2hhdCBQ X0tleSBhIHByb2Nlc3MgY2FuIHNlbmQvcmVjZWl2ZSBvbiBpcw0KPiB5b3VyIGJlc3Qgd2F5IHRv IGxpbWl0IHdoYXQgYSBwcm9jZXNzIGNhbiBkby4gwqBUcmFuc2xhdGlvbiBmcm9tIG9uZSB0byB0 aGUNCj4gb3RoZXIgbWlnaHQgYmUgcmF0aGVyIGRpZmZpY3VsdCB0byBkbyBpbiBhbnkgc29ydCBv ZiBhdXRvbWF0ZWQgZmFzaGlvbi4NCj4gDQo+IFRoZXJlIG1pZ2h0IGhhdmUgdG8gYmUgc29tZSBh ZGRpdGlvbmFsIHdvcmsgdG8gZ2V0IHRoaXMgdG8gcHJvcGVybHkNCj4gYWNjb3VudCBpdGVtcyBm b3IgdGhlIFJETUEgY29udHJvbCBncm91cCBlbGVtZW50cyB0aGF0IHdlcmUgcmVjZW50bHkNCj4g dGFrZW4gaW50byB0aGUga2VybmVsLg0KPiANCj4gRmluYWxseSwgdGhlIFJETUEgc3Vic3lzdGVt IGlzIGluIHRoZSBwcm9jZXNzIG9mIHN3aXRjaGluZyB0byBzdHJ1Y3R1cmVkDQo+IG9wdGlvbiBw cm9jZXNzaW5nIHNpbWlsYXIgdG8gaG93IG5ldGxpbmsgZG9lcyBpdC4gwqBGb3IgdmVyc2lvbiAy IG9mIHRoaXMNCj4gcHJvdG9jb2wsIHNpbmNlIGl0IHdpbGwgYmUgaW50ZXJhY3Rpbmcgd2l0aCB0 aGUgUkRNQSBjb3JlIGluIG1hbnkgd2F5cywgd2lsbA0KPiBiZSBzaW1wbGVyIGlmIGl0IHN3aXRj aGVzIHRoZSBvbi13aXJlIG5lZ290aWF0aW9uIHBhY2tldHMgdG8gZm9sbG93IHRoZSBzYW1lDQo+ IG1ldGhvZHMgYXMgdGhhdCB3aWxsIGFsbG93IHJldXNlIG9mIGNvZGUgdGhhdCBpdCB3aWxsIGhh dmUgdG8gaGF2ZSBmb3INCj4gcHJvcGVybHkgZGVhbGluZyB3aXRoIHRoZSBSRE1BIHN1YnN5c3Rl bSBpbiB0aGUgZnV0dXJlLg0KPiANCj4gU28sIEknbSBmaW5lIHdpdGggaXQgYmVpbmcgbGVmdCBh cyBpcyBzaW5jZSB5b3UgYWNjZXB0ZWQgdGhlIHBhdGNoIHRoYXQgbWFrZXMNCj4gbm90ZSBvZiB0 aGUgbWVtb3J5IHJlZ2lzdHJhdGlvbiBpbnNlY3VyaXR5IGluIHRoZSBLY29uZmlnIHRleHQuDQo+ IMKgVGhlIGZhY3QgdGhhdCB0aGlzIGlzIGEgdmVyc2lvbmVkIHByb3RvY29sIG1lYW5zIHRoYXQg d2UgY2FuIGZpeCB0aGUgdGhpbmdzDQo+IHdlIHNlZSBhcyBiZWluZyB3cm9uZyB3aXRob3V0IGhh dmluZyB0byBoYXZlIGl0IGFsbCByaWdodCBmcm9tIHRoZSB2ZXJ5DQo+IHN0YXJ0LCBpdCBjYW4g YmUgZG9uZSBpbmNyZW1lbnRhbGx5Lg0KPiANCj4gLS0NCj4gRG91ZyBMZWRmb3JkIDxkbGVkZm9y ZEByZWRoYXQuY29tPg0KPiDCoCDCoCBHUEcgS2V5SUQ6IEI4MjZBMzMzMEU1NzJGREQNCj4gDQo+ IEtleSBmaW5nZXJwcmludCA9IEFFNkIgMUJEQSAxMjJCIDIzQjQgMjY1QiDCoDEyNzQgQjgyNiBB MzMzIDBFNTcgMkZERA0KPiANCj4gLS0NCj4gVG8gdW5zdWJzY3JpYmUgZnJvbSB0aGlzIGxpc3Q6 IHNlbmQgdGhlIGxpbmUgInVuc3Vic2NyaWJlIGxpbnV4LXJkbWEiIGluIHRoZQ0KPiBib2R5IG9m IGEgbWVzc2FnZSB0byBtYWpvcmRvbW9Admdlci5rZXJuZWwub3JnIE1vcmUgbWFqb3Jkb21vIGlu Zm8NCj4gYXQgIGh0dHA6Ly92Z2VyLmtlcm5lbC5vcmcvbWFqb3Jkb21vLWluZm8uaHRtbA0K -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 5/17/2017 6:37 PM, Parav Pandit wrote: > Hi Doug, > >> It would have been better with AF_INET/AF_INET6 and an option to enable >> SMC than AF_SMC. The first implementation simply assumes AF_INET in >> the presence of AF_SMC. When IPv6 support is added, some sort of >> guessing logic will have to be put in place to try and determine if an >> AF_SMC address is actually AF_INET or AF_INET6 since we won't have a >> guaranteed way of telling. Apps can use struct sockaddr_storage as their >> normal element to stick the address into, and could rely on the kernel to >> interpret it properly based on the AF_INET/AF_INET6 differentiation, and >> this breaks that. The RFC gives *some* thought to adding IPv6 in the >> future, but not a lot. It may be that the answer is that in the future, IPv6 >> support is enabled by making the IPv6 API be >> AF_INET6 + setsockopt(SMC) or the equivalent. If that's the case, then I >> would suggest making the later API specifically call out AF_INET + >> setsockopt(SMC) be identical to AF_SMC. >> > > What are the shortcomings in my proposal [1] which I am reiterating below. > Bart also suggested to define new stream protocol for SMC similar to SCTP. > > (a) address family should be either AF_INET or AF_INET6 > (b) socket() API to introduce new protocol as PROTO_SMC for in socket() system call. > > With this there is no additional setsockop() needed. > > With this - user space applications, do getaddrinfo() with hint as > hints.ai_family = AF_INET; > hints.ai_socktype = SOCK_STREAM; > > getaddrinfo() returns back both the protocols TCP and SMC. > Famous database application such as Redis client iterates over all entries of getaddrinfo() and establishes connection to servers. > > There are few advantages of this interface. > 1. No change in any makefile of applications needed - unless user wants to specify explicitly that it wants to force SMC protocol. > 2. No need to do LD_LIBRARY_PRELOAD, (which won't work anyway because bind() connect() of SMC checks for AF_INET). > 3. No major changes to glibc to process AF_SMC differently > glibc references IPPROTO_TCP at 22 places. (compare to AF_INET at 140+ places). > A lot simpler test matrix for glibc for new protocol > 5. No need to recompile applications, as long as getaddrinfo returns all streaming protocols (TCP, SMC) > 6. for applications to make use of setsockopt() it needs another knob and hint from other places, which can be avoided because SMC TCP option negotiates with remote end > > And representing new protocol as new protocol for a given address family appears correct, compare to setting socket options. > > Tools like CRIU or similar tool might find a race conditions - when it queries socket option, SMC was not set, but later on SMC was set, and does incorrect handling. > Setting socket() with SMC protocol makes it easier to understand in this area as well. I have no problem with the proposal in itself, but as IBM released this publication and did their own implementation prior to submitting things upstream, and as there might exist in the field implementations, it depends on whether we wish to call those in the field implementations experimental and break them as we go to a final implementation of version 1, or if we consider version 1 baked. I'm fine breaking it. After all, that's what happened with XRC back in the day and Mellanox learned a valuable lesson about upstream first. I have no problem with IBM learning that same lesson IMO. So, I find your proposal, including breaking the API of the version 1 implementation just taken into the kernel before it has had time to fully sit and gel, acceptable. But this is where we kind of need a judgment from on high, and why I Cc:ed Linus on this thread. Any input on this issue Linus? > I have additional proposal for link groups, resource creation area. I will take that up after this discussion. Look forward to hearing it. > [1] https://patchwork.kernel.org/patch/9719375/
On Wed, May 17, 2017 at 08:07:00PM -0400, Doug Ledford wrote: > On 5/17/2017 6:37 PM, Parav Pandit wrote: > > Hi Doug, > > > > >> It would have been better with AF_INET/AF_INET6 and an option to enable > >> SMC than AF_SMC. The first implementation simply assumes AF_INET in > >> the presence of AF_SMC. When IPv6 support is added, some sort of > >> guessing logic will have to be put in place to try and determine if an > >> AF_SMC address is actually AF_INET or AF_INET6 since we won't have a > >> guaranteed way of telling. Apps can use struct sockaddr_storage as their > >> normal element to stick the address into, and could rely on the kernel to > >> interpret it properly based on the AF_INET/AF_INET6 differentiation, and > >> this breaks that. The RFC gives *some* thought to adding IPv6 in the > >> future, but not a lot. It may be that the answer is that in the future, IPv6 > >> support is enabled by making the IPv6 API be > >> AF_INET6 + setsockopt(SMC) or the equivalent. If that's the case, then I > >> would suggest making the later API specifically call out AF_INET + > >> setsockopt(SMC) be identical to AF_SMC. > >> > > > > What are the shortcomings in my proposal [1] which I am reiterating below. > > Bart also suggested to define new stream protocol for SMC similar to SCTP. > > > > (a) address family should be either AF_INET or AF_INET6 > > (b) socket() API to introduce new protocol as PROTO_SMC for in socket() system call. > > > > With this there is no additional setsockop() needed. > > > > With this - user space applications, do getaddrinfo() with hint as > > hints.ai_family = AF_INET; > > hints.ai_socktype = SOCK_STREAM; > > > > getaddrinfo() returns back both the protocols TCP and SMC. > > Famous database application such as Redis client iterates over all entries of getaddrinfo() and establishes connection to servers. > > > > There are few advantages of this interface. > > 1. No change in any makefile of applications needed - unless user wants to specify explicitly that it wants to force SMC protocol. > > 2. No need to do LD_LIBRARY_PRELOAD, (which won't work anyway because bind() connect() of SMC checks for AF_INET). > > 3. No major changes to glibc to process AF_SMC differently > > glibc references IPPROTO_TCP at 22 places. (compare to AF_INET at 140+ places). > > A lot simpler test matrix for glibc for new protocol > > 5. No need to recompile applications, as long as getaddrinfo returns all streaming protocols (TCP, SMC) > > 6. for applications to make use of setsockopt() it needs another knob and hint from other places, which can be avoided because SMC TCP option negotiates with remote end > > > > And representing new protocol as new protocol for a given address family appears correct, compare to setting socket options. > > > > Tools like CRIU or similar tool might find a race conditions - when it queries socket option, SMC was not set, but later on SMC was set, and does incorrect handling. > > Setting socket() with SMC protocol makes it easier to understand in this area as well. > > I have no problem with the proposal in itself, but as IBM released this > publication and did their own implementation prior to submitting things > upstream, and as there might exist in the field implementations, it > depends on whether we wish to call those in the field implementations > experimental and break them as we go to a final implementation of > version 1, or if we consider version 1 baked. I'm fine breaking it. > After all, that's what happened with XRC back in the day and Mellanox > learned a valuable lesson about upstream first. I have no problem with > IBM learning that same lesson IMO. So, I find your proposal, including > breaking the API of the version 1 implementation just taken into the > kernel before it has had time to fully sit and gel, acceptable. Doug, I am amused that after your excellent summary you are not calling to this v1 of protocol in its real word: BROKEN. Current implementation and RFC are DOA (dead-on-arrival) and are not usable for ALL RDMA key players (Mellanox, Intel, Chelsio, ...) and ALL RDMA major users. It doesn't fit into existing infrastructure (DB, HPC, glibc, e.t.c.) and doesn't even fulfill the expected from cover letter (doesn't work with LD_PRELOAD_*). The fact that IBM implemented it and used it doesn't mean that they can't continue to leave it with out-of-tree solution for a little bit more time till usable version will come. > > But this is where we kind of need a judgment from on high, and why I > Cc:ed Linus on this thread. Any input on this issue Linus? It should be dropped/moved_to_staging as soon as possible, before UAPI started to spread. > > > I have additional proposal for link groups, resource creation area. I will take that up after this discussion. > > Look forward to hearing it. > > > [1] https://patchwork.kernel.org/patch/9719375/ > > > -- > Doug Ledford <dledford@redhat.com> > GPG Key ID: B826A3330E572FDD > Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD >
diff --git a/net/smc/Kconfig b/net/smc/Kconfig index c717ef0896aa..fe6b78bc515f 100644 --- a/net/smc/Kconfig +++ b/net/smc/Kconfig @@ -1,6 +1,6 @@ config SMC tristate "SMC socket protocol family" - depends on INET && INFINIBAND + depends on INET && INFINIBAND && BROKEN ---help--- SMC-R provides a "sockets over RDMA" solution making use of RDMA over Converged Ethernet (RoCE) technology to upgrade @@ -8,6 +8,10 @@ config SMC The Linux implementation of the SMC-R solution is designed as a separate socket family SMC. + Warning: SMC will expose all memory for remote reads and writes + once a connection is established. Don't enable this option except + for tightly controlled lab environment. + Select this option if you want to run SMC socket applications config SMC_DIAG
The driver has a lot of quality issues due to the lack of RDMA-side review, and explicitly bypasses APIs to register all memory once a connection is made, and thus allows remote access to memoery. Mark it as broken until at least that part is fixed. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org --- net/smc/Kconfig | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)