Message ID | VI1PR0502MB300860F7C004ABFB01079F53D15E0@VI1PR0502MB3008.eurprd05.prod.outlook.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Tue, Oct 31, 2017 at 03:16:42AM +0000, Parav Pandit wrote: > I am yet to review my below patch with Dan as he did most security > dev, but I suspect this might be the cause where rmpp list is not > initialized and mad processing is continued when security check > fails. This patch sure looks needed to me, ib_free_recv_mad touches rmpp_list, so if it needs initializion then it certainly has to be done earlier.. Adding the new return sure makes alot of sense as well.. Hal, Ira, would you check this routine too? kernel oops's are bad.. > diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c > index f8f53bb..cb91245 100644 > +++ b/drivers/infiniband/core/mad.c > @@ -1974,14 +1974,15 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv, > unsigned long flags; > int ret; > > + INIT_LIST_HEAD(&mad_recv_wc->rmpp_list); > ret = ib_mad_enforce_security(mad_agent_priv, > mad_recv_wc->wc->pkey_index); > if (ret) { > ib_free_recv_mad(mad_recv_wc); > deref_mad_agent(mad_agent_priv); > + return; > } > > - INIT_LIST_HEAD(&mad_recv_wc->rmpp_list); > list_add(&mad_recv_wc->recv_buf.list, &mad_recv_wc->rmpp_list); -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Oct 30, 2017 at 10:24:35PM -0600, Jason Gunthorpe wrote: > On Tue, Oct 31, 2017 at 03:16:42AM +0000, Parav Pandit wrote: > > > I am yet to review my below patch with Dan as he did most security > > dev, but I suspect this might be the cause where rmpp list is not > > initialized and mad processing is continued when security check > > fails. > > This patch sure looks needed to me, ib_free_recv_mad touches > rmpp_list, so if it needs initializion then it certainly has to be > done earlier.. Right, it aligns with my analysis too. > > Adding the new return sure makes alot of sense as well.. > > Hal, Ira, would you check this routine too? kernel oops's are bad.. > > > diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c > > index f8f53bb..cb91245 100644 > > +++ b/drivers/infiniband/core/mad.c > > @@ -1974,14 +1974,15 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv, > > unsigned long flags; > > int ret; > > > > + INIT_LIST_HEAD(&mad_recv_wc->rmpp_list); > > ret = ib_mad_enforce_security(mad_agent_priv, > > mad_recv_wc->wc->pkey_index); > > if (ret) { > > ib_free_recv_mad(mad_recv_wc); > > deref_mad_agent(mad_agent_priv); > > + return; > > } > > > > - INIT_LIST_HEAD(&mad_recv_wc->rmpp_list); > > list_add(&mad_recv_wc->recv_buf.list, &mad_recv_wc->rmpp_list);
On 10/31/2017 12:24 AM, Jason Gunthorpe wrote: > On Tue, Oct 31, 2017 at 03:16:42AM +0000, Parav Pandit wrote: > >> I am yet to review my below patch with Dan as he did most security >> dev, but I suspect this might be the cause where rmpp list is not >> initialized and mad processing is continued when security check >> fails. > > This patch sure looks needed to me, ib_free_recv_mad touches > rmpp_list, so if it needs initializion then it certainly has to be > done earlier.. Agreed. > Adding the new return sure makes alot of sense as well.. > > Hal, Ira, would you check this routine too? kernel oops's are bad.. Patch looks needed for just the point that Parav made above (that if security check fails, then ib_free_recv_mad will cause the mad_recv_wc->rmpp_list to be accessed so it needs to be initialized before security is enforced). I don't have mthca to try this. Maybe Chris can try this patch (with CONFIG_SECURITY_INFINIBAND=y). -- Hal >> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c >> index f8f53bb..cb91245 100644 >> +++ b/drivers/infiniband/core/mad.c >> @@ -1974,14 +1974,15 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv, >> unsigned long flags; >> int ret; >> >> + INIT_LIST_HEAD(&mad_recv_wc->rmpp_list); >> ret = ib_mad_enforce_security(mad_agent_priv, >> mad_recv_wc->wc->pkey_index); >> if (ret) { >> ib_free_recv_mad(mad_recv_wc); >> deref_mad_agent(mad_agent_priv); >> + return; >> } >> >> - INIT_LIST_HEAD(&mad_recv_wc->rmpp_list); >> list_add(&mad_recv_wc->recv_buf.list, &mad_recv_wc->rmpp_list); > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/31/2017 7:49 AM, Hal Rosenstock wrote: > On 10/31/2017 12:24 AM, Jason Gunthorpe wrote: >> On Tue, Oct 31, 2017 at 03:16:42AM +0000, Parav Pandit wrote: >> >>> I am yet to review my below patch with Dan as he did most security >>> dev, but I suspect this might be the cause where rmpp list is not >>> initialized and mad processing is continued when security check >>> fails. >> This patch sure looks needed to me, ib_free_recv_mad touches >> rmpp_list, so if it needs initializion then it certainly has to be >> done earlier.. > Agreed. > >> Adding the new return sure makes alot of sense as well.. >> >> Hal, Ira, would you check this routine too? kernel oops's are bad.. > Patch looks needed for just the point that Parav made above (that if > security check fails, then ib_free_recv_mad will cause the > mad_recv_wc->rmpp_list to be accessed so it needs to be initialized > before security is enforced). Agree the patch is needed regardless. > > I don't have mthca to try this. Maybe Chris can try this patch (with > CONFIG_SECURITY_INFINIBAND=y). Chris, are you running with SELinux enabled? If this addresses your issue it means permission is denied, so once the crash is resolved additional policy will be required in order for it to work as expected. > -- Hal > >>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c >>> index f8f53bb..cb91245 100644 >>> +++ b/drivers/infiniband/core/mad.c >>> @@ -1974,14 +1974,15 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv, >>> unsigned long flags; >>> int ret; >>> >>> + INIT_LIST_HEAD(&mad_recv_wc->rmpp_list); >>> ret = ib_mad_enforce_security(mad_agent_priv, >>> mad_recv_wc->wc->pkey_index); >>> if (ret) { >>> ib_free_recv_mad(mad_recv_wc); >>> deref_mad_agent(mad_agent_priv); >>> + return; >>> } >>> >>> - INIT_LIST_HEAD(&mad_recv_wc->rmpp_list); >>> list_add(&mad_recv_wc->recv_buf.list, &mad_recv_wc->rmpp_list); >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fvger.kernel.org%2Fmajordomo-info.html&data=02%7C01%7Cdanielj%40mellanox.com%7C0c1d5cc98d224d5f21ca08d5205dceb0%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636450509634819485&sdata=RANgNBE48saDXft%2BfCIIS3qWZT8PK5imlXnoCKnWdkU%3D&reserved=0 >> -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 31, 2017 at 10:01:49AM -0500, Daniel Jurgens wrote: > >> Adding the new return sure makes alot of sense as well.. > >> > >> Hal, Ira, would you check this routine too? kernel oops's are bad.. > > Patch looks needed for just the point that Parav made above (that if > > security check fails, then ib_free_recv_mad will cause the > > mad_recv_wc->rmpp_list to be accessed so it needs to be initialized > > before security is enforced). > > Agree the patch is needed regardless. Someone please send it.. > > I don't have mthca to try this. Maybe Chris can try this patch (with > > CONFIG_SECURITY_INFINIBAND=y). > > Chris, are you running with SELinux enabled? If this addresses your issue it means permission is denied, so once the crash is resolved additional policy will be required in order for it to work as expected. If Chris has selinux turned on in his distro would you expect this test to just fail? Doesn't that mean we have missed installing security labels for things like opensm? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/31/2017 10:09 AM, Jason Gunthorpe wrote: > On Tue, Oct 31, 2017 at 10:01:49AM -0500, Daniel Jurgens wrote: > >>>> Adding the new return sure makes alot of sense as well.. >>>> >>>> Hal, Ira, would you check this routine too? kernel oops's are bad.. >>> Patch looks needed for just the point that Parav made above (that if >>> security check fails, then ib_free_recv_mad will cause the >>> mad_recv_wc->rmpp_list to be accessed so it needs to be initialized >>> before security is enforced). >> Agree the patch is needed regardless. > Someone please send it.. > >>> I don't have mthca to try this. Maybe Chris can try this patch (with >>> CONFIG_SECURITY_INFINIBAND=y). >> Chris, are you running with SELinux enabled? If this addresses your issue it means permission is denied, so once the crash is resolved additional policy will be required in order for it to work as expected. > If Chris has selinux turned on in his distro would you expect this > test to just fail? Doesn't that mean we have missed installing security labels > for things like opensm? I suspect his policy doesn't have have the necessary allow rules for opensm to receive MADs. > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 31, 2017 at 09:09:01AM -0600, Jason Gunthorpe wrote: > On Tue, Oct 31, 2017 at 10:01:49AM -0500, Daniel Jurgens wrote: > > > >> Adding the new return sure makes alot of sense as well.. > > >> > > >> Hal, Ira, would you check this routine too? kernel oops's are bad.. > > > Patch looks needed for just the point that Parav made above (that if > > > security check fails, then ib_free_recv_mad will cause the > > > mad_recv_wc->rmpp_list to be accessed so it needs to be initialized > > > before security is enforced). > > > > Agree the patch is needed regardless. > > Someone please send it.. Parav/Daniel, Please send it directly to the mailing list. > > > > I don't have mthca to try this. Maybe Chris can try this patch (with > > > CONFIG_SECURITY_INFINIBAND=y). > > > > Chris, are you running with SELinux enabled? If this addresses your issue it means permission is denied, so once the crash is resolved additional policy will be required in order for it to work as expected. > > If Chris has selinux turned on in his distro would you expect this > test to just fail? Doesn't that mean we have missed installing security labels > for things like opensm? Chris has SELinux enabled, see his gist: https://gist.github.com/riptidewave93/b3b83c13e93ab3be4254c855885f5b3a Thanks > > Jason
On Tue, Oct 31, 2017 at 10:15 AM, Leon Romanovsky <leon@kernel.org> wrote: > On Tue, Oct 31, 2017 at 09:09:01AM -0600, Jason Gunthorpe wrote: >> On Tue, Oct 31, 2017 at 10:01:49AM -0500, Daniel Jurgens wrote: >> >> > >> Adding the new return sure makes alot of sense as well.. >> > >> >> > >> Hal, Ira, would you check this routine too? kernel oops's are bad.. >> > > Patch looks needed for just the point that Parav made above (that if >> > > security check fails, then ib_free_recv_mad will cause the >> > > mad_recv_wc->rmpp_list to be accessed so it needs to be initialized >> > > before security is enforced). >> > >> > Agree the patch is needed regardless. >> >> Someone please send it.. > > Parav/Daniel, > > Please send it directly to the mailing list. > >> >> > > I don't have mthca to try this. Maybe Chris can try this patch (with >> > > CONFIG_SECURITY_INFINIBAND=y). >> > >> > Chris, are you running with SELinux enabled? If this addresses your issue it means permission is denied, so once the crash is resolved additional policy will be required in order for it to work as expected. >> >> If Chris has selinux turned on in his distro would you expect this >> test to just fail? Doesn't that mean we have missed installing security labels >> for things like opensm? > > Chris has SELinux enabled, see his gist: https://gist.github.com/riptidewave93/b3b83c13e93ab3be4254c855885f5b3a > > Thanks > >> >> Jason I applied this patch and re-enabled CONFIG_SECURITY_INFINIBAND, and have a kernel build in progress. I will run this for a few days and will report back my findings. As for SELinux, it's not being used on my systems at this time, but I am using apparmor for some lxc containers. Regards, Chris Blake -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/31/2017 10:15 AM, Leon Romanovsky wrote: > On Tue, Oct 31, 2017 at 09:09:01AM -0600, Jason Gunthorpe wrote: >> On Tue, Oct 31, 2017 at 10:01:49AM -0500, Daniel Jurgens wrote: >> >>>>> Adding the new return sure makes alot of sense as well.. >>>>> >>>>> Hal, Ira, would you check this routine too? kernel oops's are bad.. >>>> Patch looks needed for just the point that Parav made above (that if >>>> security check fails, then ib_free_recv_mad will cause the >>>> mad_recv_wc->rmpp_list to be accessed so it needs to be initialized >>>> before security is enforced). >>> Agree the patch is needed regardless. >> Someone please send it.. > Parav/Daniel, > > Please send it directly to the mailing list. > >>>> I don't have mthca to try this. Maybe Chris can try this patch (with >>>> CONFIG_SECURITY_INFINIBAND=y). >>> Chris, are you running with SELinux enabled? If this addresses your issue it means permission is denied, so once the crash is resolved additional policy will be required in order for it to work as expected. >> If Chris has selinux turned on in his distro would you expect this >> test to just fail? Doesn't that mean we have missed installing security labels >> for things like opensm? > Chris has SELinux enabled, see his gist: https://gist.github.com/riptidewave93/b3b83c13e93ab3be4254c855885f5b3a That doesn't indicate if he has SELinux enabled or not, just that CONFIG_SECURITY_INFINIBAND is enabled. Also, even if SELinux enabled in the kernel config it must be turned on via /etc/selinux/config, and also set into enforcing mode, if it were to cause this problem. There's no enough info there to determine any of that. > Thanks > >> Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 31, 2017 at 10:20 AM, Daniel Jurgens <danielj@mellanox.com> wrote: > On 10/31/2017 10:15 AM, Leon Romanovsky wrote: >> On Tue, Oct 31, 2017 at 09:09:01AM -0600, Jason Gunthorpe wrote: >>> On Tue, Oct 31, 2017 at 10:01:49AM -0500, Daniel Jurgens wrote: >>> >>>>>> Adding the new return sure makes alot of sense as well.. >>>>>> >>>>>> Hal, Ira, would you check this routine too? kernel oops's are bad.. >>>>> Patch looks needed for just the point that Parav made above (that if >>>>> security check fails, then ib_free_recv_mad will cause the >>>>> mad_recv_wc->rmpp_list to be accessed so it needs to be initialized >>>>> before security is enforced). >>>> Agree the patch is needed regardless. >>> Someone please send it.. >> Parav/Daniel, >> >> Please send it directly to the mailing list. >> >>>>> I don't have mthca to try this. Maybe Chris can try this patch (with >>>>> CONFIG_SECURITY_INFINIBAND=y). >>>> Chris, are you running with SELinux enabled? If this addresses your issue it means permission is denied, so once the crash is resolved additional policy will be required in order for it to work as expected. >>> If Chris has selinux turned on in his distro would you expect this >>> test to just fail? Doesn't that mean we have missed installing security labels >>> for things like opensm? >> Chris has SELinux enabled, see his gist: https://gist.github.com/riptidewave93/b3b83c13e93ab3be4254c855885f5b3a > > That doesn't indicate if he has SELinux enabled or not, just that CONFIG_SECURITY_INFINIBAND is enabled. Also, even if SELinux enabled in the kernel config it must be turned on via /etc/selinux/config, and also set into enforcing mode, if it were to cause this problem. There's no enough info there to determine any of that. > >> Thanks >> >>> Jason > > Hello All, I have installed the kernel with the mentioned patch, as well as CONFIG_SECURITY_INFINIBAND enabled. Sadly I am back to the issue where my compute node is reporting: kernel: infiniband mthca0: ib_post_send_mad error As soon as I roll back to a kernel with CONFIG_SECURITY_INFINIBAND disabled, the issue goes away and things work as expected. Regards, Chris Blake -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
SGkgQ2hyaXMsDQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogQ2hyaXMg Qmxha2UgW21haWx0bzpjaHJpc3JibGFrZTkzQGdtYWlsLmNvbV0NCj4gU2VudDogVHVlc2RheSwg T2N0b2JlciAzMSwgMjAxNyAxMjo1MCBQTQ0KPiBUbzogRGFuaWVsIEp1cmdlbnMgPGRhbmllbGpA bWVsbGFub3guY29tPg0KPiBDYzogTGVvbiBSb21hbm92c2t5IDxsZW9uQGtlcm5lbC5vcmc+OyBK YXNvbiBHdW50aG9ycGUgPGpnZ0B6aWVwZS5jYT47DQo+IEhhbCBSb3NlbnN0b2NrIDxoYWxAZGV2 Lm1lbGxhbm94LmNvLmlsPjsgUGFyYXYgUGFuZGl0DQo+IDxwYXJhdkBtZWxsYW5veC5jb20+OyBs aW51eC1yZG1hQHZnZXIua2VybmVsLm9yZzsgSGFsIFJvc2Vuc3RvY2sNCj4gPGhhbEBtZWxsYW5v eC5jb20+OyBJcmEgV2VpbnkgPGlyYS53ZWlueUBpbnRlbC5jb20+DQo+IFN1YmplY3Q6IFJlOiA0 LjEzIGliX210aGNhIE5VTEwgcG9pbnRlciBkZXJlZmVyZW5jZSB3aXRoIE9wZW5TTQ0KPiANCj4g T24gVHVlLCBPY3QgMzEsIDIwMTcgYXQgMTA6MjAgQU0sIERhbmllbCBKdXJnZW5zIDxkYW5pZWxq QG1lbGxhbm94LmNvbT4NCj4gd3JvdGU6DQo+ID4gT24gMTAvMzEvMjAxNyAxMDoxNSBBTSwgTGVv biBSb21hbm92c2t5IHdyb3RlOg0KPiA+PiBPbiBUdWUsIE9jdCAzMSwgMjAxNyBhdCAwOTowOTow MUFNIC0wNjAwLCBKYXNvbiBHdW50aG9ycGUgd3JvdGU6DQo+ID4+PiBPbiBUdWUsIE9jdCAzMSwg MjAxNyBhdCAxMDowMTo0OUFNIC0wNTAwLCBEYW5pZWwgSnVyZ2VucyB3cm90ZToNCj4gPj4+DQo+ ID4+Pj4+PiBBZGRpbmcgdGhlIG5ldyByZXR1cm4gc3VyZSBtYWtlcyBhbG90IG9mIHNlbnNlIGFz IHdlbGwuLg0KPiA+Pj4+Pj4NCj4gPj4+Pj4+IEhhbCwgSXJhLCB3b3VsZCB5b3UgY2hlY2sgdGhp cyByb3V0aW5lIHRvbz8ga2VybmVsIG9vcHMncyBhcmUgYmFkLi4NCj4gPj4+Pj4gUGF0Y2ggbG9v a3MgbmVlZGVkIGZvciBqdXN0IHRoZSBwb2ludCB0aGF0IFBhcmF2IG1hZGUgYWJvdmUgKHRoYXQN Cj4gPj4+Pj4gaWYgc2VjdXJpdHkgY2hlY2sgZmFpbHMsIHRoZW4gaWJfZnJlZV9yZWN2X21hZCB3 aWxsIGNhdXNlIHRoZQ0KPiA+Pj4+PiBtYWRfcmVjdl93Yy0+cm1wcF9saXN0IHRvIGJlIGFjY2Vz c2VkIHNvIGl0IG5lZWRzIHRvIGJlDQo+ID4+Pj4+IGluaXRpYWxpemVkIGJlZm9yZSBzZWN1cml0 eSBpcyBlbmZvcmNlZCkuDQo+ID4+Pj4gQWdyZWUgdGhlIHBhdGNoIGlzIG5lZWRlZCByZWdhcmRs ZXNzLg0KPiA+Pj4gU29tZW9uZSBwbGVhc2Ugc2VuZCBpdC4uDQo+ID4+IFBhcmF2L0RhbmllbCwN Cj4gPj4NCj4gPj4gUGxlYXNlIHNlbmQgaXQgZGlyZWN0bHkgdG8gdGhlIG1haWxpbmcgbGlzdC4N Cj4gPj4NCj4gPj4+Pj4gSSBkb24ndCBoYXZlIG10aGNhIHRvIHRyeSB0aGlzLiBNYXliZSBDaHJp cyBjYW4gdHJ5IHRoaXMgcGF0Y2gNCj4gPj4+Pj4gKHdpdGggQ09ORklHX1NFQ1VSSVRZX0lORklO SUJBTkQ9eSkuDQo+ID4+Pj4gQ2hyaXMsIGFyZSB5b3UgcnVubmluZyB3aXRoIFNFTGludXggZW5h YmxlZD8gSWYgdGhpcyBhZGRyZXNzZXMgeW91ciBpc3N1ZSBpdA0KPiBtZWFucyBwZXJtaXNzaW9u IGlzIGRlbmllZCwgc28gb25jZSB0aGUgY3Jhc2ggaXMgcmVzb2x2ZWQgYWRkaXRpb25hbCBwb2xp Y3kgd2lsbA0KPiBiZSByZXF1aXJlZCBpbiBvcmRlciBmb3IgaXQgdG8gd29yayBhcyBleHBlY3Rl ZC4NCj4gPj4+IElmIENocmlzIGhhcyBzZWxpbnV4IHR1cm5lZCBvbiBpbiBoaXMgZGlzdHJvIHdv dWxkIHlvdSBleHBlY3QgdGhpcw0KPiA+Pj4gdGVzdCB0byBqdXN0IGZhaWw/IERvZXNuJ3QgdGhh dCBtZWFuIHdlIGhhdmUgbWlzc2VkIGluc3RhbGxpbmcNCj4gPj4+IHNlY3VyaXR5IGxhYmVscyBm b3IgdGhpbmdzIGxpa2Ugb3BlbnNtPw0KPiA+PiBDaHJpcyBoYXMgU0VMaW51eCBlbmFibGVkLCBz ZWUgaGlzIGdpc3Q6DQo+ID4+IGh0dHBzOi8vZW1lYTAxLnNhZmVsaW5rcy5wcm90ZWN0aW9uLm91 dGxvb2suY29tLz91cmw9aHR0cHMlM0ElMkYlMkZnaQ0KPiA+Pg0KPiBzdC5naXRodWIuY29tJTJG cmlwdGlkZXdhdmU5MyUyRmIzYjgzYzEzZTkzYWIzYmU0MjU0Yzg1NTg4NWY1YjNhJmRhdA0KPiBh DQo+ID4+DQo+ID0wMiU3QzAxJTdDcGFyYXYlNDBtZWxsYW5veC5jb20lN0NmOTY4ZGYwNzJmNzE0 M2NhMWFhNzA4ZDUyMDg3YzZiDQo+IDglN0MNCj4gPj4NCj4gYTY1Mjk3MWM3ZDJlNGQ5YmE2YTRk MTQ5MjU2ZjQ2MWIlN0MwJTdDMCU3QzYzNjQ1MDY4OTg5MTQxNjIzNyZzZA0KPiBhdGE9Vw0KPiA+ PiA3NFFuNGxPczJNQmdWM3VIUFNuZ212dExDeHAlMkY4a0RreVoxY0lJaEdRMCUzRCZyZXNlcnZl ZD0wDQo+ID4NCj4gPiBUaGF0IGRvZXNuJ3QgaW5kaWNhdGUgaWYgaGUgaGFzIFNFTGludXggZW5h YmxlZCBvciBub3QsIGp1c3QgdGhhdA0KPiBDT05GSUdfU0VDVVJJVFlfSU5GSU5JQkFORCBpcyBl bmFibGVkLiAgQWxzbywgZXZlbiBpZiBTRUxpbnV4IGVuYWJsZWQgaW4gdGhlDQo+IGtlcm5lbCBj b25maWcgaXQgbXVzdCBiZSB0dXJuZWQgb24gdmlhIC9ldGMvc2VsaW51eC9jb25maWcsIGFuZCBh bHNvIHNldCBpbnRvDQo+IGVuZm9yY2luZyBtb2RlLCBpZiBpdCB3ZXJlIHRvIGNhdXNlIHRoaXMg cHJvYmxlbS4gIFRoZXJlJ3Mgbm8gZW5vdWdoIGluZm8gdGhlcmUNCj4gdG8gZGV0ZXJtaW5lIGFu eSBvZiB0aGF0Lg0KPiA+DQo+ID4+IFRoYW5rcw0KPiA+Pg0KPiA+Pj4gSmFzb24NCj4gPg0KPiA+ DQo+IA0KPiBIZWxsbyBBbGwsDQo+IA0KPiBJIGhhdmUgaW5zdGFsbGVkIHRoZSBrZXJuZWwgd2l0 aCB0aGUgbWVudGlvbmVkIHBhdGNoLCBhcyB3ZWxsIGFzDQo+IENPTkZJR19TRUNVUklUWV9JTkZJ TklCQU5EIGVuYWJsZWQuIFNhZGx5IEkgYW0gYmFjayB0byB0aGUgaXNzdWUgd2hlcmUgbXkNCj4g Y29tcHV0ZSBub2RlIGlzIHJlcG9ydGluZzoNCj4gDQo+IGtlcm5lbDogaW5maW5pYmFuZCBtdGhj YTA6IGliX3Bvc3Rfc2VuZF9tYWQgZXJyb3INClRoZXJlIHdlcmUgdHdvIGlzc3VlcyBpbiB5b3Vy IHJlcG9ydC4NCjEuIFBvc3Qgc2VuZCBmYWlsdXJlLg0KMi4gS2VybmVsIGNyYXNoIG9uIHJlY2Vp dmluZyBtYWQuDQpJIHByb3ZpZGVkIGZpeCBmb3IgMm5kIGlzc3VlLg0KRG8geW91IHN0aWxsIDJu ZCBpc3N1ZSAoY3Jhc2gpIGFmdGVyIGFwcGx5aW5nIHRoZSBmaXg/DQoNCkZvciAxc3QgcHJvYmxl bSwgRGFuIGhhcyBmZXcgcXVlc3Rpb25zL3N1Z2dlc3Rpb25zIHJlZ2FyZGluZyBjb25maWd1cmF0 aW9uIG9mIHNlbGludXggcG9saWN5Lg0KQ2FuIHlvdSBwbGVhc2UgZ28gdGhyb3VnaCBpdD8NCg0K -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/31/2017 12:49 PM, Chris Blake wrote: > Hello All, > > I have installed the kernel with the mentioned patch, as well as > CONFIG_SECURITY_INFINIBAND enabled. Sadly I am back to the issue where > my compute node is reporting: > > kernel: infiniband mthca0: ib_post_send_mad error > > As soon as I roll back to a kernel with CONFIG_SECURITY_INFINIBAND > disabled, the issue goes away and things work as expected. > > Regards, > Chris Blake Sounds like the crash is resolved and now you're getting a denial from a security module. I looked in the code, it looks like AppArmor doesn't register any callbacks for the ib_* security hooks, and if no hook is registered it should return 0. Can you tell me more about your setup so I can create a reproducer? What OS are you using? Can you double check that SELinux isn't enabled (see output of sestatus). -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 31, 2017 at 1:22 PM, Daniel Jurgens <danielj@mellanox.com> wrote: > On 10/31/2017 12:49 PM, Chris Blake wrote: >> Hello All, >> >> I have installed the kernel with the mentioned patch, as well as >> CONFIG_SECURITY_INFINIBAND enabled. Sadly I am back to the issue where >> my compute node is reporting: >> >> kernel: infiniband mthca0: ib_post_send_mad error >> >> As soon as I roll back to a kernel with CONFIG_SECURITY_INFINIBAND >> disabled, the issue goes away and things work as expected. >> >> Regards, >> Chris Blake > > Sounds like the crash is resolved and now you're getting a denial from a security module. I looked in the code, it looks like AppArmor doesn't register any callbacks for the ib_* security hooks, and if no hook is registered it should return 0. Can you tell me more about your setup so I can create a reproducer? What OS are you using? Can you double check that SELinux isn't enabled (see output of sestatus). > Hello, I am not using SELinux on my system. I do have apparmor, but it is only configured for lxc. # apparmor_status apparmor module is loaded. 5 profiles are loaded. 5 profiles are in enforce mode. /usr/bin/lxc-start lxc-container-default lxc-container-default-cgns lxc-container-default-with-mounting lxc-container-default-with-nesting 0 profiles are in complain mode. 0 processes have profiles defined. 0 processes are in enforce mode. 0 processes are in complain mode. 0 processes are unconfined but have a profile defined. # sestatus SELinux status: disabled As for the crash issue I was seeing for #2, so far I have not been able to replicate it with the patch. :) Regarding my OS, my "NAS" box is running Debian 9.2 with it's default distro kernel (currently 4.9.51-1), and my "Compute" nodes are on Proxmox, which is based on Debian 9.1 and it's kernel is based on ubuntu-artful and is version 4.13.4. Source is at https://git.proxmox.com/?p=pve-kernel.git;a=summary. This is the kernel I have tested the patch on, and have been running with CONFIG_SECURITY_INFINIBAND disabled to resolve the issue. Regards, Chris Blake -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 31, 2017 at 1:29 PM, Chris Blake <chrisrblake93@gmail.com> wrote: > On Tue, Oct 31, 2017 at 1:22 PM, Daniel Jurgens <danielj@mellanox.com> wrote: >> On 10/31/2017 12:49 PM, Chris Blake wrote: >>> Hello All, >>> >>> I have installed the kernel with the mentioned patch, as well as >>> CONFIG_SECURITY_INFINIBAND enabled. Sadly I am back to the issue where >>> my compute node is reporting: >>> >>> kernel: infiniband mthca0: ib_post_send_mad error >>> >>> As soon as I roll back to a kernel with CONFIG_SECURITY_INFINIBAND >>> disabled, the issue goes away and things work as expected. >>> >>> Regards, >>> Chris Blake >> >> Sounds like the crash is resolved and now you're getting a denial from a security module. I looked in the code, it looks like AppArmor doesn't register any callbacks for the ib_* security hooks, and if no hook is registered it should return 0. Can you tell me more about your setup so I can create a reproducer? What OS are you using? Can you double check that SELinux isn't enabled (see output of sestatus). >> > > Hello, > > I am not using SELinux on my system. I do have apparmor, but it is > only configured for lxc. > > # apparmor_status > apparmor module is loaded. > 5 profiles are loaded. > 5 profiles are in enforce mode. > /usr/bin/lxc-start > lxc-container-default > lxc-container-default-cgns > lxc-container-default-with-mounting > lxc-container-default-with-nesting > 0 profiles are in complain mode. > 0 processes have profiles defined. > 0 processes are in enforce mode. > 0 processes are in complain mode. > 0 processes are unconfined but have a profile defined. > > # sestatus > SELinux status: disabled > > As for the crash issue I was seeing for #2, so far I have not been > able to replicate it with the patch. :) > > Regarding my OS, my "NAS" box is running Debian 9.2 with it's default > distro kernel (currently 4.9.51-1), and my "Compute" nodes are on > Proxmox, which is based on Debian 9.1 and it's kernel is based on > ubuntu-artful and is version 4.13.4. Source is at > https://git.proxmox.com/?p=pve-kernel.git;a=summary. This is the > kernel I have tested the patch on, and have been running with > CONFIG_SECURITY_INFINIBAND disabled to resolve the issue. > > Regards, > Chris Blake Hello, Forgot to mention, on my compute nodes I was testing with mainline kernels, such as 4.14.0-rc5, per my original email. If there are any specific kernels you prefer I test with on my NAS or compute nodes, please let me know. Regards, Chris Blake -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 31, 2017 at 12:22 PM, Daniel Jurgens <danielj@mellanox.com> wrote: > Sounds like the crash is resolved and now you're getting a denial > from a security module. I looked in the code, it looks like If Chris was hitting that crash it means his security module failed the mad action, as that is the only way to trigger it. > AppArmor doesn't register any callbacks for the ib_* security hooks, > and if no hook is registered it should return 0. Can you tell me > more about your setup so I can create a reproducer? What OS are you > using? Can you double check that SELinux isn't enabled (see output > of sestatus). Which suggests to me that apparmour is not working properly with the rdma selinux patches?? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/31/2017 2:23 PM, Jason Gunthorpe wrote: > On Tue, Oct 31, 2017 at 12:22 PM, Daniel Jurgens <danielj@mellanox.com> wrote: > >> Sounds like the crash is resolved and now you're getting a denial >> from a security module. I looked in the code, it looks like > If Chris was hitting that crash it means his security module failed the > mad action, as that is the only way to trigger it. > >> AppArmor doesn't register any callbacks for the ib_* security hooks, >> and if no hook is registered it should return 0. Can you tell me >> more about your setup so I can create a reproducer? What OS are you >> using? Can you double check that SELinux isn't enabled (see output >> of sestatus). > Which suggests to me that apparmour is not working properly with the > rdma selinux patches?? It seems that way. I can't explain it looking at the code, I'm trying to reproduce it for debug. A point of clarification, the RDMA "SELinux" patches aren't SELinux specific. They interact with the LSM, a layer of abstraction between the security modules and the rest of the kernel. Zero or more security modules like SELinux or AppArmor can implement the security hooks. The default return value for the hook in question is 0 if no modules implement it. It doesn't look like AppArmor implements it. > Jason > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 31, 2017 at 2:44 PM, Daniel Jurgens <danielj@mellanox.com> wrote: > On 10/31/2017 2:23 PM, Jason Gunthorpe wrote: >> On Tue, Oct 31, 2017 at 12:22 PM, Daniel Jurgens <danielj@mellanox.com> wrote: >> >>> Sounds like the crash is resolved and now you're getting a denial >>> from a security module. I looked in the code, it looks like >> If Chris was hitting that crash it means his security module failed the >> mad action, as that is the only way to trigger it. >> >>> AppArmor doesn't register any callbacks for the ib_* security hooks, >>> and if no hook is registered it should return 0. Can you tell me >>> more about your setup so I can create a reproducer? What OS are you >>> using? Can you double check that SELinux isn't enabled (see output >>> of sestatus). >> Which suggests to me that apparmour is not working properly with the >> rdma selinux patches?? > > It seems that way. I can't explain it looking at the code, I'm trying to reproduce it for debug. > > A point of clarification, the RDMA "SELinux" patches aren't SELinux specific. They interact with the LSM, a layer of abstraction between the security modules and the rest of the kernel. Zero or more security modules like SELinux or AppArmor can implement the security hooks. The default return value for the hook in question is 0 if no modules implement it. It doesn't look like AppArmor implements it. > >> Jason >> > Hello All, As a follow up to this issue, I went ahead and installed v4.14-rc8 mainline from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14-rc8/ to do some testing. I disabled apparmor via the cmdline (I added apparmor=0 security="") and this seems to have worked: root@C6100-1-N4:~# cat /proc/cmdline BOOT_IMAGE=/boot/vmlinuz-4.14.0-041400rc8-generic root=/dev/mapper/pve-root ro apparmor=0 security= quiet root@C6100-1-N4:~# apparmor_status apparmor module is loaded. apparmor filesystem is not mounted. However, I am still having the issue with up_post_send_mad error: root@C6100-1-N4:~# uptime 17:36:30 up 35 min, 1 user, load average: 0.13, 0.06, 0.01 root@C6100-1-N4:~# dmesg | grep "infiniband mthca0: ib_post_send_mad error" | wc -l 848 I will admit that I have little knowledge of apparmor, but from the above if "ib_post_send_mad error" is caused by apparmor as previously suggested, wouldn't disabling it resolve the issue? Let me know if you have any other things you would like me to test. Regards, Chris Blake -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/12/2017 5:38 PM, Chris Blake wrote: > I will admit that I have little knowledge of apparmor, but from the > above if "ib_post_send_mad error" is caused by apparmor as previously > suggested, wouldn't disabling it resolve the issue? > > Let me know if you have any other things you would like me to test. Hi Chris, I sent you a patch in a separate email that will isolate which check is failing. Can you try that and send me the output? Sorry to ask you to do this, I've still not been able to reproduce. > Regards, > Chris Blake -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index f8f53bb..cb91245 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -1974,14 +1974,15 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv, unsigned long flags; int ret; + INIT_LIST_HEAD(&mad_recv_wc->rmpp_list); ret = ib_mad_enforce_security(mad_agent_priv, mad_recv_wc->wc->pkey_index); if (ret) { ib_free_recv_mad(mad_recv_wc); deref_mad_agent(mad_agent_priv); + return; } - INIT_LIST_HEAD(&mad_recv_wc->rmpp_list); list_add(&mad_recv_wc->recv_buf.list, &mad_recv_wc->rmpp_list); -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org