diff mbox

4.13 ib_mthca NULL pointer dereference with OpenSM

Message ID VI1PR0502MB300860F7C004ABFB01079F53D15E0@VI1PR0502MB3008.eurprd05.prod.outlook.com (mailing list archive)
State Superseded
Headers show

Commit Message

Parav Pandit Oct. 31, 2017, 3:16 a.m. UTC
Hi Jason,

> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@ziepe.ca]
> Sent: Monday, October 30, 2017 6:02 PM
> To: Chris Blake <chrisrblake93@gmail.com>
> Cc: Leon Romanovsky <leon@kernel.org>; linux-rdma@vger.kernel.org; Parav
> Pandit <parav@mellanox.com>
> Subject: Re: 4.13 ib_mthca NULL pointer dereference with OpenSM
> 
> On Mon, Oct 30, 2017 at 01:39:42PM -0500, Chris Blake wrote:
> > On Mon, Oct 30, 2017 at 2:19 AM, Leon Romanovsky <leon@kernel.org>
> wrote:
> > >
> > > Can you please try to set CONFIG_SECURITY_INFINIBAND=n and see if it
> > > helps?
> > >
> > > Thanks
> > >
> >
> > Hello Leon,
> >
> > I went ahead and set CONFIG_SECURITY_INFINIBAND=n in my kernel, and so
> > far the issue seems resolved. I will run this for a week or so and
> > will get back to you, but things are looking promising. :)
> 
> I certainly don't expect this setting to break any drivers..
>
I looked the back trace - happening in freeing ib_free_recv_mad().
It doesn't look a driver issue certainly.

Post_send failure seems to indicate that security enforcement checks likely would have failed on QP0/1.
I tried ib_ipoib and rping with 4.13.10 and ConnectX4 but that didn't help with reproduction.
I tried injecting error locally on recv mad, based on suspect and I was able to crash a host and with below patch I was able to avoid it.

I am yet to review my below patch with Dan as he did most security dev, but I suspect this might be the cause where rmpp list is not initialized and mad processing is continued when security check fails.

Let see if Chris has same issue or different one.

Chris,
Can you try below patch and see if that avoids the crash?

More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jason Gunthorpe Oct. 31, 2017, 4:24 a.m. UTC | #1
On Tue, Oct 31, 2017 at 03:16:42AM +0000, Parav Pandit wrote:

> I am yet to review my below patch with Dan as he did most security
> dev, but I suspect this might be the cause where rmpp list is not
> initialized and mad processing is continued when security check
> fails.

This patch sure looks needed to me, ib_free_recv_mad touches
rmpp_list, so if it needs initializion then it certainly has to be
done earlier..

Adding the new return sure makes alot of sense as well..

Hal, Ira, would you check this routine too? kernel oops's are bad..

> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index f8f53bb..cb91245 100644
> +++ b/drivers/infiniband/core/mad.c
> @@ -1974,14 +1974,15 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
>         unsigned long flags;
>         int ret;
> 
> +       INIT_LIST_HEAD(&mad_recv_wc->rmpp_list);
>         ret = ib_mad_enforce_security(mad_agent_priv,
>                                       mad_recv_wc->wc->pkey_index);
>         if (ret) {
>                 ib_free_recv_mad(mad_recv_wc);
>                 deref_mad_agent(mad_agent_priv);
> +               return;
>         }
> 
> -       INIT_LIST_HEAD(&mad_recv_wc->rmpp_list);
>         list_add(&mad_recv_wc->recv_buf.list, &mad_recv_wc->rmpp_list);
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leon Romanovsky Oct. 31, 2017, 5:11 a.m. UTC | #2
On Mon, Oct 30, 2017 at 10:24:35PM -0600, Jason Gunthorpe wrote:
> On Tue, Oct 31, 2017 at 03:16:42AM +0000, Parav Pandit wrote:
>
> > I am yet to review my below patch with Dan as he did most security
> > dev, but I suspect this might be the cause where rmpp list is not
> > initialized and mad processing is continued when security check
> > fails.
>
> This patch sure looks needed to me, ib_free_recv_mad touches
> rmpp_list, so if it needs initializion then it certainly has to be
> done earlier..

Right, it aligns with my analysis too.

>
> Adding the new return sure makes alot of sense as well..
>
> Hal, Ira, would you check this routine too? kernel oops's are bad..
>
> > diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> > index f8f53bb..cb91245 100644
> > +++ b/drivers/infiniband/core/mad.c
> > @@ -1974,14 +1974,15 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
> >         unsigned long flags;
> >         int ret;
> >
> > +       INIT_LIST_HEAD(&mad_recv_wc->rmpp_list);
> >         ret = ib_mad_enforce_security(mad_agent_priv,
> >                                       mad_recv_wc->wc->pkey_index);
> >         if (ret) {
> >                 ib_free_recv_mad(mad_recv_wc);
> >                 deref_mad_agent(mad_agent_priv);
> > +               return;
> >         }
> >
> > -       INIT_LIST_HEAD(&mad_recv_wc->rmpp_list);
> >         list_add(&mad_recv_wc->recv_buf.list, &mad_recv_wc->rmpp_list);
Hal Rosenstock Oct. 31, 2017, 12:49 p.m. UTC | #3
On 10/31/2017 12:24 AM, Jason Gunthorpe wrote:
> On Tue, Oct 31, 2017 at 03:16:42AM +0000, Parav Pandit wrote:
> 
>> I am yet to review my below patch with Dan as he did most security
>> dev, but I suspect this might be the cause where rmpp list is not
>> initialized and mad processing is continued when security check
>> fails.
> 
> This patch sure looks needed to me, ib_free_recv_mad touches
> rmpp_list, so if it needs initializion then it certainly has to be
> done earlier..
Agreed.

> Adding the new return sure makes alot of sense as well..
> 
> Hal, Ira, would you check this routine too? kernel oops's are bad..

Patch looks needed for just the point that Parav made above (that if
security check fails, then ib_free_recv_mad will cause the
mad_recv_wc->rmpp_list to be accessed so it needs to be initialized
before security is enforced).

I don't have mthca to try this. Maybe Chris can try this patch (with
CONFIG_SECURITY_INFINIBAND=y).

-- Hal

>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>> index f8f53bb..cb91245 100644
>> +++ b/drivers/infiniband/core/mad.c
>> @@ -1974,14 +1974,15 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
>>         unsigned long flags;
>>         int ret;
>>
>> +       INIT_LIST_HEAD(&mad_recv_wc->rmpp_list);
>>         ret = ib_mad_enforce_security(mad_agent_priv,
>>                                       mad_recv_wc->wc->pkey_index);
>>         if (ret) {
>>                 ib_free_recv_mad(mad_recv_wc);
>>                 deref_mad_agent(mad_agent_priv);
>> +               return;
>>         }
>>
>> -       INIT_LIST_HEAD(&mad_recv_wc->rmpp_list);
>>         list_add(&mad_recv_wc->recv_buf.list, &mad_recv_wc->rmpp_list);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Jurgens Oct. 31, 2017, 3:01 p.m. UTC | #4
On 10/31/2017 7:49 AM, Hal Rosenstock wrote:
> On 10/31/2017 12:24 AM, Jason Gunthorpe wrote:
>> On Tue, Oct 31, 2017 at 03:16:42AM +0000, Parav Pandit wrote:
>>
>>> I am yet to review my below patch with Dan as he did most security
>>> dev, but I suspect this might be the cause where rmpp list is not
>>> initialized and mad processing is continued when security check
>>> fails.
>> This patch sure looks needed to me, ib_free_recv_mad touches
>> rmpp_list, so if it needs initializion then it certainly has to be
>> done earlier..
> Agreed.
>
>> Adding the new return sure makes alot of sense as well..
>>
>> Hal, Ira, would you check this routine too? kernel oops's are bad..
> Patch looks needed for just the point that Parav made above (that if
> security check fails, then ib_free_recv_mad will cause the
> mad_recv_wc->rmpp_list to be accessed so it needs to be initialized
> before security is enforced).

Agree the patch is needed regardless.

>
> I don't have mthca to try this. Maybe Chris can try this patch (with
> CONFIG_SECURITY_INFINIBAND=y).

Chris, are you running with SELinux enabled? If this addresses your issue it means permission is denied, so once the crash is resolved additional policy will be required in order for it to work as expected.

> -- Hal
>
>>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>>> index f8f53bb..cb91245 100644
>>> +++ b/drivers/infiniband/core/mad.c
>>> @@ -1974,14 +1974,15 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
>>>         unsigned long flags;
>>>         int ret;
>>>
>>> +       INIT_LIST_HEAD(&mad_recv_wc->rmpp_list);
>>>         ret = ib_mad_enforce_security(mad_agent_priv,
>>>                                       mad_recv_wc->wc->pkey_index);
>>>         if (ret) {
>>>                 ib_free_recv_mad(mad_recv_wc);
>>>                 deref_mad_agent(mad_agent_priv);
>>> +               return;
>>>         }
>>>
>>> -       INIT_LIST_HEAD(&mad_recv_wc->rmpp_list);
>>>         list_add(&mad_recv_wc->recv_buf.list, &mad_recv_wc->rmpp_list);
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fvger.kernel.org%2Fmajordomo-info.html&data=02%7C01%7Cdanielj%40mellanox.com%7C0c1d5cc98d224d5f21ca08d5205dceb0%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636450509634819485&sdata=RANgNBE48saDXft%2BfCIIS3qWZT8PK5imlXnoCKnWdkU%3D&reserved=0
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe Oct. 31, 2017, 3:09 p.m. UTC | #5
On Tue, Oct 31, 2017 at 10:01:49AM -0500, Daniel Jurgens wrote:

> >> Adding the new return sure makes alot of sense as well..
> >>
> >> Hal, Ira, would you check this routine too? kernel oops's are bad..
> > Patch looks needed for just the point that Parav made above (that if
> > security check fails, then ib_free_recv_mad will cause the
> > mad_recv_wc->rmpp_list to be accessed so it needs to be initialized
> > before security is enforced).
> 
> Agree the patch is needed regardless.

Someone please send it..

> > I don't have mthca to try this. Maybe Chris can try this patch (with
> > CONFIG_SECURITY_INFINIBAND=y).
> 
> Chris, are you running with SELinux enabled? If this addresses your issue it means permission is denied, so once the crash is resolved additional policy will be required in order for it to work as expected.

If Chris has selinux turned on in his distro would you expect this
test to just fail? Doesn't that mean we have missed installing security labels
for things like opensm?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Jurgens Oct. 31, 2017, 3:12 p.m. UTC | #6
On 10/31/2017 10:09 AM, Jason Gunthorpe wrote:
> On Tue, Oct 31, 2017 at 10:01:49AM -0500, Daniel Jurgens wrote:
>
>>>> Adding the new return sure makes alot of sense as well..
>>>>
>>>> Hal, Ira, would you check this routine too? kernel oops's are bad..
>>> Patch looks needed for just the point that Parav made above (that if
>>> security check fails, then ib_free_recv_mad will cause the
>>> mad_recv_wc->rmpp_list to be accessed so it needs to be initialized
>>> before security is enforced).
>> Agree the patch is needed regardless.
> Someone please send it..
>
>>> I don't have mthca to try this. Maybe Chris can try this patch (with
>>> CONFIG_SECURITY_INFINIBAND=y).
>> Chris, are you running with SELinux enabled? If this addresses your issue it means permission is denied, so once the crash is resolved additional policy will be required in order for it to work as expected.
> If Chris has selinux turned on in his distro would you expect this
> test to just fail? Doesn't that mean we have missed installing security labels
> for things like opensm?
I suspect his policy doesn't have have the necessary allow rules for opensm to receive MADs.
> Jason


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leon Romanovsky Oct. 31, 2017, 3:15 p.m. UTC | #7
On Tue, Oct 31, 2017 at 09:09:01AM -0600, Jason Gunthorpe wrote:
> On Tue, Oct 31, 2017 at 10:01:49AM -0500, Daniel Jurgens wrote:
>
> > >> Adding the new return sure makes alot of sense as well..
> > >>
> > >> Hal, Ira, would you check this routine too? kernel oops's are bad..
> > > Patch looks needed for just the point that Parav made above (that if
> > > security check fails, then ib_free_recv_mad will cause the
> > > mad_recv_wc->rmpp_list to be accessed so it needs to be initialized
> > > before security is enforced).
> >
> > Agree the patch is needed regardless.
>
> Someone please send it..

Parav/Daniel,

Please send it directly to the mailing list.

>
> > > I don't have mthca to try this. Maybe Chris can try this patch (with
> > > CONFIG_SECURITY_INFINIBAND=y).
> >
> > Chris, are you running with SELinux enabled? If this addresses your issue it means permission is denied, so once the crash is resolved additional policy will be required in order for it to work as expected.
>
> If Chris has selinux turned on in his distro would you expect this
> test to just fail? Doesn't that mean we have missed installing security labels
> for things like opensm?

Chris has SELinux enabled, see his gist: https://gist.github.com/riptidewave93/b3b83c13e93ab3be4254c855885f5b3a

Thanks

>
> Jason
Chris Oct. 31, 2017, 3:20 p.m. UTC | #8
On Tue, Oct 31, 2017 at 10:15 AM, Leon Romanovsky <leon@kernel.org> wrote:
> On Tue, Oct 31, 2017 at 09:09:01AM -0600, Jason Gunthorpe wrote:
>> On Tue, Oct 31, 2017 at 10:01:49AM -0500, Daniel Jurgens wrote:
>>
>> > >> Adding the new return sure makes alot of sense as well..
>> > >>
>> > >> Hal, Ira, would you check this routine too? kernel oops's are bad..
>> > > Patch looks needed for just the point that Parav made above (that if
>> > > security check fails, then ib_free_recv_mad will cause the
>> > > mad_recv_wc->rmpp_list to be accessed so it needs to be initialized
>> > > before security is enforced).
>> >
>> > Agree the patch is needed regardless.
>>
>> Someone please send it..
>
> Parav/Daniel,
>
> Please send it directly to the mailing list.
>
>>
>> > > I don't have mthca to try this. Maybe Chris can try this patch (with
>> > > CONFIG_SECURITY_INFINIBAND=y).
>> >
>> > Chris, are you running with SELinux enabled? If this addresses your issue it means permission is denied, so once the crash is resolved additional policy will be required in order for it to work as expected.
>>
>> If Chris has selinux turned on in his distro would you expect this
>> test to just fail? Doesn't that mean we have missed installing security labels
>> for things like opensm?
>
> Chris has SELinux enabled, see his gist: https://gist.github.com/riptidewave93/b3b83c13e93ab3be4254c855885f5b3a
>
> Thanks
>
>>
>> Jason

I applied this patch and re-enabled CONFIG_SECURITY_INFINIBAND, and
have a kernel build in progress. I will run this for a few days and
will report back my findings.

As for SELinux, it's not being used on my systems at this time, but I
am using apparmor for some lxc containers.

Regards,
Chris Blake
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Jurgens Oct. 31, 2017, 3:20 p.m. UTC | #9
On 10/31/2017 10:15 AM, Leon Romanovsky wrote:
> On Tue, Oct 31, 2017 at 09:09:01AM -0600, Jason Gunthorpe wrote:
>> On Tue, Oct 31, 2017 at 10:01:49AM -0500, Daniel Jurgens wrote:
>>
>>>>> Adding the new return sure makes alot of sense as well..
>>>>>
>>>>> Hal, Ira, would you check this routine too? kernel oops's are bad..
>>>> Patch looks needed for just the point that Parav made above (that if
>>>> security check fails, then ib_free_recv_mad will cause the
>>>> mad_recv_wc->rmpp_list to be accessed so it needs to be initialized
>>>> before security is enforced).
>>> Agree the patch is needed regardless.
>> Someone please send it..
> Parav/Daniel,
>
> Please send it directly to the mailing list.
>
>>>> I don't have mthca to try this. Maybe Chris can try this patch (with
>>>> CONFIG_SECURITY_INFINIBAND=y).
>>> Chris, are you running with SELinux enabled? If this addresses your issue it means permission is denied, so once the crash is resolved additional policy will be required in order for it to work as expected.
>> If Chris has selinux turned on in his distro would you expect this
>> test to just fail? Doesn't that mean we have missed installing security labels
>> for things like opensm?
> Chris has SELinux enabled, see his gist: https://gist.github.com/riptidewave93/b3b83c13e93ab3be4254c855885f5b3a

That doesn't indicate if he has SELinux enabled or not, just that CONFIG_SECURITY_INFINIBAND is enabled.  Also, even if SELinux enabled in the kernel config it must be turned on via /etc/selinux/config, and also set into enforcing mode, if it were to cause this problem.  There's no enough info there to determine any of that.

> Thanks
>
>> Jason


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Oct. 31, 2017, 5:49 p.m. UTC | #10
On Tue, Oct 31, 2017 at 10:20 AM, Daniel Jurgens <danielj@mellanox.com> wrote:
> On 10/31/2017 10:15 AM, Leon Romanovsky wrote:
>> On Tue, Oct 31, 2017 at 09:09:01AM -0600, Jason Gunthorpe wrote:
>>> On Tue, Oct 31, 2017 at 10:01:49AM -0500, Daniel Jurgens wrote:
>>>
>>>>>> Adding the new return sure makes alot of sense as well..
>>>>>>
>>>>>> Hal, Ira, would you check this routine too? kernel oops's are bad..
>>>>> Patch looks needed for just the point that Parav made above (that if
>>>>> security check fails, then ib_free_recv_mad will cause the
>>>>> mad_recv_wc->rmpp_list to be accessed so it needs to be initialized
>>>>> before security is enforced).
>>>> Agree the patch is needed regardless.
>>> Someone please send it..
>> Parav/Daniel,
>>
>> Please send it directly to the mailing list.
>>
>>>>> I don't have mthca to try this. Maybe Chris can try this patch (with
>>>>> CONFIG_SECURITY_INFINIBAND=y).
>>>> Chris, are you running with SELinux enabled? If this addresses your issue it means permission is denied, so once the crash is resolved additional policy will be required in order for it to work as expected.
>>> If Chris has selinux turned on in his distro would you expect this
>>> test to just fail? Doesn't that mean we have missed installing security labels
>>> for things like opensm?
>> Chris has SELinux enabled, see his gist: https://gist.github.com/riptidewave93/b3b83c13e93ab3be4254c855885f5b3a
>
> That doesn't indicate if he has SELinux enabled or not, just that CONFIG_SECURITY_INFINIBAND is enabled.  Also, even if SELinux enabled in the kernel config it must be turned on via /etc/selinux/config, and also set into enforcing mode, if it were to cause this problem.  There's no enough info there to determine any of that.
>
>> Thanks
>>
>>> Jason
>
>

Hello All,

I have installed the kernel with the mentioned patch, as well as
CONFIG_SECURITY_INFINIBAND enabled. Sadly I am back to the issue where
my compute node is reporting:

kernel: infiniband mthca0: ib_post_send_mad error

As soon as I roll back to a kernel with CONFIG_SECURITY_INFINIBAND
disabled, the issue goes away and things work as expected.

Regards,
Chris Blake
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Parav Pandit Oct. 31, 2017, 5:58 p.m. UTC | #11
SGkgQ2hyaXMsDQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogQ2hyaXMg
Qmxha2UgW21haWx0bzpjaHJpc3JibGFrZTkzQGdtYWlsLmNvbV0NCj4gU2VudDogVHVlc2RheSwg
T2N0b2JlciAzMSwgMjAxNyAxMjo1MCBQTQ0KPiBUbzogRGFuaWVsIEp1cmdlbnMgPGRhbmllbGpA
bWVsbGFub3guY29tPg0KPiBDYzogTGVvbiBSb21hbm92c2t5IDxsZW9uQGtlcm5lbC5vcmc+OyBK
YXNvbiBHdW50aG9ycGUgPGpnZ0B6aWVwZS5jYT47DQo+IEhhbCBSb3NlbnN0b2NrIDxoYWxAZGV2
Lm1lbGxhbm94LmNvLmlsPjsgUGFyYXYgUGFuZGl0DQo+IDxwYXJhdkBtZWxsYW5veC5jb20+OyBs
aW51eC1yZG1hQHZnZXIua2VybmVsLm9yZzsgSGFsIFJvc2Vuc3RvY2sNCj4gPGhhbEBtZWxsYW5v
eC5jb20+OyBJcmEgV2VpbnkgPGlyYS53ZWlueUBpbnRlbC5jb20+DQo+IFN1YmplY3Q6IFJlOiA0
LjEzIGliX210aGNhIE5VTEwgcG9pbnRlciBkZXJlZmVyZW5jZSB3aXRoIE9wZW5TTQ0KPiANCj4g
T24gVHVlLCBPY3QgMzEsIDIwMTcgYXQgMTA6MjAgQU0sIERhbmllbCBKdXJnZW5zIDxkYW5pZWxq
QG1lbGxhbm94LmNvbT4NCj4gd3JvdGU6DQo+ID4gT24gMTAvMzEvMjAxNyAxMDoxNSBBTSwgTGVv
biBSb21hbm92c2t5IHdyb3RlOg0KPiA+PiBPbiBUdWUsIE9jdCAzMSwgMjAxNyBhdCAwOTowOTow
MUFNIC0wNjAwLCBKYXNvbiBHdW50aG9ycGUgd3JvdGU6DQo+ID4+PiBPbiBUdWUsIE9jdCAzMSwg
MjAxNyBhdCAxMDowMTo0OUFNIC0wNTAwLCBEYW5pZWwgSnVyZ2VucyB3cm90ZToNCj4gPj4+DQo+
ID4+Pj4+PiBBZGRpbmcgdGhlIG5ldyByZXR1cm4gc3VyZSBtYWtlcyBhbG90IG9mIHNlbnNlIGFz
IHdlbGwuLg0KPiA+Pj4+Pj4NCj4gPj4+Pj4+IEhhbCwgSXJhLCB3b3VsZCB5b3UgY2hlY2sgdGhp
cyByb3V0aW5lIHRvbz8ga2VybmVsIG9vcHMncyBhcmUgYmFkLi4NCj4gPj4+Pj4gUGF0Y2ggbG9v
a3MgbmVlZGVkIGZvciBqdXN0IHRoZSBwb2ludCB0aGF0IFBhcmF2IG1hZGUgYWJvdmUgKHRoYXQN
Cj4gPj4+Pj4gaWYgc2VjdXJpdHkgY2hlY2sgZmFpbHMsIHRoZW4gaWJfZnJlZV9yZWN2X21hZCB3
aWxsIGNhdXNlIHRoZQ0KPiA+Pj4+PiBtYWRfcmVjdl93Yy0+cm1wcF9saXN0IHRvIGJlIGFjY2Vz
c2VkIHNvIGl0IG5lZWRzIHRvIGJlDQo+ID4+Pj4+IGluaXRpYWxpemVkIGJlZm9yZSBzZWN1cml0
eSBpcyBlbmZvcmNlZCkuDQo+ID4+Pj4gQWdyZWUgdGhlIHBhdGNoIGlzIG5lZWRlZCByZWdhcmRs
ZXNzLg0KPiA+Pj4gU29tZW9uZSBwbGVhc2Ugc2VuZCBpdC4uDQo+ID4+IFBhcmF2L0RhbmllbCwN
Cj4gPj4NCj4gPj4gUGxlYXNlIHNlbmQgaXQgZGlyZWN0bHkgdG8gdGhlIG1haWxpbmcgbGlzdC4N
Cj4gPj4NCj4gPj4+Pj4gSSBkb24ndCBoYXZlIG10aGNhIHRvIHRyeSB0aGlzLiBNYXliZSBDaHJp
cyBjYW4gdHJ5IHRoaXMgcGF0Y2gNCj4gPj4+Pj4gKHdpdGggQ09ORklHX1NFQ1VSSVRZX0lORklO
SUJBTkQ9eSkuDQo+ID4+Pj4gQ2hyaXMsIGFyZSB5b3UgcnVubmluZyB3aXRoIFNFTGludXggZW5h
YmxlZD8gSWYgdGhpcyBhZGRyZXNzZXMgeW91ciBpc3N1ZSBpdA0KPiBtZWFucyBwZXJtaXNzaW9u
IGlzIGRlbmllZCwgc28gb25jZSB0aGUgY3Jhc2ggaXMgcmVzb2x2ZWQgYWRkaXRpb25hbCBwb2xp
Y3kgd2lsbA0KPiBiZSByZXF1aXJlZCBpbiBvcmRlciBmb3IgaXQgdG8gd29yayBhcyBleHBlY3Rl
ZC4NCj4gPj4+IElmIENocmlzIGhhcyBzZWxpbnV4IHR1cm5lZCBvbiBpbiBoaXMgZGlzdHJvIHdv
dWxkIHlvdSBleHBlY3QgdGhpcw0KPiA+Pj4gdGVzdCB0byBqdXN0IGZhaWw/IERvZXNuJ3QgdGhh
dCBtZWFuIHdlIGhhdmUgbWlzc2VkIGluc3RhbGxpbmcNCj4gPj4+IHNlY3VyaXR5IGxhYmVscyBm
b3IgdGhpbmdzIGxpa2Ugb3BlbnNtPw0KPiA+PiBDaHJpcyBoYXMgU0VMaW51eCBlbmFibGVkLCBz
ZWUgaGlzIGdpc3Q6DQo+ID4+IGh0dHBzOi8vZW1lYTAxLnNhZmVsaW5rcy5wcm90ZWN0aW9uLm91
dGxvb2suY29tLz91cmw9aHR0cHMlM0ElMkYlMkZnaQ0KPiA+Pg0KPiBzdC5naXRodWIuY29tJTJG
cmlwdGlkZXdhdmU5MyUyRmIzYjgzYzEzZTkzYWIzYmU0MjU0Yzg1NTg4NWY1YjNhJmRhdA0KPiBh
DQo+ID4+DQo+ID0wMiU3QzAxJTdDcGFyYXYlNDBtZWxsYW5veC5jb20lN0NmOTY4ZGYwNzJmNzE0
M2NhMWFhNzA4ZDUyMDg3YzZiDQo+IDglN0MNCj4gPj4NCj4gYTY1Mjk3MWM3ZDJlNGQ5YmE2YTRk
MTQ5MjU2ZjQ2MWIlN0MwJTdDMCU3QzYzNjQ1MDY4OTg5MTQxNjIzNyZzZA0KPiBhdGE9Vw0KPiA+
PiA3NFFuNGxPczJNQmdWM3VIUFNuZ212dExDeHAlMkY4a0RreVoxY0lJaEdRMCUzRCZyZXNlcnZl
ZD0wDQo+ID4NCj4gPiBUaGF0IGRvZXNuJ3QgaW5kaWNhdGUgaWYgaGUgaGFzIFNFTGludXggZW5h
YmxlZCBvciBub3QsIGp1c3QgdGhhdA0KPiBDT05GSUdfU0VDVVJJVFlfSU5GSU5JQkFORCBpcyBl
bmFibGVkLiAgQWxzbywgZXZlbiBpZiBTRUxpbnV4IGVuYWJsZWQgaW4gdGhlDQo+IGtlcm5lbCBj
b25maWcgaXQgbXVzdCBiZSB0dXJuZWQgb24gdmlhIC9ldGMvc2VsaW51eC9jb25maWcsIGFuZCBh
bHNvIHNldCBpbnRvDQo+IGVuZm9yY2luZyBtb2RlLCBpZiBpdCB3ZXJlIHRvIGNhdXNlIHRoaXMg
cHJvYmxlbS4gIFRoZXJlJ3Mgbm8gZW5vdWdoIGluZm8gdGhlcmUNCj4gdG8gZGV0ZXJtaW5lIGFu
eSBvZiB0aGF0Lg0KPiA+DQo+ID4+IFRoYW5rcw0KPiA+Pg0KPiA+Pj4gSmFzb24NCj4gPg0KPiA+
DQo+IA0KPiBIZWxsbyBBbGwsDQo+IA0KPiBJIGhhdmUgaW5zdGFsbGVkIHRoZSBrZXJuZWwgd2l0
aCB0aGUgbWVudGlvbmVkIHBhdGNoLCBhcyB3ZWxsIGFzDQo+IENPTkZJR19TRUNVUklUWV9JTkZJ
TklCQU5EIGVuYWJsZWQuIFNhZGx5IEkgYW0gYmFjayB0byB0aGUgaXNzdWUgd2hlcmUgbXkNCj4g
Y29tcHV0ZSBub2RlIGlzIHJlcG9ydGluZzoNCj4gDQo+IGtlcm5lbDogaW5maW5pYmFuZCBtdGhj
YTA6IGliX3Bvc3Rfc2VuZF9tYWQgZXJyb3INClRoZXJlIHdlcmUgdHdvIGlzc3VlcyBpbiB5b3Vy
IHJlcG9ydC4NCjEuIFBvc3Qgc2VuZCBmYWlsdXJlLg0KMi4gS2VybmVsIGNyYXNoIG9uIHJlY2Vp
dmluZyBtYWQuDQpJIHByb3ZpZGVkIGZpeCBmb3IgMm5kIGlzc3VlLg0KRG8geW91IHN0aWxsIDJu
ZCBpc3N1ZSAoY3Jhc2gpIGFmdGVyIGFwcGx5aW5nIHRoZSBmaXg/DQoNCkZvciAxc3QgcHJvYmxl
bSwgRGFuIGhhcyBmZXcgcXVlc3Rpb25zL3N1Z2dlc3Rpb25zIHJlZ2FyZGluZyBjb25maWd1cmF0
aW9uIG9mIHNlbGludXggcG9saWN5Lg0KQ2FuIHlvdSBwbGVhc2UgZ28gdGhyb3VnaCBpdD8NCg0K
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Jurgens Oct. 31, 2017, 6:22 p.m. UTC | #12
On 10/31/2017 12:49 PM, Chris Blake wrote:
> Hello All,
>
> I have installed the kernel with the mentioned patch, as well as
> CONFIG_SECURITY_INFINIBAND enabled. Sadly I am back to the issue where
> my compute node is reporting:
>
> kernel: infiniband mthca0: ib_post_send_mad error
>
> As soon as I roll back to a kernel with CONFIG_SECURITY_INFINIBAND
> disabled, the issue goes away and things work as expected.
>
> Regards,
> Chris Blake

Sounds like the crash is resolved and now you're getting a denial from a security module.  I looked in the code, it looks like AppArmor doesn't register any callbacks for the ib_* security hooks, and if no hook is registered it should return 0.  Can you tell me more about your setup so I can create a reproducer? What OS are you using? Can you double check that SELinux isn't enabled (see output of sestatus).

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Oct. 31, 2017, 6:29 p.m. UTC | #13
On Tue, Oct 31, 2017 at 1:22 PM, Daniel Jurgens <danielj@mellanox.com> wrote:
> On 10/31/2017 12:49 PM, Chris Blake wrote:
>> Hello All,
>>
>> I have installed the kernel with the mentioned patch, as well as
>> CONFIG_SECURITY_INFINIBAND enabled. Sadly I am back to the issue where
>> my compute node is reporting:
>>
>> kernel: infiniband mthca0: ib_post_send_mad error
>>
>> As soon as I roll back to a kernel with CONFIG_SECURITY_INFINIBAND
>> disabled, the issue goes away and things work as expected.
>>
>> Regards,
>> Chris Blake
>
> Sounds like the crash is resolved and now you're getting a denial from a security module.  I looked in the code, it looks like AppArmor doesn't register any callbacks for the ib_* security hooks, and if no hook is registered it should return 0.  Can you tell me more about your setup so I can create a reproducer? What OS are you using? Can you double check that SELinux isn't enabled (see output of sestatus).
>

Hello,

I am not using SELinux on my system. I do have apparmor, but it is
only configured for lxc.

# apparmor_status
apparmor module is loaded.
5 profiles are loaded.
5 profiles are in enforce mode.
   /usr/bin/lxc-start
   lxc-container-default
   lxc-container-default-cgns
   lxc-container-default-with-mounting
   lxc-container-default-with-nesting
0 profiles are in complain mode.
0 processes have profiles defined.
0 processes are in enforce mode.
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.

# sestatus
SELinux status:                 disabled

As for the crash issue I was seeing for #2, so far I have not been
able to replicate it with the patch. :)

Regarding my OS, my "NAS" box is running Debian 9.2 with it's default
distro kernel (currently 4.9.51-1), and my "Compute" nodes are on
Proxmox, which is based on Debian 9.1 and it's kernel is based on
ubuntu-artful and is version 4.13.4. Source is at
https://git.proxmox.com/?p=pve-kernel.git;a=summary. This is the
kernel I have tested the patch on, and have been running with
CONFIG_SECURITY_INFINIBAND disabled to resolve the issue.

Regards,
Chris Blake
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Oct. 31, 2017, 6:36 p.m. UTC | #14
On Tue, Oct 31, 2017 at 1:29 PM, Chris Blake <chrisrblake93@gmail.com> wrote:
> On Tue, Oct 31, 2017 at 1:22 PM, Daniel Jurgens <danielj@mellanox.com> wrote:
>> On 10/31/2017 12:49 PM, Chris Blake wrote:
>>> Hello All,
>>>
>>> I have installed the kernel with the mentioned patch, as well as
>>> CONFIG_SECURITY_INFINIBAND enabled. Sadly I am back to the issue where
>>> my compute node is reporting:
>>>
>>> kernel: infiniband mthca0: ib_post_send_mad error
>>>
>>> As soon as I roll back to a kernel with CONFIG_SECURITY_INFINIBAND
>>> disabled, the issue goes away and things work as expected.
>>>
>>> Regards,
>>> Chris Blake
>>
>> Sounds like the crash is resolved and now you're getting a denial from a security module.  I looked in the code, it looks like AppArmor doesn't register any callbacks for the ib_* security hooks, and if no hook is registered it should return 0.  Can you tell me more about your setup so I can create a reproducer? What OS are you using? Can you double check that SELinux isn't enabled (see output of sestatus).
>>
>
> Hello,
>
> I am not using SELinux on my system. I do have apparmor, but it is
> only configured for lxc.
>
> # apparmor_status
> apparmor module is loaded.
> 5 profiles are loaded.
> 5 profiles are in enforce mode.
>    /usr/bin/lxc-start
>    lxc-container-default
>    lxc-container-default-cgns
>    lxc-container-default-with-mounting
>    lxc-container-default-with-nesting
> 0 profiles are in complain mode.
> 0 processes have profiles defined.
> 0 processes are in enforce mode.
> 0 processes are in complain mode.
> 0 processes are unconfined but have a profile defined.
>
> # sestatus
> SELinux status:                 disabled
>
> As for the crash issue I was seeing for #2, so far I have not been
> able to replicate it with the patch. :)
>
> Regarding my OS, my "NAS" box is running Debian 9.2 with it's default
> distro kernel (currently 4.9.51-1), and my "Compute" nodes are on
> Proxmox, which is based on Debian 9.1 and it's kernel is based on
> ubuntu-artful and is version 4.13.4. Source is at
> https://git.proxmox.com/?p=pve-kernel.git;a=summary. This is the
> kernel I have tested the patch on, and have been running with
> CONFIG_SECURITY_INFINIBAND disabled to resolve the issue.
>
> Regards,
> Chris Blake

Hello,

Forgot to mention, on my compute nodes I was testing with mainline
kernels, such as 4.14.0-rc5, per my original email. If there are any
specific kernels you prefer I test with on my NAS or compute nodes,
please let me know.

Regards,
Chris Blake
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe Oct. 31, 2017, 7:23 p.m. UTC | #15
On Tue, Oct 31, 2017 at 12:22 PM, Daniel Jurgens <danielj@mellanox.com> wrote:

> Sounds like the crash is resolved and now you're getting a denial
> from a security module.  I looked in the code, it looks like

If Chris was hitting that crash it means his security module failed the
mad action, as that is the only way to trigger it.

> AppArmor doesn't register any callbacks for the ib_* security hooks,
> and if no hook is registered it should return 0.  Can you tell me
> more about your setup so I can create a reproducer? What OS are you
> using? Can you double check that SELinux isn't enabled (see output
> of sestatus).

Which suggests to me that apparmour is not working properly with the
rdma selinux patches??

Jason

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Jurgens Oct. 31, 2017, 7:44 p.m. UTC | #16
On 10/31/2017 2:23 PM, Jason Gunthorpe wrote:
> On Tue, Oct 31, 2017 at 12:22 PM, Daniel Jurgens <danielj@mellanox.com> wrote:
>
>> Sounds like the crash is resolved and now you're getting a denial
>> from a security module.  I looked in the code, it looks like
> If Chris was hitting that crash it means his security module failed the
> mad action, as that is the only way to trigger it.
>
>> AppArmor doesn't register any callbacks for the ib_* security hooks,
>> and if no hook is registered it should return 0.  Can you tell me
>> more about your setup so I can create a reproducer? What OS are you
>> using? Can you double check that SELinux isn't enabled (see output
>> of sestatus).
> Which suggests to me that apparmour is not working properly with the
> rdma selinux patches??

It seems that way.  I can't explain it looking at the code, I'm trying to reproduce it for debug.

 A point of clarification, the RDMA "SELinux" patches aren't SELinux specific.  They interact with the LSM, a layer of abstraction between the security modules and the rest of the kernel.   Zero or more security modules like SELinux or AppArmor can implement the security hooks.  The default return value for the hook in question is 0 if no modules implement it.  It doesn't look like AppArmor implements it.

> Jason
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Nov. 12, 2017, 11:38 p.m. UTC | #17
On Tue, Oct 31, 2017 at 2:44 PM, Daniel Jurgens <danielj@mellanox.com> wrote:
> On 10/31/2017 2:23 PM, Jason Gunthorpe wrote:
>> On Tue, Oct 31, 2017 at 12:22 PM, Daniel Jurgens <danielj@mellanox.com> wrote:
>>
>>> Sounds like the crash is resolved and now you're getting a denial
>>> from a security module.  I looked in the code, it looks like
>> If Chris was hitting that crash it means his security module failed the
>> mad action, as that is the only way to trigger it.
>>
>>> AppArmor doesn't register any callbacks for the ib_* security hooks,
>>> and if no hook is registered it should return 0.  Can you tell me
>>> more about your setup so I can create a reproducer? What OS are you
>>> using? Can you double check that SELinux isn't enabled (see output
>>> of sestatus).
>> Which suggests to me that apparmour is not working properly with the
>> rdma selinux patches??
>
> It seems that way.  I can't explain it looking at the code, I'm trying to reproduce it for debug.
>
>  A point of clarification, the RDMA "SELinux" patches aren't SELinux specific.  They interact with the LSM, a layer of abstraction between the security modules and the rest of the kernel.   Zero or more security modules like SELinux or AppArmor can implement the security hooks.  The default return value for the hook in question is 0 if no modules implement it.  It doesn't look like AppArmor implements it.
>
>> Jason
>>
>

Hello All,

As a follow up to this issue, I went ahead and installed v4.14-rc8
mainline from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14-rc8/
to do some testing. I disabled apparmor via the cmdline (I added
apparmor=0 security="") and this seems to have worked:

root@C6100-1-N4:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.14.0-041400rc8-generic
root=/dev/mapper/pve-root ro apparmor=0 security= quiet
root@C6100-1-N4:~# apparmor_status
apparmor module is loaded.
apparmor filesystem is not mounted.

However, I am still having the issue with up_post_send_mad error:

root@C6100-1-N4:~# uptime
 17:36:30 up 35 min,  1 user,  load average: 0.13, 0.06, 0.01
root@C6100-1-N4:~# dmesg | grep "infiniband mthca0: ib_post_send_mad
error" | wc -l
848

I will admit that I have little knowledge of apparmor, but from the
above if "ib_post_send_mad error" is caused by apparmor as previously
suggested, wouldn't disabling it resolve the issue?

Let me know if you have any other things you would like me to test.

Regards,
Chris Blake
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Jurgens Nov. 13, 2017, 5:47 p.m. UTC | #18
On 11/12/2017 5:38 PM, Chris Blake wrote:
> I will admit that I have little knowledge of apparmor, but from the
> above if "ib_post_send_mad error" is caused by apparmor as previously
> suggested, wouldn't disabling it resolve the issue?
>
> Let me know if you have any other things you would like me to test.

Hi Chris, I sent you a patch in a separate email that will isolate which check is failing. Can you try that and send me the output? Sorry to ask you to do this, I've still not been able to reproduce.

> Regards,
> Chris Blake


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index f8f53bb..cb91245 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -1974,14 +1974,15 @@  static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
        unsigned long flags;
        int ret;

+       INIT_LIST_HEAD(&mad_recv_wc->rmpp_list);
        ret = ib_mad_enforce_security(mad_agent_priv,
                                      mad_recv_wc->wc->pkey_index);
        if (ret) {
                ib_free_recv_mad(mad_recv_wc);
                deref_mad_agent(mad_agent_priv);
+               return;
        }

-       INIT_LIST_HEAD(&mad_recv_wc->rmpp_list);
        list_add(&mad_recv_wc->recv_buf.list, &mad_recv_wc->rmpp_list);
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org