
[1/1] RDMA/core: avoid kernel NULL pointer error

Message ID 20190122071821.16174-1-yanjun.zhu@oracle.com (mailing list archive)
State Changes Requested
Delegated to: Jason Gunthorpe
Series [1/1] RDMA/core: avoid kernel NULL pointer error

Commit Message

Zhu Yanjun Jan. 22, 2019, 7:18 a.m. UTC
When the interface associated with an IB device is brought down and up over
and over again, the following call trace appears.
"
 Call Trace:
  [<ffffffffa039ff8d>] ib_mad_completion_handler+0x7d/0xa0 [ib_mad]
  [<ffffffff810a1a41>] process_one_work+0x151/0x4b0
  [<ffffffff810a1ec0>] worker_thread+0x120/0x480
  [<ffffffff810a709e>] kthread+0xce/0xf0
  [<ffffffff816e9962>] ret_from_fork+0x42/0x70

 RIP  [<ffffffffa039f926>] ib_mad_recv_done_handler+0x26/0x610 [ib_mad]
"
From vmcore, we can find the following:
"
crash7lates> struct ib_mad_list_head ffff881fb3713400
struct ib_mad_list_head {
  list = {
    next = 0xffff881fb3713800,
    prev = 0xffff881fe01395c0
  },
  mad_queue = 0x0
}
"

Before the call trace, a lot of ib_cancel_mad calls are issued on the sender.
So it is necessary to check mad_queue in struct ib_mad_list_head to avoid a
"kernel NULL pointer" error.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
---
 drivers/infiniband/core/mad.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Jason Gunthorpe Jan. 22, 2019, 4:15 p.m. UTC | #1
On Tue, Jan 22, 2019 at 02:18:21AM -0500, Zhu Yanjun wrote:
> When the interface related with IB device is set to down/up over and
> over again, the following call trace will pop out.
> "
>  Call Trace:
>   [<ffffffffa039ff8d>] ib_mad_completion_handler+0x7d/0xa0 [ib_mad]
>   [<ffffffff810a1a41>] process_one_work+0x151/0x4b0
>   [<ffffffff810a1ec0>] worker_thread+0x120/0x480
>   [<ffffffff810a709e>] kthread+0xce/0xf0
>   [<ffffffff816e9962>] ret_from_fork+0x42/0x70
> 
>  RIP  [<ffffffffa039f926>] ib_mad_recv_done_handler+0x26/0x610 [ib_mad]
> "
> From vmcore, we can find the following:
> "
> crash7lates> struct ib_mad_list_head ffff881fb3713400
> struct ib_mad_list_head {
>   list = {
>     next = 0xffff881fb3713800,
>     prev = 0xffff881fe01395c0
>   },
>   mad_queue = 0x0
> }
> "
> 
> Before the call trace, a lot of ib_cancel_mad is sent to the sender.
> So it is necessary to check mad_queue in struct ib_mad_list_head to avoid
> "kernel NULL pointer" error.
> 
> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>  drivers/infiniband/core/mad.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index 7870823bac47..ab5a7d1152ca 100644
> +++ b/drivers/infiniband/core/mad.c
> @@ -2250,6 +2250,17 @@ static void ib_mad_recv_done(struct ib_cq *cq, struct ib_wc *wc)
>  		return;
>  	}
>  
> +	if (unlikely(!mad_list->mad_queue)) {
> +		/*
> +		 * When the interface related with IB device is set to down/up,
> +		 * a lot of ib_cancel_mad packets are sent to the sender. In
> +		 * sender, the mad packets are cancelled.  The receiver will
> +		 * find mad_queue NULL. If the receiver does not test mad_queue,
> +		 * the receiver will crash with "kernel NULL pointer" error.
> +		 */

How does it become null here?

Jason
Zhu Yanjun Jan. 23, 2019, 3:34 a.m. UTC | #2
On 2019/1/23 0:15, Jason Gunthorpe wrote:
> On Tue, Jan 22, 2019 at 02:18:21AM -0500, Zhu Yanjun wrote:
>> When the interface related with IB device is set to down/up over and
>> over again, the following call trace will pop out.
>> "
>>   Call Trace:
>>    [<ffffffffa039ff8d>] ib_mad_completion_handler+0x7d/0xa0 [ib_mad]
>>    [<ffffffff810a1a41>] process_one_work+0x151/0x4b0
>>    [<ffffffff810a1ec0>] worker_thread+0x120/0x480
>>    [<ffffffff810a709e>] kthread+0xce/0xf0
>>    [<ffffffff816e9962>] ret_from_fork+0x42/0x70
>>
>>   RIP  [<ffffffffa039f926>] ib_mad_recv_done_handler+0x26/0x610 [ib_mad]
>> "
>>  From vmcore, we can find the following:
>> "
>> crash7lates> struct ib_mad_list_head ffff881fb3713400
>> struct ib_mad_list_head {
>>    list = {
>>      next = 0xffff881fb3713800,
>>      prev = 0xffff881fe01395c0
>>    },
>>    mad_queue = 0x0
>> }
>> "
>>
>> Before the call trace, a lot of ib_cancel_mad is sent to the sender.
>> So it is necessary to check mad_queue in struct ib_mad_list_head to avoid
>> "kernel NULL pointer" error.
>>
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>>   drivers/infiniband/core/mad.c | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>> index 7870823bac47..ab5a7d1152ca 100644
>> +++ b/drivers/infiniband/core/mad.c
>> @@ -2250,6 +2250,17 @@ static void ib_mad_recv_done(struct ib_cq *cq, struct ib_wc *wc)
>>   		return;
>>   	}
>>   
>> +	if (unlikely(!mad_list->mad_queue)) {
>> +		/*
>> +		 * When the interface related with IB device is set to down/up,
>> +		 * a lot of ib_cancel_mad packets are sent to the sender. In
>> +		 * sender, the mad packets are cancelled.  The receiver will
>> +		 * find mad_queue NULL. If the receiver does not test mad_queue,
>> +		 * the receiver will crash with "kernel NULL pointer" error.
>> +		 */
> How does it become null here?
When a lot of MADs are cancelled, the call path is ib_cancel_mad -> ib_modify_mad,
and in ib_modify_mad the outstanding send is marked with

"
mad_send_wr->status = IB_WC_WR_FLUSH_ERR
"

so the cancelled sends then complete with that status.
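
A minimal sketch of that sender-side path (simplified from my reading of
mad.c; treat it as an illustration, not the exact code in this tree):

void ib_cancel_mad(struct ib_mad_agent *mad_agent,
                   struct ib_mad_send_buf *send_buf)
{
        /* A zero timeout means "cancel this outstanding send". */
        ib_modify_mad(mad_agent, send_buf, 0);
}

int ib_modify_mad(struct ib_mad_agent *mad_agent,
                  struct ib_mad_send_buf *send_buf, u32 timeout_ms)
{
        struct ib_mad_send_wr_private *mad_send_wr;

        /* ... look up mad_send_wr for send_buf under the agent lock ... */

        if (!timeout_ms) {
                /* Cancelled: the send will complete as flushed. */
                mad_send_wr->status = IB_WC_WR_FLUSH_ERR;
        }

        /* ... adjust the timeout and drop the lock ... */
        return 0;
}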

On the receiver side, IB_WC_WR_FLUSH_ERR is then seen and passed down to the
IB device to handle.


So your question "how does mad_queue become NULL" should be answered in the IB device:

IB firmware or HW makes mad_queue become NULL.

In the driver, some error handling should cope with this to avoid a kernel crash.

If you need ibstat or lspci information, please let me know.

Zhu Yanjun

>
> Jason
Jason Gunthorpe Jan. 23, 2019, 4:58 a.m. UTC | #3
On Wed, Jan 23, 2019 at 11:34:00AM +0800, Yanjun Zhu wrote:
> 
> On 2019/1/23 0:15, Jason Gunthorpe wrote:
> > On Tue, Jan 22, 2019 at 02:18:21AM -0500, Zhu Yanjun wrote:
> > > When the interface related with IB device is set to down/up over and
> > > over again, the following call trace will pop out.
> > > "
> > >   Call Trace:
> > >    [<ffffffffa039ff8d>] ib_mad_completion_handler+0x7d/0xa0 [ib_mad]
> > >    [<ffffffff810a1a41>] process_one_work+0x151/0x4b0
> > >    [<ffffffff810a1ec0>] worker_thread+0x120/0x480
> > >    [<ffffffff810a709e>] kthread+0xce/0xf0
> > >    [<ffffffff816e9962>] ret_from_fork+0x42/0x70
> > > 
> > >   RIP  [<ffffffffa039f926>] ib_mad_recv_done_handler+0x26/0x610 [ib_mad]
> > > "
> > >  From vmcore, we can find the following:
> > > "
> > > crash7lates> struct ib_mad_list_head ffff881fb3713400
> > > struct ib_mad_list_head {
> > >    list = {
> > >      next = 0xffff881fb3713800,
> > >      prev = 0xffff881fe01395c0
> > >    },
> > >    mad_queue = 0x0
> > > }
> > > "
> > > 
> > > Before the call trace, a lot of ib_cancel_mad is sent to the sender.
> > > So it is necessary to check mad_queue in struct ib_mad_list_head to avoid
> > > "kernel NULL pointer" error.
> > > 
> > > Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> > >   drivers/infiniband/core/mad.c | 11 +++++++++++
> > >   1 file changed, 11 insertions(+)
> > > 
> > > diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> > > index 7870823bac47..ab5a7d1152ca 100644
> > > +++ b/drivers/infiniband/core/mad.c
> > > @@ -2250,6 +2250,17 @@ static void ib_mad_recv_done(struct ib_cq *cq, struct ib_wc *wc)
> > >   		return;
> > >   	}
> > > +	if (unlikely(!mad_list->mad_queue)) {
> > > +		/*
> > > +		 * When the interface related with IB device is set to down/up,
> > > +		 * a lot of ib_cancel_mad packets are sent to the sender. In
> > > +		 * sender, the mad packets are cancelled.  The receiver will
> > > +		 * find mad_queue NULL. If the receiver does not test mad_queue,
> > > +		 * the receiver will crash with "kernel NULL pointer" error.
> > > +		 */
> > How does it become null here?
> When a lot of ib_cancel_mad packets are sent, from the source code,
> ib_cancel_mad->ib_modify_mad, in ib_modify_mad,
> 
> "
> mad_send_wr->status = IB_WC_WR_FLUSH_ERR
> "
> Then these ib_cancel_mad packets are sent.
> 
> The receiver receives IB_WC_WR_FLUSH_ERR, it will send it to IB device to
> handle it.
> 
> 
> So your problem "how mad_queue becomes NULL" should occur in IB device.
> 
> IB firmware or HW makes mad_queue become NULL.

It certainly doesn't

Please find out why it is NULL and report back.

Jason
Zhu Yanjun Jan. 23, 2019, 8:55 a.m. UTC | #4
On 2019/1/23 12:58, Jason Gunthorpe wrote:
> On Wed, Jan 23, 2019 at 11:34:00AM +0800, Yanjun Zhu wrote:
>> On 2019/1/23 0:15, Jason Gunthorpe wrote:
>>> On Tue, Jan 22, 2019 at 02:18:21AM -0500, Zhu Yanjun wrote:
>>>> When the interface related with IB device is set to down/up over and
>>>> over again, the following call trace will pop out.
>>>> "
>>>>    Call Trace:
>>>>     [<ffffffffa039ff8d>] ib_mad_completion_handler+0x7d/0xa0 [ib_mad]
>>>>     [<ffffffff810a1a41>] process_one_work+0x151/0x4b0
>>>>     [<ffffffff810a1ec0>] worker_thread+0x120/0x480
>>>>     [<ffffffff810a709e>] kthread+0xce/0xf0
>>>>     [<ffffffff816e9962>] ret_from_fork+0x42/0x70
>>>>
>>>>    RIP  [<ffffffffa039f926>] ib_mad_recv_done_handler+0x26/0x610 [ib_mad]
>>>> "
>>>>   From vmcore, we can find the following:
>>>> "
>>>> crash7lates> struct ib_mad_list_head ffff881fb3713400
>>>> struct ib_mad_list_head {
>>>>     list = {
>>>>       next = 0xffff881fb3713800,
>>>>       prev = 0xffff881fe01395c0
>>>>     },
>>>>     mad_queue = 0x0
>>>> }
>>>> "
>>>>
>>>> Before the call trace, a lot of ib_cancel_mad is sent to the sender.
>>>> So it is necessary to check mad_queue in struct ib_mad_list_head to avoid
>>>> "kernel NULL pointer" error.
>>>>
>>>> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>>>>    drivers/infiniband/core/mad.c | 11 +++++++++++
>>>>    1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>>>> index 7870823bac47..ab5a7d1152ca 100644
>>>> +++ b/drivers/infiniband/core/mad.c
>>>> @@ -2250,6 +2250,17 @@ static void ib_mad_recv_done(struct ib_cq *cq, struct ib_wc *wc)
>>>>    		return;
>>>>    	}
>>>> +	if (unlikely(!mad_list->mad_queue)) {
>>>> +		/*
>>>> +		 * When the interface related with IB device is set to down/up,
>>>> +		 * a lot of ib_cancel_mad packets are sent to the sender. In
>>>> +		 * sender, the mad packets are cancelled.  The receiver will
>>>> +		 * find mad_queue NULL. If the receiver does not test mad_queue,
>>>> +		 * the receiver will crash with "kernel NULL pointer" error.
>>>> +		 */
>>> How does it become null here?
>> When a lot of ib_cancel_mad packets are sent, from the source code,
>> ib_cancel_mad->ib_modify_mad, in ib_modify_mad,
>>
>> "
>> mad_send_wr->status = IB_WC_WR_FLUSH_ERR
>> "
>> Then these ib_cancel_mad packets are sent.
>>
>> The receiver receives IB_WC_WR_FLUSH_ERR, it will send it to IB device to
>> handle it.
>>
>>
>> So your problem "how mad_queue becomes NULL" should occur in IB device.
>>
>> IB firmware or HW makes mad_queue become NULL.
> It certainly doesn't
From the source code:

int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr)
{
        struct ib_mad_qp_info *qp_info;
        struct list_head *list;
        struct ib_send_wr *bad_send_wr;
        struct ib_mad_agent *mad_agent;
        struct ib_sge *sge;
        unsigned long flags;
        int ret;

        /* Set WR ID to find mad_send_wr upon completion */
        qp_info = mad_send_wr->mad_agent_priv->qp_info;
        mad_send_wr->send_wr.wr_id = (unsigned long)&mad_send_wr->mad_list;
        mad_send_wr->mad_list.mad_queue = &qp_info->send_queue;
                        /* <-- mad_queue is set to the sender's send_queue */

        mad_agent = mad_send_wr->send_buf.mad_agent;
        sge = mad_send_wr->sg_list;
        sge[0].addr = ib_dma_map_single(mad_agent->device,
                                        mad_send_wr->send_buf.mad,
                                        sge[0].length,
                                        DMA_TO_DEVICE);

struct ib_mad_qp_info {
        struct ib_mad_port_private *port_priv;
        struct ib_qp *qp;
        struct ib_mad_queue send_queue;   /* <-- send_queue is embedded, not a pointer */
        struct ib_mad_queue recv_queue;
        struct list_head overflow_list;
        spinlock_t snoop_lock;
        struct ib_mad_snoop_private **snoop_table;
        int snoop_table_size;
        atomic_t snoop_count;
};
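
For completeness, the receive side sets mad_queue the same way when receive
buffers are posted. This is paraphrased from memory of mad.c (details elided),
so treat it as a sketch rather than the exact code:

static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
                                    struct ib_mad_private *mad_priv)
{
        struct ib_mad_queue *recv_queue = &qp_info->recv_queue;

        /* ... map the buffer and build the receive WR ... */

        /* mad_queue points at the embedded recv_queue of qp_info */
        mad_priv->header.mad_list.mad_queue = recv_queue;

        /* ... queue mad_list on recv_queue->list and post the WR ... */
        return 0;
}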

This qp_info is only destroyed in

static int ib_mad_port_close(struct ib_device *device, int port_num)

and ib_mad_port_close is only called when mad_client is removed.

From the source code:

static void __exit ib_mad_cleanup_module(void)
{
        ib_unregister_client(&mad_client);   /* <-- this removes mad_client */
        kmem_cache_destroy(ib_mad_cache);
}

module_init(ib_mad_init_module);
module_exit(ib_mad_cleanup_module);

That only happens when the mad module is removed from the kernel.

But the mad functions can be found in the vmcore, so the mad module was still
loaded and ib_mad_cleanup_module was not executed. Therefore
qp_info->send_queue should not be NULL, and the driver never sets mad_queue
to NULL.

This mad_queue is handed to the IB device, and IB_WC_WR_FLUSH_ERR is also
passed to the IB device.

So it is very possible that mad_queue is set to NULL in the IB device.

If I am missing something, please feel free to let me know.

Zhu Yanjun
>
> Please find out why it is NULL and report back.
>
> Jason
>
Jason Gunthorpe Jan. 23, 2019, 3:49 p.m. UTC | #5
On Wed, Jan 23, 2019 at 04:55:01PM +0800, Yanjun Zhu wrote:

> This mad_queue is sent to IB device. And IB_WC_WR_FLUSH_ERR is also sent to
> IB device.
> 
> So it is very possible that mad_queue is set to NULL in IB device.

No it isn't.

Jason
Zhu Yanjun Jan. 24, 2019, 9:02 a.m. UTC | #6
On 2019/1/23 23:49, Jason Gunthorpe wrote:
> On Wed, Jan 23, 2019 at 04:55:01PM +0800, Yanjun Zhu wrote:
>
>> This mad_queue is sent to IB device. And IB_WC_WR_FLUSH_ERR is also sent to
>> IB device.
>>
>> So it is very possible that mad_queue is set to NULL in IB device.
> No it isn't.

You are so sure about it. :-(

Can you explain in detail why the IB device does not make mad_queue NULL? ;-)

>
> Jason
Jason Gunthorpe Jan. 24, 2019, 6:18 p.m. UTC | #7
On Thu, Jan 24, 2019 at 05:02:41PM +0800, Yanjun Zhu wrote:
> 
> On 2019/1/23 23:49, Jason Gunthorpe wrote:
> > On Wed, Jan 23, 2019 at 04:55:01PM +0800, Yanjun Zhu wrote:
> > 
> > > This mad_queue is sent to IB device. And IB_WC_WR_FLUSH_ERR is also sent to
> > > IB device.
> > > 
> > > So it is very possible that mad_queue is set to NULL in IB device.
> > No it isn't.
> 
> You are so sure about it. :-(
> 
> Can you explain whyIB device does not make mad_queue NULL in details?;-)

Hardware does not randomly scribble over kernel memory.

Jason
Zhu Yanjun Feb. 14, 2019, 12:03 a.m. UTC | #8
On 2019/1/23 0:15, Jason Gunthorpe wrote:
> On Tue, Jan 22, 2019 at 02:18:21AM -0500, Zhu Yanjun wrote:
>> When the interface related with IB device is set to down/up over and
>> over again, the following call trace will pop out.
>> "
>>   Call Trace:
>>    [<ffffffffa039ff8d>] ib_mad_completion_handler+0x7d/0xa0 [ib_mad]
>>    [<ffffffff810a1a41>] process_one_work+0x151/0x4b0
>>    [<ffffffff810a1ec0>] worker_thread+0x120/0x480
>>    [<ffffffff810a709e>] kthread+0xce/0xf0
>>    [<ffffffff816e9962>] ret_from_fork+0x42/0x70
>>
>>   RIP  [<ffffffffa039f926>] ib_mad_recv_done_handler+0x26/0x610 [ib_mad]
>> "
>>  From vmcore, we can find the following:
>> "
>> crash7lates> struct ib_mad_list_head ffff881fb3713400
>> struct ib_mad_list_head {
>>    list = {
>>      next = 0xffff881fb3713800,
>>      prev = 0xffff881fe01395c0
>>    },
>>    mad_queue = 0x0
>> }
>> "
>>
>> Before the call trace, a lot of ib_cancel_mad is sent to the sender.
>> So it is necessary to check mad_queue in struct ib_mad_list_head to avoid
>> "kernel NULL pointer" error.
>>
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>>   drivers/infiniband/core/mad.c | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>> index 7870823bac47..ab5a7d1152ca 100644
>> +++ b/drivers/infiniband/core/mad.c
>> @@ -2250,6 +2250,17 @@ static void ib_mad_recv_done(struct ib_cq *cq, struct ib_wc *wc)
>>   		return;
>>   	}
>>   
>> +	if (unlikely(!mad_list->mad_queue)) {
>> +		/*
>> +		 * When the interface related with IB device is set to down/up,
>> +		 * a lot of ib_cancel_mad packets are sent to the sender. In
>> +		 * sender, the mad packets are cancelled.  The receiver will
>> +		 * find mad_queue NULL. If the receiver does not test mad_queue,
>> +		 * the receiver will crash with "kernel NULL pointer" error.
>> +		 */
> How does it become null here?

Hi, Jason

After upgrading the IB switch to version 2.2.9-3, this problem disappears.
It seems that the IB switch causes this problem.

Thanks,

Zhu Yanjun

>
> Jason
>
Haakon Bugge Feb. 26, 2019, 12:32 p.m. UTC | #9
> On 14 Feb 2019, at 01:03, Yanjun Zhu <yanjun.zhu@oracle.com> wrote:
> 
> 
> On 2019/1/23 0:15, Jason Gunthorpe wrote:
>> On Tue, Jan 22, 2019 at 02:18:21AM -0500, Zhu Yanjun wrote:
>>> When the interface related with IB device is set to down/up over and
>>> over again, the following call trace will pop out.
>>> "
>>>  Call Trace:
>>>   [<ffffffffa039ff8d>] ib_mad_completion_handler+0x7d/0xa0 [ib_mad]
>>>   [<ffffffff810a1a41>] process_one_work+0x151/0x4b0
>>>   [<ffffffff810a1ec0>] worker_thread+0x120/0x480
>>>   [<ffffffff810a709e>] kthread+0xce/0xf0
>>>   [<ffffffff816e9962>] ret_from_fork+0x42/0x70
>>> 
>>>  RIP  [<ffffffffa039f926>] ib_mad_recv_done_handler+0x26/0x610 [ib_mad]
>>> "
>>> From vmcore, we can find the following:
>>> "
>>> crash7lates> struct ib_mad_list_head ffff881fb3713400
>>> struct ib_mad_list_head {
>>>   list = {
>>>     next = 0xffff881fb3713800,
>>>     prev = 0xffff881fe01395c0
>>>   },
>>>   mad_queue = 0x0
>>> }
>>> "
>>> 
>>> Before the call trace, a lot of ib_cancel_mad is sent to the sender.
>>> So it is necessary to check mad_queue in struct ib_mad_list_head to avoid
>>> "kernel NULL pointer" error.
>>> 
>>> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>>>  drivers/infiniband/core/mad.c | 11 +++++++++++
>>>  1 file changed, 11 insertions(+)
>>> 
>>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>>> index 7870823bac47..ab5a7d1152ca 100644
>>> +++ b/drivers/infiniband/core/mad.c
>>> @@ -2250,6 +2250,17 @@ static void ib_mad_recv_done(struct ib_cq *cq, struct ib_wc *wc)
>>>  		return;
>>>  	}
>>>  +	if (unlikely(!mad_list->mad_queue)) {
>>> +		/*
>>> +		 * When the interface related with IB device is set to down/up,
>>> +		 * a lot of ib_cancel_mad packets are sent to the sender. In
>>> +		 * sender, the mad packets are cancelled.  The receiver will
>>> +		 * find mad_queue NULL. If the receiver does not test mad_queue,
>>> +		 * the receiver will crash with "kernel NULL pointer" error.
>>> +		 */
>> How does it become null here?
> 
> Hi, Jason
> 
> After upgrading IB switch to version 2.2.9-3, this problem disappears. It seems  that IB switch results in this problem.


Hi Yanjun,

I would like to rephrase: after changing the fw in the IB switch (where the SA and OpenSM run), we are no longer exposed to the bug.

This very much seems to be a bug. A kernel shall not crash, and that shall hold true independently of external conditions, such as fw versions in switches.

So, I have two requests for you:

1. Please see if the bug can be reproduced with an upstream kernel.

2. (Assuming yes to the above) your commit doesn't fix the problem; it only makes it less likely. If mad_list->mad_queue can become NULL asynchronously to ib_mad_recv_done() being called, it can become NULL just after you tested it to be non-NULL, right? (See the sketch below.)
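
To make that window concrete, a minimal sketch (illustration only, not code
from the tree):

        if (unlikely(!mad_list->mad_queue))     /* load #1 sees non-NULL...      */
                return;
                                                /* ...a concurrent writer could
                                                 * clear mad_queue right here... */
        qp_info = mad_list->mad_queue->qp_info; /* ...and load #2 still oopses   */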


Thxs, Håkon




> 
> Thanks,
> 
> Zhu Yanjun
> 
>> 
>> Jason
>>
Zhu Yanjun Feb. 27, 2019, 1:43 a.m. UTC | #10
On 2019/2/26 20:32, Håkon Bugge wrote:
>
>> On 14 Feb 2019, at 01:03, Yanjun Zhu <yanjun.zhu@oracle.com> wrote:
>>
>>
>> On 2019/1/23 0:15, Jason Gunthorpe wrote:
>>> On Tue, Jan 22, 2019 at 02:18:21AM -0500, Zhu Yanjun wrote:
>>>> When the interface related with IB device is set to down/up over and
>>>> over again, the following call trace will pop out.
>>>> "
>>>>   Call Trace:
>>>>    [<ffffffffa039ff8d>] ib_mad_completion_handler+0x7d/0xa0 [ib_mad]
>>>>    [<ffffffff810a1a41>] process_one_work+0x151/0x4b0
>>>>    [<ffffffff810a1ec0>] worker_thread+0x120/0x480
>>>>    [<ffffffff810a709e>] kthread+0xce/0xf0
>>>>    [<ffffffff816e9962>] ret_from_fork+0x42/0x70
>>>>
>>>>   RIP  [<ffffffffa039f926>] ib_mad_recv_done_handler+0x26/0x610 [ib_mad]
>>>> "
>>>>  From vmcore, we can find the following:
>>>> "
>>>> crash7lates> struct ib_mad_list_head ffff881fb3713400
>>>> struct ib_mad_list_head {
>>>>    list = {
>>>>      next = 0xffff881fb3713800,
>>>>      prev = 0xffff881fe01395c0
>>>>    },
>>>>    mad_queue = 0x0
>>>> }
>>>> "
>>>>
>>>> Before the call trace, a lot of ib_cancel_mad is sent to the sender.
>>>> So it is necessary to check mad_queue in struct ib_mad_list_head to avoid
>>>> "kernel NULL pointer" error.
>>>>
>>>> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>>>>   drivers/infiniband/core/mad.c | 11 +++++++++++
>>>>   1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>>>> index 7870823bac47..ab5a7d1152ca 100644
>>>> +++ b/drivers/infiniband/core/mad.c
>>>> @@ -2250,6 +2250,17 @@ static void ib_mad_recv_done(struct ib_cq *cq, struct ib_wc *wc)
>>>>   		return;
>>>>   	}
>>>>   +	if (unlikely(!mad_list->mad_queue)) {
>>>> +		/*
>>>> +		 * When the interface related with IB device is set to down/up,
>>>> +		 * a lot of ib_cancel_mad packets are sent to the sender. In
>>>> +		 * sender, the mad packets are cancelled.  The receiver will
>>>> +		 * find mad_queue NULL. If the receiver does not test mad_queue,
>>>> +		 * the receiver will crash with "kernel NULL pointer" error.
>>>> +		 */
>>> How does it become null here?
>> Hi, Jason
>>
>> After upgrading IB switch to version 2.2.9-3, this problem disappears. It seems  that IB switch results in this problem.
>
> Hi Yanjun,
>
> I would like to rephrase: After changing the fw in the IB switch (where SA and OpenSM runs), we are not exposed to the bug any more.
>
> This seems very much to be a bug. A kernel shall not crash - and that shall hold true independent of external conditions, such as fw versions in switches.
>
> So, I have two requests for you:
>
> 1. Please see if the bug can be reproed with an upstream kernel.

Hi, Haakon

Thanks a lot for reviewing this patch.

I have no lab environment in which to reproduce this bug. When the bug
occurred, I ran tests on the customer's host with this patch applied. The
patch fixed the bug and did not introduce new problems, so I consider it a
fix or a workaround for this bug.

I also analyzed the vmcore and found that the packet that caused this bug
came from an IB switch. I then checked the source code (kernel 2.6.x) of the
IB switch and found a difference in the MAD handling between the switch and
the hosts (kernel 4.x). This is why I suspect the IB switch.

The customer's hosts are production machines, so I am not sure the customer
can lend them to us for testing.

Anyway, this patch addresses the problem: the crash disappears and no new
problem appears after the patch is applied.

As for the root cause, I will investigate further to find it. :-)

Zhu Yanjun

>
> 2. (Assuming yes to the above), your commit doesn't fix the problem, it just diminishes it. If mad_list->mad_queue can become NULL asynchronously to ib_mad_recv_done() being called, it can become NULL just after you tested it to be non-NULL, right?
>
>
> Thxs, Håkon
>
>
>
>
>> Thanks,
>>
>> Zhu Yanjun
>>
>>> Jason
>>>

Patch

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 7870823bac47..ab5a7d1152ca 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2250,6 +2250,17 @@  static void ib_mad_recv_done(struct ib_cq *cq, struct ib_wc *wc)
 		return;
 	}
 
+	if (unlikely(!mad_list->mad_queue)) {
+		/*
+		 * When the interface related with IB device is set to down/up,
+		 * a lot of ib_cancel_mad packets are sent to the sender. In
+		 * sender, the mad packets are cancelled.  The receiver will
+		 * find mad_queue NULL. If the receiver does not test mad_queue,
+		 * the receiver will crash with "kernel NULL pointer" error.
+		 */
+		return;
+	}
+
 	qp_info = mad_list->mad_queue->qp_info;
 	dequeue_mad(mad_list);