
[rdma-next,v7,0/8] RDMA resource tracking

Message ID 20180130033436.GA17053@ziepe.ca (mailing list archive)
State Not Applicable

Commit Message

Jason Gunthorpe Jan. 30, 2018, 3:34 a.m. UTC
On Mon, Jan 29, 2018 at 03:11:53PM -0500, Doug Ledford wrote:
> On Sun, 2018-01-28 at 14:05 -0700, Jason Gunthorpe wrote:
> > On Sun, Jan 28, 2018 at 11:17:17AM +0200, Leon Romanovsky wrote:
> > 
> > > The original goal of this series was to allow viewing connection (QP)
> > > information about running processes; however, I used this opportunity to
> > > create common infrastructure to track and report various resources. The
> > > reporting part is implemented in netlink (nldev), but smart ULPs can now
> > > build advanced usage models based on device utilization.
> > > 
> > > The current implementation relies on one lock per-object per-device, so
> > > creation/destruction of various objects (CQ, PD, etc.) on the same or
> > > different devices doesn't interfere with one another.
> > > 
> > > Data protection is performed with SRCU, whose reader-writer model
> > > ensures that a resource won't be destroyed until readers finish their
> > > work.
> > 
> > Well, this cover letter isn't quite right anymore.. but no matter.
> > 
> > My small comments aside it looks OK to me.
> 
> Likewise.  I'm happy with it at this point.

Okay, I fixed up the small things and applied the patches to for-next

Leon: Please validate I didn't screw it up. Here is the diff against
what you sent:

- Success path on the main execution flow, not under an if
- constify static structure
- Remove confusing comment about locking, ib_enum_all_devs
  obtains locks to iterate its list and rdma_restrack_count holds
  res->rwsem so everything is accounted for directly without
  trickiness
- Speeling
- Remove extra lock in rdma_restrack_del
- Restore pd = NULL in ib_create_xrc_qp. This scared me a bit, xrc is
  wonky. But ib_create_xrc_qp is only called in cases where
  rdma_restrack_add is not called, so keeping things as-they-are should
  not impact restrack. If restrack needs the pd for an XRC someday it
  should get it from qp->real_qp (a sketch follows this list)
- Remove SET/NEW/DEL cargo cult, please send a patch for rest?
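
For the XRC point above, a minimal sketch of what "get it from qp->real_qp"
could look like, purely for illustration -- the helper below is made up and is
not part of this series:

/* Hypothetical helper, not in the tree: if restrack ever needs a PD for
 * an XRC target QP, follow the shared-QP indirection set up by
 * ib_open_qp() instead of storing a PD on the XRC QP itself.
 */
static struct ib_pd *restrack_qp_pd(struct ib_qp *qp)
{
	/* XRC TGT QPs carry no PD of their own; the underlying real QP does */
	if (qp->qp_type == IB_QPT_XRC_TGT && qp->real_qp)
		return qp->real_qp->pd;

	return qp->pd;
}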

Branch is here:

https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=wip/jgg-for-next

Still unhappy with the kref-as-not-a-kref.


Comments

Leon Romanovsky Jan. 30, 2018, 9:16 a.m. UTC | #1
On Mon, Jan 29, 2018 at 08:34:36PM -0700, Jason Gunthorpe wrote:
> On Mon, Jan 29, 2018 at 03:11:53PM -0500, Doug Ledford wrote:
> > On Sun, 2018-01-28 at 14:05 -0700, Jason Gunthorpe wrote:
> > > On Sun, Jan 28, 2018 at 11:17:17AM +0200, Leon Romanovsky wrote:
> > >
> > > > The original goal of this series was to allow viewing connection (QP)
> > > > information about running processes; however, I used this opportunity to
> > > > create common infrastructure to track and report various resources. The
> > > > reporting part is implemented in netlink (nldev), but smart ULPs can now
> > > > build advanced usage models based on device utilization.
> > > >
> > > > The current implementation relies on one lock per-object per-device, so
> > > > creation/destruction of various objects (CQ, PD, etc.) on the same or
> > > > different devices doesn't interfere with one another.
> > > >
> > > > Data protection is performed with SRCU, whose reader-writer model
> > > > ensures that a resource won't be destroyed until readers finish their
> > > > work.
> > >
> > > Well, this cover letter isn't quite right anymore.. but no matter.
> > >
> > > My small comments aside it looks OK to me.
> >
> > Likewise.  I'm happy with it at this point.
>
> Okay, I fixed up the small things and applied the patches to for-next
>
> Leon: Please validate I didn't screw it up. Here is the diff against
> what you sent:
>
> - Success path on the main execution flow, not under an if
> - constify static structure
> - Remove confusing comment about locking, ib_enum_all_devs
>   obtains locks to iterate its list and rdma_restrack_count holds
>   res->rwsem so everything is accounted for directly without
>   trickiness
> - Speeling
> - Remove extra lock in rdma_restrack_del
> - Restore pd = NULL in ib_create_xrc_qp. This scared me a bit, xrc is
>   wonky. But ib_create_xrc_qp is only called in cases where
>   rdma_restrack_add is not called, so keeping things as-they-are should
>   not impact restrack. If restrack needs the pd for an XRC someday it
>   should get it from qp->real_qp
> - Remove SET/NEW/DEL cargo cult, please send a patch for rest?
>
> Branch is here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=wip/jgg-for-next
>
> Still unhappy with the kref-as-not-a-kref.

Thanks Doug and Jason for accepting it.

The "qp->pd = NULL" assignment is not needed. PD is NULL
for the XRCD and you are setting "qp->pd = pd" before returning
from _ib_create_qp() call, so it is actually the same.
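
For reference, a rough sketch of the flow being described (paraphrased, not
the verbatim kernel code; the real hunks are in the patch at the bottom of
this page):

/* Paraphrased flow: for an XRC TGT QP the caller's pd is already NULL,
 * so _ib_create_qp() stores NULL in qp->pd and the extra qp->pd = NULL
 * in ib_create_xrc_qp() changes nothing either way.
 */
struct ib_qp *ib_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attr)
{
	struct ib_device *dev = pd ? pd->device : attr->xrcd->device;
	struct ib_qp *qp;

	qp = _ib_create_qp(dev, pd, attr, NULL);	/* sets qp->pd = pd */
	if (IS_ERR(qp))
		return qp;

	if (attr->qp_type == IB_QPT_XRC_TGT)
		qp = ib_create_xrc_qp(qp, attr);	/* re-clears qp->pd */

	return qp;
}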

Everything works as expected, it passed my tests with ud/rc/xsrq
pingpongs and shutdowns during traffic.

Steve, I created the stable tag for you: rdma-next-2018-01-30

Thanks
Steve Wise Jan. 30, 2018, 3:21 p.m. UTC | #2
> > Branch is here:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=wip/jgg-for-next
> >
> > Still unhappy with the kref-as-not-a-kref.
> 
> Thanks Doug and Jason for accepting it.
> 
> The "qp->pd = NULL" assignment is not needed. PD is NULL
> for the XRCD and you are setting "qp->pd = pd" before returning
> from _ib_create_qp() call, so it is actually the same.
> 
> Everything works as expected, it passed my tests with ud/rc/xsrq
> pingpongs and shutdowns during traffic.
> 
> Steve, I created the stable tag for you: rdma-next-2018-01-30
> 
> Thanks

Thanks guys,  Q: 4.15-rc2 kills my machine, I've been rebasing to a later rc
to continue my development work, but the rebase fails horribly on mlx5.  I
skip these failures and don't compile mlx5 because I don't need it.  

Anyway, will rdma-next be rebased on 4.15.0 soon?  Or does that wait until
-rc2...



Jason Gunthorpe Jan. 30, 2018, 3:56 p.m. UTC | #3
On Tue, Jan 30, 2018 at 09:21:30AM -0600, Steve Wise wrote:
> > > Branch is here:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=wip/jgg-for-next
> > >
> > > Still unhappy with the kref-as-not-a-kref.
> > 
> > Thanks Doug and Jason for accepting it.
> > 
> > The "qp->pd = NULL" assignment is not needed. PD is NULL
> > for the XRCD and you are setting "qp->pd = pd" before returning
> > from _ib_create_qp() call, so it is actually the same.
> > 
> > Everything works as expected, it passed my tests with ud/rc/xsrq
> > pingpongs and shutdowns during traffic.
> > 
> > Steve, I created the stable tag for you: rdma-next-2018-01-30
> > 
> > Thanks
> 
> Thanks guys,  Q: 4.15-rc2 kills my machine, I've been rebasing to a later rc
> to continue my development work, but the rebase fails horribly on mlx5.  I
> skip these failures and don't compile mlx5 because I don't need it.  
> 
> Anyway, will rdma-next be rebased on 4.15.0 soon?  Or does that wait until
> -rc2...

It is annoying for all concerned but Linus has asked us not to have
needless merges. Hopefully we can build enough trust to relax on that
down the road as other trees do.

I have made a merge to v4.15 here:

https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=wip/for-linus-merged

Jason
Steve Wise Jan. 30, 2018, 4:16 p.m. UTC | #4
> > Thanks guys,  Q: 4.15-rc2 kills my machine, I've been rebasing to a later rc
> > to continue my development work, but the rebase fails horribly on mlx5.  I
> > skip these failures and don't compile mlx5 because I don't need it.
> >
> > Anyway, will rdma-next be rebased on 4.15.0 soon?  Or does that wait until
> > -rc2...
> 
> It is annoying for all concerned but Linus has asked us not to have
> needless merges. Hopefully we can build enough trust to relax on that
> down the road as other trees do.
> 
> I have made a merge to v4.15 here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=wip/for-linus-merged

What is this a merge of exactly? I don't see the restrack stuff, for
instance.

Thanks,

Steve.

Jason Gunthorpe Jan. 30, 2018, 4:33 p.m. UTC | #5
On Tue, Jan 30, 2018 at 10:16:01AM -0600, Steve Wise wrote:

> What is this a merge of exactly? I don't see the restrack stuff, for
> instance.

Yesterday's for-next. You could merge it with the latest for-next..

I updated it.

I think we are done now, so for-next is what will be sent as the
pull-request and for-next-merged is the conflict resolution.

Jason
Bart Van Assche Jan. 30, 2018, 7:07 p.m. UTC | #6
On Tue, 2018-01-30 at 09:33 -0700, Jason Gunthorpe wrote:
> On Tue, Jan 30, 2018 at 10:16:01AM -0600, Steve Wise wrote:
> 
> > What is this a merge of exactly? I don't see the restrack stuff, for
> > instance.
> 
> Yesterday's for-next. You could merge it with the latest for-next..
> 
> I updated it.
> 
> I think we are done now, so for-next is what will be sent as the
> pull-request and for-next-merged is the conflict resolution.

Hello Jason,

Although I have not yet tried to root-cause this, I want to let you know
that with your for-linus-merged branch the following error message is
reported if I try to run the srp-test software against the rdma_rxe driver:

id_ext=0x505400fffe4a0b7b,ioc_guid=0x505400fffe4a0b7b,dest=192.168.122.76:5555,target_can_queue=1,queue_size=32,max_cmd_per_lun=32,max_sect=131072 >/sys/class/infiniband_srp/srp-rxe0-1/add_target failed: Cannot allocate memory

In the kernel log I found the following:

Jan 30 10:55:50 ubuntu-vm kernel: scsi host4: ib_srp: FR pool allocation failed (-12)

With your for-next branch from a few days ago the same test ran fine.

Thanks,

Bart.
Jason Gunthorpe Jan. 30, 2018, 7:46 p.m. UTC | #7
On Tue, Jan 30, 2018 at 07:07:35PM +0000, Bart Van Assche wrote:
> On Tue, 2018-01-30 at 09:33 -0700, Jason Gunthorpe wrote:
> > On Tue, Jan 30, 2018 at 10:16:01AM -0600, Steve Wise wrote:
> > 
> > > What is this a merge of exactly? I don't see the restrack stuff, for
> > > instance.
> > 
> > Yesterday's for-next. You could merge it with the latest for-next..
> > 
> > I updated it.
> > 
> > I think we are done now, so for-next is what will be sent as the
> > pull-request and for-next-merged is the conflict resolution.
> 
> Hello Jason,
> 
> Although I have not yet tried to root-cause this, I want to let you know
> that with your for-linus-merged branch the following error message is
> reported if I try to run the srp-test software against the rdma_rxe driver:
> 
> id_ext=0x505400fffe4a0b7b,ioc_guid=0x505400fffe4a0b7b,dest=192.168.122.76:5555,target_can_queue=1,queue_size=32,max_cmd_per_lun=32,max_sect=131072 >/sys/class/infiniband_srp/srp-rxe0-1/add_target failed: Cannot allocate memory
> 
> In the kernel log I found the following:
> 
> Jan 30 10:55:50 ubuntu-vm kernel: scsi host4: ib_srp: FR pool allocation failed (-12)
> 
> With your for-next branch from a few days ago the same test ran fine.

I don't have a guess for you..

The difference between for-next and merged is only the inclusion of
v4.15? Could some v4.15 non-rdma code be causing issue here?

Jason
Bart Van Assche Jan. 30, 2018, 8:42 p.m. UTC | #8
On Tue, 2018-01-30 at 12:46 -0700, Jason Gunthorpe wrote:
> On Tue, Jan 30, 2018 at 07:07:35PM +0000, Bart Van Assche wrote:
> > On Tue, 2018-01-30 at 09:33 -0700, Jason Gunthorpe wrote:
> > > On Tue, Jan 30, 2018 at 10:16:01AM -0600, Steve Wise wrote:
> > > 
> > > > What is this a merge of exactly? I don't see the restrack stuff, for
> > > > instance.
> > > 
> > > Yesterday's for-next. You could merge it with the latest for-next..
> > > 
> > > I updated it.
> > > 
> > > I think we are done now, so for-next is what will be sent as the
> > > pull-request and for-next-merged is the conflict resolution.
> > 
> > Hello Jason,
> > 
> > Although I have not yet tried to root-cause this, I want to let you know
> > that with your for-linus-merged branch the following error message is
> > reported if I try to run the srp-test software against the rdma_rxe driver:
> > 
> > id_ext=0x505400fffe4a0b7b,ioc_guid=0x505400fffe4a0b7b,dest=192.168.122.76:5555,target_can_queue=1,queue_size=32,max_cmd_per_lun=32,max_sect=131072 >/sys/class/infiniband_srp/srp-rxe0-1/add_target failed: Cannot allocate memory
> > 
> > In the kernel log I found the following:
> > 
> > Jan 30 10:55:50 ubuntu-vm kernel: scsi host4: ib_srp: FR pool allocation failed (-12)
> > 
> > With your for-next branch from a few days ago the same test ran fine.
> 
> I don't have a guess for you..
> 
> The difference between for-next and merged is only the inclusion of
> v4.15? Could some v4.15 non-rdma code be causing issue here?

Hello Jason,

I should have mentioned that in the previous tests I ran I merged kernel
v4.15-rc9 myself into the RDMA for-next branch. So this behavior was probably
introduced by a patch that was queued recently on the RDMA for-next branch,
e.g. RDMA resource tracking.

Bart.
Jason Gunthorpe Jan. 30, 2018, 8:48 p.m. UTC | #9
On Tue, Jan 30, 2018 at 08:42:44PM +0000, Bart Van Assche wrote:
> On Tue, 2018-01-30 at 12:46 -0700, Jason Gunthorpe wrote:
> > On Tue, Jan 30, 2018 at 07:07:35PM +0000, Bart Van Assche wrote:
> > > On Tue, 2018-01-30 at 09:33 -0700, Jason Gunthorpe wrote:
> > > > On Tue, Jan 30, 2018 at 10:16:01AM -0600, Steve Wise wrote:
> > > > 
> > > > > What is this a merge of exactly? I don't see the restrack stuff, for
> > > > > instance.
> > > > 
> > > > Yesterday's for-next. You could merge it with the latest for-next..
> > > > 
> > > > I updated it.
> > > > 
> > > > I think we are done now, so for-next is what will be sent as the
> > > > pull-request and for-next-merged is the conflict resolution.
> > > 
> > > Hello Jason,
> > > 
> > > Although I have not yet tried to root-cause this, I want to let you know
> > > that with your for-linus-merged branch the following error message is
> > > reported if I try to run the srp-test software against the rdma_rxe driver:
> > > 
> > > id_ext=0x505400fffe4a0b7b,ioc_guid=0x505400fffe4a0b7b,dest=192.168.122.76:5555,target_can_queue=1,queue_size=32,max_cmd_per_lun=32,max_sect=131072 >/sys/class/infiniband_srp/srp-rxe0-1/add_target failed: Cannot allocate memory
> > > 
> > > In the kernel log I found the following:
> > > 
> > > Jan 30 10:55:50 ubuntu-vm kernel: scsi host4: ib_srp: FR pool allocation failed (-12)
> > > 
> > > With your for-next branch from a few days ago the same test ran fine.
> > 
> > I don't have a guess for you..
> > 
> > The difference between for-next and merged is only the inclusion of
> > v4.15? Could some v4.15 non-rdma code be causing issue here?
> 
> Hello Jason,
> 
> I should have mentioned that in the previous tests I ran I merged kernel
> v4.15-rc9 myself into the RDMA for-next branch. So this behavior was probably
> introduced by a patch that was queued recently on the RDMA for-next branch,
> e.g. RDMA resource tracking.

Ok, I think that is the only likely thing recently..

But your print above must be caused by this line, right:

static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
                                              struct ib_pd *pd, int pool_size,
                                              int max_page_list_len)
{
        ret = -ENOMEM;
        pool = kzalloc(sizeof(struct srp_fr_pool) +
                       pool_size * sizeof(struct srp_fr_desc), GFP_KERNEL);
        if (!pool)
                goto err;

Since you didn't report the ib_alloc_mr() print it can't be the other
ENOMEM case?
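
For context, the other -ENOMEM path is the per-descriptor loop further down in
srp_create_fr_pool(); roughly like the following (paraphrased from memory, not
a verbatim quote):

	/* Each descriptor gets its own MR; a failure here is what the
	 * separate "ib_alloc_mr() failed" warning reports, which is why
	 * its absence in the log pointed at the kzalloc() above first.
	 */
	for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
		mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, max_page_list_len);
		if (IS_ERR(mr)) {
			ret = PTR_ERR(mr);
			goto destroy_pool;
		}
		d->mr = mr;
		list_add_tail(&d->entry, &pool->free_list);
	}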

Hard to see how that intersects with resource tracking.. Are you
thinking memory corruption?

Jason
Bart Van Assche Jan. 30, 2018, 9:22 p.m. UTC | #10
On Tue, 2018-01-30 at 13:48 -0700, Jason Gunthorpe wrote:
> Ok, I think that is the only likely thing recently..
> 
> But your print above must be caused by this line, right:
> 
> static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
>                                               struct ib_pd *pd, int pool_size,
>                                               int max_page_list_len)
> {
>         ret = -ENOMEM;
>         pool = kzalloc(sizeof(struct srp_fr_pool) +
>                        pool_size * sizeof(struct srp_fr_desc), GFP_KERNEL);
>         if (!pool)
>                 goto err;
> 
> Since you didn't report the ib_alloc_mr() print it can't be the other
> ENOMEM case?
> 
> Hard to see how that intersects with resource tracking.. Are you
> thinking memory corruption?

Hello Jason,

I don't see any reason to suspect memory corruption. kmemleak isn't reporting
any memory leaks. Maybe memory fragmentation has increased?

Thanks,

Bart.
Laurence Oberman Jan. 30, 2018, 9:33 p.m. UTC | #11
On Tue, 2018-01-30 at 21:22 +0000, Bart Van Assche wrote:
> On Tue, 2018-01-30 at 13:48 -0700, Jason Gunthorpe wrote:
> > Ok, I think that is the only likely thing recently..
> > 
> > But your print above must be caused by this line, right:
> > 
> > static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
> >                                               struct ib_pd *pd, int pool_size,
> >                                               int max_page_list_len)
> > {
> >         ret = -ENOMEM;
> >         pool = kzalloc(sizeof(struct srp_fr_pool) +
> >                        pool_size * sizeof(struct srp_fr_desc), GFP_KERNEL);
> >         if (!pool)
> >                 goto err;
> > 
> > Since you didn't report the ib_alloc_mr() print it can't be the other
> > ENOMEM case?
> > 
> > Hard to see how that intersects with resource tracking.. Are you
> > thinking memory corruption?
> 
> Hello Jason,
> 
> I don't see any reason to suspect memory corruption. kmemleak isn't
> reporting any memory leaks. Maybe memory fragmentation has increased?
> 
> Thanks,
> 
> Bart.

Hi Bart, 

Can I take your tree and see if this fails for me too,
Your last tree was fine, so did not have this latest stuff.
Can I just pull to what I have

Thanks
Laurence
Bart Van Assche Jan. 30, 2018, 9:40 p.m. UTC | #12
On Tue, 2018-01-30 at 16:33 -0500, Laurence Oberman wrote:
> Can I take your tree and see if this fails for me too,
> Your last tree was fine, so did not have this latest stuff.
> Can I just pull to what I have

Hello Laurence,

So far I have seen this behavior only inside a VM but not yet on a system
with more memory than the VM. This issue may be specific to the memory size
of the VM. I think we should try to isolate further what caused this before
trying to reproduce it on more setups.

Thanks,

Bart.
Jason Gunthorpe Jan. 30, 2018, 9:40 p.m. UTC | #13
On Tue, Jan 30, 2018 at 04:33:19PM -0500, Laurence Oberman wrote:
> > > Since you didn't report the ib_alloc_mr() print it can't be the
> > > other
> > > ENOMEM case?
> > > 
> > > Hard to see how that interesects with resource tracking.. Are you
> > > thinking memory corruption?
> > 
> > Hello Jason,
> > 
> > I don't see any reason to suspect memory corruption. kmemleak isn't
> > reporting
> > any memory leaks. Maybe memory fragmentation has increased?
> > 
> > Thanks,
> > 
> > Bart
> 
> Hi Bart,
>
> Can I take your tree and see if this fails for me too,
> Your last tree was fine, so did not have this latest stuff.
> Can I just pull to what I have

Try what we are about to send as a PR:

git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
branch wip/for-linus-merged
https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=wip/for-linus-merged

No guess why a kzalloc will fail. How much is it asking for anyhow?
Did the size get out of control for some reason?

Jason
Jason Gunthorpe Jan. 30, 2018, 9:42 p.m. UTC | #14
On Tue, Jan 30, 2018 at 09:40:14PM +0000, Bart Van Assche wrote:
> On Tue, 2018-01-30 at 16:33 -0500, Laurence Oberman wrote:
> > Can I take your tree and see if this fails for me too,
> > Your last tree was fine, so did not have this latest stuff.
> > Can I just pull to what I have
> 
> Hello Laurence,
> 
> So far I have seen this behavior only inside a VM but not yet on a system
> with more memory than the VM. This issue may be specific to the memory size
> of the VM. I think we should try to isolate further what caused this before
> trying to reproduce it on more setups.

Did you get an oops print related to a kalloc failure?

Or am I wrong and the ENOMEM is coming from someplace else?

Jason
Bart Van Assche Jan. 30, 2018, 9:47 p.m. UTC | #15
On Tue, 2018-01-30 at 14:42 -0700, Jason Gunthorpe wrote:
> On Tue, Jan 30, 2018 at 09:40:14PM +0000, Bart Van Assche wrote:
> > On Tue, 2018-01-30 at 16:33 -0500, Laurence Oberman wrote:
> > > Can I take your tree and see if this fails for me too,
> > > Your last tree was fine, so did not have this latest stuff.
> > > Can I just pull to what I have
> > 
> > Hello Laurence,
> > 
> > So far I have seen this behavior only inside a VM but not yet on a system
> > with more memory than the VM. This issue may be specific to the memory size
> > of the VM. I think we should try to isolate further what caused this before
> > trying to reproduce it on more setups.
> 
> Did you get an oops print related to a kalloc failure?
> 
> Or am I wrong and the ENOMEM is coming from someplace else?

Hello Jason,

I just noticed the following in the system log:

Jan 30 12:53:15 ubuntu-vm kernel: ib_srp: rxe0: ib_alloc_mr() failed. Try to reduce max_cmd_per_lun, max_sect or ch_count

So apparently the ib_alloc_mr() fails sometimes (but not the first few times
it is called).

Thanks,

Bart.
Jason Gunthorpe Jan. 30, 2018, 10:02 p.m. UTC | #16
On Tue, Jan 30, 2018 at 09:47:48PM +0000, Bart Van Assche wrote:
> On Tue, 2018-01-30 at 14:42 -0700, Jason Gunthorpe wrote:
> > On Tue, Jan 30, 2018 at 09:40:14PM +0000, Bart Van Assche wrote:
> > > On Tue, 2018-01-30 at 16:33 -0500, Laurence Oberman wrote:
> > > > Can I take your tree and see if this fails for me too,
> > > > Your last tree was fine, so did not have this latest stuff.
> > > > Can I just pull to what I have
> > > 
> > > Hello Laurence,
> > > 
> > > So far I have seen this behavior only inside a VM but not yet on a system
> > > with more memory than the VM. This issue may be specific to the memory size
> > > of the VM. I think we should try to isolate furhter what caused this before
> > > trying to reproduce it on more setups.
> > 
> > Did you get an oops print related to a kalloc failure?
> > 
> > Or am I wrong and the ENOMEM is coming from someplace else?
> 
> Hello Jason,
> 
> I just noticed the following in the system log:
> 
> Jan 30 12:53:15 ubuntu-vm kernel: ib_srp: rxe0: ib_alloc_mr() failed. Try to reduce max_cmd_per_lun, max_sect or ch_count
> 
> So apparently the ib_alloc_mr() fails sometimes (but not the first few times
> it is called).

Looks like the only way you can get that without hitting a kalloc
oops print is if rxe_alloc() fails, and probably here:

	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
		goto out_put_pool;

Suggesting srp hit the max # of mrs in rxe:

	RXE_MAX_MR			= 2 * 1024,

Or maybe we are now leaking mrs someplace?
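
One way to tell those two cases apart (simply hitting RXE_MAX_MR vs. leaking
MRs) would be to instrument the rejection path with something like the
following -- a debugging sketch only, not something in the tree:

	/* Debug sketch: report pool occupancy when an allocation is refused,
	 * so a leak shows up as a count that never drops back down after the
	 * SRP logout.
	 */
	if (atomic_inc_return(&pool->num_elem) > pool->max_elem) {
		pr_warn_ratelimited("rxe: pool exhausted at %d elements\n",
				    atomic_read(&pool->num_elem));
		goto out_put_pool;
	}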

There is nothing accepted recently that mucks with this, still not
seeing even a tenuous connection to any patches in the last few days

What was accepted in the past week(s) was a bunch of srp stuff
though:

$ git diff --stat 052eac6eeb5655c52a490a49f09c55500f868558
 MAINTAINERS                                  |   3 +-
 drivers/infiniband/core/Makefile             |   2 +-
 drivers/infiniband/core/cm.c                 |   6 +-
 drivers/infiniband/core/cma.c                |   2 +-
 drivers/infiniband/core/core_priv.h          |  28 ++++
 drivers/infiniband/core/cq.c                 |  16 ++-
 drivers/infiniband/core/device.c             |   4 +
 drivers/infiniband/core/nldev.c              | 374 ++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/restrack.c           | 164 ++++++++++++++++++++++
 drivers/infiniband/core/user_mad.c           |   2 +-
 drivers/infiniband/core/uverbs_cmd.c         |   7 +-
 drivers/infiniband/core/uverbs_ioctl.c       |  19 ++-
 drivers/infiniband/core/uverbs_std_types.c   |   3 +
 drivers/infiniband/core/verbs.c              |  17 ++-
 drivers/infiniband/hw/mlx4/cq.c              |   4 +-
 drivers/infiniband/hw/mlx5/cq.c              |   2 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h         |   4 +-
 drivers/infiniband/hw/mlx5/qp.c              |   5 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c  |   2 +-
 drivers/infiniband/hw/mthca/mthca_user.h     | 112 ---------------
 drivers/infiniband/hw/qedr/verbs.c           |   6 +-
 drivers/infiniband/hw/qib/qib_keys.c         | 235 -------------------------------
 drivers/infiniband/sw/rxe/Kconfig            |   4 +-
 drivers/infiniband/ulp/iser/iser_initiator.c |  16 +--
 drivers/infiniband/ulp/srp/ib_srp.c          | 723 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------
 drivers/infiniband/ulp/srp/ib_srp.h          |  43 +++++-
 drivers/infiniband/ulp/srpt/ib_srpt.c        |   2 -
 include/rdma/ib_verbs.h                      |  39 ++++--
 include/rdma/restrack.h                      | 157 +++++++++++++++++++++
 include/scsi/srp.h                           |  17 +++
 include/uapi/rdma/ib_user_verbs.h            |   7 +-
 include/uapi/rdma/rdma_netlink.h             |  49 +++++++
 lib/kobject.c                                |   2 +
 33 files changed, 1511 insertions(+), 565 deletions(-)

Any chance one of the SRP patches got mishandled somehow??

Jason
Bart Van Assche Jan. 30, 2018, 10:10 p.m. UTC | #17
On Tue, 2018-01-30 at 15:02 -0700, Jason Gunthorpe wrote:
> On Tue, Jan 30, 2018 at 09:47:48PM +0000, Bart Van Assche wrote:
> > So apparently the ib_alloc_mr() fails sometimes (but not the first few times
> > it is called).
> 
> Looks like the only way you can get that without hitting a kalloc
> oops print is if rxe_alloc() fails, and probably here:
> 
> 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
> 		goto out_put_pool;
> 
> Suggesting srp hit the max # of mrs in rxe:
> 
> 	RXE_MAX_MR			= 2 * 1024,
> 
> Or maybe we are now leaking mrs someplace?
> 
> There is nothing accepted recently that mucks with this, still not
> seeing even a tenuous connection to any patches in the last few days

Hello Jason,

Since the number of memory regions allocated by the ib_srp driver
depends on the max_sect parameter in the login string, I will start by
checking whether there have been any recent changes to that parameter in
the test script.
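
As a rough cross-check of that hypothesis, a back-of-the-envelope estimate;
the page size and per-MR page-list length below are assumptions, not values
taken from this thread or verified against the driver:

/* Standalone estimate of how many FR MRs one SRP channel wants for the
 * login string above.  4 KiB pages and 512 pages per MR are assumed.
 */
#include <stdio.h>

int main(void)
{
	const long max_sect     = 131072;	/* from the add_target string */
	const long queue_size   = 32;		/* from the add_target string */
	const long sector_size  = 512;
	const long page_size    = 4096;		/* assumed */
	const long pages_per_mr = 512;		/* assumed fast-reg page-list limit */

	long bytes_per_cmd = max_sect * sector_size;	/* 64 MiB per command */
	long bytes_per_mr  = pages_per_mr * page_size;	/* 2 MiB per MR */
	long mr_per_cmd    = bytes_per_cmd / bytes_per_mr;
	long mr_per_chan   = mr_per_cmd * queue_size;

	printf("~%ld MRs per command, ~%ld MRs per channel\n",
	       mr_per_cmd, mr_per_chan);
	return 0;
}

Under those assumptions a single channel already wants on the order of a
thousand MRs, so the rxe cap of RXE_MAX_MR = 2048 is a plausible culprit even
without a leak.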

Thanks,

Bart.

Patch

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 3dcacf220e5e7e..c4560d84dfaebd 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -310,21 +310,21 @@  static inline struct ib_qp *_ib_create_qp(struct ib_device *dev,
 	struct ib_qp *qp;
 
 	qp = dev->create_qp(pd, attr, udata);
-	if (!IS_ERR(qp)) {
-		qp->device = dev;
-		qp->pd	   = pd;
-		/*
-		 * We don't track XRC QPs for now, because they don't have PD
-		 * and more importantly they are created internaly by driver,
-		 * see mlx5 create_dev_resources() as an example.
-		 */
-		if (attr->qp_type < IB_QPT_XRC_INI) {
-			qp->res.type = RDMA_RESTRACK_QP;
-			rdma_restrack_add(&qp->res);
-		} else {
-			qp->res.valid = false;
-		}
-	}
+	if (IS_ERR(qp))
+		return qp;
+
+	qp->device = dev;
+	qp->pd = pd;
+	/*
+	 * We don't track XRC QPs for now, because they don't have PD
+	 * and more importantly they are created internaly by driver,
+	 * see mlx5 create_dev_resources() as an example.
+	 */
+	if (attr->qp_type < IB_QPT_XRC_INI) {
+		qp->res.type = RDMA_RESTRACK_QP;
+		rdma_restrack_add(&qp->res);
+	} else
+		qp->res.valid = false;
 
 	return qp;
 }
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 421d5fa6a81731..fa8655e3b3edfe 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -178,7 +178,7 @@  static int fill_res_info_entry(struct sk_buff *msg,
 
 static int fill_res_info(struct sk_buff *msg, struct ib_device *device)
 {
-	static const char *names[RDMA_RESTRACK_MAX] = {
+	static const char * const names[RDMA_RESTRACK_MAX] = {
 		[RDMA_RESTRACK_PD] = "pd",
 		[RDMA_RESTRACK_CQ] = "cq",
 		[RDMA_RESTRACK_QP] = "qp",
@@ -553,10 +553,6 @@  static int _nldev_res_get_dumpit(struct ib_device *device,
 static int nldev_res_get_dumpit(struct sk_buff *skb,
 				struct netlink_callback *cb)
 {
-	/*
-	 * There is no need to take lock, because
-	 * we are relying on ib_core's lists_rwsem
-	 */
 	return ib_enum_all_devs(_nldev_res_get_dumpit, skb, cb);
 }
 
@@ -627,8 +623,8 @@  static int nldev_res_get_qp_dumpit(struct sk_buff *skb,
 		    (!rdma_is_kernel_res(res) &&
 		     task_active_pid_ns(current) != task_active_pid_ns(res->task)))
 			/*
-			 * 1. Kernel QPs should be visible in init namsapce only
-			 * 2. Preent only QPs visible in the current namespace
+			 * 1. Kernel QPs should be visible in init namspace only
+			 * 2. Present only QPs visible in the current namespace
 			 */
 			goto next;
 
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index 351b6940f6dc17..857637bf46da27 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -150,9 +150,7 @@  void rdma_restrack_del(struct rdma_restrack_entry *res)
 	if (!dev)
 		return;
 
-	down_read(&dev->res.rwsem);
 	rdma_restrack_put(res);
-	up_read(&dev->res.rwsem);
 
 	wait_for_completion(&res->comp);
 
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index a98a3e8412f810..16ebc6372c31ab 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -849,6 +849,7 @@  static struct ib_qp *ib_create_xrc_qp(struct ib_qp *qp,
 
 	qp->event_handler = __ib_shared_qp_event_handler;
 	qp->qp_context = qp;
+	qp->pd = NULL;
 	qp->send_cq = qp->recv_cq = NULL;
 	qp->srq = NULL;
 	qp->xrcd = qp_init_attr->xrcd;
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 23bef401598208..17e59bec169ec0 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -237,14 +237,8 @@  enum rdma_nldev_command {
 	RDMA_NLDEV_CMD_PORT_DEL,
 
 	RDMA_NLDEV_CMD_RES_GET, /* can dump */
-	RDMA_NLDEV_CMD_RES_SET,
-	RDMA_NLDEV_CMD_RES_NEW,
-	RDMA_NLDEV_CMD_RES_DEL,
 
 	RDMA_NLDEV_CMD_RES_QP_GET, /* can dump */
-	RDMA_NLDEV_CMD_RES_QP_SET,
-	RDMA_NLDEV_CMD_RES_QP_NEW,
-	RDMA_NLDEV_CMD_RES_QP_DEL,
 
 	RDMA_NLDEV_NUM_OPS
 };