mbox series

[for-next,v6,0/6] RDMA/rxe: Replace mr page map with an xarray

Message ID 20230117172540.33205-1-rpearsonhpe@gmail.com (mailing list archive)
Headers show
Series RDMA/rxe: Replace mr page map with an xarray | expand

Message

Bob Pearson Jan. 17, 2023, 5:25 p.m. UTC
This patch series replaces the page map carried in each memory region
with a struct xarray. It is based on a sketch developed by Jason
Gunthorpe. The first five patches are preparation that tries to
cleanly isolate all the mr specific code into rxe_mr.c. The sixth
patch is the actual change.

v6:
  Respond to a comment from Jason and moved the #if defined CONFIG_64BIT
  outside of the rxe_mr_do_atomic_write() function. I realized that
  the #else case isn't needed since it will never be called so if
  someone does try it will fail at compile/link time.
v5:
  Responded to a note from lizhijian@fujitsu.com and restored calls to
  is_pmem_page() which were accidentally dropped in earlier versions.
v4:
  Responded to a comment by Zhu and cleaned up error passing between
  rxe_mr.c and rxe_resp.c.
  Other various cleanups including more careful use of unsigned ints.
  Rebased to current for-next.
v3:
  Fixed an error reported by kernel test robot
v2:
  Rebased to 6.2.0-rc1+
  Minor cleanups
  Fixed error reported by Jason in 4/6 missing if after else.

Bob Pearson (6):
  RDMA/rxe: Cleanup mr_check_range
  RDMA/rxe: Move rxe_map_mr_sg to rxe_mr.c
  RDMA-rxe: Isolate mr code from atomic_reply()
  RDMA-rxe: Isolate mr code from atomic_write_reply()
  RDMA/rxe: Cleanup page variables in rxe_mr.c
  RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray

 drivers/infiniband/sw/rxe/rxe_loc.h   |  13 +-
 drivers/infiniband/sw/rxe/rxe_mr.c    | 622 +++++++++++++++-----------
 drivers/infiniband/sw/rxe/rxe_resp.c  | 118 ++---
 drivers/infiniband/sw/rxe/rxe_verbs.c |  36 --
 drivers/infiniband/sw/rxe/rxe_verbs.h |  37 +-
 5 files changed, 422 insertions(+), 404 deletions(-)


base-commit: 1ec82317a1daac78c04b0c15af89018ccf9fa2b7

Comments

Jason Gunthorpe Jan. 23, 2023, 7:36 p.m. UTC | #1
On Tue, Jan 17, 2023 at 11:25:35AM -0600, Bob Pearson wrote:
> This patch series replaces the page map carried in each memory region
> with a struct xarray. It is based on a sketch developed by Jason
> Gunthorpe. The first five patches are preparation that tries to
> cleanly isolate all the mr specific code into rxe_mr.c. The sixth
> patch is the actual change.

I think this is fine, are all the other people satisfied?

Thanks,
Jason
Zhu Yanjun Jan. 24, 2023, 3:43 a.m. UTC | #2
On Tue, Jan 24, 2023 at 3:36 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Jan 17, 2023 at 11:25:35AM -0600, Bob Pearson wrote:
> > This patch series replaces the page map carried in each memory region
> > with a struct xarray. It is based on a sketch developed by Jason
> > Gunthorpe. The first five patches are preparation that tries to
> > cleanly isolate all the mr specific code into rxe_mr.c. The sixth
> > patch is the actual change.
>
> I think this is fine, are all the other people satisfied?

I noticed that RXE is disabled in RHEL9.x. And in RHEL 8.x, RXE still
is enabled.
It seems that RXE is disabled in RHEL 9.x because of instability.
And recently RXE accepted several patch series.
IMO, we should have more time to make tests and fix bugs before the
new patch series are accepted.

This can make RXE more stable. And more linux distributions will enable it.
Or else, more and more linux distributions will disable RXE. Less and
less users will use RXE.
Finally RXE will never be used in the future.

I think, this is not what the RXE guys expect.

As such, it had better have more time to make tests and let RXE more stable.
We should not accept too many patch series in short time. Too many
patch series will bring risks.

Zhu Yanjun


>
> Thanks,
> Jason
Daisuke Matsuda (Fujitsu) Jan. 24, 2023, 5:39 a.m. UTC | #3
Tue, Jan 24, 2023 12:43 PM Zhu Yanjun wrote:
> On Tue, Jan 24, 2023 at 3:36 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Tue, Jan 17, 2023 at 11:25:35AM -0600, Bob Pearson wrote:
> > > This patch series replaces the page map carried in each memory region
> > > with a struct xarray. It is based on a sketch developed by Jason
> > > Gunthorpe. The first five patches are preparation that tries to
> > > cleanly isolate all the mr specific code into rxe_mr.c. The sixth
> > > patch is the actual change.
> >
> > I think this is fine, are all the other people satisfied?
> 
> I noticed that RXE is disabled in RHEL9.x. And in RHEL 8.x, RXE still
> is enabled.
> It seems that RXE is disabled in RHEL 9.x because of instability.
> And recently RXE accepted several patch series.
> IMO, we should have more time to make tests and fix bugs before the
> new patch series are accepted.

I am relatively a newcomer here, but I think what Zhu says is true.

While there are some pending patch series, there comes a new large
patch series that is hard to review, and they get merged without being
tested and inspected enough resulting in new bugs. I suppose that is 
what have been happening here.

> 
> This can make RXE more stable. And more linux distributions will enable it.
> Or else, more and more linux distributions will disable RXE. Less and
> less users will use RXE.
> Finally RXE will never be used in the future.
> 
> I think, this is not what the RXE guys expect.
> 
> As such, it had better have more time to make tests and let RXE more stable.
> We should not accept too many patch series in short time. Too many
> patch series will bring risks.

Blocking new patch series totally will make the rxe less attractive to
the contributors. I propose that each of us should have at most one
pending patch series at once that consists of more than 4 patches or so.
That will make the situation a lot clearer and make it easier for us to
review patches each other.

Daisuke

> 
> Zhu Yanjun
> 
> 
> >
> > Thanks,
> > Jason
Jason Gunthorpe Jan. 24, 2023, 3:17 p.m. UTC | #4
On Tue, Jan 24, 2023 at 05:39:34AM +0000, Daisuke Matsuda (Fujitsu) wrote:
> Tue, Jan 24, 2023 12:43 PM Zhu Yanjun wrote:
> > On Tue, Jan 24, 2023 at 3:36 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >
> > > On Tue, Jan 17, 2023 at 11:25:35AM -0600, Bob Pearson wrote:
> > > > This patch series replaces the page map carried in each memory region
> > > > with a struct xarray. It is based on a sketch developed by Jason
> > > > Gunthorpe. The first five patches are preparation that tries to
> > > > cleanly isolate all the mr specific code into rxe_mr.c. The sixth
> > > > patch is the actual change.
> > >
> > > I think this is fine, are all the other people satisfied?
> > 
> > I noticed that RXE is disabled in RHEL9.x. And in RHEL 8.x, RXE still
> > is enabled.
> > It seems that RXE is disabled in RHEL 9.x because of instability.
> > And recently RXE accepted several patch series.
> > IMO, we should have more time to make tests and fix bugs before the
> > new patch series are accepted.
> 
> I am relatively a newcomer here, but I think what Zhu says is true.
> 
> While there are some pending patch series, there comes a new large
> patch series that is hard to review, and they get merged without being
> tested and inspected enough resulting in new bugs. I suppose that is 
> what have been happening here.

I think the counterpoint is that rxe isn't really a production piece
of software. If the goal if rxe is to experiment with these new
features people keep proposing then we should let it advance.

Stability comes from actual testing, not from time. If people aren't
testing then waiting longer won't magically make it better.

> Blocking new patch series totally will make the rxe less attractive to
> the contributors. 

Exactly, I would rather people work on rxe than chase some ideal of
bug-freeness

Jason