[4/6] IB/core: Enhance ib_map_mr_sg()

Message ID 57327981.4080404@sandisk.com (mailing list archive)
State Superseded

Commit Message

Bart Van Assche May 11, 2016, 12:14 a.m. UTC
The SRP initiator allows max_sectors to be set to a value that exceeds
the largest amount of data that can be mapped at once with an mlx4
HCA using fast registration and a page size of 4 KB. Hence modify
ib_map_mr_sg() such that it can map partial sg-elements. If an
sg-element has been mapped partially, let the caller know
which fraction has been mapped by adjusting *sg_offset.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Laurence Oberman <loberman@redhat.com>
---
 drivers/infiniband/core/rw.c                |  2 +-
 drivers/infiniband/core/verbs.c             | 26 +++++++++++++++++++++-----
 drivers/infiniband/hw/cxgb3/iwch_provider.c |  2 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h      |  2 +-
 drivers/infiniband/hw/cxgb4/mem.c           |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_verbs.c   |  2 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h        |  2 +-
 drivers/infiniband/hw/mlx4/mr.c             |  2 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h        |  2 +-
 drivers/infiniband/hw/mlx5/mr.c             |  8 ++++++--
 drivers/infiniband/hw/nes/nes_verbs.c       |  2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |  2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |  2 +-
 drivers/infiniband/ulp/iser/iser_memory.c   |  4 ++--
 drivers/infiniband/ulp/srp/ib_srp.c         |  2 +-
 include/rdma/ib_verbs.h                     |  8 ++++----
 net/sunrpc/xprtrdma/frwr_ops.c              |  2 +-
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c     |  2 +-
 18 files changed, 47 insertions(+), 27 deletions(-)
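
The partial-mapping behaviour described in the commit message can be
illustrated with a small user-space model. This is only a sketch, not the
kernel code: struct demo_sg, demo_map_sg() and the max_bytes and
mapped_bytes parameters are hypothetical names introduced for illustration,
and the return value (number of elements fully consumed) is an assumption
rather than the documented contract of ib_map_mr_sg().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a scatterlist entry: only a byte length. */
struct demo_sg {
	unsigned int length;
};

/*
 * Map at most max_bytes, starting *sg_offset bytes into sgl[0].
 *
 * Returns the number of elements that were consumed completely.
 * On return, *sg_offset is the byte offset into the first element that
 * was not consumed completely (0 if mapping stopped on an element
 * boundary) and *mapped_bytes is the number of bytes mapped by this call.
 */
static int demo_map_sg(const struct demo_sg *sgl, int sg_nents,
		       unsigned int *sg_offset, unsigned int max_bytes,
		       unsigned int *mapped_bytes)
{
	unsigned int offset = *sg_offset;
	unsigned int left = max_bytes;
	int i;

	assert(max_bytes > 0);
	*mapped_bytes = 0;

	for (i = 0; i < sg_nents && left > 0; i++) {
		unsigned int avail;

		assert(offset <= sgl[i].length);
		avail = sgl[i].length - offset;

		if (avail > left) {
			/* Element i only fits partially: report how far we got. */
			*mapped_bytes += left;
			*sg_offset = offset + left;
			return i;
		}

		/* Element i fits completely; continue at the start of the next. */
		*mapped_bytes += avail;
		left -= avail;
		offset = 0;
	}

	*sg_offset = 0;
	return i;
}
```

In this model a caller advances its sg pointer by the return value and calls
again with the updated offset until every element has been consumed.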

Comments

Leon Romanovsky May 11, 2016, 7:53 a.m. UTC | #1
On Tue, May 10, 2016 at 05:14:57PM -0700, Bart Van Assche wrote:
> The SRP initiator allows to set max_sectors to a value that exceeds
> the largest amount of data that can be mapped at once with an mlx4
> HCA using fast registration and a page size of 4 KB. Hence modify
> ib_map_mr_sg() such that it can map partial sg-elements. If an
> sg-element has been mapped partially, let the caller know
> which fraction has been mapped by adjusting *sg_offset.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Laurence Oberman <loberman@redhat.com>
> ---
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -1752,10 +1752,11 @@ static int
>  mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr,
>  		   struct scatterlist *sgl,
>  		   unsigned short sg_nents,
> -		   unsigned int sg_offset)
> +		   unsigned int *sg_offset_p)
>  {

Which tree is this series based on?
In Linus's tree (4.6-rc7) the function signature is different [1]; the same
goes for my tree and Doug's for-4.7 branch [2].

[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/hw/mlx5/mr.c#n1752
[2] https://github.com/dledford/linux/blob/k.o/for-4.7/drivers/infiniband/hw/mlx5/mr.c#L1752
Bart Van Assche May 11, 2016, 3:22 p.m. UTC | #2
On 05/11/2016 12:54 AM, Leon Romanovsky wrote:
> On Tue, May 10, 2016 at 05:14:57PM -0700, Bart Van Assche wrote:
>> The SRP initiator allows to set max_sectors to a value that exceeds
>> the largest amount of data that can be mapped at once with an mlx4
>> HCA using fast registration and a page size of 4 KB. Hence modify
>> ib_map_mr_sg() such that it can map partial sg-elements. If an
>> sg-element has been mapped partially, let the caller know
>> which fraction has been mapped by adjusting *sg_offset.
>>
>> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Sagi Grimberg <sagi@grimberg.me>
>> Cc: Laurence Oberman <loberman@redhat.com>
>> ---
>> --- a/drivers/infiniband/hw/mlx5/mr.c
>> +++ b/drivers/infiniband/hw/mlx5/mr.c
>> @@ -1752,10 +1752,11 @@ static int
>>   mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr,
>>   		   struct scatterlist *sgl,
>>   		   unsigned short sg_nents,
>> -		   unsigned int sg_offset)
>> +		   unsigned int *sg_offset_p)
>>   {
>
> I wonder on which tree are you basing?
> In Linus (4.6-rc7) the function signature is different [1], the same
> goes for my tree and Doug's for-4.7 branch [2].

Hello Leon,

Sorry that I did not mention this explicitly in the cover letter: this
patch series is based on Christoph's generic RDMA READ/WRITE API work.

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Laurence Oberman May 11, 2016, 3:31 p.m. UTC | #3
I chased that for a while too :)
I ended up pulling the latest next, applying all 11 of Christoph's RDMA patches, then the first 11 of Bart's and the latest 6.
I had to fix some things by hand.
The kernel is building now for testing :)


Bart Van Assche May 11, 2016, 3:41 p.m. UTC | #4
On 05/11/2016 08:31 AM, Laurence Oberman wrote:
> I chased that for a while too.:)
> Landed up pulling the latest next, applying all of Christoph's 11 RDMA patches, then the first 11 of Barts and the latest 6.
> I had to hand fix some stuff.
> Kernel is building now for testing :)

Hello Laurence,

Please hold off on starting your tests until I have made a kernel tree
with this patch series available.

Thanks,

Bart.

Laurence Oberman May 11, 2016, 3:56 p.m. UTC | #5
Hello Bart

I had started already, and it's looking awesomely stable so far.
Awesome work from all of you guys.

### RECORD   84 >>> jumpclient <<< (1462981973.001) (Wed May 11 11:52:53 2016) ###
# DISK STATISTICS (/sec)
#                   <---------reads---------><---------writes---------><--------averages--------> Pct
#Time     Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
11:52:53 sdc              0      0    0    0  163840      0   40 4096    4096     1    10     10   42
11:52:53 dm-6             0      0    0    0  327680    320   80 4096    4096     1    11     11   90
11:52:53 sdd              0      0    0    0  176128      0   43 4096    4096     1    10     10   44
11:52:53 dm-7             0      0    0    0  348160    336   85 4096    4096     1    11     10   92
11:52:53 sde              0      0    0    0  159744      0   39 4096    4096     1    11     11   43
11:52:53 dm-8             0      0    0    0  319488    312   78 4096    4096     1    11     11   89
11:52:53 sdf              4      0    1    4  167936      0   41 4096    3998     1    10     10   44
11:52:53 sdg              4      0    1    4  163840      0   40 4096    3996     1    10     10   44
11:52:53 dm-9             0      0    0    0  335872    328   82 4096    4096     1    11     11   91
11:52:53 dm-10            0      0    0    0  331776    324   81 4096    4096     1    11     11   91
11:52:53 sdh              4      0    1    4  159744      0   39 4096    3993     1    11     11   45
11:52:53 dm-11            0      0    0    0  319488    308   78 4096    4096     1    11     11   91
11:52:53 sdi              0      0    0    0  167936      0   41 4096    4096     1    10     10   43
11:52:53 dm-12            0      0    0    0  335872    328   82 4096    4096     1    11     11   93
11:52:53 sdj              0      0    0    0  172032      0   42 4096    4096     1    10     10   44
11:52:53 dm-13            0      0    0    0  344064    332   84 4096    4096     1    10     10   91
11:52:53 sdk              0      0    0    0  176128      0   43 4096    4096     1    11     11   47
11:52:53 dm-14            0      0    0    0  352256    344   86 4096    4096     1    10     10   91
11:52:53 sdl              0      0    0    0  163840      0   40 4096    4096     1    10     11   43
11:52:53 dm-15            0      0    0    0  331776    324   81 4096    4096     1    11     11   91
11:52:53 sdm              0      0    0    0  163840      0   40 4096    4096     1    11     11   45
11:52:53 sdn              0      0    0    0  172032      0   42 4096    4096     1    10     10   45
11:52:53 sdo              0      0    0    0  159744      0   39 4096    4096     1    11     11   44
11:52:53 sdp              4      0    1    4  167936      0   41 4096    3998     1    10     10   45
11:52:53 sdq              0      0    0    0  167936      0   41 4096    4096     1    11     11   45
11:52:53 sdr              4      0    1    4  159744      0   39 4096    3993     1    11     10   43
11:52:53 sds              0      0    0    0  167936      0   41 4096    4096     1    11     11   46
11:52:53 sdt              0      0    0    0  172032      0   42 4096    4096     1    10     10   44
11:52:53 sdu              0      0    0    0  176128      0   43 4096    4096     1     9      9   42
11:52:53 sdv              0      0    0    0  167936      0   41 4096    4096     1    11     11   45

Whoop, 3.2GBytes/sec and no errors :)

#         <----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#Time     cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut 
11:55:21    7   7  2652   7721     12      3  3223K    974      0      3      0       3 
11:55:22    7   7  2714   7984      4      1  3336K    834      0      1      0       1 
11:55:23    6   6  2545   7698      0      0  3216K    804      0      1      0       1 
11:55:24    7   7  2576   7455      0      0  3012K    758      0      3      0       1 
11:55:25    6   6  2717   8096     24      6  3314K    900      0      1      0       1 
11:55:26    7   7  2651   7807      0      0  3118K    955      1      9      2      11 
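
As a rough cross-check of the 3.2 GBytes/sec figure, the KBWrit column of the
six samples above can be averaged. This is a sketch only: mean_sample() and
kbwrit_k are illustrative names, and it assumes the "K" suffix in that column
denotes thousands of KBytes per second.

```c
#include <assert.h>
#include <stddef.h>

/* Integer mean of per-second samples; returns 0 for an empty array. */
static unsigned long mean_sample(const unsigned long *v, size_t n)
{
	unsigned long sum = 0;
	size_t i;

	assert(v != NULL || n == 0);
	for (i = 0; i < n; i++)
		sum += v[i];
	return n ? sum / n : 0;
}

/* KBWrit values of the six samples above, with the "K" suffix dropped. */
static const unsigned long kbwrit_k[] = {
	3223, 3336, 3216, 3012, 3314, 3118
};
```

Under that assumption, mean_sample(kbwrit_k, 6) yields 3203, i.e. about
3203K KBytes/s, which is consistent with roughly 3.2 GBytes/sec.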
Laurence Oberman May 12, 2016, 4:28 p.m. UTC | #6
With Bart's latest set of patches (see subject), using Bart's tree, the mapping failures are gone and it has run stably for over 24 hours.
This is with multiple parallel 4 MB reads issued direct and multiple parallel buffered writes to the same mpath devices.
The only variation from what I am used to seeing (when it's not failing :) on prior ib_srp) is that the I/O sizes often reach 4 MB but are also often smaller.
Some of this could be due to issues with my LIO target; next week I will have an enterprise array directly connected to test with.

It's a huge improvement and seems good to me, as I know these failures very well.

Tested-by: Laurence Oberman <loberman@redhat.com>

Example with direct reads and buffered writes

mpath view

### RECORD  556 >>> jumpclient <<< (1463070126.001) (Thu May 12 12:22:06 2016) ###
# DISK STATISTICS (/sec)
#                   <---------reads---------><---------writes---------><--------averages--------> Pct
#Time     Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
12:22:06 dm-6        212992    128  247  862  130200     64  126 1033     920     7    20      2   99
12:22:06 dm-7        213180     52  885  241  118784     29  406  293     257    25    20      0   99
12:22:06 dm-8        217088     53  424  512  122880     60  270  455     489    14    21      1   99
12:22:06 dm-9        211968     52  465  456  119760     30  409  293     379    19    22      1   99
12:22:06 dm-10       212992     52  364  585  121340     60  146  831     655    10    20      1   99
12:22:06 dm-11       221184    162   54 4096  135168     33  297  455    1015     9    26      2   99  *** Reads reach 4MB here, often smaller though
12:22:06 dm-12       229376    168  280  819  126976     93   62 2048    1041     6    17      2   99
12:22:06 dm-13       229376    168  112 2048  131072     64  128 1024    1501     5    23      4   99
12:22:06 dm-14       225280    110  385  585  122880     29  185  664     610    11    20      1   99
12:22:06 dm-15       203912     50  549  371  118792     29  379  313     347    20    21      1   99

individual path view

### RECORD  556 >>> jumpclient <<< (1463070126.001) (Thu May 12 12:22:06 2016) ###
# DISK STATISTICS (/sec)
#                   <---------reads---------><---------writes---------><--------averages--------> Pct
#Time     Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
12:26:37 sdc         128092      0  348  368   44468      0  482   92     207    18    21      1   95
12:26:37 sdd         103476      0  461  224   62956      0  266  237     228    14    20      1   97
12:26:37 sde         118100      0  145  814   65488      0  127  516     674     5    19      3   97
12:26:37 sdf         121660      0  651  187   53960      0  277  195     189    17    18      1   97
12:26:37 sdg          87276      0  305  286   76020      0  191  398     329    10    20      1   97
12:26:37 sdh          98688      0  166  595   67400      0  554  122     230    17    23      1   95
12:26:37 sdi         197112      0  945  209   16208      0   81  200     207    16    15      0   99
12:26:37 sdj           1776      0   16  111  143360      0   35 4096    2845     1    24     18   95
12:26:37 sdk              0      0    0    0  139264      0  374  372     372    10    26      2   96
12:26:37 sdl          77600      0  572  136   76624      0  268  286     183    17    19      1   93
12:26:37 sdm          80804      0  264  306   74316      0  533  139     194    19    24      1   98
12:26:37 sdn         101324      0  439  231   64020      0  385  166     200    18    22      1   98
12:26:37 sdo         107180      0  130  824   73776      0  145  509     658     5    21      3   97
12:26:37 sdp          80788      0  389  208   69376      0  339  205     206    17    22      1   99
12:26:37 sdq         116704      0  390  299   55052      0  129  427     330    10    20      1   99
12:26:37 sdr         113280      0  249  455   53816      0  401  134     257    16    24      1   99
12:26:37 sds          13192      0   93  142  110768      0  291  381     322    10    26      2   95
12:26:37 sdt         165184      0 1197  138       0      0    0    0     137    23    18      0   99
12:26:37 sdu         241664      0   59 4096       0      0    0    0    4096     1    16     16   97
Laurence Oberman May 12, 2016, 4:38 p.m. UTC | #7
Bart,
I meant to mention that that was with ib_srp untuned.
My next set of tests will be with indirect_sg_entries=512 and cmd_sg_entries=64 for a start.
Then I will max them out and see how we do.

Thanks!!!
Laurence Oberman May 12, 2016, 4:50 p.m. UTC | #8
Replying to my own message

Tuning ib_srp and I am back to full 4MB as expected so now we should be all set.

Bart, and all, thanks for all the assistance with this.
Awesome work Bart on your part as always.

### RECORD    3 >>> jumpclient <<< (1463071707.001) (Thu May 12 12:48:27 2016) ###
# DISK STATISTICS (/sec)
#                   <---------reads---------><---------writes---------><--------averages--------> Pct
#Time     Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
12:48:27 dm-6             0      0    0    0  286720    284   70 4096    4096     1    12     12   89
12:48:27 dm-7             0      0    0    0  290816    284   71 4096    4096     1    12     12   89
12:48:27 dm-8             0      0    0    0  286720    280   70 4096    4096     1    12     12   89
12:48:27 dm-9             0      0    0    0  294912    288   72 4096    4096     1    13     12   93
12:48:27 dm-10            0      0    0    0  290816    284   71 4096    4096     1    13     13   93
12:48:27 dm-11            0      0    0    0  286720    284   70 4096    4096     1    12     12   88
12:48:27 dm-12            0      0    0    0  290816    284   71 4096    4096     1    12     12   89
12:48:27 dm-13            0      0    0    0  290816    288   71 4096    4096     1    12     12   90
12:48:27 dm-14            0      0    0    0  286720    280   70 4096    4096     1    12     12   89
12:48:27 dm-15            0      0    0    0  282624    280   69 4096    4096     1    12     12   88
12:48:27 sdm              0      0    0    0  143360      0   35 4096    4096     1    12     12   42
Bart Van Assche May 12, 2016, 5 p.m. UTC | #9
On 05/12/2016 09:50 AM, Laurence Oberman wrote:
> Tuning ib_srp and I am back to full 4MB as expected so now we should be all set.
>
> Bart, and all, thanks for all the assistance with this.
> Awesome work Bart on your part as always.

Hello Laurence,

Thank you for having tested this patch series so quickly. I assume that 
this means that I can add your Tested-by when I repost this patch series?

Bart.
Laurence Oberman May 12, 2016, 5:02 p.m. UTC | #10
----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> To: "Laurence Oberman" <loberman@redhat.com>
> Cc: leon@kernel.org, "Doug Ledford" <dledford@redhat.com>, "Christoph Hellwig" <hch@lst.de>, "Sagi Grimberg"
> <sagi@grimberg.me>, linux-rdma@vger.kernel.org, "Or Gerlitz" <ogerlitz@mellanox.com>
> Sent: Thursday, May 12, 2016 1:00:20 PM
> Subject: Re: [PATCH 4/6] IB/core: Enhance ib_map_mr_sg()
> 
> On 05/12/2016 09:50 AM, Laurence Oberman wrote:
> > Tuning ib_srp and I am back to full 4MB as expected so now we should be all
> > set.
> >
> > Bart, and all, thanks for all the assistance with this.
> > Awesome work Bart on your part as always.
> 
> Hello Laurence,
> 
> Thank you for having tested this patch series so quickly. I assume that
> this means that I can add your Tested-by when I repost this patch series?
> 
> Bart.
> 

Hello Bart

Absolutely.
And in fact I am now running with options ib_srp cmd_sg_entries=255 indirect_sg_entries=2048 and it is singing along.
I have never reached that before without issues.
It's rock solid from what I can see.
Thanks!!!
Laurence
Patch

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 6fc50bf..1eb9b12 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -92,7 +92,7 @@  static int rdma_rw_init_one_mr(struct ib_qp *qp, u8 port_num,
 		reg->inv_wr.next = NULL;
 	}
 
-	ret = ib_map_mr_sg(reg->mr, sg, nents, offset, PAGE_SIZE);
+	ret = ib_map_mr_sg(reg->mr, sg, nents, &offset, PAGE_SIZE);
 	if (ret < nents) {
 		ib_mr_pool_put(qp, &qp->rdma_mrs, reg->mr);
 		return -EINVAL;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 3d7b266..ffb9863 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1655,7 +1655,7 @@  EXPORT_SYMBOL(ib_set_vf_guid);
  * is ready for registration.
  */
 int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
-		unsigned int sg_offset, unsigned int page_size)
+		 unsigned int *sg_offset, unsigned int page_size)
 {
 	if (unlikely(!mr->device->map_mr_sg))
 		return -ENOSYS;
@@ -1672,7 +1672,10 @@  EXPORT_SYMBOL(ib_map_mr_sg);
  * @mr:            memory region
  * @sgl:           dma mapped scatterlist
  * @sg_nents:      number of entries in sg
- * @sg_offset:     offset in bytes into sg
+ * @sg_offset_p:   IN:  start offset in bytes into sg
+ *                 OUT: offset in bytes for element n of the sg of the first
+ *                      byte that has not been processed where n is the return
+ *                      value of this function.
  * @set_page:      driver page assignment function pointer
  *
  * Core service helper for drivers to convert the largest
@@ -1684,19 +1687,24 @@  EXPORT_SYMBOL(ib_map_mr_sg);
  * a page vector.
  */
 int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
-		unsigned int sg_offset, int (*set_page)(struct ib_mr *, u64))
+		unsigned int *sg_offset_p, int (*set_page)(struct ib_mr *, u64))
 {
 	struct scatterlist *sg;
 	u64 last_end_dma_addr = 0;
+	unsigned int sg_offset = sg_offset_p ? *sg_offset_p : 0;
 	unsigned int last_page_off = 0;
 	u64 page_mask = ~((u64)mr->page_size - 1);
 	int i, ret;
 
+	if (unlikely(sg_nents <= 0 || sg_offset > sg_dma_len(&sgl[0])))
+		return -EINVAL;
+
 	mr->iova = sg_dma_address(&sgl[0]) + sg_offset;
 	mr->length = 0;
 
 	for_each_sg(sgl, sg, sg_nents, i) {
 		u64 dma_addr = sg_dma_address(sg) + sg_offset;
+		u64 prev_addr = dma_addr;
 		unsigned int dma_len = sg_dma_len(sg) - sg_offset;
 		u64 end_dma_addr = dma_addr + dma_len;
 		u64 page_addr = dma_addr & page_mask;
@@ -1721,8 +1729,14 @@  int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
 
 		do {
 			ret = set_page(mr, page_addr);
-			if (unlikely(ret < 0))
-				return i ? : ret;
+			if (unlikely(ret < 0)) {
+				sg_offset = prev_addr - dma_addr;
+				mr->length += sg_offset;
+				if (sg_offset_p)
+					*sg_offset_p = sg_offset;
+				return i || sg_offset ? i : ret;
+			}
+			prev_addr = page_addr;
 next_page:
 			page_addr += mr->page_size;
 		} while (page_addr < end_dma_addr);
@@ -1734,6 +1748,8 @@  next_page:
 		sg_offset = 0;
 	}
 
+	if (sg_offset_p)
+		*sg_offset_p = 0;
 	return i;
 }
 EXPORT_SYMBOL(ib_sg_to_pages);
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 608aa0c..47cb927 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -784,7 +784,7 @@  static int iwch_set_page(struct ib_mr *ibmr, u64 addr)
 }
 
 static int iwch_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
-		int sg_nents, unsigned sg_offset)
+			  int sg_nents, unsigned int *sg_offset)
 {
 	struct iwch_mr *mhp = to_iwch_mr(ibmr);
 
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 067cb3f..1ff3ba8 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -918,7 +918,7 @@  struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
 			    enum ib_mr_type mr_type,
 			    u32 max_num_sg);
 int c4iw_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
-		unsigned int sg_offset);
+		   unsigned int *sg_offset);
 int c4iw_dealloc_mw(struct ib_mw *mw);
 struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type,
 			    struct ib_udata *udata);
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 38afb3d..83960df 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -691,7 +691,7 @@  static int c4iw_set_page(struct ib_mr *ibmr, u64 addr)
 }
 
 int c4iw_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
-		unsigned int sg_offset)
+		   unsigned int *sg_offset)
 {
 	struct c4iw_mr *mhp = to_c4iw_mr(ibmr);
 
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
index 141eaba..4a740f7 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -1574,7 +1574,7 @@  static int i40iw_set_page(struct ib_mr *ibmr, u64 addr)
  * @sg_nents: number of sg pages
  */
 static int i40iw_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
-		int sg_nents, unsigned int sg_offset)
+			   int sg_nents, unsigned int *sg_offset)
 {
 	struct i40iw_mr *iwmr = to_iwmr(ibmr);
 
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index ba328177..6c5ac5d 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -718,7 +718,7 @@  struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
 			       enum ib_mr_type mr_type,
 			       u32 max_num_sg);
 int mlx4_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
-		unsigned int sg_offset);
+		      unsigned int *sg_offset);
 int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period);
 int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata);
 struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index b04f623..6312721 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -529,7 +529,7 @@  static int mlx4_set_page(struct ib_mr *ibmr, u64 addr)
 }
 
 int mlx4_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
-		unsigned int sg_offset)
+		      unsigned int *sg_offset)
 {
 	struct mlx4_ib_mr *mr = to_mmr(ibmr);
 	int rc;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 8c835b2..f05cf57 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -713,7 +713,7 @@  struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
 			       enum ib_mr_type mr_type,
 			       u32 max_num_sg);
 int mlx5_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
-		unsigned int sg_offset);
+		      unsigned int *sg_offset);
 int mlx5_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 			const struct ib_wc *in_wc, const struct ib_grh *in_grh,
 			const struct ib_mad_hdr *in, size_t in_mad_size,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index b678eac..8cf2ce5 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1752,10 +1752,11 @@  static int
 mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr,
 		   struct scatterlist *sgl,
 		   unsigned short sg_nents,
-		   unsigned int sg_offset)
+		   unsigned int *sg_offset_p)
 {
 	struct scatterlist *sg = sgl;
 	struct mlx5_klm *klms = mr->descs;
+	unsigned int sg_offset = sg_offset_p ? *sg_offset_p : 0;
 	u32 lkey = mr->ibmr.pd->local_dma_lkey;
 	int i;
 
@@ -1774,6 +1775,9 @@  mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr,
 		sg_offset = 0;
 	}
 
+	if (sg_offset_p)
+		*sg_offset_p = sg_offset;
+
 	return i;
 }
 
@@ -1792,7 +1796,7 @@  static int mlx5_set_page(struct ib_mr *ibmr, u64 addr)
 }
 
 int mlx5_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
-		unsigned int sg_offset)
+		      unsigned int *sg_offset)
 {
 	struct mlx5_ib_mr *mr = to_mmr(ibmr);
 	int n;
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index 698aab6..4ebea4c 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -403,7 +403,7 @@  static int nes_set_page(struct ib_mr *ibmr, u64 addr)
 }
 
 static int nes_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
-		int sg_nents, unsigned int sg_offset)
+			 int sg_nents, unsigned int *sg_offset)
 {
 	struct nes_mr *nesmr = to_nesmr(ibmr);
 
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 9ddd550..b1a3d91 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -3082,7 +3082,7 @@  static int ocrdma_set_page(struct ib_mr *ibmr, u64 addr)
 }
 
 int ocrdma_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
-		unsigned int sg_offset)
+		     unsigned int *sg_offset)
 {
 	struct ocrdma_mr *mr = get_ocrdma_mr(ibmr);
 
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index b290e5d..704ef1e 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -123,6 +123,6 @@  struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd,
 			      enum ib_mr_type mr_type,
 			      u32 max_num_sg);
 int ocrdma_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
-		unsigned sg_offset);
+		     unsigned int *sg_offset);
 
 #endif				/* __OCRDMA_VERBS_H__ */
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 44cc85f..90be568 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -236,7 +236,7 @@  int iser_fast_reg_fmr(struct iscsi_iser_task *iser_task,
 	page_vec->npages = 0;
 	page_vec->fake_mr.page_size = SIZE_4K;
 	plen = ib_sg_to_pages(&page_vec->fake_mr, mem->sg,
-			      mem->size, 0, iser_set_page);
+			      mem->size, NULL, iser_set_page);
 	if (unlikely(plen < mem->size)) {
 		iser_err("page vec too short to hold this SG\n");
 		iser_data_buf_dump(mem, device->ib_device);
@@ -446,7 +446,7 @@  static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task,
 
 	ib_update_fast_reg_key(mr, ib_inc_rkey(mr->rkey));
 
-	n = ib_map_mr_sg(mr, mem->sg, mem->size, 0, SIZE_4K);
+	n = ib_map_mr_sg(mr, mem->sg, mem->size, NULL, SIZE_4K);
 	if (unlikely(n != mem->size)) {
 		iser_err("failed to map sg (%d/%d)\n",
 			 n, mem->size);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index f4dc6f9..6440469 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1329,7 +1329,7 @@  static int srp_map_finish_fr(struct srp_map_state *state,
 	rkey = ib_inc_rkey(desc->mr->rkey);
 	ib_update_fast_reg_key(desc->mr, rkey);
 
-	n = ib_map_mr_sg(desc->mr, state->sg, sg_nents, 0, dev->mr_page_size);
+	n = ib_map_mr_sg(desc->mr, state->sg, sg_nents, NULL, dev->mr_page_size);
 	if (unlikely(n < 0)) {
 		srp_fr_pool_put(ch->fr_pool, &desc, 1);
 		pr_debug("%s: ib_map_mr_sg(%d) returned %d.\n",
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 544c55b..56bb0f3 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1848,7 +1848,7 @@  struct ib_device {
 	int                        (*map_mr_sg)(struct ib_mr *mr,
 						struct scatterlist *sg,
 						int sg_nents,
-						unsigned sg_offset);
+						unsigned int *sg_offset);
 	struct ib_mw *             (*alloc_mw)(struct ib_pd *pd,
 					       enum ib_mw_type type,
 					       struct ib_udata *udata);
@@ -3145,11 +3145,11 @@  struct net_device *ib_get_net_dev_by_params(struct ib_device *dev, u8 port,
 					    const struct sockaddr *addr);
 
 int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
-		unsigned int sg_offset, unsigned int page_size);
+		 unsigned int *sg_offset, unsigned int page_size);
 
 static inline int
 ib_map_mr_sg_zbva(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
-		unsigned int sg_offset, unsigned int page_size)
+		  unsigned int *sg_offset, unsigned int page_size)
 {
 	int n;
 
@@ -3160,7 +3160,7 @@  ib_map_mr_sg_zbva(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
 }
 
 int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
-		unsigned int sg_offset, int (*set_page)(struct ib_mr *, u64));
+		unsigned int *sg_offset, int (*set_page)(struct ib_mr *, u64));
 
 void ib_drain_rq(struct ib_qp *qp);
 void ib_drain_sq(struct ib_qp *qp);
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 3274a4a..94c3fa9 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -421,7 +421,7 @@  frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
 		return -ENOMEM;
 	}
 
-	n = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, 0, PAGE_SIZE);
+	n = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, NULL, PAGE_SIZE);
 	if (unlikely(n != frmr->sg_nents)) {
 		pr_err("RPC:       %s: failed to map mr %p (%u/%u)\n",
 		       __func__, frmr->fr_mr, n, frmr->sg_nents);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 19a74e9..fbe7444 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -281,7 +281,7 @@  int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
 	}
 	atomic_inc(&xprt->sc_dma_used);
 
-	n = ib_map_mr_sg(frmr->mr, frmr->sg, frmr->sg_nents, 0, PAGE_SIZE);
+	n = ib_map_mr_sg(frmr->mr, frmr->sg, frmr->sg_nents, NULL, PAGE_SIZE);
 	if (unlikely(n != frmr->sg_nents)) {
 		pr_err("svcrdma: failed to map mr %p (%d/%d elements)\n",
 		       frmr->mr, n, frmr->sg_nents);
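
For readers trying to follow the IN/OUT semantics of sg_offset described in the ib_sg_to_pages() kernel-doc above, here is a minimal userspace sketch of that contract. It is not the kernel code: struct seg, map_partial(), count_passes() and the byte "budget" are hypothetical names invented for illustration, with the budget standing in for the point at which a driver's set_page() callback starts failing. The sketch only assumes what the comment states: the return value n is the number of elements consumed, and *offset then holds the offset into element n of the first byte that was not processed.

```c
#include <assert.h>

/* Hypothetical stand-in for a DMA-mapped scatterlist element. */
struct seg {
	unsigned int len;	/* element length in bytes */
};

/*
 * Userspace model of the in/out offset contract (not the kernel code).
 * Maps at most `budget` bytes, starting *offset bytes into segs[0].
 * Returns n, the number of elements consumed completely, or -1 on bad
 * input. On return *offset is the offset into segs[n] of the first byte
 * that was not mapped, or 0 if everything was mapped.
 */
static int map_partial(const struct seg *segs, int nents,
		       unsigned int *offset, unsigned int budget)
{
	unsigned int off = offset ? *offset : 0;
	int i;

	if (nents <= 0 || off > segs[0].len)
		return -1;

	for (i = 0; i < nents; i++) {
		unsigned int avail = segs[i].len - off;

		if (avail > budget) {
			/* Out of room inside element i: report progress. */
			if (offset)
				*offset = off + budget;
			return i;
		}
		budget -= avail;
		off = 0;	/* only the first element starts at an offset */
	}

	if (offset)
		*offset = 0;
	return i;
}

/* Caller side: count the passes needed to map the whole list. */
static int count_passes(const struct seg *segs, int nents, unsigned int budget)
{
	unsigned int off = 0;
	int passes = 0;

	assert(budget > 0);	/* a zero budget could never make progress */

	while (nents > 0) {
		int n = map_partial(segs, nents, &off, budget);

		if (n < 0)
			return n;
		segs += n;	/* resume at the partially mapped element */
		nents -= n;
		passes++;
	}
	return passes;
}
```

With three elements of 4096, 8192 and 4096 bytes and a budget of 6144 bytes, the first call to map_partial() returns 1 and leaves the offset at 2048, so the second pass resumes 2048 bytes into the 8192-byte element. count_passes() reports 3 passes for that list. A caller that passes NULL for the offset, as most call sites in the patch do, gets the element count but no information about a partially mapped element.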