Message ID | 57327981.4080404@sandisk.com (mailing list archive) |
---|---|
State | Superseded |
On Tue, May 10, 2016 at 05:14:57PM -0700, Bart Van Assche wrote:
> The SRP initiator allows to set max_sectors to a value that exceeds
> the largest amount of data that can be mapped at once with an mlx4
> HCA using fast registration and a page size of 4 KB. Hence modify
> ib_map_mr_sg() such that it can map partial sg-elements. If an
> sg-element has been mapped partially, let the caller know
> which fraction has been mapped by adjusting *sg_offset.
>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Laurence Oberman <loberman@redhat.com>
> ---
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -1752,10 +1752,11 @@ static int
>  mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr,
>  		   struct scatterlist *sgl,
>  		   unsigned short sg_nents,
> -		   unsigned int sg_offset)
> +		   unsigned int *sg_offset_p)
>  {

I wonder on which tree are you basing?
In Linus (4.6-rc7) the function signature is different [1], the same
goes for my tree and Doug's for-4.7 branch [2].

[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/hw/mlx5/mr.c#n1752
[2] https://github.com/dledford/linux/blob/k.o/for-4.7/drivers/infiniband/hw/mlx5/mr.c#L1752
On 05/11/2016 12:54 AM, Leon Romanovsky wrote:
> On Tue, May 10, 2016 at 05:14:57PM -0700, Bart Van Assche wrote:
>> The SRP initiator allows to set max_sectors to a value that exceeds
>> the largest amount of data that can be mapped at once with an mlx4
>> HCA using fast registration and a page size of 4 KB. Hence modify
>> ib_map_mr_sg() such that it can map partial sg-elements. If an
>> sg-element has been mapped partially, let the caller know
>> which fraction has been mapped by adjusting *sg_offset.
>>
>> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Sagi Grimberg <sagi@grimberg.me>
>> Cc: Laurence Oberman <loberman@redhat.com>
>> ---
>> --- a/drivers/infiniband/hw/mlx5/mr.c
>> +++ b/drivers/infiniband/hw/mlx5/mr.c
>> @@ -1752,10 +1752,11 @@ static int
>>  mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr,
>>  		   struct scatterlist *sgl,
>>  		   unsigned short sg_nents,
>> -		   unsigned int sg_offset)
>> +		   unsigned int *sg_offset_p)
>>  {
>
> I wonder on which tree are you basing?
> In Linus (4.6-rc7) the function signature is different [1], the same
> goes for my tree and Doug's for-4.7 branch [2].

Hello Leon,

Sorry that I hadn't mentioned this explicitly in the cover letter of
this patch series but this patch series is based on Christoph's generic
RDMA READ/WRITE API work.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> To: leon@kernel.org
> Cc: "Doug Ledford" <dledford@redhat.com>, "Christoph Hellwig" <hch@lst.de>, "Sagi Grimberg" <sagi@grimberg.me>,
> "Laurence Oberman" <loberman@redhat.com>, linux-rdma@vger.kernel.org, "Or Gerlitz" <ogerlitz@mellanox.com>
> Sent: Wednesday, May 11, 2016 11:22:29 AM
> Subject: Re: [PATCH 4/6] IB/core: Enhance ib_map_mr_sg()
>
> On 05/11/2016 12:54 AM, Leon Romanovsky wrote:
> > On Tue, May 10, 2016 at 05:14:57PM -0700, Bart Van Assche wrote:
> >> The SRP initiator allows to set max_sectors to a value that exceeds
> >> the largest amount of data that can be mapped at once with an mlx4
> >> HCA using fast registration and a page size of 4 KB. Hence modify
> >> ib_map_mr_sg() such that it can map partial sg-elements. If an
> >> sg-element has been mapped partially, let the caller know
> >> which fraction has been mapped by adjusting *sg_offset.
> >>
> >> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> >> Cc: Christoph Hellwig <hch@lst.de>
> >> Cc: Sagi Grimberg <sagi@grimberg.me>
> >> Cc: Laurence Oberman <loberman@redhat.com>
> >> ---
> >> --- a/drivers/infiniband/hw/mlx5/mr.c
> >> +++ b/drivers/infiniband/hw/mlx5/mr.c
> >> @@ -1752,10 +1752,11 @@ static int
> >>  mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr,
> >>  		   struct scatterlist *sgl,
> >>  		   unsigned short sg_nents,
> >> -		   unsigned int sg_offset)
> >> +		   unsigned int *sg_offset_p)
> >>  {
> >
> > I wonder on which tree are you basing?
> > In Linus (4.6-rc7) the function signature is different [1], the same
> > goes for my tree and Doug's for-4.7 branch [2].
>
> Hello Leon,
>
> Sorry that I hadn't mentioned this explicitly in the cover letter of
> this patch series but this patch series is based on Christoph's generic
> RDMA READ/WRITE API work.
>
> Bart.

I chased that for a while too. :)
Landed up pulling the latest next, applying all of Christoph's 11 RDMA patches, then the first 11 of Bart's and the latest 6.
I had to hand-fix some stuff.
Kernel is building now for testing :)
On 05/11/2016 08:31 AM, Laurence Oberman wrote:
> I chased that for a while too.:)
> Landed up pulling the latest next, applying all of Christoph's 11 RDMA patches, then the first 11 of Barts and the latest 6.
> I had to hand fix some stuff.
> Kernel is building now for testing :)

Hello Laurence,

Please wait with starting your tests until I have made a kernel tree
with this patch series available.

Thanks,

Bart.
----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> To: "Laurence Oberman" <loberman@redhat.com>
> Cc: leon@kernel.org, "Doug Ledford" <dledford@redhat.com>, "Christoph Hellwig" <hch@lst.de>, "Sagi Grimberg"
> <sagi@grimberg.me>, linux-rdma@vger.kernel.org, "Or Gerlitz" <ogerlitz@mellanox.com>
> Sent: Wednesday, May 11, 2016 11:41:39 AM
> Subject: Re: [PATCH 4/6] IB/core: Enhance ib_map_mr_sg()
>
> On 05/11/2016 08:31 AM, Laurence Oberman wrote:
> > I chased that for a while too.:)
> > Landed up pulling the latest next, applying all of Christoph's 11 RDMA
> > patches, then the first 11 of Barts and the latest 6.
> > I had to hand fix some stuff.
> > Kernel is building now for testing :)
>
> Hello Laurence,
>
> Please wait with starting your tests until I have made a kernel tree
> with this patch series available.
>
> Thanks,
>
> Bart.

Hello Bart,

I had started already, and it's looking awesomely stable so far.
Awesome work from all of you guys.
### RECORD 84 >>> jumpclient <<< (1462981973.001) (Wed May 11 11:52:53 2016) ###
# DISK STATISTICS (/sec)
#              <---------reads---------><---------writes---------><--------averages--------> Pct
#Time    Name  KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize QLen Wait SvcTim Util
11:52:53 sdc        0      0    0    0  163840      0   40 4096    4096    1   10     10   42
11:52:53 dm-6       0      0    0    0  327680    320   80 4096    4096    1   11     11   90
11:52:53 sdd        0      0    0    0  176128      0   43 4096    4096    1   10     10   44
11:52:53 dm-7       0      0    0    0  348160    336   85 4096    4096    1   11     10   92
11:52:53 sde        0      0    0    0  159744      0   39 4096    4096    1   11     11   43
11:52:53 dm-8       0      0    0    0  319488    312   78 4096    4096    1   11     11   89
11:52:53 sdf        4      0    1    4  167936      0   41 4096    3998    1   10     10   44
11:52:53 sdg        4      0    1    4  163840      0   40 4096    3996    1   10     10   44
11:52:53 dm-9       0      0    0    0  335872    328   82 4096    4096    1   11     11   91
11:52:53 dm-10      0      0    0    0  331776    324   81 4096    4096    1   11     11   91
11:52:53 sdh        4      0    1    4  159744      0   39 4096    3993    1   11     11   45
11:52:53 dm-11      0      0    0    0  319488    308   78 4096    4096    1   11     11   91
11:52:53 sdi        0      0    0    0  167936      0   41 4096    4096    1   10     10   43
11:52:53 dm-12      0      0    0    0  335872    328   82 4096    4096    1   11     11   93
11:52:53 sdj        0      0    0    0  172032      0   42 4096    4096    1   10     10   44
11:52:53 dm-13      0      0    0    0  344064    332   84 4096    4096    1   10     10   91
11:52:53 sdk        0      0    0    0  176128      0   43 4096    4096    1   11     11   47
11:52:53 dm-14      0      0    0    0  352256    344   86 4096    4096    1   10     10   91
11:52:53 sdl        0      0    0    0  163840      0   40 4096    4096    1   10     11   43
11:52:53 dm-15      0      0    0    0  331776    324   81 4096    4096    1   11     11   91
11:52:53 sdm        0      0    0    0  163840      0   40 4096    4096    1   11     11   45
11:52:53 sdn        0      0    0    0  172032      0   42 4096    4096    1   10     10   45
11:52:53 sdo        0      0    0    0  159744      0   39 4096    4096    1   11     11   44
11:52:53 sdp        4      0    1    4  167936      0   41 4096    3998    1   10     10   45
11:52:53 sdq        0      0    0    0  167936      0   41 4096    4096    1   11     11   45
11:52:53 sdr        4      0    1    4  159744      0   39 4096    3993    1   11     10   43
11:52:53 sds        0      0    0    0  167936      0   41 4096    4096    1   11     11   46
11:52:53 sdt        0      0    0    0  172032      0   42 4096    4096    1   10     10   44
11:52:53 sdu        0      0    0    0  176128      0   43 4096    4096    1    9      9   42
11:52:53 sdv        0      0    0    0  167936      0   41 4096    4096    1   11     11   45

Whoop, 3.2GBytes/sec and no errors :)

#        <----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#Time    cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut
11:55:21   7   7  2652  7721     12     3  3223K    974    0     3     0      3
11:55:22   7   7  2714  7984      4     1  3336K    834    0     1     0      1
11:55:23   6   6  2545  7698      0     0  3216K    804    0     1     0      1
11:55:24   7   7  2576  7455      0     0  3012K    758    0     3     0      1
11:55:25   6   6  2717  8096     24     6  3314K    900    0     1     0      1
11:55:26   7   7  2651  7807      0     0  3118K    955    1     9     2     11
----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> To: "Laurence Oberman" <loberman@redhat.com>
> Cc: leon@kernel.org, "Doug Ledford" <dledford@redhat.com>, "Christoph Hellwig" <hch@lst.de>, "Sagi Grimberg"
> <sagi@grimberg.me>, linux-rdma@vger.kernel.org, "Or Gerlitz" <ogerlitz@mellanox.com>
> Sent: Wednesday, May 11, 2016 11:41:39 AM
> Subject: Re: [PATCH 4/6] IB/core: Enhance ib_map_mr_sg()
>
> On 05/11/2016 08:31 AM, Laurence Oberman wrote:
> > I chased that for a while too.:)
> > Landed up pulling the latest next, applying all of Christoph's 11 RDMA
> > patches, then the first 11 of Barts and the latest 6.
> > I had to hand fix some stuff.
> > Kernel is building now for testing :)
>
> Hello Laurence,
>
> Please wait with starting your tests until I have made a kernel tree
> with this patch series available.
>
> Thanks,
>
> Bart.

For Bart's latest set of patches (see subject), using Bart's tree, the mapping failures are gone and it has run stable for over 24 hours.
This is with multiple parallel reads of 4MB issued direct and multiple parallel writes to the same mpath devices issued buffered.
The only variation from what I am used to seeing (when it's not failing :) on prior ib_srp) is that the I/O sizes reach 4MB often but are also often smaller.
Some of this could be issues with my LIO target; next week I will have an enterprise array directly connected that I will be testing with.

It's a huge improvement and seems good to me, as I know these failures very well.

Tested-by: Laurence Oberman <loberman@redhat.com>

Example with direct reads and buffered writes

mpath view

### RECORD 556 >>> jumpclient <<< (1463070126.001) (Thu May 12 12:22:06 2016) ###
# DISK STATISTICS (/sec)
#              <---------reads---------><---------writes---------><--------averages--------> Pct
#Time    Name  KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize QLen Wait SvcTim Util
12:22:06 dm-6  212992    128  247  862  130200     64  126 1033     920    7   20      2   99
12:22:06 dm-7  213180     52  885  241  118784     29  406  293     257   25   20      0   99
12:22:06 dm-8  217088     53  424  512  122880     60  270  455     489   14   21      1   99
12:22:06 dm-9  211968     52  465  456  119760     30  409  293     379   19   22      1   99
12:22:06 dm-10 212992     52  364  585  121340     60  146  831     655   10   20      1   99
12:22:06 dm-11 221184    162   54 4096  135168     33  297  455    1015    9   26      2   99 *** Reads reach 4MB here, often smaller though
12:22:06 dm-12 229376    168  280  819  126976     93   62 2048    1041    6   17      2   99
12:22:06 dm-13 229376    168  112 2048  131072     64  128 1024    1501    5   23      4   99
12:22:06 dm-14 225280    110  385  585  122880     29  185  664     610   11   20      1   99
12:22:06 dm-15 203912     50  549  371  118792     29  379  313     347   20   21      1   99

individual path view

### RECORD 556 >>> jumpclient <<< (1463070126.001) (Thu May 12 12:22:06 2016) ###
# DISK STATISTICS (/sec)
#              <---------reads---------><---------writes---------><--------averages--------> Pct
#Time    Name  KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize QLen Wait SvcTim Util
12:26:37 sdc   128092      0  348  368   44468      0  482   92     207   18   21      1   95
12:26:37 sdd   103476      0  461  224   62956      0  266  237     228   14   20      1   97
12:26:37 sde   118100      0  145  814   65488      0  127  516     674    5   19      3   97
12:26:37 sdf   121660      0  651  187   53960      0  277  195     189   17   18      1   97
12:26:37 sdg    87276      0  305  286   76020      0  191  398     329   10   20      1   97
12:26:37 sdh    98688      0  166  595   67400      0  554  122     230   17   23      1   95
12:26:37 sdi   197112      0  945  209   16208      0   81  200     207   16   15      0   99
12:26:37 sdj     1776      0   16  111  143360      0   35 4096    2845    1   24     18   95
12:26:37 sdk        0      0    0    0  139264      0  374  372     372   10   26      2   96
12:26:37 sdl    77600      0  572  136   76624      0  268  286     183   17   19      1   93
12:26:37 sdm    80804      0  264  306   74316      0  533  139     194   19   24      1   98
12:26:37 sdn   101324      0  439  231   64020      0  385  166     200   18   22      1   98
12:26:37 sdo   107180      0  130  824   73776      0  145  509     658    5   21      3   97
12:26:37 sdp    80788      0  389  208   69376      0  339  205     206   17   22      1   99
12:26:37 sdq   116704      0  390  299   55052      0  129  427     330   10   20      1   99
12:26:37 sdr   113280      0  249  455   53816      0  401  134     257   16   24      1   99
12:26:37 sds    13192      0   93  142  110768      0  291  381     322   10   26      2   95
12:26:37 sdt   165184      0 1197  138       0      0    0    0     137   23   18      0   99
12:26:37 sdu   241664      0   59 4096       0      0    0    0    4096    1   16     16   97
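The `Size` columns in the statistics above appear to be the per-second `KBytes` divided by the per-second `IOs`, rounded to the nearest integer; that reading matches every row checked here, but it is an assumption about the reporting tool rather than something stated in the thread. A minimal C sketch of that arithmetic, where `struct dev_sample` and `avg_request_kb()` are hypothetical names introduced only for illustration:

```c
#include <assert.h>

/* Hypothetical container for one direction (reads or writes) of one row. */
struct dev_sample {
    unsigned long kbytes_per_sec; /* "KBytes" column */
    unsigned long ios_per_sec;    /* "IOs" column */
};

/*
 * Average request size in KB, rounded to the nearest integer.
 * Returns 0 when no I/O was issued, matching the all-zero rows.
 */
static unsigned long avg_request_kb(struct dev_sample s)
{
    if (s.ios_per_sec == 0)
        return 0;
    return (s.kbytes_per_sec + s.ios_per_sec / 2) / s.ios_per_sec;
}
```

Applied to the read side of the mpath view, dm-11 (221184 KBytes, 54 IOs) gives 4096 KB, i.e. full 4MB requests, while dm-6 (212992 KBytes, 247 IOs) gives 862 KB, which is the "often smaller" case noted in the message.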
----- Original Message -----
> From: "Laurence Oberman" <loberman@redhat.com>
> To: "Bart Van Assche" <bart.vanassche@sandisk.com>
> Cc: leon@kernel.org, "Doug Ledford" <dledford@redhat.com>, "Christoph Hellwig" <hch@lst.de>, "Sagi Grimberg"
> <sagi@grimberg.me>, linux-rdma@vger.kernel.org, "Or Gerlitz" <ogerlitz@mellanox.com>
> Sent: Thursday, May 12, 2016 12:28:34 PM
> Subject: Re: [PATCH 4/6] IB/core: Enhance ib_map_mr_sg()

Bart,
I meant to mention that that was with ib_srp untuned.
My next set of tests will be with indirect_sg_entries=512 and cmd_sg_entries=64 for a start.
Then I will max them out and see how we do.

Thanks!!!
----- Original Message -----
> From: "Laurence Oberman" <loberman@redhat.com>
> To: "Bart Van Assche" <bart.vanassche@sandisk.com>
> Cc: leon@kernel.org, "Doug Ledford" <dledford@redhat.com>, "Christoph Hellwig" <hch@lst.de>, "Sagi Grimberg"
> <sagi@grimberg.me>, linux-rdma@vger.kernel.org, "Or Gerlitz" <ogerlitz@mellanox.com>
> Sent: Thursday, May 12, 2016 12:38:13 PM
> Subject: Re: [PATCH 4/6] IB/core: Enhance ib_map_mr_sg()
>
> Bart,
> I meant to mention that that was with ib_srp untuned.
> My next set of tests will be with indirect_sg_entries=512 and
> cmd_sg_entries=64 for a start.
> Then I will max them out and see how we do.
>
> Thanks!!!

Replying to my own message.

After tuning ib_srp I am back to full 4MB as expected, so now we should be all set.

Bart, and all, thanks for all the assistance with this.
Awesome work Bart on your part as always.

### RECORD 3 >>> jumpclient <<< (1463071707.001) (Thu May 12 12:48:27 2016) ###
# DISK STATISTICS (/sec)
#              <---------reads---------><---------writes---------><--------averages--------> Pct
#Time    Name  KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize QLen Wait SvcTim Util
12:48:27 dm-6       0      0    0    0  286720    284   70 4096    4096    1   12     12   89
12:48:27 dm-7       0      0    0    0  290816    284   71 4096    4096    1   12     12   89
12:48:27 dm-8       0      0    0    0  286720    280   70 4096    4096    1   12     12   89
12:48:27 dm-9       0      0    0    0  294912    288   72 4096    4096    1   13     12   93
12:48:27 dm-10      0      0    0    0  290816    284   71 4096    4096    1   13     13   93
12:48:27 dm-11      0      0    0    0  286720    284   70 4096    4096    1   12     12   88
12:48:27 dm-12      0      0    0    0  290816    284   71 4096    4096    1   12     12   89
12:48:27 dm-13      0      0    0    0  290816    288   71 4096    4096    1   12     12   90
12:48:27 dm-14      0      0    0    0  286720    280   70 4096    4096    1   12     12   89
12:48:27 dm-15      0      0    0    0  282624    280   69 4096    4096    1   12     12   88
12:48:27 sdm        0      0    0    0  143360      0   35 4096    4096    1   12     12   42
On 05/12/2016 09:50 AM, Laurence Oberman wrote:
> Tuning ib_srp and I am back to full 4MB as expected so now we should be all set.
>
> Bart, and all, thanks for all the assistance with this.
> Awesome work Bart on your part as always.

Hello Laurence,

Thank you for having tested this patch series so quickly. I assume that
this means that I can add your Tested-by when I repost this patch series?

Bart.
----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> To: "Laurence Oberman" <loberman@redhat.com>
> Cc: leon@kernel.org, "Doug Ledford" <dledford@redhat.com>, "Christoph Hellwig" <hch@lst.de>, "Sagi Grimberg"
> <sagi@grimberg.me>, linux-rdma@vger.kernel.org, "Or Gerlitz" <ogerlitz@mellanox.com>
> Sent: Thursday, May 12, 2016 1:00:20 PM
> Subject: Re: [PATCH 4/6] IB/core: Enhance ib_map_mr_sg()
>
> On 05/12/2016 09:50 AM, Laurence Oberman wrote:
> > Tuning ib_srp and I am back to full 4MB as expected so now we should be all
> > set.
> >
> > Bart, and all, thanks for all the assistance with this.
> > Awesome work Bart on your part as always.
>
> Hello Laurence,
>
> Thank you for having tested this patch series so quickly. I assume that
> this means that I can add your Tested-by when I repost this patch series?
>
> Bart.

Hello Bart,

Absolutely.
In fact I am now running with

options ib_srp cmd_sg_entries=255 indirect_sg_entries=2048

and it is singing along. I have never reached that before without issues.
It's rock solid from what I can see.

Thanks!!!
Laurence
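The patch description quoted at the start of this thread says that ib_map_mr_sg() may map an sg-element only partially and tells the caller how far it got by adjusting *sg_offset. The following is a minimal user-space sketch of that calling convention, not kernel code: `struct seg`, `map_segs()`, `count_registrations()` and the `max_pages` limit are hypothetical stand-ins, and the model ignores DMA addresses and page alignment entirely. It assumes the return value is the number of elements consumed completely and that the updated offset points into the element that follows them.

```c
#include <assert.h>

/* Hypothetical stand-in for one DMA-mapped scatterlist element. */
struct seg {
    unsigned int len; /* length in bytes */
};

/*
 * Hypothetical model of the in/out offset convention.
 * IN:  *offset is the start offset in bytes into segs[0].
 * OUT: *offset is the offset of the first unmapped byte in element n,
 *      where n is the return value, or 0 if everything was mapped.
 * At most max_pages * page_size bytes are mapped per call; the number of
 * bytes actually mapped is stored in *mapped.
 */
static int map_segs(const struct seg *segs, int nents, unsigned int *offset,
                    unsigned int page_size, unsigned int max_pages,
                    unsigned int *mapped)
{
    unsigned int budget = max_pages * page_size;
    unsigned int off = *offset;
    int i;

    *mapped = 0;
    for (i = 0; i < nents; i++) {
        unsigned int remaining = segs[i].len - off;

        if (remaining > budget) {
            /* Element i fits only partially: report where to resume. */
            *mapped += budget;
            *offset = off + budget;
            return i;
        }
        *mapped += remaining;
        budget -= remaining;
        off = 0; /* only the first element starts at a nonzero offset */
    }
    *offset = 0;
    return i;
}

/* Caller loop: how many registrations are needed to cover the whole list? */
static int count_registrations(const struct seg *segs, int nents,
                               unsigned int page_size, unsigned int max_pages)
{
    unsigned int offset = 0, mapped;
    int regs = 0;

    while (nents > 0) {
        int n = map_segs(segs, nents, &offset, page_size, max_pages, &mapped);

        if (mapped == 0)
            return -1; /* no progress; give up instead of looping forever */
        segs += n;     /* skip the elements that were consumed completely */
        nents -= n;
        regs++;
    }
    return regs;
}
```

With a single 4 MiB element, a 4 KB page size and an illustrative limit of 256 pages per registration (the real mlx4 limit is not stated in this thread), `count_registrations()` returns 4: three partial mappings that each advance the offset by 1 MiB, followed by one that consumes the rest of the element.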
diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 6fc50bf..1eb9b12 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -92,7 +92,7 @@ static int rdma_rw_init_one_mr(struct ib_qp *qp, u8 port_num,
 		reg->inv_wr.next = NULL;
 	}
 
-	ret = ib_map_mr_sg(reg->mr, sg, nents, offset, PAGE_SIZE);
+	ret = ib_map_mr_sg(reg->mr, sg, nents, &offset, PAGE_SIZE);
 	if (ret < nents) {
 		ib_mr_pool_put(qp, &qp->rdma_mrs, reg->mr);
 		return -EINVAL;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 3d7b266..ffb9863 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1655,7 +1655,7 @@ EXPORT_SYMBOL(ib_set_vf_guid);
  * is ready for registration.
  */
 int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
-		 unsigned int sg_offset, unsigned int page_size)
+		 unsigned int *sg_offset, unsigned int page_size)
 {
 	if (unlikely(!mr->device->map_mr_sg))
 		return -ENOSYS;
@@ -1672,7 +1672,10 @@ EXPORT_SYMBOL(ib_map_mr_sg);
  * @mr: memory region
  * @sgl: dma mapped scatterlist
  * @sg_nents: number of entries in sg
- * @sg_offset: offset in bytes into sg
+ * @sg_offset_p: IN: start offset in bytes into sg
+ *               OUT: offset in bytes for element n of the sg of the first
+ *               byte that has not been processed where n is the return
+ *               value of this function.
  * @set_page: driver page assignment function pointer
  *
  * Core service helper for drivers to convert the largest
@@ -1684,19 +1687,24 @@ EXPORT_SYMBOL(ib_map_mr_sg);
  * a page vector.
  */
 int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
-		unsigned int sg_offset, int (*set_page)(struct ib_mr *, u64))
+		unsigned int *sg_offset_p, int (*set_page)(struct ib_mr *, u64))
 {
 	struct scatterlist *sg;
 	u64 last_end_dma_addr = 0;
+	unsigned int sg_offset = sg_offset_p ? *sg_offset_p : 0;
 	unsigned int last_page_off = 0;
 	u64 page_mask = ~((u64)mr->page_size - 1);
 	int i, ret;
 
+	if (unlikely(sg_nents <= 0 || sg_offset > sg_dma_len(&sgl[0])))
+		return -EINVAL;
+
 	mr->iova = sg_dma_address(&sgl[0]) + sg_offset;
 	mr->length = 0;
 
 	for_each_sg(sgl, sg, sg_nents, i) {
 		u64 dma_addr = sg_dma_address(sg) + sg_offset;
+		u64 prev_addr = dma_addr;
 		unsigned int dma_len = sg_dma_len(sg) - sg_offset;
 		u64 end_dma_addr = dma_addr + dma_len;
 		u64 page_addr = dma_addr & page_mask;
@@ -1721,8 +1729,14 @@ int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
 
 		do {
 			ret = set_page(mr, page_addr);
-			if (unlikely(ret < 0))
-				return i ? : ret;
+			if (unlikely(ret < 0)) {
+				sg_offset = prev_addr - dma_addr;
+				mr->length += sg_offset;
+				if (sg_offset_p)
+					*sg_offset_p = sg_offset;
+				return i || sg_offset ? i : ret;
+			}
+			prev_addr = page_addr;
 next_page:
 			page_addr += mr->page_size;
 		} while (page_addr < end_dma_addr);
@@ -1734,6 +1748,8 @@ next_page:
 		sg_offset = 0;
 	}
 
+	if (sg_offset_p)
+		*sg_offset_p = 0;
 	return i;
 }
 EXPORT_SYMBOL(ib_sg_to_pages);
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 608aa0c..47cb927 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -784,7 +784,7 @@ static int iwch_set_page(struct ib_mr *ibmr, u64 addr)
 }
 
 static int iwch_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
-			  int sg_nents, unsigned sg_offset)
+			  int sg_nents, unsigned int *sg_offset)
 {
 	struct iwch_mr *mhp = to_iwch_mr(ibmr);
 
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 067cb3f..1ff3ba8 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -918,7 +918,7 @@ struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
 			    enum ib_mr_type mr_type,
 			    u32 max_num_sg);
 int c4iw_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
-		   unsigned int sg_offset);
+		   unsigned int *sg_offset);
 int c4iw_dealloc_mw(struct ib_mw *mw);
 struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type,
 			    struct ib_udata *udata);
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 38afb3d..83960df 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -691,7 +691,7 @@ static int c4iw_set_page(struct ib_mr *ibmr, u64 addr)
 }
 
 int c4iw_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
-		   unsigned int sg_offset)
+		   unsigned int *sg_offset)
 {
 	struct c4iw_mr *mhp = to_c4iw_mr(ibmr);
 
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
index 141eaba..4a740f7 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -1574,7 +1574,7 @@ static int i40iw_set_page(struct ib_mr *ibmr, u64 addr)
  * @sg_nents: number of sg pages
  */
 static int i40iw_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
-			   int sg_nents, unsigned int sg_offset)
+			   int sg_nents, unsigned int *sg_offset)
 {
 	struct i40iw_mr *iwmr = to_iwmr(ibmr);
 
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index ba328177..6c5ac5d 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -718,7 +718,7 @@ struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
 			       enum ib_mr_type mr_type,
 			       u32 max_num_sg);
 int mlx4_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
-		      unsigned int sg_offset);
+		      unsigned int *sg_offset);
 int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period);
 int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata);
 struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index b04f623..6312721 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -529,7 
+529,7 @@ static int mlx4_set_page(struct ib_mr *ibmr, u64 addr) } int mlx4_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents, - unsigned int sg_offset) + unsigned int *sg_offset) { struct mlx4_ib_mr *mr = to_mmr(ibmr); int rc; diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 8c835b2..f05cf57 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -713,7 +713,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type, u32 max_num_sg); int mlx5_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents, - unsigned int sg_offset); + unsigned int *sg_offset); int mlx5_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num, const struct ib_wc *in_wc, const struct ib_grh *in_grh, const struct ib_mad_hdr *in, size_t in_mad_size, diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index b678eac..8cf2ce5 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -1752,10 +1752,11 @@ static int mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr, struct scatterlist *sgl, unsigned short sg_nents, - unsigned int sg_offset) + unsigned int *sg_offset_p) { struct scatterlist *sg = sgl; struct mlx5_klm *klms = mr->descs; + unsigned int sg_offset = sg_offset_p ? 
*sg_offset_p : 0; u32 lkey = mr->ibmr.pd->local_dma_lkey; int i; @@ -1774,6 +1775,9 @@ mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr, sg_offset = 0; } + if (sg_offset_p) + *sg_offset_p = sg_offset; + return i; } @@ -1792,7 +1796,7 @@ static int mlx5_set_page(struct ib_mr *ibmr, u64 addr) } int mlx5_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents, - unsigned int sg_offset) + unsigned int *sg_offset) { struct mlx5_ib_mr *mr = to_mmr(ibmr); int n; diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c index 698aab6..4ebea4c 100644 --- a/drivers/infiniband/hw/nes/nes_verbs.c +++ b/drivers/infiniband/hw/nes/nes_verbs.c @@ -403,7 +403,7 @@ static int nes_set_page(struct ib_mr *ibmr, u64 addr) } static int nes_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, - int sg_nents, unsigned int sg_offset) + int sg_nents, unsigned int *sg_offset) { struct nes_mr *nesmr = to_nesmr(ibmr); diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c index 9ddd550..b1a3d91 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c @@ -3082,7 +3082,7 @@ static int ocrdma_set_page(struct ib_mr *ibmr, u64 addr) } int ocrdma_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents, - unsigned int sg_offset) + unsigned int *sg_offset) { struct ocrdma_mr *mr = get_ocrdma_mr(ibmr); diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h index b290e5d..704ef1e 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h @@ -123,6 +123,6 @@ struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type, u32 max_num_sg); int ocrdma_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents, - unsigned sg_offset); + unsigned int *sg_offset); #endif /* __OCRDMA_VERBS_H__ */ diff --git a/drivers/infiniband/ulp/iser/iser_memory.c 
b/drivers/infiniband/ulp/iser/iser_memory.c index 44cc85f..90be568 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -236,7 +236,7 @@ int iser_fast_reg_fmr(struct iscsi_iser_task *iser_task, page_vec->npages = 0; page_vec->fake_mr.page_size = SIZE_4K; plen = ib_sg_to_pages(&page_vec->fake_mr, mem->sg, - mem->size, 0, iser_set_page); + mem->size, NULL, iser_set_page); if (unlikely(plen < mem->size)) { iser_err("page vec too short to hold this SG\n"); iser_data_buf_dump(mem, device->ib_device); @@ -446,7 +446,7 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task, ib_update_fast_reg_key(mr, ib_inc_rkey(mr->rkey)); - n = ib_map_mr_sg(mr, mem->sg, mem->size, 0, SIZE_4K); + n = ib_map_mr_sg(mr, mem->sg, mem->size, NULL, SIZE_4K); if (unlikely(n != mem->size)) { iser_err("failed to map sg (%d/%d)\n", n, mem->size); diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index f4dc6f9..6440469 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -1329,7 +1329,7 @@ static int srp_map_finish_fr(struct srp_map_state *state, rkey = ib_inc_rkey(desc->mr->rkey); ib_update_fast_reg_key(desc->mr, rkey); - n = ib_map_mr_sg(desc->mr, state->sg, sg_nents, 0, dev->mr_page_size); + n = ib_map_mr_sg(desc->mr, state->sg, sg_nents, NULL, dev->mr_page_size); if (unlikely(n < 0)) { srp_fr_pool_put(ch->fr_pool, &desc, 1); pr_debug("%s: ib_map_mr_sg(%d) returned %d.\n", diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 544c55b..56bb0f3 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1848,7 +1848,7 @@ struct ib_device { int (*map_mr_sg)(struct ib_mr *mr, struct scatterlist *sg, int sg_nents, - unsigned sg_offset); + unsigned int *sg_offset); struct ib_mw * (*alloc_mw)(struct ib_pd *pd, enum ib_mw_type type, struct ib_udata *udata); @@ -3145,11 +3145,11 @@ struct net_device *ib_get_net_dev_by_params(struct 
ib_device *dev, u8 port, const struct sockaddr *addr); int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents, - unsigned int sg_offset, unsigned int page_size); + unsigned int *sg_offset, unsigned int page_size); static inline int ib_map_mr_sg_zbva(struct ib_mr *mr, struct scatterlist *sg, int sg_nents, - unsigned int sg_offset, unsigned int page_size) + unsigned int *sg_offset, unsigned int page_size) { int n; @@ -3160,7 +3160,7 @@ ib_map_mr_sg_zbva(struct ib_mr *mr, struct scatterlist *sg, int sg_nents, } int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents, - unsigned int sg_offset, int (*set_page)(struct ib_mr *, u64)); + unsigned int *sg_offset, int (*set_page)(struct ib_mr *, u64)); void ib_drain_rq(struct ib_qp *qp); void ib_drain_sq(struct ib_qp *qp); diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 3274a4a..94c3fa9 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -421,7 +421,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, return -ENOMEM; } - n = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, 0, PAGE_SIZE); + n = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, NULL, PAGE_SIZE); if (unlikely(n != frmr->sg_nents)) { pr_err("RPC: %s: failed to map mr %p (%u/%u)\n", __func__, frmr->fr_mr, n, frmr->sg_nents); diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c index 19a74e9..fbe7444 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c @@ -281,7 +281,7 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt, } atomic_inc(&xprt->sc_dma_used); - n = ib_map_mr_sg(frmr->mr, frmr->sg, frmr->sg_nents, 0, PAGE_SIZE); + n = ib_map_mr_sg(frmr->mr, frmr->sg, frmr->sg_nents, NULL, PAGE_SIZE); if (unlikely(n != frmr->sg_nents)) { pr_err("svcrdma: failed to map mr %p (%d/%d elements)\n", frmr->mr, n, frmr->sg_nents);
The SRP initiator allows max_sectors to be set to a value that exceeds
the largest amount of data that can be mapped at once with an mlx4
HCA using fast registration and a page size of 4 KB. Hence modify
ib_map_mr_sg() such that it can map partial sg-elements. If an
sg-element has been mapped partially, let the caller know
which fraction has been mapped by adjusting *sg_offset.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Laurence Oberman <loberman@redhat.com>
---
 drivers/infiniband/core/rw.c                |  2 +-
 drivers/infiniband/core/verbs.c             | 26 +++++++++++++++++++++-----
 drivers/infiniband/hw/cxgb3/iwch_provider.c |  2 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h      |  2 +-
 drivers/infiniband/hw/cxgb4/mem.c           |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_verbs.c   |  2 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h        |  2 +-
 drivers/infiniband/hw/mlx4/mr.c             |  2 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h        |  2 +-
 drivers/infiniband/hw/mlx5/mr.c             |  8 ++++++--
 drivers/infiniband/hw/nes/nes_verbs.c       |  2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |  2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |  2 +-
 drivers/infiniband/ulp/iser/iser_memory.c   |  4 ++--
 drivers/infiniband/ulp/srp/ib_srp.c         |  2 +-
 include/rdma/ib_verbs.h                     |  8 ++++----
 net/sunrpc/xprtrdma/frwr_ops.c              |  2 +-
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c     |  2 +-
 18 files changed, 47 insertions(+), 27 deletions(-)
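
The in/out offset convention described in the commit message may be easier to follow with a small user-space model. The sketch below is not kernel code and is not taken from the patch: struct seg, map_partial(), count_chunks() and the byte-based capacity parameter are hypothetical names introduced only for illustration, and the real ib_sg_to_pages() works in pages through the set_page callback rather than in bytes. What the sketch keeps is the contract from the kernel-doc comment: the return value n is the index of the first element that was not fully processed, and *offset_p is the offset of the first unprocessed byte within that element.

```c
/* Hypothetical stand-in for a DMA-mapped scatterlist element. */
struct seg {
	unsigned int len;	/* length of the element in bytes */
};

/*
 * Model of the patched calling convention (assumption: capacity is a
 * byte budget per call; the kernel limits the number of pages instead).
 *
 * Returns the number of elements that were fully consumed. On return,
 * *offset_p is the offset of the first unprocessed byte in element n,
 * or 0 if every element was consumed. Returns -1 on invalid arguments.
 */
int map_partial(const struct seg *sg, int nents,
		unsigned int *offset_p, unsigned int capacity)
{
	unsigned int offset = offset_p ? *offset_p : 0;
	int i;

	if (nents <= 0 || offset > sg[0].len)
		return -1;

	for (i = 0; i < nents; i++) {
		unsigned int remaining = sg[i].len - offset;

		if (remaining > capacity) {
			/* Element i fits only partially: say where to resume. */
			if (offset_p)
				*offset_p = offset + capacity;
			return i;
		}
		capacity -= remaining;
		offset = 0;	/* only the first element starts mid-way */
	}

	if (offset_p)
		*offset_p = 0;
	return i;
}

/*
 * Caller loop: keep mapping until the whole list is consumed and count
 * how many separate mappings were needed.
 */
int count_chunks(const struct seg *sg, int nents, unsigned int capacity)
{
	unsigned int offset = 0;
	int chunks = 0;

	if (capacity == 0)
		return -1;	/* no progress would be possible */

	while (nents > 0) {
		int n = map_partial(sg, nents, &offset, capacity);

		if (n < 0)
			return n;
		sg += n;	/* skip the fully consumed elements */
		nents -= n;
		chunks++;
	}
	return chunks;
}
```

With elements of 4096, 10000 and 100 bytes and a budget of 8192 bytes per call, the first call consumes element 0 plus the first 4096 bytes of element 1, so it returns 1 and leaves the offset at 4096. The second call resumes from that offset and consumes the rest. A caller that passes NULL, as the iSER, SRP and xprtrdma call sites in the patch do, always starts at offset 0 and is not told how far a partial mapping got.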