
[v1,00/24] New fast registration API

Message ID 56138854.4040209@dev.mellanox.co.il (mailing list archive)
State Superseded

Commit Message

Sagi Grimberg Oct. 6, 2015, 8:37 a.m. UTC
On 10/2/2015 6:37 PM, Bart Van Assche wrote:
> On 10/01/2015 11:14 PM, Sagi Grimberg wrote:
>> Would you mind sending me your .config?
>
> Hello Sagi,

Hi Bart,

>
> I just sent this .config file to you off-list.

I see now the error you are referring to.

The issue is that the device requires the MR page array to be aligned
(to 0x40 bytes for mlx4 and 0x400 for mlx5). When I modified the page
array allocation to be non-coherent, I didn't take care of that
alignment.

Taking care of this alignment may result in a higher-order allocation,
as we'd need to add (alignment - 1) bytes to the allocation size.

e.g. a 512-entry page list on mlx4 will become:
512 * 8 + 0x40 - 1 = 4159 bytes
which kmalloc() rounds up to an 8K (order-1) allocation instead of a
single 4K page.
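
As a rough sketch, the over-allocate-and-align approach would look
something like this (illustrative only; the actual mlx4 fix is in the
patch at the bottom of this page):

        int size = max_pages * sizeof(u64) + MLX4_MR_PAGES_ALIGN - 1;
        void *raw = kzalloc(size, GFP_KERNEL); /* kept around for kfree() */
        __be64 *pages;

        if (!raw)
                return -ENOMEM;
        pages = PTR_ALIGN(raw, MLX4_MR_PAGES_ALIGN);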

I'm leaning towards this approach. Any preference?

I think the patch below (shown in full in the Patch section) should
take care of mlx4.
--

Sagi.

Comments

Christoph Hellwig Oct. 7, 2015, 9:20 a.m. UTC | #1
On Tue, Oct 06, 2015 at 11:37:40AM +0300, Sagi Grimberg wrote:
> The issue is that the device requires the MR page array to be aligned
> (to 0x40 bytes for mlx4 and 0x400 for mlx5). When I modified the page
> array allocation to be non-coherent, I didn't take care of that
> alignment.

Just curious: why did you switch away from the coherent DMA allocations
anyway? Seems like the page lists are mapped for as long as they are
allocated, so the coherent allocator would seem like a nice fit.
Sagi Grimberg Oct. 7, 2015, 9:25 a.m. UTC | #2
On 10/7/2015 12:20 PM, Christoph Hellwig wrote:
> On Tue, Oct 06, 2015 at 11:37:40AM +0300, Sagi Grimberg wrote:
>> The issue is that the device requires the MR page array to be aligned
>> (to 0x40 bytes for mlx4 and 0x400 for mlx5). When I modified the page
>> array allocation to be non-coherent, I didn't take care of that
>> alignment.
>
> Just curious: why did you switch away from the coherent DMA allocations
> anyway? Seems like the page lists are mapped for as long as they are
> allocated, so the coherent allocator would seem like a nice fit.
>

Bart suggested that syncing once for the entire page list might perform
better than coherent memory. I'm fine either way, since using
non-coherent memory might cause higher-order allocations due to
alignment, so it's not free of charge.
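
To make the trade-off concrete, "sync once" means something like the
following sketch (hypothetical names, not the actual driver code): the
CPU fills the whole array through the cache, then a single streaming
sync hands it to the device:

        int i;

        for (i = 0; i < npages; i++)
                mr->pages[i] = cpu_to_be64(page_dma_addrs[i]);
        /* one sync covers the entire page list before the HCA reads it */
        dma_sync_single_for_device(device->dma_device, mr->page_map,
                                   npages * sizeof(u64), DMA_TO_DEVICE);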

Sagi.
Christoph Hellwig Oct. 7, 2015, 9:36 a.m. UTC | #3
On Wed, Oct 07, 2015 at 12:25:25PM +0300, Sagi Grimberg wrote:
> Bart suggested that syncing once for the entire page list might perform
> better than coherent memory. I'm fine either way, since using
> non-coherent memory might cause higher-order allocations due to
> alignment, so it's not free of charge.

I don't really care either way, it just seemed like an odd change hiding
in here that I missed when reviewing earlier.
Sagi Grimberg Oct. 7, 2015, 10 a.m. UTC | #4
> I don't really care either way, it just seemed like an odd change hiding
> in here that I missed when reviewing earlier.

OK, so I'm sticking with it until someone suggests otherwise.

Sagi.
Bart Van Assche Oct. 7, 2015, 4:30 p.m. UTC | #5
On 10/07/2015 02:20 AM, Christoph Hellwig wrote:
> On Tue, Oct 06, 2015 at 11:37:40AM +0300, Sagi Grimberg wrote:
>> The issue is that the device requires the MR page array to be aligned
>> (to 0x40 bytes for mlx4 and 0x400 for mlx5). When I modified the page
>> array allocation to be non-coherent, I didn't take care of that
>> alignment.
>
> Just curious: why did you switch away from the coherent DMA allocations
> anyway? Seems like the page lists are mapped for as long as they are
> allocated, so the coherent allocator would seem like a nice fit.

Hello Christoph,

My concern is that caching and/or write combining might be disabled for
DMA-coherent memory regions. This is why I assume that mapping regular
cached memory with dma_map_single() and dma_unmap_single() will be
faster for registering multiple pages as a single memory region than
using DMA-coherent memory.
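
For reference, the two alternatives being compared look roughly like
this (a sketch assuming a generic struct device *dev, not the actual
mlx4 code):

        /* (a) coherent: no explicit syncs needed, but the mapping may
         *     be uncached or write-combined on some architectures */
        mr->pages = dma_alloc_coherent(dev, size, &mr->page_map, GFP_KERNEL);

        /* (b) streaming: regular cached memory, fast CPU writes, but a
         *     dma_sync_single_for_device() is required before each
         *     device access */
        mr->pages = kzalloc(size, GFP_KERNEL);
        mr->page_map = dma_map_single(dev, mr->pages, size, DMA_TO_DEVICE);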

Bart.

Patch

diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index de6eab3..4c69247 100644

--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -129,6 +129,8 @@  struct mlx4_ib_cq {
         struct list_head                recv_qp_list;
  };

+#define MLX4_MR_PAGES_ALIGN 0x40
+
  struct mlx4_ib_mr {
         struct ib_mr            ibmr;
         __be64                  *pages;
@@ -137,6 +139,7 @@  struct mlx4_ib_mr {
         u32                     max_pages;
         struct mlx4_mr          mmr;
         struct ib_umem         *umem;
+       void                    *pages_alloc;
  };

  struct mlx4_ib_mw {
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index fa01f75..d3f8175 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -279,10 +279,14 @@  mlx4_alloc_priv_pages(struct ib_device *device,
         int size = max_pages * sizeof(u64);
         int ret;

-       mr->pages = kzalloc(size, GFP_KERNEL);
-       if (!mr->pages)
+       size += max_t(int, MLX4_MR_PAGES_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
+
+       mr->pages_alloc = kzalloc(size, GFP_KERNEL);
+       if (!mr->pages_alloc)
                 return -ENOMEM;

+       mr->pages = PTR_ALIGN(mr->pages_alloc, MLX4_MR_PAGES_ALIGN);
+
         mr->page_map = dma_map_single(device->dma_device, mr->pages,
                                       size, DMA_TO_DEVICE);

@@ -293,20 +297,22 @@  mlx4_alloc_priv_pages(struct ib_device *device,

         return 0;
  err:
-       kfree(mr->pages);
+       kfree(mr->pages_alloc);
         return ret;
  }

  static void
  mlx4_free_priv_pages(struct mlx4_ib_mr *mr)
  {
-       struct ib_device *device = mr->ibmr.device;
-       int size = mr->max_pages * sizeof(u64);
-
         if (mr->pages) {
+               struct ib_device *device = mr->ibmr.device;
+               int size = mr->max_pages * sizeof(u64);
+
+               size += max_t(int, MLX4_MR_PAGES_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
+
                 dma_unmap_single(device->dma_device, mr->page_map,
                                  size, DMA_TO_DEVICE);
-               kfree(mr->pages);
+               kfree(mr->pages_alloc);
                 mr->pages = NULL;
         }
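
Note that the patch over-allocates by MLX4_MR_PAGES_ALIGN -
ARCH_KMALLOC_MINALIGN rather than by the full MLX4_MR_PAGES_ALIGN - 1,
since kmalloc() already guarantees ARCH_KMALLOC_MINALIGN alignment and
PTR_ALIGN() can therefore never need more slack than the difference.
The unaligned pointer is kept in mr->pages_alloc so that kfree() is
always given the address kzalloc() returned.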