
[v3] dma-contiguous: support numa CMA for specified node

Message ID: 20230712074758.1133272-1-yajun.deng@linux.dev
State: New
Series: [v3] dma-contiguous: support numa CMA for specified node

Commit Message

Yajun Deng July 12, 2023, 7:47 a.m. UTC
The kernel parameter 'cma_pernuma=' only supports reserving the same
size of CMA area for each node. We need to reserve different sizes of
CMA area for specific nodes when devices belong to different nodes.

Add another kernel parameter, 'numa_cma=', to reserve a CMA area for
the specified node; see the example below. Using either of these
parameters requires DMA_NUMA_CMA to be enabled.
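
As an illustration (the node ids and sizes are arbitrary examples of
the numa_cma=<node>:nn[MG][,<node>:nn[MG]] syntax documented in the
patch), booting with:

    numa_cma=0:64M,2:128M

reserves a 64 MiB CMA area on node 0 and a 128 MiB area on node 2;
nodes without an entry get no dedicated numa CMA area.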

At the same time, print the node id in cma_declare_contiguous_nid() if
CONFIG_NUMA is enabled.

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
---
 V2 -> V3: Use nid instead of nid_buf in cma_declare_contiguous_nid().
 V1 -> V2: Add 'numa_cma=' and keep 'cma_pernuma=' kernel parameter.
---
 .../admin-guide/kernel-parameters.txt         |  11 ++
 kernel/dma/Kconfig                            |   9 +-
 kernel/dma/contiguous.c                       | 101 ++++++++++++++----
 mm/cma.c                                      |  10 +-
 4 files changed, 102 insertions(+), 29 deletions(-)

Comments

Christoph Hellwig July 20, 2023, 6:56 a.m. UTC | #1
Thanks.

I'll pick this up for the dma-mapping tree with some minor style
fixups.
Christoph Hellwig July 20, 2023, 8:25 a.m. UTC | #2
It turns out this doesn't apply at all.  Can you resend it against
Linus' latest tree?
Yajun Deng July 20, 2023, 8:47 a.m. UTC | #3
July 20, 2023 at 4:25 PM, "Christoph Hellwig" <hch@lst.de> wrote:

> It turns out this doesn't apply at all. Can you resend it against
> Linus' latest tree?

It's based on the linux-next tree.

This patch should be applied after my other patch in the linux-next tree,
a960925a6b23 ("dma-contiguous: support per-numa CMA for all architectures").
Christoph Hellwig July 20, 2023, 11:54 a.m. UTC | #4
On Thu, Jul 20, 2023 at 08:47:37AM +0000, Yajun Deng wrote:
> It's based on the linux-next tree.
> 
> This patch should be applied after my other patch in the linux-next tree,
> a960925a6b23 ("dma-contiguous: support per-numa CMA for all architectures").

Where did this land?  dma patches really should be going through
the DMA tree..
Petr Tesařík July 20, 2023, 12:27 p.m. UTC | #5
On Thu, 20 Jul 2023 13:54:08 +0200,
Christoph Hellwig <hch@lst.de> wrote:

> On Thu, Jul 20, 2023 at 08:47:37AM +0000, Yajun Deng wrote:
> > It's based on the linux-next tree.
> > 
> > This patch should be applied after my other patch in the linux-next tree,
> > a960925a6b23 ("dma-contiguous: support per-numa CMA for all architectures").
> 
> Where did this land?

Well... in the linux-next tree:

https://www.kernel.org/doc/man-pages/linux-next.html

>  dma patches really should be going through the DMA tree..

Indeed. The other patch was also sent to the iommu ML back in May. It's
the thread where we were looking for Barry Song's current email address.

Petr T
Andrew Morton July 20, 2023, 4:59 p.m. UTC | #6
On Thu, 20 Jul 2023 13:54:08 +0200 Christoph Hellwig <hch@lst.de> wrote:

> On Thu, Jul 20, 2023 at 08:47:37AM +0000, Yajun Deng wrote:
> > It's based on the linux-next tree.
> > 
> > This patch should be applied after my other patch in the linux-next tree,
> > a960925a6b23 ("dma-contiguous: support per-numa CMA for all architectures").
> 
> Where did this land?  dma patches really should be going through
> the DMA tree..

It's in mm-unstable with a note "hch" :)

I'll drop it.
Yajun Deng July 21, 2023, 1:56 a.m. UTC | #7
July 20, 2023 at 8:27 PM, "Petr Tesařík" <petr@tesarici.cz> wrote:

> On Thu, 20 Jul 2023 13:54:08 +0200, Christoph Hellwig <hch@lst.de> wrote:
> 
> > On Thu, Jul 20, 2023 at 08:47:37AM +0000, Yajun Deng wrote:
> > > It's based on the linux-next tree.
> > > 
> > > This patch should be applied after my other patch in the linux-next tree,
> > > a960925a6b23 ("dma-contiguous: support per-numa CMA for all architectures").
> > 
> > Where did this land?
> 
> Well... in the linux-next tree:
> 
> https://www.kernel.org/doc/man-pages/linux-next.html
> 
> > dma patches really should be going through the DMA tree..
> 
> Indeed. The other patch was also sent to the iommu ML back in May. It's
> the thread where we were looking for Barry Song's current email address.
> 
> Petr T

Thanks. This link shows what happened at that time:
https://lore.kernel.org/all/20230512094210.141540-1-yajun.deng@linux.dev/T/#u
Christoph Hellwig July 21, 2023, 7:41 a.m. UTC | #8
On Thu, Jul 20, 2023 at 02:27:12PM +0200, Petr Tesařík wrote:
> On Thu, 20 Jul 2023 13:54:08 +0200, Christoph Hellwig <hch@lst.de> wrote:
> 
> > On Thu, Jul 20, 2023 at 08:47:37AM +0000, Yajun Deng wrote:
> > > It's based on the linux-next tree.
> > > 
> > > This patch should be applied after my other patch in the linux-next tree,
> > > a960925a6b23 ("dma-contiguous: support per-numa CMA for all architectures").
> > 
> > Where did this land?
> 
> Well... in the linux-next tree:
> 
> https://www.kernel.org/doc/man-pages/linux-next.html

Well, linux-next just pulls in maintainer trees.  So nothing lands
directly in linux-next.  But it looks like Andrew already cleared up
the confusion.
Christoph Hellwig July 21, 2023, 7:42 a.m. UTC | #9
On Thu, Jul 20, 2023 at 09:59:41AM -0700, Andrew Morton wrote:
> > Where did this land?  dma patches really should be going through
> > the DMA tree..
> 
> It's in mm-unstable with a note "hch" :)
> 
> I'll drop it.

Ah.  And looking at the patch, it isn't even pure dma but also has
a significant mm side.  If you're fine with it I'll pick it up in
the dma-mapping tree in addition to the follow up.
Andrew Morton July 24, 2023, 12:42 a.m. UTC | #10
On Fri, 21 Jul 2023 09:42:09 +0200 Christoph Hellwig <hch@lst.de> wrote:

> On Thu, Jul 20, 2023 at 09:59:41AM -0700, Andrew Morton wrote:
> > > Where did this land?  dma patches really should be going through
> > > the DMA tree..
> > 
> > It's in mm-unstable with a note "hch" :)
> > 
> > I'll drop it.
> 
> Ah.  And looking at the patch, it isn't even pure dma but also has
> a significant mm side.  If you're fine with it I'll pick it up in
> the dma-mapping tree in addition to the follow up.

Please do.
Christoph Hellwig July 31, 2023, 4:02 p.m. UTC | #11
Thanks,

applied to the dma-mapping tree for 6.6.
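
Before the diff itself, a minimal consumer-side sketch may help orient the
reader. This is an editorial illustration, not part of the series:
example_alloc_for_dev() and example_free_for_dev() are hypothetical helpers,
while dma_alloc_contiguous() and dma_free_contiguous() are the existing APIs
whose lookup order the patch extends.

/*
 * Illustrative sketch only. With DMA_NUMA_CMA enabled, and assuming the
 * allocation is not constrained to GFP_DMA/GFP_DMA32, dma_alloc_contiguous()
 * tries the "pernuma" CMA area of dev_to_node(dev) first, then the "numa"
 * area reserved via numa_cma=, and finally the default CMA area.
 */
#include <linux/device.h>
#include <linux/dma-map-ops.h>
#include <linux/gfp.h>

static struct page *example_alloc_for_dev(struct device *dev, size_t size)
{
	/* The node-local CMA lookup happens inside dma_alloc_contiguous(). */
	return dma_alloc_contiguous(dev, size, GFP_KERNEL);
}

static void example_free_for_dev(struct device *dev, struct page *page,
				 size_t size)
{
	/*
	 * dma_free_contiguous() releases the page to whichever area it came
	 * from: the per-device, pernuma, numa, or default CMA area.
	 */
	dma_free_contiguous(dev, page, size);
}

Note that, as visible in the dma_alloc_contiguous() hunk below, allocations
of one page or less return NULL early and never reach the CMA areas.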

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bdf0ab6716c8..87ad8154b730 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -706,6 +706,17 @@ 
 			which is located in node nid, if the allocation fails,
 			they will fallback to the global default memory area.
 
+	numa_cma=<node>:nn[MG][,<node>:nn[MG]]
+			[KNL,CMA]
+			Sets the size of kernel numa memory area for
+			contiguous memory allocations. It will reserve CMA
+			area for the specified node.
+
+			With numa CMA enabled, DMA users on node nid will
+			first try to allocate buffer from the numa area
+			which is located in node nid, if the allocation fails,
+			they will fallback to the global default memory area.
+
 	cmo_free_hint=	[PPC] Format: { yes | no }
 			Specify whether pages are marked as being inactive
 			when they are freed.  This is used in CMO environments
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 7afde9bc529f..562463fe30ea 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -145,15 +145,16 @@  config DMA_CMA
 
 if  DMA_CMA
 
-config DMA_PERNUMA_CMA
-	bool "Enable separate DMA Contiguous Memory Area for each NUMA Node"
+config DMA_NUMA_CMA
+	bool "Enable separate DMA Contiguous Memory Area for NUMA Node"
 	default NUMA
 	help
-	  Enable this option to get pernuma CMA areas so that NUMA devices
+	  Enable this option to get numa CMA areas so that NUMA devices
 	  can get local memory by DMA coherent APIs.
 
 	  You can set the size of pernuma CMA by specifying "cma_pernuma=size"
-	  on the kernel's command line.
+	  or set the node id and its size of CMA by specifying "numa_cma=
+	  <node>:size[,<node>:size]" on the kernel's command line.
 
 comment "Default contiguous memory area size:"
 
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 26a8e5365fcd..f005c66f378c 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -50,6 +50,7 @@ 
 #include <linux/sizes.h>
 #include <linux/dma-map-ops.h>
 #include <linux/cma.h>
+#include <linux/nospec.h>
 
 #ifdef CONFIG_CMA_SIZE_MBYTES
 #define CMA_SIZE_MBYTES CONFIG_CMA_SIZE_MBYTES
@@ -96,11 +97,44 @@  static int __init early_cma(char *p)
 }
 early_param("cma", early_cma);
 
-#ifdef CONFIG_DMA_PERNUMA_CMA
+#ifdef CONFIG_DMA_NUMA_CMA
 
+static struct cma *dma_contiguous_numa_area[MAX_NUMNODES];
+static phys_addr_t numa_cma_size[MAX_NUMNODES] __initdata;
 static struct cma *dma_contiguous_pernuma_area[MAX_NUMNODES];
 static phys_addr_t pernuma_size_bytes __initdata;
 
+static int __init early_numa_cma(char *p)
+{
+	int nid, count = 0;
+	unsigned long tmp;
+	char *s = p;
+
+	while (*s) {
+		if (sscanf(s, "%lu%n", &tmp, &count) != 1)
+			break;
+
+		if (s[count] == ':') {
+			if (tmp >= MAX_NUMNODES)
+				break;
+			nid = array_index_nospec(tmp, MAX_NUMNODES);
+
+			s += count + 1;
+			tmp = memparse(s, &s);
+			numa_cma_size[nid] = tmp;
+
+			if (*s == ',')
+				s++;
+			else
+				break;
+		} else
+			break;
+	}
+
+	return 0;
+}
+early_param("numa_cma", early_numa_cma);
+
 static int __init early_cma_pernuma(char *p)
 {
 	pernuma_size_bytes = memparse(p, &p);
@@ -127,34 +161,47 @@  static inline __maybe_unused phys_addr_t cma_early_percent_memory(void)
 
 #endif
 
-#ifdef CONFIG_DMA_PERNUMA_CMA
-static void __init dma_pernuma_cma_reserve(void)
+#ifdef CONFIG_DMA_NUMA_CMA
+static void __init dma_numa_cma_reserve(void)
 {
 	int nid;
 
-	if (!pernuma_size_bytes)
-		return;
-
-	for_each_online_node(nid) {
+	for_each_node(nid) {
 		int ret;
 		char name[CMA_MAX_NAME];
-		struct cma **cma = &dma_contiguous_pernuma_area[nid];
-
-		snprintf(name, sizeof(name), "pernuma%d", nid);
-		ret = cma_declare_contiguous_nid(0, pernuma_size_bytes, 0, 0,
-						 0, false, name, cma, nid);
-		if (ret) {
-			pr_warn("%s: reservation failed: err %d, node %d", __func__,
-				ret, nid);
+		struct cma **cma;
+
+		if (!node_online(nid)) {
+			if (pernuma_size_bytes || numa_cma_size[nid])
+				pr_warn("invalid node %d specified\n", nid);
 			continue;
 		}
 
-		pr_debug("%s: reserved %llu MiB on node %d\n", __func__,
-			(unsigned long long)pernuma_size_bytes / SZ_1M, nid);
+		if (pernuma_size_bytes) {
+
+			cma = &dma_contiguous_pernuma_area[nid];
+			snprintf(name, sizeof(name), "pernuma%d", nid);
+			ret = cma_declare_contiguous_nid(0, pernuma_size_bytes, 0, 0,
+							 0, false, name, cma, nid);
+			if (ret)
+				pr_warn("%s: reservation failed: err %d, node %d", __func__,
+					ret, nid);
+		}
+
+		if (numa_cma_size[nid]) {
+
+			cma = &dma_contiguous_numa_area[nid];
+			snprintf(name, sizeof(name), "numa%d", nid);
+			ret = cma_declare_contiguous_nid(0, numa_cma_size[nid], 0, 0, 0, false,
+							 name, cma, nid);
+			if (ret)
+				pr_warn("%s: reservation failed: err %d, node %d", __func__,
+					ret, nid);
+		}
 	}
 }
 #else
-static inline void __init dma_pernuma_cma_reserve(void)
+static inline void __init dma_numa_cma_reserve(void)
 {
 }
 #endif
@@ -175,7 +222,7 @@  void __init dma_contiguous_reserve(phys_addr_t limit)
 	phys_addr_t selected_limit = limit;
 	bool fixed = false;
 
-	dma_pernuma_cma_reserve();
+	dma_numa_cma_reserve();
 
 	pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
 
@@ -309,7 +356,7 @@  static struct page *cma_alloc_aligned(struct cma *cma, size_t size, gfp_t gfp)
  */
 struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
 {
-#ifdef CONFIG_DMA_PERNUMA_CMA
+#ifdef CONFIG_DMA_NUMA_CMA
 	int nid = dev_to_node(dev);
 #endif
 
@@ -321,7 +368,7 @@  struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
 	if (size <= PAGE_SIZE)
 		return NULL;
 
-#ifdef CONFIG_DMA_PERNUMA_CMA
+#ifdef CONFIG_DMA_NUMA_CMA
 	if (nid != NUMA_NO_NODE && !(gfp & (GFP_DMA | GFP_DMA32))) {
 		struct cma *cma = dma_contiguous_pernuma_area[nid];
 		struct page *page;
@@ -331,6 +378,13 @@  struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
 			if (page)
 				return page;
 		}
+
+		cma = dma_contiguous_numa_area[nid];
+		if (cma) {
+			page = cma_alloc_aligned(cma, size, gfp);
+			if (page)
+				return page;
+		}
 	}
 #endif
 	if (!dma_contiguous_default_area)
@@ -362,10 +416,13 @@  void dma_free_contiguous(struct device *dev, struct page *page, size_t size)
 		/*
 		 * otherwise, page is from either per-numa cma or default cma
 		 */
-#ifdef CONFIG_DMA_PERNUMA_CMA
+#ifdef CONFIG_DMA_NUMA_CMA
 		if (cma_release(dma_contiguous_pernuma_area[page_to_nid(page)],
 					page, count))
 			return;
+		if (cma_release(dma_contiguous_numa_area[page_to_nid(page)],
+					page, count))
+			return;
 #endif
 		if (cma_release(dma_contiguous_default_area, page, count))
 			return;
diff --git a/mm/cma.c b/mm/cma.c
index 4880f72102fa..da2967c6a223 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -267,6 +267,9 @@  int __init cma_declare_contiguous_nid(phys_addr_t base,
 	if (alignment && !is_power_of_2(alignment))
 		return -EINVAL;
 
+	if (!IS_ENABLED(CONFIG_NUMA))
+		nid = NUMA_NO_NODE;
+
 	/* Sanitise input arguments. */
 	alignment = max_t(phys_addr_t, alignment, CMA_MIN_ALIGNMENT_BYTES);
 	if (fixed && base & (alignment - 1)) {
@@ -372,14 +375,15 @@  int __init cma_declare_contiguous_nid(phys_addr_t base,
 	if (ret)
 		goto free_mem;
 
-	pr_info("Reserved %ld MiB at %pa\n", (unsigned long)size / SZ_1M,
-		&base);
+	pr_info("Reserved %ld MiB at %pa on node %d\n", (unsigned long)size / SZ_1M,
+		&base, nid);
 	return 0;
 
 free_mem:
 	memblock_phys_free(base, size);
 err:
-	pr_err("Failed to reserve %ld MiB\n", (unsigned long)size / SZ_1M);
+	pr_err("Failed to reserve %ld MiB on node %d\n", (unsigned long)size / SZ_1M,
+	       nid);
 	return ret;
 }