diff mbox series

[net-next] net: mana: Add support for variable page sizes of ARM64

Message ID 1718054553-6588-1-git-send-email-haiyangz@microsoft.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net-next] net: mana: Add support for variable page sizes of ARM64 | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 864 this patch: 864
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 3 maintainers not CCed: sharmaajay@microsoft.com schakrabarti@linux.microsoft.com kotaranov@microsoft.com
netdev/build_clang success Errors and warnings before: 868 this patch: 868
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 868 this patch: 868
netdev/checkpatch warning CHECK: Prefer using the BIT macro; WARNING: line length of 81 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-06-11--09-00 (tests: 643)

Commit Message

Haiyang Zhang June 10, 2024, 9:22 p.m. UTC
As defined by the MANA Hardware spec, the queue size for DMA is 4KB
minimal, and power of 2.
To support variable page sizes (4KB, 16KB, 64KB) of ARM64, define
the minimal queue size as a macro separate from the PAGE_SIZE, which
we always assumed to be 4KB before supporting ARM64.
Also, update the relevant code related to size alignment, DMA region
calculations, etc.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/ethernet/microsoft/Kconfig        |  2 +-
 .../net/ethernet/microsoft/mana/gdma_main.c   |  8 +++----
 .../net/ethernet/microsoft/mana/hw_channel.c  | 22 +++++++++----------
 drivers/net/ethernet/microsoft/mana/mana_en.c |  8 +++----
 .../net/ethernet/microsoft/mana/shm_channel.c |  9 ++++----
 include/net/mana/gdma.h                       |  7 +++++-
 include/net/mana/mana.h                       |  3 ++-
 7 files changed, 33 insertions(+), 26 deletions(-)

Comments

Michael Kelley June 11, 2024, 4:34 p.m. UTC | #1
From: LKML haiyangz <lkmlhyz@microsoft.com> On Behalf Of Haiyang Zhang Sent: Monday, June 10, 2024 2:23 PM
> 
> As defined by the MANA Hardware spec, the queue size for DMA is 4KB
> minimal, and power of 2.

You say the hardware requires 4K "minimal". But the definitions in this
patch hardcode to 4K, as if that's the only choice. Is the hardcoding to
4K a design decision made to simplify the MANA driver?

> To support variable page sizes (4KB, 16KB, 64KB) of ARM64, define

A minor nit, but "variable" page size doesn't seem like quite the right
description -- both here and in the Subject line.  On ARM64, the page size
is a choice among a few fixed options.  Perhaps call it support for "page sizes
other than 4K"?

> the minimal queue size as a macro separate from the PAGE_SIZE, which
> we always assumed to be 4KB before supporting ARM64.
> Also, update the relevant code related to size alignment, DMA region
> calculations, etc.
> 
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
>  drivers/net/ethernet/microsoft/Kconfig        |  2 +-
>  .../net/ethernet/microsoft/mana/gdma_main.c   |  8 +++----
>  .../net/ethernet/microsoft/mana/hw_channel.c  | 22 +++++++++----------
>  drivers/net/ethernet/microsoft/mana/mana_en.c |  8 +++----
>  .../net/ethernet/microsoft/mana/shm_channel.c |  9 ++++----
>  include/net/mana/gdma.h                       |  7 +++++-
>  include/net/mana/mana.h                       |  3 ++-
>  7 files changed, 33 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/net/ethernet/microsoft/Kconfig
> b/drivers/net/ethernet/microsoft/Kconfig
> index 286f0d5697a1..901fbffbf718 100644
> --- a/drivers/net/ethernet/microsoft/Kconfig
> +++ b/drivers/net/ethernet/microsoft/Kconfig
> @@ -18,7 +18,7 @@ if NET_VENDOR_MICROSOFT
>  config MICROSOFT_MANA
>  	tristate "Microsoft Azure Network Adapter (MANA) support"
>  	depends on PCI_MSI
> -	depends on X86_64 || (ARM64 && !CPU_BIG_ENDIAN && ARM64_4K_PAGES)
> +	depends on X86_64 || (ARM64 && !CPU_BIG_ENDIAN)
>  	depends on PCI_HYPERV
>  	select AUXILIARY_BUS
>  	select PAGE_POOL
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 1332db9a08eb..c9df942d0d02 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -182,7 +182,7 @@ int mana_gd_alloc_memory(struct gdma_context *gc,
> unsigned int length,
>  	dma_addr_t dma_handle;
>  	void *buf;
> 
> -	if (length < PAGE_SIZE || !is_power_of_2(length))
> +	if (length < MANA_MIN_QSIZE || !is_power_of_2(length))
>  		return -EINVAL;
> 
>  	gmi->dev = gc->dev;
> @@ -717,7 +717,7 @@ EXPORT_SYMBOL_NS(mana_gd_destroy_dma_region,
> NET_MANA);
>  static int mana_gd_create_dma_region(struct gdma_dev *gd,
>  				     struct gdma_mem_info *gmi)
>  {
> -	unsigned int num_page = gmi->length / PAGE_SIZE;
> +	unsigned int num_page = gmi->length / MANA_MIN_QSIZE;

This calculation seems a bit weird when using MANA_MIN_QSIZE. The
number of pages, and the construction of the page_addr_list array
a few lines later, seem unrelated to the concept of a minimum queue
size. Is the right concept really a "mapping chunk", and num_page
would conceptually be "num_chunks", or something like that?  Then
a queue must be at least one chunk in size, but that's derived from the
chunk size, and is not the core concept.

Another approach might be to just call it "MANA_PAGE_SIZE", like
has been done with HV_HYP_PAGE_SIZE.  HV_HYP_PAGE_SIZE exists to
handle exactly the same issue of the guest PAGE_SIZE potentially
being different from the fixed 4K size that must be used in host-guest
communication on Hyper-V.  Same thing here with MANA.
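
For illustration, an (untested) sketch of that approach:

	/* Fixed 4K unit used in guest-to-HW communication, independent
	 * of the guest PAGE_SIZE -- analogous to HV_HYP_PAGE_SIZE.
	 */
	#define MANA_PAGE_SHIFT	12
	#define MANA_PAGE_SIZE	(1 << MANA_PAGE_SHIFT)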

>  	struct gdma_create_dma_region_req *req = NULL;
>  	struct gdma_create_dma_region_resp resp = {};
>  	struct gdma_context *gc = gd->gdma_context;
> @@ -727,7 +727,7 @@ static int mana_gd_create_dma_region(struct gdma_dev *gd,
>  	int err;
>  	int i;
> 
> -	if (length < PAGE_SIZE || !is_power_of_2(length))
> +	if (length < MANA_MIN_QSIZE || !is_power_of_2(length))
>  		return -EINVAL;
> 
>  	if (offset_in_page(gmi->virt_addr) != 0)
> @@ -751,7 +751,7 @@ static int mana_gd_create_dma_region(struct gdma_dev *gd,
>  	req->page_addr_list_len = num_page;
> 
>  	for (i = 0; i < num_page; i++)
> -		req->page_addr_list[i] = gmi->dma_handle +  i * PAGE_SIZE;
> +		req->page_addr_list[i] = gmi->dma_handle +  i * MANA_MIN_QSIZE;
> 
>  	err = mana_gd_send_request(gc, req_msg_size, req, sizeof(resp), &resp);
>  	if (err)
> diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> index bbc4f9e16c98..038dc31e09cd 100644
> --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> @@ -362,12 +362,12 @@ static int mana_hwc_create_cq(struct hw_channel_context
> *hwc, u16 q_depth,
>  	int err;
> 
>  	eq_size = roundup_pow_of_two(GDMA_EQE_SIZE * q_depth);
> -	if (eq_size < MINIMUM_SUPPORTED_PAGE_SIZE)
> -		eq_size = MINIMUM_SUPPORTED_PAGE_SIZE;
> +	if (eq_size < MANA_MIN_QSIZE)
> +		eq_size = MANA_MIN_QSIZE;
> 
>  	cq_size = roundup_pow_of_two(GDMA_CQE_SIZE * q_depth);
> -	if (cq_size < MINIMUM_SUPPORTED_PAGE_SIZE)
> -		cq_size = MINIMUM_SUPPORTED_PAGE_SIZE;
> +	if (cq_size < MANA_MIN_QSIZE)
> +		cq_size = MANA_MIN_QSIZE;
> 
>  	hwc_cq = kzalloc(sizeof(*hwc_cq), GFP_KERNEL);
>  	if (!hwc_cq)
> @@ -429,7 +429,7 @@ static int mana_hwc_alloc_dma_buf(struct
> hw_channel_context *hwc, u16 q_depth,
> 
>  	dma_buf->num_reqs = q_depth;
> 
> -	buf_size = PAGE_ALIGN(q_depth * max_msg_size);
> +	buf_size = MANA_MIN_QALIGN(q_depth * max_msg_size);
> 
>  	gmi = &dma_buf->mem_info;
>  	err = mana_gd_alloc_memory(gc, buf_size, gmi);
> @@ -497,8 +497,8 @@ static int mana_hwc_create_wq(struct hw_channel_context
> *hwc,
>  	else
>  		queue_size = roundup_pow_of_two(GDMA_MAX_SQE_SIZE *
> q_depth);
> 
> -	if (queue_size < MINIMUM_SUPPORTED_PAGE_SIZE)
> -		queue_size = MINIMUM_SUPPORTED_PAGE_SIZE;
> +	if (queue_size < MANA_MIN_QSIZE)
> +		queue_size = MANA_MIN_QSIZE;
> 
>  	hwc_wq = kzalloc(sizeof(*hwc_wq), GFP_KERNEL);
>  	if (!hwc_wq)
> @@ -628,10 +628,10 @@ static int mana_hwc_establish_channel(struct
> gdma_context *gc, u16 *q_depth,
>  	init_completion(&hwc->hwc_init_eqe_comp);
> 
>  	err = mana_smc_setup_hwc(&gc->shm_channel, false,
> -				 eq->mem_info.dma_handle,
> -				 cq->mem_info.dma_handle,
> -				 rq->mem_info.dma_handle,
> -				 sq->mem_info.dma_handle,
> +				 virt_to_phys(eq->mem_info.virt_addr),
> +				 virt_to_phys(cq->mem_info.virt_addr),
> +				 virt_to_phys(rq->mem_info.virt_addr),
> +				 virt_to_phys(sq->mem_info.virt_addr),

This change seems unrelated to handling guest PAGE_SIZE values
other than 4K.  Does it belong in a separate patch?  Or maybe it just
needs an explanation in the commit message of this patch?

>  				 eq->eq.msix_index);
>  	if (err)
>  		return err;
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
> b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index d087cf954f75..6a891dbce686 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -1889,10 +1889,10 @@ static int mana_create_txq(struct mana_port_context
> *apc,
>  	 *  to prevent overflow.
>  	 */
>  	txq_size = MAX_SEND_BUFFERS_PER_QUEUE * 32;
> -	BUILD_BUG_ON(!PAGE_ALIGNED(txq_size));
> +	BUILD_BUG_ON(!MANA_MIN_QALIGNED(txq_size));
> 
>  	cq_size = MAX_SEND_BUFFERS_PER_QUEUE * COMP_ENTRY_SIZE;
> -	cq_size = PAGE_ALIGN(cq_size);
> +	cq_size = MANA_MIN_QALIGN(cq_size);
> 
>  	gc = gd->gdma_context;
> 
> @@ -2189,8 +2189,8 @@ static struct mana_rxq *mana_create_rxq(struct
> mana_port_context *apc,
>  	if (err)
>  		goto out;
> 
> -	rq_size = PAGE_ALIGN(rq_size);
> -	cq_size = PAGE_ALIGN(cq_size);
> +	rq_size = MANA_MIN_QALIGN(rq_size);
> +	cq_size = MANA_MIN_QALIGN(cq_size);
> 
>  	/* Create RQ */
>  	memset(&spec, 0, sizeof(spec));
> diff --git a/drivers/net/ethernet/microsoft/mana/shm_channel.c
> b/drivers/net/ethernet/microsoft/mana/shm_channel.c
> index 5553af9c8085..9a54a163d8d1 100644
> --- a/drivers/net/ethernet/microsoft/mana/shm_channel.c
> +++ b/drivers/net/ethernet/microsoft/mana/shm_channel.c
> @@ -6,6 +6,7 @@
>  #include <linux/io.h>
>  #include <linux/mm.h>
> 
> +#include <net/mana/gdma.h>
>  #include <net/mana/shm_channel.h>
> 
>  #define PAGE_FRAME_L48_WIDTH_BYTES 6
> @@ -183,7 +184,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool
> reset_vf, u64 eq_addr,
> 
>  	/* EQ addr: low 48 bits of frame address */
>  	shmem = (u64 *)ptr;
> -	frame_addr = PHYS_PFN(eq_addr);
> +	frame_addr = MANA_PFN(eq_addr);
>  	*shmem = frame_addr & PAGE_FRAME_L48_MASK;
>  	all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
>  		(frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);

In mana_smc_setup_hwc() a few lines above this change, code using
PAGE_ALIGNED() is unchanged.  Is it correct that the eq/cq/rq/sq addresses
must be aligned to 64K if PAGE_SIZE is 64K?

Related, I wonder about how MANA_PFN() is defined. If PAGE_SIZE is 64K,
MANA_PFN() will first right-shift 16, then left shift 4. The net is right-shift 12,
corresponding to the 4K chunks that MANA expects. But that approach guarantees
that the rightmost 4 bits of the MANA PFN will always be zero. That's consistent
with requiring the addresses to be PAGE_ALIGNED() to 64K, but I'm unclear whether
that is really the requirement. You might compare with the definition of
HVPFN_DOWN(), which has a similar goal for Linux guests communicating with
Hyper-V.
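
As a concrete (made-up) example with PAGE_SIZE = 64K, take an address
that is 4K-aligned but not 64K-aligned, say 0x23000:

	PHYS_PFN(0x23000) = 0x23000 >> 16 = 0x2    /* low bits lost */
	MANA_PFN(0x23000) = 0x2 << 4      = 0x20   /* expected: 0x23 */
	0x23000 >> 12     = 0x23                   /* the 4K frame */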

> @@ -191,7 +192,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool
> reset_vf, u64 eq_addr,
> 
>  	/* CQ addr: low 48 bits of frame address */
>  	shmem = (u64 *)ptr;
> -	frame_addr = PHYS_PFN(cq_addr);
> +	frame_addr = MANA_PFN(cq_addr);
>  	*shmem = frame_addr & PAGE_FRAME_L48_MASK;
>  	all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
>  		(frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> @@ -199,7 +200,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool
> reset_vf, u64 eq_addr,
> 
>  	/* RQ addr: low 48 bits of frame address */
>  	shmem = (u64 *)ptr;
> -	frame_addr = PHYS_PFN(rq_addr);
> +	frame_addr = MANA_PFN(rq_addr);
>  	*shmem = frame_addr & PAGE_FRAME_L48_MASK;
>  	all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
>  		(frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> @@ -207,7 +208,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool
> reset_vf, u64 eq_addr,
> 
>  	/* SQ addr: low 48 bits of frame address */
>  	shmem = (u64 *)ptr;
> -	frame_addr = PHYS_PFN(sq_addr);
> +	frame_addr = MANA_PFN(sq_addr);
>  	*shmem = frame_addr & PAGE_FRAME_L48_MASK;
>  	all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
>  		(frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
> index 27684135bb4d..b392559c33e9 100644
> --- a/include/net/mana/gdma.h
> +++ b/include/net/mana/gdma.h
> @@ -224,7 +224,12 @@ struct gdma_dev {
>  	struct auxiliary_device *adev;
>  };
> 
> -#define MINIMUM_SUPPORTED_PAGE_SIZE PAGE_SIZE
> +/* These are defined by HW */
> +#define MANA_MIN_QSHIFT 12
> +#define MANA_MIN_QSIZE (1 << MANA_MIN_QSHIFT)
> +#define MANA_MIN_QALIGN(x) ALIGN((x), MANA_MIN_QSIZE)
> +#define MANA_MIN_QALIGNED(addr) IS_ALIGNED((unsigned long)(addr), MANA_MIN_QSIZE)
> +#define MANA_PFN(a) (PHYS_PFN(a) << (PAGE_SHIFT - MANA_MIN_QSHIFT))

See comments above about how this is defined.

Michael

> 
>  #define GDMA_CQE_SIZE 64
>  #define GDMA_EQE_SIZE 16
> diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
> index 561f6719fb4e..43e8fc574354 100644
> --- a/include/net/mana/mana.h
> +++ b/include/net/mana/mana.h
> @@ -42,7 +42,8 @@ enum TRI_STATE {
> 
>  #define MAX_SEND_BUFFERS_PER_QUEUE 256
> 
> -#define EQ_SIZE (8 * PAGE_SIZE)
> +#define EQ_SIZE (8 * MANA_MIN_QSIZE)
> +
>  #define LOG2_EQ_THROTTLE 3
> 
>  #define MAX_PORTS_IN_MANA_DEV 256
> --
> 2.34.1
>
Haiyang Zhang June 11, 2024, 5:43 p.m. UTC | #2
(resending in plain text)

> -----Original Message-----
> From: Michael Kelley <mhklinux@outlook.com>
> Sent: Tuesday, June 11, 2024 12:35 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org
> Cc: Dexuan Cui <decui@microsoft.com>; stephen@networkplumber.org; KY
> Srinivasan <kys@microsoft.com>; Paul Rosswurm <paulros@microsoft.com>;
> olaf@aepfle.de; vkuznets <vkuznets@redhat.com>; davem@davemloft.net;
> wei.liu@kernel.org; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; leon@kernel.org; Long Li <longli@microsoft.com>;
> ssengar@linux.microsoft.com; linux-rdma@vger.kernel.org;
> daniel@iogearbox.net; john.fastabend@gmail.com; bpf@vger.kernel.org;
> ast@kernel.org; hawk@kernel.org; tglx@linutronix.de;
> shradhagupta@linux.microsoft.com; linux-kernel@vger.kernel.org
> Subject: RE: [PATCH net-next] net: mana: Add support for variable page
> sizes of ARM64
> 
> From: LKML haiyangz <lkmlhyz@microsoft.com> On Behalf Of Haiyang Zhang
> Sent: Monday, June 10, 2024 2:23 PM
> >
> > As defined by the MANA Hardware spec, the queue size for DMA is 4KB
> > minimal, and power of 2.
> 
> You say the hardware requires 4K "minimal". But the definitions in this
> patch hardcode to 4K, as if that's the only choice. Is the hardcoding to
> 4K a design decision made to simplify the MANA driver?

The HWC q size has to be exactly 4k, which is by HW design. 
Other "regular" queues can be 2^n >= 4k.

> 
> > To support variable page sizes (4KB, 16KB, 64KB) of ARM64, define
> 
> A minor nit, but "variable" page size doesn't seem like quite the right
> description -- both here and in the Subject line.  On ARM64, the page
> size
> is a choice among a few fixed options.  Perhaps call it support for "page
> sizes
> other than 4K"?

"page sizes other than 4K" sounds good.

> 
> > the minimal queue size as a macro separate from the PAGE_SIZE, which
> > we always assumed to be 4KB before supporting ARM64.
> > Also, update the relevant code related to size alignment, DMA region
> > calculations, etc.
> >
> > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> > ---
> >  drivers/net/ethernet/microsoft/Kconfig        |  2 +-
> >  .../net/ethernet/microsoft/mana/gdma_main.c   |  8 +++----
> >  .../net/ethernet/microsoft/mana/hw_channel.c  | 22 +++++++++----------
> >  drivers/net/ethernet/microsoft/mana/mana_en.c |  8 +++----
> >  .../net/ethernet/microsoft/mana/shm_channel.c |  9 ++++----
> >  include/net/mana/gdma.h                       |  7 +++++-
> >  include/net/mana/mana.h                       |  3 ++-
> >  7 files changed, 33 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/microsoft/Kconfig
> > b/drivers/net/ethernet/microsoft/Kconfig
> > index 286f0d5697a1..901fbffbf718 100644
> > --- a/drivers/net/ethernet/microsoft/Kconfig
> > +++ b/drivers/net/ethernet/microsoft/Kconfig
> > @@ -18,7 +18,7 @@ if NET_VENDOR_MICROSOFT
> >  config MICROSOFT_MANA
> >  tristate "Microsoft Azure Network Adapter (MANA) support"
> >  depends on PCI_MSI
> > - depends on X86_64 || (ARM64 && !CPU_BIG_ENDIAN && ARM64_4K_PAGES)
> > + depends on X86_64 || (ARM64 && !CPU_BIG_ENDIAN)
> >  depends on PCI_HYPERV
> >  select AUXILIARY_BUS
> >  select PAGE_POOL
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > index 1332db9a08eb..c9df942d0d02 100644
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > @@ -182,7 +182,7 @@ int mana_gd_alloc_memory(struct gdma_context *gc,
> > unsigned int length,
> >  dma_addr_t dma_handle;
> >  void *buf;
> >
> > - if (length < PAGE_SIZE || !is_power_of_2(length))
> > + if (length < MANA_MIN_QSIZE || !is_power_of_2(length))
> >         return -EINVAL;
> >
> >  gmi->dev = gc->dev;
> > @@ -717,7 +717,7 @@ EXPORT_SYMBOL_NS(mana_gd_destroy_dma_region,
> > NET_MANA);
> >  static int mana_gd_create_dma_region(struct gdma_dev *gd,
> >                          struct gdma_mem_info *gmi)
> >  {
> > - unsigned int num_page = gmi->length / PAGE_SIZE;
> > + unsigned int num_page = gmi->length / MANA_MIN_QSIZE;
> 
> This calculation seems a bit weird when using MANA_MIN_QSIZE. The
> number of pages, and the construction of the page_addr_list array
> a few lines later, seem unrelated to the concept of a minimum queue
> size. Is the right concept really a "mapping chunk", and num_page
> would conceptually be "num_chunks", or something like that?  Then
> a queue must be at least one chunk in size, but that's derived from the
> chunk size, and is not the core concept.

I think calling it "num_chunks" is fine. 
May I use "num_chunks" in next version?

>
> Another approach might be to just call it "MANA_PAGE_SIZE", like
> has been done with HV_HYP_PAGE_SIZE.  HV_HYP_PAGE_SIZE exists to
> handle exactly the same issue of the guest PAGE_SIZE potentially
> being different from the fixed 4K size that must be used in host-guest
> communication on Hyper-V.  Same thing here with MANA.

I actually called it "MANA_PAGE_SIZE" in my previous internal patch.
But Paul from the Hostnet team opposed using that name, because
4kB is the min queue size: MANA doesn't have a "page" at the HW level.


> >  struct gdma_create_dma_region_req *req = NULL;
> >  struct gdma_create_dma_region_resp resp = {};
> >  struct gdma_context *gc = gd->gdma_context;
> > @@ -727,7 +727,7 @@ static int mana_gd_create_dma_region(struct
> gdma_dev *gd,
> >  int err;
> >  int i;
> >
> > - if (length < PAGE_SIZE || !is_power_of_2(length))
> > + if (length < MANA_MIN_QSIZE || !is_power_of_2(length))
> >         return -EINVAL;
> >
> >  if (offset_in_page(gmi->virt_addr) != 0)
> > @@ -751,7 +751,7 @@ static int mana_gd_create_dma_region(struct
> gdma_dev *gd,
> >  req->page_addr_list_len = num_page;
> >
> >  for (i = 0; i < num_page; i++)
> > -       req->page_addr_list[i] = gmi->dma_handle +  i * PAGE_SIZE;
> > +       req->page_addr_list[i] = gmi->dma_handle +  i *
> MANA_MIN_QSIZE;
> >
> >  err = mana_gd_send_request(gc, req_msg_size, req, sizeof(resp),
> &resp);
> >  if (err)
> > diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > index bbc4f9e16c98..038dc31e09cd 100644
> > --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > @@ -362,12 +362,12 @@ static int mana_hwc_create_cq(struct
> hw_channel_context
> > *hwc, u16 q_depth,
> >  int err;
> >
> >  eq_size = roundup_pow_of_two(GDMA_EQE_SIZE * q_depth);
> > - if (eq_size < MINIMUM_SUPPORTED_PAGE_SIZE)
> > -       eq_size = MINIMUM_SUPPORTED_PAGE_SIZE;
> > + if (eq_size < MANA_MIN_QSIZE)
> > +       eq_size = MANA_MIN_QSIZE;
> >
> >  cq_size = roundup_pow_of_two(GDMA_CQE_SIZE * q_depth);
> > - if (cq_size < MINIMUM_SUPPORTED_PAGE_SIZE)
> > -       cq_size = MINIMUM_SUPPORTED_PAGE_SIZE;
> > + if (cq_size < MANA_MIN_QSIZE)
> > +       cq_size = MANA_MIN_QSIZE;
> >
> >  hwc_cq = kzalloc(sizeof(*hwc_cq), GFP_KERNEL);
> >  if (!hwc_cq)
> > @@ -429,7 +429,7 @@ static int mana_hwc_alloc_dma_buf(struct
> > hw_channel_context *hwc, u16 q_depth,
> >
> >  dma_buf->num_reqs = q_depth;
> >
> > - buf_size = PAGE_ALIGN(q_depth * max_msg_size);
> > + buf_size = MANA_MIN_QALIGN(q_depth * max_msg_size);
> >
> >  gmi = &dma_buf->mem_info;
> >  err = mana_gd_alloc_memory(gc, buf_size, gmi);
> > @@ -497,8 +497,8 @@ static int mana_hwc_create_wq(struct
> hw_channel_context
> > *hwc,
> >  else
> >         queue_size = roundup_pow_of_two(GDMA_MAX_SQE_SIZE *
> > q_depth);
> >
> > - if (queue_size < MINIMUM_SUPPORTED_PAGE_SIZE)
> > -       queue_size = MINIMUM_SUPPORTED_PAGE_SIZE;
> > + if (queue_size < MANA_MIN_QSIZE)
> > +       queue_size = MANA_MIN_QSIZE;
> >
> >  hwc_wq = kzalloc(sizeof(*hwc_wq), GFP_KERNEL);
> >  if (!hwc_wq)
> > @@ -628,10 +628,10 @@ static int mana_hwc_establish_channel(struct
> > gdma_context *gc, u16 *q_depth,
> >  init_completion(&hwc->hwc_init_eqe_comp);
> >
> >  err = mana_smc_setup_hwc(&gc->shm_channel, false,
> > -                   eq->mem_info.dma_handle,
> > -                   cq->mem_info.dma_handle,
> > -                   rq->mem_info.dma_handle,
> > -                   sq->mem_info.dma_handle,
> > +                   virt_to_phys(eq->mem_info.virt_addr),
> > +                   virt_to_phys(cq->mem_info.virt_addr),
> > +                   virt_to_phys(rq->mem_info.virt_addr),
> > +                   virt_to_phys(sq->mem_info.virt_addr),
> 
> This change seems unrelated to handling guest PAGE_SIZE values
> other than 4K.  Does it belong in a separate patch?  Or maybe it just
> needs an explanation in the commit message of this patch?

I know dma_handle is usually just the physical address. But this is not
always true if an IOMMU is used...
I have no problem putting it in a separate patch if desired.

> 
> >                      eq->eq.msix_index);
> >  if (err)
> >         return err;
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > index d087cf954f75..6a891dbce686 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > @@ -1889,10 +1889,10 @@ static int mana_create_txq(struct
> mana_port_context
> > *apc,
> >   *  to prevent overflow.
> >   */
> >  txq_size = MAX_SEND_BUFFERS_PER_QUEUE * 32;
> > - BUILD_BUG_ON(!PAGE_ALIGNED(txq_size));
> > + BUILD_BUG_ON(!MANA_MIN_QALIGNED(txq_size));
> >
> >  cq_size = MAX_SEND_BUFFERS_PER_QUEUE * COMP_ENTRY_SIZE;
> > - cq_size = PAGE_ALIGN(cq_size);
> > + cq_size = MANA_MIN_QALIGN(cq_size);
> >
> >  gc = gd->gdma_context;
> >
> > @@ -2189,8 +2189,8 @@ static struct mana_rxq *mana_create_rxq(struct
> > mana_port_context *apc,
> >  if (err)
> >         goto out;
> >
> > - rq_size = PAGE_ALIGN(rq_size);
> > - cq_size = PAGE_ALIGN(cq_size);
> > + rq_size = MANA_MIN_QALIGN(rq_size);
> > + cq_size = MANA_MIN_QALIGN(cq_size);
> >
> >  /* Create RQ */
> >  memset(&spec, 0, sizeof(spec));
> > diff --git a/drivers/net/ethernet/microsoft/mana/shm_channel.c
> > b/drivers/net/ethernet/microsoft/mana/shm_channel.c
> > index 5553af9c8085..9a54a163d8d1 100644
> > --- a/drivers/net/ethernet/microsoft/mana/shm_channel.c
> > +++ b/drivers/net/ethernet/microsoft/mana/shm_channel.c
> > @@ -6,6 +6,7 @@
> >  #include <linux/io.h>
> >  #include <linux/mm.h>
> >
> > +#include <net/mana/gdma.h>
> >  #include <net/mana/shm_channel.h>
> >
> >  #define PAGE_FRAME_L48_WIDTH_BYTES 6
> > @@ -183,7 +184,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool
> > reset_vf, u64 eq_addr,
> >
> >  /* EQ addr: low 48 bits of frame address */
> >  shmem = (u64 *)ptr;
> > - frame_addr = PHYS_PFN(eq_addr);
> > + frame_addr = MANA_PFN(eq_addr);
> >  *shmem = frame_addr & PAGE_FRAME_L48_MASK;
> >  all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
> >         (frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> 
> In mana_smc_setup_hwc() a few lines above this change, code using
> PAGE_ALIGNED() is unchanged.  Is it correct that the eq/cq/rq/sq
> addresses
> must be aligned to 64K if PAGE_SIZE is 64K?

Since we are still using PHYS_PFN on them, if not aligned to PAGE_SIZE,
the lower bits may be lost. (You said the same below.)

> 
> Related, I wonder about how MANA_PFN() is defined. If PAGE_SIZE is 64K,
> MANA_PFN() will first right-shift 16, then left shift 4. The net is
> right-shift 12,
> corresponding to the 4K chunks that MANA expects. But that approach
> guarantees
> that the rightmost 4 bits of the MANA PFN will always be zero. That's
> consistent
> with requiring the addresses to be PAGE_ALIGNED() to 64K, but I'm unclear
> whether
> that is really the requirement. You might compare with the definition of
> HVPFN_DOWN(), which has a similar goal for Linux guests communicating
> with
> Hyper-V.

@Paul Rosswurm You said MANA HW has "no page concept". So the "frame_addr"
in mana_smc_setup_hwc() is NOT related to a physical page number, correct?
Can we just use phys_addr >> 12 like below?

#define MANA_MIN_QSHIFT 12
#define MANA_PFN(a) ((a) >> MANA_MIN_QSHIFT)

      /* EQ addr: low 48 bits of frame address */
     shmem = (u64 *)ptr;
-     frame_addr = PHYS_PFN(eq_addr);
+     frame_addr = MANA_PFN(eq_addr);

> 
> > @@ -191,7 +192,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool
> > reset_vf, u64 eq_addr,
> >
> >  /* CQ addr: low 48 bits of frame address */
> >  shmem = (u64 *)ptr;
> > - frame_addr = PHYS_PFN(cq_addr);
> > + frame_addr = MANA_PFN(cq_addr);
> >  *shmem = frame_addr & PAGE_FRAME_L48_MASK;
> >  all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
> >         (frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> > @@ -199,7 +200,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool
> > reset_vf, u64 eq_addr,
> >
> >  /* RQ addr: low 48 bits of frame address */
> >  shmem = (u64 *)ptr;
> > - frame_addr = PHYS_PFN(rq_addr);
> > + frame_addr = MANA_PFN(rq_addr);
> >  *shmem = frame_addr & PAGE_FRAME_L48_MASK;
> >  all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
> >         (frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> > @@ -207,7 +208,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool
> > reset_vf, u64 eq_addr,
> >
> >  /* SQ addr: low 48 bits of frame address */
> >  shmem = (u64 *)ptr;
> > - frame_addr = PHYS_PFN(sq_addr);
> > + frame_addr = MANA_PFN(sq_addr);
> >  *shmem = frame_addr & PAGE_FRAME_L48_MASK;
> >  all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
> >         (frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> > diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
> > index 27684135bb4d..b392559c33e9 100644
> > --- a/include/net/mana/gdma.h
> > +++ b/include/net/mana/gdma.h
> > @@ -224,7 +224,12 @@ struct gdma_dev {
> >  struct auxiliary_device *adev;
> >  };
> >
> > -#define MINIMUM_SUPPORTED_PAGE_SIZE PAGE_SIZE
> > +/* These are defined by HW */
> > +#define MANA_MIN_QSHIFT 12
> > +#define MANA_MIN_QSIZE (1 << MANA_MIN_QSHIFT)
> > +#define MANA_MIN_QALIGN(x) ALIGN((x), MANA_MIN_QSIZE)
> > +#define MANA_MIN_QALIGNED(addr) IS_ALIGNED((unsigned long)(addr),
> MANA_MIN_QSIZE)
> > +#define MANA_PFN(a) (PHYS_PFN(a) << (PAGE_SHIFT - MANA_MIN_QSHIFT))
> 
> See comments above about how this is defined.

Replied above.
Thank you for all the detailed comments!

- Haiyang
Haiyang Zhang June 11, 2024, 6:03 p.m. UTC | #3
> -----Original Message-----
> From: Haiyang Zhang <haiyangz@microsoft.com>
> Sent: Tuesday, June 11, 2024 1:44 PM
> To: Michael Kelley <mhklinux@outlook.com>; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org; Paul Rosswurm <paulros@microsoft.com>
> Cc: Dexuan Cui <decui@microsoft.com>; stephen@networkplumber.org; KY
> Srinivasan <kys@microsoft.com>; olaf@aepfle.de; vkuznets
> <vkuznets@redhat.com>; davem@davemloft.net; wei.liu@kernel.org;
> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; leon@kernel.org;
> Long Li <longli@microsoft.com>; ssengar@linux.microsoft.com; linux-
> rdma@vger.kernel.org; daniel@iogearbox.net; john.fastabend@gmail.com;
> bpf@vger.kernel.org; ast@kernel.org; hawk@kernel.org; tglx@linutronix.de;
> shradhagupta@linux.microsoft.com; linux-kernel@vger.kernel.org
> Subject: RE: [PATCH net-next] net: mana: Add support for variable page
> sizes of ARM64


> > > @@ -183,7 +184,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc,
> bool
> > > reset_vf, u64 eq_addr,
> > >
> > >  /* EQ addr: low 48 bits of frame address */
> > >  shmem = (u64 *)ptr;
> > > - frame_addr = PHYS_PFN(eq_addr);
> > > + frame_addr = MANA_PFN(eq_addr);
> > >  *shmem = frame_addr & PAGE_FRAME_L48_MASK;
> > >  all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
> > >         (frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> >
> > In mana_smc_setup_hwc() a few lines above this change, code using
> > PAGE_ALIGNED() is unchanged.  Is it correct that the eq/cq/rq/sq
> > addresses
> > must be aligned to 64K if PAGE_SIZE is 64K?
> 
> Since we are still using PHYS_PFN on them, if not aligned to PAGE_SIZE,
> the lower bits may be lost. (You said the same below.)
> 
> >
> > Related, I wonder about how MANA_PFN() is defined. If PAGE_SIZE is 64K,
> > MANA_PFN() will first right-shift 16, then left shift 4. The net is
> > right-shift 12,
> > corresponding to the 4K chunks that MANA expects. But that approach
> > guarantees
> > that the rightmost 4 bits of the MANA PFN will always be zero. That's
> > consistent
> > with requiring the addresses to be PAGE_ALIGNED() to 64K, but I'm
> unclear
> > whether
> > that is really the requirement. You might compare with the definition
> of
> > HVPFN_DOWN(), which has a similar goal for Linux guests communicating
> > with
> > Hyper-V.
> 
> @Paul Rosswurm You said MANA HW has "no page concept". So the
> "frame_addr"
> in mana_smc_setup_hwc() is NOT related to a physical page number,
> correct?
> Can we just use phys_addr >> 12 like below?
> 
> #define MANA_MIN_QSHIFT 12
> #define MANA_PFN(a) ((a) >> MANA_MIN_QSHIFT)
> 
>       /* EQ addr: low 48 bits of frame address */
>      shmem = (u64 *)ptr;
> -     frame_addr = PHYS_PFN(eq_addr);
> +     frame_addr = MANA_PFN(eq_addr);
> 

I just confirmed with Paul: we can use phys_addr >> 12.
And I will change the alignment requirements to be 4k.

Thanks,
- Haiyang
Michael Kelley June 12, 2024, 2:42 a.m. UTC | #4
From: Haiyang Zhang <haiyangz@microsoft.com> Sent: Tuesday, June 11, 2024 10:44 AM
> 
> > -----Original Message-----
> > From: Michael Kelley <mhklinux@outlook.com>
> > Sent: Tuesday, June 11, 2024 12:35 PM
> > To: Haiyang Zhang <haiyangz@microsoft.com>; linux-hyperv@vger.kernel.org;
> > netdev@vger.kernel.org
> > Cc: Dexuan Cui <decui@microsoft.com>; stephen@networkplumber.org; KY
> > Srinivasan <kys@microsoft.com>; Paul Rosswurm <paulros@microsoft.com>;
> > olaf@aepfle.de; vkuznets <vkuznets@redhat.com>; davem@davemloft.net;
> > wei.liu@kernel.org; edumazet@google.com; kuba@kernel.org;
> > pabeni@redhat.com; leon@kernel.org; Long Li <longli@microsoft.com>;
> > ssengar@linux.microsoft.com; linux-rdma@vger.kernel.org;
> > daniel@iogearbox.net; john.fastabend@gmail.com; bpf@vger.kernel.org;
> > ast@kernel.org; hawk@kernel.org; tglx@linutronix.de;
> > shradhagupta@linux.microsoft.com; linux-kernel@vger.kernel.org
> > Subject: RE: [PATCH net-next] net: mana: Add support for variable page
> > sizes of ARM64
> >
> > From: LKML haiyangz <lkmlhyz@microsoft.com> On Behalf Of Haiyang Zhang
> > Sent: Monday, June 10, 2024 2:23 PM
> > >
> > > As defined by the MANA Hardware spec, the queue size for DMA is 4KB
> > > minimal, and power of 2.
> >
> > You say the hardware requires 4K "minimal". But the definitions in this
> > patch hardcode to 4K, as if that's the only choice. Is the hardcoding to
> > 4K a design decision made to simplify the MANA driver?
> 
> The HWC q size has to be exactly 4k, which is by HW design.
> Other "regular" queues can be 2^n >= 4k.
> 
> >
> > > To support variable page sizes (4KB, 16KB, 64KB) of ARM64, define
> >
> > A minor nit, but "variable" page size doesn't seem like quite the right
> > description -- both here and in the Subject line.  On ARM64, the page size
> > is a choice among a few fixed options.  Perhaps call it support for "page sizes
> > other than 4K"?
> 
> "page sizes other than 4K" sounds good.
> 
> >
> > > the minimal queue size as a macro separate from the PAGE_SIZE, which
> > > we always assumed to be 4KB before supporting ARM64.
> > > Also, update the relevant code related to size alignment, DMA region
> > > calculations, etc.
> > >
> > > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> > > ---
> > >  drivers/net/ethernet/microsoft/Kconfig        |  2 +-
> > >  .../net/ethernet/microsoft/mana/gdma_main.c   |  8 +++----
> > >  .../net/ethernet/microsoft/mana/hw_channel.c  | 22 +++++++++----------
> > >  drivers/net/ethernet/microsoft/mana/mana_en.c |  8 +++----
> > >  .../net/ethernet/microsoft/mana/shm_channel.c |  9 ++++----
> > >  include/net/mana/gdma.h                       |  7 +++++-
> > >  include/net/mana/mana.h                       |  3 ++-
> > >  7 files changed, 33 insertions(+), 26 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/microsoft/Kconfig
> > > b/drivers/net/ethernet/microsoft/Kconfig
> > > index 286f0d5697a1..901fbffbf718 100644
> > > --- a/drivers/net/ethernet/microsoft/Kconfig
> > > +++ b/drivers/net/ethernet/microsoft/Kconfig
> > > @@ -18,7 +18,7 @@ if NET_VENDOR_MICROSOFT
> > >  config MICROSOFT_MANA
> > >  tristate "Microsoft Azure Network Adapter (MANA) support"
> > >  depends on PCI_MSI
> > > - depends on X86_64 || (ARM64 && !CPU_BIG_ENDIAN && ARM64_4K_PAGES)
> > > + depends on X86_64 || (ARM64 && !CPU_BIG_ENDIAN)
> > >  depends on PCI_HYPERV
> > >  select AUXILIARY_BUS
> > >  select PAGE_POOL
> > > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > index 1332db9a08eb..c9df942d0d02 100644
> > > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > @@ -182,7 +182,7 @@ int mana_gd_alloc_memory(struct gdma_context *gc,
> > > unsigned int length,
> > >  dma_addr_t dma_handle;
> > >  void *buf;
> > >
> > > - if (length < PAGE_SIZE || !is_power_of_2(length))
> > > + if (length < MANA_MIN_QSIZE || !is_power_of_2(length))
> > >         return -EINVAL;
> > >
> > >  gmi->dev = gc->dev;
> > > @@ -717,7 +717,7 @@ EXPORT_SYMBOL_NS(mana_gd_destroy_dma_region, NET_MANA);
> > >  static int mana_gd_create_dma_region(struct gdma_dev *gd,
> > >                          struct gdma_mem_info *gmi)
> > >  {
> > > - unsigned int num_page = gmi->length / PAGE_SIZE;
> > > + unsigned int num_page = gmi->length / MANA_MIN_QSIZE;
> >
> > This calculation seems a bit weird when using MANA_MIN_QSIZE. The
> > number of pages, and the construction of the page_addr_list array
> > a few lines later, seem unrelated to the concept of a minimum queue
> > size. Is the right concept really a "mapping chunk", and num_page
> > would conceptually be "num_chunks", or something like that?  Then
> > a queue must be at least one chunk in size, but that's derived from the
> > chunk size, and is not the core concept.
> 
> I think calling it "num_chunks" is fine.
> May I use "num_chunks" in next version?
> 

I think the first decision is what to use for MANA_MIN_QSIZE. I'm
admittedly not familiar with mana and gdma, but the function
mana_gd_create_dma_region() seems fairly typical in defining a
logical region that spans multiple 4K chunks that may or may not
be physically contiguous.  So you set up an array of physical
addresses that identify the physical memory location of each chunk.
It seems very similar to a Hyper-V GPADL. I typically *do* see such
chunks called "pages", but a "mapping chunk" or "mapping unit"
is probably OK too.  So mana_gd_create_dma_region() would use
MANA_CHUNK_SIZE instead of MANA_MIN_QSIZE.  Then you could
also define MANA_MIN_QSIZE to be MANA_CHUNK_SIZE, for use
specifically when checking the size of a queue.
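
I.e., something like (illustrative only, untested):

	/* Unit of the chunk address array passed to the HW */
	#define MANA_CHUNK_SIZE	SZ_4K
	/* A queue must be at least one chunk in size */
	#define MANA_MIN_QSIZE	MANA_CHUNK_SIZE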

Then if you are using MANA_CHUNK_SIZE, the local variable
would be "num_chunks".  The use of "page_count", "page_addr_list",
and "offset_in_page" field names in struct
gdma_create_dma_region_req should then be changed as well.

Looking further at the function mana_gd_create_dma_region(),
there's also the use of offset_in_page(), which is based on the
guest PAGE_SIZE.  Wouldn't this be problematic if PAGE_SIZE
is 64K?

But perhaps Paul would weigh in further on his thoughts.

> >
> > Another approach might be to just call it "MANA_PAGE_SIZE", like
> > has been done with HV_HYP_PAGE_SIZE.  HV_HYP_PAGE_SIZE exists to
> > handle exactly the same issue of the guest PAGE_SIZE potentially
> > being different from the fixed 4K size that must be used in host-guest
> > communication on Hyper-V.  Same thing here with MANA.
> 
> I actually called it "MANA_PAGE_SIZE" in my previous internal patch.
> But Paul from Hostnet team opposed using that name, because
> 4kB is the min q size. MANA doesn't have "page" at HW level.
> 
> 
> > >  struct gdma_create_dma_region_req *req = NULL;
> > >  struct gdma_create_dma_region_resp resp = {};
> > >  struct gdma_context *gc = gd->gdma_context;
> > > @@ -727,7 +727,7 @@ static int mana_gd_create_dma_region(struct gdma_dev *gd,
> > >  int err;
> > >  int i;
> > >
> > > - if (length < PAGE_SIZE || !is_power_of_2(length))
> > > + if (length < MANA_MIN_QSIZE || !is_power_of_2(length))
> > >         return -EINVAL;
> > >
> > >  if (offset_in_page(gmi->virt_addr) != 0)
> > > @@ -751,7 +751,7 @@ static int mana_gd_create_dma_region(struct gdma_dev *gd,
> > >  req->page_addr_list_len = num_page;
> > >
> > >  for (i = 0; i < num_page; i++)
> > > -       req->page_addr_list[i] = gmi->dma_handle +  i * PAGE_SIZE;
> > > +       req->page_addr_list[i] = gmi->dma_handle +  i * MANA_MIN_QSIZE;
> > >
> > >  err = mana_gd_send_request(gc, req_msg_size, req, sizeof(resp), &resp);
> > >  if (err)
> > > diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > > b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > > index bbc4f9e16c98..038dc31e09cd 100644
> > > --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > > +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > > @@ -362,12 +362,12 @@ static int mana_hwc_create_cq(struct hw_channel_context
> > > *hwc, u16 q_depth,
> > >  int err;
> > >
> > >  eq_size = roundup_pow_of_two(GDMA_EQE_SIZE * q_depth);
> > > - if (eq_size < MINIMUM_SUPPORTED_PAGE_SIZE)
> > > -       eq_size = MINIMUM_SUPPORTED_PAGE_SIZE;
> > > + if (eq_size < MANA_MIN_QSIZE)
> > > +       eq_size = MANA_MIN_QSIZE;
> > >
> > >  cq_size = roundup_pow_of_two(GDMA_CQE_SIZE * q_depth);
> > > - if (cq_size < MINIMUM_SUPPORTED_PAGE_SIZE)
> > > -       cq_size = MINIMUM_SUPPORTED_PAGE_SIZE;
> > > + if (cq_size < MANA_MIN_QSIZE)
> > > +       cq_size = MANA_MIN_QSIZE;
> > >
> > >  hwc_cq = kzalloc(sizeof(*hwc_cq), GFP_KERNEL);
> > >  if (!hwc_cq)
> > > @@ -429,7 +429,7 @@ static int mana_hwc_alloc_dma_buf(struct hw_channel_context *hwc, u16 q_depth,
> > >
> > >  dma_buf->num_reqs = q_depth;
> > >
> > > - buf_size = PAGE_ALIGN(q_depth * max_msg_size);
> > > + buf_size = MANA_MIN_QALIGN(q_depth * max_msg_size);
> > >
> > >  gmi = &dma_buf->mem_info;
> > >  err = mana_gd_alloc_memory(gc, buf_size, gmi);
> > > @@ -497,8 +497,8 @@ static int mana_hwc_create_wq(struct hw_channel_context
> > > *hwc,
> > >  else
> > >         queue_size = roundup_pow_of_two(GDMA_MAX_SQE_SIZE * q_depth);
> > >
> > > - if (queue_size < MINIMUM_SUPPORTED_PAGE_SIZE)
> > > -       queue_size = MINIMUM_SUPPORTED_PAGE_SIZE;
> > > + if (queue_size < MANA_MIN_QSIZE)
> > > +       queue_size = MANA_MIN_QSIZE;
> > >
> > >  hwc_wq = kzalloc(sizeof(*hwc_wq), GFP_KERNEL);
> > >  if (!hwc_wq)
> > > @@ -628,10 +628,10 @@ static int mana_hwc_establish_channel(struct
> > > gdma_context *gc, u16 *q_depth,
> > >  init_completion(&hwc->hwc_init_eqe_comp);
> > >
> > >  err = mana_smc_setup_hwc(&gc->shm_channel, false,
> > > -                   eq->mem_info.dma_handle,
> > > -                   cq->mem_info.dma_handle,
> > > -                   rq->mem_info.dma_handle,
> > > -                   sq->mem_info.dma_handle,
> > > +                   virt_to_phys(eq->mem_info.virt_addr),
> > > +                   virt_to_phys(cq->mem_info.virt_addr),
> > > +                   virt_to_phys(rq->mem_info.virt_addr),
> > > +                   virt_to_phys(sq->mem_info.virt_addr),
> >
> > This change seems unrelated to handling guest PAGE_SIZE values
> > other than 4K.  Does it belong in a separate patch?  Or maybe it just
> > needs an explanation in the commit message of this patch?
> 
> I know dma_handle is usually just the physical address. But this is not
> always true if an IOMMU is used...
> I have no problem putting it in a separate patch if desired.

Yes, that would seem like a separate patch.  It's not related to handling
page size other than 4K.

> 
> >
> > >                      eq->eq.msix_index);
> > >  if (err)
> > >         return err;
> > > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > > b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > > index d087cf954f75..6a891dbce686 100644
> > > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > > @@ -1889,10 +1889,10 @@ static int mana_create_txq(struct mana_port_context
> > > *apc,
> > >   *  to prevent overflow.
> > >   */
> > >  txq_size = MAX_SEND_BUFFERS_PER_QUEUE * 32;
> > > - BUILD_BUG_ON(!PAGE_ALIGNED(txq_size));
> > > + BUILD_BUG_ON(!MANA_MIN_QALIGNED(txq_size));
> > >
> > >  cq_size = MAX_SEND_BUFFERS_PER_QUEUE * COMP_ENTRY_SIZE;
> > > - cq_size = PAGE_ALIGN(cq_size);
> > > + cq_size = MANA_MIN_QALIGN(cq_size);
> > >
> > >  gc = gd->gdma_context;
> > >
> > > @@ -2189,8 +2189,8 @@ static struct mana_rxq *mana_create_rxq(struct mana_port_context *apc,
> > >  if (err)
> > >         goto out;
> > >
> > > - rq_size = PAGE_ALIGN(rq_size);
> > > - cq_size = PAGE_ALIGN(cq_size);
> > > + rq_size = MANA_MIN_QALIGN(rq_size);
> > > + cq_size = MANA_MIN_QALIGN(cq_size);
> > >
> > >  /* Create RQ */
> > >  memset(&spec, 0, sizeof(spec));
> > > diff --git a/drivers/net/ethernet/microsoft/mana/shm_channel.c
> > > b/drivers/net/ethernet/microsoft/mana/shm_channel.c
> > > index 5553af9c8085..9a54a163d8d1 100644
> > > --- a/drivers/net/ethernet/microsoft/mana/shm_channel.c
> > > +++ b/drivers/net/ethernet/microsoft/mana/shm_channel.c
> > > @@ -6,6 +6,7 @@
> > >  #include <linux/io.h>
> > >  #include <linux/mm.h>
> > >
> > > +#include <net/mana/gdma.h>
> > >  #include <net/mana/shm_channel.h>
> > >
> > >  #define PAGE_FRAME_L48_WIDTH_BYTES 6
> > > @@ -183,7 +184,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool reset_vf, u64 eq_addr,
> > >
> > >  /* EQ addr: low 48 bits of frame address */
> > >  shmem = (u64 *)ptr;
> > > - frame_addr = PHYS_PFN(eq_addr);
> > > + frame_addr = MANA_PFN(eq_addr);
> > >  *shmem = frame_addr & PAGE_FRAME_L48_MASK;
> > >  all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
> > >         (frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> >
> > In mana_smc_setup_hwc() a few lines above this change, code using
> > PAGE_ALIGNED() is unchanged.  Is it correct that the eq/cq/rq/sq
> > addresses
> > must be aligned to 64K if PAGE_SIZE is 64K?
> 
> Since we are still using PHYS_PFN on them, if not aligned to PAGE_SIZE,
> the lower bits may be lost. (You said the same below.)
> 
> >
> > Related, I wonder about how MANA_PFN() is defined. If PAGE_SIZE is 64K,
> > MANA_PFN() will first right-shift 16, then left shift 4. The net is
> > right-shift 12,
> > corresponding to the 4K chunks that MANA expects. But that approach
> > guarantees
> > that the rightmost 4 bits of the MANA PFN will always be zero. That's
> > consistent
> > with requiring the addresses to be PAGE_ALIGNED() to 64K, but I'm unclear
> > whether
> > that is really the requirement. You might compare with the definition of
> > HVPFN_DOWN(), which has a similar goal for Linux guests communicating
> > with
> > Hyper-V.
> 
> @Paul Rosswurm You said MANA HW has "no page concept". So the "frame_addr"
> in mana_smc_setup_hwc() is NOT related to a physical page number, correct?
> Can we just use phys_addr >> 12 like below?
> 
> #define MANA_MIN_QSHIFT 12
> #define MANA_PFN(a) ((a) >> MANA_MIN_QSHIFT)
> 
>       /* EQ addr: low 48 bits of frame address */
>      shmem = (u64 *)ptr;
> -     frame_addr = PHYS_PFN(eq_addr);
> +     frame_addr = MANA_PFN(eq_addr);
> 
> >
> > > @@ -191,7 +192,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool
> > > reset_vf, u64 eq_addr,
> > >
> > >  /* CQ addr: low 48 bits of frame address */
> > >  shmem = (u64 *)ptr;
> > > - frame_addr = PHYS_PFN(cq_addr);
> > > + frame_addr = MANA_PFN(cq_addr);
> > >  *shmem = frame_addr & PAGE_FRAME_L48_MASK;
> > >  all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
> > >         (frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> > > @@ -199,7 +200,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool
> > > reset_vf, u64 eq_addr,
> > >
> > >  /* RQ addr: low 48 bits of frame address */
> > >  shmem = (u64 *)ptr;
> > > - frame_addr = PHYS_PFN(rq_addr);
> > > + frame_addr = MANA_PFN(rq_addr);
> > >  *shmem = frame_addr & PAGE_FRAME_L48_MASK;
> > >  all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
> > >         (frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> > > @@ -207,7 +208,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool
> > > reset_vf, u64 eq_addr,
> > >
> > >  /* SQ addr: low 48 bits of frame address */
> > >  shmem = (u64 *)ptr;
> > > - frame_addr = PHYS_PFN(sq_addr);
> > > + frame_addr = MANA_PFN(sq_addr);
> > >  *shmem = frame_addr & PAGE_FRAME_L48_MASK;
> > >  all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
> > >         (frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> > > diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
> > > index 27684135bb4d..b392559c33e9 100644
> > > --- a/include/net/mana/gdma.h
> > > +++ b/include/net/mana/gdma.h
> > > @@ -224,7 +224,12 @@ struct gdma_dev {
> > >  struct auxiliary_device *adev;
> > >  };
> > >
> > > -#define MINIMUM_SUPPORTED_PAGE_SIZE PAGE_SIZE
> > > +/* These are defined by HW */
> > > +#define MANA_MIN_QSHIFT 12
> > > +#define MANA_MIN_QSIZE (1 << MANA_MIN_QSHIFT)
> > > +#define MANA_MIN_QALIGN(x) ALIGN((x), MANA_MIN_QSIZE)
> > > +#define MANA_MIN_QALIGNED(addr) IS_ALIGNED((unsigned long)(addr),
> > MANA_MIN_QSIZE)
> > > +#define MANA_PFN(a) (PHYS_PFN(a) << (PAGE_SHIFT - MANA_MIN_QSHIFT))
> >
> > See comments above about how this is defined.
> 
> Replied above.
> Thank you for all the detailed comments!
> 
> - Haiyang
Haiyang Zhang June 12, 2024, 2:21 p.m. UTC | #5
> -----Original Message-----
> From: Michael Kelley <mhklinux@outlook.com>
> Sent: Tuesday, June 11, 2024 10:42 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org; Paul Rosswurm <paulros@microsoft.com>
> Cc: Dexuan Cui <decui@microsoft.com>; stephen@networkplumber.org; KY
> Srinivasan <kys@microsoft.com>; olaf@aepfle.de; vkuznets
> <vkuznets@redhat.com>; davem@davemloft.net; wei.liu@kernel.org;
> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; leon@kernel.org;
> Long Li <longli@microsoft.com>; ssengar@linux.microsoft.com; linux-
> rdma@vger.kernel.org; daniel@iogearbox.net; john.fastabend@gmail.com;
> bpf@vger.kernel.org; ast@kernel.org; hawk@kernel.org; tglx@linutronix.de;
> shradhagupta@linux.microsoft.com; linux-kernel@vger.kernel.org
> Subject: RE: [PATCH net-next] net: mana: Add support for variable page
> sizes of ARM64

> > > > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > > b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > > index 1332db9a08eb..c9df942d0d02 100644
> > > > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > > @@ -182,7 +182,7 @@ int mana_gd_alloc_memory(struct gdma_context
> *gc,
> > > > unsigned int length,
> > > >  dma_addr_t dma_handle;
> > > >  void *buf;
> > > >
> > > > - if (length < PAGE_SIZE || !is_power_of_2(length))
> > > > + if (length < MANA_MIN_QSIZE || !is_power_of_2(length))
> > > >         return -EINVAL;
> > > >
> > > >  gmi->dev = gc->dev;
> > > > @@ -717,7 +717,7 @@ EXPORT_SYMBOL_NS(mana_gd_destroy_dma_region,
> NET_MANA);
> > > >  static int mana_gd_create_dma_region(struct gdma_dev *gd,
> > > >                          struct gdma_mem_info *gmi)
> > > >  {
> > > > - unsigned int num_page = gmi->length / PAGE_SIZE;
> > > > + unsigned int num_page = gmi->length / MANA_MIN_QSIZE;
> > >
> > > This calculation seems a bit weird when using MANA_MIN_QSIZE. The
> > > number of pages, and the construction of the page_addr_list array
> > > a few lines later, seem unrelated to the concept of a minimum queue
> > > size. Is the right concept really a "mapping chunk", and num_page
> > > would conceptually be "num_chunks", or something like that?  Then
> > > a queue must be at least one chunk in size, but that's derived from
> the
> > > chunk size, and is not the core concept.
> >
> > I think calling it "num_chunks" is fine.
> > May I use "num_chunks" in next version?
> >
> 
> I think the first decision is what to use for MANA_MIN_QSIZE. I'm
> admittedly not familiar with mana and gdma, but the function
> mana_gd_create_dma_region() seems fairly typical in defining a
> logical region that spans multiple 4K chunks that may or may not
> be physically contiguous.  So you set up an array of physical
> addresses that identify the physical memory location of each chunk.
> It seems very similar to a Hyper-V GPADL. I typically *do* see such
> chunks called "pages", but a "mapping chunk" or "mapping unit"
> is probably OK too.  So mana_gd_create_dma_region() would use
> MANA_CHUNK_SIZE instead of MANA_MIN_QSIZE.  Then you could
> also define MANA_MIN_QSIZE to be MANA_CHUNK_SIZE, for use
> specifically when checking the size of a queue.
> 
> Then if you are using MANA_CHUNK_SIZE, the local variable
> would be "num_chunks".  The use of "page_count", "page_addr_list",
> and "offset_in_page" field names in struct
> gdma_create_dma_region_req should then be changed as well.

I'm fine with these names. I will check with Paul too.

I'd define just one macro, with a code comment like this. It 
will be used for chunk calculation and q len checking:
/* Chunk size of MANA DMA, which is also the min queue size */
#define MANA_CHUNK_SIZE 4096
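
So e.g. mana_gd_create_dma_region() would compute (untested):

	unsigned int num_chunks = gmi->length / MANA_CHUNK_SIZE;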

 
> Looking further at the function mana_gd_create_dma_region(),
> there's also the use of offset_in_page(), which is based on the
> guest PAGE_SIZE.  Wouldn't this be problematic if PAGE_SIZE
> is 64K?

As in my other email - I confirmed with the Hostnet team that the
alignment requirement is just 4k.

So I will relax the following check to be 4k alignment too:
if (offset_in_page(gmi->virt_addr) != 0)
                return -EINVAL;
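
e.g. by changing it to something like (untested):

	if (!MANA_MIN_QALIGNED(gmi->virt_addr))
		return -EINVAL;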

Thanks,
- Haiyang
Haiyang Zhang June 14, 2024, 6:46 p.m. UTC | #6
> -----Original Message-----
> From: Haiyang Zhang <haiyangz@microsoft.com>
> Sent: Wednesday, June 12, 2024 10:22 AM
> To: Michael Kelley <mhklinux@outlook.com>; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org; Paul Rosswurm <paulros@microsoft.com>
> Cc: Dexuan Cui <decui@microsoft.com>; stephen@networkplumber.org; KY
> Srinivasan <kys@microsoft.com>; olaf@aepfle.de; vkuznets
> <vkuznets@redhat.com>; davem@davemloft.net; wei.liu@kernel.org;
> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; leon@kernel.org;
> Long Li <longli@microsoft.com>; ssengar@linux.microsoft.com; linux-
> rdma@vger.kernel.org; daniel@iogearbox.net; john.fastabend@gmail.com;
> bpf@vger.kernel.org; ast@kernel.org; hawk@kernel.org; tglx@linutronix.de;
> shradhagupta@linux.microsoft.com; linux-kernel@vger.kernel.org
> Subject: RE: [PATCH net-next] net: mana: Add support for variable page
> sizes of ARM64
> 
> 
> 
> > -----Original Message-----
> > From: Michael Kelley <mhklinux@outlook.com>
> > Sent: Tuesday, June 11, 2024 10:42 PM
> > To: Haiyang Zhang <haiyangz@microsoft.com>; linux-
> hyperv@vger.kernel.org;
> > netdev@vger.kernel.org; Paul Rosswurm <paulros@microsoft.com>
> > Cc: Dexuan Cui <decui@microsoft.com>; stephen@networkplumber.org; KY
> > Srinivasan <kys@microsoft.com>; olaf@aepfle.de; vkuznets
> > <vkuznets@redhat.com>; davem@davemloft.net; wei.liu@kernel.org;
> > edumazet@google.com; kuba@kernel.org; pabeni@redhat.com;
> leon@kernel.org;
> > Long Li <longli@microsoft.com>; ssengar@linux.microsoft.com; linux-
> > rdma@vger.kernel.org; daniel@iogearbox.net; john.fastabend@gmail.com;
> > bpf@vger.kernel.org; ast@kernel.org; hawk@kernel.org;
> tglx@linutronix.de;
> > shradhagupta@linux.microsoft.com; linux-kernel@vger.kernel.org
> > Subject: RE: [PATCH net-next] net: mana: Add support for variable page
> > sizes of ARM64
> 
> > > > > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > > > b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > > > index 1332db9a08eb..c9df942d0d02 100644
> > > > > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > > > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > > > > @@ -182,7 +182,7 @@ int mana_gd_alloc_memory(struct gdma_context
> > *gc,
> > > > > unsigned int length,
> > > > >  dma_addr_t dma_handle;
> > > > >  void *buf;
> > > > >
> > > > > - if (length < PAGE_SIZE || !is_power_of_2(length))
> > > > > + if (length < MANA_MIN_QSIZE || !is_power_of_2(length))
> > > > >         return -EINVAL;
> > > > >
> > > > >  gmi->dev = gc->dev;
> > > > > @@ -717,7 +717,7 @@ EXPORT_SYMBOL_NS(mana_gd_destroy_dma_region,
> > NET_MANA);
> > > > >  static int mana_gd_create_dma_region(struct gdma_dev *gd,
> > > > >                          struct gdma_mem_info *gmi)
> > > > >  {
> > > > > - unsigned int num_page = gmi->length / PAGE_SIZE;
> > > > > + unsigned int num_page = gmi->length / MANA_MIN_QSIZE;
> > > >
> > > > This calculation seems a bit weird when using MANA_MIN_QSIZE. The
> > > > number of pages, and the construction of the page_addr_list array
> > > > a few lines later, seem unrelated to the concept of a minimum queue
> > > > size. Is the right concept really a "mapping chunk", and num_page
> > > > would conceptually be "num_chunks", or something like that?  Then
> > > > a queue must be at least one chunk in size, but that's derived from
> > the
> > > > chunk size, and is not the core concept.
> > >
> > > I think calling it "num_chunks" is fine.
> > > May I use "num_chunks" in next version?
> > >
> >
> > I think the first decision is what to use for MANA_MIN_QSIZE. I'm
> > admittedly not familiar with mana and gdma, but the function
> > mana_gd_create_dma_region() seems fairly typical in defining a
> > logical region that spans multiple 4K chunks that may or may not
> > be physically contiguous.  So you set up an array of physical
> > addresses that identify the physical memory location of each chunk.
> > It seems very similar to a Hyper-V GPADL. I typically *do* see such
> > chunks called "pages", but a "mapping chunk" or "mapping unit"
> > is probably OK too.  So mana_gd_create_dma_region() would use
> > MANA_CHUNK_SIZE instead of MANA_MIN_QSIZE.  Then you could
> > also define MANA_MIN_QSIZE to be MANA_CHUNK_SIZE, for use
> > specifically when checking the size of a queue.
> >
> > Then if you are using MANA_CHUNK_SIZE, the local variable
> > would be "num_chunks".  The use of "page_count", "page_addr_list",
> > and "offset_in_page" field names in struct
> > gdma_create_dma_region_req should then be changed as well.
> 
> I'm fine with these names. I will check with Paul too.
> 
> I'd define just one macro, with a code comment like this. It will
> be used for both the chunk calculation and the queue length check:
> /* Chunk size of MANA DMA, which is also the min queue size */
> #define MANA_CHUNK_SIZE 4096
> 
> 

After further discussion with Paul, and reading the documents, we
decided to use MANA_PAGE_SIZE for the DMA unit calculations etc.,
and a separate macro MANA_MIN_QSIZE for the queue length checking.
This is also what Michael initially suggested.

Thanks,
- Haiyang
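
For context, a minimal sketch of how that split might look. The names
follow this thread, but the exact values, the comments, and the
ALIGN-based helper are assumptions until the v2 patch is posted:

	/* Hypothetical sketch, not the posted patch: the HW-defined 4K
	 * DMA unit (MANA_PAGE_SIZE) kept separate from the minimum
	 * queue size check (MANA_MIN_QSIZE).
	 */
	#define MANA_PAGE_SHIFT		12
	#define MANA_PAGE_SIZE		(1 << MANA_PAGE_SHIFT)
	#define MANA_PAGE_ALIGN(x)	ALIGN((x), MANA_PAGE_SIZE)

	/* Queues must still be at least one HW page and a power of 2 */
	#define MANA_MIN_QSIZE		MANA_PAGE_SIZE

With that split, mana_gd_create_dma_region() would size its address
list in MANA_PAGE_SIZE units (num_page = gmi->length / MANA_PAGE_SIZE),
while the queue-size sanity checks keep using MANA_MIN_QSIZE.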
Patch

diff --git a/drivers/net/ethernet/microsoft/Kconfig b/drivers/net/ethernet/microsoft/Kconfig
index 286f0d5697a1..901fbffbf718 100644
--- a/drivers/net/ethernet/microsoft/Kconfig
+++ b/drivers/net/ethernet/microsoft/Kconfig
@@ -18,7 +18,7 @@  if NET_VENDOR_MICROSOFT
 config MICROSOFT_MANA
 	tristate "Microsoft Azure Network Adapter (MANA) support"
 	depends on PCI_MSI
-	depends on X86_64 || (ARM64 && !CPU_BIG_ENDIAN && ARM64_4K_PAGES)
+	depends on X86_64 || (ARM64 && !CPU_BIG_ENDIAN)
 	depends on PCI_HYPERV
 	select AUXILIARY_BUS
 	select PAGE_POOL
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 1332db9a08eb..c9df942d0d02 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -182,7 +182,7 @@  int mana_gd_alloc_memory(struct gdma_context *gc, unsigned int length,
 	dma_addr_t dma_handle;
 	void *buf;
 
-	if (length < PAGE_SIZE || !is_power_of_2(length))
+	if (length < MANA_MIN_QSIZE || !is_power_of_2(length))
 		return -EINVAL;
 
 	gmi->dev = gc->dev;
@@ -717,7 +717,7 @@  EXPORT_SYMBOL_NS(mana_gd_destroy_dma_region, NET_MANA);
 static int mana_gd_create_dma_region(struct gdma_dev *gd,
 				     struct gdma_mem_info *gmi)
 {
-	unsigned int num_page = gmi->length / PAGE_SIZE;
+	unsigned int num_page = gmi->length / MANA_MIN_QSIZE;
 	struct gdma_create_dma_region_req *req = NULL;
 	struct gdma_create_dma_region_resp resp = {};
 	struct gdma_context *gc = gd->gdma_context;
@@ -727,7 +727,7 @@  static int mana_gd_create_dma_region(struct gdma_dev *gd,
 	int err;
 	int i;
 
-	if (length < PAGE_SIZE || !is_power_of_2(length))
+	if (length < MANA_MIN_QSIZE || !is_power_of_2(length))
 		return -EINVAL;
 
 	if (offset_in_page(gmi->virt_addr) != 0)
@@ -751,7 +751,7 @@  static int mana_gd_create_dma_region(struct gdma_dev *gd,
 	req->page_addr_list_len = num_page;
 
 	for (i = 0; i < num_page; i++)
-		req->page_addr_list[i] = gmi->dma_handle +  i * PAGE_SIZE;
+		req->page_addr_list[i] = gmi->dma_handle +  i * MANA_MIN_QSIZE;
 
 	err = mana_gd_send_request(gc, req_msg_size, req, sizeof(resp), &resp);
 	if (err)
diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
index bbc4f9e16c98..038dc31e09cd 100644
--- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
@@ -362,12 +362,12 @@  static int mana_hwc_create_cq(struct hw_channel_context *hwc, u16 q_depth,
 	int err;
 
 	eq_size = roundup_pow_of_two(GDMA_EQE_SIZE * q_depth);
-	if (eq_size < MINIMUM_SUPPORTED_PAGE_SIZE)
-		eq_size = MINIMUM_SUPPORTED_PAGE_SIZE;
+	if (eq_size < MANA_MIN_QSIZE)
+		eq_size = MANA_MIN_QSIZE;
 
 	cq_size = roundup_pow_of_two(GDMA_CQE_SIZE * q_depth);
-	if (cq_size < MINIMUM_SUPPORTED_PAGE_SIZE)
-		cq_size = MINIMUM_SUPPORTED_PAGE_SIZE;
+	if (cq_size < MANA_MIN_QSIZE)
+		cq_size = MANA_MIN_QSIZE;
 
 	hwc_cq = kzalloc(sizeof(*hwc_cq), GFP_KERNEL);
 	if (!hwc_cq)
@@ -429,7 +429,7 @@  static int mana_hwc_alloc_dma_buf(struct hw_channel_context *hwc, u16 q_depth,
 
 	dma_buf->num_reqs = q_depth;
 
-	buf_size = PAGE_ALIGN(q_depth * max_msg_size);
+	buf_size = MANA_MIN_QALIGN(q_depth * max_msg_size);
 
 	gmi = &dma_buf->mem_info;
 	err = mana_gd_alloc_memory(gc, buf_size, gmi);
@@ -497,8 +497,8 @@  static int mana_hwc_create_wq(struct hw_channel_context *hwc,
 	else
 		queue_size = roundup_pow_of_two(GDMA_MAX_SQE_SIZE * q_depth);
 
-	if (queue_size < MINIMUM_SUPPORTED_PAGE_SIZE)
-		queue_size = MINIMUM_SUPPORTED_PAGE_SIZE;
+	if (queue_size < MANA_MIN_QSIZE)
+		queue_size = MANA_MIN_QSIZE;
 
 	hwc_wq = kzalloc(sizeof(*hwc_wq), GFP_KERNEL);
 	if (!hwc_wq)
@@ -628,10 +628,10 @@  static int mana_hwc_establish_channel(struct gdma_context *gc, u16 *q_depth,
 	init_completion(&hwc->hwc_init_eqe_comp);
 
 	err = mana_smc_setup_hwc(&gc->shm_channel, false,
-				 eq->mem_info.dma_handle,
-				 cq->mem_info.dma_handle,
-				 rq->mem_info.dma_handle,
-				 sq->mem_info.dma_handle,
+				 virt_to_phys(eq->mem_info.virt_addr),
+				 virt_to_phys(cq->mem_info.virt_addr),
+				 virt_to_phys(rq->mem_info.virt_addr),
+				 virt_to_phys(sq->mem_info.virt_addr),
 				 eq->eq.msix_index);
 	if (err)
 		return err;
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index d087cf954f75..6a891dbce686 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1889,10 +1889,10 @@  static int mana_create_txq(struct mana_port_context *apc,
 	 *  to prevent overflow.
 	 */
 	txq_size = MAX_SEND_BUFFERS_PER_QUEUE * 32;
-	BUILD_BUG_ON(!PAGE_ALIGNED(txq_size));
+	BUILD_BUG_ON(!MANA_MIN_QALIGNED(txq_size));
 
 	cq_size = MAX_SEND_BUFFERS_PER_QUEUE * COMP_ENTRY_SIZE;
-	cq_size = PAGE_ALIGN(cq_size);
+	cq_size = MANA_MIN_QALIGN(cq_size);
 
 	gc = gd->gdma_context;
 
@@ -2189,8 +2189,8 @@  static struct mana_rxq *mana_create_rxq(struct mana_port_context *apc,
 	if (err)
 		goto out;
 
-	rq_size = PAGE_ALIGN(rq_size);
-	cq_size = PAGE_ALIGN(cq_size);
+	rq_size = MANA_MIN_QALIGN(rq_size);
+	cq_size = MANA_MIN_QALIGN(cq_size);
 
 	/* Create RQ */
 	memset(&spec, 0, sizeof(spec));
diff --git a/drivers/net/ethernet/microsoft/mana/shm_channel.c b/drivers/net/ethernet/microsoft/mana/shm_channel.c
index 5553af9c8085..9a54a163d8d1 100644
--- a/drivers/net/ethernet/microsoft/mana/shm_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/shm_channel.c
@@ -6,6 +6,7 @@ 
 #include <linux/io.h>
 #include <linux/mm.h>
 
+#include <net/mana/gdma.h>
 #include <net/mana/shm_channel.h>
 
 #define PAGE_FRAME_L48_WIDTH_BYTES 6
@@ -183,7 +184,7 @@  int mana_smc_setup_hwc(struct shm_channel *sc, bool reset_vf, u64 eq_addr,
 
 	/* EQ addr: low 48 bits of frame address */
 	shmem = (u64 *)ptr;
-	frame_addr = PHYS_PFN(eq_addr);
+	frame_addr = MANA_PFN(eq_addr);
 	*shmem = frame_addr & PAGE_FRAME_L48_MASK;
 	all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
 		(frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
@@ -191,7 +192,7 @@  int mana_smc_setup_hwc(struct shm_channel *sc, bool reset_vf, u64 eq_addr,
 
 	/* CQ addr: low 48 bits of frame address */
 	shmem = (u64 *)ptr;
-	frame_addr = PHYS_PFN(cq_addr);
+	frame_addr = MANA_PFN(cq_addr);
 	*shmem = frame_addr & PAGE_FRAME_L48_MASK;
 	all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
 		(frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
@@ -199,7 +200,7 @@  int mana_smc_setup_hwc(struct shm_channel *sc, bool reset_vf, u64 eq_addr,
 
 	/* RQ addr: low 48 bits of frame address */
 	shmem = (u64 *)ptr;
-	frame_addr = PHYS_PFN(rq_addr);
+	frame_addr = MANA_PFN(rq_addr);
 	*shmem = frame_addr & PAGE_FRAME_L48_MASK;
 	all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
 		(frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
@@ -207,7 +208,7 @@  int mana_smc_setup_hwc(struct shm_channel *sc, bool reset_vf, u64 eq_addr,
 
 	/* SQ addr: low 48 bits of frame address */
 	shmem = (u64 *)ptr;
-	frame_addr = PHYS_PFN(sq_addr);
+	frame_addr = MANA_PFN(sq_addr);
 	*shmem = frame_addr & PAGE_FRAME_L48_MASK;
 	all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
 		(frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 27684135bb4d..b392559c33e9 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -224,7 +224,12 @@  struct gdma_dev {
 	struct auxiliary_device *adev;
 };
 
-#define MINIMUM_SUPPORTED_PAGE_SIZE PAGE_SIZE
+/* These are defined by HW */
+#define MANA_MIN_QSHIFT 12
+#define MANA_MIN_QSIZE (1 << MANA_MIN_QSHIFT)
+#define MANA_MIN_QALIGN(x) ALIGN((x), MANA_MIN_QSIZE)
+#define MANA_MIN_QALIGNED(addr) IS_ALIGNED((unsigned long)(addr), MANA_MIN_QSIZE)
+#define MANA_PFN(a) (PHYS_PFN(a) << (PAGE_SHIFT - MANA_MIN_QSHIFT))
 
 #define GDMA_CQE_SIZE 64
 #define GDMA_EQE_SIZE 16
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 561f6719fb4e..43e8fc574354 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -42,7 +42,8 @@  enum TRI_STATE {
 
 #define MAX_SEND_BUFFERS_PER_QUEUE 256
 
-#define EQ_SIZE (8 * PAGE_SIZE)
+#define EQ_SIZE (8 * MANA_MIN_QSIZE)
+
 #define LOG2_EQ_THROTTLE 3
 
 #define MAX_PORTS_IN_MANA_DEV 256
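
For readers working through the MANA_PFN() change above, a small
self-contained sketch (user-space stand-ins, values assumed) of the
frame-number arithmetic on a hypothetical 64K-page ARM64 kernel:

	#include <stdio.h>

	/* Stand-ins for the kernel macros, assuming PAGE_SHIFT = 16
	 * (64K kernel pages) and 4K hardware frames.
	 */
	#define PAGE_SHIFT	16
	#define PHYS_PFN(a)	((a) >> PAGE_SHIFT)
	#define MANA_MIN_QSHIFT	12
	#define MANA_PFN(a)	(PHYS_PFN(a) << (PAGE_SHIFT - MANA_MIN_QSHIFT))

	int main(void)
	{
		unsigned long long pa = 0x12345670000ULL; /* example PA */

		/* PHYS_PFN(pa) = pa >> 16 = 0x1234567; shifting back up
		 * by (16 - 12) = 4 bits gives the 4K frame 0x12345670,
		 * i.e. pa >> 12 for any kernel-page-aligned address.
		 */
		printf("MANA_PFN = %#llx\n",
		       (unsigned long long)MANA_PFN(pa));
		return 0;
	}

This appears to be why PHYS_PFN() alone no longer suffices once
PAGE_SHIFT can exceed the hardware's 12-bit frame shift: the shared
memory channel wants 4K frame numbers regardless of the kernel page
size.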