diff mbox series

[RFC,2/2] xen/grant-table: Use unpopulated DMAable pages instead of real RAM ones

Message ID 1652810658-27810-3-git-send-email-olekstysh@gmail.com (mailing list archive)
State New
Headers show
Series Ability to allocate DMAable pages using unpopulated-alloc | expand

Commit Message

Oleksandr May 17, 2022, 6:04 p.m. UTC
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Depends on CONFIG_XEN_UNPOPULATED_ALLOC. If enabled then unpopulated
DMAable (contiguous) pages will be allocated for grant mapping into
instead of ballooning out real RAM pages.

TODO: Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages()
fails.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
 drivers/xen/grant-table.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

Comments

Stefano Stabellini June 3, 2022, 9:19 p.m. UTC | #1
On Tue, 17 May 2022, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> Depends on CONFIG_XEN_UNPOPULATED_ALLOC. If enabled then unpopulated
> DMAable (contiguous) pages will be allocated for grant mapping into
> instead of ballooning out real RAM pages.
> 
> TODO: Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages()
> fails.
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
>  drivers/xen/grant-table.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index 8ccccac..2bb4392 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -864,6 +864,25 @@ EXPORT_SYMBOL_GPL(gnttab_free_pages);
>   */
>  int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
>  {
> +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
> +	int ret;

This is an alternative implementation of the same function. If we are
going to use #ifdef, then I would #ifdef the entire function, rather
than just the body. Otherwise within the function body we can use
IS_ENABLED.


> +	ret = xen_alloc_unpopulated_dma_pages(args->dev, args->nr_pages,
> +			args->pages);
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = gnttab_pages_set_private(args->nr_pages, args->pages);
> +	if (ret < 0) {
> +		gnttab_dma_free_pages(args);

it should xen_free_unpopulated_dma_pages ?


> +		return ret;
> +	}
> +
> +	args->vaddr = page_to_virt(args->pages[0]);
> +	args->dev_bus_addr = page_to_phys(args->pages[0]);

There are two things to note here. 

The first thing to note is that normally we would call pfn_to_bfn to
retrieve the dev_bus_addr of a page because pfn_to_bfn takes into
account foreign mappings. However, these are freshly allocated pages
without foreign mappings, so page_to_phys/dma should be sufficient.


The second has to do with physical addresses and DMA addresses. The
functions are called gnttab_dma_alloc_pages and
xen_alloc_unpopulated_dma_pages which make you think we are retrieving a
DMA address here. However, to get a DMA address we need to call
page_to_dma rather than page_to_phys.

page_to_dma takes into account special offsets that some devices have
when accessing memory. There are real cases on ARM where the physical
address != DMA address, e.g. RPi4.

However, to call page_to_dma you need to specify as first argument the
DMA-capable device that is expected to use those pages for DMA (e.g. an
ethernet device or a MMC controller.) While the args->dev we have in
gnttab_dma_alloc_pages is the gntdev_miscdev.

So this interface cannot actually be used to allocate memory that is
supposed to be DMA-able by a DMA-capable device, such as an ethernet
device.

But I think that should be fine because the memory is meant to be used
by a userspace PV backend for grant mappings. If any of those mappings
end up being used for actual DMA in the kernel they should go through the
drivers/xen/swiotlb-xen.c and xen_phys_to_dma should be called, which
ends up calling page_to_dma as appropriate.

It would be good to double-check that the above is correct and, if so,
maybe add a short in-code comment about it:

/*
 * These are not actually DMA addresses but regular physical addresses.
 * If these pages end up being used in a DMA operation then the
 * swiotlb-xen functions are called and xen_phys_to_dma takes care of
 * the address translations:
 *
 * - from gfn to bfn in case of foreign mappings
 * - from physical to DMA addresses in case the two are different for a
 *   given DMA-mastering device
 */



> +	return ret;
> +#else
>  	unsigned long pfn, start_pfn;
>  	size_t size;
>  	int i, ret;
> @@ -910,6 +929,7 @@ int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
>  fail:
>  	gnttab_dma_free_pages(args);
>  	return ret;
> +#endif
>  }
>  EXPORT_SYMBOL_GPL(gnttab_dma_alloc_pages);
>  
> @@ -919,6 +939,12 @@ EXPORT_SYMBOL_GPL(gnttab_dma_alloc_pages);
>   */
>  int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
>  {
> +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
> +	gnttab_pages_clear_private(args->nr_pages, args->pages);
> +	xen_free_unpopulated_dma_pages(args->dev, args->nr_pages, args->pages);
> +
> +	return 0;
> +#else
>  	size_t size;
>  	int i, ret;
>  
> @@ -946,6 +972,7 @@ int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
>  		dma_free_wc(args->dev, size,
>  			    args->vaddr, args->dev_bus_addr);
>  	return ret;
> +#endif
>  }
>  EXPORT_SYMBOL_GPL(gnttab_dma_free_pages);
>  #endif
> -- 
> 2.7.4
>
Oleksandr June 9, 2022, 6:39 p.m. UTC | #2
On 04.06.22 00:19, Stefano Stabellini wrote:


Hello Stefano

Thank you for having a look and sorry for the late response.

> On Tue, 17 May 2022, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> Depends on CONFIG_XEN_UNPOPULATED_ALLOC. If enabled then unpopulated
>> DMAable (contiguous) pages will be allocated for grant mapping into
>> instead of ballooning out real RAM pages.
>>
>> TODO: Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages()
>> fails.
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>>   drivers/xen/grant-table.c | 27 +++++++++++++++++++++++++++
>>   1 file changed, 27 insertions(+)
>>
>> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
>> index 8ccccac..2bb4392 100644
>> --- a/drivers/xen/grant-table.c
>> +++ b/drivers/xen/grant-table.c
>> @@ -864,6 +864,25 @@ EXPORT_SYMBOL_GPL(gnttab_free_pages);
>>    */
>>   int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
>>   {
>> +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>> +	int ret;
> This is an alternative implementation of the same function.

Currently, yes.


>   If we are
> going to use #ifdef, then I would #ifdef the entire function, rather
> than just the body. Otherwise within the function body we can use
> IS_ENABLED.


Good point. Note, there is one missing thing in current patch which is 
described in TODO.

"Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages() 
fails."  So I will likely use IS_ENABLED within the function body.

If CONFIG_XEN_UNPOPULATED_ALLOC is enabled then gnttab_dma_alloc_pages() 
will try to call xen_alloc_unpopulated_dma_pages() the first and if 
fails then fallback to allocate RAM pages and balloon them out.

One moment is not entirely clear to me. If we use fallback in 
gnttab_dma_alloc_pages() then we must use fallback in 
gnttab_dma_free_pages() as well, we cannot use 
xen_free_unpopulated_dma_pages() for real RAM pages. The question is how 
to pass this information to the gnttab_dma_free_pages()? The first idea 
which comes to mind is to add a flag to struct gnttab_dma_alloc_args...


>
>> +	ret = xen_alloc_unpopulated_dma_pages(args->dev, args->nr_pages,
>> +			args->pages);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	ret = gnttab_pages_set_private(args->nr_pages, args->pages);
>> +	if (ret < 0) {
>> +		gnttab_dma_free_pages(args);
> it should xen_free_unpopulated_dma_pages ?

Besides calling the xen_free_unpopulated_dma_pages(), we also need to 
call gnttab_pages_clear_private() here, this is what 
gnttab_dma_free_pages() is doing.

I can change to call both function instead:

     gnttab_pages_clear_private(args->nr_pages, args->pages);
     xen_free_unpopulated_dma_pages(args->dev, args->nr_pages, args->pages);

Shall I?


>
>
>> +		return ret;
>> +	}
>> +
>> +	args->vaddr = page_to_virt(args->pages[0]);
>> +	args->dev_bus_addr = page_to_phys(args->pages[0]);
> There are two things to note here.
>
> The first thing to note is that normally we would call pfn_to_bfn to
> retrieve the dev_bus_addr of a page because pfn_to_bfn takes into
> account foreign mappings. However, these are freshly allocated pages
> without foreign mappings, so page_to_phys/dma should be sufficient.

agree


>
>
> The second has to do with physical addresses and DMA addresses. The
> functions are called gnttab_dma_alloc_pages and
> xen_alloc_unpopulated_dma_pages which make you think we are retrieving a
> DMA address here. However, to get a DMA address we need to call
> page_to_dma rather than page_to_phys.
>
> page_to_dma takes into account special offsets that some devices have
> when accessing memory. There are real cases on ARM where the physical
> address != DMA address, e.g. RPi4.

I got it. Now I am in doubt whether it would be better to name the API:

xen_alloc_unpopulated_cma_pages()

or

xen_alloc_unpopulated_contiguous_pages()

What do you think?


>
> However, to call page_to_dma you need to specify as first argument the
> DMA-capable device that is expected to use those pages for DMA (e.g. an
> ethernet device or a MMC controller.) While the args->dev we have in
> gnttab_dma_alloc_pages is the gntdev_miscdev.

agree

As I understand, at this time it is unknown for what exactly device 
these pages are supposed to be used at the end.

For now, it is only known that these pages to be used by userspace PV 
backend for grant mappings.


>
> So this interface cannot actually be used to allocate memory that is
> supposed to be DMA-able by a DMA-capable device, such as an ethernet
> device.

agree


>
> But I think that should be fine because the memory is meant to be used
> by a userspace PV backend for grant mappings. If any of those mappings
> end up being used for actual DMA in the kernel they should go through the
> drivers/xen/swiotlb-xen.c and xen_phys_to_dma should be called, which
> ends up calling page_to_dma as appropriate.
>
> It would be good to double-check that the above is correct and, if so,
> maybe add a short in-code comment about it:
>
> /*
>   * These are not actually DMA addresses but regular physical addresses.
>   * If these pages end up being used in a DMA operation then the
>   * swiotlb-xen functions are called and xen_phys_to_dma takes care of
>   * the address translations:
>   *
>   * - from gfn to bfn in case of foreign mappings
>   * - from physical to DMA addresses in case the two are different for a
>   *   given DMA-mastering device
>   */

I agree this needs to be re-checked. But, there is one moment here, if 
userspace PV backend runs in other than Dom0 domain (non 1:1 mapped 
domain), the xen-swiotlb seems not to be in use then? How to be in this 
case?


>
>
>
>> +	return ret;
>> +#else
>>   	unsigned long pfn, start_pfn;
>>   	size_t size;
>>   	int i, ret;
>> @@ -910,6 +929,7 @@ int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
>>   fail:
>>   	gnttab_dma_free_pages(args);
>>   	return ret;
>> +#endif
>>   }
>>   EXPORT_SYMBOL_GPL(gnttab_dma_alloc_pages);
>>   
>> @@ -919,6 +939,12 @@ EXPORT_SYMBOL_GPL(gnttab_dma_alloc_pages);
>>    */
>>   int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
>>   {
>> +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>> +	gnttab_pages_clear_private(args->nr_pages, args->pages);
>> +	xen_free_unpopulated_dma_pages(args->dev, args->nr_pages, args->pages);
>> +
>> +	return 0;
>> +#else
>>   	size_t size;
>>   	int i, ret;
>>   
>> @@ -946,6 +972,7 @@ int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
>>   		dma_free_wc(args->dev, size,
>>   			    args->vaddr, args->dev_bus_addr);
>>   	return ret;
>> +#endif
>>   }
>>   EXPORT_SYMBOL_GPL(gnttab_dma_free_pages);
>>   #endif
>> -- 
>> 2.7.4
>>
Stefano Stabellini June 10, 2022, 11:55 p.m. UTC | #3
On Thu, 9 Jun 2022, Oleksandr wrote:
> On 04.06.22 00:19, Stefano Stabellini wrote:
> Hello Stefano
> 
> Thank you for having a look and sorry for the late response.
> 
> > On Tue, 17 May 2022, Oleksandr Tyshchenko wrote:
> > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > 
> > > Depends on CONFIG_XEN_UNPOPULATED_ALLOC. If enabled then unpopulated
> > > DMAable (contiguous) pages will be allocated for grant mapping into
> > > instead of ballooning out real RAM pages.
> > > 
> > > TODO: Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages()
> > > fails.
> > > 
> > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > ---
> > >   drivers/xen/grant-table.c | 27 +++++++++++++++++++++++++++
> > >   1 file changed, 27 insertions(+)
> > > 
> > > diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> > > index 8ccccac..2bb4392 100644
> > > --- a/drivers/xen/grant-table.c
> > > +++ b/drivers/xen/grant-table.c
> > > @@ -864,6 +864,25 @@ EXPORT_SYMBOL_GPL(gnttab_free_pages);
> > >    */
> > >   int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
> > >   {
> > > +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
> > > +	int ret;
> > This is an alternative implementation of the same function.
> 
> Currently, yes.
> 
> 
> >   If we are
> > going to use #ifdef, then I would #ifdef the entire function, rather
> > than just the body. Otherwise within the function body we can use
> > IS_ENABLED.
> 
> 
> Good point. Note, there is one missing thing in current patch which is
> described in TODO.
> 
> "Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages() fails."  So I
> will likely use IS_ENABLED within the function body.
> 
> If CONFIG_XEN_UNPOPULATED_ALLOC is enabled then gnttab_dma_alloc_pages() will
> try to call xen_alloc_unpopulated_dma_pages() the first and if fails then
> fallback to allocate RAM pages and balloon them out.
> 
> One moment is not entirely clear to me. If we use fallback in
> gnttab_dma_alloc_pages() then we must use fallback in gnttab_dma_free_pages()
> as well, we cannot use xen_free_unpopulated_dma_pages() for real RAM pages.
> The question is how to pass this information to the gnttab_dma_free_pages()?
> The first idea which comes to mind is to add a flag to struct
> gnttab_dma_alloc_args...
 
You can check if the page is within the mhp_range range or part of
iomem_resource? If not, you can free it as a normal page.

If we do this, then the fallback is better implemented in
unpopulated-alloc.c because that is the one that is aware about
page addresses.

 
 
> > > +	ret = xen_alloc_unpopulated_dma_pages(args->dev, args->nr_pages,
> > > +			args->pages);
> > > +	if (ret < 0)
> > > +		return ret;
> > > +
> > > +	ret = gnttab_pages_set_private(args->nr_pages, args->pages);
> > > +	if (ret < 0) {
> > > +		gnttab_dma_free_pages(args);
> > it should xen_free_unpopulated_dma_pages ?
> 
> Besides calling the xen_free_unpopulated_dma_pages(), we also need to call
> gnttab_pages_clear_private() here, this is what gnttab_dma_free_pages() is
> doing.
> 
> I can change to call both function instead:
> 
>     gnttab_pages_clear_private(args->nr_pages, args->pages);
>     xen_free_unpopulated_dma_pages(args->dev, args->nr_pages, args->pages);
> 
> Shall I?

No, leave it as is. I didn't realize that gnttab_pages_set_private can
fail half-way through.

 
> > 
> > 
> > > +		return ret;
> > > +	}
> > > +
> > > +	args->vaddr = page_to_virt(args->pages[0]);
> > > +	args->dev_bus_addr = page_to_phys(args->pages[0]);
> > There are two things to note here.
> > 
> > The first thing to note is that normally we would call pfn_to_bfn to
> > retrieve the dev_bus_addr of a page because pfn_to_bfn takes into
> > account foreign mappings. However, these are freshly allocated pages
> > without foreign mappings, so page_to_phys/dma should be sufficient.
> 
> agree
> 
> 
> > 
> > 
> > The second has to do with physical addresses and DMA addresses. The
> > functions are called gnttab_dma_alloc_pages and
> > xen_alloc_unpopulated_dma_pages which make you think we are retrieving a
> > DMA address here. However, to get a DMA address we need to call
> > page_to_dma rather than page_to_phys.
> > 
> > page_to_dma takes into account special offsets that some devices have
> > when accessing memory. There are real cases on ARM where the physical
> > address != DMA address, e.g. RPi4.
> 
> I got it. Now I am in doubt whether it would be better to name the API:
> 
> xen_alloc_unpopulated_cma_pages()
> 
> or
> 
> xen_alloc_unpopulated_contiguous_pages()
> 
> What do you think?

Yeah actually I think it is better to stay away from "dma" in the name.
I like xen_alloc_unpopulated_contiguous_pages().
 
 
> > However, to call page_to_dma you need to specify as first argument the
> > DMA-capable device that is expected to use those pages for DMA (e.g. an
> > ethernet device or a MMC controller.) While the args->dev we have in
> > gnttab_dma_alloc_pages is the gntdev_miscdev.
> 
> agree
> 
> As I understand, at this time it is unknown for what exactly device these
> pages are supposed to be used at the end.
> 
> For now, it is only known that these pages to be used by userspace PV backend
> for grant mappings.

Yeah
 

> > So this interface cannot actually be used to allocate memory that is
> > supposed to be DMA-able by a DMA-capable device, such as an ethernet
> > device.
> 
> agree
> 
> 
> > 
> > But I think that should be fine because the memory is meant to be used
> > by a userspace PV backend for grant mappings. If any of those mappings
> > end up being used for actual DMA in the kernel they should go through the
> > drivers/xen/swiotlb-xen.c and xen_phys_to_dma should be called, which
> > ends up calling page_to_dma as appropriate.
> > 
> > It would be good to double-check that the above is correct and, if so,
> > maybe add a short in-code comment about it:
> > 
> > /*
> >   * These are not actually DMA addresses but regular physical addresses.
> >   * If these pages end up being used in a DMA operation then the
> >   * swiotlb-xen functions are called and xen_phys_to_dma takes care of
> >   * the address translations:
> >   *
> >   * - from gfn to bfn in case of foreign mappings
> >   * - from physical to DMA addresses in case the two are different for a
> >   *   given DMA-mastering device
> >   */
> 
> I agree this needs to be re-checked. But, there is one moment here, if
> userspace PV backend runs in other than Dom0 domain (non 1:1 mapped domain),
> the xen-swiotlb seems not to be in use then? How to be in this case?
 
In that case, an IOMMU is required. If an IOMMU is setup correct, then
the gfn->bfn translation is not necessary because it is done
automatically by the IOMMU. That is because when the foreign page is
mapped in the domain, the mapping also applies to the IOMMU pagetable.

So the device is going to do DMA to "gfn" and the IOMMU will translate
it to the right "mfn", the one corresponding to "bfn".

The physical to DMA address should be done automatically by the default
(non-swiotlb_xen) dma_ops in Linux. E.g.
kernel/dma/direct.c:dma_direct_map_sg correctly calls
dma_direct_map_page, which calls phys_to_dma.
 
 
 
> > > +	return ret;
> > > +#else
> > >   	unsigned long pfn, start_pfn;
> > >   	size_t size;
> > >   	int i, ret;
> > > @@ -910,6 +929,7 @@ int gnttab_dma_alloc_pages(struct
> > > gnttab_dma_alloc_args *args)
> > >   fail:
> > >   	gnttab_dma_free_pages(args);
> > >   	return ret;
> > > +#endif
> > >   }
> > >   EXPORT_SYMBOL_GPL(gnttab_dma_alloc_pages);
> > >   @@ -919,6 +939,12 @@ EXPORT_SYMBOL_GPL(gnttab_dma_alloc_pages);
> > >    */
> > >   int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
> > >   {
> > > +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
> > > +	gnttab_pages_clear_private(args->nr_pages, args->pages);
> > > +	xen_free_unpopulated_dma_pages(args->dev, args->nr_pages,
> > > args->pages);
> > > +
> > > +	return 0;
> > > +#else
> > >   	size_t size;
> > >   	int i, ret;
> > >   @@ -946,6 +972,7 @@ int gnttab_dma_free_pages(struct
> > > gnttab_dma_alloc_args *args)
> > >   		dma_free_wc(args->dev, size,
> > >   			    args->vaddr, args->dev_bus_addr);
> > >   	return ret;
> > > +#endif
> > >   }
> > >   EXPORT_SYMBOL_GPL(gnttab_dma_free_pages);
> > >   #endif
> > > -- 
> > > 2.7.4
> > >
Oleksandr June 14, 2022, 5:53 p.m. UTC | #4
On 11.06.22 02:55, Stefano Stabellini wrote:

Hello Stefano

> On Thu, 9 Jun 2022, Oleksandr wrote:
>> On 04.06.22 00:19, Stefano Stabellini wrote:
>> Hello Stefano
>>
>> Thank you for having a look and sorry for the late response.
>>
>>> On Tue, 17 May 2022, Oleksandr Tyshchenko wrote:
>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>
>>>> Depends on CONFIG_XEN_UNPOPULATED_ALLOC. If enabled then unpopulated
>>>> DMAable (contiguous) pages will be allocated for grant mapping into
>>>> instead of ballooning out real RAM pages.
>>>>
>>>> TODO: Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages()
>>>> fails.
>>>>
>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>> ---
>>>>    drivers/xen/grant-table.c | 27 +++++++++++++++++++++++++++
>>>>    1 file changed, 27 insertions(+)
>>>>
>>>> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
>>>> index 8ccccac..2bb4392 100644
>>>> --- a/drivers/xen/grant-table.c
>>>> +++ b/drivers/xen/grant-table.c
>>>> @@ -864,6 +864,25 @@ EXPORT_SYMBOL_GPL(gnttab_free_pages);
>>>>     */
>>>>    int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
>>>>    {
>>>> +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>>>> +	int ret;
>>> This is an alternative implementation of the same function.
>> Currently, yes.
>>
>>
>>>    If we are
>>> going to use #ifdef, then I would #ifdef the entire function, rather
>>> than just the body. Otherwise within the function body we can use
>>> IS_ENABLED.
>>
>> Good point. Note, there is one missing thing in current patch which is
>> described in TODO.
>>
>> "Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages() fails."  So I
>> will likely use IS_ENABLED within the function body.
>>
>> If CONFIG_XEN_UNPOPULATED_ALLOC is enabled then gnttab_dma_alloc_pages() will
>> try to call xen_alloc_unpopulated_dma_pages() the first and if fails then
>> fallback to allocate RAM pages and balloon them out.
>>
>> One moment is not entirely clear to me. If we use fallback in
>> gnttab_dma_alloc_pages() then we must use fallback in gnttab_dma_free_pages()
>> as well, we cannot use xen_free_unpopulated_dma_pages() for real RAM pages.
>> The question is how to pass this information to the gnttab_dma_free_pages()?
>> The first idea which comes to mind is to add a flag to struct
>> gnttab_dma_alloc_args...
>   
> You can check if the page is within the mhp_range range or part of
> iomem_resource? If not, you can free it as a normal page.
>
> If we do this, then the fallback is better implemented in
> unpopulated-alloc.c because that is the one that is aware about
> page addresses.


I got your idea and agree this can work technically. Or if we finally 
decide to use the second option (use "dma_pool" for all) in the first patch
"[RFC PATCH 1/2] xen/unpopulated-alloc: Introduce helpers for DMA 
allocations" then we will likely be able to check whether a page in question
is within a "dma_pool" using gen_pool_has_addr().

I am still wondering, can we avoid the fallback implementation in 
unpopulated-alloc.c.
Because for that purpose we will need to pull more code into 
unpopulated-alloc.c (to be more precise, almost everything which 
gnttab_dma_free_pages() already has except gnttab_pages_clear_private()) 
and pass more arguments to xen_free_unpopulated_dma_pages(). Also I 
might mistake, but having a fallback split between grant-table.c (to 
allocate RAM pages in gnttab_dma_alloc_pages()) and unpopulated-alloc.c 
(to free RAM pages in xen_free_unpopulated_dma_pages()) would look a bit 
weird.

I see two possible options for the fallback implementation in grant-table.c:
1. (less preferable) by introducing new flag in struct gnttab_dma_alloc_args
2. (more preferable) by guessing unpopulated (non real RAM) page using 
is_zone_device_page(), etc


For example, with the second option the resulting code will look quite 
simple (only build tested):

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 738029d..3bda71f 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -1047,6 +1047,23 @@ int gnttab_dma_alloc_pages(struct 
gnttab_dma_alloc_args *args)
         size_t size;
         int i, ret;

+       if (IS_ENABLED(CONFIG_XEN_UNPOPULATED_ALLOC)) {
+               ret = xen_alloc_unpopulated_dma_pages(args->dev, 
args->nr_pages,
+                               args->pages);
+               if (ret < 0)
+                       goto fallback;
+
+               ret = gnttab_pages_set_private(args->nr_pages, args->pages);
+               if (ret < 0)
+                       goto fail;
+
+               args->vaddr = page_to_virt(args->pages[0]);
+               args->dev_bus_addr = page_to_phys(args->pages[0]);
+
+               return ret;
+       }
+
+fallback:
         size = args->nr_pages << PAGE_SHIFT;
         if (args->coherent)
                 args->vaddr = dma_alloc_coherent(args->dev, size,
@@ -1103,6 +1120,12 @@ int gnttab_dma_free_pages(struct 
gnttab_dma_alloc_args *args)

         gnttab_pages_clear_private(args->nr_pages, args->pages);

+       if (IS_ENABLED(CONFIG_XEN_UNPOPULATED_ALLOC) &&
+                       is_zone_device_page(args->pages[0])) {
+               xen_free_unpopulated_dma_pages(args->dev, 
args->nr_pages, args->pages);
+               return 0;
+       }
+
         for (i = 0; i < args->nr_pages; i++)
                 args->frames[i] = page_to_xen_pfn(args->pages[i]);


What do you think?


>
>   
>   
>>>> +	ret = xen_alloc_unpopulated_dma_pages(args->dev, args->nr_pages,
>>>> +			args->pages);
>>>> +	if (ret < 0)
>>>> +		return ret;
>>>> +
>>>> +	ret = gnttab_pages_set_private(args->nr_pages, args->pages);
>>>> +	if (ret < 0) {
>>>> +		gnttab_dma_free_pages(args);
>>> it should xen_free_unpopulated_dma_pages ?
>> Besides calling the xen_free_unpopulated_dma_pages(), we also need to call
>> gnttab_pages_clear_private() here, this is what gnttab_dma_free_pages() is
>> doing.
>>
>> I can change to call both function instead:
>>
>>      gnttab_pages_clear_private(args->nr_pages, args->pages);
>>      xen_free_unpopulated_dma_pages(args->dev, args->nr_pages, args->pages);
>>
>> Shall I?
> No, leave it as is. I didn't realize that gnttab_pages_set_private can
> fail half-way through.


ok, thank you for the confirmation.


>
>   
>>>
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	args->vaddr = page_to_virt(args->pages[0]);
>>>> +	args->dev_bus_addr = page_to_phys(args->pages[0]);
>>> There are two things to note here.
>>>
>>> The first thing to note is that normally we would call pfn_to_bfn to
>>> retrieve the dev_bus_addr of a page because pfn_to_bfn takes into
>>> account foreign mappings. However, these are freshly allocated pages
>>> without foreign mappings, so page_to_phys/dma should be sufficient.
>> agree
>>
>>
>>>
>>> The second has to do with physical addresses and DMA addresses. The
>>> functions are called gnttab_dma_alloc_pages and
>>> xen_alloc_unpopulated_dma_pages which make you think we are retrieving a
>>> DMA address here. However, to get a DMA address we need to call
>>> page_to_dma rather than page_to_phys.
>>>
>>> page_to_dma takes into account special offsets that some devices have
>>> when accessing memory. There are real cases on ARM where the physical
>>> address != DMA address, e.g. RPi4.
>> I got it. Now I am in doubt whether it would be better to name the API:
>>
>> xen_alloc_unpopulated_cma_pages()
>>
>> or
>>
>> xen_alloc_unpopulated_contiguous_pages()
>>
>> What do you think?
> Yeah actually I think it is better to stay away from "dma" in the name.
> I like xen_alloc_unpopulated_contiguous_pages().


perfect, I will rename then, thank you for the confirmation.


>   
>   
>>> However, to call page_to_dma you need to specify as first argument the
>>> DMA-capable device that is expected to use those pages for DMA (e.g. an
>>> ethernet device or a MMC controller.) While the args->dev we have in
>>> gnttab_dma_alloc_pages is the gntdev_miscdev.
>> agree
>>
>> As I understand, at this time it is unknown for what exactly device these
>> pages are supposed to be used at the end.
>>
>> For now, it is only known that these pages to be used by userspace PV backend
>> for grant mappings.
> Yeah
>   
>
>>> So this interface cannot actually be used to allocate memory that is
>>> supposed to be DMA-able by a DMA-capable device, such as an ethernet
>>> device.
>> agree
>>
>>
>>> But I think that should be fine because the memory is meant to be used
>>> by a userspace PV backend for grant mappings. If any of those mappings
>>> end up being used for actual DMA in the kernel they should go through the
>>> drivers/xen/swiotlb-xen.c and xen_phys_to_dma should be called, which
>>> ends up calling page_to_dma as appropriate.
>>>
>>> It would be good to double-check that the above is correct and, if so,
>>> maybe add a short in-code comment about it:
>>>
>>> /*
>>>    * These are not actually DMA addresses but regular physical addresses.
>>>    * If these pages end up being used in a DMA operation then the
>>>    * swiotlb-xen functions are called and xen_phys_to_dma takes care of
>>>    * the address translations:
>>>    *
>>>    * - from gfn to bfn in case of foreign mappings
>>>    * - from physical to DMA addresses in case the two are different for a
>>>    *   given DMA-mastering device
>>>    */
>> I agree this needs to be re-checked. But, there is one moment here, if
>> userspace PV backend runs in other than Dom0 domain (non 1:1 mapped domain),
>> the xen-swiotlb seems not to be in use then? How to be in this case?
>   
> In that case, an IOMMU is required. If an IOMMU is setup correct, then
> the gfn->bfn translation is not necessary because it is done
> automatically by the IOMMU. That is because when the foreign page is
> mapped in the domain, the mapping also applies to the IOMMU pagetable.
>
> So the device is going to do DMA to "gfn" and the IOMMU will translate
> it to the right "mfn", the one corresponding to "bfn".
>
> The physical to DMA address should be done automatically by the default
> (non-swiotlb_xen) dma_ops in Linux. E.g.
> kernel/dma/direct.c:dma_direct_map_sg correctly calls
> dma_direct_map_page, which calls phys_to_dma.


Thank you for the explanation.


>   
>   
>   
>>>> +	return ret;
>>>> +#else
>>>>    	unsigned long pfn, start_pfn;
>>>>    	size_t size;
>>>>    	int i, ret;
>>>> @@ -910,6 +929,7 @@ int gnttab_dma_alloc_pages(struct
>>>> gnttab_dma_alloc_args *args)
>>>>    fail:
>>>>    	gnttab_dma_free_pages(args);
>>>>    	return ret;
>>>> +#endif
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(gnttab_dma_alloc_pages);
>>>>    @@ -919,6 +939,12 @@ EXPORT_SYMBOL_GPL(gnttab_dma_alloc_pages);
>>>>     */
>>>>    int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
>>>>    {
>>>> +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>>>> +	gnttab_pages_clear_private(args->nr_pages, args->pages);
>>>> +	xen_free_unpopulated_dma_pages(args->dev, args->nr_pages,
>>>> args->pages);
>>>> +
>>>> +	return 0;
>>>> +#else
>>>>    	size_t size;
>>>>    	int i, ret;
>>>>    @@ -946,6 +972,7 @@ int gnttab_dma_free_pages(struct
>>>> gnttab_dma_alloc_args *args)
>>>>    		dma_free_wc(args->dev, size,
>>>>    			    args->vaddr, args->dev_bus_addr);
>>>>    	return ret;
>>>> +#endif
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(gnttab_dma_free_pages);
>>>>    #endif
>>>> -- 
>>>> 2.7.4
Stefano Stabellini June 15, 2022, 12:51 a.m. UTC | #5
On Tue, 14 Jun 2022, Oleksandr wrote:
> On 11.06.22 02:55, Stefano Stabellini wrote:
> 
> Hello Stefano
> 
> > On Thu, 9 Jun 2022, Oleksandr wrote:
> > > On 04.06.22 00:19, Stefano Stabellini wrote:
> > > Hello Stefano
> > > 
> > > Thank you for having a look and sorry for the late response.
> > > 
> > > > On Tue, 17 May 2022, Oleksandr Tyshchenko wrote:
> > > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > > 
> > > > > Depends on CONFIG_XEN_UNPOPULATED_ALLOC. If enabled then unpopulated
> > > > > DMAable (contiguous) pages will be allocated for grant mapping into
> > > > > instead of ballooning out real RAM pages.
> > > > > 
> > > > > TODO: Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages()
> > > > > fails.
> > > > > 
> > > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > > ---
> > > > >    drivers/xen/grant-table.c | 27 +++++++++++++++++++++++++++
> > > > >    1 file changed, 27 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> > > > > index 8ccccac..2bb4392 100644
> > > > > --- a/drivers/xen/grant-table.c
> > > > > +++ b/drivers/xen/grant-table.c
> > > > > @@ -864,6 +864,25 @@ EXPORT_SYMBOL_GPL(gnttab_free_pages);
> > > > >     */
> > > > >    int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
> > > > >    {
> > > > > +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
> > > > > +	int ret;
> > > > This is an alternative implementation of the same function.
> > > Currently, yes.
> > > 
> > > 
> > > >    If we are
> > > > going to use #ifdef, then I would #ifdef the entire function, rather
> > > > than just the body. Otherwise within the function body we can use
> > > > IS_ENABLED.
> > > 
> > > Good point. Note, there is one missing thing in current patch which is
> > > described in TODO.
> > > 
> > > "Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages() fails." 
> > > So I
> > > will likely use IS_ENABLED within the function body.
> > > 
> > > If CONFIG_XEN_UNPOPULATED_ALLOC is enabled then gnttab_dma_alloc_pages()
> > > will
> > > try to call xen_alloc_unpopulated_dma_pages() the first and if fails then
> > > fallback to allocate RAM pages and balloon them out.
> > > 
> > > One moment is not entirely clear to me. If we use fallback in
> > > gnttab_dma_alloc_pages() then we must use fallback in
> > > gnttab_dma_free_pages()
> > > as well, we cannot use xen_free_unpopulated_dma_pages() for real RAM
> > > pages.
> > > The question is how to pass this information to the
> > > gnttab_dma_free_pages()?
> > > The first idea which comes to mind is to add a flag to struct
> > > gnttab_dma_alloc_args...
> >   You can check if the page is within the mhp_range range or part of
> > iomem_resource? If not, you can free it as a normal page.
> > 
> > If we do this, then the fallback is better implemented in
> > unpopulated-alloc.c because that is the one that is aware about
> > page addresses.
> 
> 
> I got your idea and agree this can work technically. Or if we finally decide
> to use the second option (use "dma_pool" for all) in the first patch
> "[RFC PATCH 1/2] xen/unpopulated-alloc: Introduce helpers for DMA allocations"
> then we will likely be able to check whether a page in question
> is within a "dma_pool" using gen_pool_has_addr().
> 
> I am still wondering, can we avoid the fallback implementation in
> unpopulated-alloc.c.
> Because for that purpose we will need to pull more code into
> unpopulated-alloc.c (to be more precise, almost everything which
> gnttab_dma_free_pages() already has except gnttab_pages_clear_private()) and
> pass more arguments to xen_free_unpopulated_dma_pages(). Also I might mistake,
> but having a fallback split between grant-table.c (to allocate RAM pages in
> gnttab_dma_alloc_pages()) and unpopulated-alloc.c (to free RAM pages in
> xen_free_unpopulated_dma_pages()) would look a bit weird.
> 
> I see two possible options for the fallback implementation in grant-table.c:
> 1. (less preferable) by introducing new flag in struct gnttab_dma_alloc_args
> 2. (more preferable) by guessing unpopulated (non real RAM) page using
> is_zone_device_page(), etc
> 
> 
> For example, with the second option the resulting code will look quite simple
> (only build tested):
> 
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index 738029d..3bda71f 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -1047,6 +1047,23 @@ int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args
> *args)
>         size_t size;
>         int i, ret;
> 
> +       if (IS_ENABLED(CONFIG_XEN_UNPOPULATED_ALLOC)) {
> +               ret = xen_alloc_unpopulated_dma_pages(args->dev,
> args->nr_pages,
> +                               args->pages);
> +               if (ret < 0)
> +                       goto fallback;
> +
> +               ret = gnttab_pages_set_private(args->nr_pages, args->pages);
> +               if (ret < 0)
> +                       goto fail;
> +
> +               args->vaddr = page_to_virt(args->pages[0]);
> +               args->dev_bus_addr = page_to_phys(args->pages[0]);
> +
> +               return ret;
> +       }
> +
> +fallback:
>         size = args->nr_pages << PAGE_SHIFT;
>         if (args->coherent)
>                 args->vaddr = dma_alloc_coherent(args->dev, size,
> @@ -1103,6 +1120,12 @@ int gnttab_dma_free_pages(struct gnttab_dma_alloc_args
> *args)
> 
>         gnttab_pages_clear_private(args->nr_pages, args->pages);
> 
> +       if (IS_ENABLED(CONFIG_XEN_UNPOPULATED_ALLOC) &&
> +                       is_zone_device_page(args->pages[0])) {
> +               xen_free_unpopulated_dma_pages(args->dev, args->nr_pages,
> args->pages);
> +               return 0;
> +       }
> +
>         for (i = 0; i < args->nr_pages; i++)
>                 args->frames[i] = page_to_xen_pfn(args->pages[i]);
> 
> 
> What do you think?
 
I have another idea. Why don't we introduce a function implemented in
drivers/xen/unpopulated-alloc.c called is_xen_unpopulated_page() and
call it from here? is_xen_unpopulated_page can be implemented by using
gen_pool_has_addr.
Oleksandr June 15, 2022, 4:40 p.m. UTC | #6
On 15.06.22 03:51, Stefano Stabellini wrote:

Hello Stefano

> On Tue, 14 Jun 2022, Oleksandr wrote:
>> On 11.06.22 02:55, Stefano Stabellini wrote:
>>
>> Hello Stefano
>>
>>> On Thu, 9 Jun 2022, Oleksandr wrote:
>>>> On 04.06.22 00:19, Stefano Stabellini wrote:
>>>> Hello Stefano
>>>>
>>>> Thank you for having a look and sorry for the late response.
>>>>
>>>>> On Tue, 17 May 2022, Oleksandr Tyshchenko wrote:
>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>
>>>>>> Depends on CONFIG_XEN_UNPOPULATED_ALLOC. If enabled then unpopulated
>>>>>> DMAable (contiguous) pages will be allocated for grant mapping into
>>>>>> instead of ballooning out real RAM pages.
>>>>>>
>>>>>> TODO: Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages()
>>>>>> fails.
>>>>>>
>>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>> ---
>>>>>>     drivers/xen/grant-table.c | 27 +++++++++++++++++++++++++++
>>>>>>     1 file changed, 27 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
>>>>>> index 8ccccac..2bb4392 100644
>>>>>> --- a/drivers/xen/grant-table.c
>>>>>> +++ b/drivers/xen/grant-table.c
>>>>>> @@ -864,6 +864,25 @@ EXPORT_SYMBOL_GPL(gnttab_free_pages);
>>>>>>      */
>>>>>>     int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
>>>>>>     {
>>>>>> +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>>>>>> +	int ret;
>>>>> This is an alternative implementation of the same function.
>>>> Currently, yes.
>>>>
>>>>
>>>>>     If we are
>>>>> going to use #ifdef, then I would #ifdef the entire function, rather
>>>>> than just the body. Otherwise within the function body we can use
>>>>> IS_ENABLED.
>>>> Good point. Note, there is one missing thing in current patch which is
>>>> described in TODO.
>>>>
>>>> "Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages() fails."
>>>> So I
>>>> will likely use IS_ENABLED within the function body.
>>>>
>>>> If CONFIG_XEN_UNPOPULATED_ALLOC is enabled then gnttab_dma_alloc_pages()
>>>> will
>>>> try to call xen_alloc_unpopulated_dma_pages() the first and if fails then
>>>> fallback to allocate RAM pages and balloon them out.
>>>>
>>>> One moment is not entirely clear to me. If we use fallback in
>>>> gnttab_dma_alloc_pages() then we must use fallback in
>>>> gnttab_dma_free_pages()
>>>> as well, we cannot use xen_free_unpopulated_dma_pages() for real RAM
>>>> pages.
>>>> The question is how to pass this information to the
>>>> gnttab_dma_free_pages()?
>>>> The first idea which comes to mind is to add a flag to struct
>>>> gnttab_dma_alloc_args...
>>>    You can check if the page is within the mhp_range range or part of
>>> iomem_resource? If not, you can free it as a normal page.
>>>
>>> If we do this, then the fallback is better implemented in
>>> unpopulated-alloc.c because that is the one that is aware about
>>> page addresses.
>>
>> I got your idea and agree this can work technically. Or if we finally decide
>> to use the second option (use "dma_pool" for all) in the first patch
>> "[RFC PATCH 1/2] xen/unpopulated-alloc: Introduce helpers for DMA allocations"
>> then we will likely be able to check whether a page in question
>> is within a "dma_pool" using gen_pool_has_addr().
>>
>> I am still wondering, can we avoid the fallback implementation in
>> unpopulated-alloc.c.
>> Because for that purpose we will need to pull more code into
>> unpopulated-alloc.c (to be more precise, almost everything which
>> gnttab_dma_free_pages() already has except gnttab_pages_clear_private()) and
>> pass more arguments to xen_free_unpopulated_dma_pages(). Also I might mistake,
>> but having a fallback split between grant-table.c (to allocate RAM pages in
>> gnttab_dma_alloc_pages()) and unpopulated-alloc.c (to free RAM pages in
>> xen_free_unpopulated_dma_pages()) would look a bit weird.
>>
>> I see two possible options for the fallback implementation in grant-table.c:
>> 1. (less preferable) by introducing new flag in struct gnttab_dma_alloc_args
>> 2. (more preferable) by guessing unpopulated (non real RAM) page using
>> is_zone_device_page(), etc
>>
>>
>> For example, with the second option the resulting code will look quite simple
>> (only build tested):
>>
>> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
>> index 738029d..3bda71f 100644
>> --- a/drivers/xen/grant-table.c
>> +++ b/drivers/xen/grant-table.c
>> @@ -1047,6 +1047,23 @@ int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args
>> *args)
>>          size_t size;
>>          int i, ret;
>>
>> +       if (IS_ENABLED(CONFIG_XEN_UNPOPULATED_ALLOC)) {
>> +               ret = xen_alloc_unpopulated_dma_pages(args->dev,
>> args->nr_pages,
>> +                               args->pages);
>> +               if (ret < 0)
>> +                       goto fallback;
>> +
>> +               ret = gnttab_pages_set_private(args->nr_pages, args->pages);
>> +               if (ret < 0)
>> +                       goto fail;
>> +
>> +               args->vaddr = page_to_virt(args->pages[0]);
>> +               args->dev_bus_addr = page_to_phys(args->pages[0]);
>> +
>> +               return ret;
>> +       }
>> +
>> +fallback:
>>          size = args->nr_pages << PAGE_SHIFT;
>>          if (args->coherent)
>>                  args->vaddr = dma_alloc_coherent(args->dev, size,
>> @@ -1103,6 +1120,12 @@ int gnttab_dma_free_pages(struct gnttab_dma_alloc_args
>> *args)
>>
>>          gnttab_pages_clear_private(args->nr_pages, args->pages);
>>
>> +       if (IS_ENABLED(CONFIG_XEN_UNPOPULATED_ALLOC) &&
>> +                       is_zone_device_page(args->pages[0])) {
>> +               xen_free_unpopulated_dma_pages(args->dev, args->nr_pages,
>> args->pages);
>> +               return 0;
>> +       }
>> +
>>          for (i = 0; i < args->nr_pages; i++)
>>                  args->frames[i] = page_to_xen_pfn(args->pages[i]);
>>
>>
>> What do you think?
>   
> I have another idea. Why don't we introduce a function implemented in
> drivers/xen/unpopulated-alloc.c called is_xen_unpopulated_page() and
> call it from here? is_xen_unpopulated_page can be implemented by using
> gen_pool_has_addr.

I like the idea, will do
diff mbox series

Patch

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 8ccccac..2bb4392 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -864,6 +864,25 @@  EXPORT_SYMBOL_GPL(gnttab_free_pages);
  */
 int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
 {
+#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
+	int ret;
+
+	ret = xen_alloc_unpopulated_dma_pages(args->dev, args->nr_pages,
+			args->pages);
+	if (ret < 0)
+		return ret;
+
+	ret = gnttab_pages_set_private(args->nr_pages, args->pages);
+	if (ret < 0) {
+		gnttab_dma_free_pages(args);
+		return ret;
+	}
+
+	args->vaddr = page_to_virt(args->pages[0]);
+	args->dev_bus_addr = page_to_phys(args->pages[0]);
+
+	return ret;
+#else
 	unsigned long pfn, start_pfn;
 	size_t size;
 	int i, ret;
@@ -910,6 +929,7 @@  int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
 fail:
 	gnttab_dma_free_pages(args);
 	return ret;
+#endif
 }
 EXPORT_SYMBOL_GPL(gnttab_dma_alloc_pages);
 
@@ -919,6 +939,12 @@  EXPORT_SYMBOL_GPL(gnttab_dma_alloc_pages);
  */
 int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
 {
+#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
+	gnttab_pages_clear_private(args->nr_pages, args->pages);
+	xen_free_unpopulated_dma_pages(args->dev, args->nr_pages, args->pages);
+
+	return 0;
+#else
 	size_t size;
 	int i, ret;
 
@@ -946,6 +972,7 @@  int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
 		dma_free_wc(args->dev, size,
 			    args->vaddr, args->dev_bus_addr);
 	return ret;
+#endif
 }
 EXPORT_SYMBOL_GPL(gnttab_dma_free_pages);
 #endif