[v3,0/8] Support DEVICE_GENERIC memory in migrate_vma_*

Message ID: 20210617151705.15367-1-alex.sierra@amd.com

Message

Sierra Guiza, Alejandro (Alex) June 17, 2021, 3:16 p.m. UTC
v1:
AMD is building a system architecture for the Frontier supercomputer with a
coherent interconnect between CPUs and GPUs. This hardware architecture allows
the CPUs to coherently access GPU device memory. We have hardware in our labs
and we are working with our partner HPE on the BIOS, firmware and software
for delivery to the DOE.

The system BIOS advertises the GPU device memory (aka VRAM) as SPM
(special purpose memory) in the UEFI system address map. The amdgpu driver looks
it up with lookup_resource and registers it with devmap as MEMORY_DEVICE_GENERIC
using devm_memremap_pages.
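
For reference, the registration amounts to roughly the following
(a simplified sketch, not the actual amdgpu code; pgmap_ops and
vram_base are placeholder names, and error handling is trimmed):

static int register_vram_pages(struct device *dev, struct dev_pagemap *pgmap,
			       resource_size_t vram_base)
{
	struct resource *res;
	void *addr;

	/* find the SPM range the BIOS advertised in the UEFI memory map */
	res = lookup_resource(&iomem_resource, vram_base);
	if (!res)
		return -ENODEV;

	pgmap->type = MEMORY_DEVICE_GENERIC;
	pgmap->range.start = res->start;
	pgmap->range.end = res->end;
	pgmap->nr_range = 1;
	pgmap->ops = &pgmap_ops;	/* supplies the page_free callback */

	/* create struct pages for the device memory */
	addr = devm_memremap_pages(dev, pgmap);
	return PTR_ERR_OR_ZERO(addr);
}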

Now we're trying to migrate data to and from that memory using the migrate_vma_*
helpers so we can support page-based migration in our unified memory allocations,
while also supporting CPU access to those pages.
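
The flow through the helpers is roughly the following (again just a
sketch; src_pfns, dst_pfns and owner are placeholders, and the real
driver allocates VRAM pages and issues DMA copies between the setup
and pages steps):

	struct migrate_vma migrate = {
		.vma		= vma,
		.start		= start,
		.end		= end,
		.src		= src_pfns,
		.dst		= dst_pfns,
		.pgmap_owner	= owner,
		.flags		= MIGRATE_VMA_SELECT_SYSTEM,
	};
	int ret;

	/* collect and isolate the source pages */
	ret = migrate_vma_setup(&migrate);
	if (ret)
		return ret;

	/*
	 * Allocate destination device pages, copy the data, and fill
	 * migrate.dst[] with migrate_pfn() entries for each page that
	 * was successfully copied.
	 */

	migrate_vma_pages(&migrate);	/* install the new pages */
	migrate_vma_finalize(&migrate);	/* drop references, restore CPU access */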

This patch series makes a few changes to make MEMORY_DEVICE_GENERIC pages behave
correctly in the migrate_vma_* helpers. We are looking for feedback about this
approach. If we're close, what's needed to make our patches acceptable upstream?
If we're not close, any suggestions how else to achieve what we are trying to do
(i.e. page migration and coherent CPU access to VRAM)?

This work is based on HMM and our SVM memory manager that was recently upstreamed
to Dave Airlie's drm-next branch
https://lore.kernel.org/dri-devel/20210527205606.2660-6-Felix.Kuehling@amd.com/T/#r996356015e295780eb50453e7dbd5d0d68b47cbc
On top of that we did some rework of our VRAM management for migrations to remove
some incorrect assumptions, allow partially successful migrations and GPU memory
mappings that mix pages in VRAM and system memory.
https://patchwork.kernel.org/project/dri-devel/list/?series=489811

v2:
This version of the patch series incorporates Ralph Campbell's
"[RFC PATCH v3 0/2] mm: remove extra ZONE_DEVICE struct page refcount"
patch series. On top of it we apply our changes to support the device
generic type in the migrate_vma helpers.
This has been tested on systems with device memory that is coherently
accessible by the CPU.

This version also addresses the following feedback from v1:
- Isolate the kernel/resource.c modification in its own patch, based
  on Christoph's feedback.
- Add helpers that check for the generic and private types, to avoid
  duplicating long lines.

v3:
- Include the cover letter from v1
- Rename the dax_layout_is_idle_page function to dax_page_unused in
  patch "ext4/xfs: add page refcount helper"

Patches 1-2 rebase Ralph Campbell's ZONE_DEVICE page refcounting patches.
Patches 4-5 are for context, to show how we are looking up the SPM
memory and registering it with devmap.
Patches 3 and 6-8 are the changes we are trying to upstream or rework
to make them acceptable upstream.

Alex Sierra (6):
  kernel: resource: lookup_resource as exported symbol
  drm/amdkfd: add SPM support for SVM
  drm/amdkfd: generic type as sys mem on migration to ram
  include/linux/mm.h: helpers to check zone device generic type
  mm: add generic type support to migrate_vma helpers
  mm: call pgmap->ops->page_free for DEVICE_GENERIC pages

Ralph Campbell (2):
  ext4/xfs: add page refcount helper
  mm: remove extra ZONE_DEVICE struct page refcount

 arch/powerpc/kvm/book3s_hv_uvmem.c       |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 15 ++++--
 drivers/gpu/drm/nouveau/nouveau_dmem.c   |  2 +-
 fs/dax.c                                 |  8 +--
 fs/ext4/inode.c                          |  5 +-
 fs/xfs/xfs_file.c                        |  4 +-
 include/linux/dax.h                      | 10 ++++
 include/linux/memremap.h                 |  7 +--
 include/linux/mm.h                       | 52 +++---------------
 kernel/resource.c                        |  2 +-
 lib/test_hmm.c                           |  2 +-
 mm/internal.h                            |  8 +++
 mm/memremap.c                            | 69 +++++++-----------------
 mm/migrate.c                             | 13 ++---
 mm/page_alloc.c                          |  3 ++
 mm/swap.c                                | 45 ++--------------
 16 files changed, 83 insertions(+), 164 deletions(-)

Comments

Sierra Guiza, Alejandro (Alex) June 17, 2021, 3:56 p.m. UTC | #1
On 6/17/2021 10:16 AM, Alex Sierra wrote:
> v1:
> AMD is building a system architecture for the Frontier supercomputer with a
> coherent interconnect between CPUs and GPUs. This hardware architecture allows
> the CPUs to coherently access GPU device memory. We have hardware in our labs
> and we are working with our partner HPE on the BIOS, firmware and software
> for delivery to the DOE.
>
> The system BIOS advertises the GPU device memory (aka VRAM) as SPM
> (special purpose memory) in the UEFI system address map. The amdgpu driver looks
> it up with lookup_resource and registers it with devmap as MEMORY_DEVICE_GENERIC
> using devm_memremap_pages.
>
> Now we're trying to migrate data to and from that memory using the migrate_vma_*
> helpers so we can support page-based migration in our unified memory allocations,
> while also supporting CPU access to those pages.
>
> This patch series makes a few changes to make MEMORY_DEVICE_GENERIC pages behave
> correctly in the migrate_vma_* helpers. We are looking for feedback about this
> approach. If we're close, what's needed to make our patches acceptable upstream?
> If we're not close, any suggestions how else to achieve what we are trying to do
> (i.e. page migration and coherent CPU access to VRAM)?
>
> This work is based on HMM and our SVM memory manager that was recently upstreamed
> to Dave Airlie's drm-next branch
> https://lore.kernel.org/dri-devel/20210527205606.2660-6-Felix.Kuehling@amd.com/T/#r996356015e295780eb50453e7dbd5d0d68b47cbc
Corrected link:

https://cgit.freedesktop.org/drm/drm/log/?h=drm-next

Regards,
Alex Sierra

> On top of that we did some rework of our VRAM management for migrations to remove
> some incorrect assumptions, allow partially successful migrations and GPU memory
> mappings that mix pages in VRAM and system memory.
> https://patchwork.kernel.org/project/dri-devel/list/?series=489811

Corrected link:

https://lore.kernel.org/dri-devel/20210527205606.2660-6-Felix.Kuehling@amd.com/T/#r996356015e295780eb50453e7dbd5d0d68b47cbc

Regards,
Alex Sierra

>
> v2:
> This patch series version has merged "[RFC PATCH v3 0/2]
> mm: remove extra ZONE_DEVICE struct page refcount" patch series made by
> Ralph Campbell. It also applies at the top of these series, our changes
> to support device generic type in migration_vma helpers.
> This has been tested in systems with device memory that has coherent
> access by CPU.
>
> Also addresses the following feedback made in v1:
> - Isolate in one patch kernel/resource.c modification, based
> on Christoph's feedback.
> - Add helpers check for generic and private type to avoid
> duplicated long lines.
>
> v3:
> - Include cover letter from v1
> - Rename dax_layout_is_idle_page func to dax_page_unused in patch
> ext4/xfs: add page refcount helper
>
> Patches 1-2 Rebased Ralph Campbell's ZONE_DEVICE page refcounting patches
> Patches 4-5 are for context to show how we are looking up the SPM
> memory and registering it with devmap.
> Patches 3,6-8 are the changes we are trying to upstream or rework to
> make them acceptable upstream.
>
> Alex Sierra (6):
>    kernel: resource: lookup_resource as exported symbol
>    drm/amdkfd: add SPM support for SVM
>    drm/amdkfd: generic type as sys mem on migration to ram
>    include/linux/mm.h: helpers to check zone device generic type
>    mm: add generic type support to migrate_vma helpers
>    mm: call pgmap->ops->page_free for DEVICE_GENERIC pages
>
> Ralph Campbell (2):
>    ext4/xfs: add page refcount helper
>    mm: remove extra ZONE_DEVICE struct page refcount
>
>   arch/powerpc/kvm/book3s_hv_uvmem.c       |  2 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 15 ++++--
>   drivers/gpu/drm/nouveau/nouveau_dmem.c   |  2 +-
>   fs/dax.c                                 |  8 +--
>   fs/ext4/inode.c                          |  5 +-
>   fs/xfs/xfs_file.c                        |  4 +-
>   include/linux/dax.h                      | 10 ++++
>   include/linux/memremap.h                 |  7 +--
>   include/linux/mm.h                       | 52 +++---------------
>   kernel/resource.c                        |  2 +-
>   lib/test_hmm.c                           |  2 +-
>   mm/internal.h                            |  8 +++
>   mm/memremap.c                            | 69 +++++++-----------------
>   mm/migrate.c                             | 13 ++---
>   mm/page_alloc.c                          |  3 ++
>   mm/swap.c                                | 45 ++--------------
>   16 files changed, 83 insertions(+), 164 deletions(-)
>
Theodore Ts'o June 20, 2021, 2:14 p.m. UTC | #2
On Thu, Jun 17, 2021 at 10:16:57AM -0500, Alex Sierra wrote:
> v1:
> AMD is building a system architecture for the Frontier supercomputer with a
> coherent interconnect between CPUs and GPUs. This hardware architecture allows
> the CPUs to coherently access GPU device memory. We have hardware in our labs
> and we are working with our partner HPE on the BIOS, firmware and software
> for delivery to the DOE.
> 
> The system BIOS advertises the GPU device memory (aka VRAM) as SPM
> (special purpose memory) in the UEFI system address map. The amdgpu driver looks
> it up with lookup_resource and registers it with devmap as MEMORY_DEVICE_GENERIC
> using devm_memremap_pages.
> 
> Now we're trying to migrate data to and from that memory using the migrate_vma_*
> helpers so we can support page-based migration in our unified memory allocations,
> while also supporting CPU access to those pages.
> 
> This patch series makes a few changes to make MEMORY_DEVICE_GENERIC pages behave
> correctly in the migrate_vma_* helpers. We are looking for feedback about this
> approach. If we're close, what's needed to make our patches acceptable upstream?
> If we're not close, any suggestions how else to achieve what we are trying to do
> (i.e. page migration and coherent CPU access to VRAM)?

Is there a way we can test the codepaths touched by this patchset?  It
doesn't have to be via a complete qemu simulation of the GPU device
memory, but some way of creating MEMORY_DEVICE_GENERIC subject to
migrate_vma_* helpers so we can test for regressions moving forward.

Thanks,

					- Ted
Felix Kuehling June 23, 2021, 9:49 p.m. UTC | #3
On 2021-06-20 10:14 a.m., Theodore Ts'o wrote:
> On Thu, Jun 17, 2021 at 10:16:57AM -0500, Alex Sierra wrote:
>> v1:
>> AMD is building a system architecture for the Frontier supercomputer with a
>> coherent interconnect between CPUs and GPUs. This hardware architecture allows
>> the CPUs to coherently access GPU device memory. We have hardware in our labs
>> and we are working with our partner HPE on the BIOS, firmware and software
>> for delivery to the DOE.
>>
>> The system BIOS advertises the GPU device memory (aka VRAM) as SPM
>> (special purpose memory) in the UEFI system address map. The amdgpu driver looks
>> it up with lookup_resource and registers it with devmap as MEMORY_DEVICE_GENERIC
>> using devm_memremap_pages.
>>
>> Now we're trying to migrate data to and from that memory using the migrate_vma_*
>> helpers so we can support page-based migration in our unified memory allocations,
>> while also supporting CPU access to those pages.
>>
>> This patch series makes a few changes to make MEMORY_DEVICE_GENERIC pages behave
>> correctly in the migrate_vma_* helpers. We are looking for feedback about this
>> approach. If we're close, what's needed to make our patches acceptable upstream?
>> If we're not close, any suggestions how else to achieve what we are trying to do
>> (i.e. page migration and coherent CPU access to VRAM)?
> Is there a way we can test the codepaths touched by this patchset?  It
> doesn't have to be via a complete qemu simulation of the GPU device
> memory, but some way of creating MEMORY_DEVICE_GENERIC subject to
> migrate_vma_* helpers so we can test for regressions moving forward.

Hi Theodore,

I can think of two ways to test the changes for MEMORY_DEVICE_GENERIC in 
this patch series in a way that is reproducible without special hardware 
and firmware:

For the reference counting changes we could use the dax driver with hmem 
and use efi_fake_mem on the kernel command line to create some 
DEVICE_GENERIC pages. I'm open to suggestions for good user mode tests 
to exercise dax functionality on this type of memory.

For the migration helper changes we could modify or parametrize
lib/test_hmm.c to create DEVICE_GENERIC pages instead of DEVICE_PRIVATE.
Then run tools/testing/selftests/vm/hmm-tests.c.
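
For example, something like this (just a sketch of what the
parametrization could look like; this module parameter does not exist
today):

	/* hypothetical module parameter for lib/test_hmm.c */
	static bool use_device_generic;
	module_param(use_device_generic, bool, 0444);
	MODULE_PARM_DESC(use_device_generic,
			 "register chunks as MEMORY_DEVICE_GENERIC instead of DEVICE_PRIVATE");

	/* in dmirror_allocate_chunk() */
	devmem->pagemap.type = use_device_generic ?
		MEMORY_DEVICE_GENERIC : MEMORY_DEVICE_PRIVATE;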

Regards,
   Felix


>
> Thanks,
>
> 					- Ted
Christoph Hellwig June 24, 2021, 5:30 a.m. UTC | #4
On Wed, Jun 23, 2021 at 05:49:55PM -0400, Felix Kuehling wrote:
> For the reference counting changes we could use the dax driver with hmem 
> and use efi_fake_mem on the kernel command line to create some 
> DEVICE_GENERIC pages. I'm open to suggestions for good user mode tests to 
> exercise dax functionality on this type of memory.
>
> For the migration helper changes we could modify or parametrize 
> lib/test_hmm.c to create DEVICE_GENERIC pages instead of DEVICE_PRIVATE. 
> Then run tools/testing/selftests/vm/hmm-tests.c.

We'll also need a real in-tree user of the enhanced DEVICE_GENERIC memory.
So while the refcounting cleanups early in the series are something I'd
really like to see upstream as soon as everything is sorted out, the
actual bits that can only be used by your updated driver should wait
for that.
Felix Kuehling June 24, 2021, 3:08 p.m. UTC | #5
On 2021-06-24 at 1:30 a.m., Christoph Hellwig wrote:
> On Wed, Jun 23, 2021 at 05:49:55PM -0400, Felix Kuehling wrote:
>> For the reference counting changes we could use the dax driver with hmem 
>> and use efi_fake_mem on the kernel command line to create some 
>> DEVICE_GENERIC pages. I'm open to suggestions for good user mode tests to 
>> exercise dax functionality on this type of memory.
>>
>> For the migration helper changes we could modify or parametrize 
>> lib/test_hmm.c to create DEVICE_GENERIC pages instead of DEVICE_PRIVATE. 
>> Then run tools/testing/selftests/vm/hmm-tests.c.
> We'll also need a real in-tree user of the enhanced DEVICE_GENERIC memory.
> So while the refcounting cleanups early in the series are something I'd
> really like to see upstream as soon as everything is sorted out, the
> actual bits that can only be used by your updated driver should wait
> for that.

The driver changes are pretty much ready to go.

But we have a bit of a chicken-and-egg problem because those changes likely
go through different trees. The GPU driver changes will go through
drm-next, but we can't merge them there until our dependencies have been
merged there from upstream. Unless we protect everything with some #ifdef.

Regards,
  Felix
Theodore Ts'o July 16, 2021, 3:07 p.m. UTC | #6
On Wed, Jun 23, 2021 at 05:49:55PM -0400, Felix Kuehling wrote:
> 
> I can think of two ways to test the changes for MEMORY_DEVICE_GENERIC in
> this patch series in a way that is reproducible without special hardware and
> firmware:
> 
> For the reference counting changes we could use the dax driver with hmem and
> use efi_fake_mem on the kernel command line to create some DEVICE_GENERIC
> pages. I'm open to suggestions for good user mode tests to exercise dax
> functionality on this type of memory.

Sorry for the thread necromancy, but now that the merge window is
past....

Today I test ext4's dax support, without having any $$$ DAX hardware,
by using the kernel command line "memmap=4G!9G:memmap=9G!14G", which
reserves memory to create two pmem devices. Then I run xfstests with
DAX enabled, using qemu or a Google Compute Engine VM, with
TEST_DEV=/dev/pmem0 and SCRATCH_DEV=/dev/pmem1.

If you can give me a recipe for what kernel configs I should enable,
and what magic kernel command line arguments to use, then I'd be able
to test your patch set with ext4.

Cheers,

						- Ted
Felix Kuehling July 16, 2021, 10:14 p.m. UTC | #7
On 2021-07-16 at 11:07 a.m., Theodore Y. Ts'o wrote:
> On Wed, Jun 23, 2021 at 05:49:55PM -0400, Felix Kuehling wrote:
>> I can think of two ways to test the changes for MEMORY_DEVICE_GENERIC in
>> this patch series in a way that is reproducible without special hardware and
>> firmware:
>>
>> For the reference counting changes we could use the dax driver with hmem and
>> use efi_fake_mem on the kernel command line to create some DEVICE_GENERIC
>> pages. I'm open to suggestions for good user mode tests to exercise dax
>> functionality on this type of memory.
> Sorry for the thread necromancy, but now that the merge window is
> past....

No worries. Alejandro should have a new version of this series in a few
days, with updates to hmm_test and some fixes.


>
> Today I test ext4's dax support, without having any $$$ DAX hardware,
> by using the kernel command line "memmap=4G!9G:memmap=9G!14G" which
> reserves memory so that creates two pmem device and then I run
> xfstests with DAX enabled using qemu or using a Google Compute Engine
> VM, using TEST_DEV=/dev/pmem0 and SCRATCH_DEV=/dev/pmem1.
>
> If you can give me a recipe for what kernel configs I should enable,
> and what magic kernel command line arguments to use, then I'd be able
> to test your patch set with ext4.
That would be great!

Regarding kernel config options, it should be the same as what you're
using for DAX testing today. We're not changing or adding any Kconfig
options. But let me take a stab:

ZONE_DEVICE
HMM_MIRROR
MMU_NOTIFIER
DEVICE_PRIVATE (maybe not needed for your test)
FS_DAX

I'm not sure what you're looking for in terms of kernel command line,
other than the memmap options you already found. There are some more
options to run hmm_test with fake SPM (DEVICE_GENERIC) memory, but we're
already running that ourselves. That will also be in the next revision
of this patch series.

If you can run your xfstests with DAX on top of this patch series, that
would be very helpful. That's to make sure the ZONE_DEVICE page refcount
changes don't break DAX.

Regards,
  Felix


>
> Cheers,
>
> 						- Ted
Sierra Guiza, Alejandro (Alex) July 17, 2021, 7:54 p.m. UTC | #8
On 7/16/2021 5:14 PM, Felix Kuehling wrote:
> On 2021-07-16 at 11:07 a.m., Theodore Y. Ts'o wrote:
>> On Wed, Jun 23, 2021 at 05:49:55PM -0400, Felix Kuehling wrote:
>>> I can think of two ways to test the changes for MEMORY_DEVICE_GENERIC in
>>> this patch series in a way that is reproducible without special hardware and
>>> firmware:
>>>
>>> For the reference counting changes we could use the dax driver with hmem and
>>> use efi_fake_mem on the kernel command line to create some DEVICE_GENERIC
>>> pages. I'm open to suggestions for good user mode tests to exercise dax
>>> functionality on this type of memory.
>> Sorry for the thread necromancy, but now that the merge window is
>> past....
> No worries. Alejandro should have a new version of this series in a few
> days, with updates to hmm_test and some fixes.

The v4 patch series has been sent for review.
https://marc.info/?l=dri-devel&m=162654971618911&w=2

Regards,
Alex Sierra

>
>
>> Today I test ext4's dax support, without having any $$$ DAX hardware,
>> by using the kernel command line "memmap=4G!9G:memmap=9G!14G" which
>> reserves memory so that creates two pmem device and then I run
>> xfstests with DAX enabled using qemu or using a Google Compute Engine
>> VM, using TEST_DEV=/dev/pmem0 and SCRATCH_DEV=/dev/pmem1.
>>
>> If you can give me a recipe for what kernel configs I should enable,
>> and what magic kernel command line arguments to use, then I'd be able
>> to test your patch set with ext4.
> That would be great!
>
> Regarding kernel config options, it should be the same as what you're
> using for DAX testing today. We're not changing or adding any Kconfig
> options. But let me take a stab:
>
> ZONE_DEVICE
> HMM_MIRROR
> MMU_NOTIFIER
> DEVICE_PRIVATE (maybe not needed for your test)
> FS_DAX
>
> I'm not sure what you're looking for in terms of kernel command line,
> other than the memmap options you already found. There are some more
> options to run hmm_test with fake SPM (DEVICE_GENERIC) memory, but we're
> already running that ourselves. That will also be in the next revision
> of this patch series.

In order to run the hmm test with the generic device type enabled, set
the following:

kernel config:
EFI_FAKE_MEMMAP
RUNTIME_TESTING_MENU
TEST_HMM=m

Kernel parameters to fake SP memory. The addresses set here are based on
my system's usable memory enumerated by BIOS-e820 at boot. The test
requires two SP devices of at least 256MB each:
efi_fake_mem=1G@0x100000000:0x40000,1G@0x140000000:0x40000

To run the hmm_test with the generic device type, pass the SP addresses
to the shell script:
sudo ./test_hmm.sh smoke 0x100000000 0x140000000

>
> If you can run your xfstests with DAX on top of this patch series, that
> would be very helpful. That's to make sure the ZONE_DEVICE page refcount
> changes don't break DAX.
>
> Regards,
>    Felix
>
>
>> Cheers,
>>
>> 						- Ted
Sierra Guiza, Alejandro (Alex) July 23, 2021, 10:46 p.m. UTC | #9
On 7/17/2021 2:54 PM, Sierra Guiza, Alejandro (Alex) wrote:
>
> On 7/16/2021 5:14 PM, Felix Kuehling wrote:
>> On 2021-07-16 at 11:07 a.m., Theodore Y. Ts'o wrote:
>>> On Wed, Jun 23, 2021 at 05:49:55PM -0400, Felix Kuehling wrote:
>>>> I can think of two ways to test the changes for 
>>>> MEMORY_DEVICE_GENERIC in
>>>> this patch series in a way that is reproducible without special 
>>>> hardware and
>>>> firmware:
>>>>
>>>> For the reference counting changes we could use the dax driver with 
>>>> hmem and
>>>> use efi_fake_mem on the kernel command line to create some 
>>>> DEVICE_GENERIC
>>>> pages. I'm open to suggestions for good user mode tests to exercise 
>>>> dax
>>>> functionality on this type of memory.
>>> Sorry for the thread necromancy, but now that the merge window is
>>> past....
>> No worries. Alejandro should have a new version of this series in a few
>> days, with updates to hmm_test and some fixes.
>
> V4 patch series have been sent for review.
> https://marc.info/?l=dri-devel&m=162654971618911&w=2
>
> Regards,
> Alex Sierra
>
>>
>>
>>> Today I test ext4's dax support, without having any $$$ DAX hardware,
>>> by using the kernel command line "memmap=4G!9G:memmap=9G!14G" which
>>> reserves memory so that creates two pmem device and then I run
>>> xfstests with DAX enabled using qemu or using a Google Compute Engine
>>> VM, using TEST_DEV=/dev/pmem0 and SCRATCH_DEV=/dev/pmem1.
>>>
>>> If you can give me a recipe for what kernel configs I should enable,
>>> and what magic kernel command line arguments to use, then I'd be able
>>> to test your patch set with ext4.
>> That would be great!
>>
>> Regarding kernel config options, it should be the same as what you're
>> using for DAX testing today. We're not changing or adding any Kconfig
>> options. But let me take a stab:
>>
>> ZONE_DEVICE
>> HMM_MIRROR
>> MMU_NOTIFIER
>> DEVICE_PRIVATE (maybe not needed for your test)
>> FS_DAX
Hi Theodore,
I wonder if you have had a chance to set the kernel configs from Felix
to enable DAX in xfstests.

I've been trying to test FS DAX on my side using virtio-fs + QEMU.
Unfortunately, I'm having some problems setting up the environment with
the guest kernel. Could you share your VM (QEMU or GCE) setup to run it
with xfstests?

Regards,
Alex S.

>>
>> I'm not sure what you're looking for in terms of kernel command line,
>> other than the memmap options you already found. There are some more
>> options to run hmm_test with fake SPM (DEVICE_GENERIC) memory, but we're
>> already running that ourselves. That will also be in the next revision
>> of this patch series.
>
> In order to run hmm test with generic device type enabled, set the 
> following:
>
> kernel config:
> EFI_FAKE_MEMMAP
> RUNTIME_TESTING_MENU
> TEST_HMM=m
>
> Kernel parameters to fake SP memory. The addresses set here are based 
> on my system's usable memory enumerated by BIOS-e820 at boot. The test 
> requires two SP devices of at least 256MB.
> efi_fake_mem=1G@0x100000000:0x40000,1G@0x140000000:0x40000
>
> To run the hmm_test in generic device type pass the SP addresses to 
> the sh script.
> sudo ./test_hmm.sh smoke 0x100000000 0x140000000
>
>>
>> If you can run your xfstests with DAX on top of this patch series, that
>> would be very helpful. That's to make sure the ZONE_DEVICE page refcount
>> changes don't break DAX.
>>
>> Regards,
>>    Felix
>>
>>
>>> Cheers,
>>>
>>>                         - Ted
Felix Kuehling July 30, 2021, 7:02 p.m. UTC | #10
On 2021-07-23 at 6:46 p.m., Sierra Guiza, Alejandro (Alex) wrote:
>
> On 7/17/2021 2:54 PM, Sierra Guiza, Alejandro (Alex) wrote:
>>
>> On 7/16/2021 5:14 PM, Felix Kuehling wrote:
>>> On 2021-07-16 at 11:07 a.m., Theodore Y. Ts'o wrote:
>>>> On Wed, Jun 23, 2021 at 05:49:55PM -0400, Felix Kuehling wrote:
>>>>> I can think of two ways to test the changes for
>>>>> MEMORY_DEVICE_GENERIC in
>>>>> this patch series in a way that is reproducible without special
>>>>> hardware and
>>>>> firmware:
>>>>>
>>>>> For the reference counting changes we could use the dax driver
>>>>> with hmem and
>>>>> use efi_fake_mem on the kernel command line to create some
>>>>> DEVICE_GENERIC
>>>>> pages. I'm open to suggestions for good user mode tests to
>>>>> exercise dax
>>>>> functionality on this type of memory.
>>>> Sorry for the thread necromancy, but now that the merge window is
>>>> past....
>>> No worries. Alejandro should have a new version of this series in a few
>>> days, with updates to hmm_test and some fixes.
>>
>> V4 patch series have been sent for review.
>> https://marc.info/?l=dri-devel&m=162654971618911&w=2
>>
>> Regards,
>> Alex Sierra
>>
>>>
>>>
>>>> Today I test ext4's dax support, without having any $$$ DAX hardware,
>>>> by using the kernel command line "memmap=4G!9G:memmap=9G!14G" which
>>>> reserves memory so that creates two pmem device and then I run
>>>> xfstests with DAX enabled using qemu or using a Google Compute Engine
>>>> VM, using TEST_DEV=/dev/pmem0 and SCRATCH_DEV=/dev/pmem1.
>>>>
>>>> If you can give me a recipe for what kernel configs I should enable,
>>>> and what magic kernel command line arguments to use, then I'd be able
>>>> to test your patch set with ext4.
>>> That would be great!
>>>
>>> Regarding kernel config options, it should be the same as what you're
>>> using for DAX testing today. We're not changing or adding any Kconfig
>>> options. But let me take a stab:
>>>
>>> ZONE_DEVICE
>>> HMM_MIRROR
>>> MMU_NOTIFIER
>>> DEVICE_PRIVATE (maybe not needed for your test)
>>> FS_DAX
> Hi Theodore,
> I wonder if you had chance to set the kernel configs from Felix to
> enable DAX in xfstests.
>
> I've been trying to test FS DAX on my side using virtio-fs + QEMU.
> Unfortunately, Im having some problems setting up the environment with
> the guest kernel. Could you share your VM (QEMU or GCE) setup to run
> it with xfstests?
>
> Regards,
> Alex S.

Hi Theodore,

Sorry to keep bugging you. I'm wondering if you've had a chance to test
FS DAX with Alex's latest patch series ("[PATCH v4 00/13] Support
DEVICE_GENERIC memory in migrate_vma_*")? I think that, other than minor
cosmetic fixes, it should be ready to merge, if it passes your tests.

Thanks,
  Felix


>
>>>
>>> I'm not sure what you're looking for in terms of kernel command line,
>>> other than the memmap options you already found. There are some more
>>> options to run hmm_test with fake SPM (DEVICE_GENERIC) memory, but
>>> we're
>>> already running that ourselves. That will also be in the next revision
>>> of this patch series.
>>
>> In order to run hmm test with generic device type enabled, set the
>> following:
>>
>> kernel config:
>> EFI_FAKE_MEMMAP
>> RUNTIME_TESTING_MENU
>> TEST_HMM=m
>>
>> Kernel parameters to fake SP memory. The addresses set here are based
>> on my system's usable memory enumerated by BIOS-e820 at boot. The
>> test requires two SP devices of at least 256MB.
>> efi_fake_mem=1G@0x100000000:0x40000,1G@0x140000000:0x40000
>>
>> To run the hmm_test in generic device type pass the SP addresses to
>> the sh script.
>> sudo ./test_hmm.sh smoke 0x100000000 0x140000000
>>
>>>
>>> If you can run your xfstests with DAX on top of this patch series, that
>>> would be very helpful. That's to make sure the ZONE_DEVICE page
>>> refcount
>>> changes don't break DAX.
>>>
>>> Regards,
>>>    Felix
>>>
>>>
>>>> Cheers,
>>>>
>>>>                         - Ted