| Message ID | 20180619171638.GE1438@bombadil.infradead.org (mailing list archive) |
|---|---|
| State | New, archived |
On Tue, Jun 19, 2018 at 10:16:38AM -0700, Matthew Wilcox wrote:
> I think I see a bug.  No idea if it's the one you're hitting ;-)
>
> I had been intending to not use the 'entry' to decide whether we were
> waiting on a 2MB or 4kB page, but rather the xas.  I shelved that idea,
> but not before dropping the DAX_PMD flag being passed from the PMD
> pagefault caller.  So if I put that back ...

Did you get a chance to test this?

> diff --git a/fs/dax.c b/fs/dax.c
> index 9919b6b545fb..75cc160d2f0b 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -367,13 +367,13 @@ static struct page *dax_busy_page(void *entry)
>   * a VM_FAULT code, encoded as an xarray internal entry.  The ERR_PTR values
>   * overlap with xarray value entries.
>   */
> -static
> -void *grab_mapping_entry(struct xa_state *xas, struct address_space *mapping)
> +static void *grab_mapping_entry(struct xa_state *xas,
> +		struct address_space *mapping, unsigned long size)
>  {
>  	bool pmd_downgrade = false; /* splitting 2MiB entry into 4k entries? */
>  	void *locked = dax_make_entry(pfn_to_pfn_t(0),
> -			DAX_EMPTY | DAX_LOCKED);
> -	void *unlocked = dax_make_entry(pfn_to_pfn_t(0), DAX_EMPTY);
> +			size | DAX_EMPTY | DAX_LOCKED);
> +	void *unlocked = dax_make_entry(pfn_to_pfn_t(0), size | DAX_EMPTY);
>  	void *entry;
>
>  retry:
> @@ -1163,7 +1163,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	if (write && !vmf->cow_page)
>  		flags |= IOMAP_WRITE;
>
> -	entry = grab_mapping_entry(&xas, mapping);
> +	entry = grab_mapping_entry(&xas, mapping, 0);
>  	if (xa_is_internal(entry)) {
>  		ret = xa_to_internal(entry);
>  		goto out;
> @@ -1396,7 +1396,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	 * page is already in the tree, for instance), it will return
>  	 * VM_FAULT_FALLBACK.
>  	 */
> -	entry = grab_mapping_entry(&xas, mapping);
> +	entry = grab_mapping_entry(&xas, mapping, DAX_PMD);
>  	if (xa_is_internal(entry)) {
>  		result = xa_to_internal(entry);
>  		goto fallback;
On Wed, Jun 27, 2018 at 04:05:29AM -0700, Matthew Wilcox wrote:
> On Tue, Jun 19, 2018 at 10:16:38AM -0700, Matthew Wilcox wrote:
> > I think I see a bug.  No idea if it's the one you're hitting ;-)
> >
> > I had been intending to not use the 'entry' to decide whether we were
> > waiting on a 2MB or 4kB page, but rather the xas.  I shelved that idea,
> > but not before dropping the DAX_PMD flag being passed from the PMD
> > pagefault caller.  So if I put that back ...
>
> Did you get a chance to test this?

With this patch it doesn't deadlock, but the test dies with a SIGBUS and we
hit a WARN_ON in the DAX code:

WARNING: CPU: 5 PID: 1678 at fs/dax.c:226 get_unlocked_entry+0xf7/0x120

I don't have a lot of time this week to debug further.  The quickest path to
victory is probably for you to get this reproducing in your test setup.  Does
XFS + DAX + generic/340 pass for you?
On Wed, Jun 27, 2018 at 01:44:38PM -0600, Ross Zwisler wrote:
> On Wed, Jun 27, 2018 at 04:05:29AM -0700, Matthew Wilcox wrote:
> > On Tue, Jun 19, 2018 at 10:16:38AM -0700, Matthew Wilcox wrote:
> > > I think I see a bug.  No idea if it's the one you're hitting ;-)
> > >
> > > I had been intending to not use the 'entry' to decide whether we were
> > > waiting on a 2MB or 4kB page, but rather the xas.  I shelved that idea,
> > > but not before dropping the DAX_PMD flag being passed from the PMD
> > > pagefault caller.  So if I put that back ...
> >
> > Did you get a chance to test this?
>
> With this patch it doesn't deadlock, but the test dies with a SIGBUS and we
> hit a WARN_ON in the DAX code:
>
> WARNING: CPU: 5 PID: 1678 at fs/dax.c:226 get_unlocked_entry+0xf7/0x120
>
> I don't have a lot of time this week to debug further.  The quickest path to
> victory is probably for you to get this reproducing in your test setup.  Does
> XFS + DAX + generic/340 pass for you?

I won't be back in front of my test box until Tuesday, but that test does
work for me because I couldn't get your instructions to give me a 2MB aligned
DAX setup.  I had to settle for 4k, so none of the 2MB stuff has been tested
properly.
On Thu, Jun 28, 2018 at 01:39:09AM -0700, Matthew Wilcox wrote:
> On Wed, Jun 27, 2018 at 01:44:38PM -0600, Ross Zwisler wrote:
> > On Wed, Jun 27, 2018 at 04:05:29AM -0700, Matthew Wilcox wrote:
> > > On Tue, Jun 19, 2018 at 10:16:38AM -0700, Matthew Wilcox wrote:
> > > > I think I see a bug.  No idea if it's the one you're hitting ;-)
> > > >
> > > > I had been intending to not use the 'entry' to decide whether we were
> > > > waiting on a 2MB or 4kB page, but rather the xas.  I shelved that idea,
> > > > but not before dropping the DAX_PMD flag being passed from the PMD
> > > > pagefault caller.  So if I put that back ...
> > >
> > > Did you get a chance to test this?
> >
> > With this patch it doesn't deadlock, but the test dies with a SIGBUS and we
> > hit a WARN_ON in the DAX code:
> >
> > WARNING: CPU: 5 PID: 1678 at fs/dax.c:226 get_unlocked_entry+0xf7/0x120
> >
> > I don't have a lot of time this week to debug further.  The quickest path to
> > victory is probably for you to get this reproducing in your test setup.  Does
> > XFS + DAX + generic/340 pass for you?
>
> I won't be back in front of my test box until Tuesday, but that test does
> work for me because I couldn't get your instructions to give me a 2MB aligned
> DAX setup.  I had to settle for 4k, so none of the 2MB stuff has been tested
> properly.

Ah.  I've documented both my qemu setup and my filesystem setup here:

https://nvdimm.wiki.kernel.org/pmem_in_qemu
https://nvdimm.wiki.kernel.org/2mib_fs_dax

This should be enough to get you up and running in QEMU and able to reliably
get filesystem DAX PMD faults.
On Wed, Jun 27, 2018 at 01:44:38PM -0600, Ross Zwisler wrote:
> On Wed, Jun 27, 2018 at 04:05:29AM -0700, Matthew Wilcox wrote:
> > On Tue, Jun 19, 2018 at 10:16:38AM -0700, Matthew Wilcox wrote:
> > > I think I see a bug.  No idea if it's the one you're hitting ;-)
> > >
> > > I had been intending to not use the 'entry' to decide whether we were
> > > waiting on a 2MB or 4kB page, but rather the xas.  I shelved that idea,
> > > but not before dropping the DAX_PMD flag being passed from the PMD
> > > pagefault caller.  So if I put that back ...
> >
> > Did you get a chance to test this?
>
> With this patch it doesn't deadlock, but the test dies with a SIGBUS and we
> hit a WARN_ON in the DAX code:
>
> WARNING: CPU: 5 PID: 1678 at fs/dax.c:226 get_unlocked_entry+0xf7/0x120
>
> I don't have a lot of time this week to debug further.  The quickest path to
> victory is probably for you to get this reproducing in your test setup.  Does
> XFS + DAX + generic/340 pass for you?

I now have generic/340 passing.  I've pushed a new version to
git://git.infradead.org/users/willy/linux-dax.git xarray
On Wed, Jul 25, 2018 at 02:03:23PM -0700, Matthew Wilcox wrote:
> On Wed, Jun 27, 2018 at 01:44:38PM -0600, Ross Zwisler wrote:
> > On Wed, Jun 27, 2018 at 04:05:29AM -0700, Matthew Wilcox wrote:
> > > On Tue, Jun 19, 2018 at 10:16:38AM -0700, Matthew Wilcox wrote:
> > > > I think I see a bug.  No idea if it's the one you're hitting ;-)
> > > >
> > > > I had been intending to not use the 'entry' to decide whether we were
> > > > waiting on a 2MB or 4kB page, but rather the xas.  I shelved that idea,
> > > > but not before dropping the DAX_PMD flag being passed from the PMD
> > > > pagefault caller.  So if I put that back ...
> > >
> > > Did you get a chance to test this?
> >
> > With this patch it doesn't deadlock, but the test dies with a SIGBUS and we
> > hit a WARN_ON in the DAX code:
> >
> > WARNING: CPU: 5 PID: 1678 at fs/dax.c:226 get_unlocked_entry+0xf7/0x120
> >
> > I don't have a lot of time this week to debug further.  The quickest path to
> > victory is probably for you to get this reproducing in your test setup.  Does
> > XFS + DAX + generic/340 pass for you?
>
> I now have generic/340 passing.  I've pushed a new version to
> git://git.infradead.org/users/willy/linux-dax.git xarray

Thanks, I'll throw it in my test setup.
On Wed, Jul 25, 2018 at 02:03:23PM -0700, Matthew Wilcox wrote:
> On Wed, Jun 27, 2018 at 01:44:38PM -0600, Ross Zwisler wrote:
> > On Wed, Jun 27, 2018 at 04:05:29AM -0700, Matthew Wilcox wrote:
> > > On Tue, Jun 19, 2018 at 10:16:38AM -0700, Matthew Wilcox wrote:
> > > > I think I see a bug.  No idea if it's the one you're hitting ;-)
> > > >
> > > > I had been intending to not use the 'entry' to decide whether we were
> > > > waiting on a 2MB or 4kB page, but rather the xas.  I shelved that idea,
> > > > but not before dropping the DAX_PMD flag being passed from the PMD
> > > > pagefault caller.  So if I put that back ...
> > >
> > > Did you get a chance to test this?
> >
> > With this patch it doesn't deadlock, but the test dies with a SIGBUS and we
> > hit a WARN_ON in the DAX code:
> >
> > WARNING: CPU: 5 PID: 1678 at fs/dax.c:226 get_unlocked_entry+0xf7/0x120
> >
> > I don't have a lot of time this week to debug further.  The quickest path to
> > victory is probably for you to get this reproducing in your test setup.  Does
> > XFS + DAX + generic/340 pass for you?
>
> I now have generic/340 passing.  I've pushed a new version to
> git://git.infradead.org/users/willy/linux-dax.git xarray

Okay, the next failure I'm hitting is with DAX + XFS + generic/344.  It
doesn't happen every time, but I can usually recreate it within 10 iterations
of the test.  Here's the failure:

generic/344 21s ...
[ 1852.564559] run fstests generic/344 at 2018-07-27 11:19:05
[ 1853.033177] XFS (pmem0p2): Unmounting Filesystem
[ 1853.134497] XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[ 1853.135335] XFS (pmem0p2): Mounting V5 Filesystem
[ 1853.138119] XFS (pmem0p2): Ending clean mount
[ 1862.251185] WARNING: CPU: 10 PID: 15695 at mm/memory.c:1801 insert_pfn+0x229/0x240
[ 1862.252023] Modules linked in: dax_pmem device_dax nd_pmem nd_btt nfit libnvdimm
[ 1862.252853] CPU: 10 PID: 15695 Comm: holetest Tainted: G        W 4.18.0-rc6-00077-gc79b37ebab6d-dirty #1
[ 1862.253979] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
[ 1862.255232] RIP: 0010:insert_pfn+0x229/0x240
[ 1862.255734] Code: 21 fa 4c 89 4d b0 48 89 45 c0 c6 05 3c 47 74 01 01 e8 db c2 e2 ff 0f 0b 44 8b 45 ac 4c 8b 4d b0 4c 8b 55 b8 48 8b 45 c0 eb 92 <0f> 0b e9 45 fe ff ff 41 bf f4 ff ff ff e9 43 fe ff ff e8 50 c5 e2
[ 1862.257526] RSP: 0000:ffffc9000e197af8 EFLAGS: 00010216
[ 1862.257994] RAX: 0000000000002df7 RBX: ffff8800368ffe00 RCX: 0000000000000002
[ 1862.258673] RDX: 000fffffffffffff RSI: 000000000000000a RDI: 8000000002df7225
[ 1862.259319] RBP: ffffc9000e197b50 R08: 0000000000000001 R09: ffff8800b8fecd48
[ 1862.260097] R10: ffffc9000e197a30 R11: ffff88010d649a80 R12: ffff8800bb616e80
[ 1862.260843] R13: 00007fc2137c0000 R14: 00000000004521d4 R15: 00000000fffffff0
[ 1862.261563] FS:  00007fc2157ff700(0000) GS:ffff880115800000(0000) knlGS:0000000000000000
[ 1862.262420] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1862.263003] CR2: 00007fc2137c0c00 CR3: 00000000b866e000 CR4: 00000000000006e0
[ 1862.263631] Call Trace:
[ 1862.263862]  ? trace_hardirqs_on_caller+0xf4/0x190
[ 1862.264300]  __vm_insert_mixed+0x83/0xd0
[ 1862.264657]  vmf_insert_mixed_mkwrite+0x13/0x40
[ 1862.265066]  dax_iomap_pte_fault+0x760/0x1140
[ 1862.265478]  dax_iomap_fault+0x37/0x40
[ 1862.265816]  __xfs_filemap_fault+0x2de/0x310
[ 1862.266207]  xfs_filemap_page_mkwrite+0x15/0x20
[ 1862.266610]  xfs_filemap_pfn_mkwrite+0xe/0x10
[ 1862.267044]  do_wp_page+0x1bb/0x660
[ 1862.267435]  __handle_mm_fault+0xc78/0x1320
[ 1862.267912]  handle_mm_fault+0x1ba/0x3c0
[ 1862.268359]  __do_page_fault+0x2b4/0x590
[ 1862.268799]  do_page_fault+0x38/0x2c0
[ 1862.269218]  do_async_page_fault+0x2c/0xb0
[ 1862.269674]  ? async_page_fault+0x8/0x30
[ 1862.270081]  async_page_fault+0x1e/0x30
[ 1862.270431] RIP: 0033:0x401442
[ 1862.270709] Code: 1d 20 00 85 f6 0f 85 7d 00 00 00 48 85 db 7e 20 4b 8d 04 34 31 d2 66 90 48 8b 0d 21 1d 20 00 48 0f af ca 48 83 c2 01 48 39 d3 <48> 89 2c 08 75 e8 8b 0d de 1c 20 00 31 c0 85 c9 74 0a 8b 15 d6 1c
[ 1862.272811] RSP: 002b:00007fc2157feec0 EFLAGS: 00010216
[ 1862.273392] RAX: 00007fc213600c00 RBX: 0000000000001000 RCX: 00000000001c0000
[ 1862.274182] RDX: 00000000000001c1 RSI: 0000000000000000 RDI: 0000000000000001
[ 1862.274961] RBP: 00007fc2157ff700 R08: 00007fc2157ff700 R09: 00007fc2157ff700
[ 1862.275740] R10: 00007fc2157ff9d0 R11: 0000000000000202 R12: 00007fc213600000
[ 1862.276519] R13: 00007ffe07faa240 R14: 0000000000000c00 R15: 00007ffe07faa170
[ 1862.277296] irq event stamp: 13256
[ 1862.277666] hardirqs last  enabled at (13255): [<ffffffff81c8c24c>] _raw_spin_unlock_irq+0x2c/0x60
[ 1862.278630] hardirqs last disabled at (13256): [<ffffffff81e011d3>] error_entry+0x93/0x110
[ 1862.279510] softirqs last  enabled at (13250): [<ffffffff820003bf>] __do_softirq+0x3bf/0x520
[ 1862.280410] softirqs last disabled at (13229): [<ffffffff810ac388>] irq_exit+0xe8/0xf0
[ 1862.281251] ---[ end trace 4b8bc73df4e9e7ba ]---
 22s
[ 1874.258598] XFS (pmem0p2): Unmounting Filesystem
[ 1874.347182] XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[ 1874.348721] XFS (pmem0p2): Mounting V5 Filesystem
[ 1874.353946] XFS (pmem0p2): Ending clean mount
_check_dmesg: something found in dmesg (see /root/xfstests/results//generic/344.dmesg)
Ran: generic/344
Failures: generic/344
Failed 1 of 1 tests

- Ross
On Fri, Jul 27, 2018 at 11:20:35AM -0600, Ross Zwisler wrote:
> Okay, the next failure I'm hitting is with DAX + XFS + generic/344.  It
> doesn't happen every time, but I can usually recreate it within 10 iterations
> of the test.  Here's the failure:

Thanks.  I've made some progress with this; the WARNing is coming from a
vm_insert_* mkwrite call.  Inserting sufficient debugging code has let me
determine we still have a zero_pfn in the page table when we're trying to
insert a new PFN.
diff --git a/fs/dax.c b/fs/dax.c
index 9919b6b545fb..75cc160d2f0b 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -367,13 +367,13 @@ static struct page *dax_busy_page(void *entry)
  * a VM_FAULT code, encoded as an xarray internal entry.  The ERR_PTR values
  * overlap with xarray value entries.
  */
-static
-void *grab_mapping_entry(struct xa_state *xas, struct address_space *mapping)
+static void *grab_mapping_entry(struct xa_state *xas,
+		struct address_space *mapping, unsigned long size)
 {
 	bool pmd_downgrade = false; /* splitting 2MiB entry into 4k entries? */
 	void *locked = dax_make_entry(pfn_to_pfn_t(0),
-			DAX_EMPTY | DAX_LOCKED);
-	void *unlocked = dax_make_entry(pfn_to_pfn_t(0), DAX_EMPTY);
+			size | DAX_EMPTY | DAX_LOCKED);
+	void *unlocked = dax_make_entry(pfn_to_pfn_t(0), size | DAX_EMPTY);
 	void *entry;
 
 retry:
@@ -1163,7 +1163,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	if (write && !vmf->cow_page)
 		flags |= IOMAP_WRITE;
 
-	entry = grab_mapping_entry(&xas, mapping);
+	entry = grab_mapping_entry(&xas, mapping, 0);
 	if (xa_is_internal(entry)) {
 		ret = xa_to_internal(entry);
 		goto out;
@@ -1396,7 +1396,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	 * page is already in the tree, for instance), it will return
 	 * VM_FAULT_FALLBACK.
 	 */
-	entry = grab_mapping_entry(&xas, mapping);
+	entry = grab_mapping_entry(&xas, mapping, DAX_PMD);
 	if (xa_is_internal(entry)) {
 		result = xa_to_internal(entry);
 		goto fallback;