
[v14,00/74] Convert page cache to XArray

Message ID 20180619171638.GE1438@bombadil.infradead.org (mailing list archive)
State New, archived

Commit Message

Matthew Wilcox June 19, 2018, 5:16 p.m. UTC
On Tue, Jun 19, 2018 at 10:40:37AM -0600, Ross Zwisler wrote:
> On Tue, Jun 19, 2018 at 02:22:30AM -0700, Matthew Wilcox wrote:
> > On Mon, Jun 18, 2018 at 09:12:57PM -0600, Ross Zwisler wrote:
> > > Hit another deadlock.  This one reproduces 100% of the time in my setup with
> > > XFS + DAX + generic/340.  It doesn't reproduce for me at all with
> > > next-20180615.  Here's the output from "echo w > /proc/sysrq-trigger":
> > 
> > *sigh*.  I wonder what the differences are between our setups ...
> > 
> > > [   92.849119] sysrq: SysRq : Show Blocked State
> > > [   92.850506]   task                        PC stack   pid father
> > > [   92.852299] holetest        D    0  1651   1466 0x00000000
> > > [   92.853912] Call Trace:
> > > [   92.854610]  __schedule+0x2c5/0xad0
> > > [   92.855612]  schedule+0x36/0x90
> > > [   92.856602]  get_unlocked_entry+0xce/0x120
> > > [   92.857756]  ? dax_insert_entry+0x2b0/0x2b0
> > > [   92.858931]  grab_mapping_entry+0x19e/0x250
> > > [   92.860119]  dax_iomap_pte_fault+0x115/0x1140
> > > [   92.860836]  dax_iomap_fault+0x37/0x40
> > ...
> > > This looks very similar to the one I reported last week with generic/269.
> > 
> > Yeah, another missing wakeup, no doubt.  Can you bisect this?  That was
> > how I found the last one; bisected it to a single patch and stared very
> > hard at the patch until I saw it.  I'm not going to be in a position to
> > tinker with my DAX setup until the first week of July.
> 
> It bisected to this commit:
> 
> b4b4daa7e8fb0ad0fee35d3e28d00e97c849a6cb is the first bad commit
> commit b4b4daa7e8fb0ad0fee35d3e28d00e97c849a6cb
> Author: Matthew Wilcox <willy@infradead.org>
> Date:   Thu Mar 29 22:58:27 2018 -0400
> 
>     dax: Convert page fault handlers to XArray
> 
>     This is the last part of DAX to be converted to the XArray so
>     remove all the old helper functions.
> 
>     Signed-off-by: Matthew Wilcox <willy@infradead.org>

I think I see a bug.  No idea if it's the one you're hitting ;-)

I had been intending to not use the 'entry' to decide whether we were
waiting on a 2MB or 4kB page, but rather the xas.  I shelved that idea,
but not before dropping the DAX_PMD flag being passed from the PMD
pagefault caller.  So if I put that back ...
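
For context, here's why the size matters: the wait queue used for entry
locking is hashed from the mapping and an index, and that index is only
rounded down to the 2MiB boundary if the entry itself says it's a PMD.
A minimal sketch of the lookup, simplified from fs/dax.c in this series
(written from memory, so check the tree for the exact code):

static wait_queue_head_t *dax_entry_waitqueue(struct xa_state *xas,
		void *entry, struct exceptional_entry_key *key)
{
	unsigned long hash;
	unsigned long index = xas->xa_index;

	/*
	 * If 'entry' is a PMD, align 'index' to the start of the PMD so
	 * that every offset in the 2MiB range maps to the same queue.
	 */
	if (dax_is_pmd_entry(entry))
		index &= ~PG_PMD_COLOUR;
	key->xa = xas->xa;
	key->entry_start = index;

	hash = hash_long((unsigned long)xas->xa ^ index, DAX_WAIT_TABLE_BITS);
	return wait_table + hash;
}

With DAX_PMD missing from the empty entry, a PTE fault inside that 2MiB
range sleeps on a queue hashed from its own unaligned index, while the
PMD fault path wakes the queue hashed from the aligned index.  The wakeup
is lost and the waiter in get_unlocked_entry() sleeps forever, which
would match the hang above.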

Comments

Matthew Wilcox June 27, 2018, 11:05 a.m. UTC | #1
On Tue, Jun 19, 2018 at 10:16:38AM -0700, Matthew Wilcox wrote:
> I think I see a bug.  No idea if it's the one you're hitting ;-)
> 
> I had been intending to not use the 'entry' to decide whether we were
> waiting on a 2MB or 4kB page, but rather the xas.  I shelved that idea,
> but not before dropping the DAX_PMD flag being passed from the PMD
> pagefault caller.  So if I put that back ...

Did you get a chance to test this?

> diff --git a/fs/dax.c b/fs/dax.c
> index 9919b6b545fb..75cc160d2f0b 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -367,13 +367,13 @@ static struct page *dax_busy_page(void *entry)
>   * a VM_FAULT code, encoded as an xarray internal entry.  The ERR_PTR values
>   * overlap with xarray value entries.
>   */
> -static
> -void *grab_mapping_entry(struct xa_state *xas, struct address_space *mapping)
> +static void *grab_mapping_entry(struct xa_state *xas,
> +		struct address_space *mapping, unsigned long size)
>  {
>  	bool pmd_downgrade = false; /* splitting 2MiB entry into 4k entries? */
>  	void *locked = dax_make_entry(pfn_to_pfn_t(0),
> -						DAX_EMPTY | DAX_LOCKED);
> -	void *unlocked = dax_make_entry(pfn_to_pfn_t(0), DAX_EMPTY);
> +						size | DAX_EMPTY | DAX_LOCKED);
> +	void *unlocked = dax_make_entry(pfn_to_pfn_t(0), size | DAX_EMPTY);
>  	void *entry;
>  
>  retry:
> @@ -1163,7 +1163,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	if (write && !vmf->cow_page)
>  		flags |= IOMAP_WRITE;
>  
> -	entry = grab_mapping_entry(&xas, mapping);
> +	entry = grab_mapping_entry(&xas, mapping, 0);
>  	if (xa_is_internal(entry)) {
>  		ret = xa_to_internal(entry);
>  		goto out;
> @@ -1396,7 +1396,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	 * page is already in the tree, for instance), it will return
>  	 * VM_FAULT_FALLBACK.
>  	 */
> -	entry = grab_mapping_entry(&xas, mapping);
> +	entry = grab_mapping_entry(&xas, mapping, DAX_PMD);
>  	if (xa_is_internal(entry)) {
>  		result = xa_to_internal(entry);
>  		goto fallback;
>
Ross Zwisler June 27, 2018, 7:44 p.m. UTC | #2
On Wed, Jun 27, 2018 at 04:05:29AM -0700, Matthew Wilcox wrote:
> On Tue, Jun 19, 2018 at 10:16:38AM -0700, Matthew Wilcox wrote:
> > I think I see a bug.  No idea if it's the one you're hitting ;-)
> > 
> > I had been intending to not use the 'entry' to decide whether we were
> > waiting on a 2MB or 4kB page, but rather the xas.  I shelved that idea,
> > but not before dropping the DAX_PMD flag being passed from the PMD
> > pagefault caller.  So if I put that back ...
> 
> Did you get a chance to test this?

With this patch it doesn't deadlock, but the test dies with a SIGBUS and we
hit a WARN_ON in the DAX code:

WARNING: CPU: 5 PID: 1678 at fs/dax.c:226 get_unlocked_entry+0xf7/0x120

I don't have a lot of time this week to debug further.  The quickest path to
victory is probably for you to get this reproducing in your test setup.  Does
XFS + DAX + generic/340 pass for you?
Matthew Wilcox June 28, 2018, 8:39 a.m. UTC | #3
On Wed, Jun 27, 2018 at 01:44:38PM -0600, Ross Zwisler wrote:
> On Wed, Jun 27, 2018 at 04:05:29AM -0700, Matthew Wilcox wrote:
> > On Tue, Jun 19, 2018 at 10:16:38AM -0700, Matthew Wilcox wrote:
> > > I think I see a bug.  No idea if it's the one you're hitting ;-)
> > > 
> > > I had been intending to not use the 'entry' to decide whether we were
> > > waiting on a 2MB or 4kB page, but rather the xas.  I shelved that idea,
> > > but not before dropping the DAX_PMD flag being passed from the PMD
> > > pagefault caller.  So if I put that back ...
> > 
> > Did you get a chance to test this?
> 
> With this patch it doesn't deadlock, but the test dies with a SIGBUS and we
> hit a WARN_ON in the DAX code:
> 
> WARNING: CPU: 5 PID: 1678 at fs/dax.c:226 get_unlocked_entry+0xf7/0x120
> 
> I don't have a lot of time this week to debug further.  The quickest path to
> victory is probably for you to get this reproducing in your test setup.  Does
> XFS + DAX + generic/340 pass for you?

I won't be back in front of my test box until Tuesday.  That test does
pass for me, but only because I couldn't get your instructions to give me
a 2MB-aligned DAX setup.  I had to settle for 4k, so none of the 2MB code
has been tested properly.
Ross Zwisler June 28, 2018, 4:30 p.m. UTC | #4
On Thu, Jun 28, 2018 at 01:39:09AM -0700, Matthew Wilcox wrote:
> On Wed, Jun 27, 2018 at 01:44:38PM -0600, Ross Zwisler wrote:
> > On Wed, Jun 27, 2018 at 04:05:29AM -0700, Matthew Wilcox wrote:
> > > On Tue, Jun 19, 2018 at 10:16:38AM -0700, Matthew Wilcox wrote:
> > > > I think I see a bug.  No idea if it's the one you're hitting ;-)
> > > > 
> > > > I had been intending to not use the 'entry' to decide whether we were
> > > > waiting on a 2MB or 4kB page, but rather the xas.  I shelved that idea,
> > > > but not before dropping the DAX_PMD flag being passed from the PMD
> > > > pagefault caller.  So if I put that back ...
> > > 
> > > Did you get a chance to test this?
> > 
> > With this patch it doesn't deadlock, but the test dies with a SIGBUS and we
> > hit a WARN_ON in the DAX code:
> > 
> > WARNING: CPU: 5 PID: 1678 at fs/dax.c:226 get_unlocked_entry+0xf7/0x120
> > 
> > I don't have a lot of time this week to debug further.  The quickest path to
> > victory is probably for you to get this reproducing in your test setup.  Does
> > XFS + DAX + generic/340 pass for you?
> 
> I won't be back in front of my test box until Tuesday.  That test does
> pass for me, but only because I couldn't get your instructions to give me
> a 2MB-aligned DAX setup.  I had to settle for 4k, so none of the 2MB code
> has been tested properly.

Ah.  I've documented both my qemu setup and my filesystem setup here:

https://nvdimm.wiki.kernel.org/pmem_in_qemu
https://nvdimm.wiki.kernel.org/2mib_fs_dax

This should be enough to get you up and running in QEMU and able to reliably
get filesystem DAX PMD faults.
Matthew Wilcox July 25, 2018, 9:03 p.m. UTC | #5
On Wed, Jun 27, 2018 at 01:44:38PM -0600, Ross Zwisler wrote:
> On Wed, Jun 27, 2018 at 04:05:29AM -0700, Matthew Wilcox wrote:
> > On Tue, Jun 19, 2018 at 10:16:38AM -0700, Matthew Wilcox wrote:
> > > I think I see a bug.  No idea if it's the one you're hitting ;-)
> > > 
> > > I had been intending to not use the 'entry' to decide whether we were
> > > waiting on a 2MB or 4kB page, but rather the xas.  I shelved that idea,
> > > but not before dropping the DAX_PMD flag being passed from the PMD
> > > pagefault caller.  So if I put that back ...
> > 
> > Did you get a chance to test this?
> 
> With this patch it doesn't deadlock, but the test dies with a SIGBUS and we
> hit a WARN_ON in the DAX code:
> 
> WARNING: CPU: 5 PID: 1678 at fs/dax.c:226 get_unlocked_entry+0xf7/0x120
> 
> I don't have a lot of time this week to debug further.  The quickest path to
> victory is probably for you to get this reproducing in your test setup.  Does
> XFS + DAX + generic/340 pass for you?

I now have generic/340 passing.  I've pushed a new version to
git://git.infradead.org/users/willy/linux-dax.git xarray
Ross Zwisler July 25, 2018, 9:12 p.m. UTC | #6
On Wed, Jul 25, 2018 at 02:03:23PM -0700, Matthew Wilcox wrote:
> On Wed, Jun 27, 2018 at 01:44:38PM -0600, Ross Zwisler wrote:
> > On Wed, Jun 27, 2018 at 04:05:29AM -0700, Matthew Wilcox wrote:
> > > On Tue, Jun 19, 2018 at 10:16:38AM -0700, Matthew Wilcox wrote:
> > > > I think I see a bug.  No idea if it's the one you're hitting ;-)
> > > > 
> > > > I had been intending to not use the 'entry' to decide whether we were
> > > > waiting on a 2MB or 4kB page, but rather the xas.  I shelved that idea,
> > > > but not before dropping the DAX_PMD flag being passed from the PMD
> > > > pagefault caller.  So if I put that back ...
> > > 
> > > Did you get a chance to test this?
> > 
> > With this patch it doesn't deadlock, but the test dies with a SIGBUS and we
> > hit a WARN_ON in the DAX code:
> > 
> > WARNING: CPU: 5 PID: 1678 at fs/dax.c:226 get_unlocked_entry+0xf7/0x120
> > 
> > I don't have a lot of time this week to debug further.  The quickest path to
> > victory is probably for you to get this reproducing in your test setup.  Does
> > XFS + DAX + generic/340 pass for you?
> 
> I now have generic/340 passing.  I've pushed a new version to
> git://git.infradead.org/users/willy/linux-dax.git xarray

Thanks, I'll throw it in my test setup.
Ross Zwisler July 27, 2018, 5:20 p.m. UTC | #7
On Wed, Jul 25, 2018 at 02:03:23PM -0700, Matthew Wilcox wrote:
> On Wed, Jun 27, 2018 at 01:44:38PM -0600, Ross Zwisler wrote:
> > On Wed, Jun 27, 2018 at 04:05:29AM -0700, Matthew Wilcox wrote:
> > > On Tue, Jun 19, 2018 at 10:16:38AM -0700, Matthew Wilcox wrote:
> > > > I think I see a bug.  No idea if it's the one you're hitting ;-)
> > > > 
> > > > I had been intending to not use the 'entry' to decide whether we were
> > > > waiting on a 2MB or 4kB page, but rather the xas.  I shelved that idea,
> > > > but not before dropping the DAX_PMD flag being passed from the PMD
> > > > pagefault caller.  So if I put that back ...
> > > 
> > > Did you get a chance to test this?
> > 
> > With this patch it doesn't deadlock, but the test dies with a SIGBUS and we
> > hit a WARN_ON in the DAX code:
> > 
> > WARNING: CPU: 5 PID: 1678 at fs/dax.c:226 get_unlocked_entry+0xf7/0x120
> > 
> > I don't have a lot of time this week to debug further.  The quickest path to
> > victory is probably for you to get this reproducing in your test setup.  Does
> > XFS + DAX + generic/340 pass for you?
> 
> I now have generic/340 passing.  I've pushed a new version to
> git://git.infradead.org/users/willy/linux-dax.git xarray

Okay, the next failure I'm hitting is with DAX + XFS + generic/344.  It
doesn't happen every time, but I can usually recreate it within 10 iterations
of the test.  Here's the failure:

generic/344 21s ...[ 1852.564559] run fstests generic/344 at 2018-07-27 11:19:05
[ 1853.033177] XFS (pmem0p2): Unmounting Filesystem
[ 1853.134497] XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[ 1853.135335] XFS (pmem0p2): Mounting V5 Filesystem
[ 1853.138119] XFS (pmem0p2): Ending clean mount
[ 1862.251185] WARNING: CPU: 10 PID: 15695 at mm/memory.c:1801 insert_pfn+0x229/0x240
[ 1862.252023] Modules linked in: dax_pmem device_dax nd_pmem nd_btt nfit libnvdimm
[ 1862.252853] CPU: 10 PID: 15695 Comm: holetest Tainted: G        W         4.18.0-rc6-00077-gc79b37ebab6d-dirty #1
[ 1862.253979] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
[ 1862.255232] RIP: 0010:insert_pfn+0x229/0x240
[ 1862.255734] Code: 21 fa 4c 89 4d b0 48 89 45 c0 c6 05 3c 47 74 01 01 e8 db c2 e2 ff 0f 0b 44 8b 45 ac 4c 8b 4d b0 4c 8b 55 b8 48 8b 45 c0 eb 92 <0f> 0b e9 45 fe ff ff 41 bf f4 ff ff ff e9 43 fe ff ff e8 50 c5 e2
[ 1862.257526] RSP: 0000:ffffc9000e197af8 EFLAGS: 00010216
[ 1862.257994] RAX: 0000000000002df7 RBX: ffff8800368ffe00 RCX: 0000000000000002
[ 1862.258673] RDX: 000fffffffffffff RSI: 000000000000000a RDI: 8000000002df7225
[ 1862.259319] RBP: ffffc9000e197b50 R08: 0000000000000001 R09: ffff8800b8fecd48
[ 1862.260097] R10: ffffc9000e197a30 R11: ffff88010d649a80 R12: ffff8800bb616e80
[ 1862.260843] R13: 00007fc2137c0000 R14: 00000000004521d4 R15: 00000000fffffff0
[ 1862.261563] FS:  00007fc2157ff700(0000) GS:ffff880115800000(0000) knlGS:0000000000000000
[ 1862.262420] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1862.263003] CR2: 00007fc2137c0c00 CR3: 00000000b866e000 CR4: 00000000000006e0
[ 1862.263631] Call Trace:
[ 1862.263862]  ? trace_hardirqs_on_caller+0xf4/0x190
[ 1862.264300]  __vm_insert_mixed+0x83/0xd0
[ 1862.264657]  vmf_insert_mixed_mkwrite+0x13/0x40
[ 1862.265066]  dax_iomap_pte_fault+0x760/0x1140
[ 1862.265478]  dax_iomap_fault+0x37/0x40
[ 1862.265816]  __xfs_filemap_fault+0x2de/0x310
[ 1862.266207]  xfs_filemap_page_mkwrite+0x15/0x20
[ 1862.266610]  xfs_filemap_pfn_mkwrite+0xe/0x10
[ 1862.267044]  do_wp_page+0x1bb/0x660
[ 1862.267435]  __handle_mm_fault+0xc78/0x1320
[ 1862.267912]  handle_mm_fault+0x1ba/0x3c0
[ 1862.268359]  __do_page_fault+0x2b4/0x590
[ 1862.268799]  do_page_fault+0x38/0x2c0
[ 1862.269218]  do_async_page_fault+0x2c/0xb0
[ 1862.269674]  ? async_page_fault+0x8/0x30
[ 1862.270081]  async_page_fault+0x1e/0x30
[ 1862.270431] RIP: 0033:0x401442
[ 1862.270709] Code: 1d 20 00 85 f6 0f 85 7d 00 00 00 48 85 db 7e 20 4b 8d 04 34 31 d2 66 90 48 8b 0d 21 1d 20 00 48 0f af ca 48 83 c2 01 48 39 d3 <48> 89 2c 08 75 e8 8b 0d de 1c 20 00 31 c0 85 c9 74 0a 8b 15 d6 1c
[ 1862.272811] RSP: 002b:00007fc2157feec0 EFLAGS: 00010216
[ 1862.273392] RAX: 00007fc213600c00 RBX: 0000000000001000 RCX: 00000000001c0000
[ 1862.274182] RDX: 00000000000001c1 RSI: 0000000000000000 RDI: 0000000000000001
[ 1862.274961] RBP: 00007fc2157ff700 R08: 00007fc2157ff700 R09: 00007fc2157ff700
[ 1862.275740] R10: 00007fc2157ff9d0 R11: 0000000000000202 R12: 00007fc213600000
[ 1862.276519] R13: 00007ffe07faa240 R14: 0000000000000c00 R15: 00007ffe07faa170
[ 1862.277296] irq event stamp: 13256
[ 1862.277666] hardirqs last  enabled at (13255): [<ffffffff81c8c24c>] _raw_spin_unlock_irq+0x2c/0x60
[ 1862.278630] hardirqs last disabled at (13256): [<ffffffff81e011d3>] error_entry+0x93/0x110
[ 1862.279510] softirqs last  enabled at (13250): [<ffffffff820003bf>] __do_softirq+0x3bf/0x520
[ 1862.280410] softirqs last disabled at (13229): [<ffffffff810ac388>] irq_exit+0xe8/0xf0
[ 1862.281251] ---[ end trace 4b8bc73df4e9e7ba ]---
 22s
[ 1874.258598] XFS (pmem0p2): Unmounting Filesystem
[ 1874.347182] XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[ 1874.348721] XFS (pmem0p2): Mounting V5 Filesystem
[ 1874.353946] XFS (pmem0p2): Ending clean mount
_check_dmesg: something found in dmesg (see /root/xfstests/results//generic/344.dmesg)
Ran: generic/344
Failures: generic/344
Failed 1 of 1 tests

- Ross
Matthew Wilcox July 30, 2018, 3:43 p.m. UTC | #8
On Fri, Jul 27, 2018 at 11:20:35AM -0600, Ross Zwisler wrote:
> Okay, the next failure I'm hitting is with DAX + XFS + generic/344.  It
> doesn't happen every time, but I can usually recreate it within 10 iterations
> of the test.  Here's the failure:

Thanks.  I've made some progress with this; the WARNing is coming from
a vm_insert_* mkwrite call.  Inserting sufficient debugging code has
let me determine we still have a zero_pfn in the page table when we're
trying to insert a new PFN.
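
For reference, the check that fires is (I believe) this one, abridged
from insert_pfn() in 4.18-era mm/memory.c:

	if (!pte_none(*pte)) {
		if (mkwrite) {
			/*
			 * mkwrite means "upgrade the PTE already mapping
			 * this same PFN to writable".  A stale zero_pfn
			 * left behind by an earlier read fault makes the
			 * two PFNs differ, so the WARN_ON_ONCE fires.
			 */
			if (WARN_ON_ONCE(pte_pfn(*pte) != pfn_t_to_pfn(pfn)))
				goto out_unlock;
			entry = *pte;
			goto out_mkwrite;
		}
		goto out_unlock;
	}

So the question is why the read-fault path is leaving the zero page in
the page table instead of clearing it before the mkwrite fault comes in.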

Patch

diff --git a/fs/dax.c b/fs/dax.c
index 9919b6b545fb..75cc160d2f0b 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -367,13 +367,13 @@ static struct page *dax_busy_page(void *entry)
  * a VM_FAULT code, encoded as an xarray internal entry.  The ERR_PTR values
  * overlap with xarray value entries.
  */
-static
-void *grab_mapping_entry(struct xa_state *xas, struct address_space *mapping)
+static void *grab_mapping_entry(struct xa_state *xas,
+		struct address_space *mapping, unsigned long size)
 {
 	bool pmd_downgrade = false; /* splitting 2MiB entry into 4k entries? */
 	void *locked = dax_make_entry(pfn_to_pfn_t(0),
-						DAX_EMPTY | DAX_LOCKED);
-	void *unlocked = dax_make_entry(pfn_to_pfn_t(0), DAX_EMPTY);
+						size | DAX_EMPTY | DAX_LOCKED);
+	void *unlocked = dax_make_entry(pfn_to_pfn_t(0), size | DAX_EMPTY);
 	void *entry;
 
 retry:
@@ -1163,7 +1163,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	if (write && !vmf->cow_page)
 		flags |= IOMAP_WRITE;
 
-	entry = grab_mapping_entry(&xas, mapping);
+	entry = grab_mapping_entry(&xas, mapping, 0);
 	if (xa_is_internal(entry)) {
 		ret = xa_to_internal(entry);
 		goto out;
@@ -1396,7 +1396,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	 * page is already in the tree, for instance), it will return
 	 * VM_FAULT_FALLBACK.
 	 */
-	entry = grab_mapping_entry(&xas, mapping);
+	entry = grab_mapping_entry(&xas, mapping, DAX_PMD);
 	if (xa_is_internal(entry)) {
 		result = xa_to_internal(entry);
 		goto fallback;