diff mbox

dax: fix deadlock with DAX 4k holes

Message ID 1483479365-13607-1-git-send-email-ross.zwisler@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ross Zwisler Jan. 3, 2017, 9:36 p.m. UTC
Currently in DAX if we have three read faults on the same hole address we
can end up with the following:

Thread 0		Thread 1		Thread 2
--------		--------		--------
dax_iomap_fault
 grab_mapping_entry
  lock_slot
   <locks empty DAX entry>

  			dax_iomap_fault
			 grab_mapping_entry
			  get_unlocked_mapping_entry
			   <sleeps on empty DAX entry>

						dax_iomap_fault
						 grab_mapping_entry
						  get_unlocked_mapping_entry
						   <sleeps on empty DAX entry>
  dax_load_hole
   find_or_create_page
   ...
    page_cache_tree_insert
     dax_wake_mapping_entry_waiter
      <wakes one sleeper>
     __radix_tree_replace
      <swaps empty DAX entry with 4k zero page>

			<wakes>
			get_page
			lock_page
			...
			put_locked_mapping_entry
			unlock_page
			put_page

						<sleeps forever on the DAX
						 wait queue>

The crux of the problem is that once we insert a 4k zero page, all locking
from then on is done in terms of that 4k zero page and any additional
threads sleeping on the empty DAX entry will never be woken.  Fix this by
waking all sleepers when we replace the DAX radix tree entry with a 4k zero
page.  This will allow all sleeping threads to successfully transition from
locking based on the DAX empty entry to locking on the 4k zero page.

With the test case reported by Xiong this happens very regularly in my test
setup, with some runs resulting in 9+ threads in this deadlocked state.
With this fix I've been able to run that same test dozens of times in a
loop without issue.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Xiong Zhou <xzhou@redhat.com>
Fixes: commit ac401cc78242 ("dax: New fault locking")
Cc: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org # 4.7+
---

This issue exists as far back as v4.7, and I was easly able to reproduce it
with v4.7 using the same test.

Unfortunately this patch won't apply cleanly to the stable trees, but the
change is very simple and should be easy to replicate by hand.  Please ping
me if you'd like patches that apply cleanly to the v4.9 and v4.8.15 trees.

---
 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jan Kara Jan. 4, 2017, 7:18 a.m. UTC | #1
On Tue 03-01-17 14:36:05, Ross Zwisler wrote:
> Currently in DAX if we have three read faults on the same hole address we
> can end up with the following:
> 
> Thread 0		Thread 1		Thread 2
> --------		--------		--------
> dax_iomap_fault
>  grab_mapping_entry
>   lock_slot
>    <locks empty DAX entry>
> 
>   			dax_iomap_fault
> 			 grab_mapping_entry
> 			  get_unlocked_mapping_entry
> 			   <sleeps on empty DAX entry>
> 
> 						dax_iomap_fault
> 						 grab_mapping_entry
> 						  get_unlocked_mapping_entry
> 						   <sleeps on empty DAX entry>
>   dax_load_hole
>    find_or_create_page
>    ...
>     page_cache_tree_insert
>      dax_wake_mapping_entry_waiter
>       <wakes one sleeper>
>      __radix_tree_replace
>       <swaps empty DAX entry with 4k zero page>
> 
> 			<wakes>
> 			get_page
> 			lock_page
> 			...
> 			put_locked_mapping_entry
> 			unlock_page
> 			put_page
> 
> 						<sleeps forever on the DAX
> 						 wait queue>
> 
> The crux of the problem is that once we insert a 4k zero page, all locking
> from then on is done in terms of that 4k zero page and any additional
> threads sleeping on the empty DAX entry will never be woken.  Fix this by
> waking all sleepers when we replace the DAX radix tree entry with a 4k zero
> page.  This will allow all sleeping threads to successfully transition from
> locking based on the DAX empty entry to locking on the 4k zero page.
> 
> With the test case reported by Xiong this happens very regularly in my test
> setup, with some runs resulting in 9+ threads in this deadlocked state.
> With this fix I've been able to run that same test dozens of times in a
> loop without issue.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Reported-by: Xiong Zhou <xzhou@redhat.com>
> Fixes: commit ac401cc78242 ("dax: New fault locking")
> Cc: Jan Kara <jack@suse.cz>
> Cc: stable@vger.kernel.org # 4.7+

Ah, very good catch. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

I wonder why I was not able to reproduce this... Probably the timing didn't
work out right on my test machine.

								Honza

> ---
> 
> This issue exists as far back as v4.7, and I was easly able to reproduce it
> with v4.7 using the same test.
> 
> Unfortunately this patch won't apply cleanly to the stable trees, but the
> change is very simple and should be easy to replicate by hand.  Please ping
> me if you'd like patches that apply cleanly to the v4.9 and v4.8.15 trees.
> 
> ---
>  mm/filemap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index d0e4d10..b772a33 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -138,7 +138,7 @@ static int page_cache_tree_insert(struct address_space *mapping,
>  				dax_radix_locked_entry(0, RADIX_DAX_EMPTY));
>  			/* Wakeup waiters for exceptional entry lock */
>  			dax_wake_mapping_entry_waiter(mapping, page->index, p,
> -						      false);
> +						      true);
>  		}
>  	}
>  	__radix_tree_replace(&mapping->page_tree, node, slot, page,
> -- 
> 2.7.4
>
Murphy Zhou Jan. 4, 2017, 2:26 p.m. UTC | #2
On Tue, Jan 03, 2017 at 02:36:05PM -0700, Ross Zwisler wrote:
> Currently in DAX if we have three read faults on the same hole address we
> can end up with the following:
> 
> Thread 0		Thread 1		Thread 2
> --------		--------		--------
> dax_iomap_fault
>  grab_mapping_entry
>   lock_slot
>    <locks empty DAX entry>
> 
>   			dax_iomap_fault
> 			 grab_mapping_entry
> 			  get_unlocked_mapping_entry
> 			   <sleeps on empty DAX entry>
> 
> 						dax_iomap_fault
> 						 grab_mapping_entry
> 						  get_unlocked_mapping_entry
> 						   <sleeps on empty DAX entry>
>   dax_load_hole
>    find_or_create_page
>    ...
>     page_cache_tree_insert
>      dax_wake_mapping_entry_waiter
>       <wakes one sleeper>
>      __radix_tree_replace
>       <swaps empty DAX entry with 4k zero page>
> 
> 			<wakes>
> 			get_page
> 			lock_page
> 			...
> 			put_locked_mapping_entry
> 			unlock_page
> 			put_page
> 
> 						<sleeps forever on the DAX
> 						 wait queue>
> 
> The crux of the problem is that once we insert a 4k zero page, all locking
> from then on is done in terms of that 4k zero page and any additional
> threads sleeping on the empty DAX entry will never be woken.  Fix this by
> waking all sleepers when we replace the DAX radix tree entry with a 4k zero
> page.  This will allow all sleeping threads to successfully transition from
> locking based on the DAX empty entry to locking on the 4k zero page.
> 
> With the test case reported by Xiong this happens very regularly in my test
> setup, with some runs resulting in 9+ threads in this deadlocked state.
> With this fix I've been able to run that same test dozens of times in a
> loop without issue.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Reported-by: Xiong Zhou <xzhou@redhat.com>
> Fixes: commit ac401cc78242 ("dax: New fault locking")
> Cc: Jan Kara <jack@suse.cz>
> Cc: stable@vger.kernel.org # 4.7+
> ---

Positive test result of this patch for this issue and the regression
tests.

Great job!

> 
> This issue exists as far back as v4.7, and I was easly able to reproduce it
> with v4.7 using the same test.
> 
> Unfortunately this patch won't apply cleanly to the stable trees, but the
> change is very simple and should be easy to replicate by hand.  Please ping
> me if you'd like patches that apply cleanly to the v4.9 and v4.8.15 trees.
> 
> ---
>  mm/filemap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index d0e4d10..b772a33 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -138,7 +138,7 @@ static int page_cache_tree_insert(struct address_space *mapping,
>  				dax_radix_locked_entry(0, RADIX_DAX_EMPTY));
>  			/* Wakeup waiters for exceptional entry lock */
>  			dax_wake_mapping_entry_waiter(mapping, page->index, p,
> -						      false);
> +						      true);
>  		}
>  	}
>  	__radix_tree_replace(&mapping->page_tree, node, slot, page,
> -- 
> 2.7.4
>
diff mbox

Patch

diff --git a/mm/filemap.c b/mm/filemap.c
index d0e4d10..b772a33 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -138,7 +138,7 @@  static int page_cache_tree_insert(struct address_space *mapping,
 				dax_radix_locked_entry(0, RADIX_DAX_EMPTY));
 			/* Wakeup waiters for exceptional entry lock */
 			dax_wake_mapping_entry_waiter(mapping, page->index, p,
-						      false);
+						      true);
 		}
 	}
 	__radix_tree_replace(&mapping->page_tree, node, slot, page,