diff mbox

mm, dax: fix DAX deadlocks (COW fault)

Message ID 1447675755-5692-1-git-send-email-yigal@plexistor.com (mailing list archive)
State Accepted
Commit 0df9d41
Headers show

Commit Message

Yigal Korman Nov. 16, 2015, 12:09 p.m. UTC
DAX handling of COW faults has wrong locking sequence:
	dax_fault does i_mmap_lock_read
	do_cow_fault does i_mmap_unlock_write

Ross's commit[1] missed a fix[2] that Kirill added to Matthew's
commit[3].

Original COW locking logic was introduced by Matthew here[4].

This should be applied to v4.3 as well.

[1] 0f90cc6609c7 mm, dax: fix DAX deadlocks
[2] 52a2b53ffde6 mm, dax: use i_mmap_unlock_write() in do_cow_fault()
[3] 843172978bb9 dax: fix race between simultaneous faults
[4] 2e4cdab0584f mm: allow page fault handlers to perform the COW

Signed-off-by: Yigal Korman <yigal@plexistor.com>

Cc: Stable Tree <stable@vger.kernel.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
---
 mm/memory.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Dan Williams Nov. 16, 2015, 6:15 p.m. UTC | #1
On Mon, Nov 16, 2015 at 4:09 AM, Yigal Korman <yigal@plexistor.com> wrote:
> DAX handling of COW faults has wrong locking sequence:
>         dax_fault does i_mmap_lock_read
>         do_cow_fault does i_mmap_unlock_write
>
> Ross's commit[1] missed a fix[2] that Kirill added to Matthew's
> commit[3].
>
> Original COW locking logic was introduced by Matthew here[4].
>
> This should be applied to v4.3 as well.
>
> [1] 0f90cc6609c7 mm, dax: fix DAX deadlocks
> [2] 52a2b53ffde6 mm, dax: use i_mmap_unlock_write() in do_cow_fault()
> [3] 843172978bb9 dax: fix race between simultaneous faults
> [4] 2e4cdab0584f mm: allow page fault handlers to perform the COW
>
> Signed-off-by: Yigal Korman <yigal@plexistor.com>
>
> Cc: Stable Tree <stable@vger.kernel.org>
> Cc: Boaz Harrosh <boaz@plexistor.com>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Dave Chinner <dchinner@redhat.com>
> Cc: Jan Kara <jack@suse.com>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
> ---
>  mm/memory.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index c716913..e5071af 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3015,9 +3015,9 @@ static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>                 } else {
>                         /*
>                          * The fault handler has no page to lock, so it holds
> -                        * i_mmap_lock for write to protect against truncate.
> +                        * i_mmap_lock for read to protect against truncate.
>                          */
> -                       i_mmap_unlock_write(vma->vm_file->f_mapping);
> +                       i_mmap_unlock_read(vma->vm_file->f_mapping);
>                 }
>                 goto uncharge_out;
>         }
> @@ -3031,9 +3031,9 @@ static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>         } else {
>                 /*
>                  * The fault handler has no page to lock, so it holds
> -                * i_mmap_lock for write to protect against truncate.
> +                * i_mmap_lock for read to protect against truncate.
>                  */
> -               i_mmap_unlock_write(vma->vm_file->f_mapping);
> +               i_mmap_unlock_read(vma->vm_file->f_mapping);
>         }
>         return ret;
>  uncharge_out:

Looks good to me.  I'll include this with some other DAX fixes I have pending.
Ross Zwisler Nov. 16, 2015, 6:34 p.m. UTC | #2
On Mon, Nov 16, 2015 at 10:15:56AM -0800, Dan Williams wrote:
> On Mon, Nov 16, 2015 at 4:09 AM, Yigal Korman <yigal@plexistor.com> wrote:
> > DAX handling of COW faults has wrong locking sequence:
> >         dax_fault does i_mmap_lock_read
> >         do_cow_fault does i_mmap_unlock_write
> >
> > Ross's commit[1] missed a fix[2] that Kirill added to Matthew's
> > commit[3].
> >
> > Original COW locking logic was introduced by Matthew here[4].
> >
> > This should be applied to v4.3 as well.
> >
> > [1] 0f90cc6609c7 mm, dax: fix DAX deadlocks
> > [2] 52a2b53ffde6 mm, dax: use i_mmap_unlock_write() in do_cow_fault()
> > [3] 843172978bb9 dax: fix race between simultaneous faults
> > [4] 2e4cdab0584f mm: allow page fault handlers to perform the COW
> >
> > Signed-off-by: Yigal Korman <yigal@plexistor.com>
> >
> > Cc: Stable Tree <stable@vger.kernel.org>
> > Cc: Boaz Harrosh <boaz@plexistor.com>
> > Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Dave Chinner <dchinner@redhat.com>
> > Cc: Jan Kara <jack@suse.com>
> > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
> > ---
> >  mm/memory.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index c716913..e5071af 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3015,9 +3015,9 @@ static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >                 } else {
> >                         /*
> >                          * The fault handler has no page to lock, so it holds
> > -                        * i_mmap_lock for write to protect against truncate.
> > +                        * i_mmap_lock for read to protect against truncate.
> >                          */
> > -                       i_mmap_unlock_write(vma->vm_file->f_mapping);
> > +                       i_mmap_unlock_read(vma->vm_file->f_mapping);
> >                 }
> >                 goto uncharge_out;
> >         }
> > @@ -3031,9 +3031,9 @@ static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >         } else {
> >                 /*
> >                  * The fault handler has no page to lock, so it holds
> > -                * i_mmap_lock for write to protect against truncate.
> > +                * i_mmap_lock for read to protect against truncate.
> >                  */
> > -               i_mmap_unlock_write(vma->vm_file->f_mapping);
> > +               i_mmap_unlock_read(vma->vm_file->f_mapping);
> >         }
> >         return ret;
> >  uncharge_out:
> 
> Looks good to me.  I'll include this with some other DAX fixes I have pending.

Looks good to me as well.  Thanks for catching this.
Boaz Harrosh Nov. 17, 2015, 10:40 a.m. UTC | #3
On 11/16/2015 08:34 PM, Ross Zwisler wrote:
> On Mon, Nov 16, 2015 at 10:15:56AM -0800, Dan Williams wrote:
>> On Mon, Nov 16, 2015 at 4:09 AM, Yigal Korman <yigal@plexistor.com> wrote:
>>> DAX handling of COW faults has wrong locking sequence:
>>>         dax_fault does i_mmap_lock_read
>>>         do_cow_fault does i_mmap_unlock_write
>>>
>>> Ross's commit[1] missed a fix[2] that Kirill added to Matthew's
>>> commit[3].
>>>
>>> Original COW locking logic was introduced by Matthew here[4].
>>>
>>> This should be applied to v4.3 as well.
>>>
>>> [1] 0f90cc6609c7 mm, dax: fix DAX deadlocks
>>> [2] 52a2b53ffde6 mm, dax: use i_mmap_unlock_write() in do_cow_fault()
>>> [3] 843172978bb9 dax: fix race between simultaneous faults
>>> [4] 2e4cdab0584f mm: allow page fault handlers to perform the COW
>>>
>>> Signed-off-by: Yigal Korman <yigal@plexistor.com>
>>>
>>> Cc: Stable Tree <stable@vger.kernel.org>
>>> Cc: Boaz Harrosh <boaz@plexistor.com>
>>> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
>>> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>> Cc: Dave Chinner <dchinner@redhat.com>
>>> Cc: Jan Kara <jack@suse.com>
>>> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
>>> ---
>>>  mm/memory.c | 8 ++++----
>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index c716913..e5071af 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -3015,9 +3015,9 @@ static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>>>                 } else {
>>>                         /*
>>>                          * The fault handler has no page to lock, so it holds
>>> -                        * i_mmap_lock for write to protect against truncate.
>>> +                        * i_mmap_lock for read to protect against truncate.
>>>                          */
>>> -                       i_mmap_unlock_write(vma->vm_file->f_mapping);
>>> +                       i_mmap_unlock_read(vma->vm_file->f_mapping);
>>>                 }
>>>                 goto uncharge_out;
>>>         }
>>> @@ -3031,9 +3031,9 @@ static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>>>         } else {
>>>                 /*
>>>                  * The fault handler has no page to lock, so it holds
>>> -                * i_mmap_lock for write to protect against truncate.
>>> +                * i_mmap_lock for read to protect against truncate.
>>>                  */
>>> -               i_mmap_unlock_write(vma->vm_file->f_mapping);
>>> +               i_mmap_unlock_read(vma->vm_file->f_mapping);
>>>         }
>>>         return ret;
>>>  uncharge_out:
>>
>> Looks good to me.  I'll include this with some other DAX fixes I have pending.
> 
> Looks good to me as well.  Thanks for catching this.
> 

Yes. None of the xfstests catch this. It needs a private-mapping mmap in some combination
of other activity on the file at the same time.

Which the linker of gcc does. We have a test of a git clone Kernel-tree and make. which
catches this in the make phase. For some reason on ext4 it is reliable to crash but on xfs 1/2
the runs go through, go figure.

Thanks Yigal for the fast fix
Boaz
diff mbox

Patch

diff --git a/mm/memory.c b/mm/memory.c
index c716913..e5071af 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3015,9 +3015,9 @@  static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		} else {
 			/*
 			 * The fault handler has no page to lock, so it holds
-			 * i_mmap_lock for write to protect against truncate.
+			 * i_mmap_lock for read to protect against truncate.
 			 */
-			i_mmap_unlock_write(vma->vm_file->f_mapping);
+			i_mmap_unlock_read(vma->vm_file->f_mapping);
 		}
 		goto uncharge_out;
 	}
@@ -3031,9 +3031,9 @@  static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	} else {
 		/*
 		 * The fault handler has no page to lock, so it holds
-		 * i_mmap_lock for write to protect against truncate.
+		 * i_mmap_lock for read to protect against truncate.
 		 */
-		i_mmap_unlock_write(vma->vm_file->f_mapping);
+		i_mmap_unlock_read(vma->vm_file->f_mapping);
 	}
 	return ret;
 uncharge_out: