fs: use-after-free in path_lookupat

Message ID 20170305191802.GK29622@ZenIV.linux.org.uk (mailing list archive)
State New, archived

Commit Message

Al Viro March 5, 2017, 7:18 p.m. UTC
On Sun, Mar 05, 2017 at 06:33:18PM +0100, Dmitry Vyukov wrote:

> Added more debug output.
> 
> name_to_handle_at(r4, &(0x7f0000003000-0x6)="2e2f62757300",
> &(0x7f0000003000-0xd)={0xc, 0x0, "cd21"}, &(0x7f0000002000)=0x0,
> 0x1000)
> 
> actually passes name="" because of the overlapping addresses. Flags
> contain AT_EMPTY_PATH.

Bloody hell...  So you end up with name == (char *)&handle->handle_type + 3?
Looks like it would be a lot more useful to dump the actual contents of
those suckers right before the syscall...
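
For reference, the arithmetic behind that pointer can be replayed in
userspace. This is only a sketch, assuming the fuzzer wrote the name bytes
first and the handle second, and that struct file_handle starts with two
32-bit fields (handle_bytes, handle_type) followed by f_handle. The local
buffer, the offset enums and replay_overlap() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of the start of struct file_handle. */
enum { OFF_HANDLE_BYTES = 0, OFF_HANDLE_TYPE = 4, OFF_F_HANDLE = 8 };

/* Distance of each user pointer below the base address 0x7f0000003000. */
enum { NAME_BELOW_BASE = 0x6, HANDLE_BELOW_BASE = 0xd };

/*
 * Replays both argument writes into a local buffer. Returns the offset of
 * `name` inside the handle; *first_byte receives the byte `name` points at.
 */
static size_t replay_overlap(unsigned char *first_byte)
{
	unsigned char mem[0x20];
	unsigned char *base = mem + 0x10;	/* stands in for 0x7f0000003000 */
	unsigned char *name = base - NAME_BELOW_BASE;
	unsigned char *handle = base - HANDLE_BELOW_BASE;
	const uint32_t handle_bytes = 0xc, handle_type = 0x0;
	const unsigned char f_handle[] = { 0xcd, 0x21 };

	/* Both regions must fit inside the local buffer. */
	assert(handle >= mem && name + 6 <= mem + sizeof(mem));

	memset(mem, 0xff, sizeof(mem));
	/* Argument 2: "2e2f62757300" is "./bus" plus the terminating NUL. */
	memcpy(name, "./bus", 6);
	/* Argument 3, written afterwards, clobbers the start of the name. */
	memcpy(handle + OFF_HANDLE_BYTES, &handle_bytes, 4);
	memcpy(handle + OFF_HANDLE_TYPE, &handle_type, 4);
	memcpy(handle + OFF_F_HANDLE, f_handle, sizeof(f_handle));

	*first_byte = name[0];
	return (size_t)(name - handle);
}
```

The name pointer lands 7 bytes into the handle, i.e. on the last byte of
handle_type. Since handle_type is 0, that byte is NUL and the kernel sees
an empty string.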

Anyway, that explains WTF is going on.  The bug is in path_init() and
it triggers when you pass something with a dentry allocated by
d_alloc_pseudo() as dfd, combined with an empty pathname.  You need to have
the file closed by another thread, and have that other thread get out of
the closing syscall (close(), dup2(), etc.) before the caller of path_init()
gets to complete_walk().  We need to make sure that this sucker gets
DCACHE_RCUACCESS while it's still guaranteed to be pinned down.  Could you
try to reproduce with the patch below applied?
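
The ordering described above can be walked through with a toy,
single-threaded model. It is a sketch and not kernel code: it assumes that
an object without the RCU-access mark is freed immediately when the last
reference goes away, while a marked one is freed only after lockless
readers are done. All toy_* names are hypothetical, and the locking that
the real fix does around the flag update is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RCUACCESS 0x1u	/* stands in for DCACHE_RCUACCESS */

enum toy_state { TOY_LIVE, TOY_FREED_NOW, TOY_FREE_DEFERRED };

struct toy_dentry {
	unsigned int flags;
	int refcount;
	enum toy_state state;
};

/*
 * Dropping the last reference: objects never exposed to lockless readers
 * are freed at once, the others wait until readers have finished.
 */
static void toy_put(struct toy_dentry *d)
{
	if (--d->refcount > 0)
		return;
	d->state = (d->flags & TOY_RCUACCESS) ? TOY_FREE_DEFERRED
					      : TOY_FREED_NOW;
}

/*
 * Start of a lockless walk. The caller still holds a reference here, so
 * this is the point where marking is safe. mark == false models the bug.
 */
static void toy_start_lockless_walk(struct toy_dentry *d, bool mark)
{
	assert(d->refcount > 0);
	if (mark && !(d->flags & TOY_RCUACCESS))
		d->flags |= TOY_RCUACCESS;
}

/*
 * Returns true if the walker may still touch the object after the other
 * thread's close() has dropped the last reference.
 */
static bool toy_walker_safe_after_close(bool mark)
{
	struct toy_dentry d = { .flags = 0, .refcount = 1, .state = TOY_LIVE };

	toy_start_lockless_walk(&d, mark);	/* path_init() side */
	toy_put(&d);				/* other thread: close() returns */
	return d.state == TOY_FREE_DEFERRED;	/* complete_walk() side */
}
```

Without the mark the object is gone by the time the walker gets back to
it; with the mark set while the reference is still held, the free is
deferred.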

Comments

Dmitry Vyukov March 6, 2017, 9:46 a.m. UTC | #1
On Sun, Mar 5, 2017 at 8:18 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Sun, Mar 05, 2017 at 06:33:18PM +0100, Dmitry Vyukov wrote:
>
>> Added more debug output.
>>
>> name_to_handle_at(r4, &(0x7f0000003000-0x6)="2e2f62757300",
>> &(0x7f0000003000-0xd)={0xc, 0x0, "cd21"}, &(0x7f0000002000)=0x0,
>> 0x1000)
>>
>> actually passes name="" because of the overlapping addresses. Flags
>> contain AT_EMPTY_PATH.
>
> Bloody hell...  So you end up with name == (char *)&handle->handle_type + 3?
> Looks like it would be a lot more useful to dump the actual contents of
> those suckers right before the syscall...

We can't do dumping yet; it's the opposite of generation and we don't have
enough info for it. Strace can do it, but note that it does not
necessarily tell you the truth. First, the kernel can overwrite some of the
inputs with copy_to_user before reading them. Second, racing syscalls that
use the same memory for inputs will lead to non-deterministic inputs, so
what you see from strace is not necessarily what the kernel sees.
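
The first point can be shown with a small userspace sketch. It assumes a
made-up toy_syscall() that stores its output before reading its input, and
a "tracer" that logs the input at entry; neither corresponds to a real
kernel interface.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for a syscall that stores an output value before
 * it reads its string input. Returns the first input byte it observed.
 */
static char toy_syscall(const char *in, unsigned char *out)
{
	const uint32_t result = 0;

	memcpy(out, &result, sizeof(result));	/* plays the copy_to_user() role */
	return in[0];				/* the input is read afterwards */
}

/*
 * Returns 1 if what the tracer logged before the call differs from what
 * the callee actually read.
 */
static int toy_trace_mismatch(void)
{
	unsigned char mem[16];
	char logged, seen;

	memset(mem, 0, sizeof(mem));
	memcpy(mem + 4, "./bus", 6);

	logged = (char)mem[4];		/* tracer dumps the input at entry */
	assert(logged == '.');

	/* The output buffer (mem + 2) overlaps the input string (mem + 4). */
	seen = toy_syscall((const char *)mem + 4, mem + 2);
	return logged != seen;
}
```

The tracer logs "./bus" while the callee reads an empty string, because
the 4-byte output store has already zeroed the first two bytes of the
input.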


> Anyway, that explains WTF is going on.  The bug is in path_init() and
> it triggers when you pass something with a dentry allocated by
> d_alloc_pseudo() as dfd, combined with an empty pathname.  You need to have
> the file closed by another thread, and have that other thread get out of
> the closing syscall (close(), dup2(), etc.) before the caller of path_init()
> gets to complete_walk().  We need to make sure that this sucker gets
> DCACHE_RCUACCESS while it's still guaranteed to be pinned down.  Could you
> try to reproduce with the patch below applied?
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 6f7d96368734..70840281a41c 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2226,11 +2226,16 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
>                 nd->path = f.file->f_path;
>                 if (flags & LOOKUP_RCU) {
>                         rcu_read_lock();
> -                       nd->inode = nd->path.dentry->d_inode;
> -                       nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
> +                       if (unlikely(!(dentry->d_flags & DCACHE_RCUACCESS))) {
> +                               spin_lock(&dentry->d_lock);
> +                               dentry->d_flags |= DCACHE_RCUACCESS;
> +                               spin_unlock(&dentry->d_lock);
> +                       }
> +                       nd->inode = dentry->d_inode;
> +                       nd->seq = read_seqcount_begin(&dentry->d_seq);
>                 } else {
>                         path_get(&nd->path);
> -                       nd->inode = nd->path.dentry->d_inode;
> +                       nd->inode = dentry->d_inode;
>                 }
>                 fdput(f);
>                 return s;

This seems to fix the crash. The reproducer has survived for an hour,
whereas it usually crashes within 5 minutes or so.

But we will get back to you with data race reports later. All unprotected
accesses should use READ_ONCE/WRITE_ONCE.
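
As an illustration of what such marked accesses look like, here is a
userspace sketch. The two macros are simplified volatile-cast stand-ins
(they rely on the GCC/Clang __typeof__ extension) and are not the kernel's
actual definitions; struct toy_dentry, TOY_RCUACCESS and the helpers are
hypothetical.

```c
#include <assert.h>

/* Simplified stand-ins: force exactly one load or store of x. */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

#define TOY_RCUACCESS 0x80u	/* arbitrary bit chosen for this sketch */

struct toy_dentry {
	unsigned int d_flags;
};

/* Lockless check: a single marked load of the flags word. */
static int toy_needs_marking(struct toy_dentry *d)
{
	return !(READ_ONCE(d->d_flags) & TOY_RCUACCESS);
}

/*
 * Sets the bit with a marked store. The read-modify-write is not atomic,
 * so the caller is assumed to hold whatever lock serializes writers.
 */
static void toy_mark(struct toy_dentry *d)
{
	WRITE_ONCE(d->d_flags, READ_ONCE(d->d_flags) | TOY_RCUACCESS);
	assert(!toy_needs_marking(d));
}
```

The marking does not add any ordering or atomicity by itself; it only
keeps the compiler from tearing, repeating or eliding the access, and it
documents that the access is intentionally racy.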

Dmitry Vyukov March 23, 2017, 2:17 p.m. UTC | #2
Al, please send this patch officially. I have been running with it since
then and have not seen the crashes, nor any other issues that look related.

Thanks!

Dmitry Vyukov April 28, 2017, 6:19 a.m. UTC | #3
On Thu, Mar 23, 2017 at 3:17 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Al, please send this patch officially. I have been running with it since
> then and have not seen the crashes, nor any other issues that look related.
>
> Thanks!


Al, ping. Please send this patch.

Dmitry Vyukov May 29, 2017, 2:48 p.m. UTC | #4
On Fri, Apr 28, 2017 at 8:19 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Al, ping. Please send this patch.


Al, do you want me to mail the patch?
I won't be able to write a super detailed description, but I can put
together a properly formatted patch.

Al Viro May 30, 2017, 6:24 a.m. UTC | #5
On Mon, May 29, 2017 at 04:48:17PM +0200, Dmitry Vyukov wrote:

> Al, do you want me to mail the patch?
> I won't be able to write a super detailed description, but I can put
> together a properly formatted patch.

It's been fixed by commit c0eb027e5aef7; if you are still able to
trigger it on the current mainline, please yell - that would have
to be something different.

Dmitry Vyukov May 30, 2017, 8:19 a.m. UTC | #6
On Tue, May 30, 2017 at 8:24 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Mon, May 29, 2017 at 04:48:17PM +0200, Dmitry Vyukov wrote:
>
>> Al, do you want me to mail the patch?
>> I won't be able to write a super detailed description, but I can put
>> together a properly formatted patch.
>
> It's been fixed by commit c0eb027e5aef7; if you are still able to
> trigger it on the current mainline, please yell - that would have
> to be something different.


Thanks for the update!

No, I can't say that I still see this. It happened very infrequently,
so I wanted to make sure that we don't lose track of it regardless of
whether I see it or not.

Patch

diff --git a/fs/namei.c b/fs/namei.c
index 6f7d96368734..70840281a41c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2226,11 +2226,16 @@  static const char *path_init(struct nameidata *nd, unsigned flags)
 		nd->path = f.file->f_path;
 		if (flags & LOOKUP_RCU) {
 			rcu_read_lock();
-			nd->inode = nd->path.dentry->d_inode;
-			nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
+			if (unlikely(!(dentry->d_flags & DCACHE_RCUACCESS))) {
+				spin_lock(&dentry->d_lock);
+				dentry->d_flags |= DCACHE_RCUACCESS;
+				spin_unlock(&dentry->d_lock);
+			}
+			nd->inode = dentry->d_inode;
+			nd->seq = read_seqcount_begin(&dentry->d_seq);
 		} else {
 			path_get(&nd->path);
-			nd->inode = nd->path.dentry->d_inode;
+			nd->inode = dentry->d_inode;
 		}
 		fdput(f);
 		return s;