fs: out of bounds on stack in iov_iter_advance

Message ID 20151106013402.GT22011@ZenIV.linux.org.uk (mailing list archive)
State New, archived

Commit Message

Al Viro Nov. 6, 2015, 1:34 a.m. UTC
On Wed, Sep 30, 2015 at 05:30:17PM -0400, Sasha Levin wrote:

> > So I've traced this all the way back to dax_io(). I can trigger this with:
> > 
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 93bf2f9..2cdb8a5 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -178,6 +178,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
> >         if (need_wmb)
> >                 wmb_pmem();
> > 
> > +       WARN_ON((pos == start) && (pos - start > iov_iter_count(iter)));
> >         return (pos == start) ? retval : pos - start;
> >  }
> > 
> > So it seems that iter gets moved twice here: once in dax_io(), and once again
> > back at generic_file_read_iter().
> > 
> > I don't see how it ever worked. Am I missing something?

This:
                        struct iov_iter data = *iter;
                        retval = mapping->a_ops->direct_IO(iocb, &data, pos);
                }

                if (retval > 0) {
                        *ppos = pos + retval;
                        iov_iter_advance(iter, retval);

The iterator advanced in ->direct_IO() is a _copy_, not the original.
The contents of *iter as seen by generic_file_read_iter() is not
modifiable by ->direct_IO(), simply because its address is nowhere to
be found.  And checking iov_iter_count(iter) at the end of dax_io() is
pointless - from the POV of generic_file_read_iter() it's data.count,
and while it used to be equal to iter->count, it's already been modified.
By the time we call iov_iter_advance() in generic_file_read_iter() that
value will be already discarded, along with rest of struct iov_iter data.
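
The copy-versus-original point can be modelled in userspace. This is only a sketch under stated assumptions: struct fake_iter, fake_advance(), fake_direct_io() and fake_read_iter() are invented stand-ins, not kernel interfaces, and the iterator is reduced to its remaining byte count.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct iov_iter: only the remaining byte count. */
struct fake_iter {
	size_t count;
};

/* Models iov_iter_advance(): consume n bytes from the iterator. */
static void fake_advance(struct fake_iter *i, size_t n)
{
	i->count -= n;
}

/* Models ->direct_IO(): it consumes from whichever iterator it is handed. */
static size_t fake_direct_io(struct fake_iter *i, size_t n)
{
	fake_advance(i, n);
	return n;
}

/*
 * Models the caller: direct I/O runs on a struct copy, then the caller
 * advances the original by the returned amount, exactly once.
 */
size_t fake_read_iter(struct fake_iter *iter, size_t n)
{
	struct fake_iter data = *iter;	/* copy, as in the quoted code */
	size_t retval = fake_direct_io(&data, n);

	/* *iter is untouched at this point; only data.count has changed. */
	if (retval > 0)
		fake_advance(iter, retval);
	return retval;
}
```

Starting from a count of 100 and reading 30 bytes, the original ends up at 70 rather than 40: the advance done inside fake_direct_io() dies with the local copy.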

Wait a minute - you are triggering _what_???
> > +       WARN_ON((pos == start) && (pos - start > iov_iter_count(iter)));
With '&&'?  iov_iter_count() is size_t, while pos and start are loff_t,
so you are seeing equal values in pos and start (as integers) *and*
(loff_t)0 > (size_t)something.  loff_t is a signed type, size_t - unsigned.
6.3.1.8[1] of the C standard (usual arithmetic conversions) says that
	* if rank of size_t is greater or equal to rank of loff_t, the
latter gets converted to size_t.  And conversion of zero should be zero,
i.e. (size_t) 0 > (size_t)something, which is impossible (we compare them
as non-negative integers).
	* if loff_t can represent all values of size_t, size_t value gets
converted to loff_t.  Result of conversion should have the same (in particular,
non-negative) value.  Again, comparison can't be true.
	* otherwise both values are converted to unsigned counterpart of
loff_t.  Again, zero converts to 0 and in any unsigned type 0 > x is
impossible.

I don't see any way for that condition to evaluate true.
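
The conversion argument can be checked with a small userspace function. A sketch only: fake_loff_t is an assumed stand-in for the kernel's loff_t (long long here), and zero_exceeds() is a name made up for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: long long stands in for loff_t, a signed 64-bit type. */
typedef long long fake_loff_t;

/*
 * The second half of the WARN_ON() condition once pos == start:
 * a signed zero on the left, an unsigned count on the right.
 * Whichever way the usual arithmetic conversions go, zero stays zero
 * and the count stays non-negative, so the comparison cannot hold.
 */
bool zero_exceeds(size_t count)
{
	fake_loff_t diff = 0;	/* pos - start when pos == start */

	return diff > count;
}
```

zero_exceeds() returns false for every argument, including (size_t)-1, which is why the condition can only fire if the quoted '&&' was really '||'.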

Assuming that it's a misquoted ||...  I don't see any way for pos to
get greater than start + original iov_iter_count().  However, I *do*
see a way for bad things to happen in a different way.  Look:
	// first pass through the loop, pos == start (and so's max)
                                retval = dax_get_addr(bh, &addr, blkbits);
	// got a positive value
                                if (retval < 0)
                                        break;
	// nope, keep going
                                if (buffer_unwritten(bh) || buffer_new(bh)) {
                                        dax_new_buf(addr, retval, first, pos,
                                                                        end);
                                        need_wmb = true;
                                }
                                addr += first;
                                size = retval - first;
	// OK...
                        }
                        max = min(pos + size, end);
	// OK...
                }

                if (iov_iter_rw(iter) == WRITE) {
                        len = copy_from_iter_pmem(addr, max - pos, iter);
                        need_wmb = true;
                } else if (!hole)
                        len = copy_to_iter((void __force *)addr, max - pos,
                                        iter);
                else
                        len = iov_iter_zero(max - pos, iter);
	// too bad - we'd hit an unmapped memory area.  len is 0...
	// and retval is fucking positive.
                if (!len)
                        break;

	return (pos == start) ? retval : pos - start;
	// will return a bloody big positive value
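
The same walk-through, boiled down to a toy userspace model of a single pass. This is a sketch, not the kernel function: buggy_dax_io_once() and its parameters are hypothetical, with block standing for the positive value the lookup leaves in retval and copied for what the copy helper reports.

```c
#include <stddef.h>

/*
 * One pass through the pre-fix control flow.
 * block:  positive result of the block lookup, left in retval
 * copied: bytes the copy helper managed; 0 models a fault on the
 *         user buffer
 */
long buggy_dax_io_once(long start, long block, size_t copied)
{
	long pos = start;
	long retval = block;	/* lookup succeeded, value is positive */
	size_t len = copied;

	if (!len)
		goto out;	/* the "break": retval is left as it was */
	pos += (long)len;
out:
	return (pos == start) ? retval : pos - start;
}
```

With block = 4096 and copied = 0 the model reports 4096 bytes transferred although nothing was copied, and the caller would then advance its iterator by that bogus amount.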

Could you try to reproduce it with this:

dax_io(): don't let non-error value escape via retval instead of EFAULT

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Al Viro Nov. 6, 2015, 2:19 a.m. UTC | #1
On Fri, Nov 06, 2015 at 01:34:02AM +0000, Al Viro wrote:

> Could you try to reproduce it with this:
> 
> dax_io(): don't let non-error value escape via retval instead of EFAULT
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/fs/dax.c b/fs/dax.c
> index a86d3cc..7b653e9 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -169,8 +169,10 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
>  		else
>  			len = iov_iter_zero(max - pos, iter);
>  
> -		if (!len)
> +		if (!len) {
> +			retval = -EFAULT;
>  			break;
> +		}
>  
>  		pos += len;
>  		addr += len;
> 
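
For illustration, a toy userspace model of the loop with the fix applied. It is a sketch under assumptions: fixed_dax_io(), chunk and fault_at are invented for this example (chunk is assumed positive), and only the final return expression is taken from the quoted code.

```c
#include <errno.h>
#include <stddef.h>

/*
 * want:     bytes requested
 * chunk:    bytes handled per pass (assumed > 0)
 * fault_at: offset from which the user buffer is inaccessible, i.e.
 *           the copy helper transfers nothing at or beyond it
 */
long fixed_dax_io(long want, long chunk, long fault_at)
{
	long start = 0, pos = start, end = start + want;
	long retval = 0;

	while (pos < end) {
		long max = (pos + chunk < end) ? pos + chunk : end;
		long len = max - pos;

		retval = chunk;	/* stands in for a positive lookup result */

		/* model the copy helper: stop short at the faulting offset */
		if (pos + len > fault_at)
			len = (fault_at > pos) ? fault_at - pos : 0;

		if (!len) {
			retval = -EFAULT;	/* the fix */
			break;
		}
		pos += len;
	}

	return (pos == start) ? retval : pos - start;
}
```

A fault on the very first pass now yields -EFAULT, while a fault after some progress still yields the short count pos - start, so partial transfers are reported as before.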

PS: "block, dax: fix lifetime of in-kernel dax mappings with dax_map_atomic()"
Dan Williams had posted a while ago does change the things a bit, but
AFAICS only in turning "return a bogus positive value" into "return an
uninitialized value"; if applying that one after it, s/retval/rc/ in
the above.  And whether or not it fixes the bug Sasha had been able to trigger,
the bug is real and needs fixing - it's been there since 4.0 when fs/dax.c
went into the tree.

How are we going to handle that one?  I can put it into mainline pull
request via vfs.git, with Cc: stable, but if e.g. Jens prefers to take it
via the block tree, I'll be glad to leave it for him to deal with.
Linus Torvalds Nov. 6, 2015, 3:38 a.m. UTC | #2
On Thu, Nov 5, 2015 at 6:19 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> How are we going to handle that one?  I can put it into mainline pull
> request via vfs.git, with Cc: stable, but if e.g. Jens prefers to take it
> via the block tree, I'll be glad to leave it for him to deal with.

Put it in the vfs tree (I'm hoping for a pull request soon..)

I pulled the block trees from Jens yesterday, so there is presumably
nothing pending there right now.

              Linus
Jens Axboe Nov. 6, 2015, 4:06 p.m. UTC | #3
On 11/05/2015 08:38 PM, Linus Torvalds wrote:
> On Thu, Nov 5, 2015 at 6:19 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> How are we going to handle that one?  I can put it into mainline pull
>> request via vfs.git, with Cc: stable, but if e.g. Jens prefers to take it
>> via the block tree, I'll be glad to leave it for him to deal with.
>
> Put it in the vfs tree (I'm hoping for a pull request soon..)
>
> I pulled the block trees from Jens yesterday, so there is presumably
> nothing pending there right now.

Either way is obviously fine with me. I have 4 patches pending, but 
unless more urgent things show up, I was going to continue collecting 
fixes and submit that post -rc1.
Linus Torvalds Nov. 11, 2015, 2:21 a.m. UTC | #4
Al, ping?

On Thu, Nov 5, 2015 at 7:38 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Nov 5, 2015 at 6:19 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> How are we going to handle that one?  I can put it into mainline pull
>> request via vfs.git, with Cc: stable, but if e.g. Jens prefers to take it
>> via the block tree, I'll be glad to leave it for him to deal with.
>
> Put it in the vfs tree (I'm hoping for a pull request soon..)
>
> I pulled the block trees from Jens yesterday, so there is presumably
> nothing pending there right now.

Apparently my "hoping for a pull request soon" was ridiculously optimistic.

Al, looking at the most recent linux-next, most of the vfs commits
there seem to be committed in the last day or two. I'm getting the
feeling that that is all 4.5 material by now.

Should I just take the iov patch as-is, since apparently no vfs pull
request is happening this merge cycle? And no, I'm not taking
"developed during the second week of the merge window, and sent in the
last few days of it". I'm done with that.

                    Linus
Jens Axboe Nov. 11, 2015, 2:25 a.m. UTC | #5
On Tue, Nov 10 2015, Linus Torvalds wrote:
> Al, ping?
> 
> On Thu, Nov 5, 2015 at 7:38 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Thu, Nov 5, 2015 at 6:19 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >>
> >> How are we going to handle that one?  I can put it into mainline pull
> >> request via vfs.git, with Cc: stable, but if e.g. Jens prefers to take it
> >> via the block tree, I'll be glad to leave it for him to deal with.
> >
> > Put it in the vfs tree (I'm hoping for a pull request soon..)
> >
> > I pulled the block trees from Jens yesterday, so there is presumably
> > nothing pending there right now.
> 
> Apparently my "hoping for a pull request soon" was ridiculously optimistic.
> 
> Al, looking at the most recent linux-next, most of the vfs commits
> there seem to be committed in the last day or two. I'm getting the
> feeling that that is all 4.5 material by now.
> 
> Should I just take the iov patch as-is, since apparently no vfs pull
> request is happening this merge cycle? And no, I'm not taking
> "developed during the second week of the merge window, and sent in the
> last few days of it". I'm done with that.

I've got 8 other patches pending for a post core merge, just waiting for
the last core pull request to go in. I haven't seen this iov iter fix,
though.



  git://git.kernel.dk/linux-block.git for-linus


----------------------------------------------------------------
Jan Kara (1):
      brd: Refuse improperly aligned discard requests

Jens Axboe (2):
      MAINTAINERS: add reference to new linux-block list
      blk-mq: mark __blk_mq_complete_request() static

Randy Dunlap (1):
      block: fix blk-core.c kernel-doc warning

Sathyavathi M (1):
      NVMe: Increase the max transfer size when mdts is 0

Stephan Günther (2):
      NVMe: use split lo_hi_{read,write}q
      NVMe: add support for Apple NVMe controller

Vivek Goyal (1):
      fs/block_dev.c: Remove WARN_ON() when inode writeback fails

 MAINTAINERS             |  1 +
 block/blk-core.c        |  3 +++
 block/blk-mq.c          |  2 +-
 block/blk-mq.h          |  1 -
 drivers/block/brd.c     |  3 +++
 drivers/nvme/host/pci.c | 15 +++++++++------
 fs/block_dev.c          | 15 ++++++++++++---
 7 files changed, 29 insertions(+), 11 deletions(-)
Linus Torvalds Nov. 11, 2015, 2:31 a.m. UTC | #6
On Tue, Nov 10, 2015 at 6:25 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On Tue, Nov 10 2015, Linus Torvalds wrote:
>> Al, ping?
>>
>> On Thu, Nov 5, 2015 at 7:38 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> > On Thu, Nov 5, 2015 at 6:19 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> >>
>> >> How are we going to handle that one?  I can put it into mainline pull
>> >> request via vfs.git, with Cc: stable, but if e.g. Jens prefers to take it
>> >> via the block tree, I'll be glad to leave it for him to deal with.
>> >
>> > Put it in the vfs tree (I'm hoping for a pull request soon..)
>> >
>> > I pulled the block trees from Jens yesterday, so there is presumably
>> > nothing pending there right now.
>>
>> Apparently my "hoping for a pull request soon" was ridiculously optimistic.
>>
>> Al, looking at the most recent linux-next, most of the vfs commits
>> there seem to be committed in the last day or two. I'm getting the
>> feeling that that is all 4.5 material by now.
>>
>> Should I just take the iov patch as-is, since apparently no vfs pull
>> request is happening this merge cycle? And no, I'm not taking
>> "developed during the second week of the merge window, and sent in the
>> last few days of it". I'm done with that.
>
> I've got 8 other patches pending for a post core merge, just waiting for
> the last core pull request to go in. I haven't seen this iov iter fix,
> though.

It was in this thread, looked like this (without the whitespace damage):

    dax_io(): don't let non-error value escape via retval instead of EFAULT

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    ---
    diff --git a/fs/dax.c b/fs/dax.c
    index a86d3cc..7b653e9 100644
    --- a/fs/dax.c
    +++ b/fs/dax.c
    @@ -169,8 +169,10 @@ static ssize_t dax_io(struct inode *inode,
struct iov_iter *iter,
                    else
                            len = iov_iter_zero(max - pos, iter);

    -               if (!len)
    +               if (!len) {
    +                       retval = -EFAULT;
                            break;
    +               }

                    pos += len;
                    addr += len;


although I don't think I saw a confirmation that that was what Sasha
actually hit (but Sasha had narrowed it down to DAX, so it looks
possible/likely)

                    Linus
Jens Axboe Nov. 11, 2015, 2:40 a.m. UTC | #7
On 11/10/2015 07:31 PM, Linus Torvalds wrote:
> On Tue, Nov 10, 2015 at 6:25 PM, Jens Axboe <axboe@kernel.dk> wrote:
>> On Tue, Nov 10 2015, Linus Torvalds wrote:
>>> Al, ping?
>>>
>>> On Thu, Nov 5, 2015 at 7:38 PM, Linus Torvalds
>>> <torvalds@linux-foundation.org> wrote:
>>>> On Thu, Nov 5, 2015 at 6:19 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>>>>
>>>>> How are we going to handle that one?  I can put it into mainline pull
>>>>> request via vfs.git, with Cc: stable, but if e.g. Jens prefers to take it
>>>>> via the block tree, I'll be glad to leave it for him to deal with.
>>>>
>>>> Put it in the vfs tree (I'm hoping for a pull request soon..)
>>>>
>>>> I pulled the block trees from Jens yesterday, so there is presumably
>>>> nothing pending there right now.
>>>
>>> Apparently my "hoping for a pull request soon" was ridiculously optimistic.
>>>
>>> Al, looking at the most recent linux-next, most of the vfs commits
>>> there seem to be committed in the last day or two. I'm getting the
>>> feeling that that is all 4.5 material by now.
>>>
>>> Should I just take the iov patch as-is, since apparently no vfs pull
>>> request is happening this merge cycle? And no, I'm not taking
>>> "developed during the second week of the merge window, and sent in the
>>> last few days of it". I'm done with that.
>>
>> I've got 8 other patches pending for a post core merge, just waiting for
>> the last core pull request to go in. I haven't seen this iov iter fix,
>> though.
>
> It was in this thread, looked like this (without the whitespace damage):
>
>      dax_io(): don't let non-error value escape via retval instead of EFAULT
>
>      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>      ---
>      diff --git a/fs/dax.c b/fs/dax.c
>      index a86d3cc..7b653e9 100644
>      --- a/fs/dax.c
>      +++ b/fs/dax.c
>      @@ -169,8 +169,10 @@ static ssize_t dax_io(struct inode *inode,
> struct iov_iter *iter,
>                      else
>                              len = iov_iter_zero(max - pos, iter);
>
>      -               if (!len)
>      +               if (!len) {
>      +                       retval = -EFAULT;
>                              break;
>      +               }
>
>                      pos += len;
>                      addr += len;
>
>
> although I don't think I saw a confirmation that that was what Sasha
> actually hit (but Sasha had narrowed it down to DAX, so it looks
> possible/likely)

I found it right after sending that email. Patch looks pretty straight 
forward, at least from the case of max - pos != 0 and len == 0 on 
return. Might be cleaner to add a

if (retval < 0)
     break;

check, that should be the case where max == pos anyway. But we'd 
potentially turn -Exx into -EFAULT for that case with the patch.

Hmm?
Jens Axboe Nov. 11, 2015, 2:41 a.m. UTC | #8
On 11/10/2015 07:40 PM, Jens Axboe wrote:
> On 11/10/2015 07:31 PM, Linus Torvalds wrote:
>> On Tue, Nov 10, 2015 at 6:25 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>> On Tue, Nov 10 2015, Linus Torvalds wrote:
>>>> Al, ping?
>>>>
>>>> On Thu, Nov 5, 2015 at 7:38 PM, Linus Torvalds
>>>> <torvalds@linux-foundation.org> wrote:
>>>>> On Thu, Nov 5, 2015 at 6:19 PM, Al Viro <viro@zeniv.linux.org.uk>
>>>>> wrote:
>>>>>>
>>>>>> How are we going to handle that one?  I can put it into mainline pull
>>>>>> request via vfs.git, with Cc: stable, but if e.g. Jens prefers to
>>>>>> take it
>>>>>> via the block tree, I'll be glad to leave it for him to deal with.
>>>>>
>>>>> Put it in the vfs tree (I'm hoping for a pull request soon..)
>>>>>
>>>>> I pulled the block trees from Jens yesterday, so there is presumably
>>>>> nothing pending there right now.
>>>>
>>>> Apparently my "hoping for a pull request soon" was ridiculously
>>>> optimistic.
>>>>
>>>> Al, looking at the most recent linux-next, most of the vfs commits
>>>> there seem to be committed in the last day or two. I'm getting the
>>>> feeling that that is all 4.5 material by now.
>>>>
>>>> Should I just take the iov patch as-is, since apparently no vfs pull
>>>> request is happening this merge cycle? And no, I'm not taking
>>>> "developed during the second week of the merge window, and sent in the
>>>> last few days of it". I'm done with that.
>>>
>>> I've got 8 other patches pending for a post core merge, just waiting for
>>> the last core pull request to go in. I haven't seen this iov iter fix,
>>> though.
>>
>> It was in this thread, looked like this (without the whitespace damage):
>>
>>      dax_io(): don't let non-error value escape via retval instead of
>> EFAULT
>>
>>      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>>      ---
>>      diff --git a/fs/dax.c b/fs/dax.c
>>      index a86d3cc..7b653e9 100644
>>      --- a/fs/dax.c
>>      +++ b/fs/dax.c
>>      @@ -169,8 +169,10 @@ static ssize_t dax_io(struct inode *inode,
>> struct iov_iter *iter,
>>                      else
>>                              len = iov_iter_zero(max - pos, iter);
>>
>>      -               if (!len)
>>      +               if (!len) {
>>      +                       retval = -EFAULT;
>>                              break;
>>      +               }
>>
>>                      pos += len;
>>                      addr += len;
>>
>>
>> although I don't think I saw a confirmation that that was what Sasha
>> actually hit (but Sasha had narrowed it down to DAX, so it looks
>> possible/likely)
>
> I found it right after sending that email. Patch looks pretty straight
> forward, at least from the case of max - pos != 0 and len == 0 on
> return. Might be cleaner to add a
>
> if (retval < 0)
>      break;
>
> check, that should be the case where max == pos anyway. But we'd
> potentially return -Exx into -EFAULT for that case with the patch.
>
> Hmm?

So we already do that, in the 'if' above. I think the patch looks fine.
Jens Axboe Nov. 11, 2015, 2:44 a.m. UTC | #9
On 11/10/2015 07:41 PM, Jens Axboe wrote:
> On 11/10/2015 07:40 PM, Jens Axboe wrote:
>> On 11/10/2015 07:31 PM, Linus Torvalds wrote:
>>> On Tue, Nov 10, 2015 at 6:25 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>>> On Tue, Nov 10 2015, Linus Torvalds wrote:
>>>>> Al, ping?
>>>>>
>>>>> On Thu, Nov 5, 2015 at 7:38 PM, Linus Torvalds
>>>>> <torvalds@linux-foundation.org> wrote:
>>>>>> On Thu, Nov 5, 2015 at 6:19 PM, Al Viro <viro@zeniv.linux.org.uk>
>>>>>> wrote:
>>>>>>>
>>>>>>> How are we going to handle that one?  I can put it into mainline
>>>>>>> pull
>>>>>>> request via vfs.git, with Cc: stable, but if e.g. Jens prefers to
>>>>>>> take it
>>>>>>> via the block tree, I'll be glad to leave it for him to deal with.
>>>>>>
>>>>>> Put it in the vfs tree (I'm hoping for a pull request soon..)
>>>>>>
>>>>>> I pulled the block trees from Jens yesterday, so there is presumably
>>>>>> nothing pending there right now.
>>>>>
>>>>> Apparently my "hoping for a pull request soon" was ridiculously
>>>>> optimistic.
>>>>>
>>>>> Al, looking at the most recent linux-next, most of the vfs commits
>>>>> there seem to be committed in the last day or two. I'm getting the
>>>>> feeling that that is all 4.5 material by now.
>>>>>
>>>>> Should I just take the iov patch as-is, since apparently no vfs pull
>>>>> request is happening this merge cycle? And no, I'm not taking
>>>>> "developed during the second week of the merge window, and sent in the
>>>>> last few days of it". I'm done with that.
>>>>
>>>> I've got 8 other patches pending for a post core merge, just waiting
>>>> for
>>>> the last core pull request to go in. I haven't seen this iov iter fix,
>>>> though.
>>>
>>> It was in this thread, looked like this (without the whitespace damage):
>>>
>>>      dax_io(): don't let non-error value escape via retval instead of
>>> EFAULT
>>>
>>>      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>>>      ---
>>>      diff --git a/fs/dax.c b/fs/dax.c
>>>      index a86d3cc..7b653e9 100644
>>>      --- a/fs/dax.c
>>>      +++ b/fs/dax.c
>>>      @@ -169,8 +169,10 @@ static ssize_t dax_io(struct inode *inode,
>>> struct iov_iter *iter,
>>>                      else
>>>                              len = iov_iter_zero(max - pos, iter);
>>>
>>>      -               if (!len)
>>>      +               if (!len) {
>>>      +                       retval = -EFAULT;
>>>                              break;
>>>      +               }
>>>
>>>                      pos += len;
>>>                      addr += len;
>>>
>>>
>>> although I don't think I saw a confirmation that that was what Sasha
>>> actually hit (but Sasha had narrowed it down to DAX, so it looks
>>> possible/likely)
>>
>> I found it right after sending that email. Patch looks pretty straight
>> forward, at least from the case of max - pos != 0 and len == 0 on
>> return. Might be cleaner to add a
>>
>> if (retval < 0)
>>      break;
>>
>> check, that should be the case where max == pos anyway. But we'd
>> potentially return -Exx into -EFAULT for that case with the patch.
>>
>> Hmm?
>
> So we already do that, in the 'if' above. I think the patch looks fine.

Queued up. Unless Al objects, it'll be part of the 'for-linus' pull 
later this week.
Al Viro Nov. 11, 2015, 2:56 a.m. UTC | #10
On Tue, Nov 10, 2015 at 06:21:47PM -0800, Linus Torvalds wrote:

> Al, looking at the most recent linux-next, most of the vfs commits
> there seem to be committed in the last day or two. I'm getting the
> feeling that that is all 4.5 material by now.
> 
> Should I just take the iov patch as-is, since apparently no vfs pull
> request is happening this merge cycle? And no, I'm not taking
> "developed during the second week of the merge window, and sent in the
> last few days of it". I'm done with that.

s/developed/rebased/, actually, but... point taken.  Mea culpa, and what
to do with those patches is for you to decide; some of those are simply
-stable fodder and probably ought to go one-by-one at any point you would
consider convenient, some are of the "remove stale comment" variety (obviously
can sit around until the next cycle, or go in one-by-one at any point - the
things like
-
-       /* WARNING: probably going away soon, do not use! */
in inode_operations; the comment used to be about the method removed last
cycle and should've been gone with it; etc.)

There's a large pile not in those two classes - xattr+richacl stuff.  I'm more
confident about the first part, but strictly speaking neither qualifies as
fixes.

FWIW, the stuff that had been _developed_ during the merge window is not there
- a patch series around the descriptor bitmaps.  Doesn't change the situation;
I'd fucked up this cycle ;-/
Al Viro Nov. 11, 2015, 3:06 a.m. UTC | #11
On Tue, Nov 10, 2015 at 07:44:14PM -0700, Jens Axboe wrote:

> Queued up. Unless Al objects, it'll be part of the 'for-linus' pull
> later this week.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: stable@vger.kernel.org # 4.0+

probably ought to be there...
Jens Axboe Nov. 11, 2015, 3:07 a.m. UTC | #12
On 11/10/2015 08:06 PM, Al Viro wrote:
> On Tue, Nov 10, 2015 at 07:44:14PM -0700, Jens Axboe wrote:
>
>> Queued up. Unless Al objects, it'll be part of the 'for-linus' pull
>> later this week.
>
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Cc: stable@vger.kernel.org # 4.0+
>
> probably ought to be there...

Agree, done.
Sasha Levin Nov. 11, 2015, 3:20 a.m. UTC | #13
On 11/10/2015 09:31 PM, Linus Torvalds wrote:
> although I don't think I saw a confirmation that that was what Sasha
> actually hit (but Sasha had narrowed it down to DAX, so it looks
> possible/likely)

Yup, that indeed fixed the problem I was seeing.

Thanks,
Sasha
Al Viro Nov. 11, 2015, 3:30 a.m. UTC | #14
On Wed, Nov 11, 2015 at 02:56:47AM +0000, Al Viro wrote:
> s/developed/rebased/, actually, but... point taken.  Mea culpa, and what
> to do with those patches is for you to decide; some of those are simply
> -stable fodder and probably ought to go one-by-one at any point you would
> consider convenient, some are of the "remove stale comment" variety (obviously
> can sit around until the next cycle, or go in one-by-one at any point - the
> things like
> -
> -       /* WARNING: probably going away soon, do not use! */
> in inode_operations; the comment used to be about the method removed last
> cycle and should've been gone with it; etc.)

FWIW, here's what's in there:
	dax_io fix
Jens has just taken it
	fs: fix inode.c kernel-doc warning
	fs: fix writeback.c kernel-doc warnings
trivial comment patches
	overlayfs: move super block magic number to magic.h
got picked into overlayfs tree yesterday
	debugfs: fix refcount imbalance in start_creating
old fix, -stable fodder (had been first posted in October, IIRC)
	vfs: Check attribute names in posix acl xattr handers
	vfs: Fix the posix_acl_xattr_list return value
	ubifs: Remove unused security xattr handler
	hfsplus: Remove unused xattr handler list operations
	jffs2: Add missing capability check for listing trusted xattrs
	xattr handlers: Pass handler to operations instead of flags
	9p: xattr simplifications
	squashfs: xattr simplifications
	f2fs: xattr simplifications
xattr series; the first two are arguably fixes, and whatever happens in this
window, I'm taking the rest into -next for 4.5.  Series makes sense and
cleans the things nicely, IMO.
	FS-Cache: Increase reference of parent after registering, netfs success
	FS-Cache: Don't override netfs's primary_index if registering failed
	cachefiles: perform test on s_blocksize when opening cache file.
	FS-Cache: Handle a write to the page immediately beyond the EOF marker
1, 2 and 4 are simply -stable fodder, 3 is an obvious optimization.
	binfmt_elf: Don't clobber passed executable's file header
	binfmt_elf: Correct `arch_check_elf's description
-stable fodder.
	fs/pipe.c: preserve alloc_file() error code
	fs/pipe.c: return error code rather than 0 in pipe_write()
-stable fodder.
	vfs: remove unused wrapper block_page_mkwrite()
	vfs: remove stale comment in inode_operations
dead code and stale comment removal.  Can go at any point.
	fs: 9p: cache.h: Add #define of include guard
trivial, can go at any point, or stay until the next cycle.
	richacl series
probably misses the window - I'd really like to hear more detailed variant
of Christoph's objections in any case.

Again, my apologies to everyone involved - I'd fucked up, badly.  The only
question is how much PITA it will end up causing.  I can put those into
separate branches and/or mail directly; what ends up missing the window
will go into vfs.git#for-next as soon as -rc1 is out there (with the
possible exception of richacl stuff - I really want to hear from Christoph
and in more details than "it's all been said some iterations ago").

Linus, what would be your preference wrt that stuff?  Besides the "don't
ever do that kind of shit again", that is - that much is obvious.
Linus Torvalds Nov. 11, 2015, 4:36 a.m. UTC | #15
On Tue, Nov 10, 2015 at 7:30 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Linus, what would be your preference wrt that stuff?

If you can just create a branch with the stuff that is obvious and
clearly worth it (ie stuff that would basically be stable material
anyway), I'll just merge it.  Assuming it's all done in some
reasonable timeframe..

               Linus
Al Viro Nov. 11, 2015, 7:43 a.m. UTC | #16
On Tue, Nov 10, 2015 at 08:36:48PM -0800, Linus Torvalds wrote:
> On Tue, Nov 10, 2015 at 7:30 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Linus, what would be your preference wrt that stuff?
> 
> If you can just create a branch with the stuff that is obvious and
> clearly worth it (ie stuff that would basically be stable material
> anyway), I'll just merge it.  Assuming it's all done in some
> reasonable timeframe..

OK...  Right now I have #for-linus-stable and #for-linus-2 on top
of it, the latter adding several comment fixes, etc., the most serious
change among which is the removal of never used block_page_mkwrite().

dax_io fix isn't there, neither is overlayfs magic.h patch - both are
already in other trees.  I would like to get xattr series in as well,
but that's a separate pull request, if you'd accept them in this window in
the first place.  richacl stuff isn't there as well, and I think that one
is clear "leave it for 4.5" fodder.

Anyway, for
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus-2
(both -stable fodder and trivial patches)

Shortlog:
Daniel Borkmann (1):
      debugfs: fix refcount imbalance in start_creating

David Howells (1):
      FS-Cache: Handle a write to the page immediately beyond the EOF marker

Eric Biggers (2):
      fs/pipe.c: preserve alloc_file() error code
      fs/pipe.c: return error code rather than 0 in pipe_write()

Kinglong Mee (2):
      FS-Cache: Increase reference of parent after registering, netfs success
      FS-Cache: Don't override netfs's primary_index if registering failed

Maciej W. Rozycki (2):
      binfmt_elf: Don't clobber passed executable's file header
      binfmt_elf: Correct `arch_check_elf's description

NeilBrown (1):
      cachefiles: perform test on s_blocksize when opening cache file.

Randy Dunlap (2):
      fs: fix inode.c kernel-doc warning
      fs: fix writeback.c kernel-doc warnings

Ross Zwisler (2):
      vfs: remove unused wrapper block_page_mkwrite()
      vfs: remove stale comment in inode_operations

Tzvetelin Katchov (1):
      fs: 9p: cache.h: Add #define of include guard

Diffstat:
 fs/9p/cache.h               |  1 +
 fs/binfmt_elf.c             | 12 ++++----
 fs/buffer.c                 | 24 ++-------------
 fs/cachefiles/namei.c       |  2 ++
 fs/cachefiles/rdwr.c        | 73 +++++++++++++++++++++++----------------------
 fs/debugfs/inode.c          |  6 +++-
 fs/ext4/inode.c             |  4 +--
 fs/fs-writeback.c           |  3 +-
 fs/fscache/netfs.c          | 38 +++++++++++------------
 fs/fscache/page.c           |  2 +-
 fs/inode.c                  |  1 +
 fs/nilfs2/file.c            |  2 +-
 fs/pipe.c                   | 18 ++++++-----
 fs/xfs/xfs_file.c           |  2 +-
 include/linux/buffer_head.h |  2 --
 include/linux/fs.h          |  2 --
 16 files changed, 89 insertions(+), 103 deletions(-)

If you'd prefer to do that in two separate pulls - yell (or just pull
#for-linus-stable first, then this on top of it).  I've reordered
#for-next so that it continues #for-linus-2; the tree at its tip is the
same as yesterday.
Stephen Rothwell Nov. 11, 2015, 8:16 a.m. UTC | #17
Hi Al,

On Wed, 11 Nov 2015 07:43:30 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:
>
> dax_io fix isn't there, neither is overlayfs magic.h patch - both are
> already in other trees.  I would like to get xattr series in as well,
> but that's a separate pull request, if you'd accept them in this window in
> the first place.  richacl stuff isn't there as well, and I think that one
> is clear "leave it for 4.5" fodder.

So could you please remove the 4.5 stuff from your for-next branch
until after the merge window closes.

Also, I noticed these new warnings today:

fs/orangefs/xattr.c:509:9: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
  .get = pvfs2_xattr_get_trusted,
         ^
fs/orangefs/xattr.c:509:9: note: (near initialization for 'pvfs2_xattr_trusted_handler.get')
fs/orangefs/xattr.c:510:9: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
  .set = pvfs2_xattr_set_trusted,
         ^
fs/orangefs/xattr.c:510:9: note: (near initialization for 'pvfs2_xattr_trusted_handler.set')
fs/orangefs/xattr.c:520:9: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
  .get = pvfs2_xattr_get_default,
         ^
fs/orangefs/xattr.c:520:9: note: (near initialization for 'pvfs2_xattr_default_handler.get')
fs/orangefs/xattr.c:521:9: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
  .set = pvfs2_xattr_set_default,
         ^
fs/orangefs/xattr.c:521:9: note: (near initialization for 'pvfs2_xattr_default_handler.set')
Al Viro Nov. 11, 2015, 10:19 a.m. UTC | #18
On Wed, Nov 11, 2015 at 07:16:36PM +1100, Stephen Rothwell wrote:
> Hi Al,
> 
> On Wed, 11 Nov 2015 07:43:30 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:
> >
> > dax_io fix isn't there, neither is overlayfs magic.h patch - both are
> > already in other trees.  I would like to get xattr series in as well,
> > but that's a separate pull request, if you'd accept them in this window in
> > the first place.  richacl stuff isn't there as well, and I think that one
> > is clear "leave it for 4.5" fodder.
> 
> So could you please remove the 4.5 stuff from your for-next branch
> until after the merge window closes.

Done.

> Also, I noticed these new warnings today:
> 
> fs/orangefs/xattr.c:509:9: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
>   .get = pvfs2_xattr_get_trusted,
>          ^
> fs/orangefs/xattr.c:509:9: note: (near initialization for 'pvfs2_xattr_trusted_handler.get')
> fs/orangefs/xattr.c:510:9: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
>   .set = pvfs2_xattr_set_trusted,
>          ^
> fs/orangefs/xattr.c:510:9: note: (near initialization for 'pvfs2_xattr_trusted_handler.set')
> fs/orangefs/xattr.c:520:9: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
>   .get = pvfs2_xattr_get_default,
>          ^
> fs/orangefs/xattr.c:520:9: note: (near initialization for 'pvfs2_xattr_default_handler.get')
> fs/orangefs/xattr.c:521:9: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
>   .set = pvfs2_xattr_set_default,
>          ^
> fs/orangefs/xattr.c:521:9: note: (near initialization for 'pvfs2_xattr_default_handler.set')

That's "xattr handlers: Pass handler to operations instead of flags" fallout,
trivially adjusted (typical change is
-ext2_xattr_security_list(struct dentry *dentry, char *list, size_t list_size,
-                        const char *name, size_t name_len, int type)
+ext2_xattr_security_list(const struct xattr_handler *handler,
+                        struct dentry *dentry, char *list, size_t list_size,
+                        const char *name, size_t name_len)
with type replaced with handler->flags if it's used anywhere in the body;
AFAICS, none of orangefs instances use it at all, so it's just a matter of
changing the argument lists in pvfs2_xattr_[gs]et_{default,trusted},
adding const struct xattr_handler *handler in the beginning and removing
the last argument; callers in pvfs2_ioctl() should simply use
pvfs2_inode_[gs]etxattr()).
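
A rough user-space sketch of that conversion, modelled on the ->list()
example quoted above.  The struct xattr_handler here is a cut-down stand-in,
and demo_list_old()/demo_list_new() are made-up names for illustration, not
anything from the orangefs tree:

```c
#include <stddef.h>

/* Cut-down stand-in for the kernel's struct xattr_handler (assumption:
 * only the fields this sketch touches are present). */
struct xattr_handler {
	const char *prefix;
	int flags;
};

struct dentry;	/* opaque here, never dereferenced */

/* Old calling convention: the handler's flags arrive as a trailing 'type'. */
int demo_list_old(struct dentry *dentry, char *list, size_t list_size,
		  const char *name, size_t name_len, int type)
{
	(void)dentry; (void)list; (void)list_size;
	(void)name; (void)name_len;
	return type;
}

/* New calling convention: the handler comes first and the trailing 'type'
 * is gone; any use of 'type' in the body becomes handler->flags. */
int demo_list_new(const struct xattr_handler *handler,
		  struct dentry *dentry, char *list, size_t list_size,
		  const char *name, size_t name_len)
{
	(void)dentry; (void)list; (void)list_size;
	(void)name; (void)name_len;
	return handler->flags;
}
```

An instance that never looks at type only needs its argument list changed,
which is what the orangefs functions appear to need.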

Note, however, that orangefs in linux-next lacks a lot of fixes (see
vfs.git#orangefs-untested for some; AFAICS, those are missing from all
branches in orangefs git tree) and there are problems I don't know
how to fix, mostly due to the lack of documentation.  The last I've
heard from them was that they were putting such docs together; hopefully
once that gets done we'll be able to sort the rest of that thing out.
It'll be after -rc1, though.

So xattr conflicts are the least of the problems there; those are easy
to adjust for, there are more serious issues in the entire thing ;-/
BTW, while we are at it - pvfs2_listxattr() doesn't even validate
resp.listxattr.returned_count, so a bogus response from buggered
server will do really interesting things to the kernel.
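
To illustrate the kind of check that is missing, here is a minimal
user-space sketch.  The response layout, the limits and the function name
are all hypothetical (only the returned_count field name comes from the
code being discussed); the point is simply that nothing coming back from
the server should be trusted before it is range-checked:

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_MAX_LIST_ENTRIES 16	/* hypothetical protocol limit */
#define DEMO_MAX_KEY_BYTES    (DEMO_MAX_LIST_ENTRIES * 256)

/* Hypothetical shape of a listxattr response filled in by the server. */
struct demo_listxattr_resp {
	int32_t returned_count;	/* number of keys the server claims to return */
	int32_t keylen;		/* bytes of key[] the server claims to have used */
	char key[DEMO_MAX_KEY_BYTES];
};

/* Return 0 if the counts are within bounds, -EIO otherwise. */
int demo_check_listxattr_resp(const struct demo_listxattr_resp *resp)
{
	if (resp->returned_count < 0 ||
	    resp->returned_count > DEMO_MAX_LIST_ENTRIES)
		return -EIO;
	if (resp->keylen < 0 || resp->keylen > DEMO_MAX_KEY_BYTES)
		return -EIO;
	return 0;
}
```

A caller would bail out on a non-zero return before using either field as
a loop bound or a copy length.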

I'll cook the minimal fixup for API change after I get some sleep and
send it your way, unless somebody gets there first...
Stephen Rothwell Nov. 11, 2015, 10:28 a.m. UTC | #19
Hi Al,

On Wed, 11 Nov 2015 10:19:48 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:
>
> On Wed, Nov 11, 2015 at 07:16:36PM +1100, Stephen Rothwell wrote:
> > 
> > So could you please remove the 4.5 stuff from your for-next branch
> > until after the merge window closes.  
> 
> Done.

Thanks.

> > Also, I noticed these new warnings today:
> > 
> I'll cook the minimal fixup for API change after I get some sleep and
> send it your way, unless somebody gets there first...

Thanks again.
Mike Marshall Nov. 11, 2015, 4:25 p.m. UTC | #20
I'm the Orangefs guy...

If the orangefs warnings that people see because of what's in
linux-next are annoying, I could focus on quieting them down...

We've been focusing on code review and documentation ever
since our last big exchange with Al and Linus...

-Mike

On Wed, Nov 11, 2015 at 5:28 AM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> Hi Al,
>
> On Wed, 11 Nov 2015 10:19:48 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:
>>
>> On Wed, Nov 11, 2015 at 07:16:36PM +1100, Stephen Rothwell wrote:
>> >
>> > So could you please remove the 4.5 stuff from your for-next branch
>> > until after the merge window closes.
>>
>> Done.
>
> Thanks.
>
>> > Also, I noticed these new warnings today:
>> >
>> I'll cook the minimal fixup for API change after I get some sleep and
>> send it your way, unless somebody gets there first...
>
> Thanks again.
>
> --
> Cheers,
> Stephen Rothwell                    sfr@canb.auug.org.au
Al Viro Nov. 11, 2015, 4:36 p.m. UTC | #21
On Wed, Nov 11, 2015 at 11:25:17AM -0500, Mike Marshall wrote:
> I'm the Orangefs guy...
> 
> If the orangefs warnings that people see because of what's in
> linux-next are annoying, I could focus on quieting them down...

See the fixup just posted in this thread.

> We've been focusing on code review and documentation ever
> since our last big exchange with Al and Linus...

BTW, could you put the current state of the docs someplace public?
Mike Marshall Nov. 11, 2015, 4:56 p.m. UTC | #22
> BTW, could you put the current state of the docs someplace public?

The documentation will eventually end up in
Documentation/filesystems/orangefs.txt.

This part about the creation of the shared memory between userspace and
the kernel module seems complete and accurate to me so far. This "bufmap"
data structure is central to the protocol between userspace and the kernel
module. This describes the creation of the bufmap; details on how it is used
in exchanges are what I am working on now...

-----------------------------------------------------------------------------------------------------------

Orangefs is a user space filesystem and an associated kernel module.
We'll just refer to the user space part of Orangefs as "userspace"
from here on out...

The kernel module implements a pseudo device that userspace
can read from and write to. Userspace can also manipulate the
kernel module through the pseudo device with ioctl.

At startup userspace allocates two page-size-aligned (posix_memalign)
mlocked memory blocks: one is used for IO and one is used for readdir
operations. The IO block is 41943040 bytes and the readdir block is
4194304 bytes. Each block contains logical chunks, and a pointer to each
block is added to its own PVFS_dev_map_desc structure which also describes
its total size, as well as the size and number of the logical chunks.
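
As a rough illustration of that allocation step (not the actual userspace
code; the function name demo_alloc_locked_block is made up, and the real
client may order or check these calls differently), a page-aligned,
mlocked block could be obtained like this:

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate 'size' bytes aligned to the system page size and lock them
 * into RAM.  Returns NULL if either step fails (mlock can fail when
 * RLIMIT_MEMLOCK is too low for the requested size). */
void *demo_alloc_locked_block(size_t size)
{
	void *block = NULL;
	long page_size = sysconf(_SC_PAGESIZE);

	if (page_size <= 0)
		return NULL;
	if (posix_memalign(&block, (size_t)page_size, size) != 0)
		return NULL;
	if (mlock(block, size) != 0) {
		free(block);
		return NULL;
	}
	return block;
}
```

The caller is expected to munlock() and free() the block when done.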

A pointer to the IO block's PVFS_dev_map_desc structure is sent to a
mapping routine in the kernel module with an ioctl. The structure is
copied from user space to kernel space with copy_from_user and is used
to initialize the kernel module's "bufmap" (struct pvfs2_bufmap), which
then contains:

  * refcnt - a reference counter
  * desc_size - PVFS2_BUFMAP_DEFAULT_DESC_SIZE (4194304) the IO block's
    logical chunk size, which represents the filesystem's block size and
    is used for s_blocksize in super blocks.
  * desc_count - PVFS2_BUFMAP_DEFAULT_DESC_COUNT (10) the number of
    logical chunks in the IO block.
  * desc_shift - log2(desc_size), used for s_blocksize_bits in super blocks.
  * total_size - the total size of the IO block.
  * page_count - the number of 4096 byte pages in the IO block.
  * page_array - a pointer to page_count * (sizeof(struct page*)) bytes
    of kcalloced memory. This memory is used as an array of pointers
    to each of the pages in the IO block through a call to get_user_pages.
  * desc_array - a pointer to desc_count * (sizeof(struct pvfs_bufmap_desc))
    bytes of kcalloced memory. This memory is further initialized:

      user_desc is the kernel's copy of the IO block's PVFS_dev_map_desc
      structure. user_desc->ptr points to the IO block.

      pages_per_desc = bufmap->desc_size / PAGE_SIZE
      offset = 0

        bufmap->desc_array[0].page_array = &bufmap->page_array[offset]
        bufmap->desc_array[0].array_count = pages_per_desc = 1024
        bufmap->desc_array[0].uaddr = (user_desc->ptr) + (0 * 1024 * 4096)
        offset += 1024
                           .
                           .
                           .
        bufmap->desc_array[9].page_array = &bufmap->page_array[offset]
        bufmap->desc_array[9].array_count = pages_per_desc = 1024
        bufmap->desc_array[9].uaddr = (user_desc->ptr) +
                                               (9 * 1024 * 4096)
        offset += 1024

  * buffer_index_array - a desc_count sized array of ints, used to
    indicate which of the IO block's chunks are available to use.
  * buffer_index_lock - a spinlock to protect buffer_index_array during update.
  * readdir_index_array - a five (PVFS2_READDIR_DEFAULT_DESC_COUNT) element
    int array used to indicate which of the readdir block's chunks are
    available to use.
  * readdir_index_lock - a spinlock to protect readdir_index_array during
    update.
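
The arithmetic behind desc_array can be checked with a small user-space
sketch. The DEMO_* constants mirror the default values quoted above;
struct demo_desc and the function names are invented for illustration and
only record indices and offsets, not real page pointers:

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE   4096u
#define DEMO_DESC_SIZE   4194304u	/* PVFS2_BUFMAP_DEFAULT_DESC_SIZE per the text */
#define DEMO_DESC_COUNT  10u		/* PVFS2_BUFMAP_DEFAULT_DESC_COUNT per the text */

/* Illustrative stand-in for one desc_array element. */
struct demo_desc {
	size_t first_page;	/* index into the bufmap-wide page_array */
	size_t array_count;	/* number of pages covered by this chunk */
	size_t uaddr_offset;	/* byte offset of the chunk from user_desc->ptr */
};

/* log2 of a power of two, as used for desc_shift. */
unsigned int demo_log2(size_t v)
{
	unsigned int shift = 0;

	while (v > 1) {
		v >>= 1;
		shift++;
	}
	return shift;
}

/* Walk the IO block chunk by chunk, advancing the page offset each time. */
void demo_fill_descs(struct demo_desc descs[DEMO_DESC_COUNT])
{
	size_t pages_per_desc = DEMO_DESC_SIZE / DEMO_PAGE_SIZE;
	size_t offset = 0;

	for (size_t i = 0; i < DEMO_DESC_COUNT; i++) {
		descs[i].first_page = offset;
		descs[i].array_count = pages_per_desc;
		descs[i].uaddr_offset = i * pages_per_desc * DEMO_PAGE_SIZE;
		offset += pages_per_desc;
	}
}
```

With these defaults each chunk covers 1024 pages, the last chunk starts at
page 9216, and desc_shift comes out as 22.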

On Wed, Nov 11, 2015 at 11:36 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Wed, Nov 11, 2015 at 11:25:17AM -0500, Mike Marshall wrote:
>> I'm the Orangefs guy...
>>
>> If the orangefs warnings that people see because of what's in
>> linux-next are annoying, I could focus on quieting them down...
>
> See the fixup just posted in this thread.
>
>> We've been focusing on code review and documentation ever
>> since our last big exchange with Al and Linus...
>
> BTW, could you put the current state of the docs someplace public?

Patch

diff --git a/fs/dax.c b/fs/dax.c
index a86d3cc..7b653e9 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -169,8 +169,10 @@  static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 		else
 			len = iov_iter_zero(max - pos, iter);
 
-		if (!len)
+		if (!len) {
+			retval = -EFAULT;
 			break;
+		}
 
 		pos += len;
 		addr += len;