mbox series

[GIT,PULL] 9p update for 5.7

Message ID 20200406110702.GA13469@nautica (mailing list archive)
State New, archived
Headers show
Series [GIT,PULL] 9p update for 5.7 | expand

Pull-request

https://github.com/martinetd/linux tags/9p-for-5.7

Message

Dominique Martinet April 6, 2020, 11:07 a.m. UTC
Hi Linus,

Not much new, but a few patches for this cycle.

Thanks,
Dominique


The following changes since commit 16fbf79b0f83bc752cee8589279f1ebfe57b3b6e:

  Linux 5.6-rc7 (2020-03-22 18:31:56 -0700)

are available in the Git repository at:

  https://github.com/martinetd/linux tags/9p-for-5.7

for you to fetch changes up to 43657496e46672fe63bccc1fcfb5b68de6e1e2f4:

  net/9p: remove unused p9_req_t aux field (2020-03-27 09:29:57 +0000)

----------------------------------------------------------------
9p pull request for inclusion in 5.7

- Fix read with O_NONBLOCK to allow incomplete read and return
immediately
- Rest is just cleanup (indent, unused field in struct, extra semicolon)

----------------------------------------------------------------
Dominique Martinet (1):
      net/9p: remove unused p9_req_t aux field

Krzysztof Kozlowski (1):
      9p: Fix Kconfig indentation

Sergey Alirzaev (2):
      9pnet: allow making incomplete read requests
      9p: read only once on O_NONBLOCK

zhengbin (1):
      9p: Remove unneeded semicolon

 fs/9p/Kconfig           |  18 +++++++++---------
 fs/9p/vfs_file.c        |   5 ++++-
 fs/9p/vfs_inode.c       |   2 +-
 include/net/9p/client.h |   4 ++--
 net/9p/client.c         | 144 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------------------------------
 5 files changed, 94 insertions(+), 79 deletions(-)

Comments

Linus Torvalds April 6, 2020, 3:53 p.m. UTC | #1
On Mon, Apr 6, 2020 at 4:07 AM Dominique Martinet
<asmadeus@codewreck.org> wrote:
>
> - Fix read with O_NONBLOCK to allow incomplete read and return
> immediately

Hmm. This is kind of special semantics (normally a POSIX filesystem
ignores O_NONBLOCK), but I guess it makes sense for a network
filesystem.

It might be worth a bti more documentation/commenting because of the
special semantics. For example, since you don't have 'poll()',
O_NONBLOCK doesn't really mean "nonblocking", it means "stop earlier"
if I read that patch right. You can't just return -EAGAIN because
there's no way to then avoid busy looping..

                Linus
Dominique Martinet April 6, 2020, 4:40 p.m. UTC | #2
Linus Torvalds wrote on Mon, Apr 06, 2020:
> On Mon, Apr 6, 2020 at 4:07 AM Dominique Martinet
> <asmadeus@codewreck.org> wrote:
> > - Fix read with O_NONBLOCK to allow incomplete read and return
> > immediately
> 
> Hmm. This is kind of special semantics (normally a POSIX filesystem
> ignores O_NONBLOCK), but I guess it makes sense for a network
> filesystem.
> 
> It might be worth a bti more documentation/commenting because of the
> special semantics. For example, since you don't have 'poll()',
> O_NONBLOCK doesn't really mean "nonblocking", it means "stop earlier"
> if I read that patch right. You can't just return -EAGAIN because
> there's no way to then avoid busy looping..

Yes, I think you got this right.

Basically there is no way to tell if the server will return immediately
or not, so even with O_NONBLOCK the read() will still be 'blocking' if
the server decides to wait for something before sending a reply.

This patch will just make the read stop after a single round-trip with
the server instead of looping to fill the buffer if O_NONBLOCK is set.

The use-case here is stuff like reading from synthetic files (think fake
pipes) where data comes in like a pipe and one would want read to return
as soon as data is available.
Just thinking out loud it might be possible to make pipes go through the
server and somewhat work, but this might bring its own share of other
problems and existing programs would need to be changed (e.g. wmii's
synthetic filesystem exposes this kind of files as well as regular
files, which works fine for their userspace client (wmiir) but can't
really be used with a linux client)
(Added Sergey in Cc for opinion)



Anyway, I agree looking at O_NONBLOCK for that isn't obvious.
I agree with the usecase here and posix allows short reads regardless of
the flag so the behaviour is legal either way ; the filesystem is
allowed to return whenever it wants on a whim - let's just add some docs
as you suggest unless Sergey has something to add.

I will add a few lines on that in Documentation/filesystems/9p.txt and
send another pull request in a couple of days to give whoever wants to
review time to comment on wording.

Cheers,
pr-tracker-bot@kernel.org April 6, 2020, 4:45 p.m. UTC | #3
The pull request you sent on Mon, 6 Apr 2020 13:07:02 +0200:

> https://github.com/martinetd/linux tags/9p-for-5.7

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/e14679b62d84b8ab9136189fc069d389da43fe71

Thank you!
Matthew Wilcox (Oracle) April 6, 2020, 4:46 p.m. UTC | #4
On Mon, Apr 06, 2020 at 06:40:57PM +0200, Dominique Martinet wrote:
> Anyway, I agree looking at O_NONBLOCK for that isn't obvious.
> I agree with the usecase here and posix allows short reads regardless of
> the flag so the behaviour is legal either way ; the filesystem is
> allowed to return whenever it wants on a whim - let's just add some docs
> as you suggest unless Sergey has something to add.

Ahahahahhahahahahaha.

POSIX may well "allow" short reads, but userspace programmers basically
never check the return value from read().  Short reads aren't actually
allowed.  That's why signals are only allowed to interrupt syscalls if
they're fatal (and the application will never see the returned value
because it's already dead).
Dominique Martinet April 6, 2020, 4:55 p.m. UTC | #5
Matthew Wilcox wrote on Mon, Apr 06, 2020:
> POSIX may well "allow" short reads, but userspace programmers basically
> never check the return value from read().  Short reads aren't actually
> allowed.  That's why signals are only allowed to interrupt syscalls if
> they're fatal (and the application will never see the returned value
> because it's already dead).

I've seen tons of programs not check read return value yes but these
also have no idea what O_NONBLOCK is so I'm not sure how realistic a
use-case that is?

The alternative I see would be making pipes go through the server as I
said, but that would probably mean another mount option for this; pipes
work as local pipes like they do in nfs currently.
Linus Torvalds April 6, 2020, 5:04 p.m. UTC | #6
On Mon, Apr 6, 2020 at 9:46 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> POSIX may well "allow" short reads, but userspace programmers basically
> never check the return value from read().  Short reads aren't actually
> allowed.  That's why signals are only allowed to interrupt syscalls if
> they're fatal (and the application will never see the returned value
> because it's already dead).

Well, that's true for some applications.

But look at anybody who ever worked more with NFS mounts, and they got
used to having the 'intr' mount flag set and incomplete reads and
-EAGAIN as a result.

So a lot of normal applications are actually used to partial reads
even from file reads.

Are there apps that react badly? I'm sure - but they also wouldn't
have O_NONBLOCK set on a regular file. The only reason to set
O_NONBLOCK is because you think the fd might be a pipe or something,
and you _are_ ready to get partial reads.

So the 9p behavior certainly isn't outrageously out of line for a
network filesystem. In fact, because of O_NONBLOCK rather than a mount
option, I think it's a lot safer than a fairly standard NFS option.

               Linus
Matthew Wilcox (Oracle) April 6, 2020, 5:39 p.m. UTC | #7
On Mon, Apr 06, 2020 at 10:04:11AM -0700, Linus Torvalds wrote:
> On Mon, Apr 6, 2020 at 9:46 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > POSIX may well "allow" short reads, but userspace programmers basically
> > never check the return value from read().  Short reads aren't actually
> > allowed.  That's why signals are only allowed to interrupt syscalls if
> > they're fatal (and the application will never see the returned value
> > because it's already dead).
> 
> Well, that's true for some applications.
> 
> But look at anybody who ever worked more with NFS mounts, and they got
> used to having the 'intr' mount flag set and incomplete reads and
> -EAGAIN as a result.

That's why you had me implement TASK_KILLABLE ;-)

> Are there apps that react badly? I'm sure - but they also wouldn't
> have O_NONBLOCK set on a regular file. The only reason to set
> O_NONBLOCK is because you think the fd might be a pipe or something,
> and you _are_ ready to get partial reads.
> 
> So the 9p behavior certainly isn't outrageously out of line for a
> network filesystem. In fact, because of O_NONBLOCK rather than a mount
> option, I think it's a lot safer than a fairly standard NFS option.

The NFS option has been a no-op for over a decade ;-)  I agree with you
that O_NONBLOCK is a good indicator the application is willing to handle
short reads (or indeed writes).
Linus Torvalds April 6, 2020, 5:46 p.m. UTC | #8
On Mon, Apr 6, 2020 at 10:40 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > But look at anybody who ever worked more with NFS mounts, and they got
> > used to having the 'intr' mount flag set and incomplete reads and
> > -EAGAIN as a result.
>
> That's why you had me implement TASK_KILLABLE ;-)

Oh, absolutely. We can *NOT* do this in general. Applications _will_
break if you end up just randomly breaking POSIX behavior.

But network filesystems are almost never fully POSIX anyway. And yes,
they do break some apps.  'intr' may not be a thing any more, but
other differences wrt various atomicity guarantees (or file locking)
etc still exist.

So the whole "network filesystems do odd things in corner cases" isn't
exactly unusual.

I think the O_NONBLOCK difference is one of the more benign ones.

I just think it should be documented more.

             Linus
Dominique Martinet April 6, 2020, 6:42 p.m. UTC | #9
Linus Torvalds wrote on Mon, Apr 06, 2020:
> I think the O_NONBLOCK difference is one of the more benign ones.
> 
> I just think it should be documented more.

Hadn't explicitely added you in Ccs, but I did post something to
document it:
http://lkml.kernel.org/r/1586193572-1375-1-git-send-email-asmadeus@codewreck.org

(small enough to just quote:)
--------------
diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt
index fec7144e817c..3fb780ffdf23 100644
--- a/Documentation/filesystems/9p.txt
+++ b/Documentation/filesystems/9p.txt
@@ -133,6 +133,16 @@ OPTIONS
 		cache tags for existing cache sessions can be listed at
 		/sys/fs/9p/caches. (applies only to cache=fscache)
 
+BEHAVIOR
+========
+
+This section aims at describing 9p 'quirks' that can be different
+from a local filesystem behaviors.
+
+ - Setting O_NONBLOCK on a file will make client reads return as early
+   as the server returns some data instead of trying to fill the read
+   buffer with the requested amount of bytes or end of file is reached.
+
 RESOURCES
 =========

--------------

Will submit the pull request again with that in a couple of days unless
anything bad happens.

Thanks,
L29Ah April 7, 2020, 2:16 a.m. UTC | #10
On Mon, Apr 06, 2020 at 06:40:57PM +0200, Dominique Martinet wrote:
> The use-case here is stuff like reading from synthetic files (think fake
> pipes) where data comes in like a pipe and one would want read to return
> as soon as data is available.
> Just thinking out loud it might be possible to make pipes go through the
> server and somewhat work, but this might bring its own share of other
> problems and existing programs would need to be changed (e.g. wmii's
> synthetic filesystem exposes this kind of files as well as regular
> files, which works fine for their userspace client (wmiir) but can't
> really be used with a linux client)

> Anyway, I agree looking at O_NONBLOCK for that isn't obvious.
> I agree with the usecase here and posix allows short reads regardless of
> the flag so the behaviour is legal either way ; the filesystem is
> allowed to return whenever it wants on a whim - let's just add some docs
> as you suggest unless Sergey has something to add.

In fact i would prefer disabling the full reads unconditionally, but AFAIR some userspace programs might interpret a short read as EOF (and also would need to check the logic that motivated the kernel-side looping).
Dominique Martinet April 7, 2020, 6:31 a.m. UTC | #11
L29Ah wrote on Tue, Apr 07, 2020:
> In fact i would prefer disabling the full reads unconditionally, but
> AFAIR some userspace programs might interpret a short read as EOF (and
> also would need to check the logic that motivated the kernel-side
> looping).

Willy is correct there we can't just do that, way too many applications
would break.

I think O_NONBLOCK on regular files is a good compromise, let's not go
overboard :)
Linus Torvalds April 7, 2020, 5:59 p.m. UTC | #12
On Mon, Apr 6, 2020 at 7:16 PM L29Ah <l29ah@cock.li> wrote:
>
> In fact i would prefer disabling the full reads unconditionally, but AFAIR some userspace programs might interpret a short read as EOF (and also would need to check the logic that motivated the kernel-side looping).

Oh, it's even worse than "might interpret a short read as EOF".

Lots of ad-hoc small tools will basically do something like

     fd = open(name, O_RDONLY);
     fstat(fd, &st);
     buf = malloc(st.st_size);
     read(fd, buf, st.st_size);

and be done with it. Obviously they may have some error handling (ie
imagine the above being written with proper tests for buf beign NULL
and 'fstat()' returning an error), but if they check the return value
of "read()" at all, it might be just to verify that it matches
st.st_size.

I've written stuff like that myself.

Sure, the "real" programs I write would have loops with EAGAIN and
partial reads, and maybe I'd have a helper function called "xread()"
that does that.  And most major applications will do things like that,
exactly because they've seen years of development, they're trying to
be portable, and they might even have hit other network filesystems
that do partial reads or return EAGAIN - or they might have more
complex functionality anyway which allows you to pipe things in from a
buffer etc.

But the above kind of "assume read() gets the whole thing" is not
unusual for quick hacks.

After all, it's a _valid_ assumption for a proper POSIX filesystem,
although it obviously _also_ assumes that nobody else is writing to
that file at the same time.

And some of those quick hacks may end up existing for years in major
code-bases, who knows..

[ Honesty in advertising: the Linux VFS layer itself says "screw
POSIX" for some things.

  Particularly, if somebody tries to do a read larger than 2GB in
size, the VFS layer will just say "POSIX is garbage in this situation,
we _will_ truncate this read".

  So if you deal with huge files, you _have_ to do the proper "loop
until EOF" even for regular files, and POSIX be damned.

  The kernel refuses to do crazy things, and no amount of standard
paperwork matters. ]

But basically honoring full reads for any _reasonable_ situation is
pretty much required for a lot of reasons. Yes, lots of apps will deal
gracefully with partial reads - maybe even most. But "lots" is not
"all".

             Linus