diff mbox series

[v3,02/11] 9pfs: require msize >= 4096

Message ID 49ff399635ccfd21858b15417a398df362ff0b90.1578957500.git.qemu_oss@crudebyte.com (mailing list archive)
State New, archived
Headers show
Series 9pfs: readdir optimization | expand

Commit Message

Christian Schoenebeck Jan. 13, 2020, 10:21 p.m. UTC
A client establishes a session by sending a Tversion request along with
a 'msize' parameter which client uses to suggest server a maximum
message size ever to be used for communication (for both requests and
replies) between client and server during that session. If client
suggests a 'msize' smaller than 4096 then deny session by server
immediately with an error response (Rlerror for "9P2000.L" clients or
Rerror for "9P2000.u" clients) instead of replying with Rversion.

Introduction of a minimum msize is required to prevent issues in
responses to numerous individual request types. For instance with a
msize of < P9_IOHDRSZ no useful operations would be possible at all.
Furthermore there are some responses which are not allowed by the 9p
protocol to be truncated like e.g. Rreadlink which may yield up to
a size of PATH_MAX which is usually 4096. Hence this size was chosen
as min. msize for server, which is already the minimum msize of the
Linux kernel's 9pfs client. By forcing a min. msize already at
session start (when handling Tversion) we don't have to check for a
minimum msize on a per request type basis later on during session,
which would be much harder and more error prone to maintain.

This is a user visible change which should be documented as such
(i.e. in public QEMU release changelog).

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
---
 hw/9pfs/9p.c | 12 ++++++++++++
 hw/9pfs/9p.h | 11 +++++++++++
 2 files changed, 23 insertions(+)

Comments

Greg Kurz Jan. 16, 2020, 1:15 p.m. UTC | #1
On Mon, 13 Jan 2020 23:21:04 +0100
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> A client establishes a session by sending a Tversion request along with
> a 'msize' parameter which client uses to suggest server a maximum
> message size ever to be used for communication (for both requests and
> replies) between client and server during that session. If client
> suggests a 'msize' smaller than 4096 then deny session by server
> immediately with an error response (Rlerror for "9P2000.L" clients or
> Rerror for "9P2000.u" clients) instead of replying with Rversion.
> 
> Introduction of a minimum msize is required to prevent issues in
> responses to numerous individual request types. For instance with a
> msize of < P9_IOHDRSZ no useful operations would be possible at all.

P9_IOHDRSZ is really specific to write/read operations. 

/*
 * ample room for Twrite/Rread header
 * size[4] Tread/Twrite tag[2] fid[4] offset[8] count[4]
 */
#define P9_IOHDRSZ 24

As you see P9_IOHDRSZ is approximately the size of a Twrite header.
Its primary use is to inform the client about the 'count' to use for
Twrite/Tread messages (see get_iounit()).

Not sure it helps to mention P9_IOHDRSZ since we're going to choose
something much greater. I'd personally drop this sentence.

> Furthermore there are some responses which are not allowed by the 9p
> protocol to be truncated like e.g. Rreadlink which may yield up to

No message may be truncated in any way actually. The spec just allows
an exception with the string part of Rerror.

Maybe just mention that and say we choose 4096 to be able to send
big Rreadlink messages.

> a size of PATH_MAX which is usually 4096. Hence this size was chosen
> as min. msize for server, which is already the minimum msize of the
> Linux kernel's 9pfs client. By forcing a min. msize already at
> session start (when handling Tversion) we don't have to check for a
> minimum msize on a per request type basis later on during session,
> which would be much harder and more error prone to maintain.
> 
> This is a user visible change which should be documented as such
> (i.e. in public QEMU release changelog).
> 

This last sentence isn't informative in the commit message. This
kind of indication should be added after the --- below.

> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> ---

LGTM

With an updated changelog,

Reviewed-by: Greg Kurz <groug@kaod.org>

>  hw/9pfs/9p.c | 12 ++++++++++++
>  hw/9pfs/9p.h | 11 +++++++++++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index 520177f40c..a5fbe821d4 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -1363,8 +1363,20 @@ static void coroutine_fn v9fs_version(void *opaque)
>          s->proto_version = V9FS_PROTO_2000L;
>      } else {
>          v9fs_string_sprintf(&version, "unknown");
> +        /* skip min. msize check, reporting invalid version has priority */
> +        goto marshal;
>      }
>  
> +    if (s->msize < P9_MIN_MSIZE) {
> +        err = -EMSGSIZE;
> +        error_report(
> +            "9pfs: Client requested msize < minimum msize ("
> +            stringify(P9_MIN_MSIZE) ") supported by this server."
> +        );
> +        goto out;
> +    }
> +
> +marshal:
>      err = pdu_marshal(pdu, offset, "ds", s->msize, &version);
>      if (err < 0) {
>          goto out;
> diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h
> index 3904f82901..6fffe44f5a 100644
> --- a/hw/9pfs/9p.h
> +++ b/hw/9pfs/9p.h
> @@ -100,6 +100,17 @@ typedef enum P9ProtoVersion {
>      V9FS_PROTO_2000L = 0x02,
>  } P9ProtoVersion;
>  
> +/**
> + * @brief Minimum message size supported by this 9pfs server.
> + *
> + * A client establishes a session by sending a Tversion request along with a
> + * 'msize' parameter which suggests the server a maximum message size ever to be
> + * used for communication (for both requests and replies) between client and
> + * server during that session. If client suggests a 'msize' smaller than this
> + * value then session is denied by server with an error response.
> + */
> +#define P9_MIN_MSIZE    4096
> +
>  #define P9_NOTAG    UINT16_MAX
>  #define P9_NOFID    UINT32_MAX
>  #define P9_MAXWELEM 16
Christian Schoenebeck Jan. 16, 2020, 4:16 p.m. UTC | #2
On Donnerstag, 16. Januar 2020 14:15:03 CET Greg Kurz wrote:
> On Mon, 13 Jan 2020 23:21:04 +0100
> 
> Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > A client establishes a session by sending a Tversion request along with
> > a 'msize' parameter which client uses to suggest server a maximum
> > message size ever to be used for communication (for both requests and
> > replies) between client and server during that session. If client
> > suggests a 'msize' smaller than 4096 then deny session by server
> > immediately with an error response (Rlerror for "9P2000.L" clients or
> > Rerror for "9P2000.u" clients) instead of replying with Rversion.
> > 
> > Introduction of a minimum msize is required to prevent issues in
> > responses to numerous individual request types. For instance with a
> > msize of < P9_IOHDRSZ no useful operations would be possible at all.
> 
> P9_IOHDRSZ is really specific to write/read operations.
> 
> /*
>  * ample room for Twrite/Rread header
>  * size[4] Tread/Twrite tag[2] fid[4] offset[8] count[4]
>  */
> #define P9_IOHDRSZ 24
> 
> As you see P9_IOHDRSZ is approximately the size of a Twrite header.
> Its primary use is to inform the client about the 'count' to use for
> Twrite/Tread messages (see get_iounit()).
> 
> Not sure it helps to mention P9_IOHDRSZ since we're going to choose
> something much greater. I'd personally drop this sentence.

The point here was that there must be some kind of absolute minimum msize, 
even though the protocol specs officially don't mandate one. And if a client 
choses a msize < P9_IOHDRSZ, what useful actions shall it be able to do? 
That's why I mentioned P9_IOHDRSZ just in case somebody might come up later 
asking to lower that minimum msize value for whatever reason.

But np, IMO this sentence makes the fundamental requirement for this patch 
more clear, but I have no problem dropping this sentence.

> > Furthermore there are some responses which are not allowed by the 9p
> > protocol to be truncated like e.g. Rreadlink which may yield up to
> 
> No message may be truncated in any way actually. The spec just allows
> an exception with the string part of Rerror.

Rreaddir, Rread, Twrite, Rerror can be truncated. So I suggest I'll just 
change that to s/some/most/ semantically.

> Maybe just mention that and say we choose 4096 to be able to send
> big Rreadlink messages.

That's not the point for this patch. The main purpose is to avoid having to 
maintain individual error prone minimum msize checks all over the code base.
If we would allow very small msizes (much smaller than 4096), then we would 
need to add numerous error handlers all over the code base, each one checking 
for different values (depending on the specific message type).

> > a size of PATH_MAX which is usually 4096. Hence this size was chosen
> > as min. msize for server, which is already the minimum msize of the
> > Linux kernel's 9pfs client. By forcing a min. msize already at
> > session start (when handling Tversion) we don't have to check for a
> > minimum msize on a per request type basis later on during session,
> > which would be much harder and more error prone to maintain.
> > 
> > This is a user visible change which should be documented as such
> > (i.e. in public QEMU release changelog).
> 
> This last sentence isn't informative in the commit message. This
> kind of indication should be added after the --- below.
> 
> > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> > ---

np

> 
> LGTM
> 
> With an updated changelog,
> 
> Reviewed-by: Greg Kurz <groug@kaod.org>
> 
> >  hw/9pfs/9p.c | 12 ++++++++++++
> >  hw/9pfs/9p.h | 11 +++++++++++
> >  2 files changed, 23 insertions(+)
> > 
> > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > index 520177f40c..a5fbe821d4 100644
> > --- a/hw/9pfs/9p.c
> > +++ b/hw/9pfs/9p.c
> > @@ -1363,8 +1363,20 @@ static void coroutine_fn v9fs_version(void *opaque)
> > 
> >          s->proto_version = V9FS_PROTO_2000L;
> >      
> >      } else {
> >      
> >          v9fs_string_sprintf(&version, "unknown");
> > 
> > +        /* skip min. msize check, reporting invalid version has priority
> > */ +        goto marshal;
> > 
> >      }
> > 
> > +    if (s->msize < P9_MIN_MSIZE) {
> > +        err = -EMSGSIZE;
> > +        error_report(
> > +            "9pfs: Client requested msize < minimum msize ("
> > +            stringify(P9_MIN_MSIZE) ") supported by this server."
> > +        );
> > +        goto out;
> > +    }
> > +
> > 
> > +marshal:
> >      err = pdu_marshal(pdu, offset, "ds", s->msize, &version);
> >      if (err < 0) {
> >      
> >          goto out;
> > 
> > diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h
> > index 3904f82901..6fffe44f5a 100644
> > --- a/hw/9pfs/9p.h
> > +++ b/hw/9pfs/9p.h
> > @@ -100,6 +100,17 @@ typedef enum P9ProtoVersion {
> > 
> >      V9FS_PROTO_2000L = 0x02,
> >  
> >  } P9ProtoVersion;
> > 
> > +/**
> > + * @brief Minimum message size supported by this 9pfs server.
> > + *
> > + * A client establishes a session by sending a Tversion request along
> > with a + * 'msize' parameter which suggests the server a maximum message
> > size ever to be + * used for communication (for both requests and
> > replies) between client and + * server during that session. If client
> > suggests a 'msize' smaller than this + * value then session is denied by
> > server with an error response. + */
> > +#define P9_MIN_MSIZE    4096
> > +
> > 
> >  #define P9_NOTAG    UINT16_MAX
> >  #define P9_NOFID    UINT32_MAX
> >  #define P9_MAXWELEM 16

Best regards,
Christian Schoenebeck
Greg Kurz Jan. 16, 2020, 6:07 p.m. UTC | #3
On Thu, 16 Jan 2020 17:16:07 +0100
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> On Donnerstag, 16. Januar 2020 14:15:03 CET Greg Kurz wrote:
> > On Mon, 13 Jan 2020 23:21:04 +0100
> > 
> > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > > A client establishes a session by sending a Tversion request along with
> > > a 'msize' parameter which client uses to suggest server a maximum
> > > message size ever to be used for communication (for both requests and
> > > replies) between client and server during that session. If client
> > > suggests a 'msize' smaller than 4096 then deny session by server
> > > immediately with an error response (Rlerror for "9P2000.L" clients or
> > > Rerror for "9P2000.u" clients) instead of replying with Rversion.
> > > 
> > > Introduction of a minimum msize is required to prevent issues in
> > > responses to numerous individual request types. For instance with a
> > > msize of < P9_IOHDRSZ no useful operations would be possible at all.
> > 
> > P9_IOHDRSZ is really specific to write/read operations.
> > 
> > /*
> >  * ample room for Twrite/Rread header
> >  * size[4] Tread/Twrite tag[2] fid[4] offset[8] count[4]
> >  */
> > #define P9_IOHDRSZ 24
> > 
> > As you see P9_IOHDRSZ is approximately the size of a Twrite header.
> > Its primary use is to inform the client about the 'count' to use for
> > Twrite/Tread messages (see get_iounit()).
> > 
> > Not sure it helps to mention P9_IOHDRSZ since we're going to choose
> > something much greater. I'd personally drop this sentence.
> 
> The point here was that there must be some kind of absolute minimum msize, 

Then the absolute minimum size is 7, the size of the header part that is
common to all messages:

size[4] Message tag[2]

> even though the protocol specs officially don't mandate one. And if a client 
> choses a msize < P9_IOHDRSZ, what useful actions shall it be able to do? 

I haven't checked but I guess some messages can fit in 24 bytes.

> That's why I mentioned P9_IOHDRSZ just in case somebody might come up later 
> asking to lower that minimum msize value for whatever reason.
> 

That's precisely what I want to avoid. The semantic of P9_IOHDRSZ is
that it is the header size of a Twrite message and as such it is used
to compute the 'iounit' which the guest later uses to compute a
suitable 'count' for Tread/Twrite messages only. It shouldn't be
involved in msize considerations IMHO.

> But np, IMO this sentence makes the fundamental requirement for this patch 
> more clear, but I have no problem dropping this sentence.
> 

Then you can mention 7 has the true minimal size.

> > > Furthermore there are some responses which are not allowed by the 9p
> > > protocol to be truncated like e.g. Rreadlink which may yield up to
> > 
> > No message may be truncated in any way actually. The spec just allows
> > an exception with the string part of Rerror.
> 
> Rreaddir, Rread, Twrite, Rerror can be truncated. So I suggest I'll just 
> change that to s/some/most/ semantically.
> 

I mean truncate with loss of data. Rreaddir can return less than 'count'
but it won't truncate an individual entry. It rewinds to the previous
one and returns its offset for the next Treaddir. Same goes with Rread
and Twrite, we always return a 'count' that can be used by the client
to continue reading or writing.

Rerror is really the only message where we can deliberately drop
data (but we actually don't do that).

> > Maybe just mention that and say we choose 4096 to be able to send
> > big Rreadlink messages.
> 
> That's not the point for this patch. The main purpose is to avoid having to 
> maintain individual error prone minimum msize checks all over the code base.
> If we would allow very small msizes (much smaller than 4096), then we would 
> need to add numerous error handlers all over the code base, each one checking 
> for different values (depending on the specific message type).
> 

I'm not sure what you mean by 'minimum msize checks all over the code base'...
Please provide an example.

> > > a size of PATH_MAX which is usually 4096. Hence this size was chosen
> > > as min. msize for server, which is already the minimum msize of the
> > > Linux kernel's 9pfs client. By forcing a min. msize already at
> > > session start (when handling Tversion) we don't have to check for a
> > > minimum msize on a per request type basis later on during session,
> > > which would be much harder and more error prone to maintain.
> > > 
> > > This is a user visible change which should be documented as such
> > > (i.e. in public QEMU release changelog).
> > 
> > This last sentence isn't informative in the commit message. This
> > kind of indication should be added after the --- below.
> > 
> > > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> > > ---
> 
> np
> 
> > 
> > LGTM
> > 
> > With an updated changelog,
> > 
> > Reviewed-by: Greg Kurz <groug@kaod.org>
> > 
> > >  hw/9pfs/9p.c | 12 ++++++++++++
> > >  hw/9pfs/9p.h | 11 +++++++++++
> > >  2 files changed, 23 insertions(+)
> > > 
> > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > > index 520177f40c..a5fbe821d4 100644
> > > --- a/hw/9pfs/9p.c
> > > +++ b/hw/9pfs/9p.c
> > > @@ -1363,8 +1363,20 @@ static void coroutine_fn v9fs_version(void *opaque)
> > > 
> > >          s->proto_version = V9FS_PROTO_2000L;
> > >      
> > >      } else {
> > >      
> > >          v9fs_string_sprintf(&version, "unknown");
> > > 
> > > +        /* skip min. msize check, reporting invalid version has priority
> > > */ +        goto marshal;
> > > 
> > >      }
> > > 
> > > +    if (s->msize < P9_MIN_MSIZE) {
> > > +        err = -EMSGSIZE;
> > > +        error_report(
> > > +            "9pfs: Client requested msize < minimum msize ("
> > > +            stringify(P9_MIN_MSIZE) ") supported by this server."
> > > +        );
> > > +        goto out;
> > > +    }
> > > +
> > > 
> > > +marshal:
> > >      err = pdu_marshal(pdu, offset, "ds", s->msize, &version);
> > >      if (err < 0) {
> > >      
> > >          goto out;
> > > 
> > > diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h
> > > index 3904f82901..6fffe44f5a 100644
> > > --- a/hw/9pfs/9p.h
> > > +++ b/hw/9pfs/9p.h
> > > @@ -100,6 +100,17 @@ typedef enum P9ProtoVersion {
> > > 
> > >      V9FS_PROTO_2000L = 0x02,
> > >  
> > >  } P9ProtoVersion;
> > > 
> > > +/**
> > > + * @brief Minimum message size supported by this 9pfs server.
> > > + *
> > > + * A client establishes a session by sending a Tversion request along
> > > with a + * 'msize' parameter which suggests the server a maximum message
> > > size ever to be + * used for communication (for both requests and
> > > replies) between client and + * server during that session. If client
> > > suggests a 'msize' smaller than this + * value then session is denied by
> > > server with an error response. + */
> > > +#define P9_MIN_MSIZE    4096
> > > +
> > > 
> > >  #define P9_NOTAG    UINT16_MAX
> > >  #define P9_NOFID    UINT32_MAX
> > >  #define P9_MAXWELEM 16
> 
> Best regards,
> Christian Schoenebeck
> 
>
Christian Schoenebeck Jan. 16, 2020, 9:39 p.m. UTC | #4
On Donnerstag, 16. Januar 2020 19:07:48 CET Greg Kurz wrote:
> > The point here was that there must be some kind of absolute minimum msize,
> 
> Then the absolute minimum size is 7, the size of the header part that is
> common to all messages:
> 
> size[4] Message tag[2]
> 
> > even though the protocol specs officially don't mandate one. And if a
> > client choses a msize < P9_IOHDRSZ, what useful actions shall it be able
> > to do?
> I haven't checked but I guess some messages can fit in 24 bytes.
> 
> > That's why I mentioned P9_IOHDRSZ just in case somebody might come up
> > later
> > asking to lower that minimum msize value for whatever reason.
> 
> That's precisely what I want to avoid. The semantic of P9_IOHDRSZ is
> that it is the header size of a Twrite message and as such it is used
> to compute the 'iounit' which the guest later uses to compute a
> suitable 'count' for Tread/Twrite messages only. It shouldn't be
> involved in msize considerations IMHO.
> 
> > But np, IMO this sentence makes the fundamental requirement for this patch
> > more clear, but I have no problem dropping this sentence.
> 
> Then you can mention 7 has the true minimal size.

Ok, I'll drop P9_IOHDRSZ completely and mention literal 7 as absolute 
theoretical minimum instead.

> > > > Furthermore there are some responses which are not allowed by the 9p
> > > > protocol to be truncated like e.g. Rreadlink which may yield up to
> > > 
> > > No message may be truncated in any way actually. The spec just allows
> > > an exception with the string part of Rerror.
> > 
> > Rreaddir, Rread, Twrite, Rerror can be truncated. So I suggest I'll just
> > change that to s/some/most/ semantically.
> 
> I mean truncate with loss of data. Rreaddir can return less than 'count'
> but it won't truncate an individual entry. It rewinds to the previous
> one and returns its offset for the next Treaddir. Same goes with Rread
> and Twrite, we always return a 'count' that can be used by the client
> to continue reading or writing.

Individual entries are not truncated, correct, but data loss: Example, you 
have this directory and say server's fs would deliver entries by readdir() in 
this order (from top down):

	.
	..
	a
	1234567890
	b
	c
	d

and say 37 >= msize < 45, then client would receive 3 entries ".", "..", "a" 
and that's it.

> Rerror is really the only message where we can deliberately drop
> data (but we actually don't do that).
> 
> > > Maybe just mention that and say we choose 4096 to be able to send
> > > big Rreadlink messages.
> > 
> > That's not the point for this patch. The main purpose is to avoid having
> > to
> > maintain individual error prone minimum msize checks all over the code
> > base. If we would allow very small msizes (much smaller than 4096), then
> > we would need to add numerous error handlers all over the code base, each
> > one checking for different values (depending on the specific message
> > type).
> 
> I'm not sure what you mean by 'minimum msize checks all over the code
> base'... Please provide an example.

1. Like done in this patch series: Treaddir has to limit client's 'count'
   parameter according to msize (by subtracting with msize).

2. get_iounit() does this:

		iounit = stbuf.f_bsize;
		iounit *= (s->msize - P9_IOHDRSZ) / stbuf.f_bsize;

   without checking anywhere for a potential negative outcome
   (i.e. when msize < P9_IOHDRSZ)

3. Example of directory entry name length with Rreaddir above.

Individual issues that can easily be overlooked but also easily avoided by not 
allowing small msizes in the first place.

Best regards,
Christian Schoenebeck
Greg Kurz Jan. 17, 2020, 10:24 a.m. UTC | #5
On Thu, 16 Jan 2020 22:39:19 +0100
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> On Donnerstag, 16. Januar 2020 19:07:48 CET Greg Kurz wrote:
> > > The point here was that there must be some kind of absolute minimum msize,
> > 
> > Then the absolute minimum size is 7, the size of the header part that is
> > common to all messages:
> > 
> > size[4] Message tag[2]
> > 
> > > even though the protocol specs officially don't mandate one. And if a
> > > client choses a msize < P9_IOHDRSZ, what useful actions shall it be able
> > > to do?
> > I haven't checked but I guess some messages can fit in 24 bytes.
> > 
> > > That's why I mentioned P9_IOHDRSZ just in case somebody might come up
> > > later
> > > asking to lower that minimum msize value for whatever reason.
> > 
> > That's precisely what I want to avoid. The semantic of P9_IOHDRSZ is
> > that it is the header size of a Twrite message and as such it is used
> > to compute the 'iounit' which the guest later uses to compute a
> > suitable 'count' for Tread/Twrite messages only. It shouldn't be
> > involved in msize considerations IMHO.
> > 
> > > But np, IMO this sentence makes the fundamental requirement for this patch
> > > more clear, but I have no problem dropping this sentence.
> > 
> > Then you can mention 7 has the true minimal size.
> 
> Ok, I'll drop P9_IOHDRSZ completely and mention literal 7 as absolute 
> theoretical minimum instead.
> 
> > > > > Furthermore there are some responses which are not allowed by the 9p
> > > > > protocol to be truncated like e.g. Rreadlink which may yield up to
> > > > 
> > > > No message may be truncated in any way actually. The spec just allows
> > > > an exception with the string part of Rerror.
> > > 
> > > Rreaddir, Rread, Twrite, Rerror can be truncated. So I suggest I'll just
> > > change that to s/some/most/ semantically.
> > 
> > I mean truncate with loss of data. Rreaddir can return less than 'count'
> > but it won't truncate an individual entry. It rewinds to the previous
> > one and returns its offset for the next Treaddir. Same goes with Rread
> > and Twrite, we always return a 'count' that can be used by the client
> > to continue reading or writing.
> 
> Individual entries are not truncated, correct, but data loss: Example, you 
> have this directory and say server's fs would deliver entries by readdir() in 
> this order (from top down):
> 
> 	.
> 	..
> 	a
> 	1234567890
> 	b
> 	c
> 	d
> 
> and say 37 >= msize < 45, then client would receive 3 entries ".", "..", "a" 
> and that's it.
> 

Yes msize < 45 isn't enough to send the Rreaddir for "1234567890"
and since it is forbidden to truncate, we bail out... but we
did not send a truncated message still.

> > Rerror is really the only message where we can deliberately drop
> > data (but we actually don't do that).
> > 
> > > > Maybe just mention that and say we choose 4096 to be able to send
> > > > big Rreadlink messages.
> > > 
> > > That's not the point for this patch. The main purpose is to avoid having
> > > to
> > > maintain individual error prone minimum msize checks all over the code
> > > base. If we would allow very small msizes (much smaller than 4096), then
> > > we would need to add numerous error handlers all over the code base, each
> > > one checking for different values (depending on the specific message
> > > type).
> > 
> > I'm not sure what you mean by 'minimum msize checks all over the code
> > base'... Please provide an example.
> 
> 1. Like done in this patch series: Treaddir has to limit client's 'count'
>    parameter according to msize (by subtracting with msize).
> 

Hmm... this patch does a sanity check on 'count', not on 'msize'... I
mean no matter what msize is, clipping count to msize - 11 gives a
chance to stop processing the entries before overflowing the transport
buffer.

> 2. get_iounit() does this:
> 
> 		iounit = stbuf.f_bsize;
> 		iounit *= (s->msize - P9_IOHDRSZ) / stbuf.f_bsize;
> 
>    without checking anywhere for a potential negative outcome
>    (i.e. when msize < P9_IOHDRSZ)
> 

This function should even have an assert() for that, just to be
sure no one tries to call it before s->msize is even set, which
would clearly be a bug in QEMU. But this can be done in a
follow-up patch.

> 3. Example of directory entry name length with Rreaddir above.
> 

msize being too small to accommodate an Rreaddir with a single
entry is a different problem as we cannot do anything about it
at this point but fail... That's why the minimum msize should
rather be chosen with file names in mind, which are likely to
be longer than any message header. Rreadlink being the one with
the higher potential since it will usually return a string
containing a path name (while Rreaddir entries only contain
a single path element).

> Individual issues that can easily be overlooked but also easily avoided by not 
> allowing small msizes in the first place.
> 

My point is that we're not going to check msize in Tversion in
order to to avoid multiple checks everywhere. We're going to do
it there because it is the only place where it makes sense to
do it.

> Best regards,
> Christian Schoenebeck
> 
>
Christian Schoenebeck Jan. 17, 2020, 12:01 p.m. UTC | #6
On Freitag, 17. Januar 2020 11:24:21 CET Greg Kurz wrote:
> On Thu, 16 Jan 2020 22:39:19 +0100
> 
> Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > On Donnerstag, 16. Januar 2020 19:07:48 CET Greg Kurz wrote:
> > > > The point here was that there must be some kind of absolute minimum
> > > > msize,
> > > 
> > > Then the absolute minimum size is 7, the size of the header part that is
> > > common to all messages:
> > > 
> > > size[4] Message tag[2]
> > > 
> > > > even though the protocol specs officially don't mandate one. And if a
> > > > client choses a msize < P9_IOHDRSZ, what useful actions shall it be
> > > > able
> > > > to do?
> > > 
> > > I haven't checked but I guess some messages can fit in 24 bytes.
> > > 
> > > > That's why I mentioned P9_IOHDRSZ just in case somebody might come up
> > > > later
> > > > asking to lower that minimum msize value for whatever reason.
> > > 
> > > That's precisely what I want to avoid. The semantic of P9_IOHDRSZ is
> > > that it is the header size of a Twrite message and as such it is used
> > > to compute the 'iounit' which the guest later uses to compute a
> > > suitable 'count' for Tread/Twrite messages only. It shouldn't be
> > > involved in msize considerations IMHO.
> > > 
> > > > But np, IMO this sentence makes the fundamental requirement for this
> > > > patch
> > > > more clear, but I have no problem dropping this sentence.
> > > 
> > > Then you can mention 7 has the true minimal size.
> > 
> > Ok, I'll drop P9_IOHDRSZ completely and mention literal 7 as absolute
> > theoretical minimum instead.
> > 
> > > > > > Furthermore there are some responses which are not allowed by the
> > > > > > 9p
> > > > > > protocol to be truncated like e.g. Rreadlink which may yield up to
> > > > > 
> > > > > No message may be truncated in any way actually. The spec just
> > > > > allows
> > > > > an exception with the string part of Rerror.
> > > > 
> > > > Rreaddir, Rread, Twrite, Rerror can be truncated. So I suggest I'll
> > > > just
> > > > change that to s/some/most/ semantically.
> > > 
> > > I mean truncate with loss of data. Rreaddir can return less than 'count'
> > > but it won't truncate an individual entry. It rewinds to the previous
> > > one and returns its offset for the next Treaddir. Same goes with Rread
> > > and Twrite, we always return a 'count' that can be used by the client
> > > to continue reading or writing.
> > 
> > Individual entries are not truncated, correct, but data loss: Example, you
> > have this directory and say server's fs would deliver entries by readdir()
> > in> 
> > this order (from top down):
> > 	.
> > 	..
> > 	a
> > 	1234567890
> > 	b
> > 	c
> > 	d
> > 
> > and say 37 >= msize < 45, then client would receive 3 entries ".", "..",
> > "a" and that's it.
> 
> Yes msize < 45 isn't enough to send the Rreaddir for "1234567890"
> and since it is forbidden to truncate, we bail out... but we
> did not send a truncated message still.

I really think we're discussing semantical interpretation details here which 
bring us nowhere.

> > > Rerror is really the only message where we can deliberately drop
> > > data (but we actually don't do that).
> > > 
> > > > > Maybe just mention that and say we choose 4096 to be able to send
> > > > > big Rreadlink messages.
> > > > 
> > > > That's not the point for this patch. The main purpose is to avoid
> > > > having
> > > > to
> > > > maintain individual error prone minimum msize checks all over the code
> > > > base. If we would allow very small msizes (much smaller than 4096),
> > > > then
> > > > we would need to add numerous error handlers all over the code base,
> > > > each
> > > > one checking for different values (depending on the specific message
> > > > type).
> > > 
> > > I'm not sure what you mean by 'minimum msize checks all over the code
> > > base'... Please provide an example.
> > 
> > 1. Like done in this patch series: Treaddir has to limit client's 'count'
> > 
> >    parameter according to msize (by subtracting with msize).
> 
> Hmm... this patch does a sanity check on 'count', not on 'msize'... 

Yes ... :)

> I mean no matter what msize is, clipping count to msize - 11 gives a
> chance to stop processing the entries before overflowing the transport
> buffer.

... and no, this cannot happen if minimum msize of 4096 is forced already by 
Tversion. Maybe you now get my point -> It is about avoiding exactly such kind 
of issues in the first place. Most file systems have a name limit of 255 
bytes:

https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits

So by forcing a minimum 'msize' of 4096 you avoid having to deal with this  
issue (and similar ones) on Treaddir request level (and other request type 
handlers), including ReiserFS BTW because 4032+35 < 4096.

If you would allow smaller 'msize' values by Tversion, then you would need to 
suffer some kind of death when handling Treaddir with certain high file name 
length. Either a transport error (with an error message that a normal user 
would not be able to understand at all) or by returning an incomplete Treaddir 
response sequence with { Rreaddir count=0 }, or ... any other kind of death.

No matter which death you would choose here, it would be horrible from 
usability perspective, because the system would most of the time work 
flawlessy, and an error case would just be triggered if guest hits a file/dir 
beyond some certain name length. It is not worth it! Force 4kiB already at 
Tversion and that's it.

> > 2. get_iounit() does this:
> > 		iounit = stbuf.f_bsize;
> > 		iounit *= (s->msize - P9_IOHDRSZ) / stbuf.f_bsize;
> > 		
> >    without checking anywhere for a potential negative outcome
> >    (i.e. when msize < P9_IOHDRSZ)
> 
> This function should even have an assert() for that, just to be
> sure no one tries to call it before s->msize is even set, which
> would clearly be a bug in QEMU. But this can be done in a
> follow-up patch.
> 
> > 3. Example of directory entry name length with Rreaddir above.
> 
> msize being too small to accommodate an Rreaddir with a single
> entry is a different problem as we cannot do anything about it
> at this point but fail... That's why the minimum msize should
> rather be chosen with file names in mind, which are likely to
> be longer than any message header. Rreadlink being the one with
> the higher potential since it will usually return a string
> containing a path name (while Rreaddir entries only contain
> a single path element).
> 
> > Individual issues that can easily be overlooked but also easily avoided by
> > not allowing small msizes in the first place.
> 
> My point is that we're not going to check msize in Tversion in
> order to to avoid multiple checks everywhere. We're going to do
> it there because it is the only place where it makes sense to
> do it.

Also yes and no. Of course it just makes sense to handle it already at 
Tversion. But no, you could theoretically also allow much smaller minimum 
'msize' value << 4096 (i.e. somewhere closely >7 as we discussed), then you 
would indeed still need to add msize checks at other places of the code base 
as you just found out now. So forcing a minimum 'msize' which is high enough, 
avoids having to add such individual checks and having to deal with them in 
some kind of unpleasant way.

I hope this makes it more clear now.

Best regards,
Christian Schoenebeck
Greg Kurz Jan. 17, 2020, 3:15 p.m. UTC | #7
On Fri, 17 Jan 2020 13:01:30 +0100
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> On Freitag, 17. Januar 2020 11:24:21 CET Greg Kurz wrote:
> > On Thu, 16 Jan 2020 22:39:19 +0100
> > 
> > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > > On Donnerstag, 16. Januar 2020 19:07:48 CET Greg Kurz wrote:
> > > > > The point here was that there must be some kind of absolute minimum
> > > > > msize,
> > > > 
> > > > Then the absolute minimum size is 7, the size of the header part that is
> > > > common to all messages:
> > > > 
> > > > size[4] Message tag[2]
> > > > 
> > > > > even though the protocol specs officially don't mandate one. And if a
> > > > > client choses a msize < P9_IOHDRSZ, what useful actions shall it be
> > > > > able
> > > > > to do?
> > > > 
> > > > I haven't checked but I guess some messages can fit in 24 bytes.
> > > > 
> > > > > That's why I mentioned P9_IOHDRSZ just in case somebody might come up
> > > > > later
> > > > > asking to lower that minimum msize value for whatever reason.
> > > > 
> > > > That's precisely what I want to avoid. The semantic of P9_IOHDRSZ is
> > > > that it is the header size of a Twrite message and as such it is used
> > > > to compute the 'iounit' which the guest later uses to compute a
> > > > suitable 'count' for Tread/Twrite messages only. It shouldn't be
> > > > involved in msize considerations IMHO.
> > > > 
> > > > > But np, IMO this sentence makes the fundamental requirement for this
> > > > > patch
> > > > > more clear, but I have no problem dropping this sentence.
> > > > 
> > > > Then you can mention 7 has the true minimal size.
> > > 
> > > Ok, I'll drop P9_IOHDRSZ completely and mention literal 7 as absolute
> > > theoretical minimum instead.
> > > 
> > > > > > > Furthermore there are some responses which are not allowed by the
> > > > > > > 9p
> > > > > > > protocol to be truncated like e.g. Rreadlink which may yield up to
> > > > > > 
> > > > > > No message may be truncated in any way actually. The spec just
> > > > > > allows
> > > > > > an exception with the string part of Rerror.
> > > > > 
> > > > > Rreaddir, Rread, Twrite, Rerror can be truncated. So I suggest I'll
> > > > > just
> > > > > change that to s/some/most/ semantically.
> > > > 
> > > > I mean truncate with loss of data. Rreaddir can return less than 'count'
> > > > but it won't truncate an individual entry. It rewinds to the previous
> > > > one and returns its offset for the next Treaddir. Same goes with Rread
> > > > and Twrite, we always return a 'count' that can be used by the client
> > > > to continue reading or writing.
> > > 
> > > Individual entries are not truncated, correct, but data loss: Example, you
> > > have this directory and say server's fs would deliver entries by readdir()
> > > in> 
> > > this order (from top down):
> > > 	.
> > > 	..
> > > 	a
> > > 	1234567890
> > > 	b
> > > 	c
> > > 	d
> > > 
> > > and say 37 >= msize < 45, then client would receive 3 entries ".", "..",
> > > "a" and that's it.
> > 
> > Yes msize < 45 isn't enough to send the Rreaddir for "1234567890"
> > and since it is forbidden to truncate, we bail out... but we
> > did not send a truncated message still.
> 
> I really think we're discussing semantical interpretation details here which 
> bring us nowhere.
> 

Agreed.

> > > > Rerror is really the only message where we can deliberately drop
> > > > data (but we actually don't do that).
> > > > 
> > > > > > Maybe just mention that and say we choose 4096 to be able to send
> > > > > > big Rreadlink messages.
> > > > > 
> > > > > That's not the point for this patch. The main purpose is to avoid
> > > > > having
> > > > > to
> > > > > maintain individual error prone minimum msize checks all over the code
> > > > > base. If we would allow very small msizes (much smaller than 4096),
> > > > > then
> > > > > we would need to add numerous error handlers all over the code base,
> > > > > each
> > > > > one checking for different values (depending on the specific message
> > > > > type).
> > > > 
> > > > I'm not sure what you mean by 'minimum msize checks all over the code
> > > > base'... Please provide an example.
> > > 
> > > 1. Like done in this patch series: Treaddir has to limit client's 'count'
> > > 
> > >    parameter according to msize (by subtracting with msize).
> > 
> > Hmm... this patch does a sanity check on 'count', not on 'msize'... 
> 
> Yes ... :)
> 
> > I mean no matter what msize is, clipping count to msize - 11 gives a
> > chance to stop processing the entries before overflowing the transport
> > buffer.
> 
> ... and no, this cannot happen if minimum msize of 4096 is forced already by 
> Tversion. Maybe you now get my point -> It is about avoiding exactly such kind 

I'm not sure to see how setting a minimum msize of 4096 at Tversion would
prevent the client to pass a higher 'count' argument and lure the server
into generating a bigger than msize response since it does not check
count < msize - 11 without patch 3.

> of issues in the first place. Most file systems have a name limit of 255 
> bytes:
> 
> https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits
> 
> So by forcing a minimum 'msize' of 4096 you avoid having to deal with this  
> issue (and similar ones) on Treaddir request level (and other request type 
> handlers), including ReiserFS BTW because 4032+35 < 4096.
> 

Good to know for ReiserFS.

> If you would allow smaller 'msize' values by Tversion, then you would need to 
> suffer some kind of death when handling Treaddir with certain high file name 
> length. Either a transport error (with an error message that a normal user 
> would not be able to understand at all) or by returning an incomplete Treaddir 
> response sequence with { Rreaddir count=0 }, or ... any other kind of death.
> 

Ahh I now understand at last your argument about Rreaddir loosing data.
We may end up sending { Rreaddir count=0 } because the next entry is too
large... and thus end the readdir sequence. Mentioning this explicitly
from the start would have been more clear for me ;-)

This looks like yet another bug to me. It looks wrong to return this
special response if we have more entries to go. Also this could be the
client's _fault_ if it provides a ridiculously small value for count.
The current code will return count=0 all the same.

In any case, I think v9fs_do_readdir() should only return 0 if there
are no more entries to read. It should error out otherwise, but I'm
not sure how...

> No matter which death you would choose here, it would be horrible from 
> usability perspective, because the system would most of the time work 
> flawlessy, and an error case would just be triggered if guest hits a file/dir 
> beyond some certain name length. It is not worth it! Force 4kiB already at 
> Tversion and that's it.
> 
> > > 2. get_iounit() does this:
> > > 		iounit = stbuf.f_bsize;
> > > 		iounit *= (s->msize - P9_IOHDRSZ) / stbuf.f_bsize;
> > > 		
> > >    without checking anywhere for a potential negative outcome
> > >    (i.e. when msize < P9_IOHDRSZ)
> > 
> > This function should even have an assert() for that, just to be
> > sure no one tries to call it before s->msize is even set, which
> > would clearly be a bug in QEMU. But this can be done in a
> > follow-up patch.
> > 
> > > 3. Example of directory entry name length with Rreaddir above.
> > 
> > msize being too small to accommodate an Rreaddir with a single
> > entry is a different problem as we cannot do anything about it
> > at this point but fail... That's why the minimum msize should
> > rather be chosen with file names in mind, which are likely to
> > be longer than any message header. Rreadlink being the one with
> > the higher potential since it will usually return a string
> > containing a path name (while Rreaddir entries only contain
> > a single path element).
> > 
> > > Individual issues that can easily be overlooked but also easily avoided by
> > > not allowing small msizes in the first place.
> > 
> > My point is that we're not going to check msize in Tversion in
> > order to to avoid multiple checks everywhere. We're going to do
> > it there because it is the only place where it makes sense to
> > do it.
> 
> Also yes and no. Of course it just makes sense to handle it already at 
> Tversion. But no, you could theoretically also allow much smaller minimum 
> 'msize' value << 4096 (i.e. somewhere closely >7 as we discussed), then you 
> would indeed still need to add msize checks at other places of the code base 
> as you just found out now. So forcing a minimum 'msize' which is high enough, 
> avoids having to add such individual checks and having to deal with them in 
> some kind of unpleasant way.
> 

We still don't understand each other I'm afraid... we actually have
implicit 'msize' checks already for every single thing we write on
the wire: v9fs_packunpack() which detects when we're trying to write
passed the buffer. When this happens, it is propagated to the transport
which then disconnects, which is the painful thing you've been
experiencing with your readdir experiments. In the case of Rreaddir, it
really does make sense to try to avoid the disconnection like you do in
patch 3 because the readdir sequence allows _partial_ reads. Same goes
for Rread. But that's it. No other message in the protocol allows that,
so I've never thought of adding individual 'msize' checks anywhere else.
What would they do better than v9fs_packunpack() already does ?

> I hope this makes it more clear now.
> 
> Best regards,
> Christian Schoenebeck
> 
>
Christian Schoenebeck Jan. 17, 2020, 4:41 p.m. UTC | #8
On Freitag, 17. Januar 2020 16:15:37 CET Greg Kurz wrote:
> > > Hmm... this patch does a sanity check on 'count', not on 'msize'...
> > 
> > Yes ... :)
> > 
> > > I mean no matter what msize is, clipping count to msize - 11 gives a
> > > chance to stop processing the entries before overflowing the transport
> > > buffer.
> > 
> > ... and no, this cannot happen if minimum msize of 4096 is forced already
> > by Tversion. Maybe you now get my point -> It is about avoiding exactly
> > such kind
> I'm not sure to see how setting a minimum msize of 4096 at Tversion would
> prevent the client to pass a higher 'count' argument and lure the server
> into generating a bigger than msize response since it does not check
> count < msize - 11 without patch 3.

That's correct, it requires patch 3 as well to prevent that. Without patch 3, 
if a (i.e. bad) client sends a 'count' parameter >> msize then the Treaddir 
request is processed by server to full extent according to 'count' and finally 
aborted by a transport error since server's response would exceed msize.

> > of issues in the first place. Most file systems have a name limit of 255
> > bytes:
> > 
> > https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits
> > 
> > So by forcing a minimum 'msize' of 4096 you avoid having to deal with this
> > issue (and similar ones) on Treaddir request level (and other request type
> > handlers), including ReiserFS BTW because 4032+35 < 4096.
> 
> Good to know for ReiserFS.
> 
> > If you would allow smaller 'msize' values by Tversion, then you would need
> > to suffer some kind of death when handling Treaddir with certain high
> > file name length. Either a transport error (with an error message that a
> > normal user would not be able to understand at all) or by returning an
> > incomplete Treaddir response sequence with { Rreaddir count=0 }, or ...
> > any other kind of death.
> Ahh I now understand at last your argument about Rreaddir loosing data.
> We may end up sending { Rreaddir count=0 } because the next entry is too
> large... and thus end the readdir sequence.

Yep.

> Mentioning this explicitly
> from the start would have been more clear for me ;-)

Sorry for that. :) I thought I made it clear with the directory entries 
example. I try to be more clear next time.

> This looks like yet another bug to me. It looks wrong to return this
> special response if we have more entries to go. Also this could be the
> client's _fault_ if it provides a ridiculously small value for count.
> The current code will return count=0 all the same.
> 
> In any case, I think v9fs_do_readdir() should only return 0 if there
> are no more entries to read. It should error out otherwise, but I'm
> not sure how...

Patience please. I have to limit the scope of this patch series somewhere. I 
am aware about these issues, but if I add fixes for more and more edge cases 
(which already exist) as part of this patch series, it will become a never 
ending story.

I just added those particular fixes to this series, because they were directly 
related to things I've changed here for the actual purpose of this patch set, 
which was and is: readdir latency optimization.

> > > My point is that we're not going to check msize in Tversion in
> > > order to to avoid multiple checks everywhere. We're going to do
> > > it there because it is the only place where it makes sense to
> > > do it.
> > 
> > Also yes and no. Of course it just makes sense to handle it already at
> > Tversion. But no, you could theoretically also allow much smaller minimum
> > 'msize' value << 4096 (i.e. somewhere closely >7 as we discussed), then
> > you
> > would indeed still need to add msize checks at other places of the code
> > base as you just found out now. So forcing a minimum 'msize' which is
> > high enough, avoids having to add such individual checks and having to
> > deal with them in some kind of unpleasant way.
> 
> We still don't understand each other I'm afraid... we actually have
> implicit 'msize' checks already for every single thing we write on
> the wire: v9fs_packunpack() which detects when we're trying to write
> passed the buffer. When this happens, it is propagated to the transport
> which then disconnects, which is the painful thing you've been
> experiencing with your readdir experiments. In the case of Rreaddir, it
> really does make sense to try to avoid the disconnection like you do in
> patch 3 because the readdir sequence allows _partial_ reads. Same goes
> for Rread. But that's it. No other message in the protocol allows that,
> so I've never thought of adding individual 'msize' checks anywhere else.
> What would they do better than v9fs_packunpack() already does ?

Right, but you realized that a min. msize of 4096 (in combination with
patch 3) prevents the readdir data loss issue we discussed here (provided we 
have a "good" client sending count=msize-11), right?

If so, I suggest I "try" to address your concerns you came up with here in the 
commit log message as far as I can, and would like to ask you to adjust the 
message later on according to your personal preference if required.

Best regards,
Christian Schoenebeck
diff mbox series

Patch

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 520177f40c..a5fbe821d4 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -1363,8 +1363,20 @@  static void coroutine_fn v9fs_version(void *opaque)
         s->proto_version = V9FS_PROTO_2000L;
     } else {
         v9fs_string_sprintf(&version, "unknown");
+        /* skip min. msize check, reporting invalid version has priority */
+        goto marshal;
     }
 
+    if (s->msize < P9_MIN_MSIZE) {
+        err = -EMSGSIZE;
+        error_report(
+            "9pfs: Client requested msize < minimum msize ("
+            stringify(P9_MIN_MSIZE) ") supported by this server."
+        );
+        goto out;
+    }
+
+marshal:
     err = pdu_marshal(pdu, offset, "ds", s->msize, &version);
     if (err < 0) {
         goto out;
diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h
index 3904f82901..6fffe44f5a 100644
--- a/hw/9pfs/9p.h
+++ b/hw/9pfs/9p.h
@@ -100,6 +100,17 @@  typedef enum P9ProtoVersion {
     V9FS_PROTO_2000L = 0x02,
 } P9ProtoVersion;
 
+/**
+ * @brief Minimum message size supported by this 9pfs server.
+ *
+ * A client establishes a session by sending a Tversion request along with a
+ * 'msize' parameter which suggests the server a maximum message size ever to be
+ * used for communication (for both requests and replies) between client and
+ * server during that session. If client suggests a 'msize' smaller than this
+ * value then session is denied by server with an error response.
+ */
+#define P9_MIN_MSIZE    4096
+
 #define P9_NOTAG    UINT16_MAX
 #define P9_NOFID    UINT32_MAX
 #define P9_MAXWELEM 16