Message ID: b1eae734604cdbda0cecf1fbb7d103dd249642ee.1579567020.git.qemu_oss@crudebyte.com (mailing list archive)
State:      New, archived
Series:     9pfs: readdir optimization
On Tue, 21 Jan 2020 01:23:55 +0100
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> This patch is not intended to be merged. It just provides a

Well I like the idea of having such a benchmark available.
It could probably be merged after a few changes...

> temporary benchmark foundation for convenient A/B comparison
> of the subsequent 9p readdir optimization patches:
>
> * hw/9pfs/9p-synth: increase amount of simulated files for
>   readdir test to 2000 files.
>

... the 9p-synth backend could maybe always create this number
of files ?

> * tests/virtio-9p: measure wall time that elapsed between
>   sending T_readdir request and arrival of R_readdir response
>   and print out that measured duration, as well as amount of
>   directory entries received, and the amount of bytes of the
>   response message.
>

... maybe we should make the printing optional and off by default
so that it doesn't pollute CI logs.

> * tests/virtio-9p: increased msize to 256kiB to allow
>   retrieving all 2000 files (simulated by 9pfs synth driver)
>   with only one T_readdir request.
>

... always use 256k for both the basic test and a revisited
split test ?

> Running this benchmark is fairly quick & simple and does not
> require any guest OS installation or other prerequisites:
>
>   cd build
>   make && make tests/qtest/qos-test
>   export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64
>   tests/qtest/qos-test -p $(tests/qtest/qos-test -l | grep readdir/basic)
>
> Since this benchmark uses the 9pfs synth driver, the host
> machine's I/O hardware (SSDs/HDDs) is not relevant for the
> benchmark result, because the synth backend's readdir
> implementation returns immediately (without any blocking I/O
> that would occur with a real-life fs driver) and just returns
> already prepared, simulated directory entries directly from RAM.
> So this benchmark focuses on the efficiency of the 9pfs
> controller code (or top half) for readdir request handling.
>
> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> ---
>  hw/9pfs/9p-synth.h           |  2 +-
>  tests/qtest/virtio-9p-test.c | 37 +++++++++++++++++++++++++++++++++++-
>  2 files changed, 37 insertions(+), 2 deletions(-)

[quoted patch diff snipped; the full diff follows below]
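For illustration of the "printing optional and off by default" suggestion
above: a minimal sketch of a compile-time switch, loosely anticipating the
P9_BENCHMARK_READDIR macro mentioned in the reply below. The bench_printf()
helper and the default value are hypothetical, not from the patch:

/* Hypothetical sketch only: neither bench_printf() nor this default
 * value are part of the actual patch. */
#include <stdarg.h>
#include <stdio.h>

#define P9_BENCHMARK_READDIR 0  /* off by default, so CI logs stay clean */

static void bench_printf(const char *fmt, ...)
{
#if P9_BENCHMARK_READDIR
    va_list ap;
    va_start(ap, fmt);
    vprintf(fmt, ap);           /* forward to stdout only when enabled */
    va_end(ap);
#else
    (void)fmt;                  /* silence unused-parameter warnings */
#endif
}

The test's printf() calls would then simply go through bench_printf()
instead, keeping the benchmark silent unless explicitly enabled.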
On Thursday, 23 January 2020 11:34:15 CET Greg Kurz wrote:
> On Tue, 21 Jan 2020 01:23:55 +0100
> Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > This patch is not intended to be merged. It just provides a
>
> Well I like the idea of having such a benchmark available.
> It could probably be merged after a few changes...

I never thought of this benchmark patch as being useful in general,
because it is actually just a few lines of code for benchmarking readdir
performance. Plus, as you just found out, it does matter whether the
synth driver's n-square issue taints the benchmark results.

So sure, I could add a boolean macro P9_BENCHMARK_READDIR or whatever,
which one could enable if desired, but my concern would be that people
would not interpret the values correctly, since they are certainly
unaware of the impact the driver's performance has on the values being
printed. I mean, if you missed that point, then others definitely will
as well.

> > temporary benchmark foundation for convenient A/B comparison
> > of the subsequent 9p readdir optimization patches:
> >
> > * hw/9pfs/9p-synth: increase amount of simulated files for
> >   readdir test to 2000 files.
>
> ... the 9p-synth backend could maybe always create this number
> of files ?

That's up to you. I don't mind about the precise value in the production
version. The tests just take more time to execute with 2000 files.

> > * tests/virtio-9p: measure wall time that elapsed between
> >   sending T_readdir request and arrival of R_readdir response
> >   and print out that measured duration, as well as amount of
> >   directory entries received, and the amount of bytes of the
> >   response message.
>
> ... maybe we should make the printing optional and off by
> default so that it doesn't pollute CI logs.
>
> > * tests/virtio-9p: increased msize to 256kiB to allow
> >   retrieving all 2000 files (simulated by 9pfs synth driver)
> >   with only one T_readdir request.
>
> ... always use 256k for both the basic test and a revisited
> split test ?

Same thing, I don't mind, it's up to you to decide.

Actually, I would find it more important to document somewhere how to run
these tests at all, and/or how to run one specific individual test,
because I don't find that obvious for external people. That might explain
why there are so few 9p tests so far. I also did many tests manually
before (with a real guest OS, fs rollbacks, etc.), simply because I did
not know.

> > Running this benchmark is fairly quick & simple and does not
> > require any guest OS installation or other prerequisites:
> >
> >   cd build
> >   make && make tests/qtest/qos-test
> >   export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64
> >   tests/qtest/qos-test -p $(tests/qtest/qos-test -l | grep readdir/basic)
> >
> > Since this benchmark uses the 9pfs synth driver, the host
> > machine's I/O hardware (SSDs/HDDs) is not relevant for the
> > benchmark result, because the synth backend's readdir
> > implementation returns immediately (without any blocking I/O
> > that would occur with a real-life fs driver) and just returns
> > already prepared, simulated directory entries directly from RAM.
> > So this benchmark focuses on the efficiency of the 9pfs
> > controller code (or top half) for readdir request handling.
> >
> > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>

[quoted patch diff snipped; the full diff follows below]

Best regards,
Christian Schoenebeck
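As background to the "n-square issue" mentioned above: if a readdir
backend re-walks its entry list from the head on every call, reading all
n entries one by one costs 1 + 2 + ... + n, i.e. O(n^2) steps overall,
which taints any timing taken on top of it. A hypothetical illustration
of that cost pattern, not the actual 9p-synth code:

/* Hypothetical illustration of the quadratic pattern; NOT the actual
 * 9p-synth code. Seeking to offset k by walking from the list head
 * costs O(k), so enumerating all n entries call by call is O(n^2). */
#include <stddef.h>

struct entry {
    struct entry *next;
};

static struct entry *seek_entry(struct entry *head, size_t offset)
{
    struct entry *e = head;
    while (e != NULL && offset--) {
        e = e->next;            /* linear walk repeated on every call */
    }
    return e;
}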
diff --git a/hw/9pfs/9p-synth.h b/hw/9pfs/9p-synth.h
index 036d7e4a5b..7d6cedcdac 100644
--- a/hw/9pfs/9p-synth.h
+++ b/hw/9pfs/9p-synth.h
@@ -58,7 +58,7 @@ int qemu_v9fs_synth_add_file(V9fsSynthNode *parent, int mode,
 /* for READDIR test */
 #define QTEST_V9FS_SYNTH_READDIR_DIR "ReadDirDir"
 #define QTEST_V9FS_SYNTH_READDIR_FILE "ReadDirFile%d"
-#define QTEST_V9FS_SYNTH_READDIR_NFILES 100
+#define QTEST_V9FS_SYNTH_READDIR_NFILES 2000
 
 /* Any write to the "FLUSH" file is handled one byte at a time by the
  * backend. If the byte is zero, the backend returns success (ie, 1),
diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
index e47b286340..d71b37aa6c 100644
--- a/tests/qtest/virtio-9p-test.c
+++ b/tests/qtest/virtio-9p-test.c
@@ -15,6 +15,18 @@
 #include "libqos/virtio-9p.h"
 #include "libqos/qgraph.h"
 
+/*
+ * to benchmark the real time (not CPU time) that elapsed between start of
+ * a request and arrival of its response
+ */
+static double wall_time(void)
+{
+    struct timeval t;
+    struct timezone tz;
+    gettimeofday(&t, &tz);
+    return t.tv_sec + t.tv_usec * 0.000001;
+}
+
 #define QVIRTIO_9P_TIMEOUT_US (10 * 1000 * 1000)
 static QGuestAllocator *alloc;
 
@@ -36,7 +48,7 @@ static void pci_config(void *obj, void *data, QGuestAllocator *t_alloc)
     g_free(tag);
 }
 
-#define P9_MAX_SIZE 4096 /* Max size of a T-message or R-message */
+#define P9_MAX_SIZE (256 * 1024) /* Max size of a T-message or R-message */
 
 typedef struct {
     QTestState *qts;
@@ -600,12 +612,35 @@ static void fs_readdir(void *obj, void *data, QGuestAllocator *t_alloc)
     v9fs_req_wait_for_reply(req, NULL);
     v9fs_rlopen(req, &qid, NULL);
 
+    const double start = wall_time();
+
     /*
      * submit count = msize - 11, because 11 is the header size of Rreaddir
      */
     req = v9fs_treaddir(v9p, 1, 0, P9_MAX_SIZE - 11, 0);
+    const double treaddir = wall_time();
     v9fs_req_wait_for_reply(req, NULL);
+    const double waitforreply = wall_time();
     v9fs_rreaddir(req, &count, &nentries, &entries);
+    const double end = wall_time();
+
+    printf("\nTime client spent on sending T_readdir: %fs\n\n",
+           treaddir - start);
+
+    printf("Time client spent for waiting for reply from server: %fs "
+           "[MOST IMPORTANT]\n", waitforreply - start);
+    printf("(This is the most important value, because it reflects the time\n"
+           "the 9p server required to process and return the result of the\n"
+           "T_readdir request.)\n\n");
+
+    printf("Total client time: %fs\n", end - start);
+    printf("(NOTE: this time is not relevant; this huge time comes from\n"
+           "inefficient qtest_memread() calls. So you can discard this\n"
+           "value as a problem of this test client implementation while\n"
+           "processing the received server T_readdir reply.)\n\n");
+
+    printf("Details of response message data: R_readdir nentries=%d "
+           "rbytes=%d\n", nentries, count);
 
     /*
      * Assuming msize (P9_MAX_SIZE) is large enough so we can retrieve all
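One caveat on the wall_time() helper in the diff above: gettimeofday()
reads the system wall clock, which can jump if the clock is adjusted
(e.g. by NTP) while the benchmark runs. A minimal sketch of a monotonic
alternative, assuming a POSIX host; this is not part of the patch:

/* Sketch, not from the patch: CLOCK_MONOTONIC cannot jump with
 * wall-clock adjustments, which matters when timing short intervals. */
#include <time.h>

static double mono_time(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}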
This patch is not intended to be merged. It just provides a
temporary benchmark foundation for convenient A/B comparison
of the subsequent 9p readdir optimization patches:

* hw/9pfs/9p-synth: increase the amount of simulated files for
  the readdir test to 2000 files.

* tests/virtio-9p: measure the wall time that elapses between
  sending the T_readdir request and arrival of the R_readdir
  response, and print out that measured duration as well as the
  number of directory entries received and the number of bytes
  of the response message.

* tests/virtio-9p: increase msize to 256 KiB to allow retrieving
  all 2000 files (simulated by the 9pfs synth driver) with only
  one T_readdir request.

Running this benchmark is fairly quick & simple and does not
require any guest OS installation or other prerequisites:

  cd build
  make && make tests/qtest/qos-test
  export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64
  tests/qtest/qos-test -p $(tests/qtest/qos-test -l | grep readdir/basic)

Since this benchmark uses the 9pfs synth driver, the host
machine's I/O hardware (SSDs/HDDs) is not relevant for the
benchmark result: the synth backend's readdir implementation
returns immediately (without any of the blocking I/O that would
occur with a real-life fs driver) and serves already prepared,
simulated directory entries directly from RAM. So this benchmark
focuses on the efficiency of the 9pfs controller code (or top
half) for readdir request handling.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
---
 hw/9pfs/9p-synth.h           |  2 +-
 tests/qtest/virtio-9p-test.c | 37 +++++++++++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 2 deletions(-)
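For reference, the "count = msize - 11" used by the test follows directly
from the 9p wire format: every message starts with size[4], type[1] and
tag[2], and R_readdir adds a count[4] field before the entry data, i.e.
11 bytes of header in total. A small sketch of that arithmetic; the macro
names here are illustrative, not from the patch:

/* Illustrative only; these macro names are not from the patch. */
#define P9_HDR_SIZE       (4 + 1 + 2)       /* size[4] type[1] tag[2] */
#define P9_RREADDIR_HDR   (P9_HDR_SIZE + 4) /* + count[4] = 11 bytes  */
#define MSIZE             (256 * 1024)      /* P9_MAX_SIZE in the test */

/* Maximum entry payload one R_readdir can carry at this msize: */
enum { RREADDIR_MAX_COUNT = MSIZE - P9_RREADDIR_HDR }; /* = 262133 bytes */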