Message ID: b1eae734604cdbda0cecf1fbb7d103dd249642ee.1579567020.git.qemu_oss@crudebyte.com (mailing list archive)
State:      New, archived
Series:     9pfs: readdir optimization
On Tue, 21 Jan 2020 01:23:55 +0100
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> This patch is not intended to be merged. It just provides a

Well I like the idea of having such a benchmark available.
It could probably be merged after a few changes...

> temporary benchmark foundation for convenient A/B comparison
> of the subsequent 9p readdir optimization patches:
>
> * hw/9pfs/9p-synth: increase amount of simulated files for
>   readdir test to 2000 files.
>

... the 9p-synth backend could maybe always create this number
of files ?

> * tests/virtio-9p: measure wall time that elapsed between
>   sending T_readdir request and arrival of R_readdir response
>   and print out that measured duration, as well as amount of
>   directory entries received, and the amount of bytes of the
>   response message.
>

... maybe we should make the printing optional and off by default
so that it doesn't pollute CI logs.

> * tests/virtio-9p: increased msize to 256kiB to allow
>   retrieving all 2000 files (simulated by 9pfs synth driver)
>   with only one T_readdir request.
>

... always use 256k for both the basic test and a revisited
split test ?

> Running this benchmark is fairly quick & simple and does not
> require any guest OS installation or other prerequisites:
>
>   cd build
>   make && make tests/qtest/qos-test
>   export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64
>   tests/qtest/qos-test -p $(tests/qtest/qos-test -l | grep readdir/basic)
>
> Since this benchmark uses the 9pfs synth driver, the host
> machine's I/O hardware (SSDs/HDDs) is not relevant for the
> benchmark result, because the synth backend's readdir
> implementation returns immediately (without any blocking I/O
> that would occur with a real-life fs driver) and just returns
> already prepared, simulated directory entries directly from RAM.
> So this benchmark focuses on the efficiency of the 9pfs
> controller code (or top half) for readdir request handling.
>
> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> ---
>  hw/9pfs/9p-synth.h           |  2 +-
>  tests/qtest/virtio-9p-test.c | 37 +++++++++++++++++++++++++++++++++++-
>  2 files changed, 37 insertions(+), 2 deletions(-)

[quoted patch diff snipped; the full diff follows below]
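For illustration of the "printing optional and off by default" suggestion
above: a minimal sketch of a compile-time switch, loosely anticipating the
P9_BENCHMARK_READDIR macro mentioned in the reply below. The bench_printf()
helper and the default value are hypothetical, not from the patch:

/* Hypothetical sketch only: neither bench_printf() nor this default
 * value are part of the actual patch. */
#include <stdarg.h>
#include <stdio.h>

#define P9_BENCHMARK_READDIR 0  /* off by default, so CI logs stay clean */

static void bench_printf(const char *fmt, ...)
{
#if P9_BENCHMARK_READDIR
    va_list ap;
    va_start(ap, fmt);
    vprintf(fmt, ap);           /* forward to stdout only when enabled */
    va_end(ap);
#else
    (void)fmt;                  /* silence unused-parameter warnings */
#endif
}

The test's printf() calls would then simply go through bench_printf()
instead, keeping the benchmark silent unless explicitly enabled.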
On Thursday, 23 January 2020 11:34:15 CET Greg Kurz wrote:
> On Tue, 21 Jan 2020 01:23:55 +0100
> Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > This patch is not intended to be merged. It just provides a
>
> Well I like the idea of having such a benchmark available.
> It could probably be merged after a few changes...

I never thought of this benchmark patch as being useful in general,
because it is actually just a few lines of code for benchmarking readdir
performance. Plus, as you just found out, it does matter whether the
synth driver's n-square issue taints the benchmark results.

So sure, I could add a boolean macro P9_BENCHMARK_READDIR or whatever,
which one could enable if desired, but my concern would be that people
would not interpret the values correctly, since they are certainly
unaware of the impact the driver's performance has on the values being
printed. I mean, if you missed that point, then others definitely will
as well.

> > temporary benchmark foundation for convenient A/B comparison
> > of the subsequent 9p readdir optimization patches:
> >
> > * hw/9pfs/9p-synth: increase amount of simulated files for
> >   readdir test to 2000 files.
>
> ... the 9p-synth backend could maybe always create this number
> of files ?

That's up to you. I don't mind about the precise value in the production
version. The tests just take more time to execute with 2000 files.

> > * tests/virtio-9p: measure wall time that elapsed between
> >   sending T_readdir request and arrival of R_readdir response
> >   and print out that measured duration, as well as amount of
> >   directory entries received, and the amount of bytes of the
> >   response message.
>
> ... maybe we should make the printing optional and off by
> default so that it doesn't pollute CI logs.
>
> > * tests/virtio-9p: increased msize to 256kiB to allow
> >   retrieving all 2000 files (simulated by 9pfs synth driver)
> >   with only one T_readdir request.
>
> ... always use 256k for both the basic test and a revisited
> split test ?

Same thing, I don't mind, it's up to you to decide.

Actually, I would find it more important to document somewhere how to run
these tests at all, and/or how to run one specific individual test,
because I don't find that obvious for external people. That might explain
why there are so few 9p tests so far. I also did many tests manually
before (with a real guest OS, fs rollbacks, etc.), simply because I did
not know.

> > Running this benchmark is fairly quick & simple and does not
> > require any guest OS installation or other prerequisites:
> >
> >   cd build
> >   make && make tests/qtest/qos-test
> >   export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64
> >   tests/qtest/qos-test -p $(tests/qtest/qos-test -l | grep readdir/basic)
> >
> > Since this benchmark uses the 9pfs synth driver, the host
> > machine's I/O hardware (SSDs/HDDs) is not relevant for the
> > benchmark result, because the synth backend's readdir
> > implementation returns immediately (without any blocking I/O
> > that would occur with a real-life fs driver) and just returns
> > already prepared, simulated directory entries directly from RAM.
> > So this benchmark focuses on the efficiency of the 9pfs
> > controller code (or top half) for readdir request handling.
> >
> > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>

[quoted patch diff snipped; the full diff follows below]

Best regards,
Christian Schoenebeck
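As background to the "n-square issue" mentioned above: if a readdir
backend re-walks its entry list from the head on every call, reading all
n entries one by one costs 1 + 2 + ... + n, i.e. O(n^2) steps overall,
which taints any timing taken on top of it. A hypothetical illustration
of that cost pattern, not the actual 9p-synth code:

/* Hypothetical illustration of the quadratic pattern; NOT the actual
 * 9p-synth code. Seeking to offset k by walking from the list head
 * costs O(k), so enumerating all n entries call by call is O(n^2). */
#include <stddef.h>

struct entry {
    struct entry *next;
};

static struct entry *seek_entry(struct entry *head, size_t offset)
{
    struct entry *e = head;
    while (e != NULL && offset--) {
        e = e->next;            /* linear walk repeated on every call */
    }
    return e;
}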
diff --git a/hw/9pfs/9p-synth.h b/hw/9pfs/9p-synth.h
index 036d7e4a5b..7d6cedcdac 100644
--- a/hw/9pfs/9p-synth.h
+++ b/hw/9pfs/9p-synth.h
@@ -58,7 +58,7 @@ int qemu_v9fs_synth_add_file(V9fsSynthNode *parent, int mode,
 /* for READDIR test */
 #define QTEST_V9FS_SYNTH_READDIR_DIR "ReadDirDir"
 #define QTEST_V9FS_SYNTH_READDIR_FILE "ReadDirFile%d"
-#define QTEST_V9FS_SYNTH_READDIR_NFILES 100
+#define QTEST_V9FS_SYNTH_READDIR_NFILES 2000
 
 /* Any write to the "FLUSH" file is handled one byte at a time by the
  * backend. If the byte is zero, the backend returns success (ie, 1),
diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
index e47b286340..d71b37aa6c 100644
--- a/tests/qtest/virtio-9p-test.c
+++ b/tests/qtest/virtio-9p-test.c
@@ -15,6 +15,18 @@
 #include "libqos/virtio-9p.h"
 #include "libqos/qgraph.h"
 
+/*
+ * to benchmark the real time (not CPU time) that elapsed between start of
+ * a request and arrival of its response
+ */
+static double wall_time(void)
+{
+    struct timeval t;
+    struct timezone tz;
+    gettimeofday(&t, &tz);
+    return t.tv_sec + t.tv_usec * 0.000001;
+}
+
 #define QVIRTIO_9P_TIMEOUT_US (10 * 1000 * 1000)
 static QGuestAllocator *alloc;
 
@@ -36,7 +48,7 @@ static void pci_config(void *obj, void *data, QGuestAllocator *t_alloc)
     g_free(tag);
 }
 
-#define P9_MAX_SIZE 4096 /* Max size of a T-message or R-message */
+#define P9_MAX_SIZE (256 * 1024) /* Max size of a T-message or R-message */
 
 typedef struct {
     QTestState *qts;
@@ -600,12 +612,35 @@ static void fs_readdir(void *obj, void *data, QGuestAllocator *t_alloc)
     v9fs_req_wait_for_reply(req, NULL);
     v9fs_rlopen(req, &qid, NULL);
 
+    const double start = wall_time();
+
     /*
      * submit count = msize - 11, because 11 is the header size of Rreaddir
      */
     req = v9fs_treaddir(v9p, 1, 0, P9_MAX_SIZE - 11, 0);
+    const double treaddir = wall_time();
     v9fs_req_wait_for_reply(req, NULL);
+    const double waitforreply = wall_time();
     v9fs_rreaddir(req, &count, &nentries, &entries);
+    const double end = wall_time();
+
+    printf("\nTime client spent on sending T_readdir: %fs\n\n",
+           treaddir - start);
+
+    printf("Time client spent for waiting for reply from server: %fs "
+           "[MOST IMPORTANT]\n", waitforreply - start);
+    printf("(This is the most important value, because it reflects the time\n"
+           "the 9p server required to process and return the result of the\n"
+           "T_readdir request.)\n\n");
+
+    printf("Total client time: %fs\n", end - start);
+    printf("(NOTE: this time is not relevant; this huge time comes from\n"
+           "inefficient qtest_memread() calls. So you can discard this\n"
+           "value as a problem of this test client implementation while\n"
+           "processing the received server T_readdir reply.)\n\n");
+
+    printf("Details of response message data: R_readdir nentries=%d "
+           "rbytes=%d\n", nentries, count);
 
     /*
      * Assuming msize (P9_MAX_SIZE) is large enough so we can retrieve all
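One caveat on the wall_time() helper in the diff above: gettimeofday()
reads the system wall clock, which can jump if the clock is adjusted
(e.g. by NTP) while the benchmark runs. A minimal sketch of a monotonic
alternative, assuming a POSIX host; this is not part of the patch:

/* Sketch, not from the patch: CLOCK_MONOTONIC cannot jump with
 * wall-clock adjustments, which matters when timing short intervals. */
#include <time.h>

static double mono_time(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}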
This patch is not intended to be merged. It just provides a
temporary benchmark foundation for convenient A/B comparison
of the subsequent 9p readdir optimization patches:

* hw/9pfs/9p-synth: increase the amount of simulated files for
  the readdir test to 2000 files.

* tests/virtio-9p: measure the wall time that elapses between
  sending the T_readdir request and arrival of the R_readdir
  response, and print out that measured duration as well as the
  number of directory entries received and the number of bytes
  of the response message.

* tests/virtio-9p: increase msize to 256 KiB to allow retrieving
  all 2000 files (simulated by the 9pfs synth driver) with only
  one T_readdir request.

Running this benchmark is fairly quick & simple and does not
require any guest OS installation or other prerequisites:

  cd build
  make && make tests/qtest/qos-test
  export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64
  tests/qtest/qos-test -p $(tests/qtest/qos-test -l | grep readdir/basic)

Since this benchmark uses the 9pfs synth driver, the host
machine's I/O hardware (SSDs/HDDs) is not relevant for the
benchmark result: the synth backend's readdir implementation
returns immediately (without any of the blocking I/O that would
occur with a real-life fs driver) and serves already prepared,
simulated directory entries directly from RAM. So this benchmark
focuses on the efficiency of the 9pfs controller code (or top
half) for readdir request handling.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
---
 hw/9pfs/9p-synth.h           |  2 +-
 tests/qtest/virtio-9p-test.c | 37 +++++++++++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 2 deletions(-)
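For reference, the "count = msize - 11" used by the test follows directly
from the 9p wire format: every message starts with size[4], type[1] and
tag[2], and R_readdir adds a count[4] field before the entry data, i.e.
11 bytes of header in total. A small sketch of that arithmetic; the macro
names here are illustrative, not from the patch:

/* Illustrative only; these macro names are not from the patch. */
#define P9_HDR_SIZE       (4 + 1 + 2)       /* size[4] type[1] tag[2] */
#define P9_RREADDIR_HDR   (P9_HDR_SIZE + 4) /* + count[4] = 11 bytes  */
#define MSIZE             (256 * 1024)      /* P9_MAX_SIZE in the test */

/* Maximum entry payload one R_readdir can carry at this msize: */
enum { RREADDIR_MAX_COUNT = MSIZE - P9_RREADDIR_HDR }; /* = 262133 bytes */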