Message ID | 20210506155419.1796056-1-philmd@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | virtio-blk: Convert QEMUBH callback to "bitops.h" API | expand |
On 5/6/21 8:54 AM, Philippe Mathieu-Daudé wrote: > static void notify_guest_bh(void *opaque) > { > VirtIOBlockDataPlane *s = opaque; > - unsigned nvqs = s->conf->num_queues; > - unsigned long bitmap[BITS_TO_LONGS(nvqs)]; > - unsigned j; > > - memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap)); > - memset(s->batch_notify_vqs, 0, sizeof(bitmap)); > - > - for (j = 0; j < nvqs; j += BITS_PER_LONG) { > - unsigned long bits = bitmap[j / BITS_PER_LONG]; > - > - while (bits != 0) { > - unsigned i = j + ctzl(bits); > + for (unsigned i = 0; i < s->conf->num_queues; i++) { Is this bitmap dense enough that you want to iterate by index, or is it sparse enough to iterate via find_first_bit/find_next_bit? In either case, leave the copy of s->conf->num_queues to a local variable. r~
On 5/6/21 8:22 PM, Richard Henderson wrote: > On 5/6/21 8:54 AM, Philippe Mathieu-Daudé wrote: >> static void notify_guest_bh(void *opaque) >> { >> VirtIOBlockDataPlane *s = opaque; >> - unsigned nvqs = s->conf->num_queues; >> - unsigned long bitmap[BITS_TO_LONGS(nvqs)]; >> - unsigned j; >> - memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap)); >> - memset(s->batch_notify_vqs, 0, sizeof(bitmap)); >> - >> - for (j = 0; j < nvqs; j += BITS_PER_LONG) { >> - unsigned long bits = bitmap[j / BITS_PER_LONG]; >> - >> - while (bits != 0) { >> - unsigned i = j + ctzl(bits); >> + for (unsigned i = 0; i < s->conf->num_queues; i++) { > > Is this bitmap dense enough that you want to iterate by index, By 'iterate by index' do you mean the actual iteration with 'j'? > or is it > sparse enough to iterate via find_first_bit/find_next_bit? I looked at find_first_bit/find_next_bit() but they seemed to do a lot more than test_and_clear_bit(). As Stefan said this is hot path, I thought this would be cheaper, but haven't profiled the performance. > In either case, leave the copy of s->conf->num_queues to a local variable. That is sensible to do :)
On 5/6/21 9:42 PM, Philippe Mathieu-Daudé wrote: > On 5/6/21 8:22 PM, Richard Henderson wrote: >> On 5/6/21 8:54 AM, Philippe Mathieu-Daudé wrote: >>> static void notify_guest_bh(void *opaque) >>> { >>> VirtIOBlockDataPlane *s = opaque; >>> - unsigned nvqs = s->conf->num_queues; >>> - unsigned long bitmap[BITS_TO_LONGS(nvqs)]; >>> - unsigned j; >>> - memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap)); >>> - memset(s->batch_notify_vqs, 0, sizeof(bitmap)); >>> - >>> - for (j = 0; j < nvqs; j += BITS_PER_LONG) { >>> - unsigned long bits = bitmap[j / BITS_PER_LONG]; >>> - >>> - while (bits != 0) { >>> - unsigned i = j + ctzl(bits); >>> + for (unsigned i = 0; i < s->conf->num_queues; i++) { >> >> Is this bitmap dense enough that you want to iterate by index, The max is 1Kb: #define VIRTIO_QUEUE_MAX 1024 > > By 'iterate by index' do you mean the actual iteration with 'j'? > >> or is it >> sparse enough to iterate via find_first_bit/find_next_bit? > > I looked at find_first_bit/find_next_bit() but they seemed to do > a lot more than test_and_clear_bit(). As Stefan said this is hot > path, I thought this would be cheaper, but haven't profiled the > performance. > >> In either case, leave the copy of s->conf->num_queues to a local variable. > > That is sensible to do :) >
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c index e9050c8987e..25d9b10c02b 100644 --- a/hw/block/dataplane/virtio-blk.c +++ b/hw/block/dataplane/virtio-blk.c @@ -59,23 +59,12 @@ void virtio_blk_data_plane_notify(VirtIOBlockDataPlane *s, VirtQueue *vq) static void notify_guest_bh(void *opaque) { VirtIOBlockDataPlane *s = opaque; - unsigned nvqs = s->conf->num_queues; - unsigned long bitmap[BITS_TO_LONGS(nvqs)]; - unsigned j; - memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap)); - memset(s->batch_notify_vqs, 0, sizeof(bitmap)); - - for (j = 0; j < nvqs; j += BITS_PER_LONG) { - unsigned long bits = bitmap[j / BITS_PER_LONG]; - - while (bits != 0) { - unsigned i = j + ctzl(bits); + for (unsigned i = 0; i < s->conf->num_queues; i++) { + if (test_and_clear_bit(i, s->batch_notify_vqs)) { VirtQueue *vq = virtio_get_queue(s->vdev, i); virtio_notify_irqfd(s->vdev, vq); - - bits &= bits - 1; /* clear right-most bit */ } } }
By directly using test_and_clear_bit() from the "bitops.h" API, we can remove the bitmap[] variable-length array copy on the stack and the complex manual bit testing/clearing logic. Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> --- hw/block/dataplane/virtio-blk.c | 15 ++------------- 1 file changed, 2 insertions(+), 13 deletions(-)