Message ID | 20230124170108.1070389-5-dhowells@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | iov_iter: Improve page extraction (pin or just list) |
On 1/24/23 09:01, David Howells wrote:
> Fix bio_flagged() so that multiple instances of it, such as:
>
> 	if (bio_flagged(bio, BIO_PAGE_REFFED) ||
> 	    bio_flagged(bio, BIO_PAGE_PINNED))
>
> can be combined by the gcc optimiser into a single test in assembly
> (arguably, this is a compiler optimisation issue[1]).
>
> The missed optimisation stems from bio_flagged() comparing the result of
> the bitwise-AND to zero.  This results in an out-of-line bio_release_page()
> being compiled to something like:
>
>    <+0>:     mov    0x14(%rdi),%eax
>    <+3>:     test   $0x1,%al
>    <+5>:     jne    0xffffffff816dac53 <bio_release_pages+11>
>    <+7>:     test   $0x2,%al
>    <+9>:     je     0xffffffff816dac5c <bio_release_pages+20>
>    <+11>:    movzbl %sil,%esi
>    <+15>:    jmp    0xffffffff816daba1 <__bio_release_pages>
>    <+20>:    jmp    0xffffffff81d0b800 <__x86_return_thunk>
>
> However, the test is superfluous as the return type is bool.  Removing it
> results in:
>
>    <+0>:     testb  $0x3,0x14(%rdi)
>    <+4>:     je     0xffffffff816e4af4 <bio_release_pages+15>
>    <+6>:     movzbl %sil,%esi
>    <+10>:    jmp    0xffffffff816dab7c <__bio_release_pages>
>    <+15>:    jmp    0xffffffff81d0b7c0 <__x86_return_thunk>
>
> instead.
>
> Also, the MOVZBL instruction looks unnecessary[2] - I think it's just
> 're-booling' the mark_dirty parameter.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: linux-block@vger.kernel.org
> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370 [1]
> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108371 [2]
> Link: https://lore.kernel.org/r/167391056756.2311931.356007731815807265.stgit@warthog.procyon.org.uk/ # v6
> ---
>  include/linux/bio.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index c1da63f6c808..10366b8bdb13 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -227,7 +227,7 @@ static inline void bio_cnt_set(struct bio *bio, unsigned int count)
>
>  static inline bool bio_flagged(struct bio *bio, unsigned int bit)
>  {
> -	return (bio->bi_flags & (1U << bit)) != 0;
> +	return bio->bi_flags & (1U << bit);
>  }
>
>  static inline void bio_set_flag(struct bio *bio, unsigned int bit)
>

I don't know how you noticed that this was even a problem! Neatly
fixed.

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
John Hubbard <jhubbard@nvidia.com> wrote:

> I don't know how you noticed that this was even a problem! Neatly
> fixed.

I wanted BIO_PAGE_REFFED/PINNED to translate to FOLL_GET/PIN with no more than
a single AND instruction, assuming they were assigned to the same values (1 &
2), so I checked to see what assembly was produced by:

	gup_flags |= bio_flagged(bio, BIO_PAGE_REFFED) ? FOLL_GET : 0;
	gup_flags |= bio_flagged(bio, BIO_PAGE_PINNED) ? FOLL_PIN : 0;

Complicated though it looks, it should optimise down to something like:

	and $3,%eax

assuming something like REFFED/GET == 0x1 and PINNED/PIN == 0x2.

David
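
The equivalence David describes can be checked in userspace with a minimal
sketch. Everything prefixed sketch_/SKETCH_ is a hypothetical stand-in, not
kernel code, and the flag values are the assumed ones from the mail
(REFFED/GET == 0x1, PINNED/PIN == 0x2), not the kernel's actual definitions.
It only shows that the two-step translation and a single AND give the same
result; whether a given compiler actually emits one AND instruction is a
separate question that this does not demonstrate.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values from the mail, not the kernel's real definitions. */
enum {
	SKETCH_BIO_PAGE_REFFED = 0,	/* bit number, so mask 0x1 */
	SKETCH_BIO_PAGE_PINNED = 1,	/* bit number, so mask 0x2 */
};
#define SKETCH_FOLL_GET 0x1u
#define SKETCH_FOLL_PIN 0x2u

/* Hypothetical stand-in for struct bio; only the flags word is modelled. */
struct sketch_bio {
	unsigned int bi_flags;
};

/* Same shape as the patched bio_flagged(): no explicit "!= 0". */
static inline bool sketch_bio_flagged(const struct sketch_bio *bio,
				      unsigned int bit)
{
	return bio->bi_flags & (1U << bit);
}

/* The two-step translation quoted in the mail. */
static unsigned int gup_flags_two_step(const struct sketch_bio *bio)
{
	unsigned int gup_flags = 0;

	gup_flags |= sketch_bio_flagged(bio, SKETCH_BIO_PAGE_REFFED) ? SKETCH_FOLL_GET : 0;
	gup_flags |= sketch_bio_flagged(bio, SKETCH_BIO_PAGE_PINNED) ? SKETCH_FOLL_PIN : 0;
	return gup_flags;
}

/* What it should reduce to under the assumed values: one AND with 0x3. */
static unsigned int gup_flags_single_and(const struct sketch_bio *bio)
{
	return bio->bi_flags & (SKETCH_FOLL_GET | SKETCH_FOLL_PIN);
}

/* Compare both forms for every combination of the low three flag bits. */
static void sketch_gup_self_check(void)
{
	for (unsigned int flags = 0; flags < 8; flags++) {
		struct sketch_bio bio = { .bi_flags = flags };

		assert(gup_flags_two_step(&bio) == gup_flags_single_and(&bio));
	}
}
```

The third flag bit is included in the check to show that unrelated flags are
masked off by both forms.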
diff --git a/include/linux/bio.h b/include/linux/bio.h
index c1da63f6c808..10366b8bdb13 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -227,7 +227,7 @@ static inline void bio_cnt_set(struct bio *bio, unsigned int count)
 
 static inline bool bio_flagged(struct bio *bio, unsigned int bit)
 {
-	return (bio->bi_flags & (1U << bit)) != 0;
+	return bio->bi_flags & (1U << bit);
 }
 
 static inline void bio_set_flag(struct bio *bio, unsigned int bit)
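
The patch relies on the conversion to bool at the return statement: any
non-zero value becomes true, so dropping the "!= 0" does not change what
callers see. A minimal userspace sketch of that, assuming a 32-bit unsigned
int and using a hypothetical struct fake_bio in place of struct bio:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio; only the flags word matters here. */
struct fake_bio {
	unsigned int bi_flags;
};

/* Pre-patch form: explicit comparison against zero. */
static inline bool flagged_old(const struct fake_bio *bio, unsigned int bit)
{
	return (bio->bi_flags & (1U << bit)) != 0;
}

/* Post-patch form: the return-type conversion to bool does the same job. */
static inline bool flagged_new(const struct fake_bio *bio, unsigned int bit)
{
	return bio->bi_flags & (1U << bit);
}

/* Count disagreements between the two forms over a few flag patterns. */
static int count_mismatches(void)
{
	static const unsigned int patterns[] = {
		0x0u, 0x1u, 0x2u, 0x3u, 0x80000000u, 0xffffffffu
	};
	int mismatches = 0;

	for (size_t i = 0; i < sizeof(patterns) / sizeof(patterns[0]); i++) {
		struct fake_bio bio = { .bi_flags = patterns[i] };

		/* Assumes unsigned int has at least 32 bits. */
		for (unsigned int bit = 0; bit < 32; bit++) {
			if (flagged_old(&bio, bit) != flagged_new(&bio, bit))
				mismatches++;
		}
	}
	return mismatches;
}

/* Both forms must agree, and a high bit must still come back as true. */
static void fake_bio_self_check(void)
{
	struct fake_bio bio = { .bi_flags = 0x80000000u };

	assert(count_mismatches() == 0);
	assert(flagged_new(&bio, 31) == true);
}
```

This only demonstrates that the two forms return the same value; the
difference in generated code described in the commit message has to be
checked by inspecting the compiler output.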