Message ID | 20230519074047.1739879-30-dhowells@redhat.com (mailing list archive) |
---|---|
State | New |
Series | splice, block: Use page pinning and kill ITER_PIPE |
On Fri, May 19, 2023 at 08:40:44AM +0100, David Howells wrote:
> From: Christoph Hellwig <hch@lst.de>
>
> Replace BIO_NO_PAGE_REF with a BIO_PAGE_REFFED flag that has the inverted
> meaning is only set when a page reference has been acquired that needs to
> be released by bio_release_pages().

What was the motivation for this patch?

On Fri, May 19, 2023 at 09:26:07PM -0400, Kent Overstreet wrote:
> On Fri, May 19, 2023 at 08:40:44AM +0100, David Howells wrote:
> > From: Christoph Hellwig <hch@lst.de>
> >
> > Replace BIO_NO_PAGE_REF with a BIO_PAGE_REFFED flag that has the inverted
> > meaning is only set when a page reference has been acquired that needs to
> > be released by bio_release_pages().
>
> What was the motivation for this patch?

So that it is only set when we need to release a page, instead of telling
code not to release it when it otherwise would, where "otherwise would"
is implicit and undocumented and changes in this series.

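The polarity change described in this reply can be sketched outside the kernel. The block below is a minimal user-space model, not the kernel code: every `model_` name and the `pages_put` counter are hypothetical and exist only for illustration. It shows the positive-flag convention, where release work happens only if the code that took page references said so explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; these are not the kernel's definitions. */
enum model_bio_flag {
	MODEL_BIO_PAGE_REFFED,	/* page refs were taken and must be dropped */
	MODEL_BIO_CLONED,	/* present only to show the flag is one bit of many */
};

struct model_bio {
	unsigned int flags;
	int nr_pages;	/* pages attached to this model bio */
	int pages_put;	/* counts simulated page releases */
};

static void model_bio_set_flag(struct model_bio *bio, unsigned int bit)
{
	bio->flags |= 1U << bit;
}

static bool model_bio_flagged(const struct model_bio *bio, unsigned int bit)
{
	return (bio->flags & (1U << bit)) != 0;
}

/* Positive polarity: do nothing unless the flag was explicitly set. */
static void model_bio_release_pages(struct model_bio *bio)
{
	if (model_bio_flagged(bio, MODEL_BIO_PAGE_REFFED))
		bio->pages_put += bio->nr_pages;
}

/* Builds a model bio, optionally marks it as holding refs, releases it,
 * and returns how many simulated page releases happened. */
static int model_release_count(bool took_refs, int nr_pages)
{
	struct model_bio bio = { .flags = 0, .nr_pages = nr_pages, .pages_put = 0 };

	assert(nr_pages >= 0);
	if (took_refs)
		model_bio_set_flag(&bio, MODEL_BIO_PAGE_REFFED);
	model_bio_release_pages(&bio);
	return bio.pages_put;
}
```

In this model a freshly initialised bio releases nothing by default, so a path that never takes references does not have to remember to opt out.
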
On Fri, May 19, 2023 at 08:56:56PM -0700, Christoph Hellwig wrote:
> On Fri, May 19, 2023 at 09:26:07PM -0400, Kent Overstreet wrote:
> > On Fri, May 19, 2023 at 08:40:44AM +0100, David Howells wrote:
> > > From: Christoph Hellwig <hch@lst.de>
> > >
> > > Replace BIO_NO_PAGE_REF with a BIO_PAGE_REFFED flag that has the inverted
> > > meaning is only set when a page reference has been acquired that needs to
> > > be released by bio_release_pages().
> >
> > What was the motivation for this patch?
>
> So that is only is set when we need to release a page, instead telling
> code to not release it when it otherwise would, where otherwise would
> is implicit and undocumented and changes in this series.

I suppose this way setting it can be done in bio_iov_iter_get_pages() -
ok yeah, that makes sense.

But it seems like it should be set in bio_iov_iter_get_pages() though,
and I'm not seeing that?

On Sat, May 20, 2023 at 12:13:49AM -0400, Kent Overstreet wrote:
> I suppose this way setting it can be done in bio_iov_iter_get_pages() -
> ok yeah, that makes sense.
>
> But it seems like it should be set in bio_iov_iter_get_pages() though,
> and I'm not seeing that?

It is set in bio_iov_iter_get_pages() in this patch. The latter gets
replaced with the pinned flag when bio_iov_iter_get_pages() is
changed to pin pages instead.

On Fri, May 19, 2023 at 09:17:43PM -0700, Christoph Hellwig wrote:
> On Sat, May 20, 2023 at 12:13:49AM -0400, Kent Overstreet wrote:
> > I suppose this way setting it can be done in bio_iov_iter_get_pages() -
> > ok yeah, that makes sense.
> >
> > But it seems like it should be set in bio_iov_iter_get_pages() though,
> > and I'm not seeing that?
>
> It is set in bio_iov_iter_get_pages in this patch. The later gets
> replaced with the pinned flag when we bio_iov_iter_get_pages is
> changed to pin pages instead.

Whoops, missed it.

Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet <kent.overstreet@linux.dev> wrote:
> > Replace BIO_NO_PAGE_REF with a BIO_PAGE_REFFED flag that has the inverted
> > meaning is only set when a page reference has been acquired that needs to
> > be released by bio_release_pages().
>
> What was the motivation for this patch?

We need to move to using FOLL_PIN for buffers derived from direct I/O to avoid
the fork vs async-DIO race. Further, we shouldn't be taking a ref or a pin on
pages derived from internal kernel iterators such as KVEC or BVEC as the page
refcount might not be a valid way to control the lifetime of the data/buffers
in those pages (slab, for instance). Rather, for internal kernel I/O, we need
to rely on the caller to hold onto the memory until we tell them we've
finished.

So we flip the polarity of the page-is-ref'd flag and then add a
page-is-pinned flag. The intention is to ultimately drop the page-is-ref'd
flag - but we still need to keep the page-is-pinned flag. This makes it easier
to take a stepwise approach - and having both flags working the same way makes
the logic easier to follow.

See iov_iter_extract_pages() and iov_iter_extract_will_pin().

David

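The end state described above, with two positive flags and no cleanup for kernel-internal iterators, can be illustrated with a small user-space model. This is a sketch under stated assumptions and not the kernel implementation: all `model_` names are hypothetical, the iterator types are reduced to four, and treating the two flags as mutually exclusive is an assumption of the sketch rather than something the thread states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for iterator kinds; not the kernel's enum. */
enum model_iter_type {
	MODEL_ITER_UBUF,	/* single user-space buffer */
	MODEL_ITER_IOVEC,	/* array of user-space buffers */
	MODEL_ITER_KVEC,	/* kernel virtual addresses */
	MODEL_ITER_BVEC,	/* kernel-supplied page vectors */
};

enum model_cleanup {
	MODEL_CLEANUP_NONE,	/* caller owns the memory; nothing to drop */
	MODEL_CLEANUP_PUT,	/* drop a plain page reference */
	MODEL_CLEANUP_UNPIN,	/* drop a pin */
};

struct model_bio_flags {
	bool page_reffed;	/* models the page-is-ref'd flag */
	bool page_pinned;	/* models the page-is-pinned flag */
};

/* Only user-backed iterators get pinned in this model. */
static bool model_extract_will_pin(enum model_iter_type type)
{
	return type == MODEL_ITER_UBUF || type == MODEL_ITER_IOVEC;
}

/* Flags a model bio would carry after extracting pages from the iterator. */
static struct model_bio_flags model_flags_after_extract(enum model_iter_type type)
{
	struct model_bio_flags f = { .page_reffed = false, .page_pinned = false };

	if (model_extract_will_pin(type))
		f.page_pinned = true;
	return f;
}

/* Both flags have the same polarity: cleanup happens only if one is set. */
static enum model_cleanup model_cleanup_for(struct model_bio_flags f)
{
	/* Sketch assumption: a bio never carries both flags at once. */
	assert(!(f.page_reffed && f.page_pinned));

	if (f.page_pinned)
		return MODEL_CLEANUP_UNPIN;
	if (f.page_reffed)
		return MODEL_CLEANUP_PUT;
	return MODEL_CLEANUP_NONE;
}
```

With both flags meaning "something was acquired", a kernel-internal iterator falls through to no cleanup without any special case, which is the property the stepwise conversion relies on.
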
diff --git a/block/bio.c b/block/bio.c
index 043944fd46eb..8516adeaea26 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1191,7 +1191,6 @@ void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
 	bio->bi_io_vec = (struct bio_vec *)iter->bvec;
 	bio->bi_iter.bi_bvec_done = iter->iov_offset;
 	bio->bi_iter.bi_size = size;
-	bio_set_flag(bio, BIO_NO_PAGE_REF);
 	bio_set_flag(bio, BIO_CLONED);
 }
 
@@ -1336,6 +1335,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		return 0;
 	}
 
+	bio_set_flag(bio, BIO_PAGE_REFFED);
 	do {
 		ret = __bio_iov_iter_get_pages(bio, iter);
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
diff --git a/block/blk-map.c b/block/blk-map.c
index 04c55f1c492e..33d9f6e89ba6 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -282,6 +282,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 	if (blk_queue_pci_p2pdma(rq->q))
 		extraction_flags |= ITER_ALLOW_P2PDMA;
 
+	bio_set_flag(bio, BIO_PAGE_REFFED);
 	while (iov_iter_count(iter)) {
 		struct page **pages, *stack_pages[UIO_FASTIOV];
 		ssize_t bytes;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 0b380bb8a81e..ad20f3428bab 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -402,6 +402,8 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 		bio->bi_end_io = dio_bio_end_aio;
 	else
 		bio->bi_end_io = dio_bio_end_io;
+	/* for now require references for all pages */
+	bio_set_flag(bio, BIO_PAGE_REFFED);
 	sdio->bio = bio;
 	sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
 }
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 66a9f10e3207..08873f0627dd 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -203,7 +203,6 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
-	bio_set_flag(bio, BIO_NO_PAGE_REF);
 	__bio_add_page(bio, page, len, 0);
 	iomap_dio_submit_bio(iter, dio, bio, pos);
 }
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7f53be035cf0..0922729acd26 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -488,7 +488,7 @@ void zero_fill_bio(struct bio *bio);
 
 static inline void bio_release_pages(struct bio *bio, bool mark_dirty)
 {
-	if (!bio_flagged(bio, BIO_NO_PAGE_REF))
+	if (bio_flagged(bio, BIO_PAGE_REFFED))
 		__bio_release_pages(bio, mark_dirty);
 }
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 740afe80f297..dfd2c2cb909d 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -323,7 +323,7 @@ struct bio {
  * bio flags
  */
 enum {
-	BIO_NO_PAGE_REF,	/* don't put release vec pages */
+	BIO_PAGE_REFFED,	/* put pages in bio_release_pages() */
 	BIO_CLONED,		/* doesn't own data */
 	BIO_BOUNCED,		/* bio is a bounce bio */
 	BIO_QUIET,		/* Make BIO Quiet */