Message ID | 20181130165646.27341-25-axboe@kernel.dk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [01/27] aio: fix failure to put the file pointer | expand |
On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote: > For an ITER_KVEC, we can just iterate the iov and add the pages > to the bio directly. > + page = virt_to_page(kv->iov_base); > + size = bio_add_page(bio, page, kv->iov_len, > + offset_in_page(kv->iov_base)); Who said that you *can* do virt_to_page() on those? E.g. vmalloc()'ed addresses are fine for ITER_KVEC, etc.
On 11/30/18 12:21 PM, Al Viro wrote: > On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote: >> For an ITER_KVEC, we can just iterate the iov and add the pages >> to the bio directly. > >> + page = virt_to_page(kv->iov_base); >> + size = bio_add_page(bio, page, kv->iov_len, >> + offset_in_page(kv->iov_base)); > > Who said that you *can* do virt_to_page() on those? E.g. vmalloc()'ed > addresses are fine for ITER_KVEC, etc. Then how do you set up a kvec based iter with memory you can safely DMA to/from?
On 11/30/18 1:15 PM, Jens Axboe wrote: > On 11/30/18 12:21 PM, Al Viro wrote: >> On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote: >>> For an ITER_KVEC, we can just iterate the iov and add the pages >>> to the bio directly. >> >>> + page = virt_to_page(kv->iov_base); >>> + size = bio_add_page(bio, page, kv->iov_len, >>> + offset_in_page(kv->iov_base)); >> >> Who said that you *can* do virt_to_page() on those? E.g. vmalloc()'ed >> addresses are fine for ITER_KVEC, etc. > > Then how do you set up a kvec based iter with memory you can safely > DMA to/from? Would this make you happy: if (!is_vmalloc_addr(kv->iov_base)) page = virt_to_page(kv->iov_base); else page = vmalloc_to_page(kv->iov_base);
On Fri, Nov 30, 2018 at 01:32:21PM -0700, Jens Axboe wrote: > On 11/30/18 1:15 PM, Jens Axboe wrote: > > On 11/30/18 12:21 PM, Al Viro wrote: > >> On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote: > >>> For an ITER_KVEC, we can just iterate the iov and add the pages > >>> to the bio directly. > >> > >>> + page = virt_to_page(kv->iov_base); > >>> + size = bio_add_page(bio, page, kv->iov_len, > >>> + offset_in_page(kv->iov_base)); > >> > >> Who said that you *can* do virt_to_page() on those? E.g. vmalloc()'ed > >> addresses are fine for ITER_KVEC, etc. > > > > Then how do you set up a kvec based iter with memory you can safely > > DMA to/from? > > Would this make you happy: > > if (!is_vmalloc_addr(kv->iov_base)) > page = virt_to_page(kv->iov_base); > else > page = vmalloc_to_page(kv->iov_base); Free advice: don't ever let Linus see anything along those lines. Results tend to be colourful...
On 11/30/18 2:11 PM, Al Viro wrote: > On Fri, Nov 30, 2018 at 01:32:21PM -0700, Jens Axboe wrote: >> On 11/30/18 1:15 PM, Jens Axboe wrote: >>> On 11/30/18 12:21 PM, Al Viro wrote: >>>> On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote: >>>>> For an ITER_KVEC, we can just iterate the iov and add the pages >>>>> to the bio directly. >>>> >>>>> + page = virt_to_page(kv->iov_base); >>>>> + size = bio_add_page(bio, page, kv->iov_len, >>>>> + offset_in_page(kv->iov_base)); >>>> >>>> Who said that you *can* do virt_to_page() on those? E.g. vmalloc()'ed >>>> addresses are fine for ITER_KVEC, etc. >>> >>> Then how do you set up a kvec based iter with memory you can safely >>> DMA to/from? >> >> Would this make you happy: >> >> if (!is_vmalloc_addr(kv->iov_base)) >> page = virt_to_page(kv->iov_base); >> else >> page = vmalloc_to_page(kv->iov_base); > > Free advice: don't ever let Linus see anything along those lines. Results > tend to be colourful... We already have those lines in the kernel, XFS for instance. Al, could you please try to be helpful instead of being deliberately obtuse? Examples of being helpful: 1) Informing the sender of why something is a bad idea, instead of just saying it's a bad idea. 2) Making helpful suggestions to improve the current situation.
On Fri, Nov 30, 2018 at 02:16:38PM -0700, Jens Axboe wrote: > >> Would this make you happy: > >> > >> if (!is_vmalloc_addr(kv->iov_base)) > >> page = virt_to_page(kv->iov_base); > >> else > >> page = vmalloc_to_page(kv->iov_base); > > > > Free advice: don't ever let Linus see anything along those lines. Results > > tend to be colourful... > > We already have those lines in the kernel, XFS for instance. Al, could you > please try to be helpful instead of being deliberately obtuse? Again, the last time something like that had been suggested, Linus had replied with a very impressive rant. I *did* propose pretty much that, and reaction was basically "hell no, not in general-purpose primitives". Precisely about iov_iter stuff. A part of that was due to touching page refcounts, but quite a bit wasn't.
On 11/30/18 2:25 PM, Al Viro wrote: > On Fri, Nov 30, 2018 at 02:16:38PM -0700, Jens Axboe wrote: >>>> Would this make you happy: >>>> >>>> if (!is_vmalloc_addr(kv->iov_base)) >>>> page = virt_to_page(kv->iov_base); >>>> else >>>> page = vmalloc_to_page(kv->iov_base); >>> >>> Free advice: don't ever let Linus see anything along those lines. Results >>> tend to be colourful... >> >> We already have those lines in the kernel, XFS for instance. Al, could you >> please try to be helpful instead of being deliberately obtuse? > > Again, the last time something like that had been suggested, Linus had replied > with a very impressive rant. I *did* propose pretty much that, and reaction > was basically "hell no, not in general-purpose primitives". Precisely about > iov_iter stuff. A part of that was due to touching page refcounts, but quite > a bit wasn't. Nobody is touching the page count here, and for the aio user mapped IO, nobody is touching them at the end either. As far as I can tell, the above is fine. It's either a vmalloc'ed address and should be treated specially, or we can do virt_to_page() on it. Do you have a link to said rant?
On 11/30/18 2:34 PM, Jens Axboe wrote: > On 11/30/18 2:25 PM, Al Viro wrote: >> On Fri, Nov 30, 2018 at 02:16:38PM -0700, Jens Axboe wrote: >>>>> Would this make you happy: >>>>> >>>>> if (!is_vmalloc_addr(kv->iov_base)) >>>>> page = virt_to_page(kv->iov_base); >>>>> else >>>>> page = vmalloc_to_page(kv->iov_base); >>>> >>>> Free advice: don't ever let Linus see anything along those lines. Results >>>> tend to be colourful... >>> >>> We already have those lines in the kernel, XFS for instance. Al, could you >>> please try to be helpful instead of being deliberately obtuse? >> >> Again, the last time something like that had been suggested, Linus had replied >> with a very impressive rant. I *did* propose pretty much that, and reaction >> was basically "hell no, not in general-purpose primitives". Precisely about >> iov_iter stuff. A part of that was due to touching page refcounts, but quite >> a bit wasn't. > > Nobody is touching the page count here, and for the aio user mapped IO, > nobody is touching them at the end either. > > As far as I can tell, the above is fine. It's either a vmalloc'ed > address and should be treated specially, or we can do virt_to_page() on > it. > > Do you have a link to said rant? I found the rant. I think the solution here is to switch it to using ITER_BVEC instead. With ITER_KVEC, we don't necessarily know if we can map it, for bvec we already have the pages. And from the aio point of view, we know the pages are sane.
On Fri, Nov 30, 2018 at 07:21:02PM +0000, Al Viro wrote: > On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote: > > For an ITER_KVEC, we can just iterate the iov and add the pages > > to the bio directly. > > > + page = virt_to_page(kv->iov_base); > > + size = bio_add_page(bio, page, kv->iov_len, > > + offset_in_page(kv->iov_base)); > > Who said that you *can* do virt_to_page() on those? E.g. vmalloc()'ed > addresses are fine for ITER_KVEC, etc. In this particular case the caller it seems the callers knows what kind of pages we have, but we need to properly document this at the very least. Note that in the completely generic case iov_base could also point to memory without a struct page at all (e.g. ioremap()ed memory).
On 12/4/18 7:55 AM, Christoph Hellwig wrote: > On Fri, Nov 30, 2018 at 07:21:02PM +0000, Al Viro wrote: >> On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote: >>> For an ITER_KVEC, we can just iterate the iov and add the pages >>> to the bio directly. >> >>> + page = virt_to_page(kv->iov_base); >>> + size = bio_add_page(bio, page, kv->iov_len, >>> + offset_in_page(kv->iov_base)); >> >> Who said that you *can* do virt_to_page() on those? E.g. vmalloc()'ed >> addresses are fine for ITER_KVEC, etc. > > In this particular case the caller it seems the callers knows what kind > of pages we have, but we need to properly document this at the very least. > > Note that in the completely generic case iov_base could also point to > memory without a struct page at all (e.g. ioremap()ed memory). That's why I went to ITER_BVEC instead, that's much saner for this use case.
diff --git a/block/bio.c b/block/bio.c index ab174bce5436..7e59ef547ed4 100644 --- a/block/bio.c +++ b/block/bio.c @@ -903,6 +903,36 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) } EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages); +/** + * bio_iov_kvec_add_pages - add pages from an ITER_KVEC to a bio + * @bio: bio to add pages to + * @iter: iov iterator describing the region to be added + * + * Iterate pages in the @iter and add them to the bio. We flag the + * @bio with BIO_HOLD_PAGES, telling IO completion not to free them. + */ +int bio_iov_kvec_add_pages(struct bio *bio, struct iov_iter *iter) +{ + unsigned short orig_vcnt = bio->bi_vcnt; + const struct kvec *kv; + + do { + struct page *page; + size_t size; + + kv = iter->kvec + iter->iov_offset; + page = virt_to_page(kv->iov_base); + size = bio_add_page(bio, page, kv->iov_len, + offset_in_page(kv->iov_base)); + if (size != kv->iov_len) + break; + iov_iter_advance(iter, size); + } while (iov_iter_count(iter) && !bio_full(bio)); + + bio_set_flag(bio, BIO_HOLD_PAGES); + return bio->bi_vcnt > orig_vcnt ? 0 : -EINVAL; +} + static void submit_bio_wait_endio(struct bio *bio) { complete(bio->bi_private); diff --git a/include/linux/bio.h b/include/linux/bio.h index 056fb627edb3..23ae8fb66b1e 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -434,6 +434,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page, void __bio_add_page(struct bio *bio, struct page *page, unsigned int len, unsigned int off); int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter); +int bio_iov_kvec_add_pages(struct bio *bio, struct iov_iter *iter); struct rq_map_data; extern struct bio *bio_map_user_iov(struct request_queue *, struct iov_iter *, gfp_t);
For an ITER_KVEC, we can just iterate the iov and add the pages to the bio directly. Signed-off-by: Jens Axboe <axboe@kernel.dk> --- block/bio.c | 30 ++++++++++++++++++++++++++++++ include/linux/bio.h | 1 + 2 files changed, 31 insertions(+)