diff mbox series

[24/27] block: implement bio helper to add iter kvec pages to bio

Message ID 20181130165646.27341-25-axboe@kernel.dk (mailing list archive)
State New, archived
Headers show
Series [01/27] aio: fix failure to put the file pointer | expand

Commit Message

Jens Axboe Nov. 30, 2018, 4:56 p.m. UTC
For an ITER_KVEC, we can just iterate the iov and add the pages
to the bio directly.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/bio.c         | 30 ++++++++++++++++++++++++++++++
 include/linux/bio.h |  1 +
 2 files changed, 31 insertions(+)

Comments

Al Viro Nov. 30, 2018, 7:21 p.m. UTC | #1
On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote:
> For an ITER_KVEC, we can just iterate the iov and add the pages
> to the bio directly.

> +		page = virt_to_page(kv->iov_base);
> +		size = bio_add_page(bio, page, kv->iov_len,
> +					offset_in_page(kv->iov_base));

Who said that you *can* do virt_to_page() on those?  E.g. vmalloc()'ed
addresses are fine for ITER_KVEC, etc.
Jens Axboe Nov. 30, 2018, 8:15 p.m. UTC | #2
On 11/30/18 12:21 PM, Al Viro wrote:
> On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote:
>> For an ITER_KVEC, we can just iterate the iov and add the pages
>> to the bio directly.
> 
>> +		page = virt_to_page(kv->iov_base);
>> +		size = bio_add_page(bio, page, kv->iov_len,
>> +					offset_in_page(kv->iov_base));
> 
> Who said that you *can* do virt_to_page() on those?  E.g. vmalloc()'ed
> addresses are fine for ITER_KVEC, etc.

Then how do you set up a kvec based iter with memory you can safely
DMA to/from?
Jens Axboe Nov. 30, 2018, 8:32 p.m. UTC | #3
On 11/30/18 1:15 PM, Jens Axboe wrote:
> On 11/30/18 12:21 PM, Al Viro wrote:
>> On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote:
>>> For an ITER_KVEC, we can just iterate the iov and add the pages
>>> to the bio directly.
>>
>>> +		page = virt_to_page(kv->iov_base);
>>> +		size = bio_add_page(bio, page, kv->iov_len,
>>> +					offset_in_page(kv->iov_base));
>>
>> Who said that you *can* do virt_to_page() on those?  E.g. vmalloc()'ed
>> addresses are fine for ITER_KVEC, etc.
> 
> Then how do you set up a kvec based iter with memory you can safely
> DMA to/from?

Would this make you happy:

if (!is_vmalloc_addr(kv->iov_base))
        page = virt_to_page(kv->iov_base);
else
        page = vmalloc_to_page(kv->iov_base);
Al Viro Nov. 30, 2018, 9:11 p.m. UTC | #4
On Fri, Nov 30, 2018 at 01:32:21PM -0700, Jens Axboe wrote:
> On 11/30/18 1:15 PM, Jens Axboe wrote:
> > On 11/30/18 12:21 PM, Al Viro wrote:
> >> On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote:
> >>> For an ITER_KVEC, we can just iterate the iov and add the pages
> >>> to the bio directly.
> >>
> >>> +		page = virt_to_page(kv->iov_base);
> >>> +		size = bio_add_page(bio, page, kv->iov_len,
> >>> +					offset_in_page(kv->iov_base));
> >>
> >> Who said that you *can* do virt_to_page() on those?  E.g. vmalloc()'ed
> >> addresses are fine for ITER_KVEC, etc.
> > 
> > Then how do you set up a kvec based iter with memory you can safely
> > DMA to/from?
> 
> Would this make you happy:
> 
> if (!is_vmalloc_addr(kv->iov_base))
>         page = virt_to_page(kv->iov_base);
> else
>         page = vmalloc_to_page(kv->iov_base);

Free advice: don't ever let Linus see anything along those lines.  Results
tend to be colourful...
Jens Axboe Nov. 30, 2018, 9:16 p.m. UTC | #5
On 11/30/18 2:11 PM, Al Viro wrote:
> On Fri, Nov 30, 2018 at 01:32:21PM -0700, Jens Axboe wrote:
>> On 11/30/18 1:15 PM, Jens Axboe wrote:
>>> On 11/30/18 12:21 PM, Al Viro wrote:
>>>> On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote:
>>>>> For an ITER_KVEC, we can just iterate the iov and add the pages
>>>>> to the bio directly.
>>>>
>>>>> +		page = virt_to_page(kv->iov_base);
>>>>> +		size = bio_add_page(bio, page, kv->iov_len,
>>>>> +					offset_in_page(kv->iov_base));
>>>>
>>>> Who said that you *can* do virt_to_page() on those?  E.g. vmalloc()'ed
>>>> addresses are fine for ITER_KVEC, etc.
>>>
>>> Then how do you set up a kvec based iter with memory you can safely
>>> DMA to/from?
>>
>> Would this make you happy:
>>
>> if (!is_vmalloc_addr(kv->iov_base))
>>         page = virt_to_page(kv->iov_base);
>> else
>>         page = vmalloc_to_page(kv->iov_base);
> 
> Free advice: don't ever let Linus see anything along those lines.  Results
> tend to be colourful...

We already have those lines in the kernel, XFS for instance. Al, could you
please try to be helpful instead of being deliberately obtuse?

Examples of being helpful:

1) Informing the sender of why something is a bad idea, instead of just
   saying it's a bad idea.

2) Making helpful suggestions to improve the current situation.
Al Viro Nov. 30, 2018, 9:25 p.m. UTC | #6
On Fri, Nov 30, 2018 at 02:16:38PM -0700, Jens Axboe wrote:
> >> Would this make you happy:
> >>
> >> if (!is_vmalloc_addr(kv->iov_base))
> >>         page = virt_to_page(kv->iov_base);
> >> else
> >>         page = vmalloc_to_page(kv->iov_base);
> > 
> > Free advice: don't ever let Linus see anything along those lines.  Results
> > tend to be colourful...
> 
> We already have those lines in the kernel, XFS for instance. Al, could you
> please try to be helpful instead of being deliberately obtuse?

Again, the last time something like that had been suggested, Linus had replied
with a very impressive rant.  I *did* propose pretty much that, and reaction
was basically "hell no, not in general-purpose primitives".  Precisely about
iov_iter stuff.  A part of that was due to touching page refcounts, but quite
a bit wasn't.
Jens Axboe Nov. 30, 2018, 9:34 p.m. UTC | #7
On 11/30/18 2:25 PM, Al Viro wrote:
> On Fri, Nov 30, 2018 at 02:16:38PM -0700, Jens Axboe wrote:
>>>> Would this make you happy:
>>>>
>>>> if (!is_vmalloc_addr(kv->iov_base))
>>>>         page = virt_to_page(kv->iov_base);
>>>> else
>>>>         page = vmalloc_to_page(kv->iov_base);
>>>
>>> Free advice: don't ever let Linus see anything along those lines.  Results
>>> tend to be colourful...
>>
>> We already have those lines in the kernel, XFS for instance. Al, could you
>> please try to be helpful instead of being deliberately obtuse?
> 
> Again, the last time something like that had been suggested, Linus had replied
> with a very impressive rant.  I *did* propose pretty much that, and reaction
> was basically "hell no, not in general-purpose primitives".  Precisely about
> iov_iter stuff.  A part of that was due to touching page refcounts, but quite
> a bit wasn't.

Nobody is touching the page count here, and for the aio user mapped IO,
nobody is touching them at the end either.

As far as I can tell, the above is fine. It's either a vmalloc'ed
address and should be treated specially, or we can do virt_to_page() on
it.

Do you have a link to said rant?
Jens Axboe Nov. 30, 2018, 10:06 p.m. UTC | #8
On 11/30/18 2:34 PM, Jens Axboe wrote:
> On 11/30/18 2:25 PM, Al Viro wrote:
>> On Fri, Nov 30, 2018 at 02:16:38PM -0700, Jens Axboe wrote:
>>>>> Would this make you happy:
>>>>>
>>>>> if (!is_vmalloc_addr(kv->iov_base))
>>>>>         page = virt_to_page(kv->iov_base);
>>>>> else
>>>>>         page = vmalloc_to_page(kv->iov_base);
>>>>
>>>> Free advice: don't ever let Linus see anything along those lines.  Results
>>>> tend to be colourful...
>>>
>>> We already have those lines in the kernel, XFS for instance. Al, could you
>>> please try to be helpful instead of being deliberately obtuse?
>>
>> Again, the last time something like that had been suggested, Linus had replied
>> with a very impressive rant.  I *did* propose pretty much that, and reaction
>> was basically "hell no, not in general-purpose primitives".  Precisely about
>> iov_iter stuff.  A part of that was due to touching page refcounts, but quite
>> a bit wasn't.
> 
> Nobody is touching the page count here, and for the aio user mapped IO,
> nobody is touching them at the end either.
> 
> As far as I can tell, the above is fine. It's either a vmalloc'ed
> address and should be treated specially, or we can do virt_to_page() on
> it.
> 
> Do you have a link to said rant?

I found the rant. I think the solution here is to switch it to using
ITER_BVEC instead. With ITER_KVEC, we don't necessarily know if we can
map it, for bvec we already have the pages. And from the aio point of
view, we know the pages are sane.
Christoph Hellwig Dec. 4, 2018, 2:55 p.m. UTC | #9
On Fri, Nov 30, 2018 at 07:21:02PM +0000, Al Viro wrote:
> On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote:
> > For an ITER_KVEC, we can just iterate the iov and add the pages
> > to the bio directly.
> 
> > +		page = virt_to_page(kv->iov_base);
> > +		size = bio_add_page(bio, page, kv->iov_len,
> > +					offset_in_page(kv->iov_base));
> 
> Who said that you *can* do virt_to_page() on those?  E.g. vmalloc()'ed
> addresses are fine for ITER_KVEC, etc.

In this particular case the caller it seems the callers knows what kind
of pages we have, but we need to properly document this at the very least.

Note that in the completely generic case iov_base could also point to
memory without a struct page at all (e.g. ioremap()ed memory).
Jens Axboe Dec. 4, 2018, 3:25 p.m. UTC | #10
On 12/4/18 7:55 AM, Christoph Hellwig wrote:
> On Fri, Nov 30, 2018 at 07:21:02PM +0000, Al Viro wrote:
>> On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote:
>>> For an ITER_KVEC, we can just iterate the iov and add the pages
>>> to the bio directly.
>>
>>> +		page = virt_to_page(kv->iov_base);
>>> +		size = bio_add_page(bio, page, kv->iov_len,
>>> +					offset_in_page(kv->iov_base));
>>
>> Who said that you *can* do virt_to_page() on those?  E.g. vmalloc()'ed
>> addresses are fine for ITER_KVEC, etc.
> 
> In this particular case the caller it seems the callers knows what kind
> of pages we have, but we need to properly document this at the very least.
> 
> Note that in the completely generic case iov_base could also point to
> memory without a struct page at all (e.g. ioremap()ed memory).

That's why I went to ITER_BVEC instead, that's much saner for this
use case.
diff mbox series

Patch

diff --git a/block/bio.c b/block/bio.c
index ab174bce5436..7e59ef547ed4 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -903,6 +903,36 @@  int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 }
 EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
 
+/**
+ * bio_iov_kvec_add_pages - add pages from an ITER_KVEC to a bio
+ * @bio: bio to add pages to
+ * @iter: iov iterator describing the region to be added
+ *
+ * Iterate pages in the @iter and add them to the bio. We flag the
+ * @bio with BIO_HOLD_PAGES, telling IO completion not to free them.
+ */
+int bio_iov_kvec_add_pages(struct bio *bio, struct iov_iter *iter)
+{
+	unsigned short orig_vcnt = bio->bi_vcnt;
+	const struct kvec *kv;
+
+	do {
+		struct page *page;
+		size_t size;
+
+		kv = iter->kvec + iter->iov_offset;
+		page = virt_to_page(kv->iov_base);
+		size = bio_add_page(bio, page, kv->iov_len,
+					offset_in_page(kv->iov_base));
+		if (size != kv->iov_len)
+			break;
+		iov_iter_advance(iter, size);
+	} while (iov_iter_count(iter) && !bio_full(bio));
+
+	bio_set_flag(bio, BIO_HOLD_PAGES);
+	return bio->bi_vcnt > orig_vcnt ? 0 : -EINVAL;
+}
+
 static void submit_bio_wait_endio(struct bio *bio)
 {
 	complete(bio->bi_private);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 056fb627edb3..23ae8fb66b1e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -434,6 +434,7 @@  bool __bio_try_merge_page(struct bio *bio, struct page *page,
 void __bio_add_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off);
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
+int bio_iov_kvec_add_pages(struct bio *bio, struct iov_iter *iter);
 struct rq_map_data;
 extern struct bio *bio_map_user_iov(struct request_queue *,
 				    struct iov_iter *, gfp_t);