Message ID | 20230328173613.555192-6-axboe@kernel.dk (mailing list archive) |
---|---|
State | Mainlined, archived |
Headers | show |
Series | Turn single segment imports into ITER_UBUF | expand |
On Tue, Mar 28, 2023 at 10:36 AM Jens Axboe <axboe@kernel.dk> wrote: > > Don't assume that a user backed iterator is always of the type > ITER_IOVEC. Handle the single segment case separately, then we can > use the same logic for ITER_UBUF and ITER_IOVEC. Ugh. This is ugly. Yes,. the original code is ugly too, but this makes it worse. You have that helper for "give me the number of iovecs" and that just works automatically with the ITER_UBUF case. But this code (and the sound driver code in the previous patch), really lso wants a helper to just return the 'iov' array. And I think you should just do exactly that. The problem with 'iov_iter_iovec()' is that it doesn't return the array, it just returns the first entry, so it's unusable for this case, and then you have all these special "do something else for the single-entry situation" cases. And iov_iter_iovec() actually tries to be nice and clever and add the iov_offset, so that you can actually do the proper iov_iter_advance() on it etc, but again, this is not what any of this code wants, it just wants the raw iov array, and the base will always be zero, because this code just doesn't *work* on the iter level, and never advances the iterator, it just advances the array index. And the thing is, I think you could easily just add a const struct iovec *iov_iter_iovec_array(iter); helper that just always returns a valid array of iov's. For a ITER_IOV, it would just return the raw iov pointer. And for a ITER_UBUF, we could either (a) just always pass in a single-entry auto iov that gets filled in and the pointer to it returned (b) be *really* clever (or ugly, depending on how you want to see it), and do something like this: --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -49,14 +49,23 @@ struct iov_iter { size_t iov_offset; int last_offset; }; - size_t count; - union { - const struct iovec *iov; - const struct kvec *kvec; - const struct bio_vec *bvec; - struct xarray *xarray; - struct pipe_inode_info *pipe; - void __user *ubuf; + + /* + * This has the same layout as 'struct iovec'! + * In particular, the ITER_UBUF form can create + * a single-entry 'struct iovec' by casting the + * address of the 'ubuf' member to that. + */ + struct { + union { + const struct iovec *iov; + const struct kvec *kvec; + const struct bio_vec *bvec; + struct xarray *xarray; + struct pipe_inode_info *pipe; + void __user *ubuf; + }; + size_t count; }; union { unsigned long nr_segs; and if you accept the above, then you can do #define iter_ubuf_to_iov(iter) ((const struct iovec *)&(iter)->ubuf) which I will admit is not *pretty*, but it's kind of clever, I think. So now you can trivially turn a user-backed iov_iter into the related 'struct iovec *' by just doing #define iov_iter_iovec_array(iter) \ ((iter)->type == ITER_UBUF ? iter_ubuf_to_iov(iter) : (iter)->iov) or something like that. And no, the above is NOT AT ALL TESTED. Caveat emptor. And if you go blind from looking at that patch, I will not accept responsibility. Linus
On Tue, Mar 28, 2023 at 11:43:34AM -0700, Linus Torvalds wrote: > - size_t count; > - union { > - const struct iovec *iov; > - const struct kvec *kvec; > - const struct bio_vec *bvec; > - struct xarray *xarray; > - struct pipe_inode_info *pipe; > - void __user *ubuf; > + > + /* > + * This has the same layout as 'struct iovec'! > + * In particular, the ITER_UBUF form can create > + * a single-entry 'struct iovec' by casting the > + * address of the 'ubuf' member to that. > + */ > + struct { > + union { > + const struct iovec *iov; > + const struct kvec *kvec; > + const struct bio_vec *bvec; > + struct xarray *xarray; > + struct pipe_inode_info *pipe; > + void __user *ubuf; > + }; > + size_t count; > }; > union { > unsigned long nr_segs; > > and if you accept the above, then you can do > > #define iter_ubuf_to_iov(iter) ((const struct iovec *)&(iter)->ubuf) > > which I will admit is not *pretty*, but it's kind of clever, I think. I think it'll annoy gcc, and particularly the randstruct plugin. How about: union { struct iovec ubuf; struct { const struct iovec *iov; size_t count; /* Also valid for subsequent types */ }; const struct kvec *kvec; const struct bio_vec *bvec; struct xarray *xarray; struct pipe_inode_info *pipe; }
On Tue, Mar 28, 2023 at 11:55 AM Matthew Wilcox <willy@infradead.org> wrote: > > I think it'll annoy gcc, and particularly the randstruct plugin. No, randstruct doesn't go change any normal data structure layout on its own. You have to actively mark things for randstruct, or they have to be pure function pointer ones. But it's not like adding a 'struct iovec' explicitly to the members just as extra "code documentation" would be wrong. I don't think it really helps, though, since you have to have that other explicit structure there anyway to get the member names right. So I don't hate your version, but I don't think it really helps either. Linus
On Tue, Mar 28, 2023 at 12:05 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > But it's not like adding a 'struct iovec' explicitly to the members > just as extra "code documentation" would be wrong. > > I don't think it really helps, though, since you have to have that > other explicit structure there anyway to get the member names right. Actually, thinking a bit more about it, adding a const struct iovec xyzzy; member might be a good idea just to avoid a cast. Then that iter_ubuf_to_iov() macro becomes just #define iter_ubuf_to_iov(iter) (&(iter)->xyzzy) and that looks much nicer (plus still acts kind of as a "code comment" to clarify things). Linus
On 3/28/23 12:43?PM, Linus Torvalds wrote: > On Tue, Mar 28, 2023 at 10:36?AM Jens Axboe <axboe@kernel.dk> wrote: >> >> Don't assume that a user backed iterator is always of the type >> ITER_IOVEC. Handle the single segment case separately, then we can >> use the same logic for ITER_UBUF and ITER_IOVEC. > > Ugh. This is ugly. > > Yes,. the original code is ugly too, but this makes it worse. Hah I know, I did feel dirty writing that patch... The existing code is pretty ugly as-is, but it sure didn't get better. > You have that helper for "give me the number of iovecs" and that just > works automatically with the ITER_UBUF case. But this code (and the > sound driver code in the previous patch), really lso wants a helper to > just return the 'iov' array. > > And I think you should just do exactly that. The problem with > 'iov_iter_iovec()' is that it doesn't return the array, it just > returns the first entry, so it's unusable for this case, and then you > have all these special "do something else for the single-entry > situation" cases. > > And iov_iter_iovec() actually tries to be nice and clever and add the > iov_offset, so that you can actually do the proper iov_iter_advance() > on it etc, but again, this is not what any of this code wants, it just > wants the raw iov array, and the base will always be zero, because > this code just doesn't *work* on the iter level, and never advances > the iterator, it just advances the array index. > > And the thing is, I think you could easily just add a > > const struct iovec *iov_iter_iovec_array(iter); > > helper that just always returns a valid array of iov's. > > For a ITER_IOV, it would just return the raw iov pointer. > > And for a ITER_UBUF, we could either > > (a) just always pass in a single-entry auto iov that gets filled in > and the pointer to it returned > > (b) be *really* clever (or ugly, depending on how you want to see > it), and do something like this: > > --- a/include/linux/uio.h > +++ b/include/linux/uio.h > @@ -49,14 +49,23 @@ struct iov_iter { > size_t iov_offset; > int last_offset; > }; > - size_t count; > - union { > - const struct iovec *iov; > - const struct kvec *kvec; > - const struct bio_vec *bvec; > - struct xarray *xarray; > - struct pipe_inode_info *pipe; > - void __user *ubuf; > + > + /* > + * This has the same layout as 'struct iovec'! > + * In particular, the ITER_UBUF form can create > + * a single-entry 'struct iovec' by casting the > + * address of the 'ubuf' member to that. > + */ > + struct { > + union { > + const struct iovec *iov; > + const struct kvec *kvec; > + const struct bio_vec *bvec; > + struct xarray *xarray; > + struct pipe_inode_info *pipe; > + void __user *ubuf; > + }; > + size_t count; > }; > union { > unsigned long nr_segs; > > and if you accept the above, then you can do > > #define iter_ubuf_to_iov(iter) ((const struct iovec *)&(iter)->ubuf) > > which I will admit is not *pretty*, but it's kind of clever, I think. > > So now you can trivially turn a user-backed iov_iter into the related > 'struct iovec *' by just doing > > #define iov_iter_iovec_array(iter) \ > ((iter)->type == ITER_UBUF ? iter_ubuf_to_iov(iter) : (iter)->iov) > > or something like that. > > And no, the above is NOT AT ALL TESTED. Caveat emptor. > > And if you go blind from looking at that patch, I will not accept > responsibility. I pondered something like that too, but balked at adding to iov_iter and then didn't pursue that any further. But bundled nicely, it should work out quite fine in the union. So I like the suggestion, and then just return a pointer to the vec rather than the copy, unifying the two cases. Thanks for the review and suggestion, I'll make that change.
On Tue, Mar 28, 2023 at 11:43:34AM -0700, Linus Torvalds wrote: > And if you go blind from looking at that patch, I will not accept > responsibility. ... and all that, er, cleverness - for the sake of a single piece of shit driver for piece of shit hardware *and* equally beautiful ABI. Is it really worth bothering with? And if anyone has doubts about the inturdprize kwality of the ABI in question, I suggest taking a look at hfi1_user_sdma_process_request() - that's where the horrors are.
On 3/28/23 2:38 PM, Al Viro wrote: > On Tue, Mar 28, 2023 at 11:43:34AM -0700, Linus Torvalds wrote: > >> And if you go blind from looking at that patch, I will not accept >> responsibility. > > ... and all that, er, cleverness - for the sake of a single piece of shit > driver for piece of shit hardware *and* equally beautiful ABI. > > Is it really worth bothering with? And if anyone has doubts about the > inturdprize kwality of the ABI in question, I suggest taking a look at > hfi1_user_sdma_process_request() - that's where the horrors are. It is horrible code, I only go as far as that very function before deciding that I wasn't going to be touching it as part of this. I do like the cleverness of the overlay, the only practical downside I see are things that _assign_ to eg iter->iovec after setting it up. That would lead to some, ehm, interesting side effects.
On 3/28/23 1:16?PM, Linus Torvalds wrote: > On Tue, Mar 28, 2023 at 12:05?PM Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> >> But it's not like adding a 'struct iovec' explicitly to the members >> just as extra "code documentation" would be wrong. >> >> I don't think it really helps, though, since you have to have that >> other explicit structure there anyway to get the member names right. > > Actually, thinking a bit more about it, adding a > > const struct iovec xyzzy; > > member might be a good idea just to avoid a cast. Then that > iter_ubuf_to_iov() macro becomes just > > #define iter_ubuf_to_iov(iter) (&(iter)->xyzzy) > > and that looks much nicer (plus still acts kind of as a "code comment" > to clarify things). I went down this path, and it _mostly_ worked out. You can view the series here, I'll send it out when I've actually tested it: https://git.kernel.dk/cgit/linux-block/log/?h=iter-ubuf A few mental notes I made along the way: - The IB/sound changes are now just replacing an inappropriate iter_is_iovec() with iter->user_backed. That's nice and simple. - The iov_iter_iovec() case becomes a bit simpler. Or so I thought, because we still need to add in the offset so we can't just use out embedded iovec for that. The above branch is just using the iovec, but I don't think this is right. - Looks like it exposed a block bug, where the copy in bio_alloc_map_data() was obvious garbage but happened to work before. I'm still inclined to favor this approach over the previous, even if the IB driver is a pile of garbage and lighting it a bit more on fire would not really hurt. Opinions? Or do you want me to just send it out for easier reading
On 3/28/23 3:21?PM, Jens Axboe wrote: > On 3/28/23 1:16?PM, Linus Torvalds wrote: >> On Tue, Mar 28, 2023 at 12:05?PM Linus Torvalds >> <torvalds@linux-foundation.org> wrote: >>> >>> But it's not like adding a 'struct iovec' explicitly to the members >>> just as extra "code documentation" would be wrong. >>> >>> I don't think it really helps, though, since you have to have that >>> other explicit structure there anyway to get the member names right. >> >> Actually, thinking a bit more about it, adding a >> >> const struct iovec xyzzy; >> >> member might be a good idea just to avoid a cast. Then that >> iter_ubuf_to_iov() macro becomes just >> >> #define iter_ubuf_to_iov(iter) (&(iter)->xyzzy) >> >> and that looks much nicer (plus still acts kind of as a "code comment" >> to clarify things). > > I went down this path, and it _mostly_ worked out. You can view the > series here, I'll send it out when I've actually tested it: > > https://git.kernel.dk/cgit/linux-block/log/?h=iter-ubuf > > A few mental notes I made along the way: > > - The IB/sound changes are now just replacing an inappropriate > iter_is_iovec() with iter->user_backed. That's nice and simple. > > - The iov_iter_iovec() case becomes a bit simpler. Or so I thought, > because we still need to add in the offset so we can't just use > out embedded iovec for that. The above branch is just using the > iovec, but I don't think this is right. > > - Looks like it exposed a block bug, where the copy in > bio_alloc_map_data() was obvious garbage but happened to work > before. > > I'm still inclined to favor this approach over the previous, even if the > IB driver is a pile of garbage and lighting it a bit more on fire would > not really hurt. > > Opinions? Or do you want me to just send it out for easier reading While cleaning up that stuff, we only have a few users of iov_iter_iovec(). Why don't we just kill them off and the helper too? That drops that part of it and it kind of works out nicely beyond that. diff --git a/fs/read_write.c b/fs/read_write.c index 7a2ff6157eda..fb932d0997d4 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -749,15 +749,15 @@ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter, return -EOPNOTSUPP; while (iov_iter_count(iter)) { - struct iovec iovec = iov_iter_iovec(iter); + const struct iovec *iov = iter->iov; ssize_t nr; if (type == READ) { - nr = filp->f_op->read(filp, iovec.iov_base, - iovec.iov_len, ppos); + nr = filp->f_op->read(filp, iov->iov_base, + iov->iov_len, ppos); } else { - nr = filp->f_op->write(filp, iovec.iov_base, - iovec.iov_len, ppos); + nr = filp->f_op->write(filp, iov->iov_base, + iov->iov_len, ppos); } if (nr < 0) { @@ -766,7 +766,7 @@ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter, break; } ret += nr; - if (nr != iovec.iov_len) + if (nr != iov->iov_len) break; iov_iter_advance(iter, nr); } diff --git a/io_uring/rw.c b/io_uring/rw.c index 4c233910e200..585461a6f6a0 100644 --- a/io_uring/rw.c +++ b/io_uring/rw.c @@ -454,7 +454,8 @@ static ssize_t loop_rw_iter(int ddir, struct io_rw *rw, struct iov_iter *iter) iovec.iov_base = iter->ubuf + iter->iov_offset; iovec.iov_len = iov_iter_count(iter); } else if (!iov_iter_is_bvec(iter)) { - iovec = iov_iter_iovec(iter); + iovec.iov_base = iter->iov->iov_base; + iovec.iov_len = iter->iov->iov_len; } else { iovec.iov_base = u64_to_user_ptr(rw->addr); iovec.iov_len = rw->len; diff --git a/mm/madvise.c b/mm/madvise.c index 340125d08c03..0701a3bd530b 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1456,7 +1456,8 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, size_t, vlen, int, behavior, unsigned int, flags) { ssize_t ret; - struct iovec iovstack[UIO_FASTIOV], iovec; + struct iovec iovstack[UIO_FASTIOV]; + const struct iovec *iovec; struct iovec *iov = iovstack; struct iov_iter iter; struct task_struct *task; @@ -1503,12 +1504,12 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, total_len = iov_iter_count(&iter); while (iov_iter_count(&iter)) { - iovec = iov_iter_iovec(&iter); - ret = do_madvise(mm, (unsigned long)iovec.iov_base, - iovec.iov_len, behavior); + iovec = iter.iov; + ret = do_madvise(mm, (unsigned long)iovec->iov_base, + iovec->iov_len, behavior); if (ret < 0) break; - iov_iter_advance(&iter, iovec.iov_len); + iov_iter_advance(&iter, iovec->iov_len); } ret = (total_len - iov_iter_count(&iter)) ? : ret;
On 3/28/23 3:38 PM, Jens Axboe wrote: > On 3/28/23 3:21?PM, Jens Axboe wrote: >> On 3/28/23 1:16?PM, Linus Torvalds wrote: >>> On Tue, Mar 28, 2023 at 12:05?PM Linus Torvalds >>> <torvalds@linux-foundation.org> wrote: >>>> >>>> But it's not like adding a 'struct iovec' explicitly to the members >>>> just as extra "code documentation" would be wrong. >>>> >>>> I don't think it really helps, though, since you have to have that >>>> other explicit structure there anyway to get the member names right. >>> >>> Actually, thinking a bit more about it, adding a >>> >>> const struct iovec xyzzy; >>> >>> member might be a good idea just to avoid a cast. Then that >>> iter_ubuf_to_iov() macro becomes just >>> >>> #define iter_ubuf_to_iov(iter) (&(iter)->xyzzy) >>> >>> and that looks much nicer (plus still acts kind of as a "code comment" >>> to clarify things). >> >> I went down this path, and it _mostly_ worked out. You can view the >> series here, I'll send it out when I've actually tested it: >> >> https://git.kernel.dk/cgit/linux-block/log/?h=iter-ubuf >> >> A few mental notes I made along the way: >> >> - The IB/sound changes are now just replacing an inappropriate >> iter_is_iovec() with iter->user_backed. That's nice and simple. >> >> - The iov_iter_iovec() case becomes a bit simpler. Or so I thought, >> because we still need to add in the offset so we can't just use >> out embedded iovec for that. The above branch is just using the >> iovec, but I don't think this is right. >> >> - Looks like it exposed a block bug, where the copy in >> bio_alloc_map_data() was obvious garbage but happened to work >> before. >> >> I'm still inclined to favor this approach over the previous, even if the >> IB driver is a pile of garbage and lighting it a bit more on fire would >> not really hurt. >> >> Opinions? Or do you want me to just send it out for easier reading > > While cleaning up that stuff, we only have a few users of iov_iter_iovec(). > Why don't we just kill them off and the helper too? That drops that > part of it and it kind of works out nicely beyond that. Ugh that won't work obviously, as we can't factor in per-vec offsets... So it has to be a copy.
On Tue, Mar 28, 2023 at 1:38 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > ... and all that, er, cleverness - for the sake of a single piece of shit > driver for piece of shit hardware *and* equally beautiful ABI. Now, I wish we didn't have any of those 'walk the iov{} array', but sadly we do, and it's not just a single case. It's also that pcm_native.c case, it's the qib rdma driver. So if we didn't have people walking the iov[] array, I'd hate to add this. But considering that we *do* have people walking the iov[] array, I'd rather unify the two user-mode cases than have them do the whole "do two different things for the ITER_UBUF vs ITER_IOV case". > Is it really worth bothering with? And if anyone has doubts about the > inturdprize kwality of the ABI in question, I suggest taking a look at > hfi1_user_sdma_process_request() - that's where the horrors are. Yes. I started out my email to Jens by suggesting that instead of passing down the iov[] pointer, he should just pass down the iter instead. And then I looked at that code and went "yeah, no way do I want to touch it". Which then got me to that "could we at least *unify* these two cases". Linus
diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c index b1d6ca7e9708..f52f57c30429 100644 --- a/drivers/infiniband/hw/hfi1/file_ops.c +++ b/drivers/infiniband/hw/hfi1/file_ops.c @@ -262,11 +262,17 @@ static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from) struct hfi1_user_sdma_pkt_q *pq; struct hfi1_user_sdma_comp_q *cq = fd->cq; int done = 0, reqs = 0; - unsigned long dim = from->nr_segs; + unsigned long dim; int idx; if (!HFI1_CAP_IS_KSET(SDMA)) return -EINVAL; + if (!from->user_backed) + return -EFAULT; + dim = iovec_nr_user_vecs(from); + if (!dim) + return -EINVAL; + idx = srcu_read_lock(&fd->pq_srcu); pq = srcu_dereference(fd->pq, &fd->pq_srcu); if (!cq || !pq) { @@ -274,11 +280,6 @@ static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from) return -EIO; } - if (!iter_is_iovec(from) || !dim) { - srcu_read_unlock(&fd->pq_srcu, idx); - return -EINVAL; - } - trace_hfi1_sdma_request(fd->dd, fd->uctxt->ctxt, fd->subctxt, dim); if (atomic_read(&pq->n_reqs) == pq->n_max_reqs) { @@ -286,20 +287,27 @@ static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from) return -ENOSPC; } - while (dim) { - int ret; + if (dim == 1) { + struct iovec iov = iov_iter_iovec(from); unsigned long count = 0; - ret = hfi1_user_sdma_process_request( - fd, (struct iovec *)(from->iov + done), - dim, &count); - if (ret) { - reqs = ret; - break; + reqs = hfi1_user_sdma_process_request(fd, &iov, 1, &count); + } else { + while (dim) { + int ret; + unsigned long count = 0; + + ret = hfi1_user_sdma_process_request(fd, + (struct iovec *)(from->iov + done), + dim, &count); + if (ret) { + reqs = ret; + break; + } + dim -= count; + done += count; + reqs++; } - dim -= count; - done += count; - reqs++; } srcu_read_unlock(&fd->pq_srcu, idx);
Don't assume that a user backed iterator is always of the type ITER_IOVEC. Handle the single segment case separately, then we can use the same logic for ITER_UBUF and ITER_IOVEC. Signed-off-by: Jens Axboe <axboe@kernel.dk> --- drivers/infiniband/hw/hfi1/file_ops.c | 42 ++++++++++++++++----------- 1 file changed, 25 insertions(+), 17 deletions(-)