mbox series

[POC,RFC,0/3] splice(2) support for io_uring

Message ID cover.1579649589.git.asml.silence@gmail.com (mailing list archive)
Headers show
Series splice(2) support for io_uring | expand

Message

Pavel Begunkov Jan. 22, 2020, 12:05 a.m. UTC
It works well for basic cases, but there is still work to be done. E.g.
it misses @hash_reg_file checks for the second (output) file. Anyway,
there are some questions I want to discuss:

- why sqe->len is __u32? Splice uses size_t, and I think it's better
to have something wider (e.g. u64) for fututre use. That's the story
behind added sqe->splice_len.

- it requires 2 fds, and it's painful. Currently file managing is done
by common path (e.g. io_req_set_file(), __io_req_aux_free()). I'm
thinking to make each opcode function handle file grabbing/putting
themself with some helpers, as it's done in the patch for splice's
out-file.
    1. Opcode handler knows, whether it have/needs a file, and thus
       doesn't need extra checks done in common path.
    2. It will be more consistent with splice.
Objections? Ideas?

- do we need offset pointers with fallback to file->f_pos? Or is it
enough to have offset value. Jens, I remember you added the first
option somewhere, could you tell the reasoning?


Pavel Begunkov (3):
  splice: make do_splice public
  io_uring: add interface for getting files
  io_uring: add splice(2) support

 fs/io_uring.c                 | 152 ++++++++++++++++++++++++++++------
 fs/splice.c                   |   6 +-
 include/linux/splice.h        |   3 +
 include/uapi/linux/io_uring.h |  16 +++-
 4 files changed, 147 insertions(+), 30 deletions(-)

Comments

Jens Axboe Jan. 22, 2020, 1:55 a.m. UTC | #1
On 1/21/20 5:05 PM, Pavel Begunkov wrote:
> It works well for basic cases, but there is still work to be done. E.g.
> it misses @hash_reg_file checks for the second (output) file. Anyway,
> there are some questions I want to discuss:
> 
> - why sqe->len is __u32? Splice uses size_t, and I think it's better
> to have something wider (e.g. u64) for fututre use. That's the story
> behind added sqe->splice_len.

IO operations in Linux generally are INT_MAX, so the u32 is plenty big.
That's why I chose it. For this specifically, if you look at splice:

	if (unlikely(len > MAX_RW_COUNT))
		len = MAX_RW_COUNT;

so anything larger is truncated anyway.

> - it requires 2 fds, and it's painful. Currently file managing is done
> by common path (e.g. io_req_set_file(), __io_req_aux_free()). I'm
> thinking to make each opcode function handle file grabbing/putting
> themself with some helpers, as it's done in the patch for splice's
> out-file.
>     1. Opcode handler knows, whether it have/needs a file, and thus
>        doesn't need extra checks done in common path.
>     2. It will be more consistent with splice.
> Objections? Ideas?

Sounds reasonable to me, but always easier to judge in patch form :-)

> - do we need offset pointers with fallback to file->f_pos? Or is it
> enough to have offset value. Jens, I remember you added the first
> option somewhere, could you tell the reasoning?

I recently added support for -1/cur position, which splice also uses. So
you should be fine with that.
Pavel Begunkov Jan. 22, 2020, 3:11 a.m. UTC | #2
On 22/01/2020 04:55, Jens Axboe wrote:
> On 1/21/20 5:05 PM, Pavel Begunkov wrote:
>> It works well for basic cases, but there is still work to be done. E.g.
>> it misses @hash_reg_file checks for the second (output) file. Anyway,
>> there are some questions I want to discuss:
>>
>> - why sqe->len is __u32? Splice uses size_t, and I think it's better
>> to have something wider (e.g. u64) for fututre use. That's the story
>> behind added sqe->splice_len.
> 
> IO operations in Linux generally are INT_MAX, so the u32 is plenty big.
> That's why I chose it. For this specifically, if you look at splice:
> 
> 	if (unlikely(len > MAX_RW_COUNT))
> 		len = MAX_RW_COUNT;
> 
> so anything larger is truncated anyway.

Yeah, I saw this one, but that was rather an argument for the future. It's
pretty easy to transfer more than 4GB with sg list, but that would be the case
for splice.

> 
>> - it requires 2 fds, and it's painful. Currently file managing is done
>> by common path (e.g. io_req_set_file(), __io_req_aux_free()). I'm
>> thinking to make each opcode function handle file grabbing/putting
>> themself with some helpers, as it's done in the patch for splice's
>> out-file.
>>     1. Opcode handler knows, whether it have/needs a file, and thus
>>        doesn't need extra checks done in common path.
>>     2. It will be more consistent with splice.
>> Objections? Ideas?
> 
> Sounds reasonable to me, but always easier to judge in patch form :-)
> 
>> - do we need offset pointers with fallback to file->f_pos? Or is it
>> enough to have offset value. Jens, I remember you added the first
>> option somewhere, could you tell the reasoning?
> 
> I recently added support for -1/cur position, which splice also uses. So
> you should be fine with that.
> 

I always have been thinking about it as a legacy from old days, and one of the
problems of posix. It's not hard to count it in the userspace especially in C++
or high-level languages, and is just another obstacle for having a performant
API. So, I'd rather get rid of it here. But is there any reasons against?
Jens Axboe Jan. 22, 2020, 3:30 a.m. UTC | #3
On 1/21/20 8:11 PM, Pavel Begunkov wrote:
> On 22/01/2020 04:55, Jens Axboe wrote:
>> On 1/21/20 5:05 PM, Pavel Begunkov wrote:
>>> It works well for basic cases, but there is still work to be done. E.g.
>>> it misses @hash_reg_file checks for the second (output) file. Anyway,
>>> there are some questions I want to discuss:
>>>
>>> - why sqe->len is __u32? Splice uses size_t, and I think it's better
>>> to have something wider (e.g. u64) for fututre use. That's the story
>>> behind added sqe->splice_len.
>>
>> IO operations in Linux generally are INT_MAX, so the u32 is plenty big.
>> That's why I chose it. For this specifically, if you look at splice:
>>
>> 	if (unlikely(len > MAX_RW_COUNT))
>> 		len = MAX_RW_COUNT;
>>
>> so anything larger is truncated anyway.
> 
> Yeah, I saw this one, but that was rather an argument for the future.
> It's pretty easy to transfer more than 4GB with sg list, but that
> would be the case for splice.

I don't see this changing, ever, basically. And probably not a big deal,
if you want to do more than 2GB worth of IO, you simply splice them over
multiple commands. At those sizes, the overhead there is negligible.

>>> - it requires 2 fds, and it's painful. Currently file managing is done
>>> by common path (e.g. io_req_set_file(), __io_req_aux_free()). I'm
>>> thinking to make each opcode function handle file grabbing/putting
>>> themself with some helpers, as it's done in the patch for splice's
>>> out-file.
>>>     1. Opcode handler knows, whether it have/needs a file, and thus
>>>        doesn't need extra checks done in common path.
>>>     2. It will be more consistent with splice.
>>> Objections? Ideas?
>>
>> Sounds reasonable to me, but always easier to judge in patch form :-)
>>
>>> - do we need offset pointers with fallback to file->f_pos? Or is it
>>> enough to have offset value. Jens, I remember you added the first
>>> option somewhere, could you tell the reasoning?
>>
>> I recently added support for -1/cur position, which splice also uses. So
>> you should be fine with that.
>>
> 
> I always have been thinking about it as a legacy from old days, and
> one of the problems of posix. It's not hard to count it in the
> userspace especially in C++ or high-level languages, and is just
> another obstacle for having a performant API. So, I'd rather get rid
> of it here. But is there any reasons against?

It's not always trivial to do in libraries, or programming languages
even. That's why it exists. I would not expect anyone to use it outside
of that.