diff mbox series

[v4,4/4] ioctl_userfaultfd.2: Add write-protect mode docs

Message ID 20210322220848.52162-5-peterx@redhat.com (mailing list archive)
State New, archived
Series man2: update mm/userfaultfd manpages to latest

Commit Message

Peter Xu March 22, 2021, 10:08 p.m. UTC
Userfaultfd write-protect mode is supported starting from Linux 5.7.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 man2/ioctl_userfaultfd.2 | 84 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 81 insertions(+), 3 deletions(-)

Comments

Alejandro Colomar March 23, 2021, 6:11 p.m. UTC | #1
Hi Peter,

Please see a few comments below.

Thanks,

Alex

On 3/22/21 11:08 PM, Peter Xu wrote:
> Userfaultfd write-protect mode is supported starting from Linux 5.7.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   man2/ioctl_userfaultfd.2 | 84 ++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 81 insertions(+), 3 deletions(-)
> 
> diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> index d4a8375b8..5419687a6 100644
> --- a/man2/ioctl_userfaultfd.2
> +++ b/man2/ioctl_userfaultfd.2
> @@ -234,6 +234,11 @@ operation is supported.
>   The
>   .B UFFDIO_UNREGISTER
>   operation is supported.
> +.TP
> +.B 1 << _UFFDIO_WRITEPROTECT
> +The
> +.B UFFDIO_WRITEPROTECT
> +operation is supported.
>   .PP
>   This
>   .BR ioctl (2)
> @@ -322,9 +327,6 @@ Track page faults on missing pages.
>   .B UFFDIO_REGISTER_MODE_WP
>   Track page faults on write-protected pages.
>   .PP
> -Currently, the only supported mode is
> -.BR UFFDIO_REGISTER_MODE_MISSING .
> -.PP
>   If the operation is successful, the kernel modifies the
>   .I ioctls
>   bit-mask field to indicate which
> @@ -443,6 +445,16 @@ operation:
>   .TP
>   .B UFFDIO_COPY_MODE_DONTWAKE
>   Do not wake up the thread that waits for page-fault resolution
> +.TP
> +.B UFFDIO_COPY_MODE_WP
> +Copy the page with read-only permission.
> +This allows the user to trap the next write to the page,
> +which will block and generate another write-protect userfault message.

s/write-protect/write-protected/
?

> +This is only used when both
> +.B UFFDIO_REGISTER_MODE_MISSING
> +and
> +.B UFFDIO_REGISTER_MODE_WP
> +modes are enabled for the registered range.
>   .PP
>   The
>   .I copy
> @@ -654,6 +666,72 @@ field of the
>   structure was not a multiple of the system page size; or
>   .I len
>   was zero; or the specified range was otherwise invalid.
> +.SS UFFDIO_WRITEPROTECT (Since Linux 5.7)
> +Write-protect or write-unprotect an userfaultfd registered memory range
> +registered with mode
> +.BR UFFDIO_REGISTER_MODE_WP .
> +.PP
> +The
> +.I argp
> +argument is a pointer to a
> +.I uffdio_range
> +structure as shown below:
> +.PP
> +.in +4n
> +.EX
> +struct uffdio_writeprotect {
> +    struct uffdio_range range;  /* Range to change write permission */
> +    __u64 mode;                 /* Mode to change write permission */
> +};
> +.EE
> +.in
> +There're two mode bits that are supported in this structure:
> +.TP
> +.B UFFDIO_WRITEPROTECT_MODE_WP
> +When this mode bit is set, the ioctl will be a write-protect operation upon the
> +memory range specified by
> +.IR range .
> +Otherwise it'll be a write-unprotect operation upon the specified range,
> +which can be used to resolve an userfaultfd write-protect page fault.
> +.TP
> +.B UFFDIO_WRITEPROTECT_MODE_DONTWAKE
> +When this mode bit is set,
> +do not wake up any thread that waits for page-fault resolution after the operation.
> +This could only be specified if
> +.B UFFDIO_WRITEPROTECT_MODE_WP
> +is not specified.
> +.PP
> +This
> +.BR ioctl (2)
> +operation returns 0 on success.
> +On error, \-1 is returned and
> +.I errno
> +is set to indicate the error.
> +Possible errors include:
> +.TP
> +.B EINVAL
> +The
> +.I start
> +or the
> +.I len
> +field of the
> +.I ufdio_range
> +structure was not a multiple of the system page size; or
> +.I len
> +was zero; or the specified range was otherwise invalid.
> +.TP
> +.B EAGAIN
> +The process was interrupted and need to retry.

Maybe: "The process was interrupted; retry this call."?
I don't know what other pages say about this kind of error.

> +.TP
> +.B ENOENT
> +The range specified in
> +.I range
> +is not valid.

I'm not sure how this is different from the wording above in EINVAL.  An 
"otherwise invalid range" was already giving EINVAL?

> +For example, the virtual address does not exist,
> +or not registered with userfaultfd write-protect mode.
> +.TP
> +.B EFAULT
> +Encountered a generic fault during processing.

What is a "generic fault"?

>   .SH RETURN VALUE
>   See descriptions of the individual operations, above.
>   .SH ERRORS
>
Peter Xu March 23, 2021, 7:16 p.m. UTC | #2
On Tue, Mar 23, 2021 at 07:11:04PM +0100, Alejandro Colomar (man-pages) wrote:
> Hi Peter,

Hi, Alex,

[...]

> > +.TP
> > +.B UFFDIO_COPY_MODE_WP
> > +Copy the page with read-only permission.
> > +This allows the user to trap the next write to the page,
> > +which will block and generate another write-protect userfault message.
> 
> s/write-protect/write-protected/
> ?

I think here "write-protect" is the wording I wanted to use, it is the name of
the type of the message in plain text.

[...]

> > +.B EAGAIN
> > +The process was interrupted and need to retry.
> 
> Maybe: "The process was interrupted; retry this call."?
> I don't know what other pages say about this kind of error.

Frankly I see no difference between the two..  If you prefer the latter, I can
switch.

> 
> > +.TP
> > +.B ENOENT
> > +The range specified in
> > +.I range
> > +is not valid.
> 
> I'm not sure how this is different from the wording above in EINVAL.  An
> "otherwise invalid range" was already giving EINVAL?

This can be returned when vma is not found (mwriteprotect_range()):

	err = -ENOENT;
	dst_vma = find_dst_vma(dst_mm, start, len);

	if (!dst_vma)
		goto out_unlock;

I think maybe I could simply remove this entry, because from an user app
developer pov I'd only be interested in specific error that I'd be able to
detect and (even better) recover from.  For such error I'd say there's not much
to do besides failing the app.

> 
> > +For example, the virtual address does not exist,
> > +or not registered with userfaultfd write-protect mode.
> > +.TP
> > +.B EFAULT
> > +Encountered a generic fault during processing.
> 
> What is a "generic fault"?

For example when the user copy failed due to some reason.  See
userfaultfd_writeprotect():

	if (copy_from_user(&uffdio_wp, user_uffdio_wp,
			   sizeof(struct uffdio_writeprotect)))
		return -EFAULT;

But I didn't check other places, generally I'd return -EFAULT if I can't find a
proper other replacement which has a clearer meaning.

I don't think this is really helpful to user app too because no user app would
start to read this -EFAULT to do anything useful.. how about I drop it too if
you think the description is confusing?

Thanks,
Alejandro Colomar March 25, 2021, 9:32 p.m. UTC | #3
Hi Peter,

On 3/23/21 8:16 PM, Peter Xu wrote:
> On Tue, Mar 23, 2021 at 07:11:04PM +0100, Alejandro Colomar (man-pages) wrote:
>>> +.TP
>>> +.B UFFDIO_COPY_MODE_WP
>>> +Copy the page with read-only permission.
>>> +This allows the user to trap the next write to the page,
>>> +which will block and generate another write-protect userfault message.
>>
>> s/write-protect/write-protected/
>> ?
> 
> I think here "write-protect" is the wording I wanted to use, it is the name of
> the type of the message in plain text.

Okay.

> 
> [...]
> 
>>> +.B EAGAIN
>>> +The process was interrupted and need to retry.
>>
>> Maybe: "The process was interrupted; retry this call."?
>> I don't know what other pages say about this kind of error.
> 
> Frankly I see no difference between the two..  If you prefer the latter, I can
> switch.

I understand yours, but technically it's a bit incorrect:  The subject 
of the sentence changes: in "The process was interrupted" it's the 
process, and in "need to retry" it's [you].  By separating the sentence 
into two, it's more natural. :)

> 
>>
>>> +.TP
>>> +.B ENOENT
>>> +The range specified in
>>> +.I range
>>> +is not valid.
>>
>> I'm not sure how this is different from the wording above in EINVAL.  An
>> "otherwise invalid range" was already giving EINVAL?
> 
> This can be returned when vma is not found (mwriteprotect_range()):
> 
> 	err = -ENOENT;
> 	dst_vma = find_dst_vma(dst_mm, start, len);
> 
> 	if (!dst_vma)
> 		goto out_unlock;
> 
> I think maybe I could simply remove this entry, because from an user app
> developer pov I'd only be interested in specific error that I'd be able to
> detect and (even better) recover from.  For such error I'd say there's not much
> to do besides failing the app.

If there's any possibility that the error can happen, it should be 
documented, even if it's to say "Fatal error; abort!".  Just try to 
explain the causes and how to avoid causing them and/or possibly what to 
do when they happen (abort?).

> 
>>
>>> +For example, the virtual address does not exist,
>>> +or not registered with userfaultfd write-protect mode.
>>> +.TP
>>> +.B EFAULT
>>> +Encountered a generic fault during processing.
>>
>> What is a "generic fault"?
> 
> For example when the user copy failed due to some reason.  See
> userfaultfd_writeprotect():
> 
> 	if (copy_from_user(&uffdio_wp, user_uffdio_wp,
> 			   sizeof(struct uffdio_writeprotect)))
> 		return -EFAULT;
> 
> But I didn't check other places, generally I'd return -EFAULT if I can't find a
> proper other replacement which has a clearer meaning.
> 
> I don't think this is really helpful to user app too because no user app would
> start to read this -EFAULT to do anything useful.. how about I drop it too if
> you think the description is confusing?

Same as above.

Thanks,

Alex
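
The advice in this message (document every error and say what a caller should do about it) can be sketched in C. This is only an illustration under stated assumptions: `wp_action`, `wp_classify_errno()`, `wp_retry()` and `fake_op()` are hypothetical names invented here, not part of the kernel or any library, and the mapping from errno value to action is one possible reading of the thread, not something the patch specifies.

```c
#include <errno.h>

/* Hypothetical categories, named for this sketch only. */
enum wp_action {
    WP_ACTION_RETRY,    /* EAGAIN: interrupted; issue the same call again */
    WP_ACTION_FIX_ARGS, /* EINVAL, ENOENT: the range or mode is wrong */
    WP_ACTION_ABORT     /* EFAULT and anything not listed */
};

/* Map an errno value from UFFDIO_WRITEPROTECT to a suggested reaction. */
static enum wp_action wp_classify_errno(int err)
{
    switch (err) {
    case EAGAIN:
        return WP_ACTION_RETRY;
    case EINVAL:
    case ENOENT:
        return WP_ACTION_FIX_ARGS;
    default:
        return WP_ACTION_ABORT;
    }
}

/*
 * Call op(arg) until it succeeds, fails with something other than
 * EAGAIN, or max_tries attempts have been made.  Returns the result
 * of the last attempt (-1 if max_tries is not positive).
 */
static int wp_retry(int (*op)(void *), void *arg, int max_tries)
{
    int ret = -1;

    for (int i = 0; i < max_tries; i++) {
        ret = op(arg);
        if (ret == 0 || wp_classify_errno(errno) != WP_ACTION_RETRY)
            break;
    }
    return ret;
}

/*
 * Stand-in for a real ioctl wrapper: fails with EAGAIN while the
 * counter that arg points to is positive, then succeeds.
 */
static int fake_op(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        errno = EAGAIN;
        return -1;
    }
    return 0;
}
```

In real code `op` would wrap the `ioctl(2)` call itself; the stand-in only exists so that the retry logic can be exercised without a userfaultfd descriptor.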
Peter Xu March 29, 2021, 9:51 p.m. UTC | #4
On Thu, Mar 25, 2021 at 10:32:20PM +0100, Alejandro Colomar (man-pages) wrote:
> Hi Peter,
> 
> On 3/23/21 8:16 PM, Peter Xu wrote:
> > On Tue, Mar 23, 2021 at 07:11:04PM +0100, Alejandro Colomar (man-pages) wrote:
> > > > +.TP
> > > > +.B UFFDIO_COPY_MODE_WP
> > > > +Copy the page with read-only permission.
> > > > +This allows the user to trap the next write to the page,
> > > > +which will block and generate another write-protect userfault message.
> > > 
> > > s/write-protect/write-protected/
> > > ?
> > 
> > I think here "write-protect" is the wording I wanted to use, it is the name of
> > the type of the message in plain text.
> 
> Okay.
> 
> > 
> > [...]
> > 
> > > > +.B EAGAIN
> > > > +The process was interrupted and need to retry.
> > > 
> > > Maybe: "The process was interrupted; retry this call."?
> > > I don't know what other pages say about this kind of error.
> > 
> > Frankly I see no difference between the two..  If you prefer the latter, I can
> > switch.
> 
> I understand yours, but technically it's a bit incorrect:  The subject of
> the sentence changes: in "The process was interrupted" it's the process, and
> in "need to retry" it's [you].  By separating the sentence into two, it's
> more natural. :)

Sure, I'll change.

> 
> > 
> > > 
> > > > +.TP
> > > > +.B ENOENT
> > > > +The range specified in
> > > > +.I range
> > > > +is not valid.
> > > 
> > > I'm not sure how this is different from the wording above in EINVAL.  An
> > > "otherwise invalid range" was already giving EINVAL?
> > 
> > This can be returned when vma is not found (mwriteprotect_range()):
> > 
> > 	err = -ENOENT;
> > 	dst_vma = find_dst_vma(dst_mm, start, len);
> > 
> > 	if (!dst_vma)
> > 		goto out_unlock;
> > 
> > I think maybe I could simply remove this entry, because from an user app
> > developer pov I'd only be interested in specific error that I'd be able to
> > detect and (even better) recover from.  For such error I'd say there's not much
> > to do besides failing the app.
> 
> If there's any possibility that the error can happen, it should be
> documented, even if it's to say "Fatal error; abort!".  Just try to explain
> the causes and how to avoid causing them and/or possibly what to do when
> they happen (abort?).

Okay.  Would you mind if I keep my original wording?  IMHO it does exactly what
you described as "trying to explain the causes" and so on:

        .B ENOENT
        The range specified in
        .I range
        is not valid.
        For example, the virtual address does not exist,
        or not registered with userfaultfd write-protect mode.

It does overlap slightly with EINVAL.  But if you object to both the wording
and the overlap between the errors, then what I need is not to rework this
patchset but to propose a kernel patch that changes the error return values so
that they match.  I am not against proposing a kernel patch; I just don't see
it as strictly necessary.

In my own experience working with the kernel, return values are sometimes not
that strict: it's hard to control every possible return code of a
syscall/ioctl so that all of them match the documentation.  We should always
try to be accurate, but that doesn't seem easy to me.  It's also hard to write
documentation that matches the kernel code 100%, because at the very least that
requires a full walkthrough of every piece of kernel code the syscall/ioctl
calls, to collect all the errors and then summarize their meanings.  That
could be a lot of work.

> 
> > 
> > > 
> > > > +For example, the virtual address does not exist,
> > > > +or not registered with userfaultfd write-protect mode.
> > > > +.TP
> > > > +.B EFAULT
> > > > +Encountered a generic fault during processing.
> > > 
> > > What is a "generic fault"?
> > 
> > For example when the user copy failed due to some reason.  See
> > userfaultfd_writeprotect():
> > 
> > 	if (copy_from_user(&uffdio_wp, user_uffdio_wp,
> > 			   sizeof(struct uffdio_writeprotect)))
> > 		return -EFAULT;
> > 
> > But I didn't check other places, generally I'd return -EFAULT if I can't find a
> > proper other replacement which has a clearer meaning.
> > 
> > I don't think this is really helpful to user app too because no user app would
> > start to read this -EFAULT to do anything useful.. how about I drop it too if
> > you think the description is confusing?
> 
> Same as above.

The copy_from_user() above is the only place I have found so far that can
trigger -EFAULT.  So I could change the entry above to:

        .TP
        .B EFAULT
        Failure on copying ioctl parameters into the kernel.

Would that be okay with you (before I repost)?  I'd still prefer my original
wording, because I bet 90% of user-space developers won't know what it means
when the kernel cannot copy the user parameter, or what they can do about
it.  However, if you think it's appropriate, I'll use it.

Thanks,
Alejandro Colomar March 29, 2021, 10:05 p.m. UTC | #5
Hi Peter,

On 3/29/21 11:51 PM, Peter Xu wrote:
> On Thu, Mar 25, 2021 at 10:32:20PM +0100, Alejandro Colomar (man-pages) wrote:
>>>>> +.TP
>>>>> +.B ENOENT
>>>>> +The range specified in
>>>>> +.I range
>>>>> +is not valid.
>>>>
>>>> I'm not sure how this is different from the wording above in EINVAL.  An
>>>> "otherwise invalid range" was already giving EINVAL?
>>>
>>> This can be returned when vma is not found (mwriteprotect_range()):
>>>
>>> 	err = -ENOENT;
>>> 	dst_vma = find_dst_vma(dst_mm, start, len);
>>>
>>> 	if (!dst_vma)
>>> 		goto out_unlock;
>>>
>>> I think maybe I could simply remove this entry, because from an user app
>>> developer pov I'd only be interested in specific error that I'd be able to
>>> detect and (even better) recover from.  For such error I'd say there's not much
>>> to do besides failing the app.
>>
>> If there's any possibility that the error can happen, it should be
>> documented, even if it's to say "Fatal error; abort!".  Just try to explain
>> the causes and how to avoid causing them and/or possibly what to do when
>> they happen (abort?).
> 
> Okay.  Would you mind if I keep my original wording?  IMHO it does exactly what
> you described as "trying to explain the causes" and so on:
> 
>         .B ENOENT
>         The range specified in
>         .I range
>         is not valid.
>         For example, the virtual address does not exist,
>         or not registered with userfaultfd write-protect mode.
> 
> It does overlap slightly with EINVAL.  But if you object to both the wording
> and the overlap between the errors, then what I need is not to rework this
> patchset but to propose a kernel patch that changes the error return values so
> that they match.  I am not against proposing a kernel patch; I just don't see
> it as strictly necessary.
> 
> In my own experience working with the kernel, return values are sometimes not
> that strict: it's hard to control every possible return code of a
> syscall/ioctl so that all of them match the documentation.  We should always
> try to be accurate, but that doesn't seem easy to me.  It's also hard to write
> documentation that matches the kernel code 100%, because at the very least that
> requires a full walkthrough of every piece of kernel code the syscall/ioctl
> calls, to collect all the errors and then summarize their meanings.  That
> could be a lot of work.

Yes, That's fine.  I was only curious about the overlap, but if they do
overlap, that's it.

>>>>> +For example, the virtual address does not exist,
>>>>> +or not registered with userfaultfd write-protect mode.
>>>>> +.TP
>>>>> +.B EFAULT
>>>>> +Encountered a generic fault during processing.
>>>>
>>>> What is a "generic fault"?
>>>
>>> For example when the user copy failed due to some reason.  See
>>> userfaultfd_writeprotect():
>>>
>>> 	if (copy_from_user(&uffdio_wp, user_uffdio_wp,
>>> 			   sizeof(struct uffdio_writeprotect)))
>>> 		return -EFAULT;
>>>
>>> But I didn't check other places, generally I'd return -EFAULT if I can't find a
>>> proper other replacement which has a clearer meaning.
>>>
>>> I don't think this is really helpful to user app too because no user app would
>>> start to read this -EFAULT to do anything useful.. how about I drop it too if
>>> you think the description is confusing?
>>
>> Same as above.
> 
> The copy_from_user() above is the only place I have found so far that can
> trigger -EFAULT.  So I could change the entry above to:
> 
>         .TP
>         .B EFAULT
>         Failure on copying ioctl parameters into the kernel.
> 
> Would that be okay with you (before I repost)?  I'd still prefer my original
> wording, because I bet 90% of user-space developers won't know what it means
> when the kernel cannot copy the user parameter, or what they can do about
> it.  However, if you think it's appropriate, I'll use it.

Okay, I'll take your original words.  Maybe all this "extra" info could
go into the commit message.  I'll wait for your resend with the a-b and
the minor changes :-)

Thanks,

Alex
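
One way an application might avoid the ENOENT case discussed in this thread (a range that is not registered with userfaultfd write-protect mode) is to register with both modes and then check the `ioctls` bit-mask that the kernel fills in. The following is a minimal sketch, assuming `<linux/userfaultfd.h>` from Linux 5.7 or later; `wp_register_init()` and `wp_ioctl_supported()` are hypothetical helper names made up for illustration, and the actual `ioctl(uffd, UFFDIO_REGISTER, &reg)` call is left to the caller.

```c
#include <stdint.h>
#include <string.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: prepare a registration request that enables
 * both missing-page and write-protect tracking for [start, start+len).
 */
static void wp_register_init(struct uffdio_register *reg,
                             uint64_t start, uint64_t len)
{
    memset(reg, 0, sizeof(*reg));
    reg->range.start = start;
    reg->range.len = len;
    reg->mode = UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP;
}

/*
 * Hypothetical helper: after a successful UFFDIO_REGISTER, the kernel
 * has filled in reg->ioctls.  Report whether UFFDIO_WRITEPROTECT is
 * among the operations available on the registered range.
 */
static int wp_ioctl_supported(const struct uffdio_register *reg)
{
    return (reg->ioctls & ((uint64_t)1 << _UFFDIO_WRITEPROTECT)) != 0;
}
```

Checking the bit-mask up front lets a program fall back or fail early instead of discovering the problem from a later `UFFDIO_WRITEPROTECT` error.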

Patch

diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index d4a8375b8..5419687a6 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -234,6 +234,11 @@  operation is supported.
 The
 .B UFFDIO_UNREGISTER
 operation is supported.
+.TP
+.B 1 << _UFFDIO_WRITEPROTECT
+The
+.B UFFDIO_WRITEPROTECT
+operation is supported.
 .PP
 This
 .BR ioctl (2)
@@ -322,9 +327,6 @@  Track page faults on missing pages.
 .B UFFDIO_REGISTER_MODE_WP
 Track page faults on write-protected pages.
 .PP
-Currently, the only supported mode is
-.BR UFFDIO_REGISTER_MODE_MISSING .
-.PP
 If the operation is successful, the kernel modifies the
 .I ioctls
 bit-mask field to indicate which
@@ -443,6 +445,16 @@  operation:
 .TP
 .B UFFDIO_COPY_MODE_DONTWAKE
 Do not wake up the thread that waits for page-fault resolution
+.TP
+.B UFFDIO_COPY_MODE_WP
+Copy the page with read-only permission.
+This allows the user to trap the next write to the page,
+which will block and generate another write-protect userfault message.
+This is only used when both
+.B UFFDIO_REGISTER_MODE_MISSING
+and
+.B UFFDIO_REGISTER_MODE_WP
+modes are enabled for the registered range.
 .PP
 The
 .I copy
@@ -654,6 +666,72 @@  field of the
 structure was not a multiple of the system page size; or
 .I len
 was zero; or the specified range was otherwise invalid.
+.SS UFFDIO_WRITEPROTECT (Since Linux 5.7)
+Write-protect or write-unprotect an userfaultfd registered memory range
+registered with mode
+.BR UFFDIO_REGISTER_MODE_WP .
+.PP
+The
+.I argp
+argument is a pointer to a
+.I uffdio_range
+structure as shown below:
+.PP
+.in +4n
+.EX
+struct uffdio_writeprotect {
+    struct uffdio_range range;  /* Range to change write permission */
+    __u64 mode;                 /* Mode to change write permission */
+};
+.EE
+.in
+There're two mode bits that are supported in this structure:
+.TP
+.B UFFDIO_WRITEPROTECT_MODE_WP
+When this mode bit is set, the ioctl will be a write-protect operation upon the
+memory range specified by
+.IR range .
+Otherwise it'll be a write-unprotect operation upon the specified range,
+which can be used to resolve an userfaultfd write-protect page fault.
+.TP
+.B UFFDIO_WRITEPROTECT_MODE_DONTWAKE
+When this mode bit is set,
+do not wake up any thread that waits for page-fault resolution after the operation.
+This could only be specified if
+.B UFFDIO_WRITEPROTECT_MODE_WP
+is not specified.
+.PP
+This
+.BR ioctl (2)
+operation returns 0 on success.
+On error, \-1 is returned and
+.I errno
+is set to indicate the error.
+Possible errors include:
+.TP
+.B EINVAL
+The
+.I start
+or the
+.I len
+field of the
+.I ufdio_range
+structure was not a multiple of the system page size; or
+.I len
+was zero; or the specified range was otherwise invalid.
+.TP
+.B EAGAIN
+The process was interrupted and need to retry.
+.TP
+.B ENOENT
+The range specified in
+.I range
+is not valid.
+For example, the virtual address does not exist,
+or not registered with userfaultfd write-protect mode.
+.TP
+.B EFAULT
+Encountered a generic fault during processing.
 .SH RETURN VALUE
 See descriptions of the individual operations, above.
 .SH ERRORS
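
As a rough illustration of the UFFDIO_WRITEPROTECT interface that this patch documents, the sketch below fills in a `struct uffdio_writeprotect` and submits it. It assumes `<linux/userfaultfd.h>` from Linux 5.7 or later. `wp_request_init()` and `wp_request_submit()` are hypothetical helper names invented for this example; the user-space checks merely mirror the EINVAL conditions and the DONTWAKE restriction described in the patch text, and the kernel remains the authority on what it accepts.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: fill in *wp for a write-protect (protect != 0)
 * or write-unprotect (protect == 0) request.  Returns 0 on success, or
 * -1 with errno set to EINVAL for the argument errors the patch lists.
 */
static int wp_request_init(struct uffdio_writeprotect *wp,
                           uint64_t start, uint64_t len,
                           uint64_t page_size, int protect, int dontwake)
{
    /* start and len must be page-aligned, and len must be nonzero. */
    if (page_size == 0 || len == 0 ||
        start % page_size != 0 || len % page_size != 0) {
        errno = EINVAL;
        return -1;
    }
    /* DONTWAKE may only be given when MODE_WP is not set. */
    if (protect && dontwake) {
        errno = EINVAL;
        return -1;
    }

    memset(wp, 0, sizeof(*wp));
    wp->range.start = start;
    wp->range.len = len;
    if (protect)
        wp->mode |= UFFDIO_WRITEPROTECT_MODE_WP;
    if (dontwake)
        wp->mode |= UFFDIO_WRITEPROTECT_MODE_DONTWAKE;
    return 0;
}

/*
 * Submit the request on a userfaultfd descriptor whose range was
 * registered with UFFDIO_REGISTER_MODE_WP.  Returns 0 on success,
 * or -1 with errno set by ioctl(2).
 */
static int wp_request_submit(int uffd, struct uffdio_writeprotect *wp)
{
    return ioctl(uffd, UFFDIO_WRITEPROTECT, wp);
}
```

A caller would typically pass `sysconf(_SC_PAGESIZE)` as `page_size`, use `protect = 1` to arm write protection on a range, and later use `protect = 0` to resolve a write-protect fault on it.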