[RFC] stat.2: Document that stat can fail with EINTR

Message ID 20171203002359.GA17037@juliacomputing.com (mailing list archive)
State New, archived

Commit Message

Keno Fischer Dec. 3, 2017, 12:23 a.m. UTC
Particularly on network file systems, a stat call may require
submitting a message over the network and waiting interruptibly
for a reply.

Signed-off-by: Keno Fischer <keno@juliacomputing.com>
---

The catalyst for this patch was me experiencing EINTR errors when
using the 9p file system. In Linux commit 9523feac, the 9p file
system was changed to use wait_event_killable instead of
wait_event_interruptible, which does indeed address my problem,
but also makes me a bit unhappy, because uninterruptible waits
prevent things like ^C'ing the execution and some debugging
tools which depend on being able to cancel long-running operations
by sending signals. I'd like to ask the user space applications I
care about to properly handle such situations (either by using
SA_RESTART or by explicitly handling EINTR), but it's a bit of a
hard sell if EINTR isn't documented to be a possibility. I'm hoping
this doc patch will generate a discussion of whether EINTR is an
appropriate thing for stat (as a stand-in for a file system call that's
not read/write) to return. If so, I'd be happy to submit
patches to other file system-related syscalls along these same lines.
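
For illustration, "explicitly handling EINTR" around stat() could look
roughly like the sketch below. The helper name stat_retry_eintr is
hypothetical (it is not part of libc or of this patch), and the sketch
assumes that simply retrying is the right response for the application.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not part of libc: call stat() and retry for as
 * long as it fails with EINTR. Any other result is passed through
 * unchanged, with errno left as stat() set it.
 */
static int stat_retry_eintr(const char *path, struct stat *st)
{
    int rc;

    assert(path != NULL && st != NULL);
    do {
        rc = stat(path, st);
    } while (rc == -1 && errno == EINTR);
    return rc;
}
```

A caller would use stat_retry_eintr(path, &st) wherever it currently
calls stat(path, &st). An application that wants a signal to cancel the
operation would instead check a flag set by its handler before retrying.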

I realize I'm probably 20 years too late here, but it feels like
clarification on what to expect from the kernel would still go a long
way here.

 man2/stat.2 | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Matthew Wilcox Dec. 3, 2017, 2:25 a.m. UTC | #1
On Sat, Dec 02, 2017 at 07:23:59PM -0500, Keno Fischer wrote:
> The catalyst for this patch was me experiencing EINTR errors when
> using the 9p file system. In Linux commit 9523feac, the 9p file
> system was changed to use wait_event_killable instead of
> wait_event_interruptible, which does indeed address my problem,
> but also makes me a bit unhappy, because uninterruptible waits
> prevent things like ^C'ing the execution and some debugging
> tools which depend on being able to cancel long-running operations
> by sending signals.

Wait, wait, wait.  killable is not uninterruptible.  It's "can accept
a signal if the signal is fatal".  ie userspace will never see it.
So, no, it doesn't prevent ^C.  It does prevent the debugging tool you're
talking about from working, because it's handling the signal, so it's not
fatal.
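
One way to read the distinction drawn above is as a small decision
table. The sketch below is a toy user-space model, not kernel code: the
enum names and both functions are made up for illustration, and it
ignores details such as SA_RESTART and ignored signals.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these are not kernel types or APIs. */
enum wait_mode { WAIT_UNINTERRUPTIBLE, WAIT_KILLABLE, WAIT_INTERRUPTIBLE };
enum pending_signal { PENDING_FATAL, PENDING_CAUGHT };

/* Does a pending signal end the wait early? */
static bool wait_is_woken(enum wait_mode mode, enum pending_signal sig)
{
    switch (mode) {
    case WAIT_UNINTERRUPTIBLE:
        return false;                 /* no signal ends the wait */
    case WAIT_KILLABLE:
        return sig == PENDING_FATAL;  /* only a fatal signal does */
    case WAIT_INTERRUPTIBLE:
        return true;                  /* any delivered signal does */
    }
    assert(!"unknown wait mode");
    return false;
}

/*
 * Can the interrupted call report the interruption to a process that is
 * still alive to look at it? A fatal signal kills the process, so only
 * a caught signal can surface as an error return.
 */
static bool userspace_sees_interruption(enum wait_mode mode,
                                        enum pending_signal sig)
{
    return wait_is_woken(mode, sig) && sig == PENDING_CAUGHT;
}
```

In this model a killable wait never surfaces an interruption to user
space, which matches both halves of the point above: a fatal ^C still
works, while a tool that catches the signal gets no early return.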

> I realize I'm probably 20 years too late here, but it feels like
> clarification on what to expect from the kernel would still go a long
> way here.

A change to user-visible behaviour has to be opt-in.  So here's an idea --
a prctl() (or whatever) that says "I can handle EINTR on any syscall".
It would effectively change the *_killable logic to return EINTR on any
signal, not just fatal ones.  Best of luck auditing every syscall your
application makes ... and every library it uses ... maybe dlopen()ed
like PAM modules ...

But we could do it!  And it's more sensible than "I want to change
individual syscalls one at a time as I notice each one is a problem".
Keno Fischer Dec. 3, 2017, 3:15 a.m. UTC | #2
Resending as plain text (apologies to those receiving it twice, and
those that got an HTML copy; I'm used to my mail client switching that
over automatically, which for some reason didn't happen here).


This is exactly the discussion I want to generate, so thank you.
I should point out that I'm not advocating for anything other
than clarity of what kernel behavior user space may assume.


On Sat, Dec 2, 2017 at 9:25 PM, Matthew Wilcox <willy@infradead.org> wrote:
> On Sat, Dec 02, 2017 at 07:23:59PM -0500, Keno Fischer wrote:
>> The catalyst for this patch was me experiencing EINTR errors when
>> using the 9p file system. In Linux commit 9523feac, the 9p file
>> system was changed to use wait_event_killable instead of
>> wait_event_interruptible, which does indeed address my problem,
>> but also makes me a bit unhappy, because uninterruptible waits
>> prevent things like ^C'ing the execution and some debugging
>> tools which depend on being able to cancel long-running operations
>> by sending signals.
>
> Wait, wait, wait.  killable is not uninterruptible.  It's "can accept
> a signal if the signal is fatal".  ie userspace will never see it.
> So, no, it doesn't prevent ^C.  It does prevent the debugging tool you're
> talking about from working, because it's handling the signal, so it's not
> fatal.

This probably shows that I've spent too long in REPL-based environments
that catch SIGINT ;). You are of course correct that a fatal SIGINT would
still be delivered.
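
As a rough sketch of what "catching SIGINT" means here: once a process
installs a handler, the signal is no longer fatal to it. The helper
below also sets SA_RESTART, which asks for restartable system calls to
be resumed after the handler returns; whether a particular call on a
particular file system is actually restarted is not something this
sketch can promise. The names record_signal, got_signal and
catch_with_restart are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal = 0;

/* Handler only records that the signal arrived (async-signal-safe). */
static void record_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/*
 * Hypothetical helper: catch `signo` with SA_RESTART set.
 * Returns 0 on success and -1 on error, as sigaction() does.
 */
static int catch_with_restart(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

After catch_with_restart(SIGINT) succeeds, a ^C sets got_signal instead
of terminating the process, so a killable wait in the kernel would no
longer be ended by it.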

>> I realize I'm probably 20 years too late here, but it feels like
>> clarification on what to expect from the kernel would still go a long
>> way here.
>
> A change to user-visible behaviour has to be opt-in.

I agree. However, it was my impression that stat() can return EINTR
depending on the file system. Prior to the referenced commit,
this was certainly true on 9p and I suspect it's not the only network file
system for which this is true (though prior to my experiencing this
with 9p, the only time I've ever experienced it was on HPC clusters with
who knows what code providing the network filesystem). If it is indeed
the case that an EINTR return from stat() and similar is illegal and
should be considered a kernel bug, a statement to that effect is all
I'm looking for here.
Walter Harms Dec. 3, 2017, 4:09 p.m. UTC | #3
Am 03.12.2017 01:23, schrieb Keno Fischer:
> Particularly on network file systems, a stat call may require
> submitting a message over the network and waiting interruptibly
> for a reply.
> 
> Signed-off-by: Keno Fischer <keno@juliacomputing.com>
> ---
> 
> The catalyst for this patch was me experiencing EINTR errors when
> using the 9p file system. In Linux commit 9523feac, the 9p file
> system was changed to use wait_event_killable instead of
> wait_event_interruptible, which does indeed address my problem,
> but also makes me a bit unhappy, because uninterruptible waits
> prevent things like ^C'ing the execution and some debugging
> tools which depend on being able to cancel long-running operations
> by sending signals. I'd like to ask the user space applications I
> care about to properly handle such situations (either by using
> SA_RESTART or by explicitly handling EINTR), but it's a bit of a
> hard sell if EINTR isn't documented to be a possibility. I'm hoping
> this doc patch will generate a discussion of whether EINTR is an
> appropriate thing for stat (as a stand-in for a file system call that's
> not read/write) to return. If so, I'd be happy to submit
> patches to other file system-related syscalls along these same lines.
> 
> I realize I'm probably 20 years too late here, but it feels like
> clarification on what to expect from the kernel would still go a long
> way here.
> 

No matter; if it can happen, it should be documented.
Nothing is more annoying than triggering an undocumented error.


>  man2/stat.2 | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/man2/stat.2 b/man2/stat.2
> index dad9a01..f10235a 100644
> --- a/man2/stat.2
> +++ b/man2/stat.2
> @@ -452,6 +452,11 @@ Invalid flag specified in
>  is relative and
>  .I dirfd
>  is a file descriptor referring to a file other than a directory.
> +.TP
> +.B EINTR
> +The call was interrupted by delivery of a signal caught by a handler; see
What about:
  The call was interrupted by a
.BR signal (7).
> +.BR signal (7).
> +The possibility of this error is file-system dependent.
You mean:
  This error is file-system dependent.

just my 2 cents,
re,
 wh

>  .SH VERSIONS
>  .BR fstatat ()
>  was added to Linux in kernel 2.6.16;
Michael Kerrisk (man-pages) Dec. 4, 2017, 8:58 p.m. UTC | #4
Hello Keno

On 12/03/2017 04:15 AM, Keno Fischer wrote:
> Resending as plain text (apologies to those receiving it twice, and
> those that got an HTML copy; I'm used to my mail client switching that
> over automatically, which for some reason didn't happen here).
> 
> 
> This is exactly the discussion I want to generate, so thank you.
> I should point out that I'm not advocating for anything other
> than clarity of what kernel behavior user space may assume.

So, should the documentation patch be applied at this point, or dropped?

Thanks,

Michael


Keno Fischer Dec. 4, 2017, 9:03 p.m. UTC | #5
Hi Michael,

I was hoping to get a clear statement one way or another from the kernel
maintainers as to whether an EINTR from stat() is supposed to be allowed
kernel behavior (hence the RFC in the subject). If it's not, then I don't think
it should be documented, even if there are buggy filesystems that return it
at the moment.
So I'd say let's hold off on applying this until more people have had a chance
to comment. If it would be more convenient for you, feel free to drop
this from your patch queue and, if appropriate, I'll resend a non-RFC version
of this patch for you to apply once a conclusion has been reached.


Matthew Wilcox Dec. 4, 2017, 10:31 p.m. UTC | #6
On Sat, Dec 02, 2017 at 10:15:33PM -0500, Keno Fischer wrote:
> This is exactly the discussion I want to generate, so thank you.
> I should point out that I'm not advocating for anything other
> than clarity of what kernel behavior user space may assume.

I don't think we tend to document short-lived now-fixed special-case
bugs ... right, Michael?

> On Sat, Dec 2, 2017 at 9:25 PM, Matthew Wilcox <willy@infradead.org> wrote:
> > On Sat, Dec 02, 2017 at 07:23:59PM -0500, Keno Fischer wrote:
> >> The catalyst for this patch was me experiencing EINTR errors when
> >> using the 9p file system. In Linux commit 9523feac, the 9p file
> >> system was changed to use wait_event_killable instead of
> >> wait_event_interruptible, which does indeed address my problem,
> >> but also makes me a bit unhappy, because uninterruptible waits
> >> prevent things like ^C'ing the execution and some debugging
> >> tools which depend on being able to cancel long-running operations
> >> by sending signals.
> >
> > Wait, wait, wait.  killable is not uninterruptible.  It's "can accept
> > a signal if the signal is fatal".  ie userspace will never see it.
> > So, no, it doesn't prevent ^C.  It does prevent the debugging tool you're
> > talking about from working, because it's handling the signal, so it's not
> > fatal.
> 
> This probably shows that I've been in REPL based environments too long,
> that catch SIGINT ;). You are of course correct that a fatal SIGINT would
> still be delivered.

I think ^\ (SIGQUIT) is a good signal that REPL environments don't tend
to catch, and everybody's favourite SIGKILL can't be intercepted.  But
REPL environments are actually a great example of a place where the
prctl() I mentioned would make sense.  When your code is managed, you can
make blanket statements like "All signals are handled correctly", because
the code manager (the REPL environment, the JVM, gdb, whatever) is auditable.
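
The difference between signals that can and cannot be intercepted can be
checked from user space. In the sketch below, the helper can_catch is a
hypothetical name; it relies on the documented behaviour that
sigaction() fails with EINVAL when asked to change the action for
SIGKILL or SIGSTOP.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static void noop_handler(int signo)
{
    (void)signo;
}

/*
 * Hypothetical helper: returns 1 if a handler can be installed for
 * `signo`, 0 otherwise. The previous disposition is restored before
 * returning, so the probe leaves signal handling unchanged.
 */
static int can_catch(int signo)
{
    struct sigaction sa, old;

    assert(signo > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, &old) == -1)
        return 0;   /* e.g. EINVAL for SIGKILL and SIGSTOP */
    (void)sigaction(signo, &old, NULL);
    return 1;
}
```

Here can_catch(SIGQUIT) and can_catch(SIGINT) succeed, while
can_catch(SIGKILL) does not, which is why SIGKILL always counts as fatal
for a killable wait.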

> >> I realize I'm probably 20 years too late here, but it feels like
> >> clarification on what to expect from the kernel would still go a long
> >> way here.
> >
> > A change to user-visible behaviour has to be opt-in.
> 
> I agree. However, it was my impression that stat() can return EINTR
> depending on the file system. Prior to the referenced commit,
> this was certainly true on 9p and I suspect it's not the only network file
> system for which this is true (though prior to my experiencing this
> with 9p, the only time I've ever experienced it was on HPC clusters with
> who knows what code providing the network filesystem). If it is indeed
> the case that an EINTR return from stat() and similar is illegal and
> should be considered a kernel bug, a statement to that effect is all
> I'm looking for here.

I would be happy to make the statement that returning EINTR from stat()
is a kernel bug.  It may be wider-spread than anybody would like, and of
course HPC people do rather tend to emphasise expedience over standards
compliance ;-)
Michael Kerrisk (man-pages) Dec. 19, 2017, 1:57 p.m. UTC | #7
Hi Keno,

On 12/04/2017 10:03 PM, Keno Fischer wrote:
> Hi Michael,
> 
> I was hoping to get a clear statement one way or another from the kernel
> maintainers as to whether an EINTR from stat() is supposed to be allowed
> kernel behavior (hence the RFC in the subject). If it's not, then I don't think
> it should be documented, even if there are buggy filesystems that return it
> at the moment.
> So I'd say let's hold off on applying this until more people have had a chance
> to comment. If it would be more convenient for you, feel free to drop
> this from your patch queue and, if appropriate, I'll resend a non-RFC version
> of this patch for you to apply once a conclusion has been reached.

So, was there any further conclusion on this?

Cheers,

Michael

Michael Kerrisk (man-pages) Dec. 19, 2017, 7:28 p.m. UTC | #8
On 19 December 2017 at 18:52, Keno Fischer <keno@juliacomputing.com> wrote:
> Yes, it seems like an EINTR return should be considered a bug, so please drop
> this from your patch queue. Thanks for the follow up.

Okay -- thanks for the info.

Cheers,

Michael


Patch

diff --git a/man2/stat.2 b/man2/stat.2
index dad9a01..f10235a 100644
--- a/man2/stat.2
+++ b/man2/stat.2
@@ -452,6 +452,11 @@  Invalid flag specified in
 is relative and
 .I dirfd
 is a file descriptor referring to a file other than a directory.
+.TP
+.B EINTR
+The call was interrupted by delivery of a signal caught by a handler; see
+.BR signal (7).
+The possibility of this error is file-system dependent.
 .SH VERSIONS
 .BR fstatat ()
 was added to Linux in kernel 2.6.16;