[2/2] mount: RPC_PROGNOTREGISTERED should not be a permanent error

On Thu, Nov 24 2016, Steve Dickson wrote:

> On 11/22/2016 05:43 PM, NeilBrown wrote:
>> On Wed, Nov 23 2016, Steve Dickson wrote:
>> 
>>> [Resent due to mailman rejecting the HTML subpart]
>> (and the resend included HTML too ... how embarrassing :-)
> Yeah... :-) I guess an upgrade turned it on.
>
>> 
>>>
>>> Hey Neil,
>>>
>>>
>>> On 08/18/2016 09:45 PM, NeilBrown wrote:
>>>> Commit: bf66c9facb8e ("mounts.nfs: v2 and v3 background mounts should retry when server is down.")
>>>>
>>>> changed the behaviour of "bg" mounts so that RPC_PROGNOTREGISTERED,
>>>> which maps to EOPNOTSUPP, is not a permanent error.
>>>> This is useful because when an NFS server starts up there is a small window between
>>>> the moment that rpcbind (or portmap) starts responding to lookup requests,
>>>> and the moment when nfsd registers with rpcbind.  During that window
>>>> rpcbind will reply with RPC_PROGNOTREGISTERED, but mount should not give up.
>>>>
>>>> This same reasoning applies to foreground mounts.  They don't wait for
>>>> as long, but could still hit the window and fail prematurely.
>>>>
>>>> So revert the above patch and instead add EOPNOTSUPP to the list of
>>>> temporary errors known to nfs_is_permanent_error.
>>>>
>>>> Signed-off-by: NeilBrown <neilb@suse.com>
>>>> ---
>>>>  utils/mount/stropts.c |    7 +++----
>>>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c
>>>> index 9de6794c6177..d5dfb5e4a669 100644
>>>> --- a/utils/mount/stropts.c
>>>> +++ b/utils/mount/stropts.c
>>>> @@ -948,6 +948,7 @@ static int nfs_is_permanent_error(int error)
>>>>  	case ETIMEDOUT:
>>>>  	case ECONNREFUSED:
>>>>  	case EHOSTUNREACH:
>>>> +	case EOPNOTSUPP:	/* aka RPC_PROGNOTREGISTERED */
>>> I think this introduced a regression... When the server does not support
>>> a protocol, say UDP, this patch causes the mount to hang forever,
>>> which I don't think we want.
>> 
>> 
>> I think we do want it to wait a while so that the nfs server has a
>> chance to start up.  We have no guarantee that the NFS server will be
>> registered with rpcbind before rpcbind responds to requests.
> I do see this race, but it has to be a small window. With
> Fedora it's only seconds between the time rpcbind starts
> and the time the NFS server starts.
>
>> 
>> I disagree with the "hang forever" description.  I just tested after
>> disabling UDP on an nfs server, and the delay was 2 minutes, 5 seconds
>> before a failure was reported.  It might be longer when trying TCP on a
>> server that only supports UDP.
> Yeah, I did not wait that long... You are much more of a patient man than I ;-)
> I do think this is a regression. Going from an instant failure to one
> that takes over 2 minutes is not a good thing... IMHO.
>
>> 
>> So I think the current behavior is correct.  You might be able to argue
>> that certain error codes should trigger a shorter timeout, but it would
>> need a strong argument.
> Going with the theory that the window is very small, how about
> a retry with a timeout, then a failure?
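The change in the quoted patch boils down to one classification: errors on the temporary list are retried, anything else ends the mount attempt at once. A minimal C sketch of that classification, with EOPNOTSUPP (RPC_PROGNOTREGISTERED) on the temporary list, might look like this. The function name is hypothetical, and only the cases visible in the quoted hunk are listed; the real nfs_is_permanent_error() in utils/mount/stropts.c may handle other errors too.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical sketch, not the nfs-utils code: returns true if the mount
 * should give up immediately, false if the error is worth retrying.
 * Only the cases shown in the quoted hunk are listed here.
 */
static bool sketch_is_permanent_error(int error)
{
	assert(error > 0);	/* callers pass a positive errno value */

	switch (error) {
	case ETIMEDOUT:		/* server did not answer in time */
	case ECONNREFUSED:	/* e.g. rpcbind not listening yet */
	case EHOSTUNREACH:	/* network path not up yet */
	case EOPNOTSUPP:	/* RPC_PROGNOTREGISTERED: nfsd not registered yet */
		return false;	/* temporary: retry */
	default:
		return true;	/* permanent: fail now */
	}
}
```

With this classification, sketch_is_permanent_error(EOPNOTSUPP) is false, so a lookup that lands in the window between rpcbind starting and nfsd registering is retried, while an error such as EACCES still fails at once. Steve's objection follows directly: a server that will never register the requested protocol also answers with EOPNOTSUPP, so it is retried for the whole timeout as well.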

I started looking at changing the timeout and it wouldn't be too hard
(if we can agree on a suitable delay), but I feel I must ask why this is
important.
In what situation are you likely to mount with the wrong protocol and be
unable to just Ctrl-C once you realize what a dumb thing you just did?

If rpcbind isn't running, which is arguably a very similar situation
(no protocols are registered), we have always had a long timeout. Why is
"just one protocol not registered" any different?
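Steve's "retry with a timeout, then a failure" and the comparison with rpcbind not running can be set side by side in one loop: every temporary error is retried up to an overall limit, while a run of consecutive EOPNOTSUPP replies gets a shorter budget of its own. This is a hypothetical C sketch, not the patch mentioned here. All function, type and macro names are invented. The 10-second EOPNOTSUPP budget and the 5-second delay are placeholders, since no value was agreed. The 120-second total is assumed to mirror the 2-minute foreground default of the retry= mount option. A scripted simulation stands in for real mount attempts and sleeps.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy values in seconds; none of these were agreed upon. */
#define SKETCH_RETRY_TOTAL      120	/* assumed retry= foreground default */
#define SKETCH_RETRY_PROGNOTREG  10	/* placeholder budget for EOPNOTSUPP */
#define SKETCH_RETRY_DELAY        5	/* placeholder pause between attempts */

/* One mount attempt: returns 0 on success or a positive errno value. */
typedef int (*sketch_attempt_fn)(void *ctx);
/* Pause for the given number of seconds (real or simulated). */
typedef void (*sketch_wait_fn)(void *ctx, unsigned int secs);

static bool sketch_is_temporary(int error)
{
	switch (error) {
	case ETIMEDOUT:
	case ECONNREFUSED:	/* e.g. rpcbind not running yet */
	case EHOSTUNREACH:
	case EOPNOTSUPP:	/* RPC_PROGNOTREGISTERED: nfsd not registered yet */
		return true;
	default:
		return false;
	}
}

/*
 * Retry temporary errors until SKETCH_RETRY_TOTAL seconds have passed, but
 * give up on a run of consecutive EOPNOTSUPP replies after
 * SKETCH_RETRY_PROGNOTREG seconds. Elapsed time is approximated by summing
 * the pauses; a real implementation would read a clock instead.
 */
static int sketch_mount_with_retry(sketch_attempt_fn attempt,
				   sketch_wait_fn wait_secs, void *ctx)
{
	unsigned int elapsed = 0, run_start = 0;
	int prev_error = 0;

	assert(attempt != NULL && wait_secs != NULL);
	for (;;) {
		int error = attempt(ctx);

		if (error == 0 || !sketch_is_temporary(error))
			return error;		/* success, or a permanent failure */
		if (error != prev_error)
			run_start = elapsed;	/* a new run of this error starts */
		prev_error = error;

		if (elapsed >= SKETCH_RETRY_TOTAL)
			return error;		/* overall budget used up */
		if (error == EOPNOTSUPP &&
		    elapsed - run_start >= SKETCH_RETRY_PROGNOTREG)
			return error;		/* short EOPNOTSUPP budget used up */

		wait_secs(ctx, SKETCH_RETRY_DELAY);
		elapsed += SKETCH_RETRY_DELAY;
	}
}

/* Scripted stand-in for a server: attempt i returns script[i], and
 * attempts past the end keep returning the last entry. */
struct sketch_sim {
	const int *script;
	size_t len;		/* must be at least 1 */
	unsigned int attempts;	/* attempts made so far */
	unsigned int clock;	/* simulated seconds spent pausing */
};

static int sketch_sim_attempt(void *ctx)
{
	struct sketch_sim *s = ctx;
	size_t i = s->attempts < s->len ? s->attempts : s->len - 1;

	s->attempts++;
	return s->script[i];
}

static void sketch_sim_wait(void *ctx, unsigned int secs)
{
	struct sketch_sim *s = ctx;

	s->clock += secs;
}
```

Under these assumptions a server that keeps answering RPC_PROGNOTREGISTERED fails after about 10 seconds, while a missing rpcbind (ECONNREFUSED) still waits the full 2 minutes. Because the short budget counts from the first EOPNOTSUPP in a run, time spent waiting for rpcbind itself does not eat into it. Whether ECONNREFUSED deserves a shorter budget too is left open here, just as it is in the message.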

Anyway, below is the patch I was working on.  I stopped when I wasn't
sure how to handle ECONNREFUSED.

NeilBrown