Message ID: 20210929005641.60861-1-dai.ngo@oracle.com
Series: nfsd: Initial implementation of NFSv4 Courteous Server
On Tue, Sep 28, 2021 at 08:56:39PM -0400, Dai Ngo wrote:
> Hi Bruce,
>
> This series of patches implements the NFSv4 Courteous Server.

Apologies, I keep meaning to get back to this and haven't yet.

I do notice I'm seeing a timeout on pynfs 4.0 test OPEN18.

--b.

> A server which does not immediately expunge the state on lease expiration
> is known as a Courteous Server. A Courteous Server continues to recognize
> previously generated state tokens as valid until conflict arises between
> the expired state and the requests from another client, or the server
> reboots.
>
> The v2 patch includes the following:
>
> . add a new callback, lm_expire_lock, to lock_manager_operations to
>   allow the lock manager to take appropriate action on a conflicting lock.
>
> . handle conflicts of NFSv4 locks with NFSv3/NLM and local locks.
>
> . expire a courtesy client after 24hr if the client has not reconnected.
>
> . do not allow an expired client to become a courtesy client if there are
>   waiters for the client's locks.
>
> . modify client_info_show to show the courtesy client and seconds since
>   the last renew.
>
> . fix a problem with the NFSv4.1 server where it keeps returning
>   SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply after
>   the courtesy client re-connects, causing the client to keep sending
>   BCTS requests to the server.
>
> The v3 patch includes the following:
>
> . modify posix_test_lock to check and resolve conflicting locks
>   to handle NLM TEST and NFSv4 LOCKT requests.
>
> . separate out the fix for the back channel stuck in SEQ4_STATUS_CB_PATH_DOWN.
>
> The v4 patch includes:
>
> . rework nfsd_check_courtesy to avoid a deadlock between fl_lock and client_lock
>   by asking the laundromat thread to destroy the courtesy client.
>
> . handle NFSv4 share reservation conflicts with courtesy clients. This
>   includes conflicts between access mode and deny mode and vice versa.
>
> . drop the patch for the back channel stuck in SEQ4_STATUS_CB_PATH_DOWN.
>
> The v5 patch includes:
>
> . fix recursive locking of file_rwsem from posix_lock_file.
>
> . retest with LOCKDEP enabled.
>
> NOTE: I will submit pynfs tests for the courteous server, including tests
> for share reservation conflicts, in a separate patch.
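[For context, the lm_expire_lock idea in the cover letter amounts to letting the generic file-locking code call back into nfsd when a conflicting lock belongs to an expired (courtesy) client, so that client's state can be torn down and the conflict rechecked. The sketch below shows only that shape and is not compilable as-is: struct lock_manager_operations and struct file_lock are real kernel types, but the callback signature and every helper named here are illustrative assumptions, not the code from this series.]

/* Illustrative sketch only -- not the patch itself. */
struct lock_manager_operations {
	/* ... existing callbacks (lm_notify, lm_grant, lm_break, ...) ... */

	/*
	 * Called when a conflicting lock is found: return true if the
	 * conflicting lock belonged to an expired client and has been
	 * (or will be) released, so the caller may recheck the conflict.
	 */
	bool (*lm_expire_lock)(struct file_lock *fl);	/* assumed signature */
};

static bool nfsd4_lm_expire_lock(struct file_lock *fl)
{
	struct nfs4_client *clp = client_from_lock_owner(fl);	/* hypothetical */

	if (!clp || !is_courtesy_client(clp))			/* hypothetical */
		return false;

	/*
	 * Per the v4 notes above: don't destroy the client here (that risks
	 * a deadlock between fl_lock and client_lock); instead ask the
	 * laundromat thread to destroy the courtesy client.
	 */
	queue_client_for_laundromat(clp);			/* hypothetical */
	return true;
}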
On 10/1/21 1:53 PM, J. Bruce Fields wrote: > On Tue, Sep 28, 2021 at 08:56:39PM -0400, Dai Ngo wrote: >> Hi Bruce, >> >> This series of patches implement the NFSv4 Courteous Server. > Apologies, I keep meaning to get back to this and haven't yet. > > I do notice I'm seeing a timeout on pynfs 4.0 test OPEN18. It's weird, this test passes on my system: [root@nfsvmf25 nfs4.0]# ./testserver.py $server --rundeps -v OPEN18 INIT st_setclientid.testValid : RUNNING INIT st_setclientid.testValid : PASS MKFILE st_open.testOpen : RUNNING MKFILE st_open.testOpen : PASS OPEN18 st_open.testShareConflict1 : RUNNING OPEN18 st_open.testShareConflict1 : PASS ************************************************** INIT st_setclientid.testValid : PASS OPEN18 st_open.testShareConflict1 : PASS MKFILE st_open.testOpen : PASS ************************************************** Command line asked for 3 of 673 tests Of those: 0 Skipped, 0 Failed, 0 Warned, 3 Passed [root@nfsvmf25 nfs4.0]# Do you have a network trace? -Dai > > --b. > >> A server which does not immediately expunge the state on lease expiration >> is known as a Courteous Server. A Courteous Server continues to recognize >> previously generated state tokens as valid until conflict arises between >> the expired state and the requests from another client, or the server >> reboots. >> >> The v2 patch includes the following: >> >> . add new callback, lm_expire_lock, to lock_manager_operations to >> allow the lock manager to take appropriate action with conflict lock. >> >> . handle conflicts of NFSv4 locks with NFSv3/NLM and local locks. >> >> . expire courtesy client after 24hr if client has not reconnected. >> >> . do not allow expired client to become courtesy client if there are >> waiters for client's locks. >> >> . modify client_info_show to show courtesy client and seconds from >> last renew. >> >> . fix a problem with NFSv4.1 server where the it keeps returning >> SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply, after >> the courtesy client re-connects, causing the client to keep sending >> BCTS requests to server. >> >> The v3 patch includes the following: >> >> . modified posix_test_lock to check and resolve conflict locks >> to handle NLM TEST and NFSv4 LOCKT requests. >> >> . separate out fix for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. >> >> The v4 patch includes: >> >> . rework nfsd_check_courtesy to avoid dead lock of fl_lock and client_lock >> by asking the laudromat thread to destroy the courtesy client. >> >> . handle NFSv4 share reservation conflicts with courtesy client. This >> includes conflicts between access mode and deny mode and vice versa. >> >> . drop the patch for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. >> >> The v5 patch includes: >> >> . fix recursive locking of file_rwsem from posix_lock_file. >> >> . retest with LOCKDEP enabled. >> >> NOTE: I will submit pynfs tests for courteous server including tests >> for share reservation conflicts in a separate patch. >>
On Fri, Oct 01, 2021 at 02:41:55PM -0700, dai.ngo@oracle.com wrote: > > On 10/1/21 1:53 PM, J. Bruce Fields wrote: > >On Tue, Sep 28, 2021 at 08:56:39PM -0400, Dai Ngo wrote: > >>Hi Bruce, > >> > >>This series of patches implement the NFSv4 Courteous Server. > >Apologies, I keep meaning to get back to this and haven't yet. > > > >I do notice I'm seeing a timeout on pynfs 4.0 test OPEN18. > > It's weird, this test passes on my system: > > > [root@nfsvmf25 nfs4.0]# ./testserver.py $server --rundeps -v OPEN18 > INIT st_setclientid.testValid : RUNNING > INIT st_setclientid.testValid : PASS > MKFILE st_open.testOpen : RUNNING > MKFILE st_open.testOpen : PASS > OPEN18 st_open.testShareConflict1 : RUNNING > OPEN18 st_open.testShareConflict1 : PASS > ************************************************** > INIT st_setclientid.testValid : PASS > OPEN18 st_open.testShareConflict1 : PASS > MKFILE st_open.testOpen : PASS > ************************************************** > Command line asked for 3 of 673 tests > Of those: 0 Skipped, 0 Failed, 0 Warned, 3 Passed > [root@nfsvmf25 nfs4.0]# > > Do you have a network trace? Yeah, weirdly, I think it's failing only when I run it with all the other pynfs tests, not when I run it alone. I'll check again and see if I can get a trace, probably next week. --b.
Hi Bruce, Just a reminder that this patch is still waiting for your review. Thanks, -Dai On 10/1/21 2:41 PM, dai.ngo@oracle.com wrote: > > On 10/1/21 1:53 PM, J. Bruce Fields wrote: >> On Tue, Sep 28, 2021 at 08:56:39PM -0400, Dai Ngo wrote: >>> Hi Bruce, >>> >>> This series of patches implement the NFSv4 Courteous Server. >> Apologies, I keep meaning to get back to this and haven't yet. >> >> I do notice I'm seeing a timeout on pynfs 4.0 test OPEN18. > > It's weird, this test passes on my system: > > > [root@nfsvmf25 nfs4.0]# ./testserver.py $server --rundeps -v OPEN18 > INIT st_setclientid.testValid : RUNNING > INIT st_setclientid.testValid : PASS > MKFILE st_open.testOpen : RUNNING > MKFILE st_open.testOpen : PASS > OPEN18 st_open.testShareConflict1 : RUNNING > OPEN18 st_open.testShareConflict1 : PASS > ************************************************** > INIT st_setclientid.testValid : PASS > OPEN18 st_open.testShareConflict1 : PASS > MKFILE st_open.testOpen : PASS > ************************************************** > Command line asked for 3 of 673 tests > Of those: 0 Skipped, 0 Failed, 0 Warned, 3 Passed > [root@nfsvmf25 nfs4.0]# > > Do you have a network trace? > > -Dai > >> >> --b. >> >>> A server which does not immediately expunge the state on lease >>> expiration >>> is known as a Courteous Server. A Courteous Server continues to >>> recognize >>> previously generated state tokens as valid until conflict arises >>> between >>> the expired state and the requests from another client, or the server >>> reboots. >>> >>> The v2 patch includes the following: >>> >>> . add new callback, lm_expire_lock, to lock_manager_operations to >>> allow the lock manager to take appropriate action with conflict >>> lock. >>> >>> . handle conflicts of NFSv4 locks with NFSv3/NLM and local locks. >>> >>> . expire courtesy client after 24hr if client has not reconnected. >>> >>> . do not allow expired client to become courtesy client if there are >>> waiters for client's locks. >>> >>> . modify client_info_show to show courtesy client and seconds from >>> last renew. >>> >>> . fix a problem with NFSv4.1 server where the it keeps returning >>> SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply, after >>> the courtesy client re-connects, causing the client to keep sending >>> BCTS requests to server. >>> >>> The v3 patch includes the following: >>> >>> . modified posix_test_lock to check and resolve conflict locks >>> to handle NLM TEST and NFSv4 LOCKT requests. >>> >>> . separate out fix for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. >>> >>> The v4 patch includes: >>> >>> . rework nfsd_check_courtesy to avoid dead lock of fl_lock and >>> client_lock >>> by asking the laudromat thread to destroy the courtesy client. >>> >>> . handle NFSv4 share reservation conflicts with courtesy client. This >>> includes conflicts between access mode and deny mode and vice versa. >>> >>> . drop the patch for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. >>> >>> The v5 patch includes: >>> >>> . fix recursive locking of file_rwsem from posix_lock_file. >>> >>> . retest with LOCKDEP enabled. >>> >>> NOTE: I will submit pynfs tests for courteous server including tests >>> for share reservation conflicts in a separate patch. >>>
On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: > Just a reminder that this patch is still waiting for your review. Yeah, I was procrastinating and hoping yo'ud figure out the pynfs failure for me.... I'll see if I can get some time today.--b. > > Thanks, > -Dai > > On 10/1/21 2:41 PM, dai.ngo@oracle.com wrote: > > > >On 10/1/21 1:53 PM, J. Bruce Fields wrote: > >>On Tue, Sep 28, 2021 at 08:56:39PM -0400, Dai Ngo wrote: > >>>Hi Bruce, > >>> > >>>This series of patches implement the NFSv4 Courteous Server. > >>Apologies, I keep meaning to get back to this and haven't yet. > >> > >>I do notice I'm seeing a timeout on pynfs 4.0 test OPEN18. > > > >It's weird, this test passes on my system: > > > > > >[root@nfsvmf25 nfs4.0]# ./testserver.py $server --rundeps -v OPEN18 > >INIT st_setclientid.testValid : RUNNING > >INIT st_setclientid.testValid : PASS > >MKFILE st_open.testOpen : RUNNING > >MKFILE st_open.testOpen : PASS > >OPEN18 st_open.testShareConflict1 : RUNNING > >OPEN18 st_open.testShareConflict1 : PASS > >************************************************** > >INIT st_setclientid.testValid : PASS > >OPEN18 st_open.testShareConflict1 : PASS > >MKFILE st_open.testOpen : PASS > >************************************************** > >Command line asked for 3 of 673 tests > >Of those: 0 Skipped, 0 Failed, 0 Warned, 3 Passed > >[root@nfsvmf25 nfs4.0]# > > > >Do you have a network trace? > > > >-Dai > > > >> > >>--b. > >> > >>>A server which does not immediately expunge the state on lease > >>>expiration > >>>is known as a Courteous Server. A Courteous Server continues > >>>to recognize > >>>previously generated state tokens as valid until conflict > >>>arises between > >>>the expired state and the requests from another client, or the server > >>>reboots. > >>> > >>>The v2 patch includes the following: > >>> > >>>. add new callback, lm_expire_lock, to lock_manager_operations to > >>> allow the lock manager to take appropriate action with > >>>conflict lock. > >>> > >>>. handle conflicts of NFSv4 locks with NFSv3/NLM and local locks. > >>> > >>>. expire courtesy client after 24hr if client has not reconnected. > >>> > >>>. do not allow expired client to become courtesy client if there are > >>> waiters for client's locks. > >>> > >>>. modify client_info_show to show courtesy client and seconds from > >>> last renew. > >>> > >>>. fix a problem with NFSv4.1 server where the it keeps returning > >>> SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply, after > >>> the courtesy client re-connects, causing the client to keep sending > >>> BCTS requests to server. > >>> > >>>The v3 patch includes the following: > >>> > >>>. modified posix_test_lock to check and resolve conflict locks > >>> to handle NLM TEST and NFSv4 LOCKT requests. > >>> > >>>. separate out fix for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. > >>> > >>>The v4 patch includes: > >>> > >>>. rework nfsd_check_courtesy to avoid dead lock of fl_lock and > >>>client_lock > >>> by asking the laudromat thread to destroy the courtesy client. > >>> > >>>. handle NFSv4 share reservation conflicts with courtesy client. This > >>> includes conflicts between access mode and deny mode and vice versa. > >>> > >>>. drop the patch for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. > >>> > >>>The v5 patch includes: > >>> > >>>. fix recursive locking of file_rwsem from posix_lock_file. > >>> > >>>. retest with LOCKDEP enabled. 
> >>> > >>>NOTE: I will submit pynfs tests for courteous server including tests > >>>for share reservation conflicts in a separate patch. > >>>
On 11/17/21 6:14 AM, J. Bruce Fields wrote: > On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >> Just a reminder that this patch is still waiting for your review. > Yeah, I was procrastinating and hoping yo'ud figure out the pynfs > failure for me.... Last time I ran 4.0 OPEN18 test by itself and it passed. I will run all OPEN tests together with 5.15-rc7 to see if the problem you've seen still there. -Dai > I'll see if I can get some time today.--b. > >> Thanks, >> -Dai >> >> On 10/1/21 2:41 PM, dai.ngo@oracle.com wrote: >>> On 10/1/21 1:53 PM, J. Bruce Fields wrote: >>>> On Tue, Sep 28, 2021 at 08:56:39PM -0400, Dai Ngo wrote: >>>>> Hi Bruce, >>>>> >>>>> This series of patches implement the NFSv4 Courteous Server. >>>> Apologies, I keep meaning to get back to this and haven't yet. >>>> >>>> I do notice I'm seeing a timeout on pynfs 4.0 test OPEN18. >>> It's weird, this test passes on my system: >>> >>> >>> [root@nfsvmf25 nfs4.0]# ./testserver.py $server --rundeps -v OPEN18 >>> INIT st_setclientid.testValid : RUNNING >>> INIT st_setclientid.testValid : PASS >>> MKFILE st_open.testOpen : RUNNING >>> MKFILE st_open.testOpen : PASS >>> OPEN18 st_open.testShareConflict1 : RUNNING >>> OPEN18 st_open.testShareConflict1 : PASS >>> ************************************************** >>> INIT st_setclientid.testValid : PASS >>> OPEN18 st_open.testShareConflict1 : PASS >>> MKFILE st_open.testOpen : PASS >>> ************************************************** >>> Command line asked for 3 of 673 tests >>> Of those: 0 Skipped, 0 Failed, 0 Warned, 3 Passed >>> [root@nfsvmf25 nfs4.0]# >>> >>> Do you have a network trace? >>> >>> -Dai >>> >>>> --b. >>>> >>>>> A server which does not immediately expunge the state on lease >>>>> expiration >>>>> is known as a Courteous Server. A Courteous Server continues >>>>> to recognize >>>>> previously generated state tokens as valid until conflict >>>>> arises between >>>>> the expired state and the requests from another client, or the server >>>>> reboots. >>>>> >>>>> The v2 patch includes the following: >>>>> >>>>> . add new callback, lm_expire_lock, to lock_manager_operations to >>>>> allow the lock manager to take appropriate action with >>>>> conflict lock. >>>>> >>>>> . handle conflicts of NFSv4 locks with NFSv3/NLM and local locks. >>>>> >>>>> . expire courtesy client after 24hr if client has not reconnected. >>>>> >>>>> . do not allow expired client to become courtesy client if there are >>>>> waiters for client's locks. >>>>> >>>>> . modify client_info_show to show courtesy client and seconds from >>>>> last renew. >>>>> >>>>> . fix a problem with NFSv4.1 server where the it keeps returning >>>>> SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply, after >>>>> the courtesy client re-connects, causing the client to keep sending >>>>> BCTS requests to server. >>>>> >>>>> The v3 patch includes the following: >>>>> >>>>> . modified posix_test_lock to check and resolve conflict locks >>>>> to handle NLM TEST and NFSv4 LOCKT requests. >>>>> >>>>> . separate out fix for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. >>>>> >>>>> The v4 patch includes: >>>>> >>>>> . rework nfsd_check_courtesy to avoid dead lock of fl_lock and >>>>> client_lock >>>>> by asking the laudromat thread to destroy the courtesy client. >>>>> >>>>> . handle NFSv4 share reservation conflicts with courtesy client. This >>>>> includes conflicts between access mode and deny mode and vice versa. >>>>> >>>>> . 
drop the patch for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. >>>>> >>>>> The v5 patch includes: >>>>> >>>>> . fix recursive locking of file_rwsem from posix_lock_file. >>>>> >>>>> . retest with LOCKDEP enabled. >>>>> >>>>> NOTE: I will submit pynfs tests for courteous server including tests >>>>> for share reservation conflicts in a separate patch. >>>>>
On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: > > On 11/17/21 6:14 AM, J. Bruce Fields wrote: >> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>> Just a reminder that this patch is still waiting for your review. >> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >> failure for me.... > > Last time I ran 4.0 OPEN18 test by itself and it passed. I will run > all OPEN tests together with 5.15-rc7 to see if the problem you've > seen still there. I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous 5.15-rc7 server. Nfs4.1 results are the same for both courteous and non-courteous server: > Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed Results of nfs4.0 with non-courteous server: >Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed test failed: LOCK24 Results of nfs4.0 with courteous server: >Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed tests failed: LOCK24, OPEN18, OPEN30 OPEN18 and OPEN30 test pass if each is run by itself. I will look into this problem. -Dai > > -Dai > >> I'll see if I can get some time today.--b. >> >>> Thanks, >>> -Dai >>> >>> On 10/1/21 2:41 PM, dai.ngo@oracle.com wrote: >>>> On 10/1/21 1:53 PM, J. Bruce Fields wrote: >>>>> On Tue, Sep 28, 2021 at 08:56:39PM -0400, Dai Ngo wrote: >>>>>> Hi Bruce, >>>>>> >>>>>> This series of patches implement the NFSv4 Courteous Server. >>>>> Apologies, I keep meaning to get back to this and haven't yet. >>>>> >>>>> I do notice I'm seeing a timeout on pynfs 4.0 test OPEN18. >>>> It's weird, this test passes on my system: >>>> >>>> >>>> [root@nfsvmf25 nfs4.0]# ./testserver.py $server --rundeps -v OPEN18 >>>> INIT st_setclientid.testValid : RUNNING >>>> INIT st_setclientid.testValid : PASS >>>> MKFILE st_open.testOpen : RUNNING >>>> MKFILE st_open.testOpen : PASS >>>> OPEN18 st_open.testShareConflict1 : RUNNING >>>> OPEN18 st_open.testShareConflict1 : PASS >>>> ************************************************** >>>> INIT st_setclientid.testValid : PASS >>>> OPEN18 st_open.testShareConflict1 : PASS >>>> MKFILE st_open.testOpen : PASS >>>> ************************************************** >>>> Command line asked for 3 of 673 tests >>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 3 Passed >>>> [root@nfsvmf25 nfs4.0]# >>>> >>>> Do you have a network trace? >>>> >>>> -Dai >>>> >>>>> --b. >>>>> >>>>>> A server which does not immediately expunge the state on lease >>>>>> expiration >>>>>> is known as a Courteous Server. A Courteous Server continues >>>>>> to recognize >>>>>> previously generated state tokens as valid until conflict >>>>>> arises between >>>>>> the expired state and the requests from another client, or the >>>>>> server >>>>>> reboots. >>>>>> >>>>>> The v2 patch includes the following: >>>>>> >>>>>> . add new callback, lm_expire_lock, to lock_manager_operations to >>>>>> allow the lock manager to take appropriate action with >>>>>> conflict lock. >>>>>> >>>>>> . handle conflicts of NFSv4 locks with NFSv3/NLM and local locks. >>>>>> >>>>>> . expire courtesy client after 24hr if client has not reconnected. >>>>>> >>>>>> . do not allow expired client to become courtesy client if there are >>>>>> waiters for client's locks. >>>>>> >>>>>> . modify client_info_show to show courtesy client and seconds from >>>>>> last renew. >>>>>> >>>>>> . 
fix a problem with NFSv4.1 server where the it keeps returning >>>>>> SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply, after >>>>>> the courtesy client re-connects, causing the client to keep >>>>>> sending >>>>>> BCTS requests to server. >>>>>> >>>>>> The v3 patch includes the following: >>>>>> >>>>>> . modified posix_test_lock to check and resolve conflict locks >>>>>> to handle NLM TEST and NFSv4 LOCKT requests. >>>>>> >>>>>> . separate out fix for back channel stuck in >>>>>> SEQ4_STATUS_CB_PATH_DOWN. >>>>>> >>>>>> The v4 patch includes: >>>>>> >>>>>> . rework nfsd_check_courtesy to avoid dead lock of fl_lock and >>>>>> client_lock >>>>>> by asking the laudromat thread to destroy the courtesy client. >>>>>> >>>>>> . handle NFSv4 share reservation conflicts with courtesy client. >>>>>> This >>>>>> includes conflicts between access mode and deny mode and vice >>>>>> versa. >>>>>> >>>>>> . drop the patch for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. >>>>>> >>>>>> The v5 patch includes: >>>>>> >>>>>> . fix recursive locking of file_rwsem from posix_lock_file. >>>>>> >>>>>> . retest with LOCKDEP enabled. >>>>>> >>>>>> NOTE: I will submit pynfs tests for courteous server including tests >>>>>> for share reservation conflicts in a separate patch. >>>>>>
On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: > > On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: > > > >On 11/17/21 6:14 AM, J. Bruce Fields wrote: > >>On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: > >>>Just a reminder that this patch is still waiting for your review. > >>Yeah, I was procrastinating and hoping yo'ud figure out the pynfs > >>failure for me.... > > > >Last time I ran 4.0 OPEN18 test by itself and it passed. I will run > >all OPEN tests together with 5.15-rc7 to see if the problem you've > >seen still there. > > I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous > 5.15-rc7 server. > > Nfs4.1 results are the same for both courteous and non-courteous server: > >Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed > > Results of nfs4.0 with non-courteous server: > >Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed > test failed: LOCK24 > > Results of nfs4.0 with courteous server: > >Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed > tests failed: LOCK24, OPEN18, OPEN30 > > OPEN18 and OPEN30 test pass if each is run by itself. Could well be a bug in the tests, I don't know. > I will look into this problem. Thanks! --b.
On 11/17/21 4:34 PM, J. Bruce Fields wrote: > On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>> Just a reminder that this patch is still waiting for your review. >>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>> failure for me.... >>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>> seen still there. >> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >> 5.15-rc7 server. >> >> Nfs4.1 results are the same for both courteous and non-courteous server: >>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >> Results of nfs4.0 with non-courteous server: >>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >> test failed: LOCK24 >> >> Results of nfs4.0 with courteous server: >>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >> tests failed: LOCK24, OPEN18, OPEN30 >> >> OPEN18 and OPEN30 test pass if each is run by itself. > Could well be a bug in the tests, I don't know. The reason OPEN18 failed was because the test timed out waiting for the reply of an OPEN call. The RPC connection used for the test was configured with 15 secs timeout. Note that OPEN18 only fails when the tests were run with 'all' option, this test passes if it's run by itself. With courteous server, by the time OPEN18 runs, there are about 1026 courtesy 4.0 clients on the server and all of these clients have opened the same file X with WRITE access. These clients were created by the previous tests. After each test completed, since 4.0 does not have session, the client states are not cleaned up immediately on the server and are allowed to become courtesy clients. When OPEN18 runs (about 20 minutes after the 1st test started), it sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the server to check for conflicts with courtesy clients. The loop that checks 1026 courtesy clients for share/access conflict took less than 1 sec. But it took about 55 secs, on my VM, for the server to expire all 1026 courtesy clients. I modified pynfs to configure the 4.0 RPC connection with 60 seconds timeout and OPEN18 now consistently passed. The 4.0 test results are now the same for courteous and non-courteous server: 8 Skipped, 1 Failed, 0 Warned, 577 Passed Note that 4.1 tests do not suffer this timeout problem because the 4.1 clients and sessions are destroyed after each test completes. -Dai >> I will look into this problem. > Thanks! > > --b.
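[To make the numbers above concrete: the OPEN with OPEN4_SHARE_DENY_WRITE cannot be granted until every conflicting courtesy client's state is gone, and at roughly 50ms of teardown per client, ~1026 clients is about 55 seconds, well past pynfs's 15-second RPC timeout. The loop below is a simplified sketch of that server-side work; the real series defers destruction to the laundromat thread, locking is omitted, and the unlabelled helpers are hypothetical.]

/* Simplified sketch (hypothetical helpers, no locking): why OPEN18 times
 * out when ~1026 courtesy clients still hold the test file open for WRITE. */
static __be32 resolve_share_deny_conflicts(struct nfsd_net *nn,
					   struct nfs4_file *fp, u32 deny)
{
	struct nfs4_client *clp, *tmp;

	/* Checking ~1026 clients for a share/access conflict is fast
	 * (under a second in Dai's measurement)... */
	list_for_each_entry_safe(clp, tmp, &nn->client_lru, cl_lru) {
		if (!is_courtesy_client(clp))			/* hypothetical */
			continue;
		if (!client_conflicts_with_deny(clp, fp, deny))	/* hypothetical */
			continue;
		/* ...but tearing each one down (opens, locks, stable
		 * storage record) costs ~50ms, so ~1026 of them adds up
		 * to ~55 seconds before the OPEN can be answered. */
		expire_client(clp);
	}
	return nfs_ok;
}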
Hi Bruce, On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: > > On 11/17/21 4:34 PM, J. Bruce Fields wrote: >> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>> Just a reminder that this patch is still waiting for your review. >>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>> failure for me.... >>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>> seen still there. >>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>> 5.15-rc7 server. >>> >>> Nfs4.1 results are the same for both courteous and non-courteous >>> server: >>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>> Results of nfs4.0 with non-courteous server: >>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>> test failed: LOCK24 >>> >>> Results of nfs4.0 with courteous server: >>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>> tests failed: LOCK24, OPEN18, OPEN30 >>> >>> OPEN18 and OPEN30 test pass if each is run by itself. >> Could well be a bug in the tests, I don't know. > > The reason OPEN18 failed was because the test timed out waiting for > the reply of an OPEN call. The RPC connection used for the test was > configured with 15 secs timeout. Note that OPEN18 only fails when > the tests were run with 'all' option, this test passes if it's run > by itself. > > With courteous server, by the time OPEN18 runs, there are about 1026 > courtesy 4.0 clients on the server and all of these clients have opened > the same file X with WRITE access. These clients were created by the > previous tests. After each test completed, since 4.0 does not have > session, the client states are not cleaned up immediately on the > server and are allowed to become courtesy clients. > > When OPEN18 runs (about 20 minutes after the 1st test started), it > sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the > server to check for conflicts with courtesy clients. The loop that > checks 1026 courtesy clients for share/access conflict took less > than 1 sec. But it took about 55 secs, on my VM, for the server > to expire all 1026 courtesy clients. > > I modified pynfs to configure the 4.0 RPC connection with 60 seconds > timeout and OPEN18 now consistently passed. The 4.0 test results are > now the same for courteous and non-courteous server: > > 8 Skipped, 1 Failed, 0 Warned, 577 Passed > > Note that 4.1 tests do not suffer this timeout problem because the > 4.1 clients and sessions are destroyed after each test completes. Do you want me to send the patch to increase the timeout for pynfs? or is there any other things you think we should do? Thanks, -Dai
On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: > Hi Bruce, > > On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: > > > >On 11/17/21 4:34 PM, J. Bruce Fields wrote: > >>On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: > >>>On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: > >>>>On 11/17/21 6:14 AM, J. Bruce Fields wrote: > >>>>>On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: > >>>>>>Just a reminder that this patch is still waiting for your review. > >>>>>Yeah, I was procrastinating and hoping yo'ud figure out the pynfs > >>>>>failure for me.... > >>>>Last time I ran 4.0 OPEN18 test by itself and it passed. I will run > >>>>all OPEN tests together with 5.15-rc7 to see if the problem you've > >>>>seen still there. > >>>I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous > >>>5.15-rc7 server. > >>> > >>>Nfs4.1 results are the same for both courteous and > >>>non-courteous server: > >>>>Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed > >>>Results of nfs4.0 with non-courteous server: > >>>>Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed > >>>test failed: LOCK24 > >>> > >>>Results of nfs4.0 with courteous server: > >>>>Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed > >>>tests failed: LOCK24, OPEN18, OPEN30 > >>> > >>>OPEN18 and OPEN30 test pass if each is run by itself. > >>Could well be a bug in the tests, I don't know. > > > >The reason OPEN18 failed was because the test timed out waiting for > >the reply of an OPEN call. The RPC connection used for the test was > >configured with 15 secs timeout. Note that OPEN18 only fails when > >the tests were run with 'all' option, this test passes if it's run > >by itself. > > > >With courteous server, by the time OPEN18 runs, there are about 1026 > >courtesy 4.0 clients on the server and all of these clients have opened > >the same file X with WRITE access. These clients were created by the > >previous tests. After each test completed, since 4.0 does not have > >session, the client states are not cleaned up immediately on the > >server and are allowed to become courtesy clients. > > > >When OPEN18 runs (about 20 minutes after the 1st test started), it > >sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the > >server to check for conflicts with courtesy clients. The loop that > >checks 1026 courtesy clients for share/access conflict took less > >than 1 sec. But it took about 55 secs, on my VM, for the server > >to expire all 1026 courtesy clients. > > > >I modified pynfs to configure the 4.0 RPC connection with 60 seconds > >timeout and OPEN18 now consistently passed. The 4.0 test results are > >now the same for courteous and non-courteous server: > > > >8 Skipped, 1 Failed, 0 Warned, 577 Passed > > > >Note that 4.1 tests do not suffer this timeout problem because the > >4.1 clients and sessions are destroyed after each test completes. > > Do you want me to send the patch to increase the timeout for pynfs? > or is there any other things you think we should do? I don't know. 55 seconds to clean up 1026 clients is about 50ms per client, which is pretty slow. I wonder why. I guess it's probably updating the stable storage information. Is /var/lib/nfs/ on your server backed by a hard drive or an SSD or something else? I wonder if that's an argument for limiting the number of courtesy clients. --b.
On 11/29/21 9:30 AM, J. Bruce Fields wrote: > On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >> Hi Bruce, >> >> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>>>> Just a reminder that this patch is still waiting for your review. >>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>>>> failure for me.... >>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>>>> seen still there. >>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>>>> 5.15-rc7 server. >>>>> >>>>> Nfs4.1 results are the same for both courteous and >>>>> non-courteous server: >>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>> Results of nfs4.0 with non-courteous server: >>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>> test failed: LOCK24 >>>>> >>>>> Results of nfs4.0 with courteous server: >>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>> >>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>> Could well be a bug in the tests, I don't know. >>> The reason OPEN18 failed was because the test timed out waiting for >>> the reply of an OPEN call. The RPC connection used for the test was >>> configured with 15 secs timeout. Note that OPEN18 only fails when >>> the tests were run with 'all' option, this test passes if it's run >>> by itself. >>> >>> With courteous server, by the time OPEN18 runs, there are about 1026 >>> courtesy 4.0 clients on the server and all of these clients have opened >>> the same file X with WRITE access. These clients were created by the >>> previous tests. After each test completed, since 4.0 does not have >>> session, the client states are not cleaned up immediately on the >>> server and are allowed to become courtesy clients. >>> >>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>> server to check for conflicts with courtesy clients. The loop that >>> checks 1026 courtesy clients for share/access conflict took less >>> than 1 sec. But it took about 55 secs, on my VM, for the server >>> to expire all 1026 courtesy clients. >>> >>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>> timeout and OPEN18 now consistently passed. The 4.0 test results are >>> now the same for courteous and non-courteous server: >>> >>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>> >>> Note that 4.1 tests do not suffer this timeout problem because the >>> 4.1 clients and sessions are destroyed after each test completes. >> Do you want me to send the patch to increase the timeout for pynfs? >> or is there any other things you think we should do? > I don't know. > > 55 seconds to clean up 1026 clients is about 50ms per client, which is > pretty slow. I wonder why. I guess it's probably updating the stable > storage information. Is /var/lib/nfs/ on your server backed by a hard > drive or an SSD or something else? My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard disk. 
I think a production system that supports this many clients should have faster CPUs, faster storage. > > I wonder if that's an argument for limiting the number of courtesy > clients. I think we might want to treat 4.0 clients a bit differently from 4.1 clients. With 4.0, every client becomes a courtesy client after it is done with the export and unmounts it. Since there is no destroy session/client with 4.0, the courteous server allows the client to stick around and become a courtesy client. So after a while, even with normal usage, there will be lots of 4.0 courtesy clients hanging around, and these clients won't be destroyed until 24hrs later, or until they cause conflicts with other clients. We can reduce the courtesy_client_expiry time for 4.0 clients from 24hrs to 15/20 mins, enough for most network partitions to heal, or limit the number of 4.0 courtesy clients. Or don't support 4.0 clients at all, which is my preference, since I think in general users should skip 4.0 and use 4.1 instead. -Dai
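[Dai's "treat 4.0 differently" suggestion could be as small as a per-minor-version courtesy timeout in the laundromat. A minimal sketch, with made-up constant and helper names; the 24-hour value comes from the cover letter, and the 20 minutes is just the 15/20 minutes floated above. cl_minorversion and cl_time are real nfs4_client fields.]

/* Sketch only: per-minor-version courtesy expiry. */
#define COURTESY_CLIENT_TIMEOUT_V41	(24 * 60 * 60)	/* 24 hours   */
#define COURTESY_CLIENT_TIMEOUT_V40	(20 * 60)	/* 20 minutes */

static bool courtesy_client_expired(struct nfs4_client *clp, time64_t now)
{
	time64_t timeout = clp->cl_minorversion ?
		COURTESY_CLIENT_TIMEOUT_V41 : COURTESY_CLIENT_TIMEOUT_V40;

	/* cl_time is the time of the client's last lease renewal */
	return now - clp->cl_time > timeout;
}

[The other option mentioned above, a simple cap on the number of 4.0 courtesy clients with the oldest expired first, would bound the worst-case teardown work the same way.]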
Hello Dai! > On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: > > > On 11/29/21 9:30 AM, J. Bruce Fields wrote: >> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >>> Hi Bruce, >>> >>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>> Just a reminder that this patch is still waiting for your review. >>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>>>>> failure for me.... >>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>>>>> seen still there. >>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>>>>> 5.15-rc7 server. >>>>>> >>>>>> Nfs4.1 results are the same for both courteous and >>>>>> non-courteous server: >>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>>> Results of nfs4.0 with non-courteous server: >>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>> test failed: LOCK24 >>>>>> >>>>>> Results of nfs4.0 with courteous server: >>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>> >>>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>>> Could well be a bug in the tests, I don't know. >>>> The reason OPEN18 failed was because the test timed out waiting for >>>> the reply of an OPEN call. The RPC connection used for the test was >>>> configured with 15 secs timeout. Note that OPEN18 only fails when >>>> the tests were run with 'all' option, this test passes if it's run >>>> by itself. >>>> >>>> With courteous server, by the time OPEN18 runs, there are about 1026 >>>> courtesy 4.0 clients on the server and all of these clients have opened >>>> the same file X with WRITE access. These clients were created by the >>>> previous tests. After each test completed, since 4.0 does not have >>>> session, the client states are not cleaned up immediately on the >>>> server and are allowed to become courtesy clients. >>>> >>>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>>> server to check for conflicts with courtesy clients. The loop that >>>> checks 1026 courtesy clients for share/access conflict took less >>>> than 1 sec. But it took about 55 secs, on my VM, for the server >>>> to expire all 1026 courtesy clients. >>>> >>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>>> timeout and OPEN18 now consistently passed. The 4.0 test results are >>>> now the same for courteous and non-courteous server: >>>> >>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>> >>>> Note that 4.1 tests do not suffer this timeout problem because the >>>> 4.1 clients and sessions are destroyed after each test completes. >>> Do you want me to send the patch to increase the timeout for pynfs? >>> or is there any other things you think we should do? >> I don't know. >> >> 55 seconds to clean up 1026 clients is about 50ms per client, which is >> pretty slow. I wonder why. I guess it's probably updating the stable >> storage information. 
Is /var/lib/nfs/ on your server backed by a hard >> drive or an SSD or something else? > > My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard > disk. I think a production system that supports this many clients should > have faster CPUs, faster storage. > >> >> I wonder if that's an argument for limiting the number of courtesy >> clients. > > I think we might want to treat 4.0 clients a bit different from 4.1 > clients. With 4.0, every client will become a courtesy client after > the client is done with the export and unmounts it. It should be safe for a server to purge a client's lease immediately if there is no open or lock state associated with it. When an NFSv4.0 client unmounts, all files should be closed at that point, so the server can wait for the lease to expire and purge it normally. Or am I missing something? > Since there is > no destroy session/client with 4.0, the courteous server allows the > client to be around and becomes a courtesy client. So after awhile, > even with normal usage, there will be lots 4.0 courtesy clients > hanging around and these clients won't be destroyed until 24hrs > later, or until they cause conflicts with other clients. > > We can reduce the courtesy_client_expiry time for 4.0 clients from > 24hrs to 15/20 mins, enough for most network partition to heal?, > or limit the number of 4.0 courtesy clients. Or don't support 4.0 > clients at all which is my preference since I think in general users > should skip 4.0 and use 4.1 instead. > > -Dai -- Chuck Lever
On Mon, Nov 29, 2021 at 07:03:12PM +0000, Chuck Lever III wrote: > Hello Dai! > > > > On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: > > > > > > On 11/29/21 9:30 AM, J. Bruce Fields wrote: > >> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: > >>> Hi Bruce, > >>> > >>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: > >>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: > >>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: > >>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: > >>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: > >>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: > >>>>>>>>> Just a reminder that this patch is still waiting for your review. > >>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs > >>>>>>>> failure for me.... > >>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run > >>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've > >>>>>>> seen still there. > >>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous > >>>>>> 5.15-rc7 server. > >>>>>> > >>>>>> Nfs4.1 results are the same for both courteous and > >>>>>> non-courteous server: > >>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed > >>>>>> Results of nfs4.0 with non-courteous server: > >>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed > >>>>>> test failed: LOCK24 > >>>>>> > >>>>>> Results of nfs4.0 with courteous server: > >>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed > >>>>>> tests failed: LOCK24, OPEN18, OPEN30 > >>>>>> > >>>>>> OPEN18 and OPEN30 test pass if each is run by itself. > >>>>> Could well be a bug in the tests, I don't know. > >>>> The reason OPEN18 failed was because the test timed out waiting for > >>>> the reply of an OPEN call. The RPC connection used for the test was > >>>> configured with 15 secs timeout. Note that OPEN18 only fails when > >>>> the tests were run with 'all' option, this test passes if it's run > >>>> by itself. > >>>> > >>>> With courteous server, by the time OPEN18 runs, there are about 1026 > >>>> courtesy 4.0 clients on the server and all of these clients have opened > >>>> the same file X with WRITE access. These clients were created by the > >>>> previous tests. After each test completed, since 4.0 does not have > >>>> session, the client states are not cleaned up immediately on the > >>>> server and are allowed to become courtesy clients. > >>>> > >>>> When OPEN18 runs (about 20 minutes after the 1st test started), it > >>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the > >>>> server to check for conflicts with courtesy clients. The loop that > >>>> checks 1026 courtesy clients for share/access conflict took less > >>>> than 1 sec. But it took about 55 secs, on my VM, for the server > >>>> to expire all 1026 courtesy clients. > >>>> > >>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds > >>>> timeout and OPEN18 now consistently passed. The 4.0 test results are > >>>> now the same for courteous and non-courteous server: > >>>> > >>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed > >>>> > >>>> Note that 4.1 tests do not suffer this timeout problem because the > >>>> 4.1 clients and sessions are destroyed after each test completes. > >>> Do you want me to send the patch to increase the timeout for pynfs? > >>> or is there any other things you think we should do? > >> I don't know. 
> >> > >> 55 seconds to clean up 1026 clients is about 50ms per client, which is > >> pretty slow. I wonder why. I guess it's probably updating the stable > >> storage information. Is /var/lib/nfs/ on your server backed by a hard > >> drive or an SSD or something else? > > > > My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard > > disk. I think a production system that supports this many clients should > > have faster CPUs, faster storage. > > > >> > >> I wonder if that's an argument for limiting the number of courtesy > >> clients. > > > > I think we might want to treat 4.0 clients a bit different from 4.1 > > clients. With 4.0, every client will become a courtesy client after > > the client is done with the export and unmounts it. > > It should be safe for a server to purge a client's lease immediately > if there is no open or lock state associated with it. > > When an NFSv4.0 client unmounts, all files should be closed at that > point, so the server can wait for the lease to expire and purge it > normally. Or am I missing something? Makes sense to me! > > Since there is > > no destroy session/client with 4.0, the courteous server allows the > > client to be around and becomes a courtesy client. So after awhile, > > even with normal usage, there will be lots 4.0 courtesy clients > > hanging around and these clients won't be destroyed until 24hrs > > later, or until they cause conflicts with other clients. > > > > We can reduce the courtesy_client_expiry time for 4.0 clients from > > 24hrs to 15/20 mins, enough for most network partition to heal?, > > or limit the number of 4.0 courtesy clients. Or don't support 4.0 > > clients at all which is my preference since I think in general users > > should skip 4.0 and use 4.1 instead. I'm also totally fine with leaving out 4.0, at least to start. --b.
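[Chuck's point, which both sides agree with above, translates into a simple gate in the laundromat: only keep an expired lease around as a courtesy client if it still owns opens, locks, or delegations. nfsd already has a predicate along these lines, client_has_state() in fs/nfsd/nfs4state.c. A minimal sketch, with the surrounding laundromat plumbing simplified and mark_client_courtesy() hypothetical:]

/* Sketch: when a lease expires in the laundromat, purge clients that
 * hold no state instead of turning them into courtesy clients. */
static void handle_expired_lease(struct nfs4_client *clp)
{
	if (!client_has_state(clp)) {
		/* e.g. a 4.0 client that closed everything and unmounted:
		 * nothing to preserve, so purge the lease immediately. */
		expire_client(clp);
		return;
	}
	mark_client_courtesy(clp);	/* hypothetical */
}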
On 11/29/21 11:03 AM, Chuck Lever III wrote: > Hello Dai! > > >> On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >> >> >> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >>>> Hi Bruce, >>>> >>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>>> Just a reminder that this patch is still waiting for your review. >>>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>>>>>> failure for me.... >>>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>>>>>> seen still there. >>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>>>>>> 5.15-rc7 server. >>>>>>> >>>>>>> Nfs4.1 results are the same for both courteous and >>>>>>> non-courteous server: >>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>> test failed: LOCK24 >>>>>>> >>>>>>> Results of nfs4.0 with courteous server: >>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>> >>>>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>>>> Could well be a bug in the tests, I don't know. >>>>> The reason OPEN18 failed was because the test timed out waiting for >>>>> the reply of an OPEN call. The RPC connection used for the test was >>>>> configured with 15 secs timeout. Note that OPEN18 only fails when >>>>> the tests were run with 'all' option, this test passes if it's run >>>>> by itself. >>>>> >>>>> With courteous server, by the time OPEN18 runs, there are about 1026 >>>>> courtesy 4.0 clients on the server and all of these clients have opened >>>>> the same file X with WRITE access. These clients were created by the >>>>> previous tests. After each test completed, since 4.0 does not have >>>>> session, the client states are not cleaned up immediately on the >>>>> server and are allowed to become courtesy clients. >>>>> >>>>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>>>> server to check for conflicts with courtesy clients. The loop that >>>>> checks 1026 courtesy clients for share/access conflict took less >>>>> than 1 sec. But it took about 55 secs, on my VM, for the server >>>>> to expire all 1026 courtesy clients. >>>>> >>>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>>>> timeout and OPEN18 now consistently passed. The 4.0 test results are >>>>> now the same for courteous and non-courteous server: >>>>> >>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>> >>>>> Note that 4.1 tests do not suffer this timeout problem because the >>>>> 4.1 clients and sessions are destroyed after each test completes. >>>> Do you want me to send the patch to increase the timeout for pynfs? >>>> or is there any other things you think we should do? >>> I don't know. >>> >>> 55 seconds to clean up 1026 clients is about 50ms per client, which is >>> pretty slow. I wonder why. 
I guess it's probably updating the stable >>> storage information. Is /var/lib/nfs/ on your server backed by a hard >>> drive or an SSD or something else? >> My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard >> disk. I think a production system that supports this many clients should >> have faster CPUs, faster storage. >> >>> I wonder if that's an argument for limiting the number of courtesy >>> clients. >> I think we might want to treat 4.0 clients a bit different from 4.1 >> clients. With 4.0, every client will become a courtesy client after >> the client is done with the export and unmounts it. > It should be safe for a server to purge a client's lease immediately > if there is no open or lock state associated with it. In this case, each client has opened files so there are open states associated with them. > > When an NFSv4.0 client unmounts, all files should be closed at that > point, I'm not sure pynfs does proper clean up after each subtest, I will check. There must be state associated with the client in order for it to become courtesy client. > so the server can wait for the lease to expire and purge it > normally. Or am I missing something? When 4.0 client lease expires and there are still states associated with the client then the server allows this client to become courtesy client. -Dai > > >> Since there is >> no destroy session/client with 4.0, the courteous server allows the >> client to be around and becomes a courtesy client. So after awhile, >> even with normal usage, there will be lots 4.0 courtesy clients >> hanging around and these clients won't be destroyed until 24hrs >> later, or until they cause conflicts with other clients. >> >> We can reduce the courtesy_client_expiry time for 4.0 clients from >> 24hrs to 15/20 mins, enough for most network partition to heal?, >> or limit the number of 4.0 courtesy clients. Or don't support 4.0 >> clients at all which is my preference since I think in general users >> should skip 4.0 and use 4.1 instead. >> >> -Dai > -- > Chuck Lever > > >
On 11/29/21 11:13 AM, Bruce Fields wrote: > On Mon, Nov 29, 2021 at 07:03:12PM +0000, Chuck Lever III wrote: >> Hello Dai! >> >> >>> On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>> >>> >>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >>>>> Hi Bruce, >>>>> >>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>>>> Just a reminder that this patch is still waiting for your review. >>>>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>>>>>>> failure for me.... >>>>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>>>>>>> seen still there. >>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>>>>>>> 5.15-rc7 server. >>>>>>>> >>>>>>>> Nfs4.1 results are the same for both courteous and >>>>>>>> non-courteous server: >>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>> test failed: LOCK24 >>>>>>>> >>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>> >>>>>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>>>>> Could well be a bug in the tests, I don't know. >>>>>> The reason OPEN18 failed was because the test timed out waiting for >>>>>> the reply of an OPEN call. The RPC connection used for the test was >>>>>> configured with 15 secs timeout. Note that OPEN18 only fails when >>>>>> the tests were run with 'all' option, this test passes if it's run >>>>>> by itself. >>>>>> >>>>>> With courteous server, by the time OPEN18 runs, there are about 1026 >>>>>> courtesy 4.0 clients on the server and all of these clients have opened >>>>>> the same file X with WRITE access. These clients were created by the >>>>>> previous tests. After each test completed, since 4.0 does not have >>>>>> session, the client states are not cleaned up immediately on the >>>>>> server and are allowed to become courtesy clients. >>>>>> >>>>>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>>>>> server to check for conflicts with courtesy clients. The loop that >>>>>> checks 1026 courtesy clients for share/access conflict took less >>>>>> than 1 sec. But it took about 55 secs, on my VM, for the server >>>>>> to expire all 1026 courtesy clients. >>>>>> >>>>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>>>>> timeout and OPEN18 now consistently passed. The 4.0 test results are >>>>>> now the same for courteous and non-courteous server: >>>>>> >>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>> >>>>>> Note that 4.1 tests do not suffer this timeout problem because the >>>>>> 4.1 clients and sessions are destroyed after each test completes. >>>>> Do you want me to send the patch to increase the timeout for pynfs? >>>>> or is there any other things you think we should do? >>>> I don't know. 
>>>> >>>> 55 seconds to clean up 1026 clients is about 50ms per client, which is >>>> pretty slow. I wonder why. I guess it's probably updating the stable >>>> storage information. Is /var/lib/nfs/ on your server backed by a hard >>>> drive or an SSD or something else? >>> My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard >>> disk. I think a production system that supports this many clients should >>> have faster CPUs, faster storage. >>> >>>> I wonder if that's an argument for limiting the number of courtesy >>>> clients. >>> I think we might want to treat 4.0 clients a bit different from 4.1 >>> clients. With 4.0, every client will become a courtesy client after >>> the client is done with the export and unmounts it. >> It should be safe for a server to purge a client's lease immediately >> if there is no open or lock state associated with it. >> >> When an NFSv4.0 client unmounts, all files should be closed at that >> point, so the server can wait for the lease to expire and purge it >> normally. Or am I missing something? > Makes sense to me! > >>> Since there is >>> no destroy session/client with 4.0, the courteous server allows the >>> client to be around and becomes a courtesy client. So after awhile, >>> even with normal usage, there will be lots 4.0 courtesy clients >>> hanging around and these clients won't be destroyed until 24hrs >>> later, or until they cause conflicts with other clients. >>> >>> We can reduce the courtesy_client_expiry time for 4.0 clients from >>> 24hrs to 15/20 mins, enough for most network partition to heal?, >>> or limit the number of 4.0 courtesy clients. Or don't support 4.0 >>> clients at all which is my preference since I think in general users >>> should skip 4.0 and use 4.1 instead. > I'm also totally fine with leaving out 4.0, at least to start. Ok Bruce, I will submit v6 patch for this. Thanks, -Dai > > --b.
On 11/29/21 11:36 AM, dai.ngo@oracle.com wrote: > > On 11/29/21 11:03 AM, Chuck Lever III wrote: >> Hello Dai! >> >> >>> On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>> >>> >>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >>>>> Hi Bruce, >>>>> >>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com >>>>>>>>>> wrote: >>>>>>>>>>> Just a reminder that this patch is still waiting for your >>>>>>>>>>> review. >>>>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the >>>>>>>>>> pynfs >>>>>>>>>> failure for me.... >>>>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I >>>>>>>>> will run >>>>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem >>>>>>>>> you've >>>>>>>>> seen still there. >>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and >>>>>>>> non-courteous >>>>>>>> 5.15-rc7 server. >>>>>>>> >>>>>>>> Nfs4.1 results are the same for both courteous and >>>>>>>> non-courteous server: >>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>> test failed: LOCK24 >>>>>>>> >>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>> >>>>>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>>>>> Could well be a bug in the tests, I don't know. >>>>>> The reason OPEN18 failed was because the test timed out waiting for >>>>>> the reply of an OPEN call. The RPC connection used for the test was >>>>>> configured with 15 secs timeout. Note that OPEN18 only fails when >>>>>> the tests were run with 'all' option, this test passes if it's run >>>>>> by itself. >>>>>> >>>>>> With courteous server, by the time OPEN18 runs, there are about 1026 >>>>>> courtesy 4.0 clients on the server and all of these clients have >>>>>> opened >>>>>> the same file X with WRITE access. These clients were created by the >>>>>> previous tests. After each test completed, since 4.0 does not have >>>>>> session, the client states are not cleaned up immediately on the >>>>>> server and are allowed to become courtesy clients. >>>>>> >>>>>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>>>>> server to check for conflicts with courtesy clients. The loop that >>>>>> checks 1026 courtesy clients for share/access conflict took less >>>>>> than 1 sec. But it took about 55 secs, on my VM, for the server >>>>>> to expire all 1026 courtesy clients. >>>>>> >>>>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>>>>> timeout and OPEN18 now consistently passed. The 4.0 test results are >>>>>> now the same for courteous and non-courteous server: >>>>>> >>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>> >>>>>> Note that 4.1 tests do not suffer this timeout problem because the >>>>>> 4.1 clients and sessions are destroyed after each test completes. >>>>> Do you want me to send the patch to increase the timeout for pynfs? 
>>>>> or is there any other things you think we should do? >>>> I don't know. >>>> >>>> 55 seconds to clean up 1026 clients is about 50ms per client, which is >>>> pretty slow. I wonder why. I guess it's probably updating the stable >>>> storage information. Is /var/lib/nfs/ on your server backed by a hard >>>> drive or an SSD or something else? >>> My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard >>> disk. I think a production system that supports this many clients >>> should >>> have faster CPUs, faster storage. >>> >>>> I wonder if that's an argument for limiting the number of courtesy >>>> clients. >>> I think we might want to treat 4.0 clients a bit different from 4.1 >>> clients. With 4.0, every client will become a courtesy client after >>> the client is done with the export and unmounts it. >> It should be safe for a server to purge a client's lease immediately >> if there is no open or lock state associated with it. > > In this case, each client has opened files so there are open states > associated with them. > >> >> When an NFSv4.0 client unmounts, all files should be closed at that >> point, > > I'm not sure pynfs does proper clean up after each subtest, I will > check. There must be state associated with the client in order for > it to become courtesy client. The pynfs 4.0 test uses LOOKUP and OPEN with OPEN4_CREATE to create the test file, and uses PUTFH and REMOVE to remove the test file when done. I don't see where the open state associated with the removed file is freed by nfsd_remove. I guess for 4.0 the open state remains valid on the server until the client lease expires. I attached the pcap of the OPEN18 test for reference. -Dai > >> so the server can wait for the lease to expire and purge it >> normally. Or am I missing something? > > When 4.0 client lease expires and there are still states associated > with the client then the server allows this client to become courtesy > client. > > -Dai > >> >> >>> Since there is >>> no destroy session/client with 4.0, the courteous server allows the >>> client to be around and becomes a courtesy client. So after awhile, >>> even with normal usage, there will be lots 4.0 courtesy clients >>> hanging around and these clients won't be destroyed until 24hrs >>> later, or until they cause conflicts with other clients. >>> >>> We can reduce the courtesy_client_expiry time for 4.0 clients from >>> 24hrs to 15/20 mins, enough for most network partition to heal?, >>> or limit the number of 4.0 courtesy clients. Or don't support 4.0 >>> clients at all which is my preference since I think in general users >>> should skip 4.0 and use 4.1 instead. >>> >>> -Dai >> -- >> Chuck Lever >> >> >>
> On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> wrote: > > > On 11/29/21 11:03 AM, Chuck Lever III wrote: >> Hello Dai! >> >> >>> On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>> >>> >>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >>>>> Hi Bruce, >>>>> >>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>>>> Just a reminder that this patch is still waiting for your review. >>>>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>>>>>>> failure for me.... >>>>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>>>>>>> seen still there. >>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>>>>>>> 5.15-rc7 server. >>>>>>>> >>>>>>>> Nfs4.1 results are the same for both courteous and >>>>>>>> non-courteous server: >>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>> test failed: LOCK24 >>>>>>>> >>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>> >>>>>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>>>>> Could well be a bug in the tests, I don't know. >>>>>> The reason OPEN18 failed was because the test timed out waiting for >>>>>> the reply of an OPEN call. The RPC connection used for the test was >>>>>> configured with 15 secs timeout. Note that OPEN18 only fails when >>>>>> the tests were run with 'all' option, this test passes if it's run >>>>>> by itself. >>>>>> >>>>>> With courteous server, by the time OPEN18 runs, there are about 1026 >>>>>> courtesy 4.0 clients on the server and all of these clients have opened >>>>>> the same file X with WRITE access. These clients were created by the >>>>>> previous tests. After each test completed, since 4.0 does not have >>>>>> session, the client states are not cleaned up immediately on the >>>>>> server and are allowed to become courtesy clients. >>>>>> >>>>>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>>>>> server to check for conflicts with courtesy clients. The loop that >>>>>> checks 1026 courtesy clients for share/access conflict took less >>>>>> than 1 sec. But it took about 55 secs, on my VM, for the server >>>>>> to expire all 1026 courtesy clients. >>>>>> >>>>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>>>>> timeout and OPEN18 now consistently passed. The 4.0 test results are >>>>>> now the same for courteous and non-courteous server: >>>>>> >>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>> >>>>>> Note that 4.1 tests do not suffer this timeout problem because the >>>>>> 4.1 clients and sessions are destroyed after each test completes. >>>>> Do you want me to send the patch to increase the timeout for pynfs? >>>>> or is there any other things you think we should do? 
>>>> I don't know. >>>> >>>> 55 seconds to clean up 1026 clients is about 50ms per client, which is >>>> pretty slow. I wonder why. I guess it's probably updating the stable >>>> storage information. Is /var/lib/nfs/ on your server backed by a hard >>>> drive or an SSD or something else? >>> My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard >>> disk. I think a production system that supports this many clients should >>> have faster CPUs, faster storage. >>> >>>> I wonder if that's an argument for limiting the number of courtesy >>>> clients. >>> I think we might want to treat 4.0 clients a bit different from 4.1 >>> clients. With 4.0, every client will become a courtesy client after >>> the client is done with the export and unmounts it. >> It should be safe for a server to purge a client's lease immediately >> if there is no open or lock state associated with it. > > In this case, each client has opened files so there are open states > associated with them. > >> >> When an NFSv4.0 client unmounts, all files should be closed at that >> point, > > I'm not sure pynfs does proper clean up after each subtest, I will > check. There must be state associated with the client in order for > it to become courtesy client. Makes sense. Then a synthetic client like pynfs can DoS a courteous server. >> so the server can wait for the lease to expire and purge it >> normally. Or am I missing something? > > When 4.0 client lease expires and there are still states associated > with the client then the server allows this client to become courtesy > client. I think the same thing happens if an NFSv4.1 client neglects to send DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is broken or malicious, but the server faces the same issue of protecting itself from a DoS attack. IMO you should consider limiting the number of courteous clients the server can hold onto. Let's say that number is 1000. When the server wants to turn a 1001st client into a courteous client, it can simply expire and purge the oldest courteous client on its list. Otherwise, over time, the 24-hour expiry will reduce the set of courteous clients back to zero. What do you think? >>> Since there is >>> no destroy session/client with 4.0, the courteous server allows the >>> client to be around and becomes a courtesy client. So after awhile, >>> even with normal usage, there will be lots 4.0 courtesy clients >>> hanging around and these clients won't be destroyed until 24hrs >>> later, or until they cause conflicts with other clients. >>> >>> We can reduce the courtesy_client_expiry time for 4.0 clients from >>> 24hrs to 15/20 mins, enough for most network partition to heal?, >>> or limit the number of 4.0 courtesy clients. Or don't support 4.0 >>> clients at all which is my preference since I think in general users >>> should skip 4.0 and use 4.1 instead. >>> >>> -Dai >> -- >> Chuck Lever >> >> >> -- Chuck Lever
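Chuck's cap-and-evict proposal maps onto a small LRU. The following sketch uses the kernel list helpers to show the shape of it; the cap value and the function and structure names are assumptions for illustration only.

/*
 * Sketch of the cap-and-evict scheme: before a client is allowed to
 * become the N+1st courtesy client, expire the oldest one.  The cap,
 * struct layout and demo_expire_client() are assumptions made for
 * illustration; locking against the rest of the nfsd state machinery
 * is omitted.
 */
#include <linux/list.h>

#define COURTESY_CLIENT_MAX	1000	/* example cap from the discussion */

struct courtesy_client {
	struct list_head lru;		/* oldest at head, newest at tail */
};

static LIST_HEAD(courtesy_lru);
static unsigned int courtesy_count;

/* Assumed to exist elsewhere: tears down all state held by @cc. */
void demo_expire_client(struct courtesy_client *cc);

/* Admit @cc as a courtesy client, evicting the oldest if over the cap. */
void courtesy_admit(struct courtesy_client *cc)
{
	if (courtesy_count >= COURTESY_CLIENT_MAX) {
		struct courtesy_client *old;

		old = list_first_entry(&courtesy_lru,
				       struct courtesy_client, lru);
		list_del(&old->lru);
		courtesy_count--;
		demo_expire_client(old);
	}
	list_add_tail(&cc->lru, &courtesy_lru);
	courtesy_count++;
}

The expiry itself could just as well be deferred to the periodic scan that already handles lease expiry, rather than being done inline in the admission path.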
On 11/29/21 1:10 PM, Chuck Lever III wrote: > >> On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >> >> >> On 11/29/21 11:03 AM, Chuck Lever III wrote: >>> Hello Dai! >>> >>> >>>> On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>>> >>>> >>>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >>>>>> Hi Bruce, >>>>>> >>>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>>>>> Just a reminder that this patch is still waiting for your review. >>>>>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>>>>>>>> failure for me.... >>>>>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>>>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>>>>>>>> seen still there. >>>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>>>>>>>> 5.15-rc7 server. >>>>>>>>> >>>>>>>>> Nfs4.1 results are the same for both courteous and >>>>>>>>> non-courteous server: >>>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>>> test failed: LOCK24 >>>>>>>>> >>>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>>> >>>>>>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>>>>>> Could well be a bug in the tests, I don't know. >>>>>>> The reason OPEN18 failed was because the test timed out waiting for >>>>>>> the reply of an OPEN call. The RPC connection used for the test was >>>>>>> configured with 15 secs timeout. Note that OPEN18 only fails when >>>>>>> the tests were run with 'all' option, this test passes if it's run >>>>>>> by itself. >>>>>>> >>>>>>> With courteous server, by the time OPEN18 runs, there are about 1026 >>>>>>> courtesy 4.0 clients on the server and all of these clients have opened >>>>>>> the same file X with WRITE access. These clients were created by the >>>>>>> previous tests. After each test completed, since 4.0 does not have >>>>>>> session, the client states are not cleaned up immediately on the >>>>>>> server and are allowed to become courtesy clients. >>>>>>> >>>>>>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>>>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>>>>>> server to check for conflicts with courtesy clients. The loop that >>>>>>> checks 1026 courtesy clients for share/access conflict took less >>>>>>> than 1 sec. But it took about 55 secs, on my VM, for the server >>>>>>> to expire all 1026 courtesy clients. >>>>>>> >>>>>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>>>>>> timeout and OPEN18 now consistently passed. The 4.0 test results are >>>>>>> now the same for courteous and non-courteous server: >>>>>>> >>>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>> >>>>>>> Note that 4.1 tests do not suffer this timeout problem because the >>>>>>> 4.1 clients and sessions are destroyed after each test completes. 
>>>>>> Do you want me to send the patch to increase the timeout for pynfs? >>>>>> or is there any other things you think we should do? >>>>> I don't know. >>>>> >>>>> 55 seconds to clean up 1026 clients is about 50ms per client, which is >>>>> pretty slow. I wonder why. I guess it's probably updating the stable >>>>> storage information. Is /var/lib/nfs/ on your server backed by a hard >>>>> drive or an SSD or something else? >>>> My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard >>>> disk. I think a production system that supports this many clients should >>>> have faster CPUs, faster storage. >>>> >>>>> I wonder if that's an argument for limiting the number of courtesy >>>>> clients. >>>> I think we might want to treat 4.0 clients a bit different from 4.1 >>>> clients. With 4.0, every client will become a courtesy client after >>>> the client is done with the export and unmounts it. >>> It should be safe for a server to purge a client's lease immediately >>> if there is no open or lock state associated with it. >> In this case, each client has opened files so there are open states >> associated with them. >> >>> When an NFSv4.0 client unmounts, all files should be closed at that >>> point, >> I'm not sure pynfs does proper clean up after each subtest, I will >> check. There must be state associated with the client in order for >> it to become courtesy client. > Makes sense. Then a synthetic client like pynfs can DoS a courteous > server. > > >>> so the server can wait for the lease to expire and purge it >>> normally. Or am I missing something? >> When 4.0 client lease expires and there are still states associated >> with the client then the server allows this client to become courtesy >> client. > I think the same thing happens if an NFSv4.1 client neglects to send > DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is broken > or malicious, but the server faces the same issue of protecting > itself from a DoS attack. > > IMO you should consider limiting the number of courteous clients > the server can hold onto. Let's say that number is 1000. When the > server wants to turn a 1001st client into a courteous client, it > can simply expire and purge the oldest courteous client on its > list. Otherwise, over time, the 24-hour expiry will reduce the > set of courteous clients back to zero. > > What do you think? Limiting the number of courteous clients to handle the case of broken/malicious 4.1 clients seems reasonable as a last resort. I think if a malicious 4.1 client could mount the server's export, open a file (to create state) and repeat the same with a different client id, then some basic security has already been broken: unauthorized clients are allowed to mount the server's exports. I think if we have to enforce a limit, it is only to handle seriously buggy 4.1 clients, which should not be the norm. The issue with this is how to pick an optimal number that suits the running server, which can be very slow or very fast. Note that even if we impose a limit, that does not completely solve the problem with the pynfs 4.0 test, since its RPC timeout is configured at 15 secs, which is just enough time to expire 277 clients at 53ms per client, unless we limit it to ~270 clients, which I think is too low. This is what I plan to do: 1. do not support 4.0 courteous clients, for sure. 2. limit the number of courteous clients to 1000 (?), if you still think we need it. Please let me know what you think. 
Thanks, -Dai > > >>>> Since there is >>>> no destroy session/client with 4.0, the courteous server allows the >>>> client to be around and becomes a courtesy client. So after awhile, >>>> even with normal usage, there will be lots 4.0 courtesy clients >>>> hanging around and these clients won't be destroyed until 24hrs >>>> later, or until they cause conflicts with other clients. >>>> >>>> We can reduce the courtesy_client_expiry time for 4.0 clients from >>>> 24hrs to 15/20 mins, enough for most network partition to heal?, >>>> or limit the number of 4.0 courtesy clients. Or don't support 4.0 >>>> clients at all which is my preference since I think in general users >>>> should skip 4.0 and use 4.1 instead. >>>> >>>> -Dai >>> -- >>> Chuck Lever >>> >>> >>> > -- > Chuck Lever > > >
> On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@oracle.com> wrote: > > >> On 11/29/21 1:10 PM, Chuck Lever III wrote: >> >>>> On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>> >>> >>> On 11/29/21 11:03 AM, Chuck Lever III wrote: >>>> Hello Dai! >>>> >>>> >>>>> On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>>>> >>>>> >>>>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >>>>>>> Hi Bruce, >>>>>>> >>>>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>>>>>> Just a reminder that this patch is still waiting for your review. >>>>>>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>>>>>>>>> failure for me.... >>>>>>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>>>>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>>>>>>>>> seen still there. >>>>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>>>>>>>>> 5.15-rc7 server. >>>>>>>>>> >>>>>>>>>> Nfs4.1 results are the same for both courteous and >>>>>>>>>> non-courteous server: >>>>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>>>> test failed: LOCK24 >>>>>>>>>> >>>>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>>>> >>>>>>>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>>>>>>> Could well be a bug in the tests, I don't know. >>>>>>>> The reason OPEN18 failed was because the test timed out waiting for >>>>>>>> the reply of an OPEN call. The RPC connection used for the test was >>>>>>>> configured with 15 secs timeout. Note that OPEN18 only fails when >>>>>>>> the tests were run with 'all' option, this test passes if it's run >>>>>>>> by itself. >>>>>>>> >>>>>>>> With courteous server, by the time OPEN18 runs, there are about 1026 >>>>>>>> courtesy 4.0 clients on the server and all of these clients have opened >>>>>>>> the same file X with WRITE access. These clients were created by the >>>>>>>> previous tests. After each test completed, since 4.0 does not have >>>>>>>> session, the client states are not cleaned up immediately on the >>>>>>>> server and are allowed to become courtesy clients. >>>>>>>> >>>>>>>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>>>>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>>>>>>> server to check for conflicts with courtesy clients. The loop that >>>>>>>> checks 1026 courtesy clients for share/access conflict took less >>>>>>>> than 1 sec. But it took about 55 secs, on my VM, for the server >>>>>>>> to expire all 1026 courtesy clients. >>>>>>>> >>>>>>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>>>>>>> timeout and OPEN18 now consistently passed. 
The 4.0 test results are >>>>>>>> now the same for courteous and non-courteous server: >>>>>>>> >>>>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>> >>>>>>>> Note that 4.1 tests do not suffer this timeout problem because the >>>>>>>> 4.1 clients and sessions are destroyed after each test completes. >>>>>>> Do you want me to send the patch to increase the timeout for pynfs? >>>>>>> or is there any other things you think we should do? >>>>>> I don't know. >>>>>> >>>>>> 55 seconds to clean up 1026 clients is about 50ms per client, which is >>>>>> pretty slow. I wonder why. I guess it's probably updating the stable >>>>>> storage information. Is /var/lib/nfs/ on your server backed by a hard >>>>>> drive or an SSD or something else? >>>>> My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard >>>>> disk. I think a production system that supports this many clients should >>>>> have faster CPUs, faster storage. >>>>> >>>>>> I wonder if that's an argument for limiting the number of courtesy >>>>>> clients. >>>>> I think we might want to treat 4.0 clients a bit different from 4.1 >>>>> clients. With 4.0, every client will become a courtesy client after >>>>> the client is done with the export and unmounts it. >>>> It should be safe for a server to purge a client's lease immediately >>>> if there is no open or lock state associated with it. >>> In this case, each client has opened files so there are open states >>> associated with them. >>> >>>> When an NFSv4.0 client unmounts, all files should be closed at that >>>> point, >>> I'm not sure pynfs does proper clean up after each subtest, I will >>> check. There must be state associated with the client in order for >>> it to become courtesy client. >> Makes sense. Then a synthetic client like pynfs can DoS a courteous >> server. >> >> >>>> so the server can wait for the lease to expire and purge it >>>> normally. Or am I missing something? >>> When 4.0 client lease expires and there are still states associated >>> with the client then the server allows this client to become courtesy >>> client. >> I think the same thing happens if an NFSv4.1 client neglects to send >> DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is broken >> or malicious, but the server faces the same issue of protecting >> itself from a DoS attack. >> >> IMO you should consider limiting the number of courteous clients >> the server can hold onto. Let's say that number is 1000. When the >> server wants to turn a 1001st client into a courteous client, it >> can simply expire and purge the oldest courteous client on its >> list. Otherwise, over time, the 24-hour expiry will reduce the >> set of courteous clients back to zero. >> >> What do you think? > > Limiting the number of courteous clients to handle the cases of > broken/malicious 4.1 clients seems reasonable as the last resort. > > I think if a malicious 4.1 clients could mount the server's export, > opens a file (to create state) and repeats the same with a different > client id then it seems like some basic security was already broken; > allowing unauthorized clients to mount server's exports. You can do this today with AUTH_SYS. I consider it a genuine attack surface. > I think if we have to enforce a limit, then it's only for handling > of seriously buggy 4.1 clients which should not be the norm. The > issue with this is how to pick an optimal number that is suitable > for the running server which can be a very slow or a very fast server. 
> > Note that even if we impose an limit, that does not completely solve > the problem with pynfs 4.0 test since its RPC timeout is configured > with 15 secs which just enough to expire 277 clients based on 53ms > for each client, unless we limit it ~270 clients which I think it's > too low. > > This is what I plan to do: > > 1. do not support 4.0 courteous clients, for sure. Not supporting 4.0 isn’t an option, IMHO. It is a fully supported protocol at this time, and the same exposure exists for 4.1, it’s just a little harder to exploit. If you submit the courteous server patch without support for 4.0, I think it needs to include a plan for how 4.0 will be added later. > 2. limit the number of courteous clients to 1000 (?), if you still > think we need it. I think this limit is necessary. It can be set based on the server’s physical memory size if a dynamic limit is desired. > Pls let me know what you think. > > Thanks, > -Dai > >> >> >>>>> Since there is >>>>> no destroy session/client with 4.0, the courteous server allows the >>>>> client to be around and becomes a courtesy client. So after awhile, >>>>> even with normal usage, there will be lots 4.0 courtesy clients >>>>> hanging around and these clients won't be destroyed until 24hrs >>>>> later, or until they cause conflicts with other clients. >>>>> >>>>> We can reduce the courtesy_client_expiry time for 4.0 clients from >>>>> 24hrs to 15/20 mins, enough for most network partition to heal?, >>>>> or limit the number of 4.0 courtesy clients. Or don't support 4.0 >>>>> clients at all which is my preference since I think in general users >>>>> should skip 4.0 and use 4.1 instead. >>>>> >>>>> -Dai >>>> -- >>>> Chuck Lever >>>> >>>> >>>> >> -- >> Chuck Lever >> >> >>
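For the dynamic limit Chuck mentions, one possibility is to derive the cap from total RAM, roughly in the spirit of other nfsd defaults that scale with available memory. The sketch below is only an example of that approach; the scaling factor is arbitrary, not a tested recommendation.

/*
 * One way to pick the cap dynamically, derived from total RAM.  The
 * scaling factor (one courtesy client per 8 MB, clamped to 100..10000)
 * is an arbitrary example.
 */
#include <linux/mm.h>
#include <linux/minmax.h>

static unsigned long courtesy_client_limit(void)
{
	unsigned long mb = totalram_pages() >> (20 - PAGE_SHIFT);

	return clamp_t(unsigned long, mb / 8, 100, 10000);
}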
On Tue, 2021-11-30 at 01:42 +0000, Chuck Lever III wrote: > > > On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@oracle.com> wrote: > > > > > > > On 11/29/21 1:10 PM, Chuck Lever III wrote: > > > > > > > > On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> > > > > > wrote: > > > > > > > > > > > > On 11/29/21 11:03 AM, Chuck Lever III wrote: > > > > > Hello Dai! > > > > > > > > > > > > > > > > On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> > > > > > > wrote: > > > > > > > > > > > > > > > > > > On 11/29/21 9:30 AM, J. Bruce Fields wrote: > > > > > > > On Mon, Nov 29, 2021 at 09:13:16AM -0800, > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > Hi Bruce, > > > > > > > > > > > > > > > > On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: > > > > > > > > > On 11/17/21 4:34 PM, J. Bruce Fields wrote: > > > > > > > > > > On Wed, Nov 17, 2021 at 01:46:02PM -0800, > > > > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > > > > On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: > > > > > > > > > > > > On 11/17/21 6:14 AM, J. Bruce Fields wrote: > > > > > > > > > > > > > On Tue, Nov 16, 2021 at 03:06:32PM -0800, > > > > > > > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > > > > > > > Just a reminder that this patch is still > > > > > > > > > > > > > > waiting for your review. > > > > > > > > > > > > > Yeah, I was procrastinating and hoping yo'ud > > > > > > > > > > > > > figure out the pynfs > > > > > > > > > > > > > failure for me.... > > > > > > > > > > > > Last time I ran 4.0 OPEN18 test by itself and > > > > > > > > > > > > it passed. I will run > > > > > > > > > > > > all OPEN tests together with 5.15-rc7 to see if > > > > > > > > > > > > the problem you've > > > > > > > > > > > > seen still there. > > > > > > > > > > > I ran all tests in nfsv4.1 and nfsv4.0 with > > > > > > > > > > > courteous and non-courteous > > > > > > > > > > > 5.15-rc7 server. > > > > > > > > > > > > > > > > > > > > > > Nfs4.1 results are the same for both courteous > > > > > > > > > > > and > > > > > > > > > > > non-courteous server: > > > > > > > > > > > > Of those: 0 Skipped, 0 Failed, 0 Warned, 169 > > > > > > > > > > > > Passed > > > > > > > > > > > Results of nfs4.0 with non-courteous server: > > > > > > > > > > > > Of those: 8 Skipped, 1 Failed, 0 Warned, 577 > > > > > > > > > > > > Passed > > > > > > > > > > > test failed: LOCK24 > > > > > > > > > > > > > > > > > > > > > > Results of nfs4.0 with courteous server: > > > > > > > > > > > > Of those: 8 Skipped, 3 Failed, 0 Warned, 575 > > > > > > > > > > > > Passed > > > > > > > > > > > tests failed: LOCK24, OPEN18, OPEN30 > > > > > > > > > > > > > > > > > > > > > > OPEN18 and OPEN30 test pass if each is run by > > > > > > > > > > > itself. > > > > > > > > > > Could well be a bug in the tests, I don't know. > > > > > > > > > The reason OPEN18 failed was because the test timed > > > > > > > > > out waiting for > > > > > > > > > the reply of an OPEN call. The RPC connection used > > > > > > > > > for the test was > > > > > > > > > configured with 15 secs timeout. Note that OPEN18 > > > > > > > > > only fails when > > > > > > > > > the tests were run with 'all' option, this test > > > > > > > > > passes if it's run > > > > > > > > > by itself. > > > > > > > > > > > > > > > > > > With courteous server, by the time OPEN18 runs, there > > > > > > > > > are about 1026 > > > > > > > > > courtesy 4.0 clients on the server and all of these > > > > > > > > > clients have opened > > > > > > > > > the same file X with WRITE access. 
These clients were > > > > > > > > > created by the > > > > > > > > > previous tests. After each test completed, since 4.0 > > > > > > > > > does not have > > > > > > > > > session, the client states are not cleaned up > > > > > > > > > immediately on the > > > > > > > > > server and are allowed to become courtesy clients. > > > > > > > > > > > > > > > > > > When OPEN18 runs (about 20 minutes after the 1st test > > > > > > > > > started), it > > > > > > > > > sends OPEN of file X with OPEN4_SHARE_DENY_WRITE > > > > > > > > > which causes the > > > > > > > > > server to check for conflicts with courtesy clients. > > > > > > > > > The loop that > > > > > > > > > checks 1026 courtesy clients for share/access > > > > > > > > > conflict took less > > > > > > > > > than 1 sec. But it took about 55 secs, on my VM, for > > > > > > > > > the server > > > > > > > > > to expire all 1026 courtesy clients. > > > > > > > > > > > > > > > > > > I modified pynfs to configure the 4.0 RPC connection > > > > > > > > > with 60 seconds > > > > > > > > > timeout and OPEN18 now consistently passed. The 4.0 > > > > > > > > > test results are > > > > > > > > > now the same for courteous and non-courteous server: > > > > > > > > > > > > > > > > > > 8 Skipped, 1 Failed, 0 Warned, 577 Passed > > > > > > > > > > > > > > > > > > Note that 4.1 tests do not suffer this timeout > > > > > > > > > problem because the > > > > > > > > > 4.1 clients and sessions are destroyed after each > > > > > > > > > test completes. > > > > > > > > Do you want me to send the patch to increase the > > > > > > > > timeout for pynfs? > > > > > > > > or is there any other things you think we should do? > > > > > > > I don't know. > > > > > > > > > > > > > > 55 seconds to clean up 1026 clients is about 50ms per > > > > > > > client, which is > > > > > > > pretty slow. I wonder why. I guess it's probably > > > > > > > updating the stable > > > > > > > storage information. Is /var/lib/nfs/ on your server > > > > > > > backed by a hard > > > > > > > drive or an SSD or something else? > > > > > > My server is a virtualbox VM that has 1 CPU, 4GB RAM and > > > > > > 64GB of hard > > > > > > disk. I think a production system that supports this many > > > > > > clients should > > > > > > have faster CPUs, faster storage. > > > > > > > > > > > > > I wonder if that's an argument for limiting the number of > > > > > > > courtesy > > > > > > > clients. > > > > > > I think we might want to treat 4.0 clients a bit different > > > > > > from 4.1 > > > > > > clients. With 4.0, every client will become a courtesy > > > > > > client after > > > > > > the client is done with the export and unmounts it. > > > > > It should be safe for a server to purge a client's lease > > > > > immediately > > > > > if there is no open or lock state associated with it. > > > > In this case, each client has opened files so there are open > > > > states > > > > associated with them. > > > > > > > > > When an NFSv4.0 client unmounts, all files should be closed > > > > > at that > > > > > point, > > > > I'm not sure pynfs does proper clean up after each subtest, I > > > > will > > > > check. There must be state associated with the client in order > > > > for > > > > it to become courtesy client. > > > Makes sense. Then a synthetic client like pynfs can DoS a > > > courteous > > > server. > > > > > > > > > > > so the server can wait for the lease to expire and purge it > > > > > normally. Or am I missing something? 
> > > > When 4.0 client lease expires and there are still states > > > > associated > > > > with the client then the server allows this client to become > > > > courtesy > > > > client. > > > I think the same thing happens if an NFSv4.1 client neglects to > > > send > > > DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is > > > broken > > > or malicious, but the server faces the same issue of protecting > > > itself from a DoS attack. > > > > > > IMO you should consider limiting the number of courteous clients > > > the server can hold onto. Let's say that number is 1000. When the > > > server wants to turn a 1001st client into a courteous client, it > > > can simply expire and purge the oldest courteous client on its > > > list. Otherwise, over time, the 24-hour expiry will reduce the > > > set of courteous clients back to zero. > > > > > > What do you think? > > > > Limiting the number of courteous clients to handle the cases of > > broken/malicious 4.1 clients seems reasonable as the last resort. > > > > I think if a malicious 4.1 clients could mount the server's export, > > opens a file (to create state) and repeats the same with a > > different > > client id then it seems like some basic security was already > > broken; > > allowing unauthorized clients to mount server's exports. > > You can do this today with AUTH_SYS. I consider it a genuine attack > surface. > > > > I think if we have to enforce a limit, then it's only for handling > > of seriously buggy 4.1 clients which should not be the norm. The > > issue with this is how to pick an optimal number that is suitable > > for the running server which can be a very slow or a very fast > > server. > > > > Note that even if we impose an limit, that does not completely > > solve > > the problem with pynfs 4.0 test since its RPC timeout is configured > > with 15 secs which just enough to expire 277 clients based on 53ms > > for each client, unless we limit it ~270 clients which I think it's > > too low. > > > > This is what I plan to do: > > > > 1. do not support 4.0 courteous clients, for sure. > > Not supporting 4.0 isn’t an option, IMHO. It is a fully supported > protocol at this time, and the same exposure exists for 4.1, it’s > just a little harder to exploit. > > If you submit the courteous server patch without support for 4.0, I > think it needs to include a plan for how 4.0 will be added later. > > > Why is there a problem here? The requirements are the same for 4.0 and 4.1 (or 4.2). If the lease under which the courtesy lock was established has expired, then that courtesy lock must be released if some other client requests a lock that conflicts with the cached lock (unless the client breaks the courtesy framework by renewing that original lease before the conflict occurs). Otherwise, it is completely up to the server when it decides to actually release the lock. For NFSv4.1 and NFSv4.2, we have DESTROY_CLIENTID, which tells the server when the client is actually done with the lease, making it easy to determine when it is safe to release all the courtesy locks. However if the client does not send DESTROY_CLIENTID, then we're in the same situation with 4.x (x>0) as we would be with bog standard NFSv4.0. The lease has expired, and so the courtesy locks are liable to being dropped. At Hammerspace we have implemented courtesy locks, and our strategy is that when a conflict occurs, we drop the entire set of courtesy locks so that we don't have to deal with the "some locks were revoked" scenario. 
The reason is that when we originally implemented courtesy locks, the Linux NFSv4 client support for lock revocation was a lot less sophisticated than today. My suggestion is that you might therefore consider starting along this path, and then refining the support to make revocation more nuanced once you are confident that the coarser strategy is working as expected.
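The coarse policy Trond describes, dropping every bit of courtesy state held by the conflicting client rather than revoking a single lock, is straightforward to express. The sketch below is an illustrative model only; its types and helpers are stand-ins, not the real nfsd code paths.

/*
 * Illustrative model of the coarse conflict policy described above:
 * if the conflicting state belongs to a client whose lease has
 * already expired, expire that client completely and let the new
 * request proceed; if the lease is still live, the request loses.
 */
#include <stdbool.h>

struct demo_client {
	bool courtesy;	/* lease expired, state retained */
};

/* Assumed helpers, provided elsewhere in this simplified model. */
bool demo_state_conflicts(struct demo_client *owner, const void *request);
void demo_expire_client(struct demo_client *owner);

/* Returns true if @request may be granted, false if it must be denied. */
bool demo_resolve_conflict(struct demo_client *owner, const void *request)
{
	if (!demo_state_conflicts(owner, request))
		return true;		/* no conflict at all */
	if (!owner->courtesy)
		return false;		/* live client wins */
	demo_expire_client(owner);	/* drop all courtesy state at once */
	return true;			/* then grant the new request */
}

A more nuanced variant would revoke only the conflicting stateid instead of expiring the whole client, which is the refinement discussed further down the thread.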
> On Nov 29, 2021, at 11:08 PM, Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Tue, 2021-11-30 at 01:42 +0000, Chuck Lever III wrote: >> >>>> On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>> >>> >>>> On 11/29/21 1:10 PM, Chuck Lever III wrote: >>>> >>>>>> On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> >>>>>> wrote: >>>>> >>>>> >>>>> On 11/29/21 11:03 AM, Chuck Lever III wrote: >>>>>> Hello Dai! >>>>>> >>>>>> >>>>>>> On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>>>>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, >>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>> Hi Bruce, >>>>>>>>> >>>>>>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, >>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, >>>>>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>>>>> Just a reminder that this patch is still >>>>>>>>>>>>>>> waiting for your review. >>>>>>>>>>>>>> Yeah, I was procrastinating and hoping yo'ud >>>>>>>>>>>>>> figure out the pynfs >>>>>>>>>>>>>> failure for me.... >>>>>>>>>>>>> Last time I ran 4.0 OPEN18 test by itself and >>>>>>>>>>>>> it passed. I will run >>>>>>>>>>>>> all OPEN tests together with 5.15-rc7 to see if >>>>>>>>>>>>> the problem you've >>>>>>>>>>>>> seen still there. >>>>>>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with >>>>>>>>>>>> courteous and non-courteous >>>>>>>>>>>> 5.15-rc7 server. >>>>>>>>>>>> >>>>>>>>>>>> Nfs4.1 results are the same for both courteous >>>>>>>>>>>> and >>>>>>>>>>>> non-courteous server: >>>>>>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 >>>>>>>>>>>>> Passed >>>>>>>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 >>>>>>>>>>>>> Passed >>>>>>>>>>>> test failed: LOCK24 >>>>>>>>>>>> >>>>>>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 >>>>>>>>>>>>> Passed >>>>>>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>>>>>> >>>>>>>>>>>> OPEN18 and OPEN30 test pass if each is run by >>>>>>>>>>>> itself. >>>>>>>>>>> Could well be a bug in the tests, I don't know. >>>>>>>>>> The reason OPEN18 failed was because the test timed >>>>>>>>>> out waiting for >>>>>>>>>> the reply of an OPEN call. The RPC connection used >>>>>>>>>> for the test was >>>>>>>>>> configured with 15 secs timeout. Note that OPEN18 >>>>>>>>>> only fails when >>>>>>>>>> the tests were run with 'all' option, this test >>>>>>>>>> passes if it's run >>>>>>>>>> by itself. >>>>>>>>>> >>>>>>>>>> With courteous server, by the time OPEN18 runs, there >>>>>>>>>> are about 1026 >>>>>>>>>> courtesy 4.0 clients on the server and all of these >>>>>>>>>> clients have opened >>>>>>>>>> the same file X with WRITE access. These clients were >>>>>>>>>> created by the >>>>>>>>>> previous tests. After each test completed, since 4.0 >>>>>>>>>> does not have >>>>>>>>>> session, the client states are not cleaned up >>>>>>>>>> immediately on the >>>>>>>>>> server and are allowed to become courtesy clients. 
>>>>>>>>>> >>>>>>>>>> When OPEN18 runs (about 20 minutes after the 1st test >>>>>>>>>> started), it >>>>>>>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE >>>>>>>>>> which causes the >>>>>>>>>> server to check for conflicts with courtesy clients. >>>>>>>>>> The loop that >>>>>>>>>> checks 1026 courtesy clients for share/access >>>>>>>>>> conflict took less >>>>>>>>>> than 1 sec. But it took about 55 secs, on my VM, for >>>>>>>>>> the server >>>>>>>>>> to expire all 1026 courtesy clients. >>>>>>>>>> >>>>>>>>>> I modified pynfs to configure the 4.0 RPC connection >>>>>>>>>> with 60 seconds >>>>>>>>>> timeout and OPEN18 now consistently passed. The 4.0 >>>>>>>>>> test results are >>>>>>>>>> now the same for courteous and non-courteous server: >>>>>>>>>> >>>>>>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>>>> >>>>>>>>>> Note that 4.1 tests do not suffer this timeout >>>>>>>>>> problem because the >>>>>>>>>> 4.1 clients and sessions are destroyed after each >>>>>>>>>> test completes. >>>>>>>>> Do you want me to send the patch to increase the >>>>>>>>> timeout for pynfs? >>>>>>>>> or is there any other things you think we should do? >>>>>>>> I don't know. >>>>>>>> >>>>>>>> 55 seconds to clean up 1026 clients is about 50ms per >>>>>>>> client, which is >>>>>>>> pretty slow. I wonder why. I guess it's probably >>>>>>>> updating the stable >>>>>>>> storage information. Is /var/lib/nfs/ on your server >>>>>>>> backed by a hard >>>>>>>> drive or an SSD or something else? >>>>>>> My server is a virtualbox VM that has 1 CPU, 4GB RAM and >>>>>>> 64GB of hard >>>>>>> disk. I think a production system that supports this many >>>>>>> clients should >>>>>>> have faster CPUs, faster storage. >>>>>>> >>>>>>>> I wonder if that's an argument for limiting the number of >>>>>>>> courtesy >>>>>>>> clients. >>>>>>> I think we might want to treat 4.0 clients a bit different >>>>>>> from 4.1 >>>>>>> clients. With 4.0, every client will become a courtesy >>>>>>> client after >>>>>>> the client is done with the export and unmounts it. >>>>>> It should be safe for a server to purge a client's lease >>>>>> immediately >>>>>> if there is no open or lock state associated with it. >>>>> In this case, each client has opened files so there are open >>>>> states >>>>> associated with them. >>>>> >>>>>> When an NFSv4.0 client unmounts, all files should be closed >>>>>> at that >>>>>> point, >>>>> I'm not sure pynfs does proper clean up after each subtest, I >>>>> will >>>>> check. There must be state associated with the client in order >>>>> for >>>>> it to become courtesy client. >>>> Makes sense. Then a synthetic client like pynfs can DoS a >>>> courteous >>>> server. >>>> >>>> >>>>>> so the server can wait for the lease to expire and purge it >>>>>> normally. Or am I missing something? >>>>> When 4.0 client lease expires and there are still states >>>>> associated >>>>> with the client then the server allows this client to become >>>>> courtesy >>>>> client. >>>> I think the same thing happens if an NFSv4.1 client neglects to >>>> send >>>> DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is >>>> broken >>>> or malicious, but the server faces the same issue of protecting >>>> itself from a DoS attack. >>>> >>>> IMO you should consider limiting the number of courteous clients >>>> the server can hold onto. Let's say that number is 1000. When the >>>> server wants to turn a 1001st client into a courteous client, it >>>> can simply expire and purge the oldest courteous client on its >>>> list. 
Otherwise, over time, the 24-hour expiry will reduce the >>>> set of courteous clients back to zero. >>>> >>>> What do you think? >>> >>> Limiting the number of courteous clients to handle the cases of >>> broken/malicious 4.1 clients seems reasonable as the last resort. >>> >>> I think if a malicious 4.1 clients could mount the server's export, >>> opens a file (to create state) and repeats the same with a >>> different >>> client id then it seems like some basic security was already >>> broken; >>> allowing unauthorized clients to mount server's exports. >> >> You can do this today with AUTH_SYS. I consider it a genuine attack >> surface. >> >> >>> I think if we have to enforce a limit, then it's only for handling >>> of seriously buggy 4.1 clients which should not be the norm. The >>> issue with this is how to pick an optimal number that is suitable >>> for the running server which can be a very slow or a very fast >>> server. >>> >>> Note that even if we impose an limit, that does not completely >>> solve >>> the problem with pynfs 4.0 test since its RPC timeout is configured >>> with 15 secs which just enough to expire 277 clients based on 53ms >>> for each client, unless we limit it ~270 clients which I think it's >>> too low. >>> >>> This is what I plan to do: >>> >>> 1. do not support 4.0 courteous clients, for sure. >> >> Not supporting 4.0 isn’t an option, IMHO. It is a fully supported >> protocol at this time, and the same exposure exists for 4.1, it’s >> just a little harder to exploit. >> >> If you submit the courteous server patch without support for 4.0, I >> think it needs to include a plan for how 4.0 will be added later. >> >>> > > Why is there a problem here? The requirements are the same for 4.0 and > 4.1 (or 4.2). If the lease under which the courtesy lock was > established has expired, then that courtesy lock must be released if > some other client requests a lock that conflicts with the cached lock > (unless the client breaks the courtesy framework by renewing that > original lease before the conflict occurs). Otherwise, it is completely > up to the server when it decides to actually release the lock. > > For NFSv4.1 and NFSv4.2, we have DESTROY_CLIENTID, which tells the > server when the client is actually done with the lease, making it easy > to determine when it is safe to release all the courtesy locks. However > if the client does not send DESTROY_CLIENTID, then we're in the same > situation with 4.x (x>0) as we would be with bog standard NFSv4.0. The > lease has expired, and so the courtesy locks are liable to being > dropped. I agree the situation is the same for all minor versions. > At Hammerspace we have implemented courtesy locks, and our strategy is > that when a conflict occurs, we drop the entire set of courtesy locks > so that we don't have to deal with the "some locks were revoked" > scenario. The reason is that when we originally implemented courtesy > locks, the Linux NFSv4 client support for lock revocation was a lot > less sophisticated than today. My suggestion is that you might > therefore consider starting along this path, and then refining the > support to make revocation more nuanced once you are confident that the > coarser strategy is working as expected. Dai’s implementation does all that, and takes the coarser approach at the moment. There are plans to explore the more nuanced behavior (by revoking only the conflicting lock instead of dropping the whole lease) after this initial work is merged. 
The issue is there are certain pathological client behaviors (whether malicious or accidental) that can run the server out of resources, since it is holding onto lease state for a much longer time. We are simply trying to design a lease garbage collection scheme to meet that challenge. I think limiting the number of courteous clients is a simple way to do this, but we could also shorten the courtesy lifetime as more clients enter that state, to ensure that they don’t overrun the server’s memory. Another approach might be to add a shrinker that purges the oldest courteous clients when the server comes under memory pressure.
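The last suggestion maps naturally onto the kernel's shrinker interface. The sketch below is written against the shrinker API as it looks in the 5.15-era kernels discussed in this thread; the two courtesy_* helpers it calls are assumptions, not existing nfsd functions.

/*
 * Sketch of the shrinker idea.  The two courtesy_* helpers are
 * assumptions: one reports how many courtesy clients exist, the other
 * expires up to @nr of the oldest and returns how many it expired.
 */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/shrinker.h>

unsigned long courtesy_client_count(void);
unsigned long courtesy_expire_oldest(unsigned long nr);

static unsigned long courtesy_shrink_count(struct shrinker *s,
					   struct shrink_control *sc)
{
	/* Returning 0 tells the MM there is nothing to reclaim here. */
	return courtesy_client_count();
}

static unsigned long courtesy_shrink_scan(struct shrinker *s,
					  struct shrink_control *sc)
{
	/* Under memory pressure, give back the oldest courtesy clients. */
	return courtesy_expire_oldest(sc->nr_to_scan);
}

static struct shrinker courtesy_shrinker = {
	.count_objects	= courtesy_shrink_count,
	.scan_objects	= courtesy_shrink_scan,
	.seeks		= DEFAULT_SEEKS,
};

static int __init courtesy_shrinker_init(void)
{
	return register_shrinker(&courtesy_shrinker);
}
module_init(courtesy_shrinker_init);

That would tie reclaim of courtesy clients to actual memory pressure, instead of (or in addition to) a fixed cap or the 24-hour expiry.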
On Tue, 2021-11-30 at 04:47 +0000, Chuck Lever III wrote: > > > On Nov 29, 2021, at 11:08 PM, Trond Myklebust > > <trondmy@hammerspace.com> wrote: > > > > On Tue, 2021-11-30 at 01:42 +0000, Chuck Lever III wrote: > > > > > > > > On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@oracle.com> > > > > > wrote: > > > > > > > > > > > > > On 11/29/21 1:10 PM, Chuck Lever III wrote: > > > > > > > > > > > > On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On 11/29/21 11:03 AM, Chuck Lever III wrote: > > > > > > > Hello Dai! > > > > > > > > > > > > > > > > > > > > > > On Nov 29, 2021, at 1:32 PM, Dai Ngo > > > > > > > > <dai.ngo@oracle.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > On 11/29/21 9:30 AM, J. Bruce Fields wrote: > > > > > > > > > On Mon, Nov 29, 2021 at 09:13:16AM -0800, > > > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > > > Hi Bruce, > > > > > > > > > > > > > > > > > > > > On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: > > > > > > > > > > > On 11/17/21 4:34 PM, J. Bruce Fields wrote: > > > > > > > > > > > > On Wed, Nov 17, 2021 at 01:46:02PM -0800, > > > > > > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > > > > > > On 11/17/21 9:59 AM, > > > > > > > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > > > > > > > On 11/17/21 6:14 AM, J. Bruce Fields wrote: > > > > > > > > > > > > > > > On Tue, Nov 16, 2021 at 03:06:32PM -0800, > > > > > > > > > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > > > > > > > > > Just a reminder that this patch is > > > > > > > > > > > > > > > > still > > > > > > > > > > > > > > > > waiting for your review. > > > > > > > > > > > > > > > Yeah, I was procrastinating and hoping > > > > > > > > > > > > > > > yo'ud > > > > > > > > > > > > > > > figure out the pynfs > > > > > > > > > > > > > > > failure for me.... > > > > > > > > > > > > > > Last time I ran 4.0 OPEN18 test by itself > > > > > > > > > > > > > > and > > > > > > > > > > > > > > it passed. I will run > > > > > > > > > > > > > > all OPEN tests together with 5.15-rc7 to > > > > > > > > > > > > > > see if > > > > > > > > > > > > > > the problem you've > > > > > > > > > > > > > > seen still there. > > > > > > > > > > > > > I ran all tests in nfsv4.1 and nfsv4.0 with > > > > > > > > > > > > > courteous and non-courteous > > > > > > > > > > > > > 5.15-rc7 server. > > > > > > > > > > > > > > > > > > > > > > > > > > Nfs4.1 results are the same for both > > > > > > > > > > > > > courteous > > > > > > > > > > > > > and > > > > > > > > > > > > > non-courteous server: > > > > > > > > > > > > > > Of those: 0 Skipped, 0 Failed, 0 Warned, > > > > > > > > > > > > > > 169 > > > > > > > > > > > > > > Passed > > > > > > > > > > > > > Results of nfs4.0 with non-courteous server: > > > > > > > > > > > > > > Of those: 8 Skipped, 1 Failed, 0 Warned, > > > > > > > > > > > > > > 577 > > > > > > > > > > > > > > Passed > > > > > > > > > > > > > test failed: LOCK24 > > > > > > > > > > > > > > > > > > > > > > > > > > Results of nfs4.0 with courteous server: > > > > > > > > > > > > > > Of those: 8 Skipped, 3 Failed, 0 Warned, > > > > > > > > > > > > > > 575 > > > > > > > > > > > > > > Passed > > > > > > > > > > > > > tests failed: LOCK24, OPEN18, OPEN30 > > > > > > > > > > > > > > > > > > > > > > > > > > OPEN18 and OPEN30 test pass if each is run by > > > > > > > > > > > > > itself. > > > > > > > > > > > > Could well be a bug in the tests, I don't know. 
> > > > > > > > > > > The reason OPEN18 failed was because the test > > > > > > > > > > > timed > > > > > > > > > > > out waiting for > > > > > > > > > > > the reply of an OPEN call. The RPC connection > > > > > > > > > > > used > > > > > > > > > > > for the test was > > > > > > > > > > > configured with 15 secs timeout. Note that OPEN18 > > > > > > > > > > > only fails when > > > > > > > > > > > the tests were run with 'all' option, this test > > > > > > > > > > > passes if it's run > > > > > > > > > > > by itself. > > > > > > > > > > > > > > > > > > > > > > With courteous server, by the time OPEN18 runs, > > > > > > > > > > > there > > > > > > > > > > > are about 1026 > > > > > > > > > > > courtesy 4.0 clients on the server and all of > > > > > > > > > > > these > > > > > > > > > > > clients have opened > > > > > > > > > > > the same file X with WRITE access. These clients > > > > > > > > > > > were > > > > > > > > > > > created by the > > > > > > > > > > > previous tests. After each test completed, since > > > > > > > > > > > 4.0 > > > > > > > > > > > does not have > > > > > > > > > > > session, the client states are not cleaned up > > > > > > > > > > > immediately on the > > > > > > > > > > > server and are allowed to become courtesy > > > > > > > > > > > clients. > > > > > > > > > > > > > > > > > > > > > > When OPEN18 runs (about 20 minutes after the 1st > > > > > > > > > > > test > > > > > > > > > > > started), it > > > > > > > > > > > sends OPEN of file X with OPEN4_SHARE_DENY_WRITE > > > > > > > > > > > which causes the > > > > > > > > > > > server to check for conflicts with courtesy > > > > > > > > > > > clients. > > > > > > > > > > > The loop that > > > > > > > > > > > checks 1026 courtesy clients for share/access > > > > > > > > > > > conflict took less > > > > > > > > > > > than 1 sec. But it took about 55 secs, on my VM, > > > > > > > > > > > for > > > > > > > > > > > the server > > > > > > > > > > > to expire all 1026 courtesy clients. > > > > > > > > > > > > > > > > > > > > > > I modified pynfs to configure the 4.0 RPC > > > > > > > > > > > connection > > > > > > > > > > > with 60 seconds > > > > > > > > > > > timeout and OPEN18 now consistently passed. The > > > > > > > > > > > 4.0 > > > > > > > > > > > test results are > > > > > > > > > > > now the same for courteous and non-courteous > > > > > > > > > > > server: > > > > > > > > > > > > > > > > > > > > > > 8 Skipped, 1 Failed, 0 Warned, 577 Passed > > > > > > > > > > > > > > > > > > > > > > Note that 4.1 tests do not suffer this timeout > > > > > > > > > > > problem because the > > > > > > > > > > > 4.1 clients and sessions are destroyed after each > > > > > > > > > > > test completes. > > > > > > > > > > Do you want me to send the patch to increase the > > > > > > > > > > timeout for pynfs? > > > > > > > > > > or is there any other things you think we should > > > > > > > > > > do? > > > > > > > > > I don't know. > > > > > > > > > > > > > > > > > > 55 seconds to clean up 1026 clients is about 50ms per > > > > > > > > > client, which is > > > > > > > > > pretty slow. I wonder why. I guess it's probably > > > > > > > > > updating the stable > > > > > > > > > storage information. Is /var/lib/nfs/ on your server > > > > > > > > > backed by a hard > > > > > > > > > drive or an SSD or something else? > > > > > > > > My server is a virtualbox VM that has 1 CPU, 4GB RAM > > > > > > > > and > > > > > > > > 64GB of hard > > > > > > > > disk. 
I think a production system that supports this > > > > > > > > many > > > > > > > > clients should > > > > > > > > have faster CPUs, faster storage. > > > > > > > > > > > > > > > > > I wonder if that's an argument for limiting the > > > > > > > > > number of > > > > > > > > > courtesy > > > > > > > > > clients. > > > > > > > > I think we might want to treat 4.0 clients a bit > > > > > > > > different > > > > > > > > from 4.1 > > > > > > > > clients. With 4.0, every client will become a courtesy > > > > > > > > client after > > > > > > > > the client is done with the export and unmounts it. > > > > > > > It should be safe for a server to purge a client's lease > > > > > > > immediately > > > > > > > if there is no open or lock state associated with it. > > > > > > In this case, each client has opened files so there are > > > > > > open > > > > > > states > > > > > > associated with them. > > > > > > > > > > > > > When an NFSv4.0 client unmounts, all files should be > > > > > > > closed > > > > > > > at that > > > > > > > point, > > > > > > I'm not sure pynfs does proper clean up after each subtest, > > > > > > I > > > > > > will > > > > > > check. There must be state associated with the client in > > > > > > order > > > > > > for > > > > > > it to become courtesy client. > > > > > Makes sense. Then a synthetic client like pynfs can DoS a > > > > > courteous > > > > > server. > > > > > > > > > > > > > > > > > so the server can wait for the lease to expire and purge > > > > > > > it > > > > > > > normally. Or am I missing something? > > > > > > When 4.0 client lease expires and there are still states > > > > > > associated > > > > > > with the client then the server allows this client to > > > > > > become > > > > > > courtesy > > > > > > client. > > > > > I think the same thing happens if an NFSv4.1 client neglects > > > > > to > > > > > send > > > > > DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is > > > > > broken > > > > > or malicious, but the server faces the same issue of > > > > > protecting > > > > > itself from a DoS attack. > > > > > > > > > > IMO you should consider limiting the number of courteous > > > > > clients > > > > > the server can hold onto. Let's say that number is 1000. When > > > > > the > > > > > server wants to turn a 1001st client into a courteous client, > > > > > it > > > > > can simply expire and purge the oldest courteous client on > > > > > its > > > > > list. Otherwise, over time, the 24-hour expiry will reduce > > > > > the > > > > > set of courteous clients back to zero. > > > > > > > > > > What do you think? > > > > > > > > Limiting the number of courteous clients to handle the cases of > > > > broken/malicious 4.1 clients seems reasonable as the last > > > > resort. > > > > > > > > I think if a malicious 4.1 clients could mount the server's > > > > export, > > > > opens a file (to create state) and repeats the same with a > > > > different > > > > client id then it seems like some basic security was already > > > > broken; > > > > allowing unauthorized clients to mount server's exports. > > > > > > You can do this today with AUTH_SYS. I consider it a genuine > > > attack > > > surface. > > > > > > > > > > I think if we have to enforce a limit, then it's only for > > > > handling > > > > of seriously buggy 4.1 clients which should not be the norm. > > > > The > > > > issue with this is how to pick an optimal number that is > > > > suitable > > > > for the running server which can be a very slow or a very fast > > > > server. 
> > > > > > > > Note that even if we impose an limit, that does not completely > > > > solve > > > > the problem with pynfs 4.0 test since its RPC timeout is > > > > configured > > > > with 15 secs which just enough to expire 277 clients based on > > > > 53ms > > > > for each client, unless we limit it ~270 clients which I think > > > > it's > > > > too low. > > > > > > > > This is what I plan to do: > > > > > > > > 1. do not support 4.0 courteous clients, for sure. > > > > > > Not supporting 4.0 isn’t an option, IMHO. It is a fully supported > > > protocol at this time, and the same exposure exists for 4.1, it’s > > > just a little harder to exploit. > > > > > > If you submit the courteous server patch without support for 4.0, > > > I > > > think it needs to include a plan for how 4.0 will be added later. > > > > > > > > > > > Why is there a problem here? The requirements are the same for 4.0 > > and > > 4.1 (or 4.2). If the lease under which the courtesy lock was > > established has expired, then that courtesy lock must be released > > if > > some other client requests a lock that conflicts with the cached > > lock > > (unless the client breaks the courtesy framework by renewing that > > original lease before the conflict occurs). Otherwise, it is > > completely > > up to the server when it decides to actually release the lock. > > > > For NFSv4.1 and NFSv4.2, we have DESTROY_CLIENTID, which tells the > > server when the client is actually done with the lease, making it > > easy > > to determine when it is safe to release all the courtesy locks. > > However > > if the client does not send DESTROY_CLIENTID, then we're in the > > same > > situation with 4.x (x>0) as we would be with bog standard NFSv4.0. > > The > > lease has expired, and so the courtesy locks are liable to being > > dropped. > > I agree the situation is the same for all minor versions. > > > > At Hammerspace we have implemented courtesy locks, and our strategy > > is > > that when a conflict occurs, we drop the entire set of courtesy > > locks > > so that we don't have to deal with the "some locks were revoked" > > scenario. The reason is that when we originally implemented > > courtesy > > locks, the Linux NFSv4 client support for lock revocation was a lot > > less sophisticated than today. My suggestion is that you might > > therefore consider starting along this path, and then refining the > > support to make revocation more nuanced once you are confident that > > the > > coarser strategy is working as expected. > > Dai’s implementation does all that, and takes the coarser approach at > the moment. There are plans to explore the more nuanced behavior (by > revoking only the conflicting lock instead of dropping the whole > lease) after this initial work is merged. > > The issue is there are certain pathological client behaviors (whether > malicious or accidental) that can run the server out of resources, > since it is holding onto lease state for a much longer time. We are > simply trying to design a lease garbage collection scheme to meet > that challenge. > > I think limiting the number of courteous clients is a simple way to > do this, but we could also shorten the courtesy lifetime as more > clients enter that state, to ensure that they don’t overrun the > server’s memory. Another approach might be to add a shrinker that > purges the oldest courteous clients when the server comes under > memory pressure. > > We already have a scanner that tries to release all client state after 1 lease period. 
Just extend that to do it after 10 lease periods. If a network partition hasn't recovered after 10 minutes, you probably have bigger problems. You can limit the number of clients as well, but that leads into a rat's nest of other issues that have nothing to do with courtesy locks and everything to do with the fact that any client can hold a lot of state.
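The scanner referred to here is the periodic laundromat scan that nfsd already runs. As a rough, self-contained illustration of the policy change being suggested (keep courtesy state for N lease periods instead of one), here is a small userspace C model; the struct, constants, and function names are invented for the example and are not the nfsd data structures:

/* Illustrative userspace model of a laundromat-style scan; the struct
 * and function names are hypothetical, not taken from fs/nfsd. */
#include <stdio.h>
#include <time.h>

#define LEASE_SECONDS   90      /* nfsd's default lease time is 90 seconds */
#define COURTESY_LEASES 10      /* keep courtesy state for 10 lease periods */

struct client {
        int     id;
        time_t  last_renew;     /* last RENEW/SEQUENCE seen from this client */
        int     has_state;      /* still holds opens/locks */
};

/* Return 1 if the scan should drop all state for this client. */
static int client_expired(const struct client *clp, time_t now)
{
        time_t grace = clp->has_state ?
                (time_t)COURTESY_LEASES * LEASE_SECONDS : LEASE_SECONDS;

        return now - clp->last_renew > grace;
}

static void laundromat_scan(struct client *clients, int n, time_t now)
{
        for (int i = 0; i < n; i++)
                if (client_expired(&clients[i], now))
                        printf("expiring client %d\n", clients[i].id);
}

int main(void)
{
        time_t now = time(NULL);
        struct client clients[] = {
                { 1, now - 2 * LEASE_SECONDS,  0 },  /* idle, no state: expired */
                { 2, now - 5 * LEASE_SECONDS,  1 },  /* courtesy client: kept */
                { 3, now - 20 * LEASE_SECONDS, 1 },  /* stale courtesy: expired */
        };

        laundromat_scan(clients, 3, now);
        return 0;
}

The only change the suggestion requires is the longer grace multiplier for clients that still hold state; everything else is the scan the server already performs once per lease period.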
On 11/29/21 5:42 PM, Chuck Lever III wrote: >> On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >> >> >>> On 11/29/21 1:10 PM, Chuck Lever III wrote: >>> >>>>> On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>>> >>>> On 11/29/21 11:03 AM, Chuck Lever III wrote: >>>>> Hello Dai! >>>>> >>>>> >>>>>> On Nov 29, 2021, at 1:32 PM, Dai Ngo <dai.ngo@oracle.com> wrote: >>>>>> >>>>>> >>>>>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>>>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@oracle.com wrote: >>>>>>>> Hi Bruce, >>>>>>>> >>>>>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>>>> On 11/17/21 9:59 AM, dai.ngo@oracle.com wrote: >>>>>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@oracle.com wrote: >>>>>>>>>>>>>> Just a reminder that this patch is still waiting for your review. >>>>>>>>>>>>> Yeah, I was procrastinating and hoping yo'ud figure out the pynfs >>>>>>>>>>>>> failure for me.... >>>>>>>>>>>> Last time I ran 4.0 OPEN18 test by itself and it passed. I will run >>>>>>>>>>>> all OPEN tests together with 5.15-rc7 to see if the problem you've >>>>>>>>>>>> seen still there. >>>>>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous >>>>>>>>>>> 5.15-rc7 server. >>>>>>>>>>> >>>>>>>>>>> Nfs4.1 results are the same for both courteous and >>>>>>>>>>> non-courteous server: >>>>>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed >>>>>>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>>>>> test failed: LOCK24 >>>>>>>>>>> >>>>>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed >>>>>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>>>>> >>>>>>>>>>> OPEN18 and OPEN30 test pass if each is run by itself. >>>>>>>>>> Could well be a bug in the tests, I don't know. >>>>>>>>> The reason OPEN18 failed was because the test timed out waiting for >>>>>>>>> the reply of an OPEN call. The RPC connection used for the test was >>>>>>>>> configured with 15 secs timeout. Note that OPEN18 only fails when >>>>>>>>> the tests were run with 'all' option, this test passes if it's run >>>>>>>>> by itself. >>>>>>>>> >>>>>>>>> With courteous server, by the time OPEN18 runs, there are about 1026 >>>>>>>>> courtesy 4.0 clients on the server and all of these clients have opened >>>>>>>>> the same file X with WRITE access. These clients were created by the >>>>>>>>> previous tests. After each test completed, since 4.0 does not have >>>>>>>>> session, the client states are not cleaned up immediately on the >>>>>>>>> server and are allowed to become courtesy clients. >>>>>>>>> >>>>>>>>> When OPEN18 runs (about 20 minutes after the 1st test started), it >>>>>>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the >>>>>>>>> server to check for conflicts with courtesy clients. The loop that >>>>>>>>> checks 1026 courtesy clients for share/access conflict took less >>>>>>>>> than 1 sec. But it took about 55 secs, on my VM, for the server >>>>>>>>> to expire all 1026 courtesy clients. >>>>>>>>> >>>>>>>>> I modified pynfs to configure the 4.0 RPC connection with 60 seconds >>>>>>>>> timeout and OPEN18 now consistently passed. 
The 4.0 test results are >>>>>>>>> now the same for courteous and non-courteous server: >>>>>>>>> >>>>>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>>> >>>>>>>>> Note that 4.1 tests do not suffer this timeout problem because the >>>>>>>>> 4.1 clients and sessions are destroyed after each test completes. >>>>>>>> Do you want me to send the patch to increase the timeout for pynfs? >>>>>>>> or is there any other things you think we should do? >>>>>>> I don't know. >>>>>>> >>>>>>> 55 seconds to clean up 1026 clients is about 50ms per client, which is >>>>>>> pretty slow. I wonder why. I guess it's probably updating the stable >>>>>>> storage information. Is /var/lib/nfs/ on your server backed by a hard >>>>>>> drive or an SSD or something else? >>>>>> My server is a virtualbox VM that has 1 CPU, 4GB RAM and 64GB of hard >>>>>> disk. I think a production system that supports this many clients should >>>>>> have faster CPUs, faster storage. >>>>>> >>>>>>> I wonder if that's an argument for limiting the number of courtesy >>>>>>> clients. >>>>>> I think we might want to treat 4.0 clients a bit different from 4.1 >>>>>> clients. With 4.0, every client will become a courtesy client after >>>>>> the client is done with the export and unmounts it. >>>>> It should be safe for a server to purge a client's lease immediately >>>>> if there is no open or lock state associated with it. >>>> In this case, each client has opened files so there are open states >>>> associated with them. >>>> >>>>> When an NFSv4.0 client unmounts, all files should be closed at that >>>>> point, >>>> I'm not sure pynfs does proper clean up after each subtest, I will >>>> check. There must be state associated with the client in order for >>>> it to become courtesy client. >>> Makes sense. Then a synthetic client like pynfs can DoS a courteous >>> server. >>> >>> >>>>> so the server can wait for the lease to expire and purge it >>>>> normally. Or am I missing something? >>>> When 4.0 client lease expires and there are still states associated >>>> with the client then the server allows this client to become courtesy >>>> client. >>> I think the same thing happens if an NFSv4.1 client neglects to send >>> DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is broken >>> or malicious, but the server faces the same issue of protecting >>> itself from a DoS attack. >>> >>> IMO you should consider limiting the number of courteous clients >>> the server can hold onto. Let's say that number is 1000. When the >>> server wants to turn a 1001st client into a courteous client, it >>> can simply expire and purge the oldest courteous client on its >>> list. Otherwise, over time, the 24-hour expiry will reduce the >>> set of courteous clients back to zero. >>> >>> What do you think? >> Limiting the number of courteous clients to handle the cases of >> broken/malicious 4.1 clients seems reasonable as the last resort. >> >> I think if a malicious 4.1 clients could mount the server's export, >> opens a file (to create state) and repeats the same with a different >> client id then it seems like some basic security was already broken; >> allowing unauthorized clients to mount server's exports. > You can do this today with AUTH_SYS. I consider it a genuine attack surface. > > >> I think if we have to enforce a limit, then it's only for handling >> of seriously buggy 4.1 clients which should not be the norm. 
The >> issue with this is how to pick an optimal number that is suitable >> for the running server which can be a very slow or a very fast server. >> >> Note that even if we impose an limit, that does not completely solve >> the problem with pynfs 4.0 test since its RPC timeout is configured >> with 15 secs which just enough to expire 277 clients based on 53ms >> for each client, unless we limit it ~270 clients which I think it's >> too low. >> >> This is what I plan to do: >> >> 1. do not support 4.0 courteous clients, for sure. > Not supporting 4.0 isn’t an option, IMHO. It is a fully supported protocol at this time, and the same exposure exists for 4.1, it’s just a little harder to exploit. > > If you submit the courteous server patch without support for 4.0, I think it needs to include a plan for how 4.0 will be added later. Seems like we should support both 4.0 and 4.x (x>=1) at the same time. > > >> 2. limit the number of courteous clients to 1000 (?), if you still >> think we need it. > I think this limit is necessary. It can be set based on the server’s physical memory size if a dynamic limit is desired. Just to be clear, the problem of pynfs with 4.0 is that the server takes ~55 secs to expire 1026 4.0 courteous clients, which comes out to ~50ms per client. This causes the test to time out while waiting for the RPC reply of the OPEN that triggers the conflict. I don't know exactly where the time is spent in the process of expiring a client. But as Bruce mentioned, it could be related to the time needed to access /var/lib/nfs to remove the client's persistent record. I think that is most likely the case, because the number of states owned by each client should be small since each test is short and does only simple ops. So I think this problem is related to the number of clients, not to the number of states owned by the clients. This is not the memory-exhaustion problem caused by holding too much state, which we plan to address after this initial phase. I'd vote to use a static limit for now, say 1000 clients, to avoid complicating the courteous server code for something that would not happen most of the time. -Dai > > >> Pls let me know what you think. >> >> Thanks, >> -Dai >> >>> >>>>>> Since there is >>>>>> no destroy session/client with 4.0, the courteous server allows the >>>>>> client to be around and becomes a courtesy client. So after awhile, >>>>>> even with normal usage, there will be lots 4.0 courtesy clients >>>>>> hanging around and these clients won't be destroyed until 24hrs >>>>>> later, or until they cause conflicts with other clients. >>>>>> >>>>>> We can reduce the courtesy_client_expiry time for 4.0 clients from >>>>>> 24hrs to 15/20 mins, enough for most network partition to heal?, >>>>>> or limit the number of 4.0 courtesy clients. Or don't support 4.0 >>>>>> clients at all which is my preference since I think in general users >>>>>> should skip 4.0 and use 4.1 instead. >>>>>> >>>>>> -Dai >>>>> -- >>>>> Chuck Lever >>>>> >>>>> >>>>> >>> -- >>> Chuck Lever >>> >>> >>>
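The static cap under discussion above (expire the oldest courtesy client when admitting a new one would exceed a fixed limit) can be modelled in a few lines of self-contained userspace C. The names, the limit of 1000, and the list handling are illustrative only and are not taken from the nfsd sources:

/* Illustrative userspace model of a fixed cap on courtesy clients;
 * all names are hypothetical, not from fs/nfsd. */
#include <stdio.h>
#include <stdlib.h>

#define MAX_COURTESY_CLIENTS 1000

struct courtesy_client {
        int id;
        struct courtesy_client *next;   /* singly linked, oldest first */
};

static struct courtesy_client *oldest, *newest;
static int nr_courtesy;

static void purge_oldest(void)
{
        struct courtesy_client *victim = oldest;

        if (!victim)
                return;
        oldest = victim->next;
        if (!oldest)
                newest = NULL;
        nr_courtesy--;
        printf("purging courtesy client %d\n", victim->id);
        free(victim);
}

/* Called when a client's lease expires while it still holds state. */
static void make_courtesy_client(int id)
{
        struct courtesy_client *clp = malloc(sizeof(*clp));

        if (!clp)
                return;
        if (nr_courtesy >= MAX_COURTESY_CLIENTS)
                purge_oldest();         /* make room: drop the oldest one */

        clp->id = id;
        clp->next = NULL;
        if (newest)
                newest->next = clp;
        else
                oldest = clp;
        newest = clp;
        nr_courtesy++;
}

int main(void)
{
        for (int i = 0; i < MAX_COURTESY_CLIENTS + 3; i++)
                make_courtesy_client(i);
        printf("courtesy clients held: %d\n", nr_courtesy);
        return 0;
}

A dynamic variant would replace MAX_COURTESY_CLIENTS with a value derived from the server's physical memory, as suggested earlier in the thread.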
On 11/29/21 8:57 PM, Trond Myklebust wrote: > On Tue, 2021-11-30 at 04:47 +0000, Chuck Lever III wrote: >>> On Nov 29, 2021, at 11:08 PM, Trond Myklebust >>> <trondmy@hammerspace.com> wrote: >>> >>> On Tue, 2021-11-30 at 01:42 +0000, Chuck Lever III wrote: >>>>>> On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@oracle.com> >>>>>> wrote: >>>>> >>>>>> On 11/29/21 1:10 PM, Chuck Lever III wrote: >>>>>> >>>>>>>> On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> >>>>>>>> wrote: >>>>>>> >>>>>>> On 11/29/21 11:03 AM, Chuck Lever III wrote: >>>>>>>> Hello Dai! >>>>>>>> >>>>>>>> >>>>>>>>> On Nov 29, 2021, at 1:32 PM, Dai Ngo >>>>>>>>> <dai.ngo@oracle.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>>>>>>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, >>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>> Hi Bruce, >>>>>>>>>>> >>>>>>>>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>>>>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>>>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, >>>>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>>>> On 11/17/21 9:59 AM, >>>>>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, >>>>>>>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>>>>>>> Just a reminder that this patch is >>>>>>>>>>>>>>>>> still >>>>>>>>>>>>>>>>> waiting for your review. >>>>>>>>>>>>>>>> Yeah, I was procrastinating and hoping >>>>>>>>>>>>>>>> yo'ud >>>>>>>>>>>>>>>> figure out the pynfs >>>>>>>>>>>>>>>> failure for me.... >>>>>>>>>>>>>>> Last time I ran 4.0 OPEN18 test by itself >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> it passed. I will run >>>>>>>>>>>>>>> all OPEN tests together with 5.15-rc7 to >>>>>>>>>>>>>>> see if >>>>>>>>>>>>>>> the problem you've >>>>>>>>>>>>>>> seen still there. >>>>>>>>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with >>>>>>>>>>>>>> courteous and non-courteous >>>>>>>>>>>>>> 5.15-rc7 server. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Nfs4.1 results are the same for both >>>>>>>>>>>>>> courteous >>>>>>>>>>>>>> and >>>>>>>>>>>>>> non-courteous server: >>>>>>>>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, >>>>>>>>>>>>>>> 169 >>>>>>>>>>>>>>> Passed >>>>>>>>>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, >>>>>>>>>>>>>>> 577 >>>>>>>>>>>>>>> Passed >>>>>>>>>>>>>> test failed: LOCK24 >>>>>>>>>>>>>> >>>>>>>>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, >>>>>>>>>>>>>>> 575 >>>>>>>>>>>>>>> Passed >>>>>>>>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>>>>>>>> >>>>>>>>>>>>>> OPEN18 and OPEN30 test pass if each is run by >>>>>>>>>>>>>> itself. >>>>>>>>>>>>> Could well be a bug in the tests, I don't know. >>>>>>>>>>>> The reason OPEN18 failed was because the test >>>>>>>>>>>> timed >>>>>>>>>>>> out waiting for >>>>>>>>>>>> the reply of an OPEN call. The RPC connection >>>>>>>>>>>> used >>>>>>>>>>>> for the test was >>>>>>>>>>>> configured with 15 secs timeout. Note that OPEN18 >>>>>>>>>>>> only fails when >>>>>>>>>>>> the tests were run with 'all' option, this test >>>>>>>>>>>> passes if it's run >>>>>>>>>>>> by itself. >>>>>>>>>>>> >>>>>>>>>>>> With courteous server, by the time OPEN18 runs, >>>>>>>>>>>> there >>>>>>>>>>>> are about 1026 >>>>>>>>>>>> courtesy 4.0 clients on the server and all of >>>>>>>>>>>> these >>>>>>>>>>>> clients have opened >>>>>>>>>>>> the same file X with WRITE access. 
These clients >>>>>>>>>>>> were >>>>>>>>>>>> created by the >>>>>>>>>>>> previous tests. After each test completed, since >>>>>>>>>>>> 4.0 >>>>>>>>>>>> does not have >>>>>>>>>>>> session, the client states are not cleaned up >>>>>>>>>>>> immediately on the >>>>>>>>>>>> server and are allowed to become courtesy >>>>>>>>>>>> clients. >>>>>>>>>>>> >>>>>>>>>>>> When OPEN18 runs (about 20 minutes after the 1st >>>>>>>>>>>> test >>>>>>>>>>>> started), it >>>>>>>>>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE >>>>>>>>>>>> which causes the >>>>>>>>>>>> server to check for conflicts with courtesy >>>>>>>>>>>> clients. >>>>>>>>>>>> The loop that >>>>>>>>>>>> checks 1026 courtesy clients for share/access >>>>>>>>>>>> conflict took less >>>>>>>>>>>> than 1 sec. But it took about 55 secs, on my VM, >>>>>>>>>>>> for >>>>>>>>>>>> the server >>>>>>>>>>>> to expire all 1026 courtesy clients. >>>>>>>>>>>> >>>>>>>>>>>> I modified pynfs to configure the 4.0 RPC >>>>>>>>>>>> connection >>>>>>>>>>>> with 60 seconds >>>>>>>>>>>> timeout and OPEN18 now consistently passed. The >>>>>>>>>>>> 4.0 >>>>>>>>>>>> test results are >>>>>>>>>>>> now the same for courteous and non-courteous >>>>>>>>>>>> server: >>>>>>>>>>>> >>>>>>>>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>>>>>> >>>>>>>>>>>> Note that 4.1 tests do not suffer this timeout >>>>>>>>>>>> problem because the >>>>>>>>>>>> 4.1 clients and sessions are destroyed after each >>>>>>>>>>>> test completes. >>>>>>>>>>> Do you want me to send the patch to increase the >>>>>>>>>>> timeout for pynfs? >>>>>>>>>>> or is there any other things you think we should >>>>>>>>>>> do? >>>>>>>>>> I don't know. >>>>>>>>>> >>>>>>>>>> 55 seconds to clean up 1026 clients is about 50ms per >>>>>>>>>> client, which is >>>>>>>>>> pretty slow. I wonder why. I guess it's probably >>>>>>>>>> updating the stable >>>>>>>>>> storage information. Is /var/lib/nfs/ on your server >>>>>>>>>> backed by a hard >>>>>>>>>> drive or an SSD or something else? >>>>>>>>> My server is a virtualbox VM that has 1 CPU, 4GB RAM >>>>>>>>> and >>>>>>>>> 64GB of hard >>>>>>>>> disk. I think a production system that supports this >>>>>>>>> many >>>>>>>>> clients should >>>>>>>>> have faster CPUs, faster storage. >>>>>>>>> >>>>>>>>>> I wonder if that's an argument for limiting the >>>>>>>>>> number of >>>>>>>>>> courtesy >>>>>>>>>> clients. >>>>>>>>> I think we might want to treat 4.0 clients a bit >>>>>>>>> different >>>>>>>>> from 4.1 >>>>>>>>> clients. With 4.0, every client will become a courtesy >>>>>>>>> client after >>>>>>>>> the client is done with the export and unmounts it. >>>>>>>> It should be safe for a server to purge a client's lease >>>>>>>> immediately >>>>>>>> if there is no open or lock state associated with it. >>>>>>> In this case, each client has opened files so there are >>>>>>> open >>>>>>> states >>>>>>> associated with them. >>>>>>> >>>>>>>> When an NFSv4.0 client unmounts, all files should be >>>>>>>> closed >>>>>>>> at that >>>>>>>> point, >>>>>>> I'm not sure pynfs does proper clean up after each subtest, >>>>>>> I >>>>>>> will >>>>>>> check. There must be state associated with the client in >>>>>>> order >>>>>>> for >>>>>>> it to become courtesy client. >>>>>> Makes sense. Then a synthetic client like pynfs can DoS a >>>>>> courteous >>>>>> server. >>>>>> >>>>>> >>>>>>>> so the server can wait for the lease to expire and purge >>>>>>>> it >>>>>>>> normally. Or am I missing something? 
>>>>>>> When 4.0 client lease expires and there are still states >>>>>>> associated >>>>>>> with the client then the server allows this client to >>>>>>> become >>>>>>> courtesy >>>>>>> client. >>>>>> I think the same thing happens if an NFSv4.1 client neglects >>>>>> to >>>>>> send >>>>>> DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is >>>>>> broken >>>>>> or malicious, but the server faces the same issue of >>>>>> protecting >>>>>> itself from a DoS attack. >>>>>> >>>>>> IMO you should consider limiting the number of courteous >>>>>> clients >>>>>> the server can hold onto. Let's say that number is 1000. When >>>>>> the >>>>>> server wants to turn a 1001st client into a courteous client, >>>>>> it >>>>>> can simply expire and purge the oldest courteous client on >>>>>> its >>>>>> list. Otherwise, over time, the 24-hour expiry will reduce >>>>>> the >>>>>> set of courteous clients back to zero. >>>>>> >>>>>> What do you think? >>>>> Limiting the number of courteous clients to handle the cases of >>>>> broken/malicious 4.1 clients seems reasonable as the last >>>>> resort. >>>>> >>>>> I think if a malicious 4.1 clients could mount the server's >>>>> export, >>>>> opens a file (to create state) and repeats the same with a >>>>> different >>>>> client id then it seems like some basic security was already >>>>> broken; >>>>> allowing unauthorized clients to mount server's exports. >>>> You can do this today with AUTH_SYS. I consider it a genuine >>>> attack >>>> surface. >>>> >>>> >>>>> I think if we have to enforce a limit, then it's only for >>>>> handling >>>>> of seriously buggy 4.1 clients which should not be the norm. >>>>> The >>>>> issue with this is how to pick an optimal number that is >>>>> suitable >>>>> for the running server which can be a very slow or a very fast >>>>> server. >>>>> >>>>> Note that even if we impose an limit, that does not completely >>>>> solve >>>>> the problem with pynfs 4.0 test since its RPC timeout is >>>>> configured >>>>> with 15 secs which just enough to expire 277 clients based on >>>>> 53ms >>>>> for each client, unless we limit it ~270 clients which I think >>>>> it's >>>>> too low. >>>>> >>>>> This is what I plan to do: >>>>> >>>>> 1. do not support 4.0 courteous clients, for sure. >>>> Not supporting 4.0 isn’t an option, IMHO. It is a fully supported >>>> protocol at this time, and the same exposure exists for 4.1, it’s >>>> just a little harder to exploit. >>>> >>>> If you submit the courteous server patch without support for 4.0, >>>> I >>>> think it needs to include a plan for how 4.0 will be added later. >>>> >>> Why is there a problem here? The requirements are the same for 4.0 >>> and >>> 4.1 (or 4.2). If the lease under which the courtesy lock was >>> established has expired, then that courtesy lock must be released >>> if >>> some other client requests a lock that conflicts with the cached >>> lock >>> (unless the client breaks the courtesy framework by renewing that >>> original lease before the conflict occurs). Otherwise, it is >>> completely >>> up to the server when it decides to actually release the lock. >>> >>> For NFSv4.1 and NFSv4.2, we have DESTROY_CLIENTID, which tells the >>> server when the client is actually done with the lease, making it >>> easy >>> to determine when it is safe to release all the courtesy locks. >>> However >>> if the client does not send DESTROY_CLIENTID, then we're in the >>> same >>> situation with 4.x (x>0) as we would be with bog standard NFSv4.0. 
>>> The >>> lease has expired, and so the courtesy locks are liable to being >>> dropped. >> I agree the situation is the same for all minor versions. >> >> >>> At Hammerspace we have implemented courtesy locks, and our strategy >>> is >>> that when a conflict occurs, we drop the entire set of courtesy >>> locks >>> so that we don't have to deal with the "some locks were revoked" >>> scenario. The reason is that when we originally implemented >>> courtesy >>> locks, the Linux NFSv4 client support for lock revocation was a lot >>> less sophisticated than today. My suggestion is that you might >>> therefore consider starting along this path, and then refining the >>> support to make revocation more nuanced once you are confident that >>> the >>> coarser strategy is working as expected. >> Dai’s implementation does all that, and takes the coarser approach at >> the moment. There are plans to explore the more nuanced behavior (by >> revoking only the conflicting lock instead of dropping the whole >> lease) after this initial work is merged. >> >> The issue is there are certain pathological client behaviors (whether >> malicious or accidental) that can run the server out of resources, >> since it is holding onto lease state for a much longer time. We are >> simply trying to design a lease garbage collection scheme to meet >> that challenge. >> >> I think limiting the number of courteous clients is a simple way to >> do this, but we could also shorten the courtesy lifetime as more >> clients enter that state, to ensure that they don’t overrun the >> server’s memory. Another approach might be to add a shrinker that >> purges the oldest courteous clients when the server comes under >> memory pressure. >> >> > We already have a scanner that tries to release all client state after > 1 lease period. Just extend that to do it after 10 lease periods. If a > network partition hasn't recovered after 10 minutes, you probably have > bigger problems. Currently the courteous server allows 24 hours for the network partition to heal before releasing all client state. That seems excessive, but it was suggested to cover longer network partitions, for example while switches/routers are being repaired or upgraded. > > You can limit the number of clients as well, but that leads into a rats nest of other issues that have nothing to do with courtesy locks and everything to do with the fact that any client can hold a lot of state. The issue we currently have with the courteous server and the pynfs 4.0 tests is the number of courteous 4.0 clients the server has to expire when a share reservation conflict occurs while servicing the OPEN. Each client owns only a few states in this case, so we think the server spends most of its time deleting each client's record in /var/lib/nfs. This is why we plan to limit the number of courteous clients for now. As a side effect, it might also help reduce resource consumption. -Dai >
On Mon, 2021-11-29 at 23:22 -0800, dai.ngo@oracle.com wrote: > > On 11/29/21 8:57 PM, Trond Myklebust wrote: > > On Tue, 2021-11-30 at 04:47 +0000, Chuck Lever III wrote: > > > > On Nov 29, 2021, at 11:08 PM, Trond Myklebust > > > > <trondmy@hammerspace.com> wrote: > > > > > > > > On Tue, 2021-11-30 at 01:42 +0000, Chuck Lever III wrote: > > > > > > > On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@oracle.com> > > > > > > > wrote: > > > > > > > > > > > > > On 11/29/21 1:10 PM, Chuck Lever III wrote: > > > > > > > > > > > > > > > > On Nov 29, 2021, at 2:36 PM, Dai Ngo > > > > > > > > > <dai.ngo@oracle.com> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > On 11/29/21 11:03 AM, Chuck Lever III wrote: > > > > > > > > > Hello Dai! > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Nov 29, 2021, at 1:32 PM, Dai Ngo > > > > > > > > > > <dai.ngo@oracle.com> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 11/29/21 9:30 AM, J. Bruce Fields wrote: > > > > > > > > > > > On Mon, Nov 29, 2021 at 09:13:16AM -0800, > > > > > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > > > > > Hi Bruce, > > > > > > > > > > > > > > > > > > > > > > > > On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: > > > > > > > > > > > > > On 11/17/21 4:34 PM, J. Bruce Fields wrote: > > > > > > > > > > > > > > On Wed, Nov 17, 2021 at 01:46:02PM -0800, > > > > > > > > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > > > > > > > > On 11/17/21 9:59 AM, > > > > > > > > > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > > > > > > > > > On 11/17/21 6:14 AM, J. Bruce Fields > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > On Tue, Nov 16, 2021 at 03:06:32PM - > > > > > > > > > > > > > > > > > 0800, > > > > > > > > > > > > > > > > > dai.ngo@oracle.com wrote: > > > > > > > > > > > > > > > > > > Just a reminder that this patch is > > > > > > > > > > > > > > > > > > still > > > > > > > > > > > > > > > > > > waiting for your review. > > > > > > > > > > > > > > > > > Yeah, I was procrastinating and > > > > > > > > > > > > > > > > > hoping > > > > > > > > > > > > > > > > > yo'ud > > > > > > > > > > > > > > > > > figure out the pynfs > > > > > > > > > > > > > > > > > failure for me.... > > > > > > > > > > > > > > > > Last time I ran 4.0 OPEN18 test by > > > > > > > > > > > > > > > > itself > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > it passed. I will run > > > > > > > > > > > > > > > > all OPEN tests together with 5.15-rc7 > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > see if > > > > > > > > > > > > > > > > the problem you've > > > > > > > > > > > > > > > > seen still there. > > > > > > > > > > > > > > > I ran all tests in nfsv4.1 and nfsv4.0 > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > courteous and non-courteous > > > > > > > > > > > > > > > 5.15-rc7 server. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Nfs4.1 results are the same for both > > > > > > > > > > > > > > > courteous > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > non-courteous server: > > > > > > > > > > > > > > > > Of those: 0 Skipped, 0 Failed, 0 > > > > > > > > > > > > > > > > Warned, > > > > > > > > > > > > > > > > 169 > > > > > > > > > > > > > > > > Passed > > > > > > > > > > > > > > > Results of nfs4.0 with non-courteous > > > > > > > > > > > > > > > server: > > > > > > > > > > > > > > > > Of those: 8 Skipped, 1 Failed, 0 > > > > > > > > > > > > > > > > Warned, > > > > > > > > > > > > > > > > 577 > > > > > > > > > > > > > > > > Passed > > > > > > > > > > > > > > > test failed: LOCK24 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Results of nfs4.0 with courteous server: > > > > > > > > > > > > > > > > Of those: 8 Skipped, 3 Failed, 0 > > > > > > > > > > > > > > > > Warned, > > > > > > > > > > > > > > > > 575 > > > > > > > > > > > > > > > > Passed > > > > > > > > > > > > > > > tests failed: LOCK24, OPEN18, OPEN30 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > OPEN18 and OPEN30 test pass if each is > > > > > > > > > > > > > > > run by > > > > > > > > > > > > > > > itself. > > > > > > > > > > > > > > Could well be a bug in the tests, I don't > > > > > > > > > > > > > > know. > > > > > > > > > > > > > The reason OPEN18 failed was because the test > > > > > > > > > > > > > timed > > > > > > > > > > > > > out waiting for > > > > > > > > > > > > > the reply of an OPEN call. The RPC connection > > > > > > > > > > > > > used > > > > > > > > > > > > > for the test was > > > > > > > > > > > > > configured with 15 secs timeout. Note that > > > > > > > > > > > > > OPEN18 > > > > > > > > > > > > > only fails when > > > > > > > > > > > > > the tests were run with 'all' option, this > > > > > > > > > > > > > test > > > > > > > > > > > > > passes if it's run > > > > > > > > > > > > > by itself. > > > > > > > > > > > > > > > > > > > > > > > > > > With courteous server, by the time OPEN18 > > > > > > > > > > > > > runs, > > > > > > > > > > > > > there > > > > > > > > > > > > > are about 1026 > > > > > > > > > > > > > courtesy 4.0 clients on the server and all of > > > > > > > > > > > > > these > > > > > > > > > > > > > clients have opened > > > > > > > > > > > > > the same file X with WRITE access. These > > > > > > > > > > > > > clients > > > > > > > > > > > > > were > > > > > > > > > > > > > created by the > > > > > > > > > > > > > previous tests. After each test completed, > > > > > > > > > > > > > since > > > > > > > > > > > > > 4.0 > > > > > > > > > > > > > does not have > > > > > > > > > > > > > session, the client states are not cleaned up > > > > > > > > > > > > > immediately on the > > > > > > > > > > > > > server and are allowed to become courtesy > > > > > > > > > > > > > clients. > > > > > > > > > > > > > > > > > > > > > > > > > > When OPEN18 runs (about 20 minutes after the > > > > > > > > > > > > > 1st > > > > > > > > > > > > > test > > > > > > > > > > > > > started), it > > > > > > > > > > > > > sends OPEN of file X with > > > > > > > > > > > > > OPEN4_SHARE_DENY_WRITE > > > > > > > > > > > > > which causes the > > > > > > > > > > > > > server to check for conflicts with courtesy > > > > > > > > > > > > > clients. > > > > > > > > > > > > > The loop that > > > > > > > > > > > > > checks 1026 courtesy clients for share/access > > > > > > > > > > > > > conflict took less > > > > > > > > > > > > > than 1 sec. 
But it took about 55 secs, on my > > > > > > > > > > > > > VM, > > > > > > > > > > > > > for > > > > > > > > > > > > > the server > > > > > > > > > > > > > to expire all 1026 courtesy clients. > > > > > > > > > > > > > > > > > > > > > > > > > > I modified pynfs to configure the 4.0 RPC > > > > > > > > > > > > > connection > > > > > > > > > > > > > with 60 seconds > > > > > > > > > > > > > timeout and OPEN18 now consistently passed. > > > > > > > > > > > > > The > > > > > > > > > > > > > 4.0 > > > > > > > > > > > > > test results are > > > > > > > > > > > > > now the same for courteous and non-courteous > > > > > > > > > > > > > server: > > > > > > > > > > > > > > > > > > > > > > > > > > 8 Skipped, 1 Failed, 0 Warned, 577 Passed > > > > > > > > > > > > > > > > > > > > > > > > > > Note that 4.1 tests do not suffer this > > > > > > > > > > > > > timeout > > > > > > > > > > > > > problem because the > > > > > > > > > > > > > 4.1 clients and sessions are destroyed after > > > > > > > > > > > > > each > > > > > > > > > > > > > test completes. > > > > > > > > > > > > Do you want me to send the patch to increase > > > > > > > > > > > > the > > > > > > > > > > > > timeout for pynfs? > > > > > > > > > > > > or is there any other things you think we > > > > > > > > > > > > should > > > > > > > > > > > > do? > > > > > > > > > > > I don't know. > > > > > > > > > > > > > > > > > > > > > > 55 seconds to clean up 1026 clients is about 50ms > > > > > > > > > > > per > > > > > > > > > > > client, which is > > > > > > > > > > > pretty slow. I wonder why. I guess it's > > > > > > > > > > > probably > > > > > > > > > > > updating the stable > > > > > > > > > > > storage information. Is /var/lib/nfs/ on your > > > > > > > > > > > server > > > > > > > > > > > backed by a hard > > > > > > > > > > > drive or an SSD or something else? > > > > > > > > > > My server is a virtualbox VM that has 1 CPU, 4GB > > > > > > > > > > RAM > > > > > > > > > > and > > > > > > > > > > 64GB of hard > > > > > > > > > > disk. I think a production system that supports > > > > > > > > > > this > > > > > > > > > > many > > > > > > > > > > clients should > > > > > > > > > > have faster CPUs, faster storage. > > > > > > > > > > > > > > > > > > > > > I wonder if that's an argument for limiting the > > > > > > > > > > > number of > > > > > > > > > > > courtesy > > > > > > > > > > > clients. > > > > > > > > > > I think we might want to treat 4.0 clients a bit > > > > > > > > > > different > > > > > > > > > > from 4.1 > > > > > > > > > > clients. With 4.0, every client will become a > > > > > > > > > > courtesy > > > > > > > > > > client after > > > > > > > > > > the client is done with the export and unmounts it. > > > > > > > > > It should be safe for a server to purge a client's > > > > > > > > > lease > > > > > > > > > immediately > > > > > > > > > if there is no open or lock state associated with it. > > > > > > > > In this case, each client has opened files so there are > > > > > > > > open > > > > > > > > states > > > > > > > > associated with them. > > > > > > > > > > > > > > > > > When an NFSv4.0 client unmounts, all files should be > > > > > > > > > closed > > > > > > > > > at that > > > > > > > > > point, > > > > > > > > I'm not sure pynfs does proper clean up after each > > > > > > > > subtest, > > > > > > > > I > > > > > > > > will > > > > > > > > check. There must be state associated with the client > > > > > > > > in > > > > > > > > order > > > > > > > > for > > > > > > > > it to become courtesy client. 
> > > > > > > Makes sense. Then a synthetic client like pynfs can DoS a > > > > > > > courteous > > > > > > > server. > > > > > > > > > > > > > > > > > > > > > > > so the server can wait for the lease to expire and > > > > > > > > > purge > > > > > > > > > it > > > > > > > > > normally. Or am I missing something? > > > > > > > > When 4.0 client lease expires and there are still > > > > > > > > states > > > > > > > > associated > > > > > > > > with the client then the server allows this client to > > > > > > > > become > > > > > > > > courtesy > > > > > > > > client. > > > > > > > I think the same thing happens if an NFSv4.1 client > > > > > > > neglects > > > > > > > to > > > > > > > send > > > > > > > DESTROY_SESSION / DESTROY_CLIENTID. Either such a client > > > > > > > is > > > > > > > broken > > > > > > > or malicious, but the server faces the same issue of > > > > > > > protecting > > > > > > > itself from a DoS attack. > > > > > > > > > > > > > > IMO you should consider limiting the number of courteous > > > > > > > clients > > > > > > > the server can hold onto. Let's say that number is 1000. > > > > > > > When > > > > > > > the > > > > > > > server wants to turn a 1001st client into a courteous > > > > > > > client, > > > > > > > it > > > > > > > can simply expire and purge the oldest courteous client > > > > > > > on > > > > > > > its > > > > > > > list. Otherwise, over time, the 24-hour expiry will > > > > > > > reduce > > > > > > > the > > > > > > > set of courteous clients back to zero. > > > > > > > > > > > > > > What do you think? > > > > > > Limiting the number of courteous clients to handle the > > > > > > cases of > > > > > > broken/malicious 4.1 clients seems reasonable as the last > > > > > > resort. > > > > > > > > > > > > I think if a malicious 4.1 clients could mount the server's > > > > > > export, > > > > > > opens a file (to create state) and repeats the same with a > > > > > > different > > > > > > client id then it seems like some basic security was > > > > > > already > > > > > > broken; > > > > > > allowing unauthorized clients to mount server's exports. > > > > > You can do this today with AUTH_SYS. I consider it a genuine > > > > > attack > > > > > surface. > > > > > > > > > > > > > > > > I think if we have to enforce a limit, then it's only for > > > > > > handling > > > > > > of seriously buggy 4.1 clients which should not be the > > > > > > norm. > > > > > > The > > > > > > issue with this is how to pick an optimal number that is > > > > > > suitable > > > > > > for the running server which can be a very slow or a very > > > > > > fast > > > > > > server. > > > > > > > > > > > > Note that even if we impose an limit, that does not > > > > > > completely > > > > > > solve > > > > > > the problem with pynfs 4.0 test since its RPC timeout is > > > > > > configured > > > > > > with 15 secs which just enough to expire 277 clients based > > > > > > on > > > > > > 53ms > > > > > > for each client, unless we limit it ~270 clients which I > > > > > > think > > > > > > it's > > > > > > too low. > > > > > > > > > > > > This is what I plan to do: > > > > > > > > > > > > 1. do not support 4.0 courteous clients, for sure. > > > > > Not supporting 4.0 isn’t an option, IMHO. It is a fully > > > > > supported > > > > > protocol at this time, and the same exposure exists for 4.1, > > > > > it’s > > > > > just a little harder to exploit. 
> > > > > > > > > > If you submit the courteous server patch without support for > > > > > 4.0, > > > > > I > > > > > think it needs to include a plan for how 4.0 will be added > > > > > later. > > > > > > > > > Why is there a problem here? The requirements are the same for > > > > 4.0 > > > > and > > > > 4.1 (or 4.2). If the lease under which the courtesy lock was > > > > established has expired, then that courtesy lock must be > > > > released > > > > if > > > > some other client requests a lock that conflicts with the > > > > cached > > > > lock > > > > (unless the client breaks the courtesy framework by renewing > > > > that > > > > original lease before the conflict occurs). Otherwise, it is > > > > completely > > > > up to the server when it decides to actually release the lock. > > > > > > > > For NFSv4.1 and NFSv4.2, we have DESTROY_CLIENTID, which tells > > > > the > > > > server when the client is actually done with the lease, making > > > > it > > > > easy > > > > to determine when it is safe to release all the courtesy locks. > > > > However > > > > if the client does not send DESTROY_CLIENTID, then we're in the > > > > same > > > > situation with 4.x (x>0) as we would be with bog standard > > > > NFSv4.0. > > > > The > > > > lease has expired, and so the courtesy locks are liable to > > > > being > > > > dropped. > > > I agree the situation is the same for all minor versions. > > > > > > > > > > At Hammerspace we have implemented courtesy locks, and our > > > > strategy > > > > is > > > > that when a conflict occurs, we drop the entire set of courtesy > > > > locks > > > > so that we don't have to deal with the "some locks were > > > > revoked" > > > > scenario. The reason is that when we originally implemented > > > > courtesy > > > > locks, the Linux NFSv4 client support for lock revocation was a > > > > lot > > > > less sophisticated than today. My suggestion is that you might > > > > therefore consider starting along this path, and then refining > > > > the > > > > support to make revocation more nuanced once you are confident > > > > that > > > > the > > > > coarser strategy is working as expected. > > > Dai’s implementation does all that, and takes the coarser > > > approach at > > > the moment. There are plans to explore the more nuanced behavior > > > (by > > > revoking only the conflicting lock instead of dropping the whole > > > lease) after this initial work is merged. > > > > > > The issue is there are certain pathological client behaviors > > > (whether > > > malicious or accidental) that can run the server out of > > > resources, > > > since it is holding onto lease state for a much longer time. We > > > are > > > simply trying to design a lease garbage collection scheme to meet > > > that challenge. > > > > > > I think limiting the number of courteous clients is a simple way > > > to > > > do this, but we could also shorten the courtesy lifetime as more > > > clients enter that state, to ensure that they don’t overrun the > > > server’s memory. Another approach might be to add a shrinker that > > > purges the oldest courteous clients when the server comes under > > > memory pressure. > > > > > > > > We already have a scanner that tries to release all client state > > after > > 1 lease period. Just extend that to do it after 10 lease periods. > > If a > > network partition hasn't recovered after 10 minutes, you probably > > have > > bigger problems. 
> > Currently the courteous server allows 24hr for the network partition > to > heal before releasing all client state. That seems to be excessive > but > it was suggested for longer network partition conditions when > switch/routers > being repaired/upgraded. > > > > > You can limit the number of clients as well, but that leads into a > > rats > > nest of other issues that have nothing to do with courtesy locks > > and > > everything to do with the fact that any client can hold a lot of > > state. > > The issue we currently have with courteous server and pynfs 4.0 tests > is the number of courteous 4.0 clients the server has to expire when > a > share reservation conflict occurs when servicing the OPEN. Each > client > owns only few state in this case so we think the server spent most > time > for deleting client's record in /var/lib/nfs. This is why we plan to > limit the number of courteous clients for now. As a side effect, it > might > also help to reduce resource consumption too. Then kick off a thread or work item to do that asynchronously in the background, and return NFS4ERR_DELAY to the clients that were trying to grab locks in the meantime. The above process is hardly just confined to NFSv4.0 clients. If there is a network partition, then the exact same record deleting needs to be applied to all NFSv4.1 and NFSv4.2 clients that hold locks and are unable to renew their leases, so you might as well make it work for everyone.
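The suggestion above, to do the expensive expiry off the RPC path and make the conflicting client retry, can be sketched as follows. This is an illustrative, single-threaded C model rather than nfsd code: the real server would queue a work item (or wake the laundromat) and return NFS4ERR_DELAY from OPEN/LOCK processing.

/* Illustrative model: on a share or lock conflict with courtesy state,
 * queue the expiry work and return a "try again" status instead of
 * expiring clients inline. All names are hypothetical. */
#include <stdio.h>

#define NFS_OK          0
#define NFS4ERR_DELAY   10008   /* status value defined by RFC 7530/8881 */

static int expiry_queued;       /* stands in for a queued work item */

/* Stand-in for scheduling background work. */
static void queue_courtesy_expiry(void)
{
        expiry_queued = 1;
}

/* Stand-in for the background worker that removes stable-storage
 * records and frees the expired clients' state. */
static void run_courtesy_expiry(void)
{
        if (!expiry_queued)
                return;
        printf("background: expiring conflicting courtesy clients\n");
        expiry_queued = 0;
}

/* OPEN/LOCK path: never do the slow expiry inline. */
static int handle_conflict_with_courtesy_client(void)
{
        queue_courtesy_expiry();
        return NFS4ERR_DELAY;   /* client retries after a short wait */
}

int main(void)
{
        int status = handle_conflict_with_courtesy_client();

        printf("first OPEN attempt: status %d\n", status);
        run_courtesy_expiry();          /* runs asynchronously in reality */
        printf("retried OPEN attempt: status %d\n", NFS_OK);
        return 0;
}

Because the RPC returns immediately, the 15-second pynfs timeout is no longer tied to how long it takes to expire a large backlog of courtesy clients, regardless of minor version.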
On Mon, Nov 29, 2021 at 11:13:34PM -0800, dai.ngo@oracle.com wrote: > Just to be clear, the problem of pynfs with 4.0 is that the server takes > ~55 secs to expire 1026 4.0 courteous clients, which comes out to ~50ms > per client. This causes the test to time out in waiting for RPC reply of > the OPEN that triggers the conflicts. > > I don't know exactly where the time spent in the process of expiring a > client. But as Bruce mentioned, it could be related to the time to access > /var/lib/nfs to remove the client's persistent record. Could you try something like strace -r -$(pidof) -oTRACE and maybe we could take a look at TRACE? My hope would be that there'd be a clear set of syscalls whose time, multiplied by 1026, explains most of that 55 seconds. Then it might be worth checking whether there are any easy optimizations possible. --b.
> On Nov 30, 2021, at 2:22 AM, Dai Ngo <dai.ngo@oracle.com> wrote: > > > On 11/29/21 8:57 PM, Trond Myklebust wrote: >> On Tue, 2021-11-30 at 04:47 +0000, Chuck Lever III wrote: >>>> On Nov 29, 2021, at 11:08 PM, Trond Myklebust >>>> <trondmy@hammerspace.com> wrote: >>>> >>>> On Tue, 2021-11-30 at 01:42 +0000, Chuck Lever III wrote: >>>>>>> On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@oracle.com> >>>>>>> wrote: >>>>>> >>>>>>> On 11/29/21 1:10 PM, Chuck Lever III wrote: >>>>>>> >>>>>>>>> On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@oracle.com> >>>>>>>>> wrote: >>>>>>>> >>>>>>>> On 11/29/21 11:03 AM, Chuck Lever III wrote: >>>>>>>>> Hello Dai! >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Nov 29, 2021, at 1:32 PM, Dai Ngo >>>>>>>>>> <dai.ngo@oracle.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>>>>>>>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, >>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>> Hi Bruce, >>>>>>>>>>>> >>>>>>>>>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>>>>>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>>>>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, >>>>>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>>>>> On 11/17/21 9:59 AM, >>>>>>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields wrote: >>>>>>>>>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM -0800, >>>>>>>>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>>>>>>>> Just a reminder that this patch is >>>>>>>>>>>>>>>>>> still >>>>>>>>>>>>>>>>>> waiting for your review. >>>>>>>>>>>>>>>>> Yeah, I was procrastinating and hoping >>>>>>>>>>>>>>>>> yo'ud >>>>>>>>>>>>>>>>> figure out the pynfs >>>>>>>>>>>>>>>>> failure for me.... >>>>>>>>>>>>>>>> Last time I ran 4.0 OPEN18 test by itself >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> it passed. I will run >>>>>>>>>>>>>>>> all OPEN tests together with 5.15-rc7 to >>>>>>>>>>>>>>>> see if >>>>>>>>>>>>>>>> the problem you've >>>>>>>>>>>>>>>> seen still there. >>>>>>>>>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 with >>>>>>>>>>>>>>> courteous and non-courteous >>>>>>>>>>>>>>> 5.15-rc7 server. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Nfs4.1 results are the same for both >>>>>>>>>>>>>>> courteous >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> non-courteous server: >>>>>>>>>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 Warned, >>>>>>>>>>>>>>>> 169 >>>>>>>>>>>>>>>> Passed >>>>>>>>>>>>>>> Results of nfs4.0 with non-courteous server: >>>>>>>>>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 Warned, >>>>>>>>>>>>>>>> 577 >>>>>>>>>>>>>>>> Passed >>>>>>>>>>>>>>> test failed: LOCK24 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 Warned, >>>>>>>>>>>>>>>> 575 >>>>>>>>>>>>>>>> Passed >>>>>>>>>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> OPEN18 and OPEN30 test pass if each is run by >>>>>>>>>>>>>>> itself. >>>>>>>>>>>>>> Could well be a bug in the tests, I don't know. >>>>>>>>>>>>> The reason OPEN18 failed was because the test >>>>>>>>>>>>> timed >>>>>>>>>>>>> out waiting for >>>>>>>>>>>>> the reply of an OPEN call. The RPC connection >>>>>>>>>>>>> used >>>>>>>>>>>>> for the test was >>>>>>>>>>>>> configured with 15 secs timeout. Note that OPEN18 >>>>>>>>>>>>> only fails when >>>>>>>>>>>>> the tests were run with 'all' option, this test >>>>>>>>>>>>> passes if it's run >>>>>>>>>>>>> by itself. 
>>>>>>>>>>>>> >>>>>>>>>>>>> With courteous server, by the time OPEN18 runs, >>>>>>>>>>>>> there >>>>>>>>>>>>> are about 1026 >>>>>>>>>>>>> courtesy 4.0 clients on the server and all of >>>>>>>>>>>>> these >>>>>>>>>>>>> clients have opened >>>>>>>>>>>>> the same file X with WRITE access. These clients >>>>>>>>>>>>> were >>>>>>>>>>>>> created by the >>>>>>>>>>>>> previous tests. After each test completed, since >>>>>>>>>>>>> 4.0 >>>>>>>>>>>>> does not have >>>>>>>>>>>>> session, the client states are not cleaned up >>>>>>>>>>>>> immediately on the >>>>>>>>>>>>> server and are allowed to become courtesy >>>>>>>>>>>>> clients. >>>>>>>>>>>>> >>>>>>>>>>>>> When OPEN18 runs (about 20 minutes after the 1st >>>>>>>>>>>>> test >>>>>>>>>>>>> started), it >>>>>>>>>>>>> sends OPEN of file X with OPEN4_SHARE_DENY_WRITE >>>>>>>>>>>>> which causes the >>>>>>>>>>>>> server to check for conflicts with courtesy >>>>>>>>>>>>> clients. >>>>>>>>>>>>> The loop that >>>>>>>>>>>>> checks 1026 courtesy clients for share/access >>>>>>>>>>>>> conflict took less >>>>>>>>>>>>> than 1 sec. But it took about 55 secs, on my VM, >>>>>>>>>>>>> for >>>>>>>>>>>>> the server >>>>>>>>>>>>> to expire all 1026 courtesy clients. >>>>>>>>>>>>> >>>>>>>>>>>>> I modified pynfs to configure the 4.0 RPC >>>>>>>>>>>>> connection >>>>>>>>>>>>> with 60 seconds >>>>>>>>>>>>> timeout and OPEN18 now consistently passed. The >>>>>>>>>>>>> 4.0 >>>>>>>>>>>>> test results are >>>>>>>>>>>>> now the same for courteous and non-courteous >>>>>>>>>>>>> server: >>>>>>>>>>>>> >>>>>>>>>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>>>>>>> >>>>>>>>>>>>> Note that 4.1 tests do not suffer this timeout >>>>>>>>>>>>> problem because the >>>>>>>>>>>>> 4.1 clients and sessions are destroyed after each >>>>>>>>>>>>> test completes. >>>>>>>>>>>> Do you want me to send the patch to increase the >>>>>>>>>>>> timeout for pynfs? >>>>>>>>>>>> or is there any other things you think we should >>>>>>>>>>>> do? >>>>>>>>>>> I don't know. >>>>>>>>>>> >>>>>>>>>>> 55 seconds to clean up 1026 clients is about 50ms per >>>>>>>>>>> client, which is >>>>>>>>>>> pretty slow. I wonder why. I guess it's probably >>>>>>>>>>> updating the stable >>>>>>>>>>> storage information. Is /var/lib/nfs/ on your server >>>>>>>>>>> backed by a hard >>>>>>>>>>> drive or an SSD or something else? >>>>>>>>>> My server is a virtualbox VM that has 1 CPU, 4GB RAM >>>>>>>>>> and >>>>>>>>>> 64GB of hard >>>>>>>>>> disk. I think a production system that supports this >>>>>>>>>> many >>>>>>>>>> clients should >>>>>>>>>> have faster CPUs, faster storage. >>>>>>>>>> >>>>>>>>>>> I wonder if that's an argument for limiting the >>>>>>>>>>> number of >>>>>>>>>>> courtesy >>>>>>>>>>> clients. >>>>>>>>>> I think we might want to treat 4.0 clients a bit >>>>>>>>>> different >>>>>>>>>> from 4.1 >>>>>>>>>> clients. With 4.0, every client will become a courtesy >>>>>>>>>> client after >>>>>>>>>> the client is done with the export and unmounts it. >>>>>>>>> It should be safe for a server to purge a client's lease >>>>>>>>> immediately >>>>>>>>> if there is no open or lock state associated with it. >>>>>>>> In this case, each client has opened files so there are >>>>>>>> open >>>>>>>> states >>>>>>>> associated with them. >>>>>>>> >>>>>>>>> When an NFSv4.0 client unmounts, all files should be >>>>>>>>> closed >>>>>>>>> at that >>>>>>>>> point, >>>>>>>> I'm not sure pynfs does proper clean up after each subtest, >>>>>>>> I >>>>>>>> will >>>>>>>> check. 
There must be state associated with the client in >>>>>>>> order >>>>>>>> for >>>>>>>> it to become courtesy client. >>>>>>> Makes sense. Then a synthetic client like pynfs can DoS a >>>>>>> courteous >>>>>>> server. >>>>>>> >>>>>>> >>>>>>>>> so the server can wait for the lease to expire and purge >>>>>>>>> it >>>>>>>>> normally. Or am I missing something? >>>>>>>> When 4.0 client lease expires and there are still states >>>>>>>> associated >>>>>>>> with the client then the server allows this client to >>>>>>>> become >>>>>>>> courtesy >>>>>>>> client. >>>>>>> I think the same thing happens if an NFSv4.1 client neglects >>>>>>> to >>>>>>> send >>>>>>> DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is >>>>>>> broken >>>>>>> or malicious, but the server faces the same issue of >>>>>>> protecting >>>>>>> itself from a DoS attack. >>>>>>> >>>>>>> IMO you should consider limiting the number of courteous >>>>>>> clients >>>>>>> the server can hold onto. Let's say that number is 1000. When >>>>>>> the >>>>>>> server wants to turn a 1001st client into a courteous client, >>>>>>> it >>>>>>> can simply expire and purge the oldest courteous client on >>>>>>> its >>>>>>> list. Otherwise, over time, the 24-hour expiry will reduce >>>>>>> the >>>>>>> set of courteous clients back to zero. >>>>>>> >>>>>>> What do you think? >>>>>> Limiting the number of courteous clients to handle the cases of >>>>>> broken/malicious 4.1 clients seems reasonable as the last >>>>>> resort. >>>>>> >>>>>> I think if a malicious 4.1 clients could mount the server's >>>>>> export, >>>>>> opens a file (to create state) and repeats the same with a >>>>>> different >>>>>> client id then it seems like some basic security was already >>>>>> broken; >>>>>> allowing unauthorized clients to mount server's exports. >>>>> You can do this today with AUTH_SYS. I consider it a genuine >>>>> attack >>>>> surface. >>>>> >>>>> >>>>>> I think if we have to enforce a limit, then it's only for >>>>>> handling >>>>>> of seriously buggy 4.1 clients which should not be the norm. >>>>>> The >>>>>> issue with this is how to pick an optimal number that is >>>>>> suitable >>>>>> for the running server which can be a very slow or a very fast >>>>>> server. >>>>>> >>>>>> Note that even if we impose an limit, that does not completely >>>>>> solve >>>>>> the problem with pynfs 4.0 test since its RPC timeout is >>>>>> configured >>>>>> with 15 secs which just enough to expire 277 clients based on >>>>>> 53ms >>>>>> for each client, unless we limit it ~270 clients which I think >>>>>> it's >>>>>> too low. >>>>>> >>>>>> This is what I plan to do: >>>>>> >>>>>> 1. do not support 4.0 courteous clients, for sure. >>>>> Not supporting 4.0 isn’t an option, IMHO. It is a fully supported >>>>> protocol at this time, and the same exposure exists for 4.1, it’s >>>>> just a little harder to exploit. >>>>> >>>>> If you submit the courteous server patch without support for 4.0, >>>>> I >>>>> think it needs to include a plan for how 4.0 will be added later. >>>>> >>>> Why is there a problem here? The requirements are the same for 4.0 >>>> and >>>> 4.1 (or 4.2). If the lease under which the courtesy lock was >>>> established has expired, then that courtesy lock must be released >>>> if >>>> some other client requests a lock that conflicts with the cached >>>> lock >>>> (unless the client breaks the courtesy framework by renewing that >>>> original lease before the conflict occurs). 
Otherwise, it is >>>> completely >>>> up to the server when it decides to actually release the lock. >>>> >>>> For NFSv4.1 and NFSv4.2, we have DESTROY_CLIENTID, which tells the >>>> server when the client is actually done with the lease, making it >>>> easy >>>> to determine when it is safe to release all the courtesy locks. >>>> However >>>> if the client does not send DESTROY_CLIENTID, then we're in the >>>> same >>>> situation with 4.x (x>0) as we would be with bog standard NFSv4.0. >>>> The >>>> lease has expired, and so the courtesy locks are liable to being >>>> dropped. >>> I agree the situation is the same for all minor versions. >>> >>> >>>> At Hammerspace we have implemented courtesy locks, and our strategy >>>> is >>>> that when a conflict occurs, we drop the entire set of courtesy >>>> locks >>>> so that we don't have to deal with the "some locks were revoked" >>>> scenario. The reason is that when we originally implemented >>>> courtesy >>>> locks, the Linux NFSv4 client support for lock revocation was a lot >>>> less sophisticated than today. My suggestion is that you might >>>> therefore consider starting along this path, and then refining the >>>> support to make revocation more nuanced once you are confident that >>>> the >>>> coarser strategy is working as expected. >>> Dai’s implementation does all that, and takes the coarser approach at >>> the moment. There are plans to explore the more nuanced behavior (by >>> revoking only the conflicting lock instead of dropping the whole >>> lease) after this initial work is merged. >>> >>> The issue is there are certain pathological client behaviors (whether >>> malicious or accidental) that can run the server out of resources, >>> since it is holding onto lease state for a much longer time. We are >>> simply trying to design a lease garbage collection scheme to meet >>> that challenge. >>> >>> I think limiting the number of courteous clients is a simple way to >>> do this, but we could also shorten the courtesy lifetime as more >>> clients enter that state, to ensure that they don’t overrun the >>> server’s memory. Another approach might be to add a shrinker that >>> purges the oldest courteous clients when the server comes under >>> memory pressure. >>> >>> >> We already have a scanner that tries to release all client state after >> 1 lease period. Just extend that to do it after 10 lease periods. If a >> network partition hasn't recovered after 10 minutes, you probably have >> bigger problems. > > Currently the courteous server allows 24hr for the network partition to > heal before releasing all client state. That seems to be excessive but > it was suggested for longer network partition conditions when switch/routers > being repaired/upgraded. Sure, 24 hours is a long time. For the benefit of others on the list, we have seen customer failure scenarios where networks were partitioned for that long. But it's an arbitrary number, and there's no specification for how long a server needs to hold a courtesy client. We can make this number anything that is both convenient for the server implementation and valuable for outage recovery. >> You can limit the number of clients as well, but that leads into a rats >> nest of other issues that have nothing to do with courtesy locks and >> everything to do with the fact that any client can hold a lot of state. 
> > The issue we currently have with courteous server and pynfs 4.0 tests > is the number of courteous 4.0 clients the server has to expire when a > share reservation conflict occurs when servicing the OPEN. Each client > owns only few state in this case so we think the server spent most time > for deleting client's record in /var/lib/nfs. This is why we plan to > limit the number of courteous clients for now. As a side effect, it might > also help to reduce resource consumption too. I am a little concerned that we are trying to optimize a case that won't happen during practice. pynfs does not reflect any kind of realistic or reasonable client behavior -- it's designed to test very specific server operations. All that needs to happen, IMO, is that the server needs to protect itself from resource exhaustion (which may occur for any of the minor versions). So I'm taking a shine to the idea of using a shrinker to trim the older courtesy clients, rather than placing an arbitrary limit on the number of courtesy clients the server can hold at once. A shrinker should take into account the wide variance in the amount of lease state each client might have. -- Chuck Lever
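A minimal sketch of what such a shrinker could look like follows. It is illustrative only and not part of the posted patches: the helpers nfsd_courtesy_client_count() and nfsd_expire_courtesy_clients() are hypothetical stand-ins for whatever bookkeeping the courteous-server code keeps, and the register_shrinker() interface shown is the one used by kernels of the 5.15 era.

#include <linux/shrinker.h>

/* Hypothetical helpers: report the number of courtesy clients, and
 * expire up to 'nr' of the oldest ones, returning how many were freed. */
unsigned long nfsd_courtesy_client_count(void);
unsigned long nfsd_expire_courtesy_clients(unsigned long nr);

static unsigned long
nfsd_courtesy_count(struct shrinker *shrink, struct shrink_control *sc)
{
        /* tell the VM how many objects we could free */
        return nfsd_courtesy_client_count();
}

static unsigned long
nfsd_courtesy_scan(struct shrinker *shrink, struct shrink_control *sc)
{
        /* under memory pressure, expire the oldest courtesy clients first */
        return nfsd_expire_courtesy_clients(sc->nr_to_scan);
}

static struct shrinker nfsd_courtesy_shrinker = {
        .count_objects  = nfsd_courtesy_count,
        .scan_objects   = nfsd_courtesy_scan,
        .seeks          = DEFAULT_SEEKS,
};

/* registered once at nfsd start-up, unregistered on shutdown */
int nfsd_register_courtesy_shrinker(void)
{
        return register_shrinker(&nfsd_courtesy_shrinker);
}

The advantage over a fixed cap is that the courtesy-client set is only trimmed when the VM actually asks for memory back, so an idle server with plenty of RAM can keep holding state for partitioned clients.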
On Tue, Nov 30, 2021 at 03:36:43PM +0000, Chuck Lever III wrote: > I am a little concerned that we are trying to optimize a case > that won't happen during practice. pynfs does not reflect any > kind of realistic or reasonable client behavior -- it's designed > to test very specific server operations. I wonder how hard this problem would be to hit in normal use. I mean, a few hundred or a thousand clients doesn't sound that crazy. This case depends on an open deny, but you could hit the same problem with file locks. Would it be that weird to have a client trying to get a write lock on a file read-locked by a bunch of other clients? --b.
On Tue, 2021-11-30 at 11:05 -0500, Bruce Fields wrote: > On Tue, Nov 30, 2021 at 03:36:43PM +0000, Chuck Lever III wrote: > > I am a little concerned that we are trying to optimize a case > > that won't happen during practice. pynfs does not reflect any > > kind of realistic or reasonable client behavior -- it's designed > > to test very specific server operations. > > I wonder how hard this problem would be to hit in normal use. I > mean, a > few hundred or a thousand clients doesn't sound that crazy. This > case > depends on an open deny, but you could hit the same problem with file > locks. Would it be that weird to have a client trying to get a write > lock on a file read-locked by a bunch of other clients? > That's a scenario that is subject to starvation problems anyway. Particularly so on NFSv4.0, which lacks CB_NOTIFY_LOCK.
On Tue, Nov 30, 2021 at 04:14:10PM +0000, Trond Myklebust wrote: > On Tue, 2021-11-30 at 11:05 -0500, Bruce Fields wrote: > > On Tue, Nov 30, 2021 at 03:36:43PM +0000, Chuck Lever III wrote: > > > I am a little concerned that we are trying to optimize a case > > > that won't happen during practice. pynfs does not reflect any > > > kind of realistic or reasonable client behavior -- it's designed > > > to test very specific server operations. > > > > I wonder how hard this problem would be to hit in normal use. I > > mean, a > > few hundred or a thousand clients doesn't sound that crazy. This > > case > > depends on an open deny, but you could hit the same problem with file > > locks. Would it be that weird to have a client trying to get a write > > lock on a file read-locked by a bunch of other clients? > > > > That's a scenario that is subject to starvation problems anyway. Yes, if it's hundreds of clients continuously grabbing read locks. But if it's something like: send all the readers a signal, then request a write lock as a way to wait for them to finish; then you'd normally expect to get it soon after the last client drops its lock. I don't know, maybe that's uncommon. --b.
On 11/30/21 7:32 AM, Bruce Fields wrote: > On Mon, Nov 29, 2021 at 11:13:34PM -0800, dai.ngo@oracle.com wrote: >> Just to be clear, the problem of pynfs with 4.0 is that the server takes >> ~55 secs to expire 1026 4.0 courteous clients, which comes out to ~50ms >> per client. This causes the test to time out in waiting for RPC reply of >> the OPEN that triggers the conflicts. >> >> I don't know exactly where the time spent in the process of expiring a >> client. But as Bruce mentioned, it could be related to the time to access >> /var/lib/nfs to remove the client's persistent record. > Could you try something like > > strace -r -$(pidof) -oTRACE The strace output does not show where the server spends its time when expiring client state. The client record is removed by nfsd4_umh_cltrack_remove, which does an upcall to the user-space helper /sbin/nfsdcltrack. I used the low-tech debug tool, printk, to measure the time spent by nfsd4_client_record_remove. Here is a sample of the output; the START and END timestamps are in milliseconds: Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: START [0x15d418] clp[ffff888119206040] client_tracking_ops[ffffffffa04bc2e0] Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: END [0x15d459] clp[ffff888119206040] client_tracking_ops[ffffffffa04bc2e0] Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: START [0x15d461] clp[ffff888119206740] client_tracking_ops[ffffffffa04bc2e0] Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: END [0x15d48e] clp[ffff888119206740] client_tracking_ops[ffffffffa04bc2e0] Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: START [0x15d49c] clp[ffff88811b54e000] client_tracking_ops[ffffffffa04bc2e0] Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: END [0x15d4c5] clp[ffff88811b54e000] client_tracking_ops[ffffffffa04bc2e0] The average time to remove a client record is about 50ms, which matches the time reported by the pynfs test. This confirms what Bruce suspected earlier. -Dai > > and maybe we could take a look at TRACE? My hope would be that there'd > be a clear set of syscalls whose time, multiplied by 1026, explains most > of that 55 seconds. Then it might be worth checking whether there are > any easy optimizations possible. > > --b.
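The timing numbers above appear to come from simply bracketing the record-removal upcall with printk()s. A rough reconstruction is shown below; the exact code is an assumption inferred from the log sample (the function itself lives in fs/nfsd/nfs4recover.c), but it illustrates how the per-record figure was measured.

/* Assumed reconstruction of the debug instrumentation, inferred from the
 * log sample above; added inside fs/nfsd/nfs4recover.c. */
void
nfsd4_client_record_remove(struct nfs4_client *clp)
{
        struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);

        printk("nfsd4_client_record_remove: START [0x%x] clp[%p] client_tracking_ops[%p]\n",
               jiffies_to_msecs(jiffies), clp, nn->client_tracking_ops);
        if (nn->client_tracking_ops)
                nn->client_tracking_ops->remove(clp);   /* upcall to /sbin/nfsdcltrack */
        printk("nfsd4_client_record_remove: END [0x%x] clp[%p] client_tracking_ops[%p]\n",
               jiffies_to_msecs(jiffies), clp, nn->client_tracking_ops);
}

Each START/END pair brackets one synchronous upcall to /sbin/nfsdcltrack, which is where the ~50ms per client appears to go.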
On 11/30/21 5:37 AM, Trond Myklebust wrote: > On Mon, 2021-11-29 at 23:22 -0800, dai.ngo@oracle.com wrote: >> On 11/29/21 8:57 PM, Trond Myklebust wrote: >>> On Tue, 2021-11-30 at 04:47 +0000, Chuck Lever III wrote: >>>>> On Nov 29, 2021, at 11:08 PM, Trond Myklebust >>>>> <trondmy@hammerspace.com> wrote: >>>>> >>>>> On Tue, 2021-11-30 at 01:42 +0000, Chuck Lever III wrote: >>>>>>>> On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@oracle.com> >>>>>>>> wrote: >>>>>>> >>>>>>>> On 11/29/21 1:10 PM, Chuck Lever III wrote: >>>>>>>> >>>>>>>>>> On Nov 29, 2021, at 2:36 PM, Dai Ngo >>>>>>>>>> <dai.ngo@oracle.com> >>>>>>>>>> wrote: >>>>>>>>> On 11/29/21 11:03 AM, Chuck Lever III wrote: >>>>>>>>>> Hello Dai! >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Nov 29, 2021, at 1:32 PM, Dai Ngo >>>>>>>>>>> <dai.ngo@oracle.com> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 11/29/21 9:30 AM, J. Bruce Fields wrote: >>>>>>>>>>>> On Mon, Nov 29, 2021 at 09:13:16AM -0800, >>>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>>> Hi Bruce, >>>>>>>>>>>>> >>>>>>>>>>>>> On 11/21/21 7:04 PM, dai.ngo@oracle.com wrote: >>>>>>>>>>>>>> On 11/17/21 4:34 PM, J. Bruce Fields wrote: >>>>>>>>>>>>>>> On Wed, Nov 17, 2021 at 01:46:02PM -0800, >>>>>>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>>>>>> On 11/17/21 9:59 AM, >>>>>>>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>>>>>>> On 11/17/21 6:14 AM, J. Bruce Fields >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> On Tue, Nov 16, 2021 at 03:06:32PM - >>>>>>>>>>>>>>>>>> 0800, >>>>>>>>>>>>>>>>>> dai.ngo@oracle.com wrote: >>>>>>>>>>>>>>>>>>> Just a reminder that this patch is >>>>>>>>>>>>>>>>>>> still >>>>>>>>>>>>>>>>>>> waiting for your review. >>>>>>>>>>>>>>>>>> Yeah, I was procrastinating and >>>>>>>>>>>>>>>>>> hoping >>>>>>>>>>>>>>>>>> yo'ud >>>>>>>>>>>>>>>>>> figure out the pynfs >>>>>>>>>>>>>>>>>> failure for me.... >>>>>>>>>>>>>>>>> Last time I ran 4.0 OPEN18 test by >>>>>>>>>>>>>>>>> itself >>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>> it passed. I will run >>>>>>>>>>>>>>>>> all OPEN tests together with 5.15-rc7 >>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> see if >>>>>>>>>>>>>>>>> the problem you've >>>>>>>>>>>>>>>>> seen still there. >>>>>>>>>>>>>>>> I ran all tests in nfsv4.1 and nfsv4.0 >>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>> courteous and non-courteous >>>>>>>>>>>>>>>> 5.15-rc7 server. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Nfs4.1 results are the same for both >>>>>>>>>>>>>>>> courteous >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> non-courteous server: >>>>>>>>>>>>>>>>> Of those: 0 Skipped, 0 Failed, 0 >>>>>>>>>>>>>>>>> Warned, >>>>>>>>>>>>>>>>> 169 >>>>>>>>>>>>>>>>> Passed >>>>>>>>>>>>>>>> Results of nfs4.0 with non-courteous >>>>>>>>>>>>>>>> server: >>>>>>>>>>>>>>>>> Of those: 8 Skipped, 1 Failed, 0 >>>>>>>>>>>>>>>>> Warned, >>>>>>>>>>>>>>>>> 577 >>>>>>>>>>>>>>>>> Passed >>>>>>>>>>>>>>>> test failed: LOCK24 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Results of nfs4.0 with courteous server: >>>>>>>>>>>>>>>>> Of those: 8 Skipped, 3 Failed, 0 >>>>>>>>>>>>>>>>> Warned, >>>>>>>>>>>>>>>>> 575 >>>>>>>>>>>>>>>>> Passed >>>>>>>>>>>>>>>> tests failed: LOCK24, OPEN18, OPEN30 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> OPEN18 and OPEN30 test pass if each is >>>>>>>>>>>>>>>> run by >>>>>>>>>>>>>>>> itself. >>>>>>>>>>>>>>> Could well be a bug in the tests, I don't >>>>>>>>>>>>>>> know. >>>>>>>>>>>>>> The reason OPEN18 failed was because the test >>>>>>>>>>>>>> timed >>>>>>>>>>>>>> out waiting for >>>>>>>>>>>>>> the reply of an OPEN call. 
The RPC connection >>>>>>>>>>>>>> used >>>>>>>>>>>>>> for the test was >>>>>>>>>>>>>> configured with 15 secs timeout. Note that >>>>>>>>>>>>>> OPEN18 >>>>>>>>>>>>>> only fails when >>>>>>>>>>>>>> the tests were run with 'all' option, this >>>>>>>>>>>>>> test >>>>>>>>>>>>>> passes if it's run >>>>>>>>>>>>>> by itself. >>>>>>>>>>>>>> >>>>>>>>>>>>>> With courteous server, by the time OPEN18 >>>>>>>>>>>>>> runs, >>>>>>>>>>>>>> there >>>>>>>>>>>>>> are about 1026 >>>>>>>>>>>>>> courtesy 4.0 clients on the server and all of >>>>>>>>>>>>>> these >>>>>>>>>>>>>> clients have opened >>>>>>>>>>>>>> the same file X with WRITE access. These >>>>>>>>>>>>>> clients >>>>>>>>>>>>>> were >>>>>>>>>>>>>> created by the >>>>>>>>>>>>>> previous tests. After each test completed, >>>>>>>>>>>>>> since >>>>>>>>>>>>>> 4.0 >>>>>>>>>>>>>> does not have >>>>>>>>>>>>>> session, the client states are not cleaned up >>>>>>>>>>>>>> immediately on the >>>>>>>>>>>>>> server and are allowed to become courtesy >>>>>>>>>>>>>> clients. >>>>>>>>>>>>>> >>>>>>>>>>>>>> When OPEN18 runs (about 20 minutes after the >>>>>>>>>>>>>> 1st >>>>>>>>>>>>>> test >>>>>>>>>>>>>> started), it >>>>>>>>>>>>>> sends OPEN of file X with >>>>>>>>>>>>>> OPEN4_SHARE_DENY_WRITE >>>>>>>>>>>>>> which causes the >>>>>>>>>>>>>> server to check for conflicts with courtesy >>>>>>>>>>>>>> clients. >>>>>>>>>>>>>> The loop that >>>>>>>>>>>>>> checks 1026 courtesy clients for share/access >>>>>>>>>>>>>> conflict took less >>>>>>>>>>>>>> than 1 sec. But it took about 55 secs, on my >>>>>>>>>>>>>> VM, >>>>>>>>>>>>>> for >>>>>>>>>>>>>> the server >>>>>>>>>>>>>> to expire all 1026 courtesy clients. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I modified pynfs to configure the 4.0 RPC >>>>>>>>>>>>>> connection >>>>>>>>>>>>>> with 60 seconds >>>>>>>>>>>>>> timeout and OPEN18 now consistently passed. >>>>>>>>>>>>>> The >>>>>>>>>>>>>> 4.0 >>>>>>>>>>>>>> test results are >>>>>>>>>>>>>> now the same for courteous and non-courteous >>>>>>>>>>>>>> server: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 8 Skipped, 1 Failed, 0 Warned, 577 Passed >>>>>>>>>>>>>> >>>>>>>>>>>>>> Note that 4.1 tests do not suffer this >>>>>>>>>>>>>> timeout >>>>>>>>>>>>>> problem because the >>>>>>>>>>>>>> 4.1 clients and sessions are destroyed after >>>>>>>>>>>>>> each >>>>>>>>>>>>>> test completes. >>>>>>>>>>>>> Do you want me to send the patch to increase >>>>>>>>>>>>> the >>>>>>>>>>>>> timeout for pynfs? >>>>>>>>>>>>> or is there any other things you think we >>>>>>>>>>>>> should >>>>>>>>>>>>> do? >>>>>>>>>>>> I don't know. >>>>>>>>>>>> >>>>>>>>>>>> 55 seconds to clean up 1026 clients is about 50ms >>>>>>>>>>>> per >>>>>>>>>>>> client, which is >>>>>>>>>>>> pretty slow. I wonder why. I guess it's >>>>>>>>>>>> probably >>>>>>>>>>>> updating the stable >>>>>>>>>>>> storage information. Is /var/lib/nfs/ on your >>>>>>>>>>>> server >>>>>>>>>>>> backed by a hard >>>>>>>>>>>> drive or an SSD or something else? >>>>>>>>>>> My server is a virtualbox VM that has 1 CPU, 4GB >>>>>>>>>>> RAM >>>>>>>>>>> and >>>>>>>>>>> 64GB of hard >>>>>>>>>>> disk. I think a production system that supports >>>>>>>>>>> this >>>>>>>>>>> many >>>>>>>>>>> clients should >>>>>>>>>>> have faster CPUs, faster storage. >>>>>>>>>>> >>>>>>>>>>>> I wonder if that's an argument for limiting the >>>>>>>>>>>> number of >>>>>>>>>>>> courtesy >>>>>>>>>>>> clients. >>>>>>>>>>> I think we might want to treat 4.0 clients a bit >>>>>>>>>>> different >>>>>>>>>>> from 4.1 >>>>>>>>>>> clients. 
With 4.0, every client will become a >>>>>>>>>>> courtesy >>>>>>>>>>> client after >>>>>>>>>>> the client is done with the export and unmounts it. >>>>>>>>>> It should be safe for a server to purge a client's >>>>>>>>>> lease >>>>>>>>>> immediately >>>>>>>>>> if there is no open or lock state associated with it. >>>>>>>>> In this case, each client has opened files so there are >>>>>>>>> open >>>>>>>>> states >>>>>>>>> associated with them. >>>>>>>>> >>>>>>>>>> When an NFSv4.0 client unmounts, all files should be >>>>>>>>>> closed >>>>>>>>>> at that >>>>>>>>>> point, >>>>>>>>> I'm not sure pynfs does proper clean up after each >>>>>>>>> subtest, >>>>>>>>> I >>>>>>>>> will >>>>>>>>> check. There must be state associated with the client >>>>>>>>> in >>>>>>>>> order >>>>>>>>> for >>>>>>>>> it to become courtesy client. >>>>>>>> Makes sense. Then a synthetic client like pynfs can DoS a >>>>>>>> courteous >>>>>>>> server. >>>>>>>> >>>>>>>> >>>>>>>>>> so the server can wait for the lease to expire and >>>>>>>>>> purge >>>>>>>>>> it >>>>>>>>>> normally. Or am I missing something? >>>>>>>>> When 4.0 client lease expires and there are still >>>>>>>>> states >>>>>>>>> associated >>>>>>>>> with the client then the server allows this client to >>>>>>>>> become >>>>>>>>> courtesy >>>>>>>>> client. >>>>>>>> I think the same thing happens if an NFSv4.1 client >>>>>>>> neglects >>>>>>>> to >>>>>>>> send >>>>>>>> DESTROY_SESSION / DESTROY_CLIENTID. Either such a client >>>>>>>> is >>>>>>>> broken >>>>>>>> or malicious, but the server faces the same issue of >>>>>>>> protecting >>>>>>>> itself from a DoS attack. >>>>>>>> >>>>>>>> IMO you should consider limiting the number of courteous >>>>>>>> clients >>>>>>>> the server can hold onto. Let's say that number is 1000. >>>>>>>> When >>>>>>>> the >>>>>>>> server wants to turn a 1001st client into a courteous >>>>>>>> client, >>>>>>>> it >>>>>>>> can simply expire and purge the oldest courteous client >>>>>>>> on >>>>>>>> its >>>>>>>> list. Otherwise, over time, the 24-hour expiry will >>>>>>>> reduce >>>>>>>> the >>>>>>>> set of courteous clients back to zero. >>>>>>>> >>>>>>>> What do you think? >>>>>>> Limiting the number of courteous clients to handle the >>>>>>> cases of >>>>>>> broken/malicious 4.1 clients seems reasonable as the last >>>>>>> resort. >>>>>>> >>>>>>> I think if a malicious 4.1 clients could mount the server's >>>>>>> export, >>>>>>> opens a file (to create state) and repeats the same with a >>>>>>> different >>>>>>> client id then it seems like some basic security was >>>>>>> already >>>>>>> broken; >>>>>>> allowing unauthorized clients to mount server's exports. >>>>>> You can do this today with AUTH_SYS. I consider it a genuine >>>>>> attack >>>>>> surface. >>>>>> >>>>>> >>>>>>> I think if we have to enforce a limit, then it's only for >>>>>>> handling >>>>>>> of seriously buggy 4.1 clients which should not be the >>>>>>> norm. >>>>>>> The >>>>>>> issue with this is how to pick an optimal number that is >>>>>>> suitable >>>>>>> for the running server which can be a very slow or a very >>>>>>> fast >>>>>>> server. >>>>>>> >>>>>>> Note that even if we impose an limit, that does not >>>>>>> completely >>>>>>> solve >>>>>>> the problem with pynfs 4.0 test since its RPC timeout is >>>>>>> configured >>>>>>> with 15 secs which just enough to expire 277 clients based >>>>>>> on >>>>>>> 53ms >>>>>>> for each client, unless we limit it ~270 clients which I >>>>>>> think >>>>>>> it's >>>>>>> too low. 
>>>>>>> >>>>>>> This is what I plan to do: >>>>>>> >>>>>>> 1. do not support 4.0 courteous clients, for sure. >>>>>> Not supporting 4.0 isn’t an option, IMHO. It is a fully >>>>>> supported >>>>>> protocol at this time, and the same exposure exists for 4.1, >>>>>> it’s >>>>>> just a little harder to exploit. >>>>>> >>>>>> If you submit the courteous server patch without support for >>>>>> 4.0, >>>>>> I >>>>>> think it needs to include a plan for how 4.0 will be added >>>>>> later. >>>>>> >>>>> Why is there a problem here? The requirements are the same for >>>>> 4.0 >>>>> and >>>>> 4.1 (or 4.2). If the lease under which the courtesy lock was >>>>> established has expired, then that courtesy lock must be >>>>> released >>>>> if >>>>> some other client requests a lock that conflicts with the >>>>> cached >>>>> lock >>>>> (unless the client breaks the courtesy framework by renewing >>>>> that >>>>> original lease before the conflict occurs). Otherwise, it is >>>>> completely >>>>> up to the server when it decides to actually release the lock. >>>>> >>>>> For NFSv4.1 and NFSv4.2, we have DESTROY_CLIENTID, which tells >>>>> the >>>>> server when the client is actually done with the lease, making >>>>> it >>>>> easy >>>>> to determine when it is safe to release all the courtesy locks. >>>>> However >>>>> if the client does not send DESTROY_CLIENTID, then we're in the >>>>> same >>>>> situation with 4.x (x>0) as we would be with bog standard >>>>> NFSv4.0. >>>>> The >>>>> lease has expired, and so the courtesy locks are liable to >>>>> being >>>>> dropped. >>>> I agree the situation is the same for all minor versions. >>>> >>>> >>>>> At Hammerspace we have implemented courtesy locks, and our >>>>> strategy >>>>> is >>>>> that when a conflict occurs, we drop the entire set of courtesy >>>>> locks >>>>> so that we don't have to deal with the "some locks were >>>>> revoked" >>>>> scenario. The reason is that when we originally implemented >>>>> courtesy >>>>> locks, the Linux NFSv4 client support for lock revocation was a >>>>> lot >>>>> less sophisticated than today. My suggestion is that you might >>>>> therefore consider starting along this path, and then refining >>>>> the >>>>> support to make revocation more nuanced once you are confident >>>>> that >>>>> the >>>>> coarser strategy is working as expected. >>>> Dai’s implementation does all that, and takes the coarser >>>> approach at >>>> the moment. There are plans to explore the more nuanced behavior >>>> (by >>>> revoking only the conflicting lock instead of dropping the whole >>>> lease) after this initial work is merged. >>>> >>>> The issue is there are certain pathological client behaviors >>>> (whether >>>> malicious or accidental) that can run the server out of >>>> resources, >>>> since it is holding onto lease state for a much longer time. We >>>> are >>>> simply trying to design a lease garbage collection scheme to meet >>>> that challenge. >>>> >>>> I think limiting the number of courteous clients is a simple way >>>> to >>>> do this, but we could also shorten the courtesy lifetime as more >>>> clients enter that state, to ensure that they don’t overrun the >>>> server’s memory. Another approach might be to add a shrinker that >>>> purges the oldest courteous clients when the server comes under >>>> memory pressure. >>>> >>>> >>> We already have a scanner that tries to release all client state >>> after >>> 1 lease period. Just extend that to do it after 10 lease periods. 
>>> If a >>> network partition hasn't recovered after 10 minutes, you probably >>> have >>> bigger problems. >> Currently the courteous server allows 24hr for the network partition >> to >> heal before releasing all client state. That seems to be excessive >> but >> it was suggested for longer network partition conditions when >> switch/routers >> being repaired/upgraded. >> >>> You can limit the number of clients as well, but that leads into a >>> rats >>> nest of other issues that have nothing to do with courtesy locks >>> and >>> everything to do with the fact that any client can hold a lot of >>> state. >> The issue we currently have with courteous server and pynfs 4.0 tests >> is the number of courteous 4.0 clients the server has to expire when >> a >> share reservation conflict occurs when servicing the OPEN. Each >> client >> owns only few state in this case so we think the server spent most >> time >> for deleting client's record in /var/lib/nfs. This is why we plan to >> limit the number of courteous clients for now. As a side effect, it >> might >> also help to reduce resource consumption too. > Then kick off a thread or work item to do that asynchronously in the > background, and return NFS4ERR_DELAY to the clients that were trying to > grab locks in the meantime. Thanks Trond, I think this is a reasonable approach. The behavior would be similar to a delegation recall during the OPEN. My plan is: 1. If the number of conflicting clients is less than 100 (a number that should cover realistic usage), release all their state synchronously in the OPEN call and return NFS4_OK to the NFS client. Most conflicts should be handled by this case. 2. If the number of conflicting clients is more than 100, release the state of the first 100 clients as in (1), trigger the laundromat thread to release the state of the remaining conflicting clients, and return NFS4ERR_DELAY to the NFS client. This should be a rare condition. -Dai > > The above process is hardly just confined to NFSv4.0 clients. If there > is a network partition, then the exact same record deleting needs to be > applied to all NFSv4.1 and NFSv4.2 clients that hold locks and are > unable to renew their leases, so you might as well make it work for > everyone. >
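Sketched in code, the two-case plan might look roughly like the following. The conflict list, the cl_cs_list linkage and the NFSD4_CLIENT_COURTESY_EXPIRE flag are invented names for illustration and do not come from the posted patches; expire_client(), laundry_wq and nn->laundromat_work are existing nfsd internals, used here on the assumption that they fit this shape.

/* Hypothetical sketch of the plan described above.  "conflicts" is a
 * list of courtesy clients whose state conflicts with the incoming OPEN. */
#define NFSD_COURTESY_SYNC_EXPIRE_LIMIT 100

static __be32
nfsd4_resolve_courtesy_conflicts(struct list_head *conflicts, struct nfsd_net *nn)
{
        struct nfs4_client *clp, *tmp;
        unsigned int nr = 0;
        bool deferred = false;

        list_for_each_entry_safe(clp, tmp, conflicts, cl_cs_list) {
                if (nr++ < NFSD_COURTESY_SYNC_EXPIRE_LIMIT) {
                        /* case 1: expire synchronously in the OPEN call */
                        expire_client(clp);
                } else {
                        /* case 2: hand the remainder to the laundromat */
                        set_bit(NFSD4_CLIENT_COURTESY_EXPIRE, &clp->cl_flags);
                        deferred = true;
                }
        }

        if (deferred) {
                mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
                return nfserr_jukebox;          /* NFS4ERR_DELAY */
        }
        return nfs_ok;                          /* all conflicts resolved */
}

As the follow-up below notes, it may be simpler still to drop the synchronous fast path and always defer the expiry to the laundromat, returning NFS4ERR_DELAY whenever a courtesy conflict is found.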
On Tue, Nov 30, 2021 at 07:52:10PM -0800, dai.ngo@oracle.com wrote: > On 11/30/21 5:37 AM, Trond Myklebust wrote: > >Then kick off a thread or work item to do that asynchronously in the > >background, and return NFS4ERR_DELAY to the clients that were trying to > >grab locks in the meantime. > > Thanks Trond, I think this is a reasonable approach. The behavior would > be similar to a delegation recall during the OPEN. > > My plan is: > > 1. If the number of conflict clients is less than 100 (some numbers that > cover realistic usage) then release all their state synchronously in > the OPEN call, and returns NFS4_OK to the NFS client. Most of conflicts > should be handled by this case. > > 2. If the number of conflict clients is more than 100 then release the > state of the 1st 100 clients as in (1) and trigger the laundromat thread > to release state of the rest of the conflict clients, and return > NFS4ERR_DELAY to the NFS client. This should be a rare condition. Honestly, conflict with a courtesy client is itself not going to be that common, so personally I'd start simple and handle everything with the asynchronous approach. --b.
On Tue, Nov 30, 2021 at 07:50:13PM -0800, dai.ngo@oracle.com wrote: > > On 11/30/21 7:32 AM, Bruce Fields wrote: > >On Mon, Nov 29, 2021 at 11:13:34PM -0800, dai.ngo@oracle.com wrote: > >>Just to be clear, the problem of pynfs with 4.0 is that the server takes > >>~55 secs to expire 1026 4.0 courteous clients, which comes out to ~50ms > >>per client. This causes the test to time out in waiting for RPC reply of > >>the OPEN that triggers the conflicts. > >> > >>I don't know exactly where the time spent in the process of expiring a > >>client. But as Bruce mentioned, it could be related to the time to access > >>/var/lib/nfs to remove the client's persistent record. > >Could you try something like > > > > strace -r -$(pidof) -oTRACE Oops, I mean $(pidof nfsdcld). But, your system isn't using that: > > Strace does not have any info that shows where the server spent time when > expiring client state. The client record is removed by nfsd4_umh_cltrack_remove > doing upcall to user space helper /sbin/nfsdcltrack to do the job. I used > the low-tech debug tool, printk, to measure the time spent by > nfsd4_client_record_remove. Here is a sample of the output, START and END > are in milliseconds: > > Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: START [0x15d418] clp[ffff888119206040] client_tracking_ops[ffffffffa04bc2e0] > Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: END [0x15d459] clp[ffff888119206040] client_tracking_ops[ffffffffa04bc2e0] > Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: START [0x15d461] clp[ffff888119206740] client_tracking_ops[ffffffffa04bc2e0] > Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: END [0x15d48e] clp[ffff888119206740] client_tracking_ops[ffffffffa04bc2e0] > Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: START [0x15d49c] clp[ffff88811b54e000] client_tracking_ops[ffffffffa04bc2e0] > Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: END [0x15d4c5] clp[ffff88811b54e000] client_tracking_ops[ffffffffa04bc2e0] > > The average time spent to remove the client record is about ~50ms, matches > with the time reported by pynfs test. This confirms what Bruce suspected > earlier. OK, good to know. It'd be interesting to dig into where nfsdcltrack is spending its time, which we could do by replacing it with a wrapper that runs the real nfsdcltrack under strace. Though maybe it'd be better to do this on a system using nfsdcld, since that's what we're transitioning to. --b.
Do you have a public git tree with your latest patches? --b.
On Wed, Dec 01, 2021 at 09:36:30AM -0500, Bruce Fields wrote: > OK, good to know. It'd be interesting to dig into where nfsdcltrack is > spending its time, which we could do by replacing it with a wrapper that > runs the real nfsdcltrack under strace. > > Though maybe it'd be better to do this on a system using nfsdcld, since > that's what we're transitioning to. Trying that on a test VM here, I see each upcall doing 3 fdatasyncs() of an sqlite-journal file. On my setup, each of those is taking a few milliseconds. I wonder if it can do better. --b.
On Wed, Dec 01, 2021 at 12:42:05PM -0500, Bruce Fields wrote: > On Wed, Dec 01, 2021 at 09:36:30AM -0500, Bruce Fields wrote: > > OK, good to know. It'd be interesting to dig into where nfsdcltrack is > > spending its time, which we could do by replacing it with a wrapper that > > runs the real nfsdcltrack under strace. > > > > Though maybe it'd be better to do this on a system using nfsdcld, since > > that's what we're transitioning to. > > Trying that on a test VM here, I see each upcall doing 3 fdatasyncs() of > an sqlite-journal file. On my setup, each of those is taking a few > milliseconds. I wonder if it an do better. If I understand the sqlite documentation correctly, I *think* that if we use journal_mode WAL with synchronous FULL, we should get the assurances nfsd needs with one sync per transaction. --b.
On 12/1/21 6:51 AM, Bruce Fields wrote: > Do you have a public git tree with your latest patches? No, I don't but I can push it to Chuck's public tree. I need to prepare the patch. -Dai > > --b.
On Wed, Dec 01, 2021 at 10:47:28AM -0800, dai.ngo@oracle.com wrote: > > On 12/1/21 6:51 AM, Bruce Fields wrote: > >Do you have a public git tree with your latest patches? > > No, I don't but I can push it to Chuck's public tree. I need to prepare the patch. OK, it's not a big deal. --b.
On Wed, Dec 01, 2021 at 01:03:39PM -0500, Bruce Fields wrote: > On Wed, Dec 01, 2021 at 12:42:05PM -0500, Bruce Fields wrote: > > On Wed, Dec 01, 2021 at 09:36:30AM -0500, Bruce Fields wrote: > > > OK, good to know. It'd be interesting to dig into where nfsdcltrack is > > > spending its time, which we could do by replacing it with a wrapper that > > > runs the real nfsdcltrack under strace. > > > > > > Though maybe it'd be better to do this on a system using nfsdcld, since > > > that's what we're transitioning to. > > > > Trying that on a test VM here, I see each upcall doing 3 fdatasyncs() of > > an sqlite-journal file. On my setup, each of those is taking a few > > milliseconds. I wonder if it an do better. > > If I understand the sqlite documentation correctly, I *think* that if we > use journal_mode WAL with synchronous FULL, we should get the assurances > nfsd needs with one sync per transaction. So I *think* that would mean just doing something like (untested, don't have much idea what I'm doing): diff --git a/utils/nfsdcld/sqlite.c b/utils/nfsdcld/sqlite.c index 03016fb95823..b30f2614497b 100644 --- a/utils/nfsdcld/sqlite.c +++ b/utils/nfsdcld/sqlite.c @@ -826,6 +826,13 @@ sqlite_prepare_dbh(const char *topdir) goto out_close; } + ret = sqlite3_exec(dbh, "PRAGMA journal_mode = WAL;", NULL, NULL, NULL); + if (ret) + goto out_close; + ret = sqlite3_exec(dbh, "PRAGMA synchronous = FULL;", NULL, NULL, NULL); + if (ret) + goto out_close; + ret = sqlite_query_schema_version(); switch (ret) { case CLD_SQLITE_LATEST_SCHEMA_VERSION: I also wonder how expensive may be the extra overhead of starting up nfsdcltrack each time. --b.
> On Dec 1, 2021, at 9:51 AM, Bruce Fields <bfields@fieldses.org> wrote: > > Do you have a public git tree with your latest patches? > > --b. Dai's patches have been pushed to the nfsd-courteous-server topic branch at git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git I can fold them into my for-next branch if we agree they are ready for broader test exposure. -- Chuck Lever
On Wed, Dec 01, 2021 at 02:50:50PM -0500, Bruce Fields wrote: > On Wed, Dec 01, 2021 at 01:03:39PM -0500, Bruce Fields wrote: > > On Wed, Dec 01, 2021 at 12:42:05PM -0500, Bruce Fields wrote: > > > On Wed, Dec 01, 2021 at 09:36:30AM -0500, Bruce Fields wrote: > > > > OK, good to know. It'd be interesting to dig into where nfsdcltrack is > > > > spending its time, which we could do by replacing it with a wrapper that > > > > runs the real nfsdcltrack under strace. > > > > > > > > Though maybe it'd be better to do this on a system using nfsdcld, since > > > > that's what we're transitioning to. > > > > > > Trying that on a test VM here, I see each upcall doing 3 fdatasyncs() of > > > an sqlite-journal file. On my setup, each of those is taking a few > > > milliseconds. I wonder if it an do better. > > > > If I understand the sqlite documentation correctly, I *think* that if we > > use journal_mode WAL with synchronous FULL, we should get the assurances > > nfsd needs with one sync per transaction. > > So I *think* that would mean just doing something like (untested, don't have > much idea what I'm doing): OK, tried that out on my test VM, and: yes, the resulting strace was much simpler (and, in particular, had only one fdatasync per upcall instead of 3), and total time to expire 1000 courtesy clients was 6.5 seconds instead of 15.9. So, I'll clean up that patch and pass it along to Steve D. This is all a bit of a derail, I know, but I suspect this will be a bottleneck in other cases too, like when a lot of clients are reclaiming after reboot. We do need nfsdcld to sync to disk before returning to the kernel, so this probably can't be further optimized without doing something more complicated to allow some kind of parallelism and batching. So if you have a ton of clients you'll just need /var/lib/nfs to be on low-latency storage. --b. > > diff --git a/utils/nfsdcld/sqlite.c b/utils/nfsdcld/sqlite.c > index 03016fb95823..b30f2614497b 100644 > --- a/utils/nfsdcld/sqlite.c > +++ b/utils/nfsdcld/sqlite.c > @@ -826,6 +826,13 @@ sqlite_prepare_dbh(const char *topdir) > goto out_close; > } > > + ret = sqlite3_exec(dbh, "PRAGMA journal_mode = WAL;", NULL, NULL, NULL); > + if (ret) > + goto out_close; > + ret = sqlite3_exec(dbh, "PRAGMA synchronous = FULL;", NULL, NULL, NULL); > + if (ret) > + goto out_close; > + > ret = sqlite_query_schema_version(); > switch (ret) { > case CLD_SQLITE_LATEST_SCHEMA_VERSION: > > I also wonder how expensive may be the extra overhead of starting up > nfsdcltrack each time. > > --b.