[RFC-PATCH] nfsd: when unhashing openowners, increment openowner's refcount

Message ID 1566406146-7887-1-git-send-email-alex@zadara.com
State New
Series
  • [RFC-PATCH] nfsd: when unhashing openowners, increment openowner's refcount

Commit Message

Alex Lyakas Aug. 21, 2019, 4:49 p.m. UTC
release_openowner() expects an extra refcnt to have been taken on the
openowner that it is releasing.

With nfsd_inject_forget_client_openowners() and nfsd_inject_forget_openowners(),
we unhash openowners and collect them into a reaplist. Later we call
nfsd_reap_openowners(), which calls release_openowner(), which releases all
of the openowner's stateids. Each OPEN stateid holds a refcnt on the openowner.
Therefore, after the last OPEN stateid is released via its sc_free function,
which is nfs4_free_ol_stateid, nfs4_put_stateowner() is called, sees that it
is dropping the last refcnt on the openowner, and frees it. But later,
release_openowner() goes ahead and calls release_last_closed_stateid()
and nfs4_put_stateowner() on the same openowner that was just freed.
This corrupts memory and causes random crashes.

After applying this fix, we confirmed that the openowner is no longer freed
prematurely: it is freed by release_openowner()'s final call
to nfs4_put_stateowner().

However, we still get other random crashes and memory corruptions
when running nfsd_inject_forget_client_openowners() and
nfsd_inject_forget_openowners().
According to our analysis, there are no other refcount issues.
Can anybody from the community review these flows for other potential issues?

Signed-off-by: Alex Lyakas <alex@zadara.com>
---
 fs/nfsd/nfs4state.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Alex Lyakas Aug. 25, 2019, 10:12 a.m. UTC | #1
Hi Bruce, Chuck,

You are listed as maintainers of nfsd. Can you please take a look at
the below patch?

Thanks,
Alex.

On Wed, Aug 21, 2019 at 7:50 PM Alex Lyakas <alex@zadara.com> wrote:
>
> release_openowner() expects an extra refcnt taken for the openowner,
> which it is releasing.
>
>  With nfsd_inject_forget_client_openowners() and nfsd_inject_forget_openowners(),
> we unhash openowners and collect them into a reaplist. Later we call
> nfsd_reap_openowners(), which calls release_openowner(), which releases all openowner's stateids.
> Each OPEN stateid holds a refcnt on the openowner. Therefore, after releasing
> the last OPEN stateid via its sc_free function, which is nfs4_free_ol_stateid,
> nfs4_put_stateowner() will be called, which will realize it's the last
> refcnt for the openowner. As a result, openowner will be freed.
> But later, release_openowner() will go ahead and call release_last_closed_stateid()
> and nfs4_put_stateowner() on the same openowner which was just released.
> This corrupts memory and causes random crashes.
>
> After we fixed this, we confirmed that the openowner is not freed
> prematurely. It is freed by release_openowner() final call
> to nfs4_put_stateowner().
>
> However, we still get (other) random crashes and memory corruptions
> when nfsd_inject_forget_client_openowners() and
> nfsd_inject_forget_openowners().
> According to our analysis, we don't see any other refcount issues.
> Can anybody from the community review these flows for other potential issues?
>
> Signed-off-by: Alex Lyakas <alex@zadara.com>
> ---
>  fs/nfsd/nfs4state.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 7857942..4e9afca 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -7251,6 +7251,7 @@ static u64 nfsd_foreach_client_lock(struct nfs4_client *clp, u64 max,
>                         func(oop);
>                         if (collect) {
>                                 atomic_inc(&clp->cl_rpc_users);
> +                               nfs4_get_stateowner(&oop->oo_owner);
>                                 list_add(&oop->oo_perclient, collect);
>                         }
>                 }
> --
> 1.9.1
>
J. Bruce Fields Aug. 26, 2019, 1:39 p.m. UTC | #2
On Sun, Aug 25, 2019 at 01:12:34PM +0300, Alex Lyakas wrote:
> You are listed as maintainers of nfsd. Can you please take a look at
> the below patch?

Thanks!

I take it this was found by some kind of code analysis or fuzzing, not
use in production?

Asking because I've been considering just deprecating it, so:

> > After we fixed this, we confirmed that the openowner is not freed
> > prematurely. It is freed by release_openowner() final call
> > to nfs4_put_stateowner().
> >
> > However, we still get (other) random crashes and memory corruptions
> > when nfsd_inject_forget_client_openowners() and
> > nfsd_inject_forget_openowners().
> > According to our analysis, we don't see any other refcount issues.
> > Can anybody from the community review these flows for other potential issues?

I'm wondering how much effort we want to put into tracking all that
down.

--b.
J. Bruce Fields Aug. 26, 2019, 2:38 p.m. UTC | #3
On Mon, Aug 26, 2019 at 09:39:51AM -0400, J. Bruce Fields wrote:
> On Sun, Aug 25, 2019 at 01:12:34PM +0300, Alex Lyakas wrote:
> > You are listed as maintainers of nfsd. Can you please take a look at
> > the below patch?
> 
> Thanks!
> 
> I take it this was found by some kind of code analysis or fuzzing, not
> use in production?
> 
> Asking because I've been considering just deprecating it, so:

So, unless someone objects I'd like to queue this up for 5.4.

--b.

commit 9d60d93198c6
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Mon Aug 26 10:28:58 2019 -0400

    Deprecate nfsd fault injection
    
    This is only useful for client testing.  I haven't really maintained it,
    and reference counting and locking are wrong at this point.  You can get
    some of the same functionality now from nfsd/clients/.
    
    It was a good idea but I think its time has passed.
    
    In the unlikely event of users, hopefully the BROKEN dependency will
    prompt them to speak up.  Otherwise I expect to remove it soon.
    
    Reported-by: Alex Lyakas <alex@zadara.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index bff8456220e0..10cefb0c07c7 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -148,7 +148,7 @@ config NFSD_V4_SECURITY_LABEL
 
 config NFSD_FAULT_INJECTION
 	bool "NFS server manual fault injection"
-	depends on NFSD_V4 && DEBUG_KERNEL && DEBUG_FS
+	depends on NFSD_V4 && DEBUG_KERNEL && DEBUG_FS && BROKEN
 	help
 	  This option enables support for manually injecting faults
 	  into the NFS server.  This is intended to be used for
Alex Lyakas Aug. 27, 2019, 9:05 a.m. UTC | #4
Hi Bruce,

Thank you for your response.

On Mon, Aug 26, 2019 at 4:39 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Sun, Aug 25, 2019 at 01:12:34PM +0300, Alex Lyakas wrote:
> > You are listed as maintainers of nfsd. Can you please take a look at
> > the below patch?
>
> Thanks!
>
> I take it this was found by some kind of code analysis or fuzzing, not
> use in production?

We are hitting the following issue in production quite frequently:
- We create two local file systems FS1 and FS2 on the server machine S
- We export both FS1 and FS2 through nfsd to the same nfs client,
running on client machine C
- On C, we mount both exported file systems and start writing files to
both of them
- After a few minutes, on server machine S, we un-export FS1 only. We
don't unmount FS1 on the client machine C prior to un-exporting. Also,
FS2 remains exported to C.
- We want to unmount FS1 on the server machine S, but we fail, because
there are still open files on FS1 by nfsd.

Debugging this issue showed the following root cause: we have an
nfs4_client entry for the client C. This entry has two
nfs4_openowners, one for FS1 and one for FS2, although FS1 was
un-exported. Looking at the stateids of both openowners, we see that
they have stateids of kind NFS4_OPEN_STID, and each stateid is holding
a nfs4_file. The reason we cannot unmount FS1 is that we still have an
openowner for FS1, holding open stateids, which hold open files on
FS1.

The laundromat doesn't help in this case, because it can only decide,
per nfs4_client, whether the whole client should be purged. And since
FS2 is still exported to C, there is no reason to purge the
nfs4_client.

This situation remains until we un-export FS2 as well. Then the whole
nfs4_client is purged, and all the files get closed, and we can
unmount both FS1 and FS2.

We started looking around, and we found the failure injection code
that can "forget openowners". We wrote some custom code that allows us
to select the openowner which is not needed anymore. We then unhash
this openowner, and unhash and close all of its stateids. The files
then get closed, and we can unmount FS1.

Is the described issue familiar to you? It is very easily
reproducible. What is the way to solve it? To our understanding, if we
un-export a FS from nfsd, we should be able to unmount it.

For example, could we introduce a sysfs or procfs entry that lists all
clients and openowners, plus another entry allowing the user to
"forget" a particular openowner? If you feel this is the way to move
forward, we can try to provide patches for review.

Thanks,
Alex.

>
> Asking because I've been considering just deprecating it, so:
>
> > > After we fixed this, we confirmed that the openowner is not freed
> > > prematurely. It is freed by release_openowner() final call
> > > to nfs4_put_stateowner().
> > >
> > > However, we still get (other) random crashes and memory corruptions
> > > when nfsd_inject_forget_client_openowners() and
> > > nfsd_inject_forget_openowners().
> > > According to our analysis, we don't see any other refcount issues.
> > > Can anybody from the community review these flows for other potential issues?
>
> I'm wondering how much effort we want to put into tracking all that
> down.
>
> --b.
J. Bruce Fields Aug. 27, 2019, 8:51 p.m. UTC | #5
On Tue, Aug 27, 2019 at 12:05:28PM +0300, Alex Lyakas wrote:
> Is the described issue familiar to you?

Yep, got it, but I haven't seen anyone try to solve it using the fault
injection code, that's interesting!

There's also fs/nfsd/unlock_filesystem.  It only unlocks NLM (NFSv3)
locks.  But it'd probably be reasonable to teach it to get NFSv4 state
too (locks, opens, delegations, and layouts).

But my feeling's always been that the cleanest way to do it is to create
two containers with separate net namespaces and run nfsd in both of
them.  You can start and stop the servers in the different containers
independently.

> It is very easily reproducible. What is the way to solve it? To our
> understanding, if we un-export a FS from nfsd, we should be able to
> unmount it.

Unexporting has never removed locks or opens or other state, for what
it's worth.

--b.
Alex Lyakas Aug. 28, 2019, 3:20 p.m. UTC | #6
Hi Bruce,

Thanks for the clarifications.

On Tue, Aug 27, 2019 at 11:51 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Tue, Aug 27, 2019 at 12:05:28PM +0300, Alex Lyakas wrote:
> > Is the described issue familiar to you?
>
> Yep, got it, but I haven't seen anyone try to solve it using the fault
> injection code, that's interesting!
>
> There's also fs/nfsd/unlock_filesystem.  It only unlocks NLM (NFSv3)
> locks.  But it'd probably be reasonable to teach it to get NFSv4 state
> too (locks, opens, delegations, and layouts).
>
> But my feeling's always been that the cleanest way to do it is to create
> two containers with separate net namespaces and run nfsd in both of
> them.  You can start and stop the servers in the different containers
> independently.

I am looking at the code, and currently nfsd creates a single
namespace subsystem in init_nfsd. All nfs4_clients run in this
subsystem.

So the proposal is to use register_pernet_subsys() for every
filesystem that is exported? I presume that the current nfsd code
cannot do this, and some rework would be required to move away from a
single subsystem to a per-export subsystem. Also, grepping through the
kernel code, I see that namespace subsystems are created by different
modules as part of module initialization, rather than dynamically.
Furthermore, in our case the same nfsd machine S can export tens or
even hundreds of local filesystems. Is it fine to have hundreds of
subsystems?

Otherwise, I understand that the current behavior is a "won't fix",
and it is expected for the client machine to unmount the export before
un-exporting the file system at nfsd machine. Is this correct?

Thanks,
Alex.


>
> > It is very easily reproducible. What is the way to solve it? To our
> > understanding, if we un-export a FS from nfsd, we should be able to
> > unmount it.
>
> Unexporting has never removed locks or opens or other state, for what
> it's worth.
>
> --b.
J. Bruce Fields Aug. 28, 2019, 4:54 p.m. UTC | #7
On Wed, Aug 28, 2019 at 06:20:22PM +0300, Alex Lyakas wrote:
> On Tue, Aug 27, 2019 at 11:51 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > On Tue, Aug 27, 2019 at 12:05:28PM +0300, Alex Lyakas wrote:
> > > Is the described issue familiar to you?
> >
> > Yep, got it, but I haven't seen anyone try to solve it using the fault
> > injection code, that's interesting!
> >
> > There's also fs/nfsd/unlock_filesystem.  It only unlocks NLM (NFSv3)
> > locks.  But it'd probably be reasonable to teach it to get NFSv4 state
> > too (locks, opens, delegations, and layouts).
> >
> > But my feeling's always been that the cleanest way to do it is to create
> > two containers with separate net namespaces and run nfsd in both of
> > them.  You can start and stop the servers in the different containers
> > independently.
> 
> I am looking at the code, and currently nfsd creates a single
> namespace subsystem in init_nfsd. All nfs4_clients run in this
> subsystem.
> 
> So the proposal is to use register_pernet_subsys() for every
> filesystem that is exported?

No, I'm not proposing any kernel changes.  Just create separate net
namespaces from userspace and start nfsd from within them.  And you'll
also need to arrange for the different nfsds to get different exports.

In practice, the best way to do this may be using some container
management service, I'm not sure.

> I presume that current nfsd code cannot
> do this, and some rework is required to move away from a single
> subsystem to per-export subsystem. Also, grepping through kernel code,
> I see that namespace subsystems are created by different modules as
> part of module initialization, rather than doing that dynamically.
> Furthermore, in our case the same nfsd machine S can export tens or
> even hundreds of local filesystems. Is it fine to have hundreds of
> subsystems?

I haven't done it myself, but I suspect hundreds of containers should be
OK.  It may depend on available resources, of course.

> Otherwise, I understand that the current behavior is a "won't fix",
> and it is expected for the client machine to unmount the export before
> un-exporting the file system at nfsd machine. Is this correct?

You're definitely not the only ones to request this, so I'd like to have
a working solution.

My preference would be to try the namespace/container approach first,
and if that turns out not to work well for some reason, to update
fs/nfsd/unlock_filesystem to handle NFSv4 stuff.

The fault injection code isn't the right interface for this.  Even if we
did decide it was worth fixing up and maintaining--it's really only
designed for testing clients.  I'd expect distros not to build it in
their default kernels.

--b.
Alex Lyakas Aug. 29, 2019, 6:12 p.m. UTC | #8
Hi Bruce,

On Wed, Aug 28, 2019 at 7:54 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Wed, Aug 28, 2019 at 06:20:22PM +0300, Alex Lyakas wrote:
> > On Tue, Aug 27, 2019 at 11:51 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> > >
> > > On Tue, Aug 27, 2019 at 12:05:28PM +0300, Alex Lyakas wrote:
> > > > Is the described issue familiar to you?
> > >
> > > Yep, got it, but I haven't seen anyone try to solve it using the fault
> > > injection code, that's interesting!
> > >
> > > There's also fs/nfsd/unlock_filesystem.  It only unlocks NLM (NFSv3)
> > > locks.  But it'd probably be reasonable to teach it to get NFSv4 state
> > > too (locks, opens, delegations, and layouts).
> > >
> > > But my feeling's always been that the cleanest way to do it is to create
> > > two containers with separate net namespaces and run nfsd in both of
> > > them.  You can start and stop the servers in the different containers
> > > independently.
> >
> > I am looking at the code, and currently nfsd creates a single
> > namespace subsystem in init_nfsd. All nfs4_clients run in this
> > subsystem.
> >
> > So the proposal is to use register_pernet_subsys() for every
> > filesystem that is exported?
>
> No, I'm not proposing any kernel changes.  Just create separate net
> namespaces from userspace and start nfsd from within them.  And you'll
> also need to arrange for the different nfsds to get different exports.
>
> In practice, the best way to do this may be using some container
> management service, I'm not sure.
>
> > I presume that current nfsd code cannot
> > do this, and some rework is required to move away from a single
> > subsystem to per-export subsystem. Also, grepping through kernel code,
> > I see that namespace subsystems are created by different modules as
> > part of module initialization, rather than doing that dynamically.
> > Furthermore, in our case the same nfsd machine S can export tens or
> > even hundreds of local filesystems. Is it fine to have hundreds of
> > subsystems?
>
> I haven't done it myself, but I suspect hundreds of containers should be
> OK.  It may depend on available resources, of course.
>
> > Otherwise, I understand that the current behavior is a "won't fix",
> > and it is expected for the client machine to unmount the export before
> > un-exporting the file system at nfsd machine. Is this correct?
>
> You're definitely not the only ones to request this, so I'd like to have
> a working solution.
>
> My preference would be to try the namespace/container approach first.
> And if that turns out no to work well for some reason, to update
> fs/nfsd/unlock_filesystem to handle NFSv4 stuff.
>
> The fault injection code isn't the right interface for this.  Even if we
> did decide it was worth fixing up and maintaining--it's really only
> designed for testing clients.  I'd expect distros not to build it in
> their default kernels.

Hi Bruce,

We evaluated the network namespaces approach. But, unfortunately, it
doesn't fit easily into how our system is currently structured. We
would have to create and configure interfaces for every namespace, and
(presumably) have a separate IP address for every namespace. All this
seems a bit of overkill just to have several local filesystems
exported to the same client (which is when we hit the issue). I would
assume that other users would argue as well that creating a separate
network namespace for every local filesystem is not the way to go from
an administration point of view.

Regarding the failure injection code, we did not actually enable and
use it. We instead wrote some custom code that is closely modeled on
the failure injection code, and that is how we found the openowner
refcnt issue from which this thread started. Like the failure
injection code, our code iterates over all the clients, all the
openowners, all the OPEN stateids, and the opened files. For each
opened file, it then decides whether the file belongs to the file
system that has just been un-exported. To accomplish that, we added a
custom field to nfs4_file:
struct nfs4_file {
    atomic_t        fi_ref;
    spinlock_t        fi_lock;
    ...
#ifdef CONFIG_NFSD_ZADARA
    u64    fi_share_id; /* which exported FS this file belongs to; non-zero */
#endif /*CONFIG_NFSD_ZADARA*/
};

We also added a simple data structure that tracks all the exported
file systems and assigns a unique "share_id" to each. In
find_or_add_file(), we consult this data structure and assign a
fi_share_id to every nfs4_file that is newly-created.

The code that "forgets" all the relevant openowners looks like:

void forget_openowners_of_share_id(struct nfsd_net *nn, u64 share_id)
{
    struct nfs4_client *clp;
    struct nfs4_openowner *oo, *oo_next;
    struct nfs4_ol_stateid *stp;
    LIST_HEAD(reaplist);

    spin_lock(&nn->client_lock);
    list_for_each_entry(clp, &nn->client_lru, cl_lru) {
        spin_lock(&clp->cl_lock);
        list_for_each_entry_safe(oo, oo_next, &clp->cl_openowners,
                                 oo_perclient) {
            bool only_this_share = true;

            list_for_each_entry(stp, &oo->oo_owner.so_stateids,
                                st_perstateowner) {
                /* Does this file belong to the relevant share_id? */
                if (stp->st_stid.sc_file->fi_share_id != share_id)
                    only_this_share = false;
            }
            /* We don't expect the same openowner to have files from
             * different share_ids. */
            if (only_this_share) {
                nfs4_get_stateowner(&oo->oo_owner); /* take extra refcnt */
                unhash_openowner_locked(oo);
                atomic_inc(&clp->cl_refcount);
                list_add(&oo->oo_perclient, &reaplist);
            }
        }
        spin_unlock(&clp->cl_lock);
    }
    spin_unlock(&nn->client_lock);

    nfsd_reap_openowners(&reaplist);
}

So this code is very similar to the existing failure injection code;
it's just that the criterion for openowner selection is different.

Currently this code is invoked via a custom procfs entry, by a
user-space application, before unmounting the local file system.

Would moving this code into the "unlock_filesystem" infrastructure be
acceptable? Since the "share_id" approach is very specific to our
usage, what criteria would you suggest for selecting the openowners to
be "forgotten"?

Thanks,
Alex.
J. Bruce Fields Aug. 30, 2019, 7:08 p.m. UTC | #9
On Thu, Aug 29, 2019 at 09:12:49PM +0300, Alex Lyakas wrote:
> We evaluated the network namespaces approach. But, unfortunately, it
> doesn't fit easily into how our system is currently structured. We
> would have to create and configure interfaces for every namespace, and
> have a separate IP address (presumably) for every namespace.

Yes.

> All this
> seems a bit of an overkill, to just have several local filesystems
> exported to the same client (which is when we hit the issue). I would
> assume that some other users would argue as well that creating a
> separate network namespace for every local filesystem is not the way
> to go from the administration point of view.

OK, makes sense.

And I take it you don't want to go around to each client and shut things
down cleanly.  And you're fine with the client applications erroring out
when you yank their filesystem out from underneath them.

(I wonder what happens these days when that happens on a linux client
when there are dirty pages.  I think you may just end up with a useless
mount that you can't get rid of till you power down the client.)

> Regarding the failure injection code, we did not actually enable and
> use it. We instead wrote some custom code that is highly modeled after
> the failure injection code.

Sounds interesting....  I'll try to understand it and give some comments
later.
...
> Currently this code is invoked from a custom procfs entry, by
> user-space application, before unmounting the local file system.
> 
> Would moving this code into the "unlock_filesystem" infrastructure be
> acceptable?

Yes.  I'd be interested in patches.

> Since the "share_id" approach is very custom for our
> usage, what criteria would you suggest for selecting the openowners to
> be "forgotten"?

The share_id shouldn't be necessary.  I'll think about it.

--b.
J. Bruce Fields Aug. 30, 2019, 7:54 p.m. UTC | #10
On Thu, Aug 29, 2019 at 09:12:49PM +0300, Alex Lyakas wrote:
> Would moving this code into the "unlock_filesystem" infrastructure be
> acceptable? Since the "share_id" approach is very custom for our
> usage, what criteria would you suggest for selecting the openowners to
> be "forgotten"?

Have you looked at what unlock_filesystem() does?  It's just translating the
given path to a superblock, then matching that against inodes in
nlmsvc_match_sb().

It's a little more complicated for nfs4_files since they don't have a
pointer to the inode. (Maybe they should.)  You can see how I get around
this in e.g.  fs/nfsd/nfs4state.c:nfs4_show_lock().

A superblock isn't the same thing as an export, thanks to bind mounts
and subdirectory exports.  But if the goal is to be able to unmount,
then a superblock is probably what you want.

--b.
Alex Lyakas Sept. 3, 2019, 1:47 p.m. UTC | #11
Hi Bruce,

On Fri, Aug 30, 2019 at 10:08 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Thu, Aug 29, 2019 at 09:12:49PM +0300, Alex Lyakas wrote:
> > We evaluated the network namespaces approach. But, unfortunately, it
> > doesn't fit easily into how our system is currently structured. We
> > would have to create and configure interfaces for every namespace, and
> > have a separate IP address (presumably) for every namespace.
>
> Yes.
>
> > All this
> > seems a bit of an overkill, to just have several local filesystems
> > exported to the same client (which is when we hit the issue). I would
> > assume that some other users would argue as well that creating a
> > separate network namespace for every local filesystem is not the way
> > to go from the administration point of view.
>
> OK, makes sense.
>
> And I take it you don't want to go around to each client and shut things
> down cleanly.  And you're fine with the client applications erroring out
> when you yank their filesystem out from underneath them.
It's not that we don't want to unmount at each client. The problem is
that we don't control the client machines, as they are owned by
customers. We definitely recommend that customers unmount before
un-exporting, but in some cases that still doesn't happen, most
notably in automated environments. Since the un-export operation is
initiated by the customer, I presume she understands that the nfs
client might error out if not unmounted properly beforehand.

>
> (I wonder what happens these days when that happens on a linux client
> when there are dirty pages.  I think you may just end up with a useless
> mount that you can't get rid of till you power down the client.)
Right, in some cases, the IO process gets stuck forever.

>
> > Regarding the failure injection code, we did not actually enable and
> > use it. We instead wrote some custom code that is highly modeled after
> > the failure injection code.
>
> Sounds interesting....  I'll try to understand it and give some comments
> later.
> ...
> > Currently this code is invoked from a custom procfs entry, by
> > user-space application, before unmounting the local file system.
> >
> > Would moving this code into the "unlock_filesystem" infrastructure be
> > acceptable?
>
> Yes.  I'd be interested in patches.
>
> > Since the "share_id" approach is very custom for our
> > usage, what criteria would you suggest for selecting the openowners to
> > be "forgotten"?
>
> The share_id shouldn't be necessary.  I'll think about it.
>
> --b.
Alex Lyakas Sept. 3, 2019, 1:48 p.m. UTC | #12
Hi Bruce,

On Fri, Aug 30, 2019 at 10:54 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Thu, Aug 29, 2019 at 09:12:49PM +0300, Alex Lyakas wrote:
> > Would moving this code into the "unlock_filesystem" infrastructure be
> > acceptable? Since the "share_id" approach is very custom for our
> > usage, what criteria would you suggest for selecting the openowners to
> > be "forgotten"?
>
> Have you looked at what unlock_filesystem() does?  It's just translating the
> given path to a superblock, then matching that against inodes in
> nlmsvc_match_sb().
>
> It's a little more complicated for nfs4_files since they don't have a
> pointer to the inode. (Maybe it should.)  You can see how I get around
> this in e.g.  fs/nfsd/nfs4state.c:nfs4_show_lock().
>
> A superblock isn't the same thing as an export, thanks to bind mounts
> and subdirectory exports.  But if the goal is to be able to unmount,
> then a superblock is probably what you want.
Thanks for your suggestion. The superblock approach works very well for
us. An initial patch is on its way.

Thanks,
Alex.

>
> --b.

Patch

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 7857942..4e9afca 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -7251,6 +7251,7 @@  static u64 nfsd_foreach_client_lock(struct nfs4_client *clp, u64 max,
 			func(oop);
 			if (collect) {
 				atomic_inc(&clp->cl_rpc_users);
+				nfs4_get_stateowner(&oop->oo_owner);
 				list_add(&oop->oo_perclient, collect);
 			}
 		}