
ceph: ensure we flush delayed caps when unmounting

Message ID 20210603134812.80276-1-jlayton@kernel.org (mailing list archive)
State New, archived
Series ceph: ensure we flush delayed caps when unmounting

Commit Message

Jeff Layton June 3, 2021, 1:48 p.m. UTC
I've seen some warnings when testing recently that indicate that there
are caps still delayed on the delayed list even after we've started
unmounting.

When checking delayed caps, process the whole list if we're unmounting,
and check for delayed caps after setting the stopping var and flushing
dirty caps.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c       | 3 ++-
 fs/ceph/mds_client.c | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

Comments

Jeff Layton June 3, 2021, 4:57 p.m. UTC | #1
On Thu, 2021-06-03 at 09:48 -0400, Jeff Layton wrote:
> I've seen some warnings when testing recently that indicate that there
> are caps still delayed on the delayed list even after we've started
> unmounting.
> 
> When checking delayed caps, process the whole list if we're unmounting,
> and check for delayed caps after setting the stopping var and flushing
> dirty caps.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/caps.c       | 3 ++-
>  fs/ceph/mds_client.c | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index a5e93b185515..68b4c6dfe4db 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -4236,7 +4236,8 @@ void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
>  		ci = list_first_entry(&mdsc->cap_delay_list,
>  				      struct ceph_inode_info,
>  				      i_cap_delay_list);
> -		if ((ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
> +		if (!mdsc->stopping &&
> +		    (ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
>  		    time_before(jiffies, ci->i_hold_caps_max))
>  			break;
>  		list_del_init(&ci->i_cap_delay_list);
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index e5af591d3bd4..916af5497829 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -4691,6 +4691,7 @@ void ceph_mdsc_pre_umount(struct ceph_mds_client *mdsc)
>  
>  	lock_unlock_sessions(mdsc);
>  	ceph_flush_dirty_caps(mdsc);
> +	ceph_check_delayed_caps(mdsc);
>  	wait_requests(mdsc);
>  
>  	/*

I'm going to self-NAK this patch for now. Initially this looked good in
testing, but I think it's just papering over the real problem, which is
that ceph_async_iput can queue a job to a workqueue after the point
where we've flushed that workqueue on umount.

I think the right approach is to look at how to ensure that calling iput
doesn't end up taking these coarse-grained locks so we don't need to
queue it in so many codepaths.
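[Editor's sketch] The race described above comes down to ordering. A flush only runs the work that is already queued, so any job queued afterward stays pending. The userspace model below is a hypothetical illustration: `struct deferred_q`, `dq_queue()`, `dq_flush()` and `umount_with_late_queue()` are invented names that stand in for the workqueue used by `ceph_async_iput`. They are not kernel API, and the model is not the fix Jeff is proposing.

```c
#include <assert.h>

/*
 * Hypothetical model of the ordering problem: "struct deferred_q" stands
 * in for the workqueue used by ceph_async_iput(). Nothing here is kernel
 * API; it only records how many jobs were queued and how many ran.
 */
struct deferred_q {
	int pending;  /* jobs queued but not yet run */
	int ran;      /* jobs executed by a flush */
};

/* Queue one deferred job, as an async iput would. */
static void dq_queue(struct deferred_q *q)
{
	q->pending++;
}

/* Run everything queued so far, as umount does when it flushes the queue. */
static void dq_flush(struct deferred_q *q)
{
	q->ran += q->pending;
	q->pending = 0;
}

/*
 * Umount-time ordering: the flush happens, then some other path drops the
 * last inode reference and queues another job. Returns how many jobs are
 * left with nothing scheduled to run them.
 */
static int umount_with_late_queue(struct deferred_q *q)
{
	dq_queue(q);  /* queued before umount: handled by the flush */
	dq_flush(q);  /* umount flushes the workqueue */
	dq_queue(q);  /* late async iput: nothing will ever run it */
	return q->pending;
}
```

In this model, calling `umount_with_late_queue()` on an empty queue leaves one job pending after the flush. That is the kind of leftover state the delayed-caps warnings were pointing at.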

Luís Henriques June 4, 2021, 9:35 a.m. UTC | #2
On Thu, Jun 03, 2021 at 12:57:22PM -0400, Jeff Layton wrote:
> On Thu, 2021-06-03 at 09:48 -0400, Jeff Layton wrote:
> > I've seen some warnings when testing recently that indicate that there
> > are caps still delayed on the delayed list even after we've started
> > unmounting.
> > 
> > When checking delayed caps, process the whole list if we're unmounting,
> > and check for delayed caps after setting the stopping var and flushing
> > dirty caps.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/caps.c       | 3 ++-
> >  fs/ceph/mds_client.c | 1 +
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > index a5e93b185515..68b4c6dfe4db 100644
> > --- a/fs/ceph/caps.c
> > +++ b/fs/ceph/caps.c
> > @@ -4236,7 +4236,8 @@ void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
> >  		ci = list_first_entry(&mdsc->cap_delay_list,
> >  				      struct ceph_inode_info,
> >  				      i_cap_delay_list);
> > -		if ((ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
> > +		if (!mdsc->stopping &&
> > +		    (ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
> >  		    time_before(jiffies, ci->i_hold_caps_max))
> >  			break;
> >  		list_del_init(&ci->i_cap_delay_list);
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index e5af591d3bd4..916af5497829 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -4691,6 +4691,7 @@ void ceph_mdsc_pre_umount(struct ceph_mds_client *mdsc)
> >  
> >  	lock_unlock_sessions(mdsc);
> >  	ceph_flush_dirty_caps(mdsc);
> > +	ceph_check_delayed_caps(mdsc);
> >  	wait_requests(mdsc);
> >  
> >  	/*
> 
> I'm going to self-NAK this patch for now. Initially this looked good in
> testing, but I think it's just papering over the real problem, which is
> that ceph_async_iput can queue a job to a workqueue after the point
> where we've flushed that workqueue on umount.

Ah, yeah.  I think I saw this a few times with generic/014 (and I believe
we chatted about it on irc).  I've been on and off trying to figure out
the way to fix it but it's really tricky.

Cheers,
--
Luís


> I think the right approach is to look at how to ensure that calling iput
> doesn't end up taking these coarse-grained locks so we don't need to
> queue it in so many codepaths.
> -- 
> Jeff Layton <jlayton@kernel.org>
>

Jeff Layton June 4, 2021, 12:26 p.m. UTC | #3
On Fri, 2021-06-04 at 10:35 +0100, Luis Henriques wrote:
> On Thu, Jun 03, 2021 at 12:57:22PM -0400, Jeff Layton wrote:
> > On Thu, 2021-06-03 at 09:48 -0400, Jeff Layton wrote:
> > > I've seen some warnings when testing recently that indicate that there
> > > are caps still delayed on the delayed list even after we've started
> > > unmounting.
> > > 
> > > When checking delayed caps, process the whole list if we're unmounting,
> > > and check for delayed caps after setting the stopping var and flushing
> > > dirty caps.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >  fs/ceph/caps.c       | 3 ++-
> > >  fs/ceph/mds_client.c | 1 +
> > >  2 files changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > index a5e93b185515..68b4c6dfe4db 100644
> > > --- a/fs/ceph/caps.c
> > > +++ b/fs/ceph/caps.c
> > > @@ -4236,7 +4236,8 @@ void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
> > >  		ci = list_first_entry(&mdsc->cap_delay_list,
> > >  				      struct ceph_inode_info,
> > >  				      i_cap_delay_list);
> > > -		if ((ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
> > > +		if (!mdsc->stopping &&
> > > +		    (ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
> > >  		    time_before(jiffies, ci->i_hold_caps_max))
> > >  			break;
> > >  		list_del_init(&ci->i_cap_delay_list);
> > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > index e5af591d3bd4..916af5497829 100644
> > > --- a/fs/ceph/mds_client.c
> > > +++ b/fs/ceph/mds_client.c
> > > @@ -4691,6 +4691,7 @@ void ceph_mdsc_pre_umount(struct ceph_mds_client *mdsc)
> > >  
> > >  	lock_unlock_sessions(mdsc);
> > >  	ceph_flush_dirty_caps(mdsc);
> > > +	ceph_check_delayed_caps(mdsc);
> > >  	wait_requests(mdsc);
> > >  
> > >  	/*
> > 
> > I'm going to self-NAK this patch for now. Initially this looked good in
> > testing, but I think it's just papering over the real problem, which is
> > that ceph_async_iput can queue a job to a workqueue after the point
> > where we've flushed that workqueue on umount.
> 
> Ah, yeah.  I think I saw this a few times with generic/014 (and I believe
> we chatted about it on irc).  I've been on and off trying to figure out
> the way to fix it but it's really tricky.
> 

Yeah, that's putting it mildly. 

The biggest issue here is the session->s_mutex, which is held over large
swaths of the code, but it's not fully clear what it protects. The
original patch that added ceph_async_iput did it to avoid the session
mutex that gets held for ceph_iterate_session_caps.

My current thinking is that we probably don't need to hold the session
mutex over that function in some cases, if we can guarantee that the
ceph_cap objects we're iterating over don't go away when the lock is
dropped. So, I'm trying to add some refcounting to the ceph_cap
structures themselves to see if that helps.

It may turn out to be a dead end, but if we don't chip away at the edges
of the fundamental problem, we'll never get there...
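[Editor's sketch] The refcounting idea above says the lock can be dropped during iteration as long as each cap is pinned by a reference. The userspace sketch below shows one way that can work. It is not Jeff's patch and not the kernel's data structures. `struct cap`, `struct session`, `iterate_caps()` and the other helpers are hypothetical stand-ins, and a pthread mutex plays the role of `s_mutex`. The iterator takes a reference on every cap while it holds the lock, drops the lock before running the callback, and puts its references afterward. A callback that unlinks a cap therefore never frees memory the iterator still uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not struct ceph_cap / struct ceph_mds_session. */
struct cap {
	int refcount;       /* protected by session->lock in this model */
	int linked;         /* 1 while on the session list */
	struct cap *next;
};

struct session {
	pthread_mutex_t lock;
	struct cap *head;
	int freed;          /* caps actually freed, for observation */
};

static void session_init(struct session *s)
{
	pthread_mutex_init(&s->lock, NULL);
	s->head = NULL;
	s->freed = 0;
}

/* Add a cap; the list itself owns the initial reference. */
static int cap_add(struct session *s)
{
	struct cap *c = malloc(sizeof(*c));

	if (!c)
		return -1;
	c->refcount = 1;
	c->linked = 1;
	pthread_mutex_lock(&s->lock);
	c->next = s->head;
	s->head = c;
	pthread_mutex_unlock(&s->lock);
	return 0;
}

/* Drop one reference; free only when the last one goes away. */
static void cap_put_locked(struct session *s, struct cap *c)
{
	if (--c->refcount == 0) {
		free(c);
		s->freed++;
	}
}

/* Unlink a cap and drop the list's reference. Caller holds s->lock. */
static void cap_remove_locked(struct session *s, struct cap *c)
{
	struct cap **pp = &s->head;

	while (*pp && *pp != c)
		pp = &(*pp)->next;
	if (*pp) {
		*pp = c->next;
		c->linked = 0;
		cap_put_locked(s, c);
	}
}

/* Example callback: removes the cap it is handed. */
static void remove_cap_cb(struct session *s, struct cap *c)
{
	pthread_mutex_lock(&s->lock);
	if (c->linked)
		cap_remove_locked(s, c);  /* c stays valid: iterator holds a ref */
	pthread_mutex_unlock(&s->lock);
}

/* Pin every cap under the lock, then run cb on each with the lock dropped. */
static int iterate_caps(struct session *s,
			void (*cb)(struct session *, struct cap *))
{
	struct cap **snap, *c;
	size_t n = 0, i;

	pthread_mutex_lock(&s->lock);
	for (c = s->head; c; c = c->next)
		n++;
	snap = n ? malloc(n * sizeof(*snap)) : NULL;
	if (n && !snap) {
		pthread_mutex_unlock(&s->lock);
		return -1;
	}
	for (i = 0, c = s->head; c; c = c->next, i++) {
		c->refcount++;          /* pin: cap cannot be freed under us */
		snap[i] = c;
	}
	pthread_mutex_unlock(&s->lock);

	for (i = 0; i < n; i++)
		cb(s, snap[i]);         /* may see caps already unlinked */

	pthread_mutex_lock(&s->lock);
	for (i = 0; i < n; i++)
		cap_put_locked(s, snap[i]);
	pthread_mutex_unlock(&s->lock);
	free(snap);
	return (int)n;
}
```

Taking a snapshot avoids the harder problem of following a `next` pointer from a cap that was unlinked while the lock was dropped. The trade-off is an allocation per walk. A kernel version would more likely use `kref` and a different traversal strategy.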

Patch

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index a5e93b185515..68b4c6dfe4db 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -4236,7 +4236,8 @@ void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
 		ci = list_first_entry(&mdsc->cap_delay_list,
 				      struct ceph_inode_info,
 				      i_cap_delay_list);
-		if ((ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
+		if (!mdsc->stopping &&
+		    (ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
 		    time_before(jiffies, ci->i_hold_caps_max))
 			break;
 		list_del_init(&ci->i_cap_delay_list);
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index e5af591d3bd4..916af5497829 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -4691,6 +4691,7 @@ void ceph_mdsc_pre_umount(struct ceph_mds_client *mdsc)
 
 	lock_unlock_sessions(mdsc);
 	ceph_flush_dirty_caps(mdsc);
+	ceph_check_delayed_caps(mdsc);
 	wait_requests(mdsc);
 
 	/*
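[Editor's sketch] The caps.c hunk changes when the delayed-caps loop stops early. It normally stops at the first inode on `cap_delay_list` that is not marked `CEPH_I_FLUSH` and whose hold time has not expired. With the patch applied, it never stops early once `mdsc->stopping` is set. The simplified model below uses a hypothetical `struct delayed_inode` array in place of the kernel list, and plain `<` in place of `time_before()`, so it ignores jiffies wraparound. It only counts how many entries the loop would take off the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a ceph_inode_info on cap_delay_list. */
struct delayed_inode {
	bool flush;             /* models CEPH_I_FLUSH being set */
	unsigned long hold_max; /* models i_hold_caps_max */
};

/*
 * Walk the "delay list" from its head using the patched break condition.
 * Plain '<' replaces time_before(), so wraparound is not handled here.
 * Returns how many entries would be removed and checked.
 */
static size_t check_delayed(const struct delayed_inode *list, size_t n,
			    unsigned long now, bool stopping)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct delayed_inode *ci = &list[i];

		if (!stopping && !ci->flush && now < ci->hold_max)
			break;
		/* the kernel would list_del_init() ci and check its caps here */
	}
	return i;
}
```

With entries whose `hold_max` values are 5, 50 and 60 and `now` equal to 10, a normal pass handles only the first entry. A pass with `stopping` set drains all three, which is the "process the whole list if we're unmounting" behavior described in the commit message.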