[RFC,v5,0/9] xfs: automatic relogging experiment

Message ID 20200227134321.7238-1-bfoster@redhat.com (mailing list archive)

Brian Foster Feb. 27, 2020, 1:43 p.m. UTC
Hi all,

Here's a v5 RFC of the automatic item relogging experiment. Firstly,
note that this is still a POC and experimental code with various quirks.
Some are documented in the code, others might not be (such as abusing
the AIL lock, etc.). The primary purpose of this series is still to
express and review a fundamental design. Based on discussion of the last
version, there is a specific focus on addressing log reservation and
pre-item locking deadlock vectors. While the code is still quite hacky,
I believe this design addresses both of those fundamental issues.
Further details on the design and approach are documented in the
individual commit logs.

In addition, the final few patches introduce buffer relogging capability
and test infrastructure, which currently has no use case other than to
demonstrate development flexibility and the ability to support arbitrary
log items in the future, if ever desired. If this approach is taken
forward, the current use cases are still centered around intent items
such as the quotaoff use case and extent freeing use case defined by
online repair of free space trees.

On somewhat of a tangent, another intent-oriented use case idea crossed
my mind recently related to the long-standing writeback stale data
exposure problem (i.e. if we crash after a delalloc extent is converted
but before writeback fully completes on the extent). The obvious
approach of using unwritten extents has been rebuffed due to performance
concerns over extent conversion. I wonder whether, if we had the ability
to log a "writeback pending" intent at some reasonable level of
granularity (i.e. something between a block and an extent), we could use
that to allow log recovery to zero (or convert) such extents in the
event of a crash. This is a whole separate design discussion, however,
as it involves tracking outstanding writeback, etc. In this context it
simply serves as a prospective use case for relogging, as such intents
would otherwise risk log subsystem deadlocks similar to those of the
quotaoff use case.
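
As a rough illustration of that deadlock class, here is a toy userspace
model. It is not code from this series: the names (toy_log, toy_log_relog,
etc.), the 8-block log and the single reserved block are all made up for
the example. A long-lived intent pins the tail of a circular log; once the
head wraps around to it, nothing can be written, including the "done" item
that would release the intent. Relogging rewrites the intent at the head
using space reserved up front, which moves the tail forward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOG_SIZE 8u	/* hypothetical log size, in blocks */

struct toy_log {
	uint64_t head;		/* next LSN to be written */
	uint64_t intent_lsn;	/* LSN of the outstanding intent */
	bool intent_active;	/* an intent is pinning the tail */
	bool relog_resv;	/* one block held back for relogging */
};

/* Oldest LSN that must be kept: the intent, or nothing at all. */
uint64_t toy_log_tail(const struct toy_log *log)
{
	return log->intent_active ? log->intent_lsn : log->head;
}

/* Blocks available to ordinary writers. */
uint64_t toy_log_free(const struct toy_log *log)
{
	return LOG_SIZE - (log->head - toy_log_tail(log)) -
	       (log->relog_resv ? 1 : 0);
}

/* Write one ordinary block; fails when the log is full. */
bool toy_log_write(struct toy_log *log)
{
	if (toy_log_free(log) == 0)
		return false;
	log->head++;
	return true;
}

/* Log an intent, optionally reserving a block so it can be relogged. */
void toy_log_intent(struct toy_log *log, bool relog)
{
	assert(!log->intent_active);
	log->intent_lsn = log->head++;
	log->intent_active = true;
	log->relog_resv = relog;
}

/*
 * Rewrite the intent at the head using the reserved block. The tail
 * moves forward, so the reservation can be held again for next time.
 */
bool toy_log_relog(struct toy_log *log)
{
	if (!log->intent_active || !log->relog_resv)
		return false;
	log->intent_lsn = log->head++;
	return true;
}

/* Log the "done" item that releases the intent; needs log space. */
bool toy_log_done(struct toy_log *log)
{
	if (!log->intent_active || !toy_log_write(log))
		return false;
	log->intent_active = false;
	log->relog_resv = false;
	return true;
}
```

Without the reservation, filling the log after toy_log_intent() leaves
toy_log_done() unable to ever succeed. With it, a single toy_log_relog()
call once the log fills is enough for writes and the done item to proceed.
The real mechanism obviously has to deal with locking, the AIL and
transaction reservations, none of which is modeled here.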

Thoughts, reviews, flames appreciated.

Brian

rfcv5:
- More fleshed out design to prevent log reservation deadlock and
  locking problems.
- Split out core patches between pre-reservation management, relog item
  state management and relog mechanism.
- Added experimental buffer relogging capability.
rfcv4: https://lore.kernel.org/linux-xfs/20191205175037.52529-1-bfoster@redhat.com/
- AIL based approach.
rfcv3: https://lore.kernel.org/linux-xfs/20191125185523.47556-1-bfoster@redhat.com/
- CIL based approach.
rfcv2: https://lore.kernel.org/linux-xfs/20191122181927.32870-1-bfoster@redhat.com/
- Different approach based on workqueue and transaction rolling.
rfc: https://lore.kernel.org/linux-xfs/20191024172850.7698-1-bfoster@redhat.com/

Brian Foster (9):
  xfs: set t_task at wait time instead of alloc time
  xfs: introduce ->tr_relog transaction
  xfs: automatic relogging reservation management
  xfs: automatic relogging item management
  xfs: automatic log item relog mechanism
  xfs: automatically relog the quotaoff start intent
  xfs: buffer relogging support prototype
  xfs: create an error tag for random relog reservation
  xfs: relog random buffers based on errortag

 fs/xfs/libxfs/xfs_errortag.h   |   4 +-
 fs/xfs/libxfs/xfs_shared.h     |   1 +
 fs/xfs/libxfs/xfs_trans_resv.c |  24 +++-
 fs/xfs/libxfs/xfs_trans_resv.h |   1 +
 fs/xfs/xfs_buf_item.c          |   5 +
 fs/xfs/xfs_dquot_item.c        |   7 ++
 fs/xfs/xfs_error.c             |   3 +
 fs/xfs/xfs_log.c               |   2 +-
 fs/xfs/xfs_qm_syscalls.c       |  12 +-
 fs/xfs/xfs_trace.h             |   3 +
 fs/xfs/xfs_trans.c             |  79 +++++++++++-
 fs/xfs/xfs_trans.h             |  13 +-
 fs/xfs/xfs_trans_ail.c         | 216 ++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_trans_buf.c         |  35 ++++++
 fs/xfs/xfs_trans_priv.h        |   6 +
 15 files changed, 399 insertions(+), 12 deletions(-)

Darrick J. Wong Feb. 27, 2020, 3:09 p.m. UTC | #1
On Thu, Feb 27, 2020 at 08:43:12AM -0500, Brian Foster wrote:
> Hi all,
> 
> Here's a v5 RFC of the automatic item relogging experiment. Firstly,
> note that this is still a POC and experimental code with various quirks.

Heh, funny, I was going to ask you if you might have time next week to
review the latest iteration of the btree bulk loading series so that I
could get closer to merging the rest of online repair and/or refactoring
offline repair.  I'll take a closer look at this after I read through
everything else that came in overnight.

--D

Brian Foster Feb. 27, 2020, 3:18 p.m. UTC | #2
On Thu, Feb 27, 2020 at 07:09:36AM -0800, Darrick J. Wong wrote:
> On Thu, Feb 27, 2020 at 08:43:12AM -0500, Brian Foster wrote:
> > Hi all,
> > 
> > Here's a v5 RFC of the automatic item relogging experiment. Firstly,
> > note that this is still a POC and experimental code with various quirks.
> 
> Heh, funny, I was going to ask you if you might have time next week to
> review the latest iteration of the btree bulk loading series so that I
> could get closer to merging the rest of online repair and/or refactoring
> offline repair.  I'll take a closer look at this after I read through
> everything else that came in overnight.
> 

Sure.. I can put that next on the list. Is the latest release pending a
post or already posted? Being out for over a month (effectively closer
to two when considering proximity to the holidays) caused me to pretty
much clear everything in my mailbox for obvious reasons. ;) As a result,
anything that might have been on my radar prior to that timeframe has
most likely dropped completely off it. :P

Brian

Darrick J. Wong Feb. 27, 2020, 3:22 p.m. UTC | #3
On Thu, Feb 27, 2020 at 10:18:14AM -0500, Brian Foster wrote:
> On Thu, Feb 27, 2020 at 07:09:36AM -0800, Darrick J. Wong wrote:
> > On Thu, Feb 27, 2020 at 08:43:12AM -0500, Brian Foster wrote:
> > > Hi all,
> > > 
> > > Here's a v5 RFC of the automatic item relogging experiment. Firstly,
> > > note that this is still a POC and experimental code with various quirks.
> > 
> > Heh, funny, I was going to ask you if you might have time next week to
> > review the latest iteration of the btree bulk loading series so that I
> > could get closer to merging the rest of online repair and/or refactoring
> > offline repair.  I'll take a closer look at this after I read through
> > everything else that came in overnight.
> > 
> 
> Sure.. I can put that next on the list. Is the latest release pending a
> post or already posted? Being out for over a month (effectively closer
> to two when considering proximity to the holidays) caused me to pretty
> much clear everything in my mailbox for obvious reasons. ;) As a result,
> anything that might have been on my radar prior to that timeframe has
> most likely dropped completely off it. :P

Pending.  The patches themselves haven't changed much since the end of
October when I fixed all the things we talked about at the beginning of
that month, but you might as well wait for a new version rebased off
5.6. :)

(If you get really really bored and/or I get bogged down in something
else, the NYE patchbomb version is pretty close to what's in my tree
now...)

--D
