Message ID | 20240110071347.3711925-1-wenjian1@xiaomi.com (mailing list archive)
---|---
State | Deferred, archived
Series | xfs: explicitly call cond_resched in xfs_itruncate_extents_flags
On Wed, Jan 10, 2024 at 03:13:47PM +0800, Jian Wen wrote:
> From: Jian Wen <wenjianhn@gmail.com>
>
> Deleting a file with lots of extents may cause a soft lockup if the
> preemption model is none (CONFIG_PREEMPT_NONE=y or preempt=none is set
> on the kernel cmdline). The Alibaba Cloud kernel and the Oracle UEK
> container kernel are affected by the issue, since they select
> CONFIG_PREEMPT_NONE=y.
>
> Explicitly call cond_resched in xfs_itruncate_extents_flags to avoid
> the softlockup warning below.
>
> watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:13:139]

Wowwee, how many extents does this file have, anyway?

> CPU: 0 PID: 139 Comm: kworker/0:13 Not tainted 6.7.0-rc8-g610a9b8f49fb #23
> Workqueue: xfs-inodegc/vda1 xfs_inodegc_worker
> Call Trace:
>  _raw_spin_lock+0x30/0x80
>  ? xfs_extent_busy_trim+0x38/0x200
>  xfs_extent_busy_trim+0x38/0x200
>  xfs_alloc_compute_aligned+0x38/0xd0
>  xfs_alloc_ag_vextent_size+0x1f1/0x870
>  xfs_alloc_fix_freelist+0x58a/0x770
>  xfs_free_extent_fix_freelist+0x60/0xa0
>  __xfs_free_extent+0x66/0x1d0
>  xfs_trans_free_extent+0xac/0x290
>  xfs_extent_free_finish_item+0xf/0x40
>  xfs_defer_finish_noroll+0x1db/0x7f0
>  xfs_defer_finish+0x10/0xa0
>  xfs_itruncate_extents_flags+0x169/0x4c0
>  xfs_inactive_truncate+0xba/0x140
>  xfs_inactive+0x239/0x2a0
>  xfs_inodegc_worker+0xa3/0x210
>  ? process_scheduled_works+0x273/0x550
>  process_scheduled_works+0x2da/0x550
>  worker_thread+0x180/0x350
>
> Most Linux distributions default to voluntary preemption, where
> might_sleep() yields the CPU if needed, so they are not affected.
>
> kworker/0:24+xf 298 [000] 7294.810021: probe:__cond_resched:
>  __cond_resched+0x5 ([kernel.kallsyms])
>  __kmem_cache_alloc_node+0x17c ([kernel.kallsyms])
>  __kmalloc+0x4d ([kernel.kallsyms])
>  kmem_alloc+0x7a ([kernel.kallsyms])
>  xfs_extent_busy_insert_list+0x2e ([kernel.kallsyms])
>  __xfs_free_extent+0xd3 ([kernel.kallsyms])
>  xfs_trans_free_extent+0x93 ([kernel.kallsyms])
>  xfs_extent_free_finish_item+0xf ([kernel.kallsyms])
>
> kworker/0:24+xf 298 [000] 7294.810045: probe:__cond_resched:
>  __cond_resched+0x5 ([kernel.kallsyms])
>  down+0x11 ([kernel.kallsyms])
>  xfs_buf_lock+0x2c ([kernel.kallsyms])
>  xfs_buf_find_lock+0x62 ([kernel.kallsyms])
>  xfs_buf_get_map+0x1fd ([kernel.kallsyms])
>  xfs_buf_read_map+0x51 ([kernel.kallsyms])
>  xfs_trans_read_buf_map+0x1c5 ([kernel.kallsyms])
>  xfs_btree_read_buf_block.constprop.0+0xa1 ([kernel.kallsyms])
>  xfs_btree_lookup_get_block+0x97 ([kernel.kallsyms])
>  xfs_btree_lookup+0xc5 ([kernel.kallsyms])
>  xfs_alloc_lookup_eq+0x18 ([kernel.kallsyms])
>  xfs_free_ag_extent+0x63f ([kernel.kallsyms])
>  __xfs_free_extent+0xa7 ([kernel.kallsyms])
>  xfs_trans_free_extent+0x93 ([kernel.kallsyms])
>  xfs_extent_free_finish_item+0xf ([kernel.kallsyms])
>
> Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
> ---
>  fs/xfs/xfs_inode.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index c0f1c89786c2..194381e10472 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -4,6 +4,7 @@
>   * All Rights Reserved.
>   */
>  #include <linux/iversion.h>
> +#include <linux/sched.h>
>
>  #include "xfs.h"
>  #include "xfs_fs.h"
> @@ -1383,6 +1384,8 @@ xfs_itruncate_extents_flags(
>  		error = xfs_defer_finish(&tp);
>  		if (error)
>  			goto out;
> +
> +		cond_resched();

Is there a particular reason why truncation uses a giant chained
transaction for all extents (with xfs_defer_finish) instead of, say,
one chain per extent?

(If you've actually studied this and know why then I guess cond_resched
is a reasonable bandaid, but I don't want to play cond_resched
whackamole here.)

--D

>  	}
>
>  	if (whichfork == XFS_DATA_FORK) {
> --
> 2.25.1
On Wed, Jan 10, 2024 at 03:13:47PM +0800, Jian Wen wrote:
> From: Jian Wen <wenjianhn@gmail.com>
>
> Deleting a file with lots of extents may cause a soft lockup if the
> preemption model is none (CONFIG_PREEMPT_NONE=y or preempt=none is set
> on the kernel cmdline). The Alibaba Cloud kernel and the Oracle UEK
> container kernel are affected by the issue, since they select
> CONFIG_PREEMPT_NONE=y.

Time for them to move to CONFIG_PREEMPT_DYNAMIC?

Also, there has been recent action towards removing
CONFIG_PREEMPT_NONE/VOLUNTARY and cond_resched() altogether, because
the lazy preemption model coming in the PREEMPT_RT patchset solves
the performance issues with full preemption that PREEMPT_NONE works
around...

https://lwn.net/Articles/944686/
https://lwn.net/Articles/945422/

Further, Thomas Gleixner has stated in those discussions that:

	"Though definitely I'm putting a permanent NAK in place for
	any attempts to duct tape the preempt=NONE model any
	further by sprinkling more cond*() and whatever warts
	around."

https://lwn.net/ml/linux-kernel/87jzshhexi.ffs@tglx/

> Explicitly call cond_resched in xfs_itruncate_extents_flags to avoid
> the below softlockup warning.

IOWs, this is no longer considered an acceptable solution by core
kernel maintainers.

Regardless of these policy issues, the code change:

> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index c0f1c89786c2..194381e10472 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -4,6 +4,7 @@
>   * All Rights Reserved.
>   */
>  #include <linux/iversion.h>
> +#include <linux/sched.h>

Global includes like this go in fs/xfs/xfs_linux.h, but I don't think
that's even necessary, because we have cond_resched() calls elsewhere
in XFS with the same include list as xfs_inode.c...

>  #include "xfs.h"
>  #include "xfs_fs.h"
> @@ -1383,6 +1384,8 @@ xfs_itruncate_extents_flags(
>  		error = xfs_defer_finish(&tp);
>  		if (error)
>  			goto out;
> +
> +		cond_resched();
>  	}

Shouldn't this go in xfs_defer_finish() so that we capture all the
cases where we loop indefinitely over a range, continually rolling a
permanent transaction via xfs_defer_finish()?

-Dave.
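[Dave's suggestion, moving the preemption point into xfs_defer_finish() so that every rolling-transaction loop gets it, would amount to something like the sketch below. This is a hypothetical illustration, not a posted patch: only the call from xfs_defer_finish() into xfs_defer_finish_noroll() is taken from the stack trace earlier in the thread, and the rest of the function body shown is assumed.]

```diff
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ xfs_defer_finish(
 	struct xfs_trans	**tp)
 {
 	int			error;
 
+	/*
+	 * Callers invoke this repeatedly while walking long extent
+	 * ranges; yielding here gives every such rolling-transaction
+	 * loop a scheduling point without sprinkling cond_resched()
+	 * at each call site.
+	 */
+	cond_resched();
+
 	error = xfs_defer_finish_noroll(tp);
 	if (error)
 		return error;
```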
On Thu, Jan 11, 2024 at 1:45 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Wed, Jan 10, 2024 at 03:13:47PM +0800, Jian Wen wrote:
> > From: Jian Wen <wenjianhn@gmail.com>
> >
> > Deleting a file with lots of extents may cause a soft lockup if the
> > preemption model is none (CONFIG_PREEMPT_NONE=y or preempt=none is set
> > on the kernel cmdline). The Alibaba Cloud kernel and the Oracle UEK
> > container kernel are affected by the issue, since they select
> > CONFIG_PREEMPT_NONE=y.
> >
> > Explicitly call cond_resched in xfs_itruncate_extents_flags to avoid
> > the below softlockup warning.
> > watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:13:139]
>
> Wowwee, how many extents does this file have, anyway?

I deleted a file with 346915 extents in my test environment. I should
have mentioned that lockdep was enabled: CONFIG_LOCKDEP=y. Without
CONFIG_LOCKDEP the scheduling latency was several seconds. P99 latency
of one of our services increased from a few ms to 280 ms because of
the issue.

Below are the steps to make a file badly fragmented:

root@xfsunlink:~/vda# uname -r
6.7.0-rc8-g610a9b8f49fb
root@xfsunlink:~/vda# img=test.raw ; qemu-img create -f raw -o extent_size_hint=0 ${img} 10G && qemu-img bench -f raw -t none -n -w ${img} -c 1000000 -S 8192 -o 0 && filefrag ${img}
Formatting 'test.raw', fmt=raw size=10737418240 extent_size_hint=0
Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192)
Run completed in 42.506 seconds.
test.raw: 346915 extents found

> > CPU: 0 PID: 139 Comm: kworker/0:13 Not tainted 6.7.0-rc8-g610a9b8f49fb #23
> > Workqueue: xfs-inodegc/vda1 xfs_inodegc_worker
> > Call Trace:
> > [...]
> >
> > Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
> > ---
> >  fs/xfs/xfs_inode.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > [...]
> > @@ -1383,6 +1384,8 @@ xfs_itruncate_extents_flags(
> >  		error = xfs_defer_finish(&tp);
> >  		if (error)
> >  			goto out;
> > +
> > +		cond_resched();
>
> Is there a particular reason why truncation uses a giant chained
> transaction for all extents (with xfs_defer_finish) instead of, say,
> one chain per extent?

I have no clue yet.

> (If you've actually studied this and know why then I guess cond_resched
> is a reasonable bandaid, but I don't want to play cond_resched
> whackamole here.)
>
> --D
>
> >  	}
> >
> >  	if (whichfork == XFS_DATA_FORK) {
> > --
> > 2.25.1
On Thu, Jan 11, 2024 at 5:38 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Jan 10, 2024 at 03:13:47PM +0800, Jian Wen wrote:
> > From: Jian Wen <wenjianhn@gmail.com>
> >
> > Deleting a file with lots of extents may cause a soft lockup if the
> > preemption model is none (CONFIG_PREEMPT_NONE=y or preempt=none is set
> > on the kernel cmdline). The Alibaba Cloud kernel and the Oracle UEK
> > container kernel are affected by the issue, since they select
> > CONFIG_PREEMPT_NONE=y.
>
> Time for them to move to CONFIG_PREEMPT_DYNAMIC?

I had asked one of them to support CONFIG_PREEMPT_DYNAMIC before
sending the patch.

> Also, there has been recent action towards removing
> CONFIG_PREEMPT_NONE/VOLUNTARY and cond_resched() altogether, because
> the lazy preemption model coming in the PREEMPT_RT patchset solves
> the performance issues with full preemption that PREEMPT_NONE works
> around...
>
> https://lwn.net/Articles/944686/
> https://lwn.net/Articles/945422/
>
> Further, Thomas Gleixner has stated in those discussions that:
>
> 	"Though definitely I'm putting a permanent NAK in place for
> 	any attempts to duct tape the preempt=NONE model any
> 	further by sprinkling more cond*() and whatever warts
> 	around."
>
> https://lwn.net/ml/linux-kernel/87jzshhexi.ffs@tglx/
>
> > Explicitly call cond_resched in xfs_itruncate_extents_flags to avoid
> > the below softlockup warning.
>
> IOWs, this is no longer considered an acceptable solution by core
> kernel maintainers.

Understood. I will only build a hotfix for our production kernel then.

> Regardless of these policy issues, the code change:
>
> > [...]
> >  #include <linux/iversion.h>
> > +#include <linux/sched.h>
>
> Global includes like this go in fs/xfs/xfs_linux.h, but I don't
> think that's even necessary, because we have cond_resched() calls
> elsewhere in XFS with the same include list as xfs_inode.c...
>
> >  #include "xfs.h"
> >  #include "xfs_fs.h"
> > @@ -1383,6 +1384,8 @@ xfs_itruncate_extents_flags(
> >  		error = xfs_defer_finish(&tp);
> >  		if (error)
> >  			goto out;
> > +
> > +		cond_resched();
> >  	}
>
> Shouldn't this go in xfs_defer_finish() so that we capture all the
> cases where we loop indefinitely over a range, continually rolling a
> permanent transaction via xfs_defer_finish()?

It seems xfs_collapse_file_space and xfs_insert_file_space also need
to yield the CPU. I don't have use cases for them yet.

> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
On Wed, Jan 10, 2024 at 09:45:01AM -0800, Darrick J. Wong wrote:
> On Wed, Jan 10, 2024 at 03:13:47PM +0800, Jian Wen wrote:
> > From: Jian Wen <wenjianhn@gmail.com>
> >
> > Deleting a file with lots of extents may cause a soft lockup if the
> > preemption model is none (CONFIG_PREEMPT_NONE=y or preempt=none is set
> > on the kernel cmdline). The Alibaba Cloud kernel and the Oracle UEK
> > container kernel are affected by the issue, since they select
> > CONFIG_PREEMPT_NONE=y.
> >
> > Explicitly call cond_resched in xfs_itruncate_extents_flags to avoid
> > the below softlockup warning.
> > watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:13:139]
>
> Wowwee, how many extents does this file have, anyway?

IIRC, extent removal runs at about 50,000 extents/s on a typical
modern CPU on a production kernel build when everything is cached.
23s then equates to a bit above a million extents....

> > [...]
> > @@ -1383,6 +1384,8 @@ xfs_itruncate_extents_flags(
> >  		error = xfs_defer_finish(&tp);
> >  		if (error)
> >  			goto out;
> > +
> > +		cond_resched();
>
> Is there a particular reason why truncation uses a giant chained
> transaction for all extents (with xfs_defer_finish) instead of, say,
> one chain per extent?

If I understand the question correctly: extent removal for a given
range being truncated has to be seen externally as an atomic change,
so the ILOCK has to be held across the entire removal operation. This
means it has to be executed as a single rolling transaction, as
transaction reservation cannot be done with the ILOCK held and the
atomicity of extent removal prevents the ILOCK from being dropped.

This atomicity only really matters for active operations like
truncate, hole punching, etc., where other operations may also be
trying to act on the extent list. However, in inodegc we are
essentially guaranteed that nobody else will need to access the extent
list whilst we are truncating it away, so it could be done as a bunch
of fine-grained transactions.

However, this doesn't solve the problem of the CPU never being yielded
if everything is cached and no locks or transaction reservations are
ever contended. It will still hold the CPU for the same length of
time. Hence I'm not sure changing the way the transactions are run
will really change anything material w.r.t. the issue at hand.

-Dave.
[cc Thomas, lkml]

On Thu, Jan 11, 2024 at 08:52:22PM +0800, Jian Wen wrote:
> On Thu, Jan 11, 2024 at 5:38 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Wed, Jan 10, 2024 at 03:13:47PM +0800, Jian Wen wrote:
> > > From: Jian Wen <wenjianhn@gmail.com>
> > >
> > > Deleting a file with lots of extents may cause a soft lockup if the
> > > preemption model is none (CONFIG_PREEMPT_NONE=y or preempt=none is set
> > > on the kernel cmdline). The Alibaba Cloud kernel and the Oracle UEK
> > > container kernel are affected by the issue, since they select
> > > CONFIG_PREEMPT_NONE=y.
> >
> > Time for them to move to CONFIG_PREEMPT_DYNAMIC?
> I had asked one of them to support CONFIG_PREEMPT_DYNAMIC before
> sending the patch.

OK.

> > Also, there has been recent action towards removing
> > CONFIG_PREEMPT_NONE/VOLUNTARY and cond_resched() altogether, because
> > the lazy preemption model coming in the PREEMPT_RT patchset solves
> > the performance issues with full preemption that PREEMPT_NONE works
> > around...
> >
> > https://lwn.net/Articles/944686/
> > https://lwn.net/Articles/945422/
> >
> > Further, Thomas Gleixner has stated in those discussions that:
> >
> > 	"Though definitely I'm putting a permanent NAK in place for
> > 	any attempts to duct tape the preempt=NONE model any
> > 	further by sprinkling more cond*() and whatever warts
> > 	around."
> >
> > https://lwn.net/ml/linux-kernel/87jzshhexi.ffs@tglx/
> >
> > > Explicitly call cond_resched in xfs_itruncate_extents_flags to avoid
> > > the below softlockup warning.
> >
> > IOWs, this is no longer considered an acceptable solution by core
> > kernel maintainers.
> Understood. I will only build a hotfix for our production kernel then.

Yeah, that may be your best short term fix. We'll need to clarify what
the current policy is on adding cond_resched points before we go any
further in this direction.

Thomas, any update on what is happening with cond_resched() - is there
an ETA on it going away/being unnecessary?

> > Regardless of these policy issues, the code change:
> >
> > > [...]
> > >  #include <linux/iversion.h>
> > > +#include <linux/sched.h>
> >
> > Global includes like this go in fs/xfs/xfs_linux.h, but I don't
> > think that's even necessary, because we have cond_resched() calls
> > elsewhere in XFS with the same include list as xfs_inode.c...
> >
> > > [...]
> > > @@ -1383,6 +1384,8 @@ xfs_itruncate_extents_flags(
> > >  		error = xfs_defer_finish(&tp);
> > >  		if (error)
> > >  			goto out;
> > > +
> > > +		cond_resched();
> > >  	}
> >
> > Shouldn't this go in xfs_defer_finish() so that we capture all the
> > cases where we loop indefinitely over a range, continually rolling a
> > permanent transaction via xfs_defer_finish()?
> It seems xfs_collapse_file_space and xfs_insert_file_space also need
> to yield the CPU.
> I don't have use cases for them yet.

Yup, they do, but they also call xfs_defer_finish(), so having the
cond_resched() in that function will capture them as well.

Also, the current upstream tree has moved this code from
xfs_itruncate_extents_flags() to xfs_bunmapi_range(), so the
cond_resched() has to be moved anyway. We may as well put it in
xfs_defer_finish() if we end up doing this.

Cheers,
Dave.
On Fri, Jan 12 2024 at 07:27, Dave Chinner wrote:

Cc: Ankur

> On Thu, Jan 11, 2024 at 08:52:22PM +0800, Jian Wen wrote:
> > On Thu, Jan 11, 2024 at 5:38 AM Dave Chinner <david@fromorbit.com> wrote:
> > > IOWs, this is no longer considered an acceptable solution by core
> > > kernel maintainers.
> > Understood. I will only build a hotfix for our production kernel then.
>
> Yeah, that may be your best short term fix. We'll need to clarify
> what the current policy is on adding cond_resched points before we
> go any further in this direction.

Well, right now, until the scheduler situation is sorted, there is no
other solution than to add the cond_resched() muck.

> Thomas, any update on what is happening with cond_resched() - is
> there an ETA on it going away/being unnecessary?

Ankur is working on that...
[ Missed this email until now. ]

Thomas Gleixner writes:
> On Fri, Jan 12 2024 at 07:27, Dave Chinner wrote:
>
> Cc: Ankur
>
> > On Thu, Jan 11, 2024 at 08:52:22PM +0800, Jian Wen wrote:
> > > On Thu, Jan 11, 2024 at 5:38 AM Dave Chinner <david@fromorbit.com> wrote:
> > > > IOWs, this is no longer considered an acceptable solution by core
> > > > kernel maintainers.
> > > Understood. I will only build a hotfix for our production kernel then.
> >
> > Yeah, that may be your best short term fix. We'll need to clarify
> > what the current policy is on adding cond_resched points before we
> > go any further in this direction.
>
> Well, right now, until the scheduler situation is sorted, there is no
> other solution than to add the cond_resched() muck.
>
> > Thomas, any update on what is happening with cond_resched() - is
> > there an ETA on it going away/being unnecessary?
>
> Ankur is working on that...

Yeah, running through a final round of tests before sending out the
series.

Dave, on the status of cond_resched(): the work on this adds a new
scheduling model (as Thomas implemented in his PoC) under which
cond_resched() would basically be a stub. However, given that other
preemption models continue to use cond_resched(), we would need to
live with cond_resched() for a while -- at least while this model
proves that it works well enough under a wide enough variety of loads.

Thanks
Ankur
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index c0f1c89786c2..194381e10472 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -4,6 +4,7 @@
  * All Rights Reserved.
  */
 #include <linux/iversion.h>
+#include <linux/sched.h>
 
 #include "xfs.h"
 #include "xfs_fs.h"
@@ -1383,6 +1384,8 @@ xfs_itruncate_extents_flags(
 		error = xfs_defer_finish(&tp);
 		if (error)
 			goto out;
+
+		cond_resched();
 	}
 
 	if (whichfork == XFS_DATA_FORK) {