
Is the checkpoint interval adjustable?

Message ID 20130731221008.156770@gmx.com (mailing list archive)
State New, archived

Commit Message

Mike Audia July 31, 2013, 10:10 p.m. UTC
> On Wed, Jul 31, 2013 at 04:02:29PM -0400, Mike Audia wrote:
> > I believe 30 sec is the default for the checkpoint interval.  Is this adjustable?
>
> It doesn't look like it. It looks like it's implemented with raw '30's
> in the code.
>
>  delay = HZ * 30;
> ...
>  (now < cur->start_time || now - cur->start_time < 30)) {
>
> If you want more frequent forced commits you could always syncfs()
> regularly from userspace, I suppose.
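
For reference, the syncfs() suggestion quoted above could look roughly like the sketch below. It only helps when commits should happen more often than the built-in interval. The helper names sync_filesystem_of() and periodic_sync() are made up for illustration, and the path, period, and round count are whatever the caller chooses.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* syncfs() is a Linux-specific glibc extension */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush the filesystem that contains `path`.
 * Returns 0 on success, -1 if the path cannot be opened or syncfs() fails.
 */
static int sync_filesystem_of(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = syncfs(fd);        /* flushes only the filesystem behind fd */
    close(fd);
    return rc;
}

/*
 * Hypothetical helper: call sync_filesystem_of() `rounds` times, sleeping
 * `period_secs` between calls. Returns the number of successful flushes.
 */
static int periodic_sync(const char *path, unsigned period_secs, int rounds)
{
    int ok = 0;

    for (int i = 0; i < rounds; i++) {
        if (sync_filesystem_of(path) == 0)
            ok++;
        if (i + 1 < rounds)
            sleep(period_secs);
    }
    return ok;
}
```

Something like periodic_sync("/mnt/data", 10, 6) would flush that (assumed) mount point six times, ten seconds apart.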

Thank you kindly for the prompt reply.  My goal is to make them _less_ frequent.  I am NO programmer by any stretch.  Let's say I want them to be once every 5 min (300 sec).  Is the attached patch sane to achieve this?  Are there any unforeseen effects of doing this?  Thank you for the consideration.

Comments

Zach Brown July 31, 2013, 10:56 p.m. UTC | #1
> Thank you kindly for the prompt reply.  My goal is to make them _less_
> frequent.

I assumed as much.  I should have added some sympathy smileys :).

>  I am NO programmer by any stretch.  Let's say I want them to be once
> every 5 min (300 sec).  Is the attached patch sane to achieve this?

I think it's a reasonable patch to try, yeah.

>  Are there any unforeseen effects of doing this?  Thank you for
> the consideration.

I don't *think* that there should be.  One way of looking at it is that
both 30 and 300 seconds are an *eternity* for cpu, memory, and storage.
Any trouble that you could get in to in 300 seconds some other machine
could trivially get in to in 30 with beefier hardware.

But I reserve the right to be wrong.

- z
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Duncan Aug. 1, 2013, 3:11 a.m. UTC | #2
Zach Brown posted on Wed, 31 Jul 2013 15:56:40 -0700 as excerpted:

[Mike Audia wrote...]

>>  I am NO programmer by any stretch.  Let's say I want them to be once
>> every 5 min (300 sec).  Is the attached patch sane to achieve this?

>>  Are there any unforeseen effects of doing this?

> I don't *think* that there should be.  One way of looking at it is that
> both 30 and 300 seconds are an *eternity* for cpu, memory, and storage.
> Any trouble that you could get in to in 300 seconds some other machine
> could trivially get in to in 30 with beefier hardware.

As a sysadmin (not a programmer) that has messed around with, for 
example, vm.dirty_bytes/ratio, vm.dirty_writeback_centisecs, etc, the 
concern I'd have is that longer commit periods and larger commit buffers 
increase the possibility of writeback storms.  While I've not tweaked 
btrfs and I probably need to reexamine my current settings since I've 
switched to SSD and btrfs, for spinning rust and reiserfs, I ended up 
tweaking vm.dirty_* here.

The files are /proc/sys/vm/* and the kernel documentation for them is in 
Documentation/sysctl/vm.txt.  Most distros have an initscript that writes 
any custom values at boot, using values set in /etc/sysctl.conf and/or
/etc/sysctl.d/*, so that's where you'd normally set them once you've 
settled on values that work for you.

The following are the defaults and what I settled on for a wall-powered 
system.

vm.dirty_ratio defaults to 10 (percent of RAM). I've read and agree with 
opinions that 10% of RAM when RAM is say half a gig (so 10% is ~50 MB) 
isn't too bad on spinning rust, but it can be MUCH worse when RAM is say 
my current 16 gig (so 10% is ~1.6 gig), as that's several seconds of 
writeback on spinning rust.  I reset that to 3% (~half a gig), here.

vm.dirty_background_ratio similarly, 5 (% of RAM) by default, reset to 1 
(~160 MB).

(The vm.dirty_(background_)bytes knobs parallel the above "ratio" knobs 
and may be easier to set for those thinking in terms of writeback backlog 
size and corresponding system responsiveness or lack thereof during that 
writeback, instead of percentage of memory dirty.  Set one set or the 
other.)
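
To make the ratio-versus-bytes arithmetic above concrete, here is a rough sketch.  It assumes the limit is a plain percentage of the memory size passed in (the kernel applies the ratio to the memory it considers available, so real numbers will be somewhat lower), and the 100 MB/s drain rate mentioned below is only an assumed figure for spinning rust.  The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: size in bytes of `ratio_percent` percent of
 * `mem_bytes`. Simplification: the kernel uses available memory, not the
 * raw RAM size. The multiplication is safe for any realistic RAM size.
 */
static uint64_t dirty_limit_bytes(uint64_t mem_bytes, unsigned ratio_percent)
{
    assert(ratio_percent <= 100);
    return mem_bytes * ratio_percent / 100;
}

/*
 * Hypothetical helper: seconds needed to write back `bytes` at a sustained
 * `bytes_per_sec`. The throughput is an assumption supplied by the caller.
 */
static double seconds_to_drain(uint64_t bytes, double bytes_per_sec)
{
    assert(bytes_per_sec > 0.0);
    return (double)bytes / bytes_per_sec;
}
```

With 16 GiB and a ratio of 10 this gives about 1.7e9 bytes, which at an assumed 100 MB/s is roughly 17 seconds of writeback; the same value could be written to vm.dirty_bytes instead of setting vm.dirty_ratio.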

OTOH, vm.dirty_expire_centisecs defaults to 2999 (30 seconds, this is the 
high priority foreground value and might well be the reason btrfs is 
coded for a 30 second commit time as well) and 
vm.dirty_writeback_centisecs defaults to 499 (5 seconds, this is the 
lower priority background value).  I left expire where it was, but 
decided with the stricter ratio settings, writeback could be 10 seconds, 
doubling the background writeback time.
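
A small sketch of how the centisecond knobs above can be read and converted to seconds.  The helper names are made up for illustration; the files themselves are the /proc/sys/vm/ entries already mentioned.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a single integer from a file such as
 * /proc/sys/vm/dirty_expire_centisecs. Returns 0 on success, -1 on error.
 */
static int read_long_from_file(const char *path, long *out)
{
    assert(path != NULL && out != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int fields = fscanf(f, "%ld", out);
    fclose(f);
    return fields == 1 ? 0 : -1;
}

/* The vm.dirty_*_centisecs knobs are expressed in hundredths of a second. */
static double centisecs_to_seconds(long centisecs)
{
    return centisecs / 100.0;
}
```

Reading /proc/sys/vm/dirty_writeback_centisecs this way and passing the result to centisecs_to_seconds() gives the background writeback period in seconds.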


Before tuning btrfs' hardcoded defaults, I'd suggest tuning these values 
if you haven't already done so, and keeping them in mind if you do decide 
to tune btrfs as well.

For battery powered systems, also take a look at laptop mode (and
laptop-mode-tools), which I use here on my laptop (which I don't have at
hand to check what I set for vm.dirty_* on it).
David Sterba Aug. 1, 2013, 3:40 p.m. UTC | #3
On Wed, Jul 31, 2013 at 03:56:40PM -0700, Zach Brown wrote:
> >  I am NO programmer by any stretch.  Let's say I want them to be once
> > every 5 min (300 sec).  Is the attached patch sane to achieve this?
> 
> I think it's a reasonable patch to try, yeah.

There have been a few requests to tune the interval. This one finally made
me finish the patch, and I will send it in a second.

> >  Are there any unforeseen effects of doing this?  Thank you for
> > the consideration.
> 
> I don't *think* that there should be.  One way of looking at it is that
> both 30 and 300 seconds are an *eternity* for cpu, memory, and storage.
> Any trouble that you could get in to in 300 seconds some other machine
> could trivially get in to in 30 with beefier hardware.

That's a good point and lowers my worries a bit, though it would be
interesting to see in what way a beefy machine blows up with 300 seconds
set.

david
Zach Brown Aug. 1, 2013, 5:59 p.m. UTC | #4
> There have been a few requests to tune the interval. This one finally made
> me finish the patch, and I will send it in a second.

Great, thanks.

> That's a good point and lowers my worries a bit, though it would be
> interesting to see in what way a beefy machine blows up with 300 seconds
> set.

Agreed.  Ideally the transaction machinery decides at some point that a
transaction is sufficiently huge that it'll saturate the storage
pipeline and kicks it off.
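
As a purely illustrative sketch of that idea (not btrfs code), a policy could commit when either the transaction is old enough or it has grown past some size.  The struct, its field names, and the thresholds are all assumptions made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy: both limits are illustrative, not btrfs tunables. */
struct commit_policy {
    unsigned long max_age_secs;     /* commit once the transaction is this old */
    uint64_t      max_dirty_bytes;  /* ...or once it holds this much dirty data */
};

/*
 * Returns true when a transaction of the given age and size should be
 * committed now, i.e. when either limit has been reached.
 */
static bool should_commit(const struct commit_policy *policy,
                          unsigned long age_secs, uint64_t dirty_bytes)
{
    assert(policy != NULL);
    return age_secs >= policy->max_age_secs ||
           dirty_bytes >= policy->max_dirty_bytes;
}
```

With max_age_secs set to 300 and a size limit matched to what the storage can absorb, a large transaction would be kicked off early instead of waiting out the full interval.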

- z

Patch

--- a/fs/btrfs/disk-io.c	2013-07-31 18:05:22.581062955 -0400
+++ b/fs/btrfs/disk-io.c	2013-07-31 18:06:15.243201652 -0400
@@ -1713,7 +1713,7 @@ 
 
 	do {
 		cannot_commit = false;
-		delay = HZ * 30;
+		delay = HZ * 300;
 		mutex_lock(&root->fs_info->transaction_kthread_mutex);
 
 		spin_lock(&root->fs_info->trans_lock);
@@ -1725,7 +1725,7 @@ 
 
 		now = get_seconds();
 		if (!cur->blocked &&
-		    (now < cur->start_time || now - cur->start_time < 30)) {
+		    (now < cur->start_time || now - cur->start_time < 300)) {
 			spin_unlock(&root->fs_info->trans_lock);
 			delay = HZ * 5;
 			goto sleep;
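
For readers following along, here is a userspace model of the condition changed by the second hunk, with the interval passed in instead of hard-coded.  This is a sketch of the logic only, not kernel code: COMMIT_INTERVAL_SECS and transaction_too_young() are made-up names, and `blocked` stands in for cur->blocked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical named constant replacing the literal 30 (or 300) in the patch. */
#define COMMIT_INTERVAL_SECS 300UL

/*
 * Returns true when the commit should be skipped for now because the
 * current transaction is not blocked and is younger than `interval_secs`.
 * `now < start_time` covers the clock having stepped backwards, which also
 * keeps the unsigned subtraction from wrapping around.
 */
static bool transaction_too_young(unsigned long now, unsigned long start_time,
                                  bool blocked, unsigned long interval_secs)
{
    assert(interval_secs > 0);

    if (blocked)
        return false;
    return now < start_time || now - start_time < interval_secs;
}
```

Note that the patched code still re-checks every 5 seconds (delay = HZ * 5) while the transaction is too young, so the longer interval changes when a commit is forced, not how often the thread wakes up.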