[00/13] Throttle delayed refs based on time

Message ID 20200313212330.149024-1-josef@toxicpanda.com (mailing list archive)

Josef Bacik March 13, 2020, 9:23 p.m. UTC
Zygo reported a problem on IRC where he was seeing multi-hour latencies on
transaction commits with his test rig.  This turned out to be because his test
rig runs rsync, balance, snapshot create and delete, dedup, an infinite loop of
mkdir/rmdirs, and I'm sure some other horrors I'm forgetting.

When I added the delayed refs reserve, I assumed that the space pressure
created by generating a lot of delayed refs would cause transactions to be
ended when they needed to be, and thus we no longer needed to throttle delayed
refs based on time.

This assumption was wrong, because in Zygo's case he has a multi-terabyte file
system, so overcommit allows him to generate as many delayed refs as he wants.
This meant that he would need to run hundreds of thousands of delayed refs at
commit time.  To make matters worse, we didn't have a way to stop people from
generating more delayed refs, so the transaction commit could be held open
indefinitely by balance and snapshot delete.  This is how we ended up with
transaction commits completing only once every few hours.

The solution to this problem is to bring back the time based delayed ref
flushing, and then add the ability for people to throttle themselves on that.

Balance and truncate already had this ability; it only needed to be added to
snapshot delete.

I've also added back the async delayed ref flushing, and I've added code to help
throttle people when we're generating delayed refs too fast for the system to
keep up with them.
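
To make the idea of time-based throttling concrete, here is a rough standalone
sketch, not the code in this series.  It assumes we keep a moving average of how
long one delayed ref takes to run and estimate how long the pending queue would
take to flush; the struct, the function names, the 3:1 averaging weight, and the
one second and half second thresholds are all illustrative assumptions.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical bookkeeping; these are not the btrfs structures. */
struct ref_stats {
	uint64_t pending_refs;   /* delayed refs queued but not yet run */
	uint64_t avg_runtime_ns; /* moving average cost of running one ref */
};

enum throttle_level {
	THROTTLE_NONE = 0, /* queue is cheap to flush, keep going */
	THROTTLE_SOFT = 1, /* caller should flush some refs itself */
	THROTTLE_HARD = 2, /* caller should stop and let the transaction end */
};

/* Fold one measured batch into the average, weighting history 3:1. */
static void update_avg_runtime(struct ref_stats *s, uint64_t batch_ns,
			       uint64_t batch_refs)
{
	uint64_t per_ref;

	if (batch_refs == 0)
		return;
	per_ref = batch_ns / batch_refs;
	s->avg_runtime_ns = (s->avg_runtime_ns * 3 + per_ref) / 4;
}

/* Decide how hard to throttle from the estimated time to flush the queue. */
static enum throttle_level should_throttle(const struct ref_stats *s)
{
	uint64_t est_ns;

	/* Treat an estimate that would overflow as "far too long". */
	if (s->avg_runtime_ns != 0 &&
	    s->pending_refs > UINT64_MAX / s->avg_runtime_ns)
		return THROTTLE_HARD;

	est_ns = s->pending_refs * s->avg_runtime_ns;
	if (est_ns >= NSEC_PER_SEC)
		return THROTTLE_HARD;
	if (est_ns >= NSEC_PER_SEC / 2)
		return THROTTLE_SOFT;
	return THROTTLE_NONE;
}
```

The point of keying off estimated time rather than reserved space is that it
still trips on a huge file system where overcommit never pushes back.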

This whole patch queue has been running in some form or another on Zygo's awful
test bed, and appears to be performing better than it was before I ripped out
the original code.  Thanks,

Josef