
[RFC,v1,15/18] mm: support write throttling for async buffered writes

Message ID 20220426174335.4004987-16-shr@fb.com (mailing list archive)
State New, archived
Series io-uring/xfs: support async buffered writes

Commit Message

Stefan Roesch April 26, 2022, 5:43 p.m. UTC
This change adds support for async write throttling in the function
balance_dirty_pages(). So far if throttling was required, the code was
waiting synchronously as long as the writes were throttled. This change
introduces asynchronous throttling. Instead of waiting in the function
balance_dirty_pages(), the timeout is set in the task_struct field
bdp_pause. Once the timeout has expired, the writes are no longer
throttled.

- Add a new parameter to the balance_dirty_pages() function
  - This allows the caller to pass in the nowait flag
  - When the nowait flag is specified, the code does not wait in
    balance_dirty_pages(), but instead stores the wait expiration in the
    new task_struct field bdp_pause.

- The function balance_dirty_pages_ratelimited() resets the new values
  in the task_struct, once the timeout has expired

This change is required to support write throttling for the async
buffered writes. While the writes are throttled, io_uring still can make
progress with processing other requests.

Signed-off-by: Stefan Roesch <shr@fb.com>
---
 include/linux/writeback.h |  1 +
 mm/page-writeback.c       | 54 ++++++++++++++++++++++++++++-----------
 2 files changed, 40 insertions(+), 15 deletions(-)

Comments

Jan Kara April 28, 2022, 5:47 p.m. UTC | #1
On Tue 26-04-22 10:43:32, Stefan Roesch wrote:
> This change adds support for async write throttling in the function
> balance_dirty_pages(). So far if throttling was required, the code was
> waiting synchronously as long as the writes were throttled. This change
> introduces asynchronous throttling. Instead of waiting in the function
> balance_dirty_pages(), the timeout is set in the task_struct field
> bdp_pause. Once the timeout has expired, the writes are no longer
> throttled.
> 
> - Add a new parameter to the balance_dirty_pages() function
>   - This allows the caller to pass in the nowait flag
>   - When the nowait flag is specified, the code does not wait in
>     balance_dirty_pages(), but instead stores the wait expiration in the
>     new task_struct field bdp_pause.
> 
> - The function balance_dirty_pages_ratelimited() resets the new values
>   in the task_struct, once the timeout has expired
> 
> This change is required to support write throttling for the async
> buffered writes. While the writes are throttled, io_uring still can make
> progress with processing other requests.
> 
> Signed-off-by: Stefan Roesch <shr@fb.com>

Maybe I miss something but I don't think this will throttle writers enough.
For three reasons:

1) The calculated throttling pauses should accumulate for the task so that
if we compute that say it takes 0.1s to write 100 pages and the task writes
300 pages, the delay adds up to 0.3s properly. Otherwise the task would not
be throttled as long as we expect the writeback to take.

2) We must not allow the amount of dirty pages to exceed the dirty limit.
That can easily lead to page reclaim getting into trouble reclaiming pages
and thus machine stalls, oom kills etc. So if we are coming close to dirty
limit and we cannot sleep, we must just fail the nowait write.

3) Even with above two problems fixed I suspect results will be suboptimal
because balance_dirty_pages() heuristics assume they get called reasonably
often and throttle writes so if amount of dirty pages is coming close to
dirty limit, they think we are overestimating writeback speed and update
throttling parameters accordingly. So if io_uring code does not throttle
writers often enough, I think dirty throttling parameters will be jumping
wildly resulting in poor behavior.

So what I'd probably suggest is that if balance_dirty_pages() is called in
"async" mode, we'd give tasks a pass until dirty_freerun_ceiling(). If
balance_dirty_pages() decides the task needs to wait, we store the pause
and bail all the way up into the place where we can sleep (io_uring code I
assume), sleep there, and then continue doing write.

								Honza
Stefan Roesch April 28, 2022, 8:16 p.m. UTC | #2
On 4/28/22 10:47 AM, Jan Kara wrote:
> On Tue 26-04-22 10:43:32, Stefan Roesch wrote:
>> This change adds support for async write throttling in the function
>> balance_dirty_pages(). So far if throttling was required, the code was
>> waiting synchronously as long as the writes were throttled. This change
>> introduces asynchronous throttling. Instead of waiting in the function
>> balance_dirty_pages(), the timeout is set in the task_struct field
>> bdp_pause. Once the timeout has expired, the writes are no longer
>> throttled.
>>
>> - Add a new parameter to the balance_dirty_pages() function
>>   - This allows the caller to pass in the nowait flag
>>   - When the nowait flag is specified, the code does not wait in
>>     balance_dirty_pages(), but instead stores the wait expiration in the
>>     new task_struct field bdp_pause.
>>
>> - The function balance_dirty_pages_ratelimited() resets the new values
>>   in the task_struct, once the timeout has expired
>>
>> This change is required to support write throttling for the async
>> buffered writes. While the writes are throttled, io_uring still can make
>> progress with processing other requests.
>>
>> Signed-off-by: Stefan Roesch <shr@fb.com>
> 
> Maybe I miss something but I don't think this will throttle writers enough.
> For three reasons:
> 
> 1) The calculated throttling pauses should accumulate for the task so that
> if we compute that say it takes 0.1s to write 100 pages and the task writes
> 300 pages, the delay adds up to 0.3s properly. Otherwise the task would not
> be throttled as long as we expect the writeback to take.
> 
> 2) We must not allow the amount of dirty pages to exceed the dirty limit.
> That can easily lead to page reclaim getting into trouble reclaiming pages
> and thus machine stalls, oom kills etc. So if we are coming close to dirty
> limit and we cannot sleep, we must just fail the nowait write.
> 
> 3) Even with above two problems fixed I suspect results will be suboptimal
> because balance_dirty_pages() heuristics assume they get called reasonably
> often and throttle writes so if amount of dirty pages is coming close to
> dirty limit, they think we are overestimating writeback speed and update
> throttling parameters accordingly. So if io_uring code does not throttle
> writers often enough, I think dirty throttling parameters will be jumping
> wildly resulting in poor behavior.
> 
> So what I'd probably suggest is that if balance_dirty_pages() is called in
> "async" mode, we'd give tasks a pass until dirty_freerun_ceiling(). If
> balance_dirty_pages() decides the task needs to wait, we store the pause
> and bail all the way up into the place where we can sleep (io_uring code I
> assume), sleep there, and then continue doing write.
> 

Jan, thanks for the feedback. Are you suggesting to change the following check
in the function balance_dirty_pages():

                /*
                 * Throttle it only when the background writeback cannot
                 * catch-up. This avoids (excessively) small writeouts
                 * when the wb limits are ramping up in case of !strictlimit.
                 *
                 * In strictlimit case make decision based on the wb counters
                 * and limits. Small writeouts when the wb limits are ramping
                 * up are the price we consciously pay for strictlimit-ing.
                 *
                 * If memcg domain is in effect, @dirty should be under
                 * both global and memcg freerun ceilings.
                 */
                if (dirty <= dirty_freerun_ceiling(thresh, bg_thresh) &&
                    (!mdtc ||
                     m_dirty <= dirty_freerun_ceiling(m_thresh, m_bg_thresh))) {
                        unsigned long intv;
                        unsigned long m_intv;

to include if we are in async mode?


There is no direct way to return that the process should sleep. Instead, two new
fields are introduced in the task_struct. These two fields are then used in
io_uring to determine whether the writes for a task need to be throttled.

In case the writes need to be throttled, the writes are not issued, but instead
inserted on a wait queue. We cannot sleep in the general io_uring code path as
we still want to process other requests which are affected by the throttling.


> 								Honza
>
Jan Kara May 10, 2022, 9:50 a.m. UTC | #3
Sorry for delayed reply. This has fallen through the cracks...

On Thu 28-04-22 13:16:19, Stefan Roesch wrote:
> On 4/28/22 10:47 AM, Jan Kara wrote:
> > On Tue 26-04-22 10:43:32, Stefan Roesch wrote:
> >> This change adds support for async write throttling in the function
> >> balance_dirty_pages(). So far if throttling was required, the code was
> >> waiting synchronously as long as the writes were throttled. This change
> >> introduces asynchronous throttling. Instead of waiting in the function
> >> balance_dirty_pages(), the timeout is set in the task_struct field
> >> bdp_pause. Once the timeout has expired, the writes are no longer
> >> throttled.
> >>
> >> - Add a new parameter to the balance_dirty_pages() function
> >>   - This allows the caller to pass in the nowait flag
> >>   - When the nowait flag is specified, the code does not wait in
> >>     balance_dirty_pages(), but instead stores the wait expiration in the
> >>     new task_struct field bdp_pause.
> >>
> >> - The function balance_dirty_pages_ratelimited() resets the new values
> >>   in the task_struct, once the timeout has expired
> >>
> >> This change is required to support write throttling for the async
> >> buffered writes. While the writes are throttled, io_uring still can make
> >> progress with processing other requests.
> >>
> >> Signed-off-by: Stefan Roesch <shr@fb.com>
> > 
> > Maybe I miss something but I don't think this will throttle writers enough.
> > For three reasons:
> > 
> > 1) The calculated throttling pauses should accumulate for the task so that
> > if we compute that say it takes 0.1s to write 100 pages and the task writes
> > 300 pages, the delay adds up to 0.3s properly. Otherwise the task would not
> > be throttled as long as we expect the writeback to take.
> > 
> > 2) We must not allow the amount of dirty pages to exceed the dirty limit.
> > That can easily lead to page reclaim getting into trouble reclaiming pages
> > and thus machine stalls, oom kills etc. So if we are coming close to dirty
> > limit and we cannot sleep, we must just fail the nowait write.
> > 
> > 3) Even with above two problems fixed I suspect results will be suboptimal
> > because balance_dirty_pages() heuristics assume they get called reasonably
> > often and throttle writes so if amount of dirty pages is coming close to
> > dirty limit, they think we are overestimating writeback speed and update
> > throttling parameters accordingly. So if io_uring code does not throttle
> > writers often enough, I think dirty throttling parameters will be jumping
> > wildly resulting in poor behavior.
> > 
> > So what I'd probably suggest is that if balance_dirty_pages() is called in
> > "async" mode, we'd give tasks a pass until dirty_freerun_ceiling(). If
> > balance_dirty_pages() decides the task needs to wait, we store the pause
> > and bail all the way up into the place where we can sleep (io_uring code I
> > assume), sleep there, and then continue doing write.
> > 
> 
> Jan, thanks for the feedback. Are you suggesting to change the following
> check in the function balance_dirty_pages():
> 
>                 /*
>                  * Throttle it only when the background writeback cannot
>                  * catch-up. This avoids (excessively) small writeouts
>                  * when the wb limits are ramping up in case of !strictlimit.
>                  *
>                  * In strictlimit case make decision based on the wb counters
>                  * and limits. Small writeouts when the wb limits are ramping
>                  * up are the price we consciously pay for strictlimit-ing.
>                  *
>                  * If memcg domain is in effect, @dirty should be under
>                  * both global and memcg freerun ceilings.
>                  */
>                 if (dirty <= dirty_freerun_ceiling(thresh, bg_thresh) &&
>                     (!mdtc ||
>                      m_dirty <= dirty_freerun_ceiling(m_thresh, m_bg_thresh))) {
>                         unsigned long intv;
>                         unsigned long m_intv;
> 
> to include if we are in async mode?

Actually no. This condition is the one that gives any task a free pass
until dirty_freerun_ceiling(). So there's no need to do any modification
for that. Sorry, I've probably formulated my suggestion in a bit confusing
way.

> There is no direct way to return that the process should sleep. Instead
> two new fields are introduced in the task_struct. These two fields are
> then used in io_uring to determine if the writes for a task need to be
> throttled.
> 
> In case the writes need to be throttled, the writes are not issued, but
> instead inserted on a wait queue. We cannot sleep in the general io_uring
> code path as we still want to process other requests which are affected
> by the throttling.

Probably you wanted to say "are not affected by the throttling" in the
above.

I know that you're using fields in task_struct to propagate the delay info.
But IMHO that is unnecessary (although I don't care too much). Instead we
could factor out a variant of balance_dirty_pages() that returns 'pause' to
sleep, 0 if no sleeping needed. Normal balance_dirty_pages() would use this
for pause calculation, places wanting async throttling would only get the
pause to sleep. So e.g. iomap_write_iter() would then check and if returned
pause is > 0, it would abort the loop, similarly to how we abort it for any
other reason when a NOWAIT write needs to sleep. The io_uring
code then detects short write / EAGAIN and offloads the write to the
workqueue where normal balance_dirty_pages() can sleep as needed.

This will make sure dirty limits are properly observed and we don't need
that much special handling for it.

								Honza
Stefan Roesch May 10, 2022, 8:16 p.m. UTC | #4
On 5/10/22 2:50 AM, Jan Kara wrote:
> Sorry for delayed reply. This has fallen through the cracks...
> 
> On Thu 28-04-22 13:16:19, Stefan Roesch wrote:
>> On 4/28/22 10:47 AM, Jan Kara wrote:
>>> On Tue 26-04-22 10:43:32, Stefan Roesch wrote:
>>>> This change adds support for async write throttling in the function
>>>> balance_dirty_pages(). So far if throttling was required, the code was
>>>> waiting synchronously as long as the writes were throttled. This change
>>>> introduces asynchronous throttling. Instead of waiting in the function
>>>> balance_dirty_pages(), the timeout is set in the task_struct field
>>>> bdp_pause. Once the timeout has expired, the writes are no longer
>>>> throttled.
>>>>
>>>> - Add a new parameter to the balance_dirty_pages() function
>>>>   - This allows the caller to pass in the nowait flag
>>>>   - When the nowait flag is specified, the code does not wait in
>>>>     balance_dirty_pages(), but instead stores the wait expiration in the
>>>>     new task_struct field bdp_pause.
>>>>
>>>> - The function balance_dirty_pages_ratelimited() resets the new values
>>>>   in the task_struct, once the timeout has expired
>>>>
>>>> This change is required to support write throttling for the async
>>>> buffered writes. While the writes are throttled, io_uring still can make
>>>> progress with processing other requests.
>>>>
>>>> Signed-off-by: Stefan Roesch <shr@fb.com>
>>>
>>> Maybe I miss something but I don't think this will throttle writers enough.
>>> For three reasons:
>>>
>>> 1) The calculated throttling pauses should accumulate for the task so that
>>> if we compute that say it takes 0.1s to write 100 pages and the task writes
>>> 300 pages, the delay adds up to 0.3s properly. Otherwise the task would not
>>> be throttled as long as we expect the writeback to take.
>>>
>>> 2) We must not allow the amount of dirty pages to exceed the dirty limit.
>>> That can easily lead to page reclaim getting into trouble reclaiming pages
>>> and thus machine stalls, oom kills etc. So if we are coming close to dirty
>>> limit and we cannot sleep, we must just fail the nowait write.
>>>
>>> 3) Even with above two problems fixed I suspect results will be suboptimal
>>> because balance_dirty_pages() heuristics assume they get called reasonably
>>> often and throttle writes so if amount of dirty pages is coming close to
>>> dirty limit, they think we are overestimating writeback speed and update
>>> throttling parameters accordingly. So if io_uring code does not throttle
>>> writers often enough, I think dirty throttling parameters will be jumping
>>> wildly resulting in poor behavior.
>>>
>>> So what I'd probably suggest is that if balance_dirty_pages() is called in
>>> "async" mode, we'd give tasks a pass until dirty_freerun_ceiling(). If
>>> balance_dirty_pages() decides the task needs to wait, we store the pause
>>> and bail all the way up into the place where we can sleep (io_uring code I
>>> assume), sleep there, and then continue doing write.
>>>
>>
>> Jan, thanks for the feedback. Are you suggesting to change the following
>> check in the function balance_dirty_pages():
>>
>>                 /*
>>                  * Throttle it only when the background writeback cannot
>>                  * catch-up. This avoids (excessively) small writeouts
>>                  * when the wb limits are ramping up in case of !strictlimit.
>>                  *
>>                  * In strictlimit case make decision based on the wb counters
>>                  * and limits. Small writeouts when the wb limits are ramping
>>                  * up are the price we consciously pay for strictlimit-ing.
>>                  *
>>                  * If memcg domain is in effect, @dirty should be under
>>                  * both global and memcg freerun ceilings.
>>                  */
>>                 if (dirty <= dirty_freerun_ceiling(thresh, bg_thresh) &&
>>                     (!mdtc ||
>>                      m_dirty <= dirty_freerun_ceiling(m_thresh, m_bg_thresh))) {
>>                         unsigned long intv;
>>                         unsigned long m_intv;
>>
>> to include if we are in async mode?
> 
> Actually no. This condition is the one that gives any task a free pass
> until dirty_freerun_ceiling(). So there's no need to do any modification
> for that. Sorry, I've probably formulated my suggestion in a bit confusing
> way.
> 
>> There is no direct way to return that the process should sleep. Instead
>> two new fields are introduced in the task_struct. These two fields are
>> then used in io_uring to determine if the writes for a task need to be
>> throttled.
>>
>> In case the writes need to be throttled, the writes are not issued, but
>> instead inserted on a wait queue. We cannot sleep in the general io_uring
>> code path as we still want to process other requests which are affected
>> by the throttling.
> 
> Probably you wanted to say "are not affected by the throttling" in the
> above.
> 

Yes, that's correct.

> I know that you're using fields in task_struct to propagate the delay info.
> But IMHO that is unnecessary (although I don't care too much). Instead we
> could factor out a variant of balance_dirty_pages() that returns 'pause' to
> sleep, 0 if no sleeping needed. Normal balance_dirty_pages() would use this
> for pause calculation, places wanting async throttling would only get the
> pause to sleep. So e.g. iomap_write_iter() would then check and if returned
> pause is > 0, it would abort the loop, similarly to how we abort it for any
> other reason when a NOWAIT write needs to sleep. The io_uring
> code then detects short write / EAGAIN and offloads the write to the
> workqueue where normal balance_dirty_pages() can sleep as needed.
> 
> This will make sure dirty limits are properly observed and we don't need
> that much special handling for it.
>

I like the idea of factoring a function out of balance_dirty_pages(). However,
I see two challenges:
- the write operation has already completed at this point, so we can't
  really sleep on its completion in the io-worker in io-uring
- we don't know how long to sleep in io-uring

Currently balance_dirty_pages_ratelimited() is called at the end of the function
iomap_write_iter(). If the function balance_dirty_pages_ratelimited() were instead
called at the beginning of iomap_write_iter(), we could return -EAGAIN
and then complete the write in the io-worker.

I'm not sure what the implications are of moving the function call to the
beginning of iomap_write_iter().
 
> 								Honza
Jan Kara May 11, 2022, 10:38 a.m. UTC | #5
On Tue 10-05-22 13:16:30, Stefan Roesch wrote:
> On 5/10/22 2:50 AM, Jan Kara wrote:
> > I know that you're using fields in task_struct to propagate the delay info.
> > But IMHO that is unnecessary (although I don't care too much). Instead we
> > could factor out a variant of balance_dirty_pages() that returns 'pause' to
> > sleep, 0 if no sleeping needed. Normal balance_dirty_pages() would use this
> > for pause calculation, places wanting async throttling would only get the
> > pause to sleep. So e.g. iomap_write_iter() would then check and if returned
> > pause is > 0, it would abort the loop, similarly to how we abort it for any
> > other reason when a NOWAIT write needs to sleep. The io_uring
> > code then detects short write / EAGAIN and offloads the write to the
> > workqueue where normal balance_dirty_pages() can sleep as needed.
> > 
> > This will make sure dirty limits are properly observed and we don't need
> > that much special handling for it.
> >
> 
> I like the idea of factoring a function out of balance_dirty_pages(). However,
> I see two challenges:
> - the write operation has already completed at this point, so we can't
>   really sleep on its completion in the io-worker in io-uring
> - we don't know how long to sleep in io-uring
> 
> Currently balance_dirty_pages_ratelimited() is called at the end of the
> function iomap_write_iter(). If the function
> balance_dirty_pages_ratelimited() were instead called at the
> beginning of iomap_write_iter(), we could return -EAGAIN and
> then complete the write in the io-worker.

Well, we call balance_dirty_pages_ratelimited() after each page. So it does
not really matter much if the sleep is pushed to happen one page later.
balance_dirty_pages_ratelimited() does the ratelimiting of when
balance_dirty_pages() is called, so we have to make sure
current->nr_dirtied is not zeroed out before we really do wait (because
that is what determines whether we enter balance_dirty_pages() and how long
we sleep there), but looking at the code that should work out just fine.

								Honza
Stefan Roesch May 13, 2022, 6:57 p.m. UTC | #6
On 5/11/22 3:38 AM, Jan Kara wrote:
> On Tue 10-05-22 13:16:30, Stefan Roesch wrote:
>> On 5/10/22 2:50 AM, Jan Kara wrote:
>>> I know that you're using fields in task_struct to propagate the delay info.
>>> But IMHO that is unnecessary (although I don't care too much). Instead we
>>> could factor out a variant of balance_dirty_pages() that returns 'pause' to
>>> sleep, 0 if no sleeping needed. Normal balance_dirty_pages() would use this
>>> for pause calculation, places wanting async throttling would only get the
>>> pause to sleep. So e.g. iomap_write_iter() would then check and if returned
>>> pause is > 0, it would abort the loop, similarly to how we abort it for any
>>> other reason when a NOWAIT write needs to sleep. The io_uring
>>> code then detects short write / EAGAIN and offloads the write to the
>>> workqueue where normal balance_dirty_pages() can sleep as needed.
>>>
>>> This will make sure dirty limits are properly observed and we don't need
>>> that much special handling for it.
>>>
>>
>> I like the idea of factoring a function out of balance_dirty_pages(). However,
>> I see two challenges:
>> - the write operation has already completed at this point, so we can't
>>   really sleep on its completion in the io-worker in io-uring
>> - we don't know how long to sleep in io-uring
>>
>> Currently balance_dirty_pages_ratelimited() is called at the end of the
>> function iomap_write_iter(). If the function
>> balance_dirty_pages_ratelimited() were instead called at the
>> beginning of iomap_write_iter(), we could return -EAGAIN and
>> then complete the write in the io-worker.
> 
> Well, we call balance_dirty_pages_ratelimited() after each page. So it does
> not really matter much if the sleep is pushed to happen one page later.
> balance_dirty_pages_ratelimited() does the ratelimiting of when
> balance_dirty_pages() is called, so we have to make sure
> current->nr_dirtied is not zeroed out before we really do wait (because
> that is what determines whether we enter balance_dirty_pages() and how long
> we sleep there), but looking at the code that should work out just fine.
> 

I'll make the changes to balance_dirty_pages() for the next version of the
patch series.

> 								Honza

Patch

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index fec248ab1fec..48176a8047db 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -373,6 +373,7 @@  unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh);
 
 void wb_update_bandwidth(struct bdi_writeback *wb);
 void balance_dirty_pages_ratelimited(struct address_space *mapping);
+void  balance_dirty_pages_ratelimited_flags(struct address_space *mapping, bool is_async);
 bool wb_over_bg_thresh(struct bdi_writeback *wb);
 
 typedef int (*writepage_t)(struct page *page, struct writeback_control *wbc,
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7e2da284e427..a62aa8a4c2f2 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1546,7 +1546,7 @@  static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
  * perform some writeout.
  */
 static void balance_dirty_pages(struct bdi_writeback *wb,
-				unsigned long pages_dirtied)
+				unsigned long pages_dirtied, bool is_async)
 {
 	struct dirty_throttle_control gdtc_stor = { GDTC_INIT(wb) };
 	struct dirty_throttle_control mdtc_stor = { MDTC_INIT(wb, &gdtc_stor) };
@@ -1780,6 +1780,14 @@  static void balance_dirty_pages(struct bdi_writeback *wb,
 					  period,
 					  pause,
 					  start_time);
+		if (is_async) {
+			if (current->bdp_nr_dirtied_pause == -1) {
+				current->bdp_pause = now + pause;
+				current->bdp_nr_dirtied_pause = nr_dirtied_pause;
+			}
+			break;
+		}
+
 		__set_current_state(TASK_KILLABLE);
 		wb->dirty_sleep = now;
 		io_schedule_timeout(pause);
@@ -1787,6 +1795,8 @@  static void balance_dirty_pages(struct bdi_writeback *wb,
 		current->dirty_paused_when = now + pause;
 		current->nr_dirtied = 0;
 		current->nr_dirtied_pause = nr_dirtied_pause;
+		current->bdp_nr_dirtied_pause = -1;
+		current->bdp_pause = 0;
 
 		/*
 		 * This is typically equal to (dirty < thresh) and can also
@@ -1851,19 +1861,7 @@  static DEFINE_PER_CPU(int, bdp_ratelimits);
  */
 DEFINE_PER_CPU(int, dirty_throttle_leaks) = 0;
 
-/**
- * balance_dirty_pages_ratelimited - balance dirty memory state
- * @mapping: address_space which was dirtied
- *
- * Processes which are dirtying memory should call in here once for each page
- * which was newly dirtied.  The function will periodically check the system's
- * dirty state and will initiate writeback if needed.
- *
- * Once we're over the dirty memory limit we decrease the ratelimiting
- * by a lot, to prevent individual processes from overshooting the limit
- * by (ratelimit_pages) each.
- */
-void balance_dirty_pages_ratelimited(struct address_space *mapping)
+void balance_dirty_pages_ratelimited_flags(struct address_space *mapping, bool is_async)
 {
 	struct inode *inode = mapping->host;
 	struct backing_dev_info *bdi = inode_to_bdi(inode);
@@ -1874,6 +1872,15 @@  void balance_dirty_pages_ratelimited(struct address_space *mapping)
 	if (!(bdi->capabilities & BDI_CAP_WRITEBACK))
 		return;
 
+	if (current->bdp_nr_dirtied_pause != -1 && time_after(jiffies, current->bdp_pause)) {
+		current->dirty_paused_when = current->bdp_pause;
+		current->nr_dirtied = 0;
+		current->nr_dirtied_pause = current->bdp_nr_dirtied_pause;
+
+		current->bdp_nr_dirtied_pause = -1;
+		current->bdp_pause = 0;
+	}
+
 	if (inode_cgwb_enabled(inode))
 		wb = wb_get_create_current(bdi, GFP_KERNEL);
 	if (!wb)
@@ -1912,10 +1919,27 @@  void balance_dirty_pages_ratelimited(struct address_space *mapping)
 	preempt_enable();
 
 	if (unlikely(current->nr_dirtied >= ratelimit))
-		balance_dirty_pages(wb, current->nr_dirtied);
+		balance_dirty_pages(wb, current->nr_dirtied, is_async);
 
 	wb_put(wb);
 }
+
+/**
+ * balance_dirty_pages_ratelimited - balance dirty memory state
+ * @mapping: address_space which was dirtied
+ *
+ * Processes which are dirtying memory should call in here once for each page
+ * which was newly dirtied.  The function will periodically check the system's
+ * dirty state and will initiate writeback if needed.
+ *
+ * Once we're over the dirty memory limit we decrease the ratelimiting
+ * by a lot, to prevent individual processes from overshooting the limit
+ * by (ratelimit_pages) each.
+ */
+void balance_dirty_pages_ratelimited(struct address_space *mapping)
+{
+	balance_dirty_pages_ratelimited_flags(mapping, false);
+}
 EXPORT_SYMBOL(balance_dirty_pages_ratelimited);
 
 /**