[0/6] Improve visibility of writeback

Message ID 20240320110222.6564-1-shikemeng@huaweicloud.com (mailing list archive)

Message

Kemeng Shi March 20, 2024, 11:02 a.m. UTC
This series tries to improve visibility of writeback. Patch 1 makes
/sys/kernel/debug/bdi/xxx/stats show writeback info of the whole bdi
instead of only writeback info of the root cgroup. Patch 2 adds a new
debug file /sys/kernel/debug/bdi/xxx/wb_stats to show per-wb writeback
info. Patch 4 adds wb_monitor.py to monitor basic writeback info
of a running system; more info could be added on demand. The rest of the
patches are some random cleanups. More details can be found in the
respective patches. Thanks!

The following domain hierarchy was tested:
                global domain (320G)
                /                 \
        cgroup domain1(10G)     cgroup domain2(10G)
                |                 |
bdi            wb1               wb2

/* all writeback info of bdi is successfully collected */
# cat /sys/kernel/debug/bdi/252:16/stats
BdiWriteback:              448 kB
BdiReclaimable:        1303904 kB
BdiDirtyThresh:      189914124 kB
DirtyThresh:         195337564 kB
BackgroundThresh:     32516508 kB
BdiDirtied:            3591392 kB
BdiWritten:            2287488 kB
BdiWriteBandwidth:      322248 kBps
b_dirty:                     0
b_io:                        0
b_more_io:                   2
b_dirty_time:                0
bdi_list:                    1
state:                       1

/* per wb writeback info is collected */
# cat /sys/kernel/debug/bdi/252:16/wb_stats
WbCgIno:                    1
WbWriteback:                0 kB
WbReclaimable:              0 kB
WbDirtyThresh:              0 kB
WbDirtied:                  0 kB
WbWritten:                  0 kB
WbWriteBandwidth:      102400 kBps
b_dirty:                    0
b_io:                       0
b_more_io:                  0
b_dirty_time:               0
state:                      1
WbCgIno:                 4284
WbWriteback:              448 kB
WbReclaimable:         818944 kB
WbDirtyThresh:        3096524 kB
WbDirtied:            2266880 kB
WbWritten:            1447936 kB
WbWriteBandwidth:      214036 kBps
b_dirty:                    0
b_io:                       0
b_more_io:                  1
b_dirty_time:               0
state:                      5
WbCgIno:                 4325
WbWriteback:              224 kB
WbReclaimable:         819392 kB
WbDirtyThresh:        2920088 kB
WbDirtied:            2551808 kB
WbWritten:            1732416 kB
WbWriteBandwidth:      201832 kBps
b_dirty:                    0
b_io:                       0
b_more_io:                  1
b_dirty_time:               0
state:                      5
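
For readers who want to post-process this output, here is a minimal sketch of
splitting wb_stats-style text into one record per wb. It assumes only the
layout shown above (a WbCgIno line starts each record, and every field is
"Name: <integer> [unit]"); parse_wb_stats is a hypothetical helper, not part of
the series, and the sample is a shortened copy of the output above.

```python
def parse_wb_stats(text):
    """Split wb_stats-style text into one dict per wb, keyed by field name."""
    records = []
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or stray lines
        key, _, rest = line.partition(":")
        fields = rest.split()
        if not fields or not fields[0].isdigit():
            continue  # not a "Name: <integer> [unit]" line
        key = key.strip()
        if key == "WbCgIno":
            records.append({})  # assumption: WbCgIno opens a new record
        if records:
            records[-1][key] = int(fields[0])  # unit (kB, kBps) is dropped
    return records


# Shortened copy of the sample output, for illustration only.
sample = """\
WbCgIno:                 4284
WbWriteback:              448 kB
WbWriteBandwidth:      214036 kBps
state:                      5
WbCgIno:                 4325
WbWriteback:              224 kB
WbWriteBandwidth:      201832 kBps
state:                      5
"""

records = parse_wb_stats(sample)
for wb in records:
    print(wb["WbCgIno"], wb["WbWriteback"], wb["WbWriteBandwidth"])
```

On a real system the text would come from reading the debugfs file, which
requires debugfs to be mounted and sufficient privileges.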

/* monitor writeback info */
# ./wb_monitor.py 252:16 -c
                  writeback  reclaimable   dirtied   written    avg_bw
252:16_1                  0            0         0         0    102400
252:16_4284             672       820064   9230368   8410304    685612
252:16_4325             896       819840  10491264   9671648    652348
252:16                 1568      1639904  19721632  18081952   1440360


                  writeback  reclaimable   dirtied   written    avg_bw
252:16_1                  0            0         0         0    102400
252:16_4284             672       820064   9230368   8410304    685612
252:16_4325             896       819840  10491264   9671648    652348
252:16                 1568      1639904  19721632  18081952   1440360
...
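
In the sample above, the bdi row (252:16) equals the column-wise sum of the
per-wb rows. This is an observation from the numbers shown, not a statement
about how wb_monitor.py computes the row; the sketch below just checks it,
with sum_rows as a hypothetical helper.

```python
COLUMNS = ("writeback", "reclaimable", "dirtied", "written", "avg_bw")


def sum_rows(rows):
    """Return the column-wise sum of equally sized rows of integers."""
    return tuple(sum(col) for col in zip(*rows))


# Per-wb rows copied from the sample output above.
per_wb = {
    "252:16_1": (0, 0, 0, 0, 102400),
    "252:16_4284": (672, 820064, 9230368, 8410304, 685612),
    "252:16_4325": (896, 819840, 10491264, 9671648, 652348),
}

total = sum_rows(per_wb.values())
print(dict(zip(COLUMNS, total)))
```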



Kemeng Shi (6):
  writeback: collect stats of all wb of bdi in bdi_debug_stats_show
  writeback: support retrieving per group debug writeback stats of bdi
  workqueue: remove unnecessary import and function in wq_monitor.py
  writeback: add wb_monitor.py script to monitor writeback info on bdi
  writeback: rename nr_reclaimable to nr_dirty in balance_dirty_pages
  writeback: remove unneeded GDTC_INIT_NO_WB

 include/linux/writeback.h     |   1 +
 mm/backing-dev.c              | 159 ++++++++++++++++++++++++++-----
 mm/page-writeback.c           |  32 +++++--
 tools/workqueue/wq_monitor.py |   9 +-
 tools/writeback/wb_monitor.py | 172 ++++++++++++++++++++++++++++++++++
 5 files changed, 334 insertions(+), 39 deletions(-)
 create mode 100644 tools/writeback/wb_monitor.py

Comments

Tejun Heo March 20, 2024, 3:20 p.m. UTC | #1
Hello,

On Wed, Mar 20, 2024 at 07:02:16PM +0800, Kemeng Shi wrote:
> /* all writeback info of bdi is successfully collected */
> # cat /sys/kernel/debug/bdi/252:16/stats:
> BdiWriteback:              448 kB
...
> 
> /* per wb writeback info is collected */
> # cat /sys/kernel/debug/bdi/252:16/wb_stats:
> cat wb_stats
> WbCgIno:                    1
...
> 
> /* monitor writeback info */
> # ./wb_monitor.py 252:16 -c
>                   writeback  reclaimable   dirtied   written    avg_bw
> 252:16_1                  0            0         0         0    102400
> 252:16_4284             672       820064   9230368   8410304    685612
> 252:16_4325             896       819840  10491264   9671648    652348
> 252:16                 1568      1639904  19721632  18081952   1440360
> 
> 
>                   writeback  reclaimable   dirtied   written    avg_bw
> 252:16_1                  0            0         0         0    102400
> 252:16_4284             672       820064   9230368   8410304    685612
> 252:16_4325             896       819840  10491264   9671648    652348
> 252:16                 1568      1639904  19721632  18081952   1440360
> ...

Ah you have the example outputs here. It'd be nice to have these in the
commit messages too.

I'm not sure about the last patch but patchset looks good to me otherwise. I
don't feel particularly enthusiastic about adding more debugfs files
especially given that some distros disable debugfs completely, but no harm
in adding them either.

Thanks.
Jan Kara March 20, 2024, 5:22 p.m. UTC | #2
On Wed 20-03-24 19:02:16, Kemeng Shi wrote:
> ...

So I'm wondering: Are you implementing this just because this looks
interesting or do you have a real need for the functionality? Why?

								Honza
Kemeng Shi March 21, 2024, 8:12 a.m. UTC | #3
on 3/21/2024 1:22 AM, Jan Kara wrote:
> On Wed 20-03-24 19:02:16, Kemeng Shi wrote:
>> ...
> 
> So I'm wondering: Are you implementing this just because this looks
> interesting or do you have a real need for the functionality? Why?
Hi Jan, I added the debug files to test the change in [1], which changes how
the dirty background threshold of a wb is calculated. Without debug files, we could
only monitor writeback to infer that the threshold was corrected.
In the current patchset, the debug info does not include the dirty background threshold yet;
I will add it when the discussion of the dirty background threshold calculation in [1]
is done.
wb_monitor.py was suggested by Tejun in [2] to improve visibility of writeback.
The script is more convenient than tracing to monitor writeback behavior of a running
system.

Thanks

[1] https://lore.kernel.org/lkml/a747dc7d-f24a-08bd-d969-d3fb35e151b7@huaweicloud.com/
[2] https://lore.kernel.org/lkml/ZcUsOb_fyvYr-zZ-@slm.duckdns.org/
> 
> 								Honza
>
Jan Kara March 21, 2024, 6:07 p.m. UTC | #4
On Thu 21-03-24 16:12:52, Kemeng Shi wrote:
> on 3/21/2024 1:22 AM, Jan Kara wrote:
> > On Wed 20-03-24 19:02:16, Kemeng Shi wrote:
> >> ...
> > 
> > So I'm wondering: Are you implementing this just because this looks
> > interesting or do you have a real need for the functionality? Why?
> Hi Jan, I added the debug files to test the change in [1], which changes how
> the dirty background threshold of a wb is calculated. Without debug files, we could
> only monitor writeback to infer that the threshold was corrected.
> In the current patchset, the debug info does not include the dirty background threshold yet;
> I will add it when the discussion of the dirty background threshold calculation in [1]
> is done.
> wb_monitor.py was suggested by Tejun in [2] to improve visibility of writeback.
> The script is more convenient than tracing to monitor writeback behavior of a running
> system.

Thanks for the pointer. OK, I agree this is useful so let's have a look
into the code :)

								Honza