
[V7,0/2] percpu_ref & block: reduce memory footprint of percpu_ref in fast path

Message ID 20201001154842.26896-1-ming.lei@redhat.com

Message

Ming Lei Oct. 1, 2020, 3:48 p.m. UTC
Hi,

The first patch reduces the fast-path memory footprint of percpu_ref
from 7 words to 2 words. percpu_ref is often used in fast paths and is
embedded in user structs, so its size matters there.

The second patch moves .q_usage_counter into the first cacheline of
'request_queue'.

A simple test on null_blk shows a ~2% IOPS boost on a dual-socket NUMA
machine with 16 cores (two threads per core).

V7:
	- add comments explaining the reason for the struct split

V6:
	- drop the first patch, which added percpu_ref_is_initialized() for
	MD only, since Christoph doesn't like it

V5:
	- fix a memory leak of ref->data; only percpu_ref_exit() in patch 2
	is modified

V4:
	- rename percpu_ref_inited to percpu_ref_is_initialized

V3:
	- fix kernel oops on MD
	- add a patch to avoid using percpu-refcount internals from MD
	  code
	- pass the Red Hat CKI test run by Veronika Kabatova

V2:
	- pass 'gfp' to kzalloc() to fix the block/027 failure reported by
	the kernel test robot
	- protect percpu_ref_is_zero() against concurrent destruction of the
	percpu-refcount with a spinlock

Ming Lei (2):
  percpu_ref: reduce memory footprint of percpu_ref in fast path
  block: move 'q_usage_counter' into front of 'request_queue'

 drivers/infiniband/sw/rdmavt/mr.c |   2 +-
 include/linux/blkdev.h            |   3 +-
 include/linux/percpu-refcount.h   |  52 ++++++------
 lib/percpu-refcount.c             | 131 ++++++++++++++++++++++--------
 4 files changed, 125 insertions(+), 63 deletions(-)

Cc: Veronika Kabatova <vkabatov@redhat.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Bart Van Assche <bvanassche@acm.org>

Comments

Ming Lei Oct. 6, 2020, 7:41 a.m. UTC | #1
On Thu, Oct 01, 2020 at 11:48:40PM +0800, Ming Lei wrote:
> Hi,
> 
> The 1st patch removes memory footprint of percpu_ref in fast path
> from 7 words to 2 words, since it is often used in fast path and
> embedded in user struct.
> 
> The 2nd patch moves .q_usage_counter to 1st cacheline of
> 'request_queue'.
> 
> Simple test on null_blk shows ~2% IOPS boost on one 16cores(two threads
> per core) machine, dual socket/numa.
> 
> V7:
> 	- add comments about reason for struct split

Hello Jens

Could you consider merging the patchset into the block tree if you are fine with it?


Thanks,
Ming
Jens Axboe Oct. 6, 2020, 1:30 p.m. UTC | #2
On 10/1/20 9:48 AM, Ming Lei wrote:
> Hi,
> 
> The 1st patch removes memory footprint of percpu_ref in fast path
> from 7 words to 2 words, since it is often used in fast path and
> embedded in user struct.
> 
> The 2nd patch moves .q_usage_counter to 1st cacheline of
> 'request_queue'.
> 
> Simple test on null_blk shows ~2% IOPS boost on one 16cores(two threads
> per core) machine, dual socket/numa.

Applied, thanks.