
[V2] raid1: ensure write behind bio has less than BIO_MAX_VECS sectors

Message ID 20210818073738.1271033-1-guoqing.jiang@linux.dev (mailing list archive)
State New, archived

Commit Message

Guoqing Jiang Aug. 18, 2021, 7:37 a.m. UTC
From: Guoqing Jiang <jiangguoqing@kylinos.cn>

We can't split a write-behind bio with more than BIO_MAX_VECS sectors;
otherwise the call trace below is triggered, because an oversized
write-behind bio could be allocated later.

[ 8.097936] bvec_alloc+0x90/0xc0
[ 8.098934] bio_alloc_bioset+0x1b3/0x260
[ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1]
[ 8.100988] ? __bio_clone_fast+0xa8/0xe0
[ 8.102008] md_handle_request+0x158/0x1d0 [md_mod]
[ 8.103050] md_submit_bio+0xcd/0x110 [md_mod]
[ 8.104084] submit_bio_noacct+0x139/0x530
[ 8.105127] submit_bio+0x78/0x1d0
[ 8.106163] ext4_io_submit+0x48/0x60 [ext4]
[ 8.107242] ext4_writepages+0x652/0x1170 [ext4]
[ 8.108300] ? do_writepages+0x41/0x100
[ 8.109338] ? __ext4_mark_inode_dirty+0x240/0x240 [ext4]
[ 8.110406] do_writepages+0x41/0x100
[ 8.111450] __filemap_fdatawrite_range+0xc5/0x100
[ 8.112513] file_write_and_wait_range+0x61/0xb0
[ 8.113564] ext4_sync_file+0x73/0x370 [ext4]
[ 8.114607] __x64_sys_fsync+0x33/0x60
[ 8.115635] do_syscall_64+0x33/0x40
[ 8.116670] entry_SYSCALL_64_after_hwframe+0x44/0xae

Thanks to Christoph for the comments.

[1]. https://bugs.archlinux.org/task/70992

Cc: Song Liu <song@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Reported-by: Jens Stutte <jens@chianterastutte.eu>
Tested-by: Jens Stutte <jens@chianterastutte.eu>
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
---
Hi Jens,

Could you consider applying this to the block tree? It depends on
commit 018eca456c4b4dca56aaf1ec27f309c74d0fe246 in the block tree's
for-next branch; otherwise lkp would complain about a compile issue
again.

Thanks,
Guoqing

V2 changes:
1. Added a check for the write-behind case and relevant comments, per feedback from Christoph.

 drivers/md/raid1.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Guoqing Jiang Aug. 18, 2021, 7:57 a.m. UTC | #1
Missed cc'ing Song and Christoph; now actually doing it ...

On 8/18/21 3:37 PM, Guoqing Jiang wrote:
> From: Guoqing Jiang <jiangguoqing@kylinos.cn>
>
> We can't split a write-behind bio with more than BIO_MAX_VECS sectors;
> otherwise the call trace below is triggered, because an oversized
> write-behind bio could be allocated later.
>
> [ 8.097936] bvec_alloc+0x90/0xc0
> [ 8.098934] bio_alloc_bioset+0x1b3/0x260
> [ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1]
> [ 8.100988] ? __bio_clone_fast+0xa8/0xe0
> [ 8.102008] md_handle_request+0x158/0x1d0 [md_mod]
> [ 8.103050] md_submit_bio+0xcd/0x110 [md_mod]
> [ 8.104084] submit_bio_noacct+0x139/0x530
> [ 8.105127] submit_bio+0x78/0x1d0
> [ 8.106163] ext4_io_submit+0x48/0x60 [ext4]
> [ 8.107242] ext4_writepages+0x652/0x1170 [ext4]
> [ 8.108300] ? do_writepages+0x41/0x100
> [ 8.109338] ? __ext4_mark_inode_dirty+0x240/0x240 [ext4]
> [ 8.110406] do_writepages+0x41/0x100
> [ 8.111450] __filemap_fdatawrite_range+0xc5/0x100
> [ 8.112513] file_write_and_wait_range+0x61/0xb0
> [ 8.113564] ext4_sync_file+0x73/0x370 [ext4]
> [ 8.114607] __x64_sys_fsync+0x33/0x60
> [ 8.115635] do_syscall_64+0x33/0x40
> [ 8.116670] entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> Thanks to Christoph for the comments.
>
> [1]. https://bugs.archlinux.org/task/70992
>
> Cc: Song Liu <song@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Reported-by: Jens Stutte <jens@chianterastutte.eu>
> Tested-by: Jens Stutte <jens@chianterastutte.eu>
> Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
> ---
> Hi Jens,
>
> Could you consider applying this to the block tree? It depends on
> commit 018eca456c4b4dca56aaf1ec27f309c74d0fe246 in the block tree's
> for-next branch; otherwise lkp would complain about a compile issue
> again.
>
> Thanks,
> Guoqing
>
> V2 changes:
> 1. Added a check for the write-behind case and relevant comments, per feedback from Christoph.
>
>   drivers/md/raid1.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 3c44c4bb40fc..e8c8e6bb0439 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1329,6 +1329,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>   	struct raid1_plug_cb *plug = NULL;
>   	int first_clone;
>   	int max_sectors;
> +	bool write_behind = false;
>   
>   	if (mddev_is_clustered(mddev) &&
>   	     md_cluster_ops->area_resyncing(mddev, WRITE,
> @@ -1381,6 +1382,10 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>   	max_sectors = r1_bio->sectors;
>   	for (i = 0;  i < disks; i++) {
>   		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
> +
> +		if (test_bit(WriteMostly, &mirror->rdev->flags))
> +			write_behind = true;
> +
>   		if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
>   			atomic_inc(&rdev->nr_pending);
>   			blocked_rdev = rdev;
> @@ -1454,6 +1459,14 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>   		goto retry_write;
>   	}
>   
> +	/*
> +	 * When using a bitmap, we may call alloc_behind_master_bio below.
> +	 * alloc_behind_master_bio allocates a copy of the data payload a page
> +	 * at a time and thus needs a new bio that can fit the whole payload
> +	 * of this bio in page-sized chunks.
> +	 */
> +	if (write_behind && bitmap)
> +		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
>   	if (max_sectors < bio_sectors(bio)) {
>   		struct bio *split = bio_split(bio, max_sectors,
>   					      GFP_NOIO, &conf->bio_split);
kernel test robot Aug. 18, 2021, 11:03 a.m. UTC | #2
Hi Guoqing,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on song-md/md-next]
[also build test ERROR on v5.14-rc6 next-20210818]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Guoqing-Jiang/raid1-ensure-write-behind-bio-has-less-than-BIO_MAX_VECS-sectors/20210818-154106
base:   git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-next
config: arc-randconfig-r043-20210816 (attached as .config)
compiler: arc-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/abf22557456363eb6fd1d1d09332f5261d61796c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Guoqing-Jiang/raid1-ensure-write-behind-bio-has-less-than-BIO_MAX_VECS-sectors/20210818-154106
        git checkout abf22557456363eb6fd1d1d09332f5261d61796c
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=arc SHELL=/bin/bash drivers/md/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/md/raid1.c: In function 'raid1_write_request':
>> drivers/md/raid1.c:1388:44: error: 'mirror' undeclared (first use in this function); did you mean 'md_error'?
    1388 |                 if (test_bit(WriteMostly, &mirror->rdev->flags))
         |                                            ^~~~~~
         |                                            md_error
   drivers/md/raid1.c:1388:44: note: each undeclared identifier is reported only once for each function it appears in
   In file included from include/linux/kernel.h:16,
                    from include/linux/list.h:9,
                    from include/linux/preempt.h:11,
                    from include/linux/spinlock.h:51,
                    from include/linux/mmzone.h:8,
                    from include/linux/gfp.h:6,
                    from include/linux/slab.h:15,
                    from drivers/md/raid1.c:26:
   drivers/md/raid1.c:1471:70: error: 'PAGE_SECTORS' undeclared (first use in this function)
    1471 |                 max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
         |                                                                      ^~~~~~~~~~~~
   include/linux/minmax.h:20:46: note: in definition of macro '__typecheck'
      20 |         (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
         |                                              ^
   include/linux/minmax.h:36:31: note: in expansion of macro '__safe_cmp'
      36 |         __builtin_choose_expr(__safe_cmp(x, y), \
         |                               ^~~~~~~~~~
   include/linux/minmax.h:104:33: note: in expansion of macro '__careful_cmp'
     104 | #define min_t(type, x, y)       __careful_cmp((type)(x), (type)(y), <)
         |                                 ^~~~~~~~~~~~~
   drivers/md/raid1.c:1471:31: note: in expansion of macro 'min_t'
    1471 |                 max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
         |                               ^~~~~
   include/linux/minmax.h:36:9: error: first argument to '__builtin_choose_expr' not a constant
      36 |         __builtin_choose_expr(__safe_cmp(x, y), \
         |         ^~~~~~~~~~~~~~~~~~~~~
   include/linux/minmax.h:104:33: note: in expansion of macro '__careful_cmp'
     104 | #define min_t(type, x, y)       __careful_cmp((type)(x), (type)(y), <)
         |                                 ^~~~~~~~~~~~~
   drivers/md/raid1.c:1471:31: note: in expansion of macro 'min_t'
    1471 |                 max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
         |                               ^~~~~


vim +1388 drivers/md/raid1.c

  1320	
  1321	static void raid1_write_request(struct mddev *mddev, struct bio *bio,
  1322					int max_write_sectors)
  1323	{
  1324		struct r1conf *conf = mddev->private;
  1325		struct r1bio *r1_bio;
  1326		int i, disks;
  1327		struct bitmap *bitmap = mddev->bitmap;
  1328		unsigned long flags;
  1329		struct md_rdev *blocked_rdev;
  1330		struct blk_plug_cb *cb;
  1331		struct raid1_plug_cb *plug = NULL;
  1332		int first_clone;
  1333		int max_sectors;
  1334		bool write_behind = false;
  1335	
  1336		if (mddev_is_clustered(mddev) &&
  1337		     md_cluster_ops->area_resyncing(mddev, WRITE,
  1338			     bio->bi_iter.bi_sector, bio_end_sector(bio))) {
  1339	
  1340			DEFINE_WAIT(w);
  1341			for (;;) {
  1342				prepare_to_wait(&conf->wait_barrier,
  1343						&w, TASK_IDLE);
  1344				if (!md_cluster_ops->area_resyncing(mddev, WRITE,
  1345								bio->bi_iter.bi_sector,
  1346								bio_end_sector(bio)))
  1347					break;
  1348				schedule();
  1349			}
  1350			finish_wait(&conf->wait_barrier, &w);
  1351		}
  1352	
  1353		/*
  1354		 * Register the new request and wait if the reconstruction
  1355		 * thread has put up a bar for new requests.
  1356		 * Continue immediately if no resync is active currently.
  1357		 */
  1358		wait_barrier(conf, bio->bi_iter.bi_sector);
  1359	
  1360		r1_bio = alloc_r1bio(mddev, bio);
  1361		r1_bio->sectors = max_write_sectors;
  1362	
  1363		if (conf->pending_count >= max_queued_requests) {
  1364			md_wakeup_thread(mddev->thread);
  1365			raid1_log(mddev, "wait queued");
  1366			wait_event(conf->wait_barrier,
  1367				   conf->pending_count < max_queued_requests);
  1368		}
  1369		/* first select target devices under rcu_lock and
  1370		 * inc refcount on their rdev.  Record them by setting
  1371		 * bios[x] to bio
  1372		 * If there are known/acknowledged bad blocks on any device on
  1373		 * which we have seen a write error, we want to avoid writing those
  1374		 * blocks.
  1375		 * This potentially requires several writes to write around
  1376		 * the bad blocks.  Each set of writes gets it's own r1bio
  1377		 * with a set of bios attached.
  1378		 */
  1379	
  1380		disks = conf->raid_disks * 2;
  1381	 retry_write:
  1382		blocked_rdev = NULL;
  1383		rcu_read_lock();
  1384		max_sectors = r1_bio->sectors;
  1385		for (i = 0;  i < disks; i++) {
  1386			struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
  1387	
> 1388			if (test_bit(WriteMostly, &mirror->rdev->flags))
  1389				write_behind = true;
  1390	
  1391			if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
  1392				atomic_inc(&rdev->nr_pending);
  1393				blocked_rdev = rdev;
  1394				break;
  1395			}
  1396			r1_bio->bios[i] = NULL;
  1397			if (!rdev || test_bit(Faulty, &rdev->flags)) {
  1398				if (i < conf->raid_disks)
  1399					set_bit(R1BIO_Degraded, &r1_bio->state);
  1400				continue;
  1401			}
  1402	
  1403			atomic_inc(&rdev->nr_pending);
  1404			if (test_bit(WriteErrorSeen, &rdev->flags)) {
  1405				sector_t first_bad;
  1406				int bad_sectors;
  1407				int is_bad;
  1408	
  1409				is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
  1410						     &first_bad, &bad_sectors);
  1411				if (is_bad < 0) {
  1412					/* mustn't write here until the bad block is
  1413					 * acknowledged*/
  1414					set_bit(BlockedBadBlocks, &rdev->flags);
  1415					blocked_rdev = rdev;
  1416					break;
  1417				}
  1418				if (is_bad && first_bad <= r1_bio->sector) {
  1419					/* Cannot write here at all */
  1420					bad_sectors -= (r1_bio->sector - first_bad);
  1421					if (bad_sectors < max_sectors)
  1422						/* mustn't write more than bad_sectors
  1423						 * to other devices yet
  1424						 */
  1425						max_sectors = bad_sectors;
  1426					rdev_dec_pending(rdev, mddev);
  1427					/* We don't set R1BIO_Degraded as that
  1428					 * only applies if the disk is
  1429					 * missing, so it might be re-added,
  1430					 * and we want to know to recover this
  1431					 * chunk.
  1432					 * In this case the device is here,
  1433					 * and the fact that this chunk is not
  1434					 * in-sync is recorded in the bad
  1435					 * block log
  1436					 */
  1437					continue;
  1438				}
  1439				if (is_bad) {
  1440					int good_sectors = first_bad - r1_bio->sector;
  1441					if (good_sectors < max_sectors)
  1442						max_sectors = good_sectors;
  1443				}
  1444			}
  1445			r1_bio->bios[i] = bio;
  1446		}
  1447		rcu_read_unlock();
  1448	
  1449		if (unlikely(blocked_rdev)) {
  1450			/* Wait for this device to become unblocked */
  1451			int j;
  1452	
  1453			for (j = 0; j < i; j++)
  1454				if (r1_bio->bios[j])
  1455					rdev_dec_pending(conf->mirrors[j].rdev, mddev);
  1456			r1_bio->state = 0;
  1457			allow_barrier(conf, bio->bi_iter.bi_sector);
  1458			raid1_log(mddev, "wait rdev %d blocked", blocked_rdev->raid_disk);
  1459			md_wait_for_blocked_rdev(blocked_rdev, mddev);
  1460			wait_barrier(conf, bio->bi_iter.bi_sector);
  1461			goto retry_write;
  1462		}
  1463	
  1464		/*
  1465		 * When using a bitmap, we may call alloc_behind_master_bio below.
  1466		 * alloc_behind_master_bio allocates a copy of the data payload a page
  1467		 * at a time and thus needs a new bio that can fit the whole payload
  1468		 * this bio in page sized chunks.
  1469		 */
  1470		if (write_behind && bitmap)
  1471			max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
  1472		if (max_sectors < bio_sectors(bio)) {
  1473			struct bio *split = bio_split(bio, max_sectors,
  1474						      GFP_NOIO, &conf->bio_split);
  1475			bio_chain(split, bio);
  1476			submit_bio_noacct(bio);
  1477			bio = split;
  1478			r1_bio->master_bio = bio;
  1479			r1_bio->sectors = max_sectors;
  1480		}
  1481	
  1482		if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
  1483			r1_bio->start_time = bio_start_io_acct(bio);
  1484		atomic_set(&r1_bio->remaining, 1);
  1485		atomic_set(&r1_bio->behind_remaining, 0);
  1486	
  1487		first_clone = 1;
  1488	
  1489		for (i = 0; i < disks; i++) {
  1490			struct bio *mbio = NULL;
  1491			struct md_rdev *rdev = conf->mirrors[i].rdev;
  1492			if (!r1_bio->bios[i])
  1493				continue;
  1494	
  1495			if (first_clone) {
  1496				/* do behind I/O ?
  1497				 * Not if there are too many, or cannot
  1498				 * allocate memory, or a reader on WriteMostly
  1499				 * is waiting for behind writes to flush */
  1500				if (bitmap &&
  1501				    (atomic_read(&bitmap->behind_writes)
  1502				     < mddev->bitmap_info.max_write_behind) &&
  1503				    !waitqueue_active(&bitmap->behind_wait)) {
  1504					alloc_behind_master_bio(r1_bio, bio);
  1505				}
  1506	
  1507				md_bitmap_startwrite(bitmap, r1_bio->sector, r1_bio->sectors,
  1508						     test_bit(R1BIO_BehindIO, &r1_bio->state));
  1509				first_clone = 0;
  1510			}
  1511	
  1512			if (r1_bio->behind_master_bio)
  1513				mbio = bio_clone_fast(r1_bio->behind_master_bio,
  1514						      GFP_NOIO, &mddev->bio_set);
  1515			else
  1516				mbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set);
  1517	
  1518			if (r1_bio->behind_master_bio) {
  1519				if (test_bit(CollisionCheck, &rdev->flags))
  1520					wait_for_serialization(rdev, r1_bio);
  1521				if (test_bit(WriteMostly, &rdev->flags))
  1522					atomic_inc(&r1_bio->behind_remaining);
  1523			} else if (mddev->serialize_policy)
  1524				wait_for_serialization(rdev, r1_bio);
  1525	
  1526			r1_bio->bios[i] = mbio;
  1527	
  1528			mbio->bi_iter.bi_sector	= (r1_bio->sector +
  1529					   conf->mirrors[i].rdev->data_offset);
  1530			bio_set_dev(mbio, conf->mirrors[i].rdev->bdev);
  1531			mbio->bi_end_io	= raid1_end_write_request;
  1532			mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA));
  1533			if (test_bit(FailFast, &conf->mirrors[i].rdev->flags) &&
  1534			    !test_bit(WriteMostly, &conf->mirrors[i].rdev->flags) &&
  1535			    conf->raid_disks - mddev->degraded > 1)
  1536				mbio->bi_opf |= MD_FAILFAST;
  1537			mbio->bi_private = r1_bio;
  1538	
  1539			atomic_inc(&r1_bio->remaining);
  1540	
  1541			if (mddev->gendisk)
  1542				trace_block_bio_remap(mbio, disk_devt(mddev->gendisk),
  1543						      r1_bio->sector);
  1544			/* flush_pending_writes() needs access to the rdev so...*/
  1545			mbio->bi_bdev = (void *)conf->mirrors[i].rdev;
  1546	
  1547			cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
  1548			if (cb)
  1549				plug = container_of(cb, struct raid1_plug_cb, cb);
  1550			else
  1551				plug = NULL;
  1552			if (plug) {
  1553				bio_list_add(&plug->pending, mbio);
  1554				plug->pending_cnt++;
  1555			} else {
  1556				spin_lock_irqsave(&conf->device_lock, flags);
  1557				bio_list_add(&conf->pending_bio_list, mbio);
  1558				conf->pending_count++;
  1559				spin_unlock_irqrestore(&conf->device_lock, flags);
  1560				md_wakeup_thread(mddev->thread);
  1561			}
  1562		}
  1563	
  1564		r1_bio_write_done(r1_bio);
  1565	
  1566		/* In case raid1d snuck in to freeze_array */
  1567		wake_up(&conf->wait_barrier);
  1568	}
  1569	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
kernel test robot Aug. 18, 2021, 12:02 p.m. UTC | #3
Hi Guoqing,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on song-md/md-next]
[also build test ERROR on v5.14-rc6 next-20210818]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Guoqing-Jiang/raid1-ensure-write-behind-bio-has-less-than-BIO_MAX_VECS-sectors/20210818-154106
base:   git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-next
config: x86_64-randconfig-a011-20210816 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project d2b574a4dea5b718e4386bf2e26af0126e5978ce)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/abf22557456363eb6fd1d1d09332f5261d61796c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Guoqing-Jiang/raid1-ensure-write-behind-bio-has-less-than-BIO_MAX_VECS-sectors/20210818-154106
        git checkout abf22557456363eb6fd1d1d09332f5261d61796c
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/md/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/md/raid1.c:1388:30: error: use of undeclared identifier 'mirror'
                   if (test_bit(WriteMostly, &mirror->rdev->flags))
                                              ^
   drivers/md/raid1.c:1471:56: error: use of undeclared identifier 'PAGE_SECTORS'
                   max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
                                                                        ^
   drivers/md/raid1.c:1471:56: error: use of undeclared identifier 'PAGE_SECTORS'
   3 errors generated.


vim +/mirror +1388 drivers/md/raid1.c


---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Christoph Hellwig Aug. 19, 2021, 8:55 a.m. UTC | #4
On Wed, Aug 18, 2021 at 03:37:38PM +0800, Guoqing Jiang wrote:
>  	for (i = 0;  i < disks; i++) {
>  		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
> +
> +		if (test_bit(WriteMostly, &mirror->rdev->flags))
> +			write_behind = true;

How does this condition relate to the ones used for actually calling
alloc_behind_master_bio?  It looks related, but as someone not familiar
with the code I can't really verify if this is correct, so a comment
explaining it might be useful.

> +	/*
> +	 * When using a bitmap, we may call alloc_behind_master_bio below.
> +	 * alloc_behind_master_bio allocates a copy of the data payload a page
> +	 * at a time and thus needs a new bio that can fit the whole payload
> +	 * this bio in page sized chunks.
> +	 */
> +	if (write_behind && bitmap)
> +		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);

Overly long line here.
Guoqing Jiang Aug. 20, 2021, 8:19 a.m. UTC | #5
On 8/19/21 4:55 PM, Christoph Hellwig wrote:
> On Wed, Aug 18, 2021 at 03:37:38PM +0800, Guoqing Jiang wrote:
>>   	for (i = 0;  i < disks; i++) {
>>   		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
>> +
>> +		if (test_bit(WriteMostly, &mirror->rdev->flags))
>> +			write_behind = true;
> How does this condition relate to the ones used for actually calling
> alloc_behind_master_bio?  It looks related, but as someone not familiar
> with the code I can't really verify if this is correct, so a comment
> explaining it might be useful.

How about this?

+               /*
+                * The write-behind io is only attempted on drives marked as
+                * write-mostly, which means we will allocate write behind
+                * bio later.
+                */
                 if (test_bit(WriteMostly, &mirror->rdev->flags))
                         write_behind = true;

>> +	/*
>> +	 * When using a bitmap, we may call alloc_behind_master_bio below.
>> +	 * alloc_behind_master_bio allocates a copy of the data payload a page
>> +	 * at a time and thus needs a new bio that can fit the whole payload
>> +	 * this bio in page sized chunks.
>> +	 */
>> +	if (write_behind && bitmap)
>> +		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
> Overly long line here.

I can change it, given you still prefer the 80-character limit.

Thanks,
Guoqing
Christoph Hellwig Aug. 23, 2021, 6:50 a.m. UTC | #6
On Fri, Aug 20, 2021 at 04:19:22PM +0800, Guoqing Jiang wrote:
> How about this?
> 
> +               /*
> +                * The write-behind io is only attempted on drives marked as
> +                * write-mostly, which means we will allocate write behind
> +                * bio later.
> +                */
>                 if (test_bit(WriteMostly, &mirror->rdev->flags))
>                         write_behind = true;

Except for the weird formatting of the whitespaces this looks good.

Patch

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 3c44c4bb40fc..e8c8e6bb0439 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1329,6 +1329,7 @@  static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 	struct raid1_plug_cb *plug = NULL;
 	int first_clone;
 	int max_sectors;
+	bool write_behind = false;
 
 	if (mddev_is_clustered(mddev) &&
 	     md_cluster_ops->area_resyncing(mddev, WRITE,
@@ -1381,6 +1382,10 @@  static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 	max_sectors = r1_bio->sectors;
 	for (i = 0;  i < disks; i++) {
 		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
+
+		if (test_bit(WriteMostly, &mirror->rdev->flags))
+			write_behind = true;
+
 		if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
 			atomic_inc(&rdev->nr_pending);
 			blocked_rdev = rdev;
@@ -1454,6 +1459,14 @@  static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 		goto retry_write;
 	}
 
+	/*
+	 * When using a bitmap, we may call alloc_behind_master_bio below.
+	 * alloc_behind_master_bio allocates a copy of the data payload a page
+	 * at a time and thus needs a new bio that can fit the whole payload
+	 * of this bio in page-sized chunks.
+	 */
+	if (write_behind && bitmap)
+		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
 	if (max_sectors < bio_sectors(bio)) {
 		struct bio *split = bio_split(bio, max_sectors,
 					      GFP_NOIO, &conf->bio_split);