Message ID | 20210818073738.1271033-1-guoqing.jiang@linux.dev (mailing list archive) |
---|---|
State | New, archived |
Series | [V2] raid1: ensure write behind bio has less than BIO_MAX_VECS sectors |
Missed to cc Song and Christoph, now actually do it ...

On 8/18/21 3:37 PM, Guoqing Jiang wrote:
> From: Guoqing Jiang <jiangguoqing@kylinos.cn>
>
> We can't split write behind bio with more than BIO_MAX_VECS sectors,
> otherwise the below call trace was triggered because we could allocate
> oversized write behind bio later.
>
> [ 8.097936] bvec_alloc+0x90/0xc0
> [ 8.098934] bio_alloc_bioset+0x1b3/0x260
> [ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1]
> [ 8.100988] ? __bio_clone_fast+0xa8/0xe0
> [ 8.102008] md_handle_request+0x158/0x1d0 [md_mod]
> [ 8.103050] md_submit_bio+0xcd/0x110 [md_mod]
> [ 8.104084] submit_bio_noacct+0x139/0x530
> [ 8.105127] submit_bio+0x78/0x1d0
> [ 8.106163] ext4_io_submit+0x48/0x60 [ext4]
> [ 8.107242] ext4_writepages+0x652/0x1170 [ext4]
> [ 8.108300] ? do_writepages+0x41/0x100
> [ 8.109338] ? __ext4_mark_inode_dirty+0x240/0x240 [ext4]
> [ 8.110406] do_writepages+0x41/0x100
> [ 8.111450] __filemap_fdatawrite_range+0xc5/0x100
> [ 8.112513] file_write_and_wait_range+0x61/0xb0
> [ 8.113564] ext4_sync_file+0x73/0x370 [ext4]
> [ 8.114607] __x64_sys_fsync+0x33/0x60
> [ 8.115635] do_syscall_64+0x33/0x40
> [ 8.116670] entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> Thanks for the comment from Christoph.
>
> [1]. https://bugs.archlinux.org/task/70992
>
> Cc: Song Liu <song@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Reported-by: Jens Stutte <jens@chianterastutte.eu>
> Tested-by: Jens Stutte <jens@chianterastutte.eu>
> Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
> ---
> Hi Jens,
>
> Could you consider apply this to block tree? Since it depends
> on commit 018eca456c4b4dca56aaf1ec27f309c74d0fe246 in block
> tree for-next branch, otherwise lkp would complain about compile
> issue again.
>
> Thanks,
> Guoqing
>
> V2 change:
> 1. add checking for write-behind case and relevant comments from Christoph.
>
>  drivers/md/raid1.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 3c44c4bb40fc..e8c8e6bb0439 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1329,6 +1329,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>  	struct raid1_plug_cb *plug = NULL;
>  	int first_clone;
>  	int max_sectors;
> +	bool write_behind = false;
>
>  	if (mddev_is_clustered(mddev) &&
>  	    md_cluster_ops->area_resyncing(mddev, WRITE,
> @@ -1381,6 +1382,10 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>  	max_sectors = r1_bio->sectors;
>  	for (i = 0; i < disks; i++) {
>  		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
> +
> +		if (test_bit(WriteMostly, &mirror->rdev->flags))
> +			write_behind = true;
> +
>  		if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
>  			atomic_inc(&rdev->nr_pending);
>  			blocked_rdev = rdev;
> @@ -1454,6 +1459,14 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>  		goto retry_write;
>  	}
>
> +	/*
> +	 * When using a bitmap, we may call alloc_behind_master_bio below.
> +	 * alloc_behind_master_bio allocates a copy of the data payload a page
> +	 * at a time and thus needs a new bio that can fit the whole payload
> +	 * this bio in page sized chunks.
> +	 */
> +	if (write_behind && bitmap)
> +		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
>  	if (max_sectors < bio_sectors(bio)) {
>  		struct bio *split = bio_split(bio, max_sectors,
>  					      GFP_NOIO, &conf->bio_split);
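The limit the patch introduces is easy to check numerically. The following standalone sketch is plain userspace C, not kernel code; the constants are assumptions for a 4 KiB-page build with 512-byte sectors and the BIO_MAX_VECS value of 256 used by kernels of this era. It shows why a large write-behind payload needs more bvecs than a single bio can carry, and what the new cap works out to.

	/* Standalone sketch, not kernel code: constants are assumptions for a
	 * 4 KiB-page build (BIO_MAX_VECS = 256, 512-byte sectors). */
	#include <stdio.h>

	#define SECTOR_SIZE   512L
	#define PAGE_SIZE_4K  4096L
	#define PAGE_SECTORS  (PAGE_SIZE_4K / SECTOR_SIZE)   /* 8 sectors per page */
	#define BIO_MAX_VECS  256L                           /* bvecs per bio */

	int main(void)
	{
		long payload = 4L * 1024 * 1024;             /* a 4 MiB fsync-backed write */
		long pages_needed = (payload + PAGE_SIZE_4K - 1) / PAGE_SIZE_4K;
		long cap_sectors = BIO_MAX_VECS * PAGE_SECTORS;

		/* Write-behind copies the payload one page per bvec, so a bio this
		 * large would need more bvecs than bvec_alloc() can hand out. */
		printf("pages needed: %ld (limit %ld)\n", pages_needed, BIO_MAX_VECS);

		/* The patch clamps max_sectors so each write-behind bio fits. */
		printf("cap: %ld sectors = %ld KiB per write-behind bio\n",
		       cap_sectors, cap_sectors * SECTOR_SIZE / 1024);
		return 0;
	}

With these assumed values the cap is 2048 sectors (1 MiB), while a 4 MiB payload would need 1024 page-sized bvecs, which is what trips bvec_alloc() in the call trace above.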
Hi Guoqing,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on song-md/md-next]
[also build test ERROR on v5.14-rc6 next-20210818]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Guoqing-Jiang/raid1-ensure-write-behind-bio-has-less-than-BIO_MAX_VECS-sectors/20210818-154106
base:   git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-next
config: arc-randconfig-r043-20210816 (attached as .config)
compiler: arc-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/abf22557456363eb6fd1d1d09332f5261d61796c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Guoqing-Jiang/raid1-ensure-write-behind-bio-has-less-than-BIO_MAX_VECS-sectors/20210818-154106
        git checkout abf22557456363eb6fd1d1d09332f5261d61796c
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=arc SHELL=/bin/bash drivers/md/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/md/raid1.c: In function 'raid1_write_request':
>> drivers/md/raid1.c:1388:44: error: 'mirror' undeclared (first use in this function); did you mean 'md_error'?
    1388 |                 if (test_bit(WriteMostly, &mirror->rdev->flags))
         |                                            ^~~~~~
         |                                            md_error
   drivers/md/raid1.c:1388:44: note: each undeclared identifier is reported only once for each function it appears in
   In file included from include/linux/kernel.h:16,
                    from include/linux/list.h:9,
                    from include/linux/preempt.h:11,
                    from include/linux/spinlock.h:51,
                    from include/linux/mmzone.h:8,
                    from include/linux/gfp.h:6,
                    from include/linux/slab.h:15,
                    from drivers/md/raid1.c:26:
   drivers/md/raid1.c:1471:70: error: 'PAGE_SECTORS' undeclared (first use in this function)
    1471 |                 max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
         |                                                                      ^~~~~~~~~~~~
   include/linux/minmax.h:20:46: note: in definition of macro '__typecheck'
      20 |                 (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
         |                                              ^
   include/linux/minmax.h:36:31: note: in expansion of macro '__safe_cmp'
      36 |                 __builtin_choose_expr(__safe_cmp(x, y),                 \
         |                               ^~~~~~~~~~
   include/linux/minmax.h:104:33: note: in expansion of macro '__careful_cmp'
     104 | #define min_t(type, x, y)       __careful_cmp((type)(x), (type)(y), <)
         |                                 ^~~~~~~~~~~~~
   drivers/md/raid1.c:1471:31: note: in expansion of macro 'min_t'
    1471 |                 max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
         |                               ^~~~~
   include/linux/minmax.h:36:9: error: first argument to '__builtin_choose_expr' not a constant
      36 |                 __builtin_choose_expr(__safe_cmp(x, y),                 \
         |         ^~~~~~~~~~~~~~~~~~~~~
   include/linux/minmax.h:104:33: note: in expansion of macro '__careful_cmp'
     104 | #define min_t(type, x, y)       __careful_cmp((type)(x), (type)(y), <)
         |                                 ^~~~~~~~~~~~~
   drivers/md/raid1.c:1471:31: note: in expansion of macro 'min_t'
    1471 |                 max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
         |                               ^~~~~

vim +1388 drivers/md/raid1.c

  1320	
  1321	static void raid1_write_request(struct mddev *mddev, struct bio *bio,
  1322					int max_write_sectors)
  1323	{
  1324		struct r1conf *conf = mddev->private;
  1325		struct r1bio *r1_bio;
  1326		int i, disks;
  1327		struct bitmap *bitmap = mddev->bitmap;
  1328		unsigned long flags;
  1329		struct md_rdev *blocked_rdev;
  1330		struct blk_plug_cb *cb;
  1331		struct raid1_plug_cb *plug = NULL;
  1332		int first_clone;
  1333		int max_sectors;
  1334		bool write_behind = false;
  1335	
  1336		if (mddev_is_clustered(mddev) &&
  1337		     md_cluster_ops->area_resyncing(mddev, WRITE,
  1338			     bio->bi_iter.bi_sector, bio_end_sector(bio))) {
  1339	
  1340			DEFINE_WAIT(w);
  1341			for (;;) {
  1342				prepare_to_wait(&conf->wait_barrier,
  1343						&w, TASK_IDLE);
  1344				if (!md_cluster_ops->area_resyncing(mddev, WRITE,
  1345								bio->bi_iter.bi_sector,
  1346								bio_end_sector(bio)))
  1347					break;
  1348				schedule();
  1349			}
  1350			finish_wait(&conf->wait_barrier, &w);
  1351		}
  1352	
  1353		/*
  1354		 * Register the new request and wait if the reconstruction
  1355		 * thread has put up a bar for new requests.
  1356		 * Continue immediately if no resync is active currently.
  1357		 */
  1358		wait_barrier(conf, bio->bi_iter.bi_sector);
  1359	
  1360		r1_bio = alloc_r1bio(mddev, bio);
  1361		r1_bio->sectors = max_write_sectors;
  1362	
  1363		if (conf->pending_count >= max_queued_requests) {
  1364			md_wakeup_thread(mddev->thread);
  1365			raid1_log(mddev, "wait queued");
  1366			wait_event(conf->wait_barrier,
  1367				   conf->pending_count < max_queued_requests);
  1368		}
  1369		/* first select target devices under rcu_lock and
  1370		 * inc refcount on their rdev.  Record them by setting
  1371		 * bios[x] to bio
  1372		 * If there are known/acknowledged bad blocks on any device on
  1373		 * which we have seen a write error, we want to avoid writing those
  1374		 * blocks.
  1375		 * This potentially requires several writes to write around
  1376		 * the bad blocks. Each set of writes gets it's own r1bio
  1377		 * with a set of bios attached.
  1378		 */
  1379	
  1380		disks = conf->raid_disks * 2;
  1381	 retry_write:
  1382		blocked_rdev = NULL;
  1383		rcu_read_lock();
  1384		max_sectors = r1_bio->sectors;
  1385		for (i = 0; i < disks; i++) {
  1386			struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
  1387	
> 1388			if (test_bit(WriteMostly, &mirror->rdev->flags))
  1389				write_behind = true;
  1390	
  1391			if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
  1392				atomic_inc(&rdev->nr_pending);
  1393				blocked_rdev = rdev;
  1394				break;
  1395			}
  1396			r1_bio->bios[i] = NULL;
  1397			if (!rdev || test_bit(Faulty, &rdev->flags)) {
  1398				if (i < conf->raid_disks)
  1399					set_bit(R1BIO_Degraded, &r1_bio->state);
  1400				continue;
  1401			}
  1402	
  1403			atomic_inc(&rdev->nr_pending);
  1404			if (test_bit(WriteErrorSeen, &rdev->flags)) {
  1405				sector_t first_bad;
  1406				int bad_sectors;
  1407				int is_bad;
  1408	
  1409				is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
  1410						     &first_bad, &bad_sectors);
  1411				if (is_bad < 0) {
  1412					/* mustn't write here until the bad block is
  1413					 * acknowledged*/
  1414					set_bit(BlockedBadBlocks, &rdev->flags);
  1415					blocked_rdev = rdev;
  1416					break;
  1417				}
  1418				if (is_bad && first_bad <= r1_bio->sector) {
  1419					/* Cannot write here at all */
  1420					bad_sectors -= (r1_bio->sector - first_bad);
  1421					if (bad_sectors < max_sectors)
  1422						/* mustn't write more than bad_sectors
  1423						 * to other devices yet
  1424						 */
  1425						max_sectors = bad_sectors;
  1426					rdev_dec_pending(rdev, mddev);
  1427					/* We don't set R1BIO_Degraded as that
  1428					 * only applies if the disk is
  1429					 * missing, so it might be re-added,
  1430					 * and we want to know to recover this
  1431					 * chunk.
  1432					 * In this case the device is here,
  1433					 * and the fact that this chunk is not
  1434					 * in-sync is recorded in the bad
  1435					 * block log
  1436					 */
  1437					continue;
  1438				}
  1439				if (is_bad) {
  1440					int good_sectors = first_bad - r1_bio->sector;
  1441					if (good_sectors < max_sectors)
  1442						max_sectors = good_sectors;
  1443				}
  1444			}
  1445			r1_bio->bios[i] = bio;
  1446		}
  1447		rcu_read_unlock();
  1448	
  1449		if (unlikely(blocked_rdev)) {
  1450			/* Wait for this device to become unblocked */
  1451			int j;
  1452	
  1453			for (j = 0; j < i; j++)
  1454				if (r1_bio->bios[j])
  1455					rdev_dec_pending(conf->mirrors[j].rdev, mddev);
  1456			r1_bio->state = 0;
  1457			allow_barrier(conf, bio->bi_iter.bi_sector);
  1458			raid1_log(mddev, "wait rdev %d blocked", blocked_rdev->raid_disk);
  1459			md_wait_for_blocked_rdev(blocked_rdev, mddev);
  1460			wait_barrier(conf, bio->bi_iter.bi_sector);
  1461			goto retry_write;
  1462		}
  1463	
  1464		/*
  1465		 * When using a bitmap, we may call alloc_behind_master_bio below.
  1466		 * alloc_behind_master_bio allocates a copy of the data payload a page
  1467		 * at a time and thus needs a new bio that can fit the whole payload
  1468		 * this bio in page sized chunks.
  1469		 */
  1470		if (write_behind && bitmap)
  1471			max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
  1472		if (max_sectors < bio_sectors(bio)) {
  1473			struct bio *split = bio_split(bio, max_sectors,
  1474						      GFP_NOIO, &conf->bio_split);
  1475			bio_chain(split, bio);
  1476			submit_bio_noacct(bio);
  1477			bio = split;
  1478			r1_bio->master_bio = bio;
  1479			r1_bio->sectors = max_sectors;
  1480		}
  1481	
  1482		if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
  1483			r1_bio->start_time = bio_start_io_acct(bio);
  1484		atomic_set(&r1_bio->remaining, 1);
  1485		atomic_set(&r1_bio->behind_remaining, 0);
  1486	
  1487		first_clone = 1;
  1488	
  1489		for (i = 0; i < disks; i++) {
  1490			struct bio *mbio = NULL;
  1491			struct md_rdev *rdev = conf->mirrors[i].rdev;
  1492			if (!r1_bio->bios[i])
  1493				continue;
  1494	
  1495			if (first_clone) {
  1496				/* do behind I/O ?
  1497				 * Not if there are too many, or cannot
  1498				 * allocate memory, or a reader on WriteMostly
  1499				 * is waiting for behind writes to flush */
  1500				if (bitmap &&
  1501				    (atomic_read(&bitmap->behind_writes)
  1502				     < mddev->bitmap_info.max_write_behind) &&
  1503				    !waitqueue_active(&bitmap->behind_wait)) {
  1504					alloc_behind_master_bio(r1_bio, bio);
  1505				}
  1506	
  1507				md_bitmap_startwrite(bitmap, r1_bio->sector, r1_bio->sectors,
  1508						     test_bit(R1BIO_BehindIO, &r1_bio->state));
  1509				first_clone = 0;
  1510			}
  1511	
  1512			if (r1_bio->behind_master_bio)
  1513				mbio = bio_clone_fast(r1_bio->behind_master_bio,
  1514						      GFP_NOIO, &mddev->bio_set);
  1515			else
  1516				mbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set);
  1517	
  1518			if (r1_bio->behind_master_bio) {
  1519				if (test_bit(CollisionCheck, &rdev->flags))
  1520					wait_for_serialization(rdev, r1_bio);
  1521				if (test_bit(WriteMostly, &rdev->flags))
  1522					atomic_inc(&r1_bio->behind_remaining);
  1523			} else if (mddev->serialize_policy)
  1524				wait_for_serialization(rdev, r1_bio);
  1525	
  1526			r1_bio->bios[i] = mbio;
  1527	
  1528			mbio->bi_iter.bi_sector	= (r1_bio->sector +
  1529					   conf->mirrors[i].rdev->data_offset);
  1530			bio_set_dev(mbio, conf->mirrors[i].rdev->bdev);
  1531			mbio->bi_end_io	= raid1_end_write_request;
  1532			mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA));
  1533			if (test_bit(FailFast, &conf->mirrors[i].rdev->flags) &&
  1534			    !test_bit(WriteMostly, &conf->mirrors[i].rdev->flags) &&
  1535			    conf->raid_disks - mddev->degraded > 1)
  1536				mbio->bi_opf |= MD_FAILFAST;
  1537			mbio->bi_private = r1_bio;
  1538	
  1539			atomic_inc(&r1_bio->remaining);
  1540	
  1541			if (mddev->gendisk)
  1542				trace_block_bio_remap(mbio, disk_devt(mddev->gendisk),
  1543						      r1_bio->sector);
  1544			/* flush_pending_writes() needs access to the rdev so...*/
  1545			mbio->bi_bdev = (void *)conf->mirrors[i].rdev;
  1546	
  1547			cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
  1548			if (cb)
  1549				plug = container_of(cb, struct raid1_plug_cb, cb);
  1550			else
  1551				plug = NULL;
  1552			if (plug) {
  1553				bio_list_add(&plug->pending, mbio);
  1554				plug->pending_cnt++;
  1555			} else {
  1556				spin_lock_irqsave(&conf->device_lock, flags);
  1557				bio_list_add(&conf->pending_bio_list, mbio);
  1558				conf->pending_count++;
  1559				spin_unlock_irqrestore(&conf->device_lock, flags);
  1560				md_wakeup_thread(mddev->thread);
  1561			}
  1562		}
  1563	
  1564		r1_bio_write_done(r1_bio);
  1565	
  1566		/* In case raid1d snuck in to freeze_array */
  1567		wake_up(&conf->wait_barrier);
  1568	}
  1569	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Hi Guoqing,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on song-md/md-next]
[also build test ERROR on v5.14-rc6 next-20210818]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Guoqing-Jiang/raid1-ensure-write-behind-bio-has-less-than-BIO_MAX_VECS-sectors/20210818-154106
base:   git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-next
config: x86_64-randconfig-a011-20210816 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project d2b574a4dea5b718e4386bf2e26af0126e5978ce)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/abf22557456363eb6fd1d1d09332f5261d61796c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Guoqing-Jiang/raid1-ensure-write-behind-bio-has-less-than-BIO_MAX_VECS-sectors/20210818-154106
        git checkout abf22557456363eb6fd1d1d09332f5261d61796c
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/md/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/md/raid1.c:1388:30: error: use of undeclared identifier 'mirror'
                   if (test_bit(WriteMostly, &mirror->rdev->flags))
                                              ^
   drivers/md/raid1.c:1471:56: error: use of undeclared identifier 'PAGE_SECTORS'
                   max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
                                                                        ^
   drivers/md/raid1.c:1471:56: error: use of undeclared identifier 'PAGE_SECTORS'
   3 errors generated.

(The source context at vim +/mirror +1388 drivers/md/raid1.c is identical to the one quoted in the previous report.)

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On Wed, Aug 18, 2021 at 03:37:38PM +0800, Guoqing Jiang wrote:
> 	for (i = 0; i < disks; i++) {
> 		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
> +
> +		if (test_bit(WriteMostly, &mirror->rdev->flags))
> +			write_behind = true;

How does this condition relate to the ones used for actually calling
alloc_behind_master_bio?  It looks related, but as someone not familiar
with the code I can't really verify if this is correct, so a comment
explaining it might be useful.

> +	/*
> +	 * When using a bitmap, we may call alloc_behind_master_bio below.
> +	 * alloc_behind_master_bio allocates a copy of the data payload a page
> +	 * at a time and thus needs a new bio that can fit the whole payload
> +	 * this bio in page sized chunks.
> +	 */
> +	if (write_behind && bitmap)
> +		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);

Overly long line here.
On 8/19/21 4:55 PM, Christoph Hellwig wrote:
> On Wed, Aug 18, 2021 at 03:37:38PM +0800, Guoqing Jiang wrote:
>> 	for (i = 0; i < disks; i++) {
>> 		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
>> +
>> +		if (test_bit(WriteMostly, &mirror->rdev->flags))
>> +			write_behind = true;
> How does this condition relate to the ones used for actually calling
> alloc_behind_master_bio?  It looks related, but as someone not familiar
> with the code I can't really verify if this is correct, so a comment
> explaining it might be useful.

How about this?

+		/*
+		 * The write-behind io is only attempted on drives marked as
+		 * write-mostly, which means we will allocate write behind
+		 * bio later.
+		 */
		if (test_bit(WriteMostly, &mirror->rdev->flags))
			write_behind = true;

>> +	/*
>> +	 * When using a bitmap, we may call alloc_behind_master_bio below.
>> +	 * alloc_behind_master_bio allocates a copy of the data payload a page
>> +	 * at a time and thus needs a new bio that can fit the whole payload
>> +	 * this bio in page sized chunks.
>> +	 */
>> +	if (write_behind && bitmap)
>> +		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
> Overly long line here.

I can change it given you still prefer the limitation is 80 characters.

Thanks,
Guoqing
On Fri, Aug 20, 2021 at 04:19:22PM +0800, Guoqing Jiang wrote:
> How about this?
>
> +???????????????????????????? /*
> +?????????????????????????????? * The write-behind io is only attempted on drives marked as
> +?????????????????????????????? * write-mostly, which means we will allocate write behind
> +?????????????????????????????? * bio later.
> +?????????????????????????????? */
> ?????????????????????????????? if (test_bit(WriteMostly, &mirror->rdev->flags))
> ?????????????????????????????????????????????? write_behind = true;

Except for the weird formatting of the whitespaces this looks good.
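Taking the two build-robot reports together with the review above, two issues remain against the V2 hunk: the new check references an undeclared `mirror` (the loop only declares a local `rdev`, which can also be NULL), and the `PAGE_SECTORS` failure matches the block-tree dependency the cover letter already mentions. The hunk below is only an illustrative sketch of how the check might be reworked, not the follow-up the author actually posted: it reuses the loop's `rdev` pointer, skips NULL rdevs, and folds in the comment agreed on above.

 	for (i = 0; i < disks; i++) {
 		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
+
+		/*
+		 * The write-behind io is only attempted on drives marked as
+		 * write-mostly, which means we will allocate write behind
+		 * bio later.
+		 */
+		if (rdev && test_bit(WriteMostly, &rdev->flags))
+			write_behind = true;
+
 		if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {

Checking `rdev` first keeps the behaviour consistent with the rest of the loop, which already tolerates a missing device.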
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 3c44c4bb40fc..e8c8e6bb0439 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1329,6 +1329,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 	struct raid1_plug_cb *plug = NULL;
 	int first_clone;
 	int max_sectors;
+	bool write_behind = false;
 
 	if (mddev_is_clustered(mddev) &&
 	    md_cluster_ops->area_resyncing(mddev, WRITE,
@@ -1381,6 +1382,10 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 	max_sectors = r1_bio->sectors;
 	for (i = 0; i < disks; i++) {
 		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
+
+		if (test_bit(WriteMostly, &mirror->rdev->flags))
+			write_behind = true;
+
 		if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
 			atomic_inc(&rdev->nr_pending);
 			blocked_rdev = rdev;
@@ -1454,6 +1459,14 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 		goto retry_write;
 	}
 
+	/*
+	 * When using a bitmap, we may call alloc_behind_master_bio below.
+	 * alloc_behind_master_bio allocates a copy of the data payload a page
+	 * at a time and thus needs a new bio that can fit the whole payload
+	 * this bio in page sized chunks.
+	 */
+	if (write_behind && bitmap)
+		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
 	if (max_sectors < bio_sectors(bio)) {
 		struct bio *split = bio_split(bio, max_sectors,
 					      GFP_NOIO, &conf->bio_split);
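Once the clamp is in place, any write larger than the cap is cut down by the bio_split()/submit_bio_noacct() pair shown in the hunk above, and the remainder re-enters the md submission path, so every resulting r1_bio stays within what the write-behind copy can hold. The simulation below is plain userspace C rather than kernel code; the 4 KiB-page constants and the 4 MiB example size are assumptions chosen for illustration.

	/* Userspace simulation of the split behaviour after the patch; constants
	 * assume 4 KiB pages and BIO_MAX_VECS = 256. Not kernel code. */
	#include <stdio.h>

	#define PAGE_SECTORS	8L	/* 4096 / 512 */
	#define BIO_MAX_VECS	256L

	int main(void)
	{
		long cap = BIO_MAX_VECS * PAGE_SECTORS;	/* 2048 sectors per write-behind bio */
		long remaining = 8192;			/* a 4 MiB write, in 512-byte sectors */
		int piece = 0;

		while (remaining > 0) {
			/* Mirrors the effect of bio_split() in raid1_write_request():
			 * the front part becomes the current r1_bio, the rest is
			 * resubmitted and split again on the next pass. */
			long this_io = remaining > cap ? cap : remaining;

			printf("r1_bio %d: %ld sectors\n", ++piece, this_io);
			remaining -= this_io;
		}
		return 0;
	}

With these assumptions the 4 MiB write ends up as four 2048-sector r1_bios, each small enough for alloc_behind_master_bio() to copy page by page without exceeding BIO_MAX_VECS.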