Message ID | 1409127505-5246-1-git-send-email-lwang@suse.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Mike Snitzer |
Headers | show |
On Wed, Aug 27, 2014 at 04:18:25PM +0800, Liuhua Wang wrote: > On some devices, Which devices? What is the necessary condition to trigger the problem? > when create mirror device with lvcreate, it fails. > device-mapper: reload ioctl on failed: Invalid argument > Failed to activate new LV. What exactly causes this error? - Do you get a kernel error message too? > We can use read size 4096 since md always use 4096. How is that connected to problem? Why does it fix it? > +++ b/drivers/md/dm-raid.c > @@ -860,7 +860,7 @@ static int super_load(struct md_rdev *rdev, struct md_rdev *refdev) > rdev->sb_start = 0; > rdev->sb_size = sizeof(*sb); What should sb_size really be? > - ret = read_disk_sb(rdev, rdev->sb_size); > + ret = read_disk_sb(rdev, 4096); Why aren't they both MD_SB_BYTES? (Strictly, does sb_size need recalculating after the read?) Alasdair -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Wed, Sep 17, 2014 at 12:21:04AM +0100, Alasdair G Kergon wrote: > > rdev->sb_size = sizeof(*sb); > What should sb_size really be? > > - ret = read_disk_sb(rdev, rdev->sb_size); Or are the sizes in the existing code correct (the dm sb is always 512 bytes?), with the problem coming from something in sync_page_io()? (4k sector device? large page size? incorrectly uninitialised field?) Alasdair -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Wed, Sep 17, 2014 at 12:35:29AM +0100, Alasdair G Kergon wrote:
> > > - ret = read_disk_sb(rdev, rdev->sb_size);
And if that needed changing to 4096, why does this similar instance elsewhere
one not also need changing?
static void attempt_restore_of_faulty_devices(struct raid_set *rs)
{
...
sync_page_io(r, 0, r->sb_size, r->sb_page, READ, 1)) {
Alasdair
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Hi, Alasdair >>> On 9/17/2014 at 07:21 AM, in message <20140916232104.GC2407@agk-dp.fab.redhat.com>, Alasdair G Kergon <agk@redhat.com> wrote: > On Wed, Aug 27, 2014 at 04:18:25PM +0800, Liuhua Wang wrote: > > On some devices, > > Which devices? What is the necessary condition to trigger the problem? > When using dasd device formated with bocksize of 4096bytes. the problem occurs, while formated with blocksie of 512bytes, no problem. > > when create mirror device with lvcreate, it fails. > > device-mapper: reload ioctl on failed: Invalid argument > > Failed to activate new LV. > > What exactly causes this error? > - Do you get a kernel error message too? There are error messages: --------------------- [22798.077877] end_request: I/O error, dev dasdd, sector 2240 [22798.077902] device-mapper: raid: Failed to read superblock of device at position 1 [23087.875420] type=1006 audit(1404313201.346:28): pid=1596 uid=0 old auid=4294967295 new auid=0 old ses=4294967295 new ses=27 res=1 [23127.707296] dasd-eckd.d2bf98: 0.0.141d: ERP 0000000036ab1e40 has run out of retries and failed ---------------------- > > We can use read size 4096 since md always use 4096. > > How is that connected to problem? > Why does it fix it? > > > +++ b/drivers/md/dm-raid.c > > @@ -860,7 +860,7 @@ static int super_load(struct md_rdev *rdev, struct > md_rdev *refdev) > > rdev->sb_start = 0; > > rdev->sb_size = sizeof(*sb); > > What should sb_size really be? 512bytes > > > - ret = read_disk_sb(rdev, rdev->sb_size); > > + ret = read_disk_sb(rdev, 4096); > > Why aren't they both MD_SB_BYTES? > (Strictly, does sb_size need recalculating after the read?) > > Alasdair > Best regards, Liuhua -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Tue, Sep 16, 2014 at 11:11:35PM -0600, Liuhua Wang wrote: > There are error messages: > --------------------- > [22798.077877] end_request: I/O error, dev dasdd, sector 2240 > [22798.077902] device-mapper: raid: Failed to read superblock of device at position 1 > [23087.875420] type=1006 audit(1404313201.346:28): pid=1596 uid=0 old auid=4294967295 new auid=0 old ses=4294967295 new ses=27 res=1 > [23127.707296] dasd-eckd.d2bf98: 0.0.141d: ERP 0000000036ab1e40 has run out of retries and failed > ---------------------- OK, so it sounds like Neil's patch will be enough to fix this. Can you test that and confirm it solves the problem? Alasdair -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Thu, 18 Sep 2014 00:56:33 -0600 "Liuhua Wang" <lwang@suse.com> wrote: > Hi, Alasdair > Hi, Neil > > >>> On 9/17/2014 at 07:41 PM, in message > <20140917114151.GF2407@agk-dp.fab.redhat.com>, Alasdair G Kergon > <agk@redhat.com> wrote: > > On Tue, Sep 16, 2014 at 11:11:35PM -0600, Liuhua Wang wrote: > > > There are error messages: > > > --------------------- > > > [22798.077877] end_request: I/O error, dev dasdd, sector 2240 > > > [22798.077902] device-mapper: raid: Failed to read superblock of device at > > position 1 > > > [23087.875420] type=1006 audit(1404313201.346:28): pid=1596 uid=0 old > > auid=4294967295 new auid=0 old ses=4294967295 new ses=27 res=1 > > > [23127.707296] dasd-eckd.d2bf98: 0.0.141d: ERP 0000000036ab1e40 has run out > > of retries and failed > > > ---------------------- > > > > OK, so it sounds like Neil's patch will be enough to fix this. > > Can you test that and confirm it solves the problem? > > > > I tested Neil's patch, lvcreate succeed, the lv created can display, > be written successfully, but only there still are error messages: > -------------------------------- > 2014-09-18T06:39:52.653239+02:00 s390vsl158 kernel: device-mapper: raid: Superblocks created for new array > 2014-09-18T06:39:52.663361+02:00 s390vsl158 kernel: md/raid1:mdX: not clean -- starting background reconstruction > 2014-09-18T06:39:52.663366+02:00 s390vsl158 kernel: md/raid1:mdX: active with 2 out of 2 mirrors > 2014-09-18T06:39:52.663366+02:00 s390vsl158 kernel: Choosing daemon_sleep default (5 sec) > 2014-09-18T06:39:52.663367+02:00 s390vsl158 kernel: created bitmap (1 pages) for device mdX > 2014-09-18T06:39:52.663368+02:00 s390vsl158 kernel: end_request: I/O error, dev dasdb, sector 19202144 > 2014-09-18T06:39:52.663369+02:00 s390vsl158 kernel: md: super_written gets error=-5, uptodate=0 > 2014-09-18T06:39:52.663369+02:00 s390vsl158 kernel: md/raid1:mdX: Disk failure on dm-3, disabling device. > ------------------------- > > So I add also roundup to md_super_write() as following, > then the above error messages disappears. > -------------------------- > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 1294238..18cd2e7 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -754,6 +754,8 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev, > struct bio *bio = bio_alloc_mddev(GFP_NOIO, 1, mddev); > > bio->bi_bdev = rdev->meta_bdev ? rdev->meta_bdev : rdev->bdev; > + size = roundup(size, bdev_logical_block_size(bio->bi_bdev)); > + > bio->bi_iter.bi_sector = sector; > bio_add_page(bio, page, size, 0); > bio->bi_private = rdev; I'm not really comfortable with this change. It is always safe to read more than requested. Writing more than requested is quite different. I think I would rather that rdev->sb_size were rounded up by the code that sets it. That would make it clear to anyone reading the code that a full block was always written. Jon: how does lvm allocate space for the dm-raid metadata? Does it always allocate some minimum? Is the allocation aware of underlying block size? Is it reasonable to round up rdev->sb_size in load_super() ?? Thanks, NeilBrown > @@ -784,6 +786,10 @@ int sync_page_io(struct md_rdev *rdev, sector_t sector, int size, > > bio->bi_bdev = (metadata_op && rdev->meta_bdev) ? > rdev->meta_bdev : rdev->bdev; > + size = roundup(size, bdev_logical_block_size(bio->bi_bdev)); > + if (size > PAGE_SIZE) > + return -EINVAL; > + > if (metadata_op) > bio->bi_iter.bi_sector = sector + rdev->sb_start; > else if (rdev->mddev->reshape_position != MaxSector && -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c index 4880b69..621afbb 100644 --- a/drivers/md/dm-raid.c +++ b/drivers/md/dm-raid.c @@ -860,7 +860,7 @@ static int super_load(struct md_rdev *rdev, struct md_rdev *refdev) rdev->sb_start = 0; rdev->sb_size = sizeof(*sb); - ret = read_disk_sb(rdev, rdev->sb_size); + ret = read_disk_sb(rdev, 4096); if (ret) return ret;
On some devices, when create mirror device with lvcreate, it fails. device-mapper: reload ioctl on failed: Invalid argument Failed to activate new LV. We can use read size 4096 since md always use 4096. Signed-off-by: Liuhua Wang <lwang@suse.com> --- drivers/md/dm-raid.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)