Message ID | 20181030112043.6034-1-david@fromorbit.com (mailing list archive)
---|---
Series | xfs_repair: scale to 150,000 iops
On 30/10/2018 12:20, Dave Chinner wrote:
> Hi folks,
>
> This patchset enables me to successfully repair a rather large
> metadump image (~500GB of metadata) that was provided to us because
> it crashed xfs_repair. Darrick and Eric have already posted patches
> to fix the crash bugs, and this series is built on top of them.

I was finally able to repair my big fs using for-next + these patches.
But it wasn't as easy as just running repair.

With the default bhash, OOM killed repair about 1/3 of the way through
phase 6 (128GB of RAM + 50GB of SSD swap). bhash=256000 worked.

Sometimes a segfault happens, but I don't have any stack trace,
unfortunately, and trying to reproduce it on my other test machine gave
me no luck.

One time I got:
xfs_repair: workqueue.c:142: workqueue_add: Assertion `wq->item_count == 0' failed.
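[For readers following along: the workaround above uses xfs_repair's `-o bhash=` option, which caps the buffer cache hash table size instead of letting repair size it from available RAM. A minimal sketch of the invocation; the device path is hypothetical and `-m` (an explicit memory cap in MB) is an alternative shown for comparison:

```shell
# Cap the buffer cache hash size explicitly rather than relying on the
# RAM-derived default (which sized it far too large on this machine):
xfs_repair -o bhash=256000 /dev/sdX

# Alternatively, bound repair's overall memory use and let it derive
# bhash from that limit:
xfs_repair -m 32768 /dev/sdX
```

Both knobs are documented in xfs_repair(8); which value is appropriate depends on metadata size and available RAM.]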
On Wed, Nov 07, 2018 at 06:44:54AM +0100, Arkadiusz Miśkiewicz wrote:
> On 30/10/2018 12:20, Dave Chinner wrote:
> > Hi folks,
> >
> > This patchset enables me to successfully repair a rather large
> > metadump image (~500GB of metadata) that was provided to us because
> > it crashed xfs_repair. Darrick and Eric have already posted patches
> > to fix the crash bugs, and this series is built on top of them.
>
> I was finally able to repair my big fs using for-next + these patches.
> But it wasn't as easy as just running repair.
>
> With the default bhash, OOM killed repair about 1/3 of the way through
> phase 6 (128GB of RAM + 50GB of SSD swap). bhash=256000 worked.

Yup, we need to work on the default bhash sizing. It comes out at about
750,000 for 128GB of RAM on your fs. It needs to be much smaller.

> Sometimes a segfault happens, but I don't have any stack trace,
> unfortunately, and trying to reproduce it on my other test machine
> gave me no luck.
>
> One time I got:
> xfs_repair: workqueue.c:142: workqueue_add: Assertion `wq->item_count ==
> 0' failed.

Yup, I think I've fixed that - a throttling wakeup related race
condition - but I'm still trying to reproduce it to confirm I've
fixed it...

Cheers,

Dave.