| Message ID | 20200805023826.184-1-fangying1@huawei.com (mailing list archive) |
| --- | --- |
| State | New, archived |
| Series | qcow2: flush qcow2 l2 meta for new allocated clusters |
Patchew URL: https://patchew.org/QEMU/20200805023826.184-1-fangying1@huawei.com/

Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

The full log is available at
http://patchew.org/logs/20200805023826.184-1-fangying1@huawei.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
On 8/5/2020 10:43 AM, no-reply@patchew.org wrote:
> Patchew URL: https://patchew.org/QEMU/20200805023826.184-1-fangying1@huawei.com/
>
> Hi,
>
> This series failed the docker-quick@centos7 build test. Please find the testing commands and
> their output below. If you have Docker installed, you can probably reproduce it
> locally.

I see an error message which says ** No space left on device **.
However, I do not know what is wrong with this build test.
Could you give me some help here?

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
error: copy-fd: write returned No space left on device
fatal: failed to copy file to '/var/tmp/patchew-tester-tmp-wtnwtuq5/src/.git/objects/pack/pack-518a8ad92e3ce11d2627a7221e2d360b337cb27d.pack': No space left on device
fatal: The remote end hung up unexpectedly
Traceback (most recent call last):
  File "patchew-tester/src/patchew-cli", line 521, in test_one
    git_clone_repo(clone, r["repo"], r["head"], logf, True)
  File "patchew-tester/src/patchew-cli", line 53, in git_clone_repo
    subprocess.check_call(clone_cmd, stderr=logf, stdout=logf)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['git', 'clone', '-q', '/home/patchew/.cache/patchew-git-cache/httpsgithubcompatchewprojectqemu-3c8cf5a9c21ff8782164d1def7f44bd888713384', '/var/tmp/patchew-tester-tmp-wtnwtuq5/src']' returned non-zero exit status 128.

> The full log is available at
> http://patchew.org/logs/20200805023826.184-1-fangying1@huawei.com/testing.docker-quick@centos7/?type=message.
> ---
> Email generated automatically by Patchew [https://patchew.org/].
> Please send your feedback to patchew-devel@redhat.com
On Thu, Aug 06, 2020 at 05:01:51PM +0800, Ying Fang wrote:
> On 8/5/2020 10:43 AM, no-reply@patchew.org wrote:
> > Patchew URL: https://patchew.org/QEMU/20200805023826.184-1-fangying1@huawei.com/
> >
> > Hi,
> >
> > This series failed the docker-quick@centos7 build test. Please find the testing commands and
> > their output below. If you have Docker installed, you can probably reproduce it
> > locally.
>
> I see an error message which says ** No space left on device **.
> However, I do not know what is wrong with this build test.
> Could you give me some help here?

It isn't your fault - this is just QEMU's patchew CI that is broken yet
again due to lack of disk space. Just ignore the error report here.

Regards,
Daniel
Am 05.08.2020 um 04:38 hat Ying Fang geschrieben:
> From: fangying <fangying1@huawei.com>
>
> When a qemu or qemu-nbd process uses a qcow2 image with 'cache=none',
> it writes to the image through a cache that holds the L2 tables. The
> cached L2 entries are not written back to disk unless a flush is
> explicitly requested or the image is closed, so the on-disk metadata
> may stay inconsistent with the written data for a long time. If the
> process exits abnormally in this window, the issued written data is
> lost.
>
> Therefore, in order to keep data consistent, we need to flush the
> changes to the L2 entry to disk in time for the newly allocated
> cluster.
>
> Signed-off-by: Ying Fang <fangying1@huawei.com>

If you want to have data safely written to the disk after each write
request, you need to use cache=writethrough/directsync (in other words,
aliases that are equivalent to setting -device ...,write-cache=off).
Note that this will have a major impact on write performance.

cache=none means bypassing the kernel page cache (O_DIRECT), but not
flushing after each write request.

Kevin
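To make the distinction concrete, here is a minimal C sketch of the host-side open(2) flags behind these cache modes. The file name, buffer size, and alignment are hypothetical, and QEMU itself implements writethrough by flushing at its block layer rather than with O_SYNC, but the effect is comparable: O_DIRECT alone corresponds to cache=none, while O_DIRECT plus O_SYNC corresponds to cache=directsync.

```c
/* Sketch of the open(2) flags behind QEMU's cache modes.
 * Hypothetical example, not QEMU source. O_DIRECT requires
 * aligned buffers and I/O sizes; 512 bytes is assumed here. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* cache=none: bypass the page cache, but writes may still sit in
     * the disk's volatile write cache until an explicit flush. */
    int fd_none = open("disk.img", O_RDWR | O_CREAT | O_DIRECT, 0644);

    /* cache=directsync (conceptually): bypass the page cache AND
     * complete each write only once the data is on stable storage. */
    int fd_directsync = open("disk.img", O_RDWR | O_DIRECT | O_SYNC);

    if (fd_none < 0 || fd_directsync < 0) {
        perror("open");
        return 1;
    }

    /* O_DIRECT needs an aligned buffer; posix_memalign provides one. */
    void *buf;
    if (posix_memalign(&buf, 512, 512)) {
        return 1;
    }
    memset(buf, 0, 512);

    /* With fd_none, this write is durable only after an explicit
     * flush - the flush request cache=none relies on. */
    if (pwrite(fd_none, buf, 512, 0) == 512) {
        fdatasync(fd_none);
    }

    free(buf);
    close(fd_none);
    close(fd_directsync);
    return 0;
}
```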
On 8/6/2020 5:13 PM, Kevin Wolf wrote:
> Am 05.08.2020 um 04:38 hat Ying Fang geschrieben:
>> From: fangying <fangying1@huawei.com>
>>
>> When a qemu or qemu-nbd process uses a qcow2 image with 'cache=none',
>> it writes to the image through a cache that holds the L2 tables. The
>> cached L2 entries are not written back to disk unless a flush is
>> explicitly requested or the image is closed, so the on-disk metadata
>> may stay inconsistent with the written data for a long time. If the
>> process exits abnormally in this window, the issued written data is
>> lost.
>>
>> Therefore, in order to keep data consistent, we need to flush the
>> changes to the L2 entry to disk in time for the newly allocated
>> cluster.
>>
>> Signed-off-by: Ying Fang <fangying1@huawei.com>
>
> If you want to have data safely written to the disk after each write
> request, you need to use cache=writethrough/directsync (in other words,
> aliases that are equivalent to setting -device ...,write-cache=off).
> Note that this will have a major impact on write performance.
>
> cache=none means bypassing the kernel page cache (O_DIRECT), but not
> flushing after each write request.

Well, IIUC, cache=none does not guarantee data safety and we should not
expect it to. Then this patch can be ignored. Thanks.

> Kevin
Am 07.08.2020 um 09:42 hat Ying Fang geschrieben:
> On 8/6/2020 5:13 PM, Kevin Wolf wrote:
> > Am 05.08.2020 um 04:38 hat Ying Fang geschrieben:
> > > From: fangying <fangying1@huawei.com>
> > >
> > > When a qemu or qemu-nbd process uses a qcow2 image with 'cache=none',
> > > it writes to the image through a cache that holds the L2 tables. The
> > > cached L2 entries are not written back to disk unless a flush is
> > > explicitly requested or the image is closed, so the on-disk metadata
> > > may stay inconsistent with the written data for a long time. If the
> > > process exits abnormally in this window, the issued written data is
> > > lost.
> > >
> > > Therefore, in order to keep data consistent, we need to flush the
> > > changes to the L2 entry to disk in time for the newly allocated
> > > cluster.
> > >
> > > Signed-off-by: Ying Fang <fangying1@huawei.com>
> >
> > If you want to have data safely written to the disk after each write
> > request, you need to use cache=writethrough/directsync (in other words,
> > aliases that are equivalent to setting -device ...,write-cache=off).
> > Note that this will have a major impact on write performance.
> >
> > cache=none means bypassing the kernel page cache (O_DIRECT), but not
> > flushing after each write request.
>
> Well, IIUC, cache=none does not guarantee data safety and we should not
> expect it to. Then this patch can be ignored.

Indeed, cache=none is a writeback cache mode with all of the
consequences. In practice, this is normally good enough because the
guest OS will send flush requests when needed (e.g. because a guest
application called fsync()), but if the guest doesn't do this, it may
suffer data loss. This behaviour is comparable to a volatile disk cache
on real hard disks and is a good default, but sometimes you need a
writethrough cache mode at the cost of a performance penalty.

Kevin
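The guest-initiated flush Kevin describes can be illustrated with a small C sketch (the file name is made up; this is an illustration, not QEMU code): an application that needs durability pairs its writes with fsync(), which the guest kernel turns into a disk flush command that writeback cache modes like cache=none honour by flushing the image file.

```c
/* Sketch: why cache=none is normally safe - the guest application
 * requests durability itself. Hypothetical example. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("important.dat", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    const char data[] = "committed record\n";
    if (write(fd, data, strlen(data)) != (ssize_t)strlen(data)) {
        perror("write");
        close(fd);
        return 1;
    }

    /* This is the flush request mentioned above: the guest kernel
     * translates fsync() into a disk flush command, which QEMU's
     * writeback cache modes honour by flushing the image file. */
    if (fsync(fd) < 0) {
        perror("fsync");
        close(fd);
        return 1;
    }

    close(fd);
    return 0;
}
```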
On 8/7/2020 4:13 PM, Kevin Wolf wrote:
> Am 07.08.2020 um 09:42 hat Ying Fang geschrieben:
>> On 8/6/2020 5:13 PM, Kevin Wolf wrote:
>>> Am 05.08.2020 um 04:38 hat Ying Fang geschrieben:
>>>> From: fangying <fangying1@huawei.com>
>>>>
>>>> When a qemu or qemu-nbd process uses a qcow2 image with 'cache=none',
>>>> it writes to the image through a cache that holds the L2 tables. The
>>>> cached L2 entries are not written back to disk unless a flush is
>>>> explicitly requested or the image is closed, so the on-disk metadata
>>>> may stay inconsistent with the written data for a long time. If the
>>>> process exits abnormally in this window, the issued written data is
>>>> lost.
>>>>
>>>> Therefore, in order to keep data consistent, we need to flush the
>>>> changes to the L2 entry to disk in time for the newly allocated
>>>> cluster.
>>>>
>>>> Signed-off-by: Ying Fang <fangying1@huawei.com>
>>>
>>> If you want to have data safely written to the disk after each write
>>> request, you need to use cache=writethrough/directsync (in other words,
>>> aliases that are equivalent to setting -device ...,write-cache=off).
>>> Note that this will have a major impact on write performance.
>>>
>>> cache=none means bypassing the kernel page cache (O_DIRECT), but not
>>> flushing after each write request.
>>
>> Well, IIUC, cache=none does not guarantee data safety and we should not
>> expect it to. Then this patch can be ignored.
>
> Indeed, cache=none is a writeback cache mode with all of the
> consequences. In practice, this is normally good enough because the
> guest OS will send flush requests when needed (e.g. because a guest
> application called fsync()), but if the guest doesn't do this, it may
> suffer data loss. This behaviour is comparable to a volatile disk cache
> on real hard disks and is a good default, but sometimes you need a
> writethrough cache mode at the cost of a performance penalty.

Sorry for the late reply. Thanks for your detailed explanation of the
'cache' option; I have a better understanding of it now.

> Kevin
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 7444b9c..ab6e812 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -266,6 +266,22 @@ int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c)
     return result;
 }
 
+#define L2_ENTRIES_PER_SECTOR 64
+int qcow2_cache_l2_write_entry(BlockDriverState *bs, Qcow2Cache *c,
+                               void *table, int index, int num)
+{
+    int ret;
+    int i = qcow2_cache_get_table_idx(c, table);
+    int start_sector = index / L2_ENTRIES_PER_SECTOR;
+    int end_sector = (index + num - 1) / L2_ENTRIES_PER_SECTOR;
+    int nr_sectors = end_sector - start_sector + 1;
+    ret = bdrv_pwrite(bs->file,
+                      c->entries[i].offset + start_sector * BDRV_SECTOR_SIZE,
+                      table + start_sector * BDRV_SECTOR_SIZE,
+                      nr_sectors * BDRV_SECTOR_SIZE);
+    return ret;
+}
+
 int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
                                Qcow2Cache *dependency)
 {
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index a677ba9..ae49a83 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -998,6 +998,9 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
     }
 
+    ret = qcow2_cache_l2_write_entry(bs, s->l2_table_cache, l2_slice,
+                                     l2_index, m->nb_clusters);
+
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
 
     /*
diff --git a/block/qcow2.h b/block/qcow2.h
index 7ce2c23..168ab59 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -748,6 +748,8 @@ int qcow2_cache_destroy(Qcow2Cache *c);
 void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
 int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c);
 int qcow2_cache_write(BlockDriverState *bs, Qcow2Cache *c);
+int qcow2_cache_l2_write_entry(BlockDriverState *bs, Qcow2Cache *c,
+                               void *table, int index, int num);
 int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
                                Qcow2Cache *dependency);
 void qcow2_cache_depends_on_flush(Qcow2Cache *c);
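For readers following the arithmetic in qcow2_cache_l2_write_entry: a qcow2 L2 entry is 8 bytes, so a 512-byte sector holds 64 entries, and the helper writes back only the sectors that cover the touched entry range. A standalone sketch with hypothetical values, mirroring the patch's computation (not QEMU code):

```c
/* Standalone sketch of the sector-range math in the patch above.
 * An L2 entry is 8 bytes, so a 512-byte sector holds 64 entries. */
#include <stdio.h>

#define L2_ENTRIES_PER_SECTOR 64   /* 512 / 8, as in the patch */
#define SECTOR_SIZE           512

int main(void)
{
    /* Hypothetical update: 5 entries starting at index 62,
     * which straddles the boundary between sectors 0 and 1. */
    int index = 62;
    int num = 5;

    int start_sector = index / L2_ENTRIES_PER_SECTOR;            /* 0 */
    int end_sector = (index + num - 1) / L2_ENTRIES_PER_SECTOR;  /* 1 */
    int nr_sectors = end_sector - start_sector + 1;              /* 2 */

    /* The patch issues one bdrv_pwrite() covering this byte range of
     * the cached table, offset from the table's position in the file. */
    printf("write sectors %d..%d (%d bytes starting at table offset %d)\n",
           start_sector, end_sector,
           nr_sectors * SECTOR_SIZE, start_sector * SECTOR_SIZE);
    return 0;
}
```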