Message ID: 59E4EB5C0200007800186CF8@prv-mh.provo.novell.com (mailing list archive)
State: New, archived
Jan Beulich writes ("[PATCH] libxc: don't fail domain creation when unpacking initrd fails"):
> At least Linux kernels have been able to work with gzip-ed initrd for
> quite some time; initrd compressed with other methods aren't even being
> attempted to unpack. Furthermore the unzip-ing routine used here isn't
> capable of dealing with various forms of concatenated files, each of
> which was gzip-ed separately (it is this particular case which has been
> the source of observed VM creation failures).

I'm not sure I really like this approach of attempting to ungzip it
and then falling back.  (And the size-checking logic is not
particularly easy to follow.)

Is there no way to tell that a kernel supports gzipped initrds by
looking at the kernel?  A heuristic would probably do: it's OK if we
sometimes insist on decompression ourselves, for a subset of old
kernels where it's not needed.

Ian.
>>> On 16.10.17 at 17:45, <ian.jackson@eu.citrix.com> wrote:
> Jan Beulich writes ("[PATCH] libxc: don't fail domain creation when unpacking
> initrd fails"):
>> At least Linux kernels have been able to work with gzip-ed initrd for
>> quite some time; initrd compressed with other methods aren't even being
>> attempted to unpack. Furthermore the unzip-ing routine used here isn't
>> capable of dealing with various forms of concatenated files, each of
>> which was gzip-ed separately (it is this particular case which has been
>> the source of observed VM creation failures).
>
> I'm not sure I really like this approach of attempting to ungzip it
> and then falling back. (And the size-checking logic is not
> particularly easy to follow.)
>
> Is there no way to tell that a kernel supports gzipped initrds by
> looking at the kernel ?

Well, Linux kernels have config options controlling their ability. So
even a modern kernel _could_ be configured to require unzipping.
I didn't check whether they announce this anywhere outside the
(possibly) embedded .config, but even if they did this would be
only Linux then. A solution here shouldn't really be OS-specific imo.

> A heuristic would probably do: it's OK if we
> sometimes insist on decompression ourselves, for a subset of old
> kernels where it's not needed.

Well, I specifically wanted to avoid any guesswork. But if I
simply had reported this as a problem that needs dealing with,
things likely would have gone like for the Python version issue
(which I still haven't got around to), asking me to look into
addressing it. So I thought I'd present a possible solution right
away. To be honest, if you want this to be done some
meaningfully different way which I'm not convinced of, I'm not
sure I'm the one to carry this out, yet I'd still request the
issue to be addressed.

Jan
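[Editor's note: for readers following along, the gzip probing being discussed needs nothing more than the stream's magic bytes and trailer. The sketch below is a stand-alone illustration, not the actual libxc helper; `probe_gzip_len` is a made-up name standing in for what an xc_dom_check_gzip-style probe does.]

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch, not the real xc_dom_check_gzip().  A gzip
 * stream starts with the magic bytes 0x1f 0x8b, and its trailer ends
 * with ISIZE: the uncompressed length modulo 2^32, stored
 * little-endian in the final four bytes (RFC 1952).  Note that when
 * several separately gzip-ed members are concatenated into one file,
 * the trailer describes only the *last* member -- one reason the
 * probe-then-unzip approach trips over the concatenated initrds
 * mentioned in this thread.
 */
static size_t probe_gzip_len(const unsigned char *blob, size_t len)
{
    if ( len < 18 || blob[0] != 0x1f || blob[1] != 0x8b )
        return 0; /* not gzip, or too short for header + trailer */

    return (size_t)blob[len - 4] |
           ((size_t)blob[len - 3] << 8) |
           ((size_t)blob[len - 2] << 16) |
           ((size_t)blob[len - 1] << 24);
}
```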
Jan Beulich writes ("Re: [PATCH] libxc: don't fail domain creation when unpacking initrd fails"):
> On 16.10.17 at 17:45, <ian.jackson@eu.citrix.com> wrote:
> > Is there no way to tell that a kernel supports gzipped initrds by
> > looking at the kernel ?
>
> Well, Linux kernels have config options controlling their ability. So
> even a modern kernel _could_ be configured to require unzipping.
> I didn't check whether they announce this anywhere outside the
> (possibly) embedded .config, but even if they did this would be
> only Linux then. A solution here shouldn't really be OS-specific imo.

I guess I was hoping for an ELF note or some multiboot protocol
element or something.  If it doesn't exist then your proposed general
approach is probably best.

I'm afraid I still find the patch less clear than it could be.  The
new semantics of xc_dom_ramdisk_check_size are awkward.  And looking
at it briefly, I think it might be possible to try the unzip even if
the size is too large.

I think a sensible implementation might have to have a flag variable
to control "try doing it raw".  And it might be best to replace
xc_dom_ramdisk_check_size with a function which does not bomb out if
the limit is exceeded.

What you are really trying to do here is to pursue two strategies in
parallel.  And ideally they would not be entangled.  Maybe there would
have to be a comment.  Each of the strategies must rely only on
functions which don't bomb out, to achieve that.

Ian.
On 16/10/17 17:19, Jan Beulich wrote:
>>>> On 16.10.17 at 17:45, <ian.jackson@eu.citrix.com> wrote:
>> Jan Beulich writes ("[PATCH] libxc: don't fail domain creation when unpacking
>> initrd fails"):
>>> At least Linux kernels have been able to work with gzip-ed initrd for
>>> quite some time; initrd compressed with other methods aren't even being
>>> attempted to unpack. Furthermore the unzip-ing routine used here isn't
>>> capable of dealing with various forms of concatenated files, each of
>>> which was gzip-ed separately (it is this particular case which has been
>>> the source of observed VM creation failures).
>> I'm not sure I really like this approach of attempting to ungzip it
>> and then falling back. (And the size-checking logic is not
>> particularly easy to follow.)
>>
>> Is there no way to tell that a kernel supports gzipped initrds by
>> looking at the kernel ?
> Well, Linux kernels have config options controlling their ability. So
> even a modern kernel _could_ be configured to require unzipping.
> I didn't check whether they announce this anywhere outside the
> (possibly) embedded .config, but even if they did this would be
> only Linux then. A solution here shouldn't really be OS-specific imo.
>
>> A heuristic would probably do: it's OK if we
>> sometimes insist on decompression ourselves, for a subset of old
>> kernels where it's not needed.
> Well, I specifically wanted to avoid any guesswork. But if I
> simply had reported this as a problem that needs dealing with,
> things likely would have gone like for the Python version issue
> (which I still haven't got around to), asking me to look into
> addressing it. So I thought I'd present a possible solution right
> away. To be honest, if you want this to be done some
> meaningfully different way which I'm not convinced of, I'm not
> sure I'm the one to carry this out, yet I'd still request the
> issue to be addressed.

I've been bitten by this issue several times before, and a fix would
be nice.
IMO, the toolstack should not be making assumptions about the initrd,
and shouldn't be touching it.  It is the user's responsibility to
provide an initrd which its kernel can read.

Furthermore, leaving the decompression to the kernel reduces the dom0
attack surface.

~Andrew
Andrew Cooper writes ("Re: [Xen-devel] [PATCH] libxc: don't fail domain creation when unpacking initrd fails"):
> IMO, the toolstack should not be making assumptions about the initrd,
> and shouldn't be touching it. It is the users responsibility to provide
> an initrd which its kernel can read.
>
> Furthermore, leaving the decompression to the kernel reduces the dom0
> attack surface.

If we expect that only very old or very odd kernels can't do the
decompression themselves, then perhaps we could have an option to
enable initrd decompression and have it off by default.  Your point
about the attack surface is well-made.

Ian.
>>> Ian Jackson <ian.jackson@eu.citrix.com> 10/16/17 6:52 PM >>>
> Jan Beulich writes ("Re: [PATCH] libxc: don't fail domain creation when unpacking initrd fails"):
>> On 16.10.17 at 17:45, <ian.jackson@eu.citrix.com> wrote:
>> > Is there no way to tell that a kernel supports gzipped initrds by
>> > looking at the kernel ?
>>
>> Well, Linux kernels have config options controlling their ability. So
>> even a modern kernel _could_ be configured to require unzipping.
>> I didn't check whether they announce this anywhere outside the
>> (possibly) embedded .config, but even if they did this would be
>> only Linux then. A solution here shouldn't really be OS-specific imo.
>
> I guess I was hoping for an ELF note or some multiboot protocol
> element or something. If it doesn't exist then your proposed general
> approach is probably best.
>
> I'm afraid I still find the patch less clear than it could be.
> The new semantics of xc_dom_ramdisk_check_size are awkward. And
> looking at it briefly, I think it might be possible to try the unzip
> even if the size is too large.

I'll double check that.

> I think a sensible implementation might have to have a flag
> variable to control "try doing it raw". And it might be best to
> replace xc_dom_ramdisk_check_size with a function which does
> not bomb out if the limit is exceeded.
>
> What you are really trying to do here is to pursue two strategies in
> parallel. And ideally they would not be entangled. Maybe there would
> have to be a comment. Each of the strategies must rely only on
> functions which don't bomb out, to achieve that.

I'll see what I can do. As quite often when changing code I'm not
very familiar with, I had tried to minimize the amount of changes
needed. E.g. I did consider dropping xc_dom_ramdisk_check_size()
altogether in favor of some other function (or even doing what is
needed in its only caller), but that would have been a larger (both
textual and factual) change than keeping the function and adding
another parameter.

As to your other reply to Andrew - I don't think I'm up to wiring
through a new guest config file option specifying whether to do the
unzipping. Besides the mechanical aspects I'm also unconvinced this
would be reasonable without then also considering other compression
methods.

Jan
>>> On 16.10.17 at 18:43, <ian.jackson@eu.citrix.com> wrote:
> Jan Beulich writes ("Re: [PATCH] libxc: don't fail domain creation when
> unpacking initrd fails"):
>> On 16.10.17 at 17:45, <ian.jackson@eu.citrix.com> wrote:
>> > Is there no way to tell that a kernel supports gzipped initrds by
>> > looking at the kernel ?
>>
>> Well, Linux kernels have config options controlling their ability. So
>> even a modern kernel _could_ be configured to require unzipping.
>> I didn't check whether they announce this anywhere outside the
>> (possibly) embedded .config, but even if they did this would be
>> only Linux then. A solution here shouldn't really be OS-specific imo.
>
> I guess I was hoping for an ELF note or some multiboot protocol
> element or something. If it doesn't exist then your proposed general
> approach is probably best.
>
> I'm afraid I still find the patch less clear than it could be.
> The new semantics of xc_dom_ramdisk_check_size are awkward. And
> looking at it briefly, I think it might be possible to try the unzip
> even if the size is too large.

I don't think so - xc_dom_ramdisk_check_size() returns 1 whenever the
decompressed size is above the limit. What I do admit is that in the
case the compressed size is larger than the uncompressed size, with
the boundary being in between, and with decompression failing, we may
accept something that's above the limit. Not sure how bad that is
though, as the limit is pretty arbitrary anyway.

> I think a sensible implementation might have to have a flag
> variable to control "try doing it raw". And it might be best to
> replace xc_dom_ramdisk_check_size with a function which does
> not bomb out if the limit is exceeded.
>
> What you are really trying to do here is to pursue two strategies in
> parallel. And ideally they would not be entangled.

I would have wanted to do things in sequence rather than in parallel.
I can't see how that could work though, in particular when
considering the case mentioned above (uncompressed size smaller than
compressed) - as the space allocation in the guest can't be reverted,
I need to allocate the larger of the two sizes anyway.

> Maybe there would have to be a comment.

That would be doable, obviously.

> Each of the strategies must rely only on
> functions which don't bomb out, to achieve that.

I'm not sure I understand what "bomb out" is supposed to mean here. I
first thought you meant calls to xc_dom_panic(), but now I don't
think that's what you would mean here (the more that I'm not
introducing that behavior of the function).

So what about Andrew's suggestion of leaving the initrd alone
unconditionally?

Jan
Jan:
> [...] As quite often when changing code I'm not very
> familiar with, I had tried to minimize the amount of changes needed. E.g.
> I did consider dropping xc_dom_ramdisk_check_size() altogether in favor
> of some other function (or even doing what is needed in its only caller),
> but that would have been a larger (both textual and factual) change than
> keeping the function and adding another parameter.

I can see why this seems an attractive approach to unfamiliar code.
But at least in this case I think the results are unsatisfactory.

Jan Beulich writes ("Re: [PATCH] libxc: don't fail domain creation when unpacking initrd fails"):
> On 16.10.17 at 18:43, <ian.jackson@eu.citrix.com> wrote:
> > I'm afraid I still find the patch less clear than it could be.
> > The new semantics of xc_dom_ramdisk_check_size are awkward. And
> > looking at it briefly, I think it might be possible to try the unzip
> > even if the size is too large.
>
> I don't think so - xc_dom_ramdisk_check_size() returns 1
> whenever decompressed size is above the limit. What I do
> admit is that in the case compressed size is larger than
> uncompressed size, with the boundary being in between, and
> with decompression failing, we may accept something that's
> above the limit. Not sure how bad that is though, as the limit
> is pretty arbitrary anyway.

Conceptually what you are trying to do is have two alternative
strategies.  Those two strategies have different limits.  So "the
limit" is not a meaningful concept.

> > What you are really trying to do here is to pursue two strategies in
> > parallel. And ideally they would not be entangled.
>
> I would have wanted to do things in sequence rather than in
> parallel. I can't see how that could work though, in particular
> when considering the case mentioned above (uncompressed size
> smaller than compressed) - as the space allocation in the guest
> can't be reverted, I need to allocate the larger of the two sizes
> anyway.

I don't think it can work.  I think you need to pursue them in
parallel and keep separate records, for each one, of whether we are
still pursuing it or whether it has failed (and of course its
necessary locals).

> > Each of the strategies must rely only on
> > functions which don't bomb out, to achieve that.
>
> I'm not sure I understand what "bomb out" is supposed to
> mean here. I first thought you meant calls to xc_dom_panic(),
> but now I don't think that's what you would mean here (the
> more that I'm not introducing that behavior of the function).
>
> So what about Andrew's suggestion of leaving the initrd alone
> unconditionally?

That would be a backward incompatible change.  We'd need some kind of
justification to explain why no-one cares about it any more.

Ian.
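[Editor's note: the "pursue both strategies in parallel, with separate records" structure being asked for can be sketched in a few lines. All names below are hypothetical illustrations, not proposed libxc code: each strategy carries its own viability flag, and the limit check merely clears a flag instead of bombing out.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the two-strategy bookkeeping: nothing here
 * panics; each check only records whether its strategy is still
 * being pursued.  The caller fails only if both flags end up clear.
 * unziplen == 0 encodes "the blob did not look like gzip".
 */
struct ramdisk_plan {
    bool try_unzip; /* decompress into the guest segment */
    bool try_raw;   /* hand the blob through untouched */
};

static void plan_ramdisk(size_t unziplen, size_t rawlen, size_t limit,
                         struct ramdisk_plan *plan)
{
    /* A limit of 0 means "no limit configured". */
    plan->try_unzip = unziplen != 0 && (!limit || unziplen <= limit);
    plan->try_raw = !limit || rawlen <= limit;
}
```

With this shape, the "two different limits" objection falls out naturally: each strategy is judged against the limit on its own terms, and entanglement between the two paths is confined to the shared allocation size.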
>>> On 19.10.17 at 17:06, <ian.jackson@eu.citrix.com> wrote:
> Jan Beulich writes ("Re: [PATCH] libxc: don't fail domain creation when
> unpacking initrd fails"):
>> On 16.10.17 at 18:43, <ian.jackson@eu.citrix.com> wrote:
>> > I'm afraid I still find the patch less clear than it could be.
>> > The new semantics of xc_dom_ramdisk_check_size are awkward. And
>> > looking at it briefly, I think it might be possible to try the unzip
>> > even if the size is too large.
>>
>> I don't think so - xc_dom_ramdisk_check_size() returns 1
>> whenever decompressed size is above the limit. What I do
>> admit is that in the case compressed size is larger than
>> uncompressed size, with the boundary being in between, and
>> with decompression failing, we may accept something that's
>> above the limit. Not sure how bad that is though, as the limit
>> is pretty arbitrary anyway.
>
> Conceptually what you are trying to do is have two alternative
> strategies. Those two strategies have different limits. So "the
> limit" is not a meaningful concept.
>
>> > What you are really trying to do here is to pursue two strategies in
>> > parallel. And ideally they would not be entangled.
>>
>> I would have wanted to do things in sequence rather than in
>> parallel. I can't see how that could work though, in particular
>> when considering the case mentioned above (uncompressed size
>> smaller than compressed) - as the space allocation in the guest
>> can't be reverted, I need to allocate the larger of the two sizes
>> anyway.
>
> I don't think it can work. I think you need to pursue them in
> parallel and keep separate records, for each one, of whether we are
> still pursuing it or whether it has failed (and of course its
> necessary locals).

So before I do another pointless round of backporting (for the change
to be tested in the environment where it is needed), does the below
new function (with xc_dom_ramdisk_check_size() dropped altogether)
look any better to you?
Thanks, Jan

static int xc_dom_build_ramdisk(struct xc_dom_image *dom)
{
    size_t unziplen, ramdisklen;
    void *ramdiskmap;

    if ( !dom->ramdisk_seg.vstart )
        unziplen = xc_dom_check_gzip(dom->xch, dom->ramdisk_blob,
                                     dom->ramdisk_size);
    else
        unziplen = 0;

    ramdisklen = max(unziplen, dom->ramdisk_size);

    if ( dom->max_ramdisk_size )
    {
        if ( unziplen && ramdisklen > dom->max_ramdisk_size )
        {
            ramdisklen = min(unziplen, dom->ramdisk_size);
            if ( unziplen > ramdisklen )
                unziplen = 0;
        }
        if ( ramdisklen > dom->max_ramdisk_size )
        {
            xc_dom_panic(dom->xch, XC_INVALID_KERNEL,
                         "ramdisk image too large");
            goto err;
        }
    }

    if ( xc_dom_alloc_segment(dom, &dom->ramdisk_seg, "ramdisk",
                              dom->ramdisk_seg.vstart, ramdisklen) != 0 )
        goto err;

    ramdiskmap = xc_dom_seg_to_ptr(dom, &dom->ramdisk_seg);
    if ( ramdiskmap == NULL )
    {
        DOMPRINTF("%s: xc_dom_seg_to_ptr(dom, &dom->ramdisk_seg) => NULL",
                  __FUNCTION__);
        goto err;
    }

    if ( unziplen )
    {
        if ( xc_dom_do_gunzip(dom->xch, dom->ramdisk_blob, dom->ramdisk_size,
                              ramdiskmap, unziplen) != -1 )
            return 0;
        if ( dom->ramdisk_size > ramdisklen )
            goto err;
    }

    /* Fall back to handing over the raw blob. */
    memcpy(ramdiskmap, dom->ramdisk_blob, dom->ramdisk_size);
    /* If an unzip attempt was made, the buffer may no longer be all zero. */
    if ( unziplen > dom->ramdisk_size )
        memset(ramdiskmap + dom->ramdisk_size, 0,
               unziplen - dom->ramdisk_size);

    return 0;

 err:
    return -1;
}
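[Editor's note: the size clamping at the top of the draft above is the subtle part. Below is the same arithmetic restated as a pure stand-alone function so its behaviour can be checked in isolation; `clamp_ramdisk_sizes` is a made-up name, and libxc's max()/min() macros are expanded inline.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Stand-alone restatement of the size logic in the draft
 * xc_dom_build_ramdisk() (hypothetical helper, not proposed code).
 * Returns false for the one case that still panics: even the smaller
 * of the two candidate sizes exceeds the configured limit.
 * On success, *alloc_len is the segment size to allocate and
 * *unzip_len is non-zero iff a decompression attempt is still made.
 */
static bool clamp_ramdisk_sizes(size_t unziplen, size_t rawlen, size_t limit,
                                size_t *alloc_len, size_t *unzip_len)
{
    size_t ramdisklen = unziplen > rawlen ? unziplen : rawlen;

    if ( limit )
    {
        if ( unziplen && ramdisklen > limit )
        {
            /* Shrink to the smaller of the two sizes, and drop the
             * unzip attempt if its output no longer fits there. */
            ramdisklen = unziplen < rawlen ? unziplen : rawlen;
            if ( unziplen > ramdisklen )
                unziplen = 0;
        }
        if ( ramdisklen > limit )
            return false; /* "ramdisk image too large" */
    }

    *alloc_len = ramdisklen;
    *unzip_len = unziplen;
    return true;
}
```

Worked through, this shows the draft allocating the larger of the two sizes when both fit, silently abandoning the unzip strategy when only the raw blob fits, and failing only when neither candidate fits.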
On 10/16/17 11:48 AM, Andrew Cooper wrote:
> On 16/10/17 17:19, Jan Beulich wrote:
>>>>> On 16.10.17 at 17:45, <ian.jackson@eu.citrix.com> wrote:
>
> I've been bitten by this issue several times before, and a fix would be
> nice.

Same here.

> IMO, the toolstack should not be making assumptions about the initrd,
> and shouldn't be touching it. It is the users responsibility to provide
> an initrd which its kernel can read.
>
> Furthermore, leaving the decompression to the kernel reduces the dom0
> attack surface.

This. So many this.

I do recall bringing this up at a meet up a while back and the concern
was breaking someone's workflow. Maybe we can put a warning that the
behavior is deprecated for X number of releases before deleting it.
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -291,7 +291,6 @@ int xc_dom_mem_init(struct xc_dom_image
 int xc_dom_kernel_check_size(struct xc_dom_image *dom, size_t sz);
 int xc_dom_kernel_max_size(struct xc_dom_image *dom, size_t sz);
 
-int xc_dom_ramdisk_check_size(struct xc_dom_image *dom, size_t sz);
 int xc_dom_ramdisk_max_size(struct xc_dom_image *dom, size_t sz);
 
 int xc_dom_devicetree_max_size(struct xc_dom_image *dom, size_t sz);
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -314,7 +314,8 @@ int xc_dom_kernel_check_size(struct xc_d
     return 0;
 }
 
-int xc_dom_ramdisk_check_size(struct xc_dom_image *dom, size_t sz)
+static int xc_dom_ramdisk_check_size(struct xc_dom_image *dom, size_t sz,
+                                     size_t raw)
 {
     /* No limit */
     if ( !dom->max_ramdisk_size )
@@ -322,8 +323,9 @@ int xc_dom_ramdisk_check_size(struct xc_
 
     if ( sz > dom->max_ramdisk_size )
     {
-        xc_dom_panic(dom->xch, XC_INVALID_KERNEL,
-                     "ramdisk image too large");
+        if ( raw > dom->max_ramdisk_size )
+            xc_dom_panic(dom->xch, XC_INVALID_KERNEL,
+                         "ramdisk image too large");
         return 1;
     }
 
@@ -999,13 +1001,13 @@ static int xc_dom_build_ramdisk(struct x
     {
         unziplen = xc_dom_check_gzip(dom->xch, dom->ramdisk_blob,
                                      dom->ramdisk_size);
-        if ( xc_dom_ramdisk_check_size(dom, unziplen) != 0 )
+        if ( xc_dom_ramdisk_check_size(dom, unziplen, dom->ramdisk_size) != 0 )
             unziplen = 0;
     }
     else
         unziplen = 0;
 
-    ramdisklen = unziplen ? unziplen : dom->ramdisk_size;
+    ramdisklen = max(unziplen, dom->ramdisk_size);
 
     if ( xc_dom_alloc_segment(dom, &dom->ramdisk_seg, "ramdisk",
                               dom->ramdisk_seg.vstart, ramdisklen) != 0 )
@@ -1017,14 +1019,15 @@ static int xc_dom_build_ramdisk(struct x
                   __FUNCTION__);
         goto err;
     }
-    if ( unziplen )
+    if ( !unziplen ||
+         xc_dom_do_gunzip(dom->xch, dom->ramdisk_blob, dom->ramdisk_size,
+                          ramdiskmap, unziplen) == -1 )
     {
-        if ( xc_dom_do_gunzip(dom->xch, dom->ramdisk_blob, dom->ramdisk_size,
-                              ramdiskmap, ramdisklen) == -1 )
-            goto err;
-    }
-    else
         memcpy(ramdiskmap, dom->ramdisk_blob, dom->ramdisk_size);
+        if ( unziplen > dom->ramdisk_size )
+            memset(ramdiskmap + dom->ramdisk_size, 0,
+                   unziplen - dom->ramdisk_size);
+    }
 
     return 0;
At least Linux kernels have been able to work with gzip-ed initrd for
quite some time; initrds compressed with other methods aren't even
being attempted to unpack. Furthermore the unzip-ing routine used here
isn't capable of dealing with various forms of concatenated files,
each of which was gzip-ed separately (it is this particular case which
has been the source of observed VM creation failures).

Hence, if unpacking fails, simply hand the compressed blob to the
guest as is.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
I'm not intending to request this to go into 4.10, but I certainly
wouldn't mind. I would appreciate though if I could at least get some
initial feedback earlier than when 4.10 branches off, as we will want
to use a backport of this in our trees, which I'd prefer to be in line
with what is eventually going to go into master.