Message ID | 20190618202352.39702-1-farman@linux.ibm.com (mailing list archive) |
---|---|
Headers | show |
Series | s390: more vfio-ccw code rework | expand |
On Tue, 18 Jun 2019 22:23:47 +0200 Eric Farman <farman@linux.ibm.com> wrote: > A couple little improvements to the malloc load in vfio-ccw. > Really, there were just (the first) two patches, but then I > got excited and added a few stylistic ones to the end. > > The routine ccwchain_calc_length() has this basic structure: > > ccwchain_calc_length > a0 = kcalloc(CCWCHAIN_LEN_MAX, sizeof(struct ccw1)) > copy_ccw_from_iova(a0, src) > copy_from_iova > pfn_array_alloc > b = kcalloc(len, sizeof(*pa_iova_pfn + *pa_pfn) > pfn_array_pin > vfio_pin_pages > memcpy(a0, src) > pfn_array_unpin_free > vfio_unpin_pages > kfree(b) > kfree(a0) > > We do this EVERY time we process a new channel program chain, > meaning at least once per SSCH and more if TICs are involved, > to figure out how many CCWs are chained together. Once that > is determined, a new piece of memory is allocated (call it a1) > and then passed to copy_ccw_from_iova() again, but for the > value calculated by ccwchain_calc_length(). > > This seems inefficient. > > Patch 1 moves the malloc of a0 from the CCW processor to the > initialization of the device. Since only one SSCH can be > handled concurrently, we can use this space safely to > determine how long the chain being processed actually is. > > Patch 2 then removes the second copy_ccw_from_iova() call > entirely, and replaces it with a memcpy from a0 to a1. This > is done before we process a TIC and thus a second chain, so > there is no overlap in the storage in channel_program. > > Patches 3-5 clean up some things that aren't as clear as I'd > like, but didn't want to pollute the first two changes. > For example, patch 3 moves the population of guest_cp to the > same routine that copies from it, rather than in a called > function. Meanwhile, patch 4 (and thus, 5) was something I > had lying around for quite some time, because it looked to > be structured weird. Maybe that's one bridge too far. I think this is worthwhile. > > Eric Farman (5): > vfio-ccw: Move guest_cp storage into common struct > vfio-ccw: Skip second copy of guest cp to host > vfio-ccw: Copy CCW data outside length calculation > vfio-ccw: Factor out the ccw0-to-ccw1 transition > vfio-ccw: Remove copy_ccw_from_iova() > > drivers/s390/cio/vfio_ccw_cp.c | 108 +++++++++++--------------------- > drivers/s390/cio/vfio_ccw_cp.h | 7 +++ > drivers/s390/cio/vfio_ccw_drv.c | 7 +++ > 3 files changed, 52 insertions(+), 70 deletions(-) > Ok, so I just wanted to take a quick look, and then ended up reviewing it all :) Will give others some time to look at this before I queue.
On 6/19/19 4:25 AM, Cornelia Huck wrote: > On Tue, 18 Jun 2019 22:23:47 +0200 > Eric Farman <farman@linux.ibm.com> wrote: > >> A couple little improvements to the malloc load in vfio-ccw. >> Really, there were just (the first) two patches, but then I >> got excited and added a few stylistic ones to the end. >> >> The routine ccwchain_calc_length() has this basic structure: >> >> ccwchain_calc_length >> a0 = kcalloc(CCWCHAIN_LEN_MAX, sizeof(struct ccw1)) >> copy_ccw_from_iova(a0, src) >> copy_from_iova >> pfn_array_alloc >> b = kcalloc(len, sizeof(*pa_iova_pfn + *pa_pfn) >> pfn_array_pin >> vfio_pin_pages >> memcpy(a0, src) >> pfn_array_unpin_free >> vfio_unpin_pages >> kfree(b) >> kfree(a0) >> >> We do this EVERY time we process a new channel program chain, >> meaning at least once per SSCH and more if TICs are involved, >> to figure out how many CCWs are chained together. Once that >> is determined, a new piece of memory is allocated (call it a1) >> and then passed to copy_ccw_from_iova() again, but for the >> value calculated by ccwchain_calc_length(). >> >> This seems inefficient. >> >> Patch 1 moves the malloc of a0 from the CCW processor to the >> initialization of the device. Since only one SSCH can be >> handled concurrently, we can use this space safely to >> determine how long the chain being processed actually is. >> >> Patch 2 then removes the second copy_ccw_from_iova() call >> entirely, and replaces it with a memcpy from a0 to a1. This >> is done before we process a TIC and thus a second chain, so >> there is no overlap in the storage in channel_program. >> >> Patches 3-5 clean up some things that aren't as clear as I'd >> like, but didn't want to pollute the first two changes. >> For example, patch 3 moves the population of guest_cp to the >> same routine that copies from it, rather than in a called >> function. Meanwhile, patch 4 (and thus, 5) was something I >> had lying around for quite some time, because it looked to >> be structured weird. Maybe that's one bridge too far. > > I think this is worthwhile. > >> >> Eric Farman (5): >> vfio-ccw: Move guest_cp storage into common struct >> vfio-ccw: Skip second copy of guest cp to host >> vfio-ccw: Copy CCW data outside length calculation >> vfio-ccw: Factor out the ccw0-to-ccw1 transition >> vfio-ccw: Remove copy_ccw_from_iova() >> >> drivers/s390/cio/vfio_ccw_cp.c | 108 +++++++++++--------------------- >> drivers/s390/cio/vfio_ccw_cp.h | 7 +++ >> drivers/s390/cio/vfio_ccw_drv.c | 7 +++ >> 3 files changed, 52 insertions(+), 70 deletions(-) >> > > Ok, so I just wanted to take a quick look, and then ended up reviewing > it all :) Haha, oops! :) Thank you! That was a nice surprise. > > Will give others some time to look at this before I queue. > Sounds great! I'll get back to my own reviews (notes the gentle reminder on qemu :)
On 06/18/2019 04:23 PM, Eric Farman wrote: > A couple little improvements to the malloc load in vfio-ccw. > Really, there were just (the first) two patches, but then I > got excited and added a few stylistic ones to the end. > > The routine ccwchain_calc_length() has this basic structure: > > ccwchain_calc_length > a0 = kcalloc(CCWCHAIN_LEN_MAX, sizeof(struct ccw1)) > copy_ccw_from_iova(a0, src) > copy_from_iova > pfn_array_alloc > b = kcalloc(len, sizeof(*pa_iova_pfn + *pa_pfn) > pfn_array_pin > vfio_pin_pages > memcpy(a0, src) > pfn_array_unpin_free > vfio_unpin_pages > kfree(b) > kfree(a0) > > We do this EVERY time we process a new channel program chain, > meaning at least once per SSCH and more if TICs are involved, > to figure out how many CCWs are chained together. Once that > is determined, a new piece of memory is allocated (call it a1) > and then passed to copy_ccw_from_iova() again, but for the > value calculated by ccwchain_calc_length(). > > This seems inefficient. > > Patch 1 moves the malloc of a0 from the CCW processor to the > initialization of the device. Since only one SSCH can be > handled concurrently, we can use this space safely to > determine how long the chain being processed actually is. > > Patch 2 then removes the second copy_ccw_from_iova() call > entirely, and replaces it with a memcpy from a0 to a1. This > is done before we process a TIC and thus a second chain, so > there is no overlap in the storage in channel_program. > > Patches 3-5 clean up some things that aren't as clear as I'd > like, but didn't want to pollute the first two changes. > For example, patch 3 moves the population of guest_cp to the > same routine that copies from it, rather than in a called > function. Meanwhile, patch 4 (and thus, 5) was something I > had lying around for quite some time, because it looked to > be structured weird. Maybe that's one bridge too far. > > Eric Farman (5): > vfio-ccw: Move guest_cp storage into common struct > vfio-ccw: Skip second copy of guest cp to host > vfio-ccw: Copy CCW data outside length calculation > vfio-ccw: Factor out the ccw0-to-ccw1 transition > vfio-ccw: Remove copy_ccw_from_iova() > > drivers/s390/cio/vfio_ccw_cp.c | 108 +++++++++++--------------------- > drivers/s390/cio/vfio_ccw_cp.h | 7 +++ > drivers/s390/cio/vfio_ccw_drv.c | 7 +++ > 3 files changed, 52 insertions(+), 70 deletions(-) > For this series: Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
On Tue, 18 Jun 2019 22:23:47 +0200 Eric Farman <farman@linux.ibm.com> wrote: > A couple little improvements to the malloc load in vfio-ccw. > Really, there were just (the first) two patches, but then I > got excited and added a few stylistic ones to the end. > > The routine ccwchain_calc_length() has this basic structure: > > ccwchain_calc_length > a0 = kcalloc(CCWCHAIN_LEN_MAX, sizeof(struct ccw1)) > copy_ccw_from_iova(a0, src) > copy_from_iova > pfn_array_alloc > b = kcalloc(len, sizeof(*pa_iova_pfn + *pa_pfn) > pfn_array_pin > vfio_pin_pages > memcpy(a0, src) > pfn_array_unpin_free > vfio_unpin_pages > kfree(b) > kfree(a0) > > We do this EVERY time we process a new channel program chain, > meaning at least once per SSCH and more if TICs are involved, > to figure out how many CCWs are chained together. Once that > is determined, a new piece of memory is allocated (call it a1) > and then passed to copy_ccw_from_iova() again, but for the > value calculated by ccwchain_calc_length(). > > This seems inefficient. > > Patch 1 moves the malloc of a0 from the CCW processor to the > initialization of the device. Since only one SSCH can be > handled concurrently, we can use this space safely to > determine how long the chain being processed actually is. > > Patch 2 then removes the second copy_ccw_from_iova() call > entirely, and replaces it with a memcpy from a0 to a1. This > is done before we process a TIC and thus a second chain, so > there is no overlap in the storage in channel_program. > > Patches 3-5 clean up some things that aren't as clear as I'd > like, but didn't want to pollute the first two changes. > For example, patch 3 moves the population of guest_cp to the > same routine that copies from it, rather than in a called > function. Meanwhile, patch 4 (and thus, 5) was something I > had lying around for quite some time, because it looked to > be structured weird. Maybe that's one bridge too far. > > Eric Farman (5): > vfio-ccw: Move guest_cp storage into common struct > vfio-ccw: Skip second copy of guest cp to host > vfio-ccw: Copy CCW data outside length calculation > vfio-ccw: Factor out the ccw0-to-ccw1 transition > vfio-ccw: Remove copy_ccw_from_iova() > > drivers/s390/cio/vfio_ccw_cp.c | 108 +++++++++++--------------------- > drivers/s390/cio/vfio_ccw_cp.h | 7 +++ > drivers/s390/cio/vfio_ccw_drv.c | 7 +++ > 3 files changed, 52 insertions(+), 70 deletions(-) > Thanks, applied.