Message ID | 20190723225817.12800-1-pasic@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/1] virtio/s390: fix race on airq_areas[] | expand |
On 24.07.19 00:58, Halil Pasic wrote: > The access to airq_areas was racy ever since the adapter interrupts got > introduced to virtio-ccw, but since commit 39c7dcb15892 ("virtio/s390: > make airq summary indicators DMA") this became an issue in practice as > well. Namely before that commit the airq_info that got overwritten was > still functional. After that commit however the two infos share a > summary_indicator, which aggravates the situation. Which means > auto-online mechanism occasionally hangs the boot with virtio_blk. > > Signed-off-by: Halil Pasic <pasic@linux.ibm.com> > Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> > Fixes: 96b14536d935 ("virtio-ccw: virtio-ccw adapter interrupt support.") > --- > * We need definitely this fixed for 5.3. For older stable kernels it is > to be discussed. @Connie what do you think: do we need a cc stable? Unless you can prove that the problem could never happen on old version we absolutely do need cc stable. > > * I have a variant that does not need the extra mutex but uses cmpxchg(). > Decided to post this one because that one is more complex. But if there > is interest we can have a look at it as well. This is slow path (startup) and never called in hot path. Correct? Mutex should be fine. > --- > drivers/s390/virtio/virtio_ccw.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c > index 1a55e5942d36..d97742662755 100644 > --- a/drivers/s390/virtio/virtio_ccw.c > +++ b/drivers/s390/virtio/virtio_ccw.c > @@ -145,6 +145,8 @@ struct airq_info { > struct airq_iv *aiv; > }; > static struct airq_info *airq_areas[MAX_AIRQ_AREAS]; > +DEFINE_MUTEX(airq_areas_lock); > + > static u8 *summary_indicators; > > static inline u8 *get_summary_indicator(struct airq_info *info) > @@ -265,9 +267,11 @@ static unsigned long get_airq_indicator(struct virtqueue *vqs[], int nvqs, > unsigned long bit, flags; > > for (i = 0; i < MAX_AIRQ_AREAS && !indicator_addr; i++) { > + mutex_lock(&airq_areas_lock); > if (!airq_areas[i]) > airq_areas[i] = new_airq_info(i); > info = airq_areas[i]; > + mutex_unlock(&airq_areas_lock); > if (!info) > return 0; > write_lock_irqsave(&info->lock, flags); >
On Wed, 24 Jul 2019 08:44:19 +0200 Christian Borntraeger <borntraeger@de.ibm.com> wrote: > On 24.07.19 00:58, Halil Pasic wrote: > > The access to airq_areas was racy ever since the adapter interrupts got > > introduced to virtio-ccw, but since commit 39c7dcb15892 ("virtio/s390: > > make airq summary indicators DMA") this became an issue in practice as > > well. Namely before that commit the airq_info that got overwritten was > > still functional. After that commit however the two infos share a > > summary_indicator, which aggravates the situation. Which means > > auto-online mechanism occasionally hangs the boot with virtio_blk. > > > > Signed-off-by: Halil Pasic <pasic@linux.ibm.com> > > Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> > > Fixes: 96b14536d935 ("virtio-ccw: virtio-ccw adapter interrupt support.") > > --- > > * We need definitely this fixed for 5.3. For older stable kernels it is > > to be discussed. @Connie what do you think: do we need a cc stable? > > Unless you can prove that the problem could never happen on old version > we absolutely do need cc stable. Yes, this needs to be cc:stable. > > > > > * I have a variant that does not need the extra mutex but uses cmpxchg(). > > Decided to post this one because that one is more complex. But if there > > is interest we can have a look at it as well. > > This is slow path (startup) and never called in hot path. Correct? Mutex should be > fine. Yes, this is ultimately called through the ->probe functions of virtio drivers. > > --- > > drivers/s390/virtio/virtio_ccw.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c > > index 1a55e5942d36..d97742662755 100644 > > --- a/drivers/s390/virtio/virtio_ccw.c > > +++ b/drivers/s390/virtio/virtio_ccw.c > > @@ -145,6 +145,8 @@ struct airq_info { > > struct airq_iv *aiv; > > }; > > static struct airq_info *airq_areas[MAX_AIRQ_AREAS]; > > +DEFINE_MUTEX(airq_areas_lock); > > + > > static u8 *summary_indicators; > > > > static inline u8 *get_summary_indicator(struct airq_info *info) > > @@ -265,9 +267,11 @@ static unsigned long get_airq_indicator(struct virtqueue *vqs[], int nvqs, > > unsigned long bit, flags; > > > > for (i = 0; i < MAX_AIRQ_AREAS && !indicator_addr; i++) { > > + mutex_lock(&airq_areas_lock); > > if (!airq_areas[i]) > > airq_areas[i] = new_airq_info(i); > > info = airq_areas[i]; > > + mutex_unlock(&airq_areas_lock); > > if (!info) > > return 0; > > write_lock_irqsave(&info->lock, flags); > > > Reviewed-by: Cornelia Huck <cohuck@redhat.com> Should I pick this and send a pull request, or is it quicker to just take this directly?
On 24.07.19 10:34, Cornelia Huck wrote: > On Wed, 24 Jul 2019 08:44:19 +0200 > Christian Borntraeger <borntraeger@de.ibm.com> wrote: > >> On 24.07.19 00:58, Halil Pasic wrote: >>> The access to airq_areas was racy ever since the adapter interrupts got >>> introduced to virtio-ccw, but since commit 39c7dcb15892 ("virtio/s390: >>> make airq summary indicators DMA") this became an issue in practice as >>> well. Namely before that commit the airq_info that got overwritten was >>> still functional. After that commit however the two infos share a >>> summary_indicator, which aggravates the situation. Which means >>> auto-online mechanism occasionally hangs the boot with virtio_blk. >>> >>> Signed-off-by: Halil Pasic <pasic@linux.ibm.com> >>> Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> >>> Fixes: 96b14536d935 ("virtio-ccw: virtio-ccw adapter interrupt support.") >>> --- >>> * We need definitely this fixed for 5.3. For older stable kernels it is >>> to be discussed. @Connie what do you think: do we need a cc stable? >> >> Unless you can prove that the problem could never happen on old version >> we absolutely do need cc stable. > > Yes, this needs to be cc:stable. > >> >>> >>> * I have a variant that does not need the extra mutex but uses cmpxchg(). >>> Decided to post this one because that one is more complex. But if there >>> is interest we can have a look at it as well. >> >> This is slow path (startup) and never called in hot path. Correct? Mutex should be >> fine. > > Yes, this is ultimately called through the ->probe functions of virtio > drivers. > >>> --- >>> drivers/s390/virtio/virtio_ccw.c | 4 ++++ >>> 1 file changed, 4 insertions(+) >>> >>> diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c >>> index 1a55e5942d36..d97742662755 100644 >>> --- a/drivers/s390/virtio/virtio_ccw.c >>> +++ b/drivers/s390/virtio/virtio_ccw.c >>> @@ -145,6 +145,8 @@ struct airq_info { >>> struct airq_iv *aiv; >>> }; >>> static struct airq_info *airq_areas[MAX_AIRQ_AREAS]; >>> +DEFINE_MUTEX(airq_areas_lock); >>> + >>> static u8 *summary_indicators; >>> >>> static inline u8 *get_summary_indicator(struct airq_info *info) >>> @@ -265,9 +267,11 @@ static unsigned long get_airq_indicator(struct virtqueue *vqs[], int nvqs, >>> unsigned long bit, flags; >>> >>> for (i = 0; i < MAX_AIRQ_AREAS && !indicator_addr; i++) { >>> + mutex_lock(&airq_areas_lock); >>> if (!airq_areas[i]) >>> airq_areas[i] = new_airq_info(i); >>> info = airq_areas[i]; >>> + mutex_unlock(&airq_areas_lock); >>> if (!info) >>> return 0; >>> write_lock_irqsave(&info->lock, flags); >>> >> > > Reviewed-by: Cornelia Huck <cohuck@redhat.com> > > Should I pick this and send a pull request, or is it quicker to just > take this directly? I think we can you did via a fast path. Halil, can you push to the s390 tree?
On Wed, 24 Jul 2019 08:44:19 +0200 Christian Borntraeger <borntraeger@de.ibm.com> wrote: > > > On 24.07.19 00:58, Halil Pasic wrote: > > The access to airq_areas was racy ever since the adapter interrupts got > > introduced to virtio-ccw, but since commit 39c7dcb15892 ("virtio/s390: > > make airq summary indicators DMA") this became an issue in practice as > > well. Namely before that commit the airq_info that got overwritten was > > still functional. After that commit however the two infos share a > > summary_indicator, which aggravates the situation. Which means > > auto-online mechanism occasionally hangs the boot with virtio_blk. > > > > Signed-off-by: Halil Pasic <pasic@linux.ibm.com> > > Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> > > Fixes: 96b14536d935 ("virtio-ccw: virtio-ccw adapter interrupt support.") > > --- > > * We need definitely this fixed for 5.3. For older stable kernels it is > > to be discussed. @Connie what do you think: do we need a cc stable? > > Unless you can prove that the problem could never happen on old version > we absolutely do need cc stable. No I would not like to make an attempt at proving that. I prefer code race free anyway. CC-ing stable. > > > > > * I have a variant that does not need the extra mutex but uses cmpxchg(). > > Decided to post this one because that one is more complex. But if there > > is interest we can have a look at it as well. > > This is slow path (startup) and never called in hot path. Correct? Mutex should be > fine. Right, this is only relevant during device initialization, which is an infrequent operation. Thanks, Halil > > --- > > drivers/s390/virtio/virtio_ccw.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c > > index 1a55e5942d36..d97742662755 100644 > > --- a/drivers/s390/virtio/virtio_ccw.c > > +++ b/drivers/s390/virtio/virtio_ccw.c > > @@ -145,6 +145,8 @@ struct airq_info { > > struct airq_iv *aiv; > > }; > > static struct airq_info *airq_areas[MAX_AIRQ_AREAS]; > > +DEFINE_MUTEX(airq_areas_lock); > > + > > static u8 *summary_indicators; > > > > static inline u8 *get_summary_indicator(struct airq_info *info) > > @@ -265,9 +267,11 @@ static unsigned long get_airq_indicator(struct virtqueue *vqs[], int nvqs, > > unsigned long bit, flags; > > > > for (i = 0; i < MAX_AIRQ_AREAS && !indicator_addr; i++) { > > + mutex_lock(&airq_areas_lock); > > if (!airq_areas[i]) > > airq_areas[i] = new_airq_info(i); > > info = airq_areas[i]; > > + mutex_unlock(&airq_areas_lock); > > if (!info) > > return 0; > > write_lock_irqsave(&info->lock, flags); > > >
On Wed, 24 Jul 2019 10:39:13 +0200 Christian Borntraeger <borntraeger@de.ibm.com> wrote: > > > On 24.07.19 10:34, Cornelia Huck wrote: > > On Wed, 24 Jul 2019 08:44:19 +0200 > > Christian Borntraeger <borntraeger@de.ibm.com> wrote: > > > >> On 24.07.19 00:58, Halil Pasic wrote: > >>> The access to airq_areas was racy ever since the adapter interrupts got > >>> introduced to virtio-ccw, but since commit 39c7dcb15892 ("virtio/s390: > >>> make airq summary indicators DMA") this became an issue in practice as > >>> well. Namely before that commit the airq_info that got overwritten was > >>> still functional. After that commit however the two infos share a > >>> summary_indicator, which aggravates the situation. Which means > >>> auto-online mechanism occasionally hangs the boot with virtio_blk. > >>> > >>> Signed-off-by: Halil Pasic <pasic@linux.ibm.com> > >>> Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> > >>> Fixes: 96b14536d935 ("virtio-ccw: virtio-ccw adapter interrupt support.") > >>> --- > >>> * We need definitely this fixed for 5.3. For older stable kernels it is > >>> to be discussed. @Connie what do you think: do we need a cc stable? > >> > >> Unless you can prove that the problem could never happen on old version > >> we absolutely do need cc stable. > > > > Yes, this needs to be cc:stable. > > > >> > >>> > >>> * I have a variant that does not need the extra mutex but uses cmpxchg(). > >>> Decided to post this one because that one is more complex. But if there > >>> is interest we can have a look at it as well. > >> > >> This is slow path (startup) and never called in hot path. Correct? Mutex should be > >> fine. > > > > Yes, this is ultimately called through the ->probe functions of virtio > > drivers. > > > >>> --- > >>> drivers/s390/virtio/virtio_ccw.c | 4 ++++ > >>> 1 file changed, 4 insertions(+) > >>> > >>> diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c > >>> index 1a55e5942d36..d97742662755 100644 > >>> --- a/drivers/s390/virtio/virtio_ccw.c > >>> +++ b/drivers/s390/virtio/virtio_ccw.c > >>> @@ -145,6 +145,8 @@ struct airq_info { > >>> struct airq_iv *aiv; > >>> }; > >>> static struct airq_info *airq_areas[MAX_AIRQ_AREAS]; > >>> +DEFINE_MUTEX(airq_areas_lock); > >>> + > >>> static u8 *summary_indicators; > >>> > >>> static inline u8 *get_summary_indicator(struct airq_info *info) > >>> @@ -265,9 +267,11 @@ static unsigned long get_airq_indicator(struct virtqueue *vqs[], int nvqs, > >>> unsigned long bit, flags; > >>> > >>> for (i = 0; i < MAX_AIRQ_AREAS && !indicator_addr; i++) { > >>> + mutex_lock(&airq_areas_lock); > >>> if (!airq_areas[i]) > >>> airq_areas[i] = new_airq_info(i); > >>> info = airq_areas[i]; > >>> + mutex_unlock(&airq_areas_lock); > >>> if (!info) > >>> return 0; > >>> write_lock_irqsave(&info->lock, flags); > >>> > >> > > > > Reviewed-by: Cornelia Huck <cohuck@redhat.com> > > > > Should I pick this and send a pull request, or is it quicker to just > > take this directly? > > I think we can you did via a fast path. Halil, can you push to the s390 tree? Sure! Regards, Halil >
diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c index 1a55e5942d36..d97742662755 100644 --- a/drivers/s390/virtio/virtio_ccw.c +++ b/drivers/s390/virtio/virtio_ccw.c @@ -145,6 +145,8 @@ struct airq_info { struct airq_iv *aiv; }; static struct airq_info *airq_areas[MAX_AIRQ_AREAS]; +DEFINE_MUTEX(airq_areas_lock); + static u8 *summary_indicators; static inline u8 *get_summary_indicator(struct airq_info *info) @@ -265,9 +267,11 @@ static unsigned long get_airq_indicator(struct virtqueue *vqs[], int nvqs, unsigned long bit, flags; for (i = 0; i < MAX_AIRQ_AREAS && !indicator_addr; i++) { + mutex_lock(&airq_areas_lock); if (!airq_areas[i]) airq_areas[i] = new_airq_info(i); info = airq_areas[i]; + mutex_unlock(&airq_areas_lock); if (!info) return 0; write_lock_irqsave(&info->lock, flags);
The access to airq_areas was racy ever since the adapter interrupts got introduced to virtio-ccw, but since commit 39c7dcb15892 ("virtio/s390: make airq summary indicators DMA") this became an issue in practice as well. Namely before that commit the airq_info that got overwritten was still functional. After that commit however the two infos share a summary_indicator, which aggravates the situation. Which means auto-online mechanism occasionally hangs the boot with virtio_blk. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> Fixes: 96b14536d935 ("virtio-ccw: virtio-ccw adapter interrupt support.") --- * We need definitely this fixed for 5.3. For older stable kernels it is to be discussed. @Connie what do you think: do we need a cc stable? * I have a variant that does not need the extra mutex but uses cmpxchg(). Decided to post this one because that one is more complex. But if there is interest we can have a look at it as well. --- drivers/s390/virtio/virtio_ccw.c | 4 ++++ 1 file changed, 4 insertions(+)