
[RFC,0/6] job: replace AioContext lock with job_mutex

Message ID 20210707165813.55361-1-eesposit@redhat.com (mailing list archive)

Message

Emanuele Giuseppe Esposito July 7, 2021, 4:58 p.m. UTC
This is a continuation of the work to reduce (and possibly get rid of) the usage of the AioContext lock, by introducing finer-grained locks to preserve thread safety.

This series aims to:
1) remove the aiocontext lock and substitute it with the already existing
   global job_mutex
2) fix what looks like an oversight when the blockjob.c logic was moved
   into the more generic job.c: job_mutex was introduced specifically to
   protect the job->busy flag, but it seems it was not used in subsequent
   patches, since multiple code sections directly access the field
   without any locking.
3) use job_mutex instead of the AioContext lock
4) extend the reach of the job_mutex to protect all shared fields
   that the job structure has.

The reason why we propose to use the existing job_mutex, and not one mutex
per job, is to keep things as simple as possible for now, and because
jobs are not in the execution critical path, so we can afford
some delays.
Having a lock per job would increase overall complexity and
increase the chances of deadlocks (one good example is job
transactions, where multiple jobs are grouped together).
Anyway, a per-job mutex can always be added in the future.

Patches 1-4 are in preparation for patch 5. They try to simplify and clarify
the job_mutex usage. Patch 5 tries to add proper synchronization to the job
structure, replacing the AioContext lock where necessary.
Patch 6 removes the AioContext acquire/release pairs that are now unneeded.
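
To make the intent of patches 1-2 concrete, here is a minimal sketch of the
pattern they aim for; all names are illustrative and may not match the
actual patches exactly:

    #include "qemu/osdep.h"
    #include "qemu/thread.h"
    #include "qemu/job.h"

    /* One global mutex protecting the shared Job fields. */
    static QemuMutex job_mutex;   /* initialized once at startup */

    void job_lock(void)
    {
        qemu_mutex_lock(&job_mutex);
    }

    void job_unlock(void)
    {
        qemu_mutex_unlock(&job_mutex);
    }

    /* _locked variant: the caller already holds job_mutex. */
    static bool job_is_busy_locked(Job *job)
    {
        return job->busy;
    }

    /* Getter for everyone else: takes and releases the lock internally. */
    bool job_is_busy(Job *job)
    {
        bool busy;

        job_lock();
        busy = job_is_busy_locked(job);
        job_unlock();
        return busy;
    }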


RFC: I am not sure the way I laid out the locks is ideal.
But their usage should not introduce deadlocks. I also made sure
the series passes all qemu-iotests.

What is very clear from this series is that it
is strictly tied to the bdrv_* and lower-level calls, because
they also internally check or even use the AioContext lock.
Therefore, in order to make it work, I temporarily added some
aio_context_acquire/release pairs around the functions that
still assert that the lock is held, or assume it is held and temporarily
unlock it (unlock() - lock()).
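
For concreteness, the temporary wrapping looks roughly like this; the callee
name is only a placeholder for any bdrv_*()/blk_*() function that still
asserts (or assumes) that its AioContext lock is held:

    static void job_call_into_bdrv_example(Job *job, BlockDriverState *bs)
    {
        AioContext *ctx = job->aio_context;

        aio_context_acquire(ctx);
        /* placeholder for a call that still expects the AioContext lock */
        bdrv_call_that_expects_the_lock(bs);
        aio_context_release(ctx);
    }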

I also apologize for the amount of changes in patch 5; any suggestion on
how to improve the patch layout is very much appreciated.

Emanuele Giuseppe Esposito (6):
  job: use getter/setters instead of accessing the Job fields directly
  job: _locked functions and public job_lock/unlock for next patch
  job: minor changes to simplify locking
  job.h: categorize job fields
  job: use global job_mutex to protect struct Job
  jobs: remove unnecessary AioContext aquire/release pairs

 include/block/blockjob_int.h   |   1 +
 include/qemu/job.h             | 159 ++++++++++--
 block.c                        |   2 +-
 block/backup.c                 |   4 +
 block/commit.c                 |   4 +-
 block/mirror.c                 |  30 ++-
 block/monitor/block-hmp-cmds.c |   6 -
 block/replication.c            |   3 +-
 blockdev.c                     | 235 ++++++------------
 blockjob.c                     | 140 +++++++----
 job-qmp.c                      |  65 +++--
 job.c                          | 432 ++++++++++++++++++++++++++-------
 qemu-img.c                     |  19 +-
 13 files changed, 724 insertions(+), 376 deletions(-)

Comments

Stefan Hajnoczi July 8, 2021, 10:36 a.m. UTC | #1
On Wed, Jul 07, 2021 at 06:58:07PM +0200, Emanuele Giuseppe Esposito wrote:
> This is a continuation on the work to reduce (and possibly get rid of) the usage of AioContext lock, by introducing smaller granularity locks to keep the thread safety.
> 
> This series aims to:
> 1) remove the aiocontext lock and substitute it with the already existing
>    global job_mutex
> 2) fix what it looks like to be an oversight when moving the blockjob.c logic
>    into the more generic job.c: job_mutex was introduced especially to
>    protect job->busy flag, but it seems that it was not used in successive
>    patches, because there are multiple code sections that directly
>    access the field without any locking.
> 3) use job_mutex instead of the aiocontext_lock
> 4) extend the reach of the job_mutex to protect all shared fields
>    that the job structure has.
> 
> The reason why we propose to use the existing job_mutex and not make one for
> each job is to keep things as simple as possible for now, and because
> the jobs are not in the execution critical path, so we can affort
> some delays.
> Having a lock per job would increase overall complexity and
> increase the chances of deadlocks (one good example could be the job
> transactions, where multiple jobs are grouped together).
> Anyways, the per-job mutex can always be added in the future.
> 
> Patch 1-4 are in preparation for patch 5. They try to simplify and clarify
> the job_mutex usage. Patch 5 tries to add proper syncronization to the job
> structure, replacing the AioContext lock when necessary.
> Patch 6 just removes unnecessary AioContext locks that are now unneeded.
> 
> 
> RFC: I am not sure the way I layed out the locks is ideal.
> But their usage should not make deadlocks. I also made sure
> the series passess all qemu_iotests.
> 
> What is very clear from this patch is that it
> is strictly related to the brdv_* and lower level calls, because
> they also internally check or even use the aiocontext lock.
> Therefore, in order to make it work, I temporarly added some
> aiocontext_acquire/release pair around the function that
> still assert for them or assume they are hold and temporarly
> unlock (unlock() - lock()).

Sounds like the issue is that this patch series assumes AioContext locks
are no longer required for calling the blk_*()/bdrv_*() APIs? That is
not the case yet, so you had to then add those aio_context_lock() calls
back in elsewhere. This approach introduces unnecessary risk. I think we
should wait until blk_*()/bdrv_*() no longer requires the caller to hold
the AioContext lock before applying this series.

Stefan
Paolo Bonzini July 8, 2021, 11:32 a.m. UTC | #2
On 08/07/21 12:36, Stefan Hajnoczi wrote:
>> What is very clear from this patch is that it
>> is strictly related to the brdv_* and lower level calls, because
>> they also internally check or even use the aiocontext lock.
>> Therefore, in order to make it work, I temporarly added some
>> aiocontext_acquire/release pair around the function that
>> still assert for them or assume they are hold and temporarly
>> unlock (unlock() - lock()).
>
> Sounds like the issue is that this patch series assumes AioContext locks
> are no longer required for calling the blk_*()/bdrv_*() APIs? That is
> not the case yet, so you had to then add those aio_context_lock() calls
> back in elsewhere. This approach introduces unnecessary risk. I think we
> should wait until blk_*()/bdrv_*() no longer requires the caller to hold
> the AioContext lock before applying this series.

In general I'm in favor of pushing the lock further down into smaller 
and smaller critical sections; it's a good approach to make further 
audits easier until it's "obvious" that the lock is unnecessary.  I 
haven't yet reviewed Emanuele's patches to see if this is what he's 
doing where he's adding the acquire/release calls, but that's my 
understanding of both his cover letter and your reply.

The I/O blk_*()/bdrv_*() *should* not require the caller to hold the 
AioContext lock; all drivers use their own CoMutex or QemuMutex when 
needed, and generic code should also be ready (caveat emptor).  Others, 
such as reopen, are a mess that requires a separate audit.  Restricting 
acquire/release to be only around those seems like a good starting point.

Paolo
Kevin Wolf July 8, 2021, 12:14 p.m. UTC | #3
Am 08.07.2021 um 13:32 hat Paolo Bonzini geschrieben:
> On 08/07/21 12:36, Stefan Hajnoczi wrote:
> > > What is very clear from this patch is that it
> > > is strictly related to the brdv_* and lower level calls, because
> > > they also internally check or even use the aiocontext lock.
> > > Therefore, in order to make it work, I temporarly added some
> > > aiocontext_acquire/release pair around the function that
> > > still assert for them or assume they are hold and temporarly
> > > unlock (unlock() - lock()).
> > 
> > Sounds like the issue is that this patch series assumes AioContext locks
> > are no longer required for calling the blk_*()/bdrv_*() APIs? That is
> > not the case yet, so you had to then add those aio_context_lock() calls
> > back in elsewhere. This approach introduces unnecessary risk. I think we
> > should wait until blk_*()/bdrv_*() no longer requires the caller to hold
> > the AioContext lock before applying this series.
> 
> In general I'm in favor of pushing the lock further down into smaller and
> smaller critical sections; it's a good approach to make further audits
> easier until it's "obvious" that the lock is unnecessary.  I haven't yet
> reviewed Emanuele's patches to see if this is what he's doing where he's
> adding the acquire/release calls, but that's my understanding of both his
> cover letter and your reply.
> 
> The I/O blk_*()/bdrv_*() *should* not require the caller to hold the
> AioContext lock; all drivers use their own CoMutex or QemuMutex when needed,
> and generic code should also be ready (caveat emptor).  Others, such as
> reopen, are a mess that requires a separate audit.  Restricting
> acquire/release to be only around those seems like a good starting point.

Reopen isn't just a mess, but in fact buggy. After the following patch
goes in, the rule is simple: Don't hold any AioContext locks when
calling bdrv_reopen_multiple().

    'block: Acquire AioContexts during bdrv_reopen_multiple()'
    https://lists.gnu.org/archive/html/qemu-block/2021-07/msg00238.html

It still takes AioContext locks when it calls into other functions that
currently expect it, but that should be the same as usual then.

And once callers don't even hold the lock in the first place, we'll also
get rid of the ugly temporary lock release across reopen.

Kevin
Stefan Hajnoczi July 8, 2021, 1:04 p.m. UTC | #4
On Thu, Jul 08, 2021 at 01:32:12PM +0200, Paolo Bonzini wrote:
> On 08/07/21 12:36, Stefan Hajnoczi wrote:
> > > What is very clear from this patch is that it
> > > is strictly related to the brdv_* and lower level calls, because
> > > they also internally check or even use the aiocontext lock.
> > > Therefore, in order to make it work, I temporarly added some
> > > aiocontext_acquire/release pair around the function that
> > > still assert for them or assume they are hold and temporarly
> > > unlock (unlock() - lock()).
> > 
> > Sounds like the issue is that this patch series assumes AioContext locks
> > are no longer required for calling the blk_*()/bdrv_*() APIs? That is
> > not the case yet, so you had to then add those aio_context_lock() calls
> > back in elsewhere. This approach introduces unnecessary risk. I think we
> > should wait until blk_*()/bdrv_*() no longer requires the caller to hold
> > the AioContext lock before applying this series.
> 
> In general I'm in favor of pushing the lock further down into smaller and
> smaller critical sections; it's a good approach to make further audits
> easier until it's "obvious" that the lock is unnecessary.  I haven't yet
> reviewed Emanuele's patches to see if this is what he's doing where he's
> adding the acquire/release calls, but that's my understanding of both his
> cover letter and your reply.

The problem is the unnecessary risk. We know what the goal is for
blk_*()/bdrv_*() but it's not quite there yet. Does making changes in
block jobs help solve the final issues with blk_*()/bdrv_*()?

If yes, then it's a risk worth taking. If no, then spending time
developing interim code, reviewing those patches, and risking breakage
doesn't seem worth it. I'd rather wait for blk_*()/bdrv_*() to be fully
complete and then see patches that delete aio_context_acquire() in most
places or add locks in the remaining places where the caller was relying
on the AioContext lock.

Stefan
Stefan Hajnoczi July 8, 2021, 1:09 p.m. UTC | #5
On Wed, Jul 07, 2021 at 06:58:07PM +0200, Emanuele Giuseppe Esposito wrote:
> This is a continuation on the work to reduce (and possibly get rid of) the usage of AioContext lock, by introducing smaller granularity locks to keep the thread safety.
> 
> This series aims to:
> 1) remove the aiocontext lock and substitute it with the already existing
>    global job_mutex
> 2) fix what it looks like to be an oversight when moving the blockjob.c logic
>    into the more generic job.c: job_mutex was introduced especially to
>    protect job->busy flag, but it seems that it was not used in successive
>    patches, because there are multiple code sections that directly
>    access the field without any locking.
> 3) use job_mutex instead of the aiocontext_lock
> 4) extend the reach of the job_mutex to protect all shared fields
>    that the job structure has.

Can you explain the big picture:

1. What are the rules for JobDrivers? Imagine you are implementing a new
   JobDriver. What do you need to know in order to write correct code?

2. What are the rules for monitor? The main pattern is looking up a job,
   invoking a job API on it, and then calling job_unlock().

Stefan
Emanuele Giuseppe Esposito July 12, 2021, 8:41 a.m. UTC | #6
On 08/07/2021 15:04, Stefan Hajnoczi wrote:
> On Thu, Jul 08, 2021 at 01:32:12PM +0200, Paolo Bonzini wrote:
>> On 08/07/21 12:36, Stefan Hajnoczi wrote:
>>>> What is very clear from this patch is that it
>>>> is strictly related to the brdv_* and lower level calls, because
>>>> they also internally check or even use the aiocontext lock.
>>>> Therefore, in order to make it work, I temporarly added some
>>>> aiocontext_acquire/release pair around the function that
>>>> still assert for them or assume they are hold and temporarly
>>>> unlock (unlock() - lock()).
>>>
>>> Sounds like the issue is that this patch series assumes AioContext locks
>>> are no longer required for calling the blk_*()/bdrv_*() APIs? That is
>>> not the case yet, so you had to then add those aio_context_lock() calls
>>> back in elsewhere. This approach introduces unnecessary risk. I think we
>>> should wait until blk_*()/bdrv_*() no longer requires the caller to hold
>>> the AioContext lock before applying this series.
>>
>> In general I'm in favor of pushing the lock further down into smaller and
>> smaller critical sections; it's a good approach to make further audits
>> easier until it's "obvious" that the lock is unnecessary.  I haven't yet
>> reviewed Emanuele's patches to see if this is what he's doing where he's
>> adding the acquire/release calls, but that's my understanding of both his
>> cover letter and your reply.
> 
> The problem is the unnecessary risk. We know what the goal is for
> blk_*()/bdrv_*() but it's not quite there yet. Does making changes in
> block jobs help solve the final issues with blk_*()/bdrv_*()?

Correct me if I am wrong, but it seems to me that the bdrv_*()/blk_*()
operations mostly take care of building, modifying and walking the BDS
graph. So since graph nodes can be in different AioContexts, it makes sense
that we have a lock when modifying the graph, right?

If so, we can simply try to replace the AioContext lock with a graph 
lock, or something like that. But I am not sure of this.

Emanuele
> 
> If yes, then it's a risk worth taking. If no, then spending time
> developing interim code, reviewing those patches, and risking breakage
> doesn't seem worth it. I'd rather wait for blk_*()/bdrv_*() to be fully
> complete and then see patches that delete aio_context_acquire() in most
> places or add locks in the remaining places where the caller was relying
> on the AioContext lock.
> 
> Stefan
>
Emanuele Giuseppe Esposito July 12, 2021, 8:42 a.m. UTC | #7
On 08/07/2021 15:09, Stefan Hajnoczi wrote:
> On Wed, Jul 07, 2021 at 06:58:07PM +0200, Emanuele Giuseppe Esposito wrote:
>> This is a continuation on the work to reduce (and possibly get rid of) the usage of AioContext lock, by introducing smaller granularity locks to keep the thread safety.
>>
>> This series aims to:
>> 1) remove the aiocontext lock and substitute it with the already existing
>>     global job_mutex
>> 2) fix what it looks like to be an oversight when moving the blockjob.c logic
>>     into the more generic job.c: job_mutex was introduced especially to
>>     protect job->busy flag, but it seems that it was not used in successive
>>     patches, because there are multiple code sections that directly
>>     access the field without any locking.
>> 3) use job_mutex instead of the aiocontext_lock
>> 4) extend the reach of the job_mutex to protect all shared fields
>>     that the job structure has.
> 
> Can you explain the big picture:
> 
> 1. What are the rules for JobDrivers? Imagine you are implementing a new
>     JobDriver. What do you need to know in order to write correct code?

I think that in general, the rules for JobDrivers remain the same. The 
job_mutex lock should be invisible (or almost) from the point of view of 
a JobDriver, because the job API available for it should take care of 
the necessary locking/unlocking.

> 
> 2. What are the rules for monitor? The main pattern is looking up a job,
>     invoking a job API on it, and then calling job_unlock().

The monitor, instead, is aware of this lock: the reason for that is
exactly what you have described here.
Looking up and then invoking a job API operation (for example calling
find_job() and then job_pause()) must be performed with the same lock
held the whole time, otherwise other threads could modify the job while
the monitor runs its command.
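
As a rough sketch (the _locked helper names are only indicative of what the
final API could look like), a monitor command would do something like:

    void qmp_job_pause_example(const char *id, Error **errp)
    {
        Job *job;

        job_lock();
        job = find_job_locked(id, errp);      /* lookup under job_mutex */
        if (!job) {
            job_unlock();
            return;
        }
        job_user_pause_locked(job, errp);     /* API call under the same lock */
        job_unlock();
    }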

Please let me know if something is not clear and/or if you have 
additional comments on this!

Emanuele

> 
> Stefan
>
Stefan Hajnoczi July 13, 2021, 1:10 p.m. UTC | #8
On Mon, Jul 12, 2021 at 10:41:46AM +0200, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 08/07/2021 15:04, Stefan Hajnoczi wrote:
> > On Thu, Jul 08, 2021 at 01:32:12PM +0200, Paolo Bonzini wrote:
> > > On 08/07/21 12:36, Stefan Hajnoczi wrote:
> > > > > What is very clear from this patch is that it
> > > > > is strictly related to the brdv_* and lower level calls, because
> > > > > they also internally check or even use the aiocontext lock.
> > > > > Therefore, in order to make it work, I temporarly added some
> > > > > aiocontext_acquire/release pair around the function that
> > > > > still assert for them or assume they are hold and temporarly
> > > > > unlock (unlock() - lock()).
> > > > 
> > > > Sounds like the issue is that this patch series assumes AioContext locks
> > > > are no longer required for calling the blk_*()/bdrv_*() APIs? That is
> > > > not the case yet, so you had to then add those aio_context_lock() calls
> > > > back in elsewhere. This approach introduces unnecessary risk. I think we
> > > > should wait until blk_*()/bdrv_*() no longer requires the caller to hold
> > > > the AioContext lock before applying this series.
> > > 
> > > In general I'm in favor of pushing the lock further down into smaller and
> > > smaller critical sections; it's a good approach to make further audits
> > > easier until it's "obvious" that the lock is unnecessary.  I haven't yet
> > > reviewed Emanuele's patches to see if this is what he's doing where he's
> > > adding the acquire/release calls, but that's my understanding of both his
> > > cover letter and your reply.
> > 
> > The problem is the unnecessary risk. We know what the goal is for
> > blk_*()/bdrv_*() but it's not quite there yet. Does making changes in
> > block jobs help solve the final issues with blk_*()/bdrv_*()?
> 
> Correct me if I am wrong, but it seems to me that the bdrv_*()/blk_*()
> operation mostly take care of building, modifying and walking the bds graph.
> So since graph nodes can have multiple AioContext, it makes sense that we
> have a lock when modifying the graph, right?
> 
> If so, we can simply try to replace the AioContext lock with a graph lock,
> or something like that. But I am not sure of this.

Block graph manipulation (all_bdrv_states and friends) requires the BQL.
It has always been this way.

This raises the question: if block graph manipulation is already under
the BQL and BlockDriver callbacks don't need the AioContext anymore, why
are aio_context_acquire() calls still needed in block jobs?

AIO_WAIT_WHILE() requires that AioContext is acquired according to its
documentation, but I'm not sure that's true anymore. Thread-safe/atomic
primitives are used by AIO_WAIT_WHILE(), so as long as the condition
being waited for is thread-safe too it should work without the
AioContext lock.
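
Concretely, I mean callers like the sketch below (names are illustrative):
the condition only uses an atomic read, so evaluating it is thread-safe on
its own, and under the reasoning above the wait should then work without
holding the AioContext lock:

    static void job_wait_until_idle_example(Job *job)
    {
        AIO_WAIT_WHILE(job->aio_context, qatomic_read(&job->busy));
    }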

Back to my comment about unnecessary risk, pushing the lock down is a
strategy for exploring the problem, but I'm not sure those intermediate
commits need to be committed to qemu.git/master because of the time
required to review them and the risk of introducing (temporary) bugs.
Maybe there's a benefit to this patch series that I've missed?

Stefan
Stefan Hajnoczi July 13, 2021, 1:27 p.m. UTC | #9
On Mon, Jul 12, 2021 at 10:42:47AM +0200, Emanuele Giuseppe Esposito wrote:
> On 08/07/2021 15:09, Stefan Hajnoczi wrote:
> > On Wed, Jul 07, 2021 at 06:58:07PM +0200, Emanuele Giuseppe Esposito wrote:
> > > This is a continuation on the work to reduce (and possibly get rid of) the usage of AioContext lock, by introducing smaller granularity locks to keep the thread safety.
> > > 
> > > This series aims to:
> > > 1) remove the aiocontext lock and substitute it with the already existing
> > >     global job_mutex
> > > 2) fix what it looks like to be an oversight when moving the blockjob.c logic
> > >     into the more generic job.c: job_mutex was introduced especially to
> > >     protect job->busy flag, but it seems that it was not used in successive
> > >     patches, because there are multiple code sections that directly
> > >     access the field without any locking.
> > > 3) use job_mutex instead of the aiocontext_lock
> > > 4) extend the reach of the job_mutex to protect all shared fields
> > >     that the job structure has.
> > 
> > Can you explain the big picture:
> > 
> > 1. What are the rules for JobDrivers? Imagine you are implementing a new
> >     JobDriver. What do you need to know in order to write correct code?
> 
> I think that in general, the rules for JobDrivers remain the same. The
> job_mutex lock should be invisible (or almost) from the point of view of a
> JobDriver, because the job API available for it should take care of the
> necessary locking/unlocking.
> 
> > 
> > 2. What are the rules for monitor? The main pattern is looking up a job,
> >     invoking a job API on it, and then calling job_unlock().
> 
> The monitor instead is aware of this lock: the reason for that is exactly
> what you have described here.
> Looking up + invoking a job API operation (for example calling find_job()
> and then job_pause() ) must be performed with the same lock hold all the
> time, otherwise other threads could modify the job while the monitor runs
> its command.

That helps, thanks!

Stefan
Vladimir Sementsov-Ogievskiy July 13, 2021, 3:18 p.m. UTC | #10
13.07.2021 16:10, Stefan Hajnoczi wrote:
> On Mon, Jul 12, 2021 at 10:41:46AM +0200, Emanuele Giuseppe Esposito wrote:
>>
>>
>> On 08/07/2021 15:04, Stefan Hajnoczi wrote:
>>> On Thu, Jul 08, 2021 at 01:32:12PM +0200, Paolo Bonzini wrote:
>>>> On 08/07/21 12:36, Stefan Hajnoczi wrote:
>>>>>> What is very clear from this patch is that it
>>>>>> is strictly related to the brdv_* and lower level calls, because
>>>>>> they also internally check or even use the aiocontext lock.
>>>>>> Therefore, in order to make it work, I temporarly added some
>>>>>> aiocontext_acquire/release pair around the function that
>>>>>> still assert for them or assume they are hold and temporarly
>>>>>> unlock (unlock() - lock()).
>>>>>
>>>>> Sounds like the issue is that this patch series assumes AioContext locks
>>>>> are no longer required for calling the blk_*()/bdrv_*() APIs? That is
>>>>> not the case yet, so you had to then add those aio_context_lock() calls
>>>>> back in elsewhere. This approach introduces unnecessary risk. I think we
>>>>> should wait until blk_*()/bdrv_*() no longer requires the caller to hold
>>>>> the AioContext lock before applying this series.
>>>>
>>>> In general I'm in favor of pushing the lock further down into smaller and
>>>> smaller critical sections; it's a good approach to make further audits
>>>> easier until it's "obvious" that the lock is unnecessary.  I haven't yet
>>>> reviewed Emanuele's patches to see if this is what he's doing where he's
>>>> adding the acquire/release calls, but that's my understanding of both his
>>>> cover letter and your reply.
>>>
>>> The problem is the unnecessary risk. We know what the goal is for
>>> blk_*()/bdrv_*() but it's not quite there yet. Does making changes in
>>> block jobs help solve the final issues with blk_*()/bdrv_*()?
>>
>> Correct me if I am wrong, but it seems to me that the bdrv_*()/blk_*()
>> operation mostly take care of building, modifying and walking the bds graph.
>> So since graph nodes can have multiple AioContext, it makes sense that we
>> have a lock when modifying the graph, right?
>>
>> If so, we can simply try to replace the AioContext lock with a graph lock,
>> or something like that. But I am not sure of this.
> 
> Block graph manipulation (all_bdrv_states and friends) requires the BQL.
> It has always been this way.
> 
> This raises the question: if block graph manipulation is already under
> the BQL and BlockDriver callbacks don't need the AioContext anymore, why

I don't believe that block drivers are thread-safe now. They have some mutexes, but who has done an audit of their thread-safety? I work mostly with the nbd and qcow2 drivers, and they never seemed thread-safe to me. For example, the qcow2 driver has plenty of operations that are done from non-coroutine context, and there the qcow2 CoMutex is simply not taken.

> are aio_context_acquire() calls still needed in block jobs?
> 
> AIO_WAIT_WHILE() requires that AioContext is acquired according to its
> documentation, but I'm not sure that's true anymore. Thread-safe/atomic
> primitives are used by AIO_WAIT_WHILE(), so as long as the condition
> being waited for is thread-safe too it should work without the
> AioContext lock.
> 
> Back to my comment about unnecessary risk, pushing the lock down is a
> strategy for exploring the problem, but I'm not sure those intermediate
> commits need to be committed to qemu.git/master because of the time
> required to review them and the risk of introducing (temporary) bugs.

I agree. Let me add my bit of criticism:

What I dislike about the whole thread-safety update you are doing:

1. There is no proof of concept - some good example of multiqueue, or something that uses multiple threads and shows good performance. Something that works, and that shows where we are going and why it is good. That may be draft patches with a lot of "FIXME" and "TODO", but working. For now I feel that I have spent my time reviewing and proving to myself the thread-safety of the two previous thread-safety series, but I don't have hope of seeing a benefit from it in the near future.

1.1 If we have a proof of concept, that also gives a kind of plan: a list of subsystems (patch series) to tackle step by step until we finally arrive at what we want. Do you have such a plan (for the whole feature) now?

2. There are no tests: something that doesn't work before the series and starts to work after. Why this is important:

All these thread-safety primitives are complicated enough. They are hard to review and to prove correct, and very easy to break with new code. Tests run by CI prove that we don't break subsystems that are already thread-safe. For example, you have recently updated block-copy and several related things, but we have no tests. So assume that a year later you finish the work of updating all the other subsystems to be thread-safe: you'll have no guarantee that block-copy is still thread-safe, and you'll have to start from the beginning.

3. As I said, I really doubt that the block drivers are already thread-safe. At the very least, an audit and/or tests are needed.
Stefan Hajnoczi July 13, 2021, 4:38 p.m. UTC | #11
On Tue, Jul 13, 2021 at 06:18:39PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 13.07.2021 16:10, Stefan Hajnoczi wrote:
> > On Mon, Jul 12, 2021 at 10:41:46AM +0200, Emanuele Giuseppe Esposito wrote:
> > > 
> > > 
> > > On 08/07/2021 15:04, Stefan Hajnoczi wrote:
> > > > On Thu, Jul 08, 2021 at 01:32:12PM +0200, Paolo Bonzini wrote:
> > > > > On 08/07/21 12:36, Stefan Hajnoczi wrote:
> > > > > > > What is very clear from this patch is that it
> > > > > > > is strictly related to the brdv_* and lower level calls, because
> > > > > > > they also internally check or even use the aiocontext lock.
> > > > > > > Therefore, in order to make it work, I temporarly added some
> > > > > > > aiocontext_acquire/release pair around the function that
> > > > > > > still assert for them or assume they are hold and temporarly
> > > > > > > unlock (unlock() - lock()).
> > > > > > 
> > > > > > Sounds like the issue is that this patch series assumes AioContext locks
> > > > > > are no longer required for calling the blk_*()/bdrv_*() APIs? That is
> > > > > > not the case yet, so you had to then add those aio_context_lock() calls
> > > > > > back in elsewhere. This approach introduces unnecessary risk. I think we
> > > > > > should wait until blk_*()/bdrv_*() no longer requires the caller to hold
> > > > > > the AioContext lock before applying this series.
> > > > > 
> > > > > In general I'm in favor of pushing the lock further down into smaller and
> > > > > smaller critical sections; it's a good approach to make further audits
> > > > > easier until it's "obvious" that the lock is unnecessary.  I haven't yet
> > > > > reviewed Emanuele's patches to see if this is what he's doing where he's
> > > > > adding the acquire/release calls, but that's my understanding of both his
> > > > > cover letter and your reply.
> > > > 
> > > > The problem is the unnecessary risk. We know what the goal is for
> > > > blk_*()/bdrv_*() but it's not quite there yet. Does making changes in
> > > > block jobs help solve the final issues with blk_*()/bdrv_*()?
> > > 
> > > Correct me if I am wrong, but it seems to me that the bdrv_*()/blk_*()
> > > operation mostly take care of building, modifying and walking the bds graph.
> > > So since graph nodes can have multiple AioContext, it makes sense that we
> > > have a lock when modifying the graph, right?
> > > 
> > > If so, we can simply try to replace the AioContext lock with a graph lock,
> > > or something like that. But I am not sure of this.
> > 
> > Block graph manipulation (all_bdrv_states and friends) requires the BQL.
> > It has always been this way.
> > 
> > This raises the question: if block graph manipulation is already under
> > the BQL and BlockDriver callbacks don't need the AioContext anymore, why
> 
> I don't believe that block drivers are thread-safe now. They have some mutexes.. But who make an audit of thread-safety?

Emanuele :)

FWIW I took a look at the stream, mirror, and backup jobs today and
couldn't find anything that's unsafe after this series. I was expecting
to find issues but I think Emanuele and Paolo have taken care of them.

> > are aio_context_acquire() calls still needed in block jobs?
> > 
> > AIO_WAIT_WHILE() requires that AioContext is acquired according to its
> > documentation, but I'm not sure that's true anymore. Thread-safe/atomic
> > primitives are used by AIO_WAIT_WHILE(), so as long as the condition
> > being waited for is thread-safe too it should work without the
> > AioContext lock.
> > 
> > Back to my comment about unnecessary risk, pushing the lock down is a
> > strategy for exploring the problem, but I'm not sure those intermediate
> > commits need to be committed to qemu.git/master because of the time
> > required to review them and the risk of introducing (temporary) bugs.
> 
> I agree. Add my bit of criticism:
> 
> What I dislike about the whole thread-safe update you do:
> 
> 1. There is no proof of concept - some good example of multiqueue, or something that uses mutliple threads and shows good performance. Something that works, and shows where are we going to and why it is good. That may be draft patches with a lot of "FIXME" and "TODO", but working. For now I feel that I've spent my time to reviewing and proving to myself thread-safety of two previous thread-safe series, but I don't have a hope to see a benefit of it in the near future..

The multi-queue block layer should improve performance in cases where
the bottleneck is a single IOThread. It will allow users to assign more
than one IOThread.

But I think the bigger impact of this work will be addressing
long-standing problems with the block layer's programming model. We
continue to have IOThread bugs because there are many undocumented
assumptions. With the multi-queue block layer the code switches to a
more well-understood multi-threaded programming model and hopefully
fewer issues will arise because there is no problematic AioContext lock
to worry about.

Stefan
Vladimir Sementsov-Ogievskiy July 15, 2021, 12:35 p.m. UTC | #12
13.07.2021 19:38, Stefan Hajnoczi wrote:
> On Tue, Jul 13, 2021 at 06:18:39PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> 13.07.2021 16:10, Stefan Hajnoczi wrote:
>>> On Mon, Jul 12, 2021 at 10:41:46AM +0200, Emanuele Giuseppe Esposito wrote:
>>>>
>>>>
>>>> On 08/07/2021 15:04, Stefan Hajnoczi wrote:
>>>>> On Thu, Jul 08, 2021 at 01:32:12PM +0200, Paolo Bonzini wrote:
>>>>>> On 08/07/21 12:36, Stefan Hajnoczi wrote:
>>>>>>>> What is very clear from this patch is that it
>>>>>>>> is strictly related to the brdv_* and lower level calls, because
>>>>>>>> they also internally check or even use the aiocontext lock.
>>>>>>>> Therefore, in order to make it work, I temporarly added some
>>>>>>>> aiocontext_acquire/release pair around the function that
>>>>>>>> still assert for them or assume they are hold and temporarly
>>>>>>>> unlock (unlock() - lock()).
>>>>>>>
>>>>>>> Sounds like the issue is that this patch series assumes AioContext locks
>>>>>>> are no longer required for calling the blk_*()/bdrv_*() APIs? That is
>>>>>>> not the case yet, so you had to then add those aio_context_lock() calls
>>>>>>> back in elsewhere. This approach introduces unnecessary risk. I think we
>>>>>>> should wait until blk_*()/bdrv_*() no longer requires the caller to hold
>>>>>>> the AioContext lock before applying this series.
>>>>>>
>>>>>> In general I'm in favor of pushing the lock further down into smaller and
>>>>>> smaller critical sections; it's a good approach to make further audits
>>>>>> easier until it's "obvious" that the lock is unnecessary.  I haven't yet
>>>>>> reviewed Emanuele's patches to see if this is what he's doing where he's
>>>>>> adding the acquire/release calls, but that's my understanding of both his
>>>>>> cover letter and your reply.
>>>>>
>>>>> The problem is the unnecessary risk. We know what the goal is for
>>>>> blk_*()/bdrv_*() but it's not quite there yet. Does making changes in
>>>>> block jobs help solve the final issues with blk_*()/bdrv_*()?
>>>>
>>>> Correct me if I am wrong, but it seems to me that the bdrv_*()/blk_*()
>>>> operation mostly take care of building, modifying and walking the bds graph.
>>>> So since graph nodes can have multiple AioContext, it makes sense that we
>>>> have a lock when modifying the graph, right?
>>>>
>>>> If so, we can simply try to replace the AioContext lock with a graph lock,
>>>> or something like that. But I am not sure of this.
>>>
>>> Block graph manipulation (all_bdrv_states and friends) requires the BQL.
>>> It has always been this way.
>>>
>>> This raises the question: if block graph manipulation is already under
>>> the BQL and BlockDriver callbacks don't need the AioContext anymore, why
>>
>> I don't believe that block drivers are thread-safe now. They have some mutexes.. But who make an audit of thread-safety?
> 
> Emanuele :)
> 
> FWIW I took a look at the stream, mirror, and backup jobs today and
> couldn't find anything that's unsafe after this series. I was expecting
> to find issues but I think Emanuele and Paolo have taken care of them.


Hmm, do you mean that all jobs are thread-safe?

Looking at the mirror, what protects s->ops_in_flight, for example? It's accessed from job coroutines and from the mirror_top filter's write operations.

> 
>>> are aio_context_acquire() calls still needed in block jobs?
>>>
>>> AIO_WAIT_WHILE() requires that AioContext is acquired according to its
>>> documentation, but I'm not sure that's true anymore. Thread-safe/atomic
>>> primitives are used by AIO_WAIT_WHILE(), so as long as the condition
>>> being waited for is thread-safe too it should work without the
>>> AioContext lock.
>>>
>>> Back to my comment about unnecessary risk, pushing the lock down is a
>>> strategy for exploring the problem, but I'm not sure those intermediate
>>> commits need to be committed to qemu.git/master because of the time
>>> required to review them and the risk of introducing (temporary) bugs.
>>
>> I agree. Add my bit of criticism:
>>
>> What I dislike about the whole thread-safe update you do:
>>
>> 1. There is no proof of concept - some good example of multiqueue, or something that uses mutliple threads and shows good performance. Something that works, and shows where are we going to and why it is good. That may be draft patches with a lot of "FIXME" and "TODO", but working. For now I feel that I've spent my time to reviewing and proving to myself thread-safety of two previous thread-safe series, but I don't have a hope to see a benefit of it in the near future..
> 
> The multi-queue block layer should improve performance in cases where
> the bottleneck is a single IOThread. It will allow users to assign more
> than one IOThread.
> 
> But I think the bigger impact of this work will be addressing
> long-standing problems with the block layer's programming model. We
> continue to have IOThread bugs because there are many undocumented
> assumptions. With the multi-queue block layer the code switches to a
> more well-understood multi-threaded programming model and hopefully
> fewer issues will arise because there is no problematic AioContext lock
> to worry about.
> 
> Stefan
>
Stefan Hajnoczi July 15, 2021, 1:29 p.m. UTC | #13
On Thu, Jul 15, 2021 at 03:35:37PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 13.07.2021 19:38, Stefan Hajnoczi wrote:
> > On Tue, Jul 13, 2021 at 06:18:39PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > 13.07.2021 16:10, Stefan Hajnoczi wrote:
> > > > On Mon, Jul 12, 2021 at 10:41:46AM +0200, Emanuele Giuseppe Esposito wrote:
> > > > > 
> > > > > 
> > > > > On 08/07/2021 15:04, Stefan Hajnoczi wrote:
> > > > > > On Thu, Jul 08, 2021 at 01:32:12PM +0200, Paolo Bonzini wrote:
> > > > > > > On 08/07/21 12:36, Stefan Hajnoczi wrote:
> > > > > > > > > What is very clear from this patch is that it
> > > > > > > > > is strictly related to the brdv_* and lower level calls, because
> > > > > > > > > they also internally check or even use the aiocontext lock.
> > > > > > > > > Therefore, in order to make it work, I temporarly added some
> > > > > > > > > aiocontext_acquire/release pair around the function that
> > > > > > > > > still assert for them or assume they are hold and temporarly
> > > > > > > > > unlock (unlock() - lock()).
> > > > > > > > 
> > > > > > > > Sounds like the issue is that this patch series assumes AioContext locks
> > > > > > > > are no longer required for calling the blk_*()/bdrv_*() APIs? That is
> > > > > > > > not the case yet, so you had to then add those aio_context_lock() calls
> > > > > > > > back in elsewhere. This approach introduces unnecessary risk. I think we
> > > > > > > > should wait until blk_*()/bdrv_*() no longer requires the caller to hold
> > > > > > > > the AioContext lock before applying this series.
> > > > > > > 
> > > > > > > In general I'm in favor of pushing the lock further down into smaller and
> > > > > > > smaller critical sections; it's a good approach to make further audits
> > > > > > > easier until it's "obvious" that the lock is unnecessary.  I haven't yet
> > > > > > > reviewed Emanuele's patches to see if this is what he's doing where he's
> > > > > > > adding the acquire/release calls, but that's my understanding of both his
> > > > > > > cover letter and your reply.
> > > > > > 
> > > > > > The problem is the unnecessary risk. We know what the goal is for
> > > > > > blk_*()/bdrv_*() but it's not quite there yet. Does making changes in
> > > > > > block jobs help solve the final issues with blk_*()/bdrv_*()?
> > > > > 
> > > > > Correct me if I am wrong, but it seems to me that the bdrv_*()/blk_*()
> > > > > operation mostly take care of building, modifying and walking the bds graph.
> > > > > So since graph nodes can have multiple AioContext, it makes sense that we
> > > > > have a lock when modifying the graph, right?
> > > > > 
> > > > > If so, we can simply try to replace the AioContext lock with a graph lock,
> > > > > or something like that. But I am not sure of this.
> > > > 
> > > > Block graph manipulation (all_bdrv_states and friends) requires the BQL.
> > > > It has always been this way.
> > > > 
> > > > This raises the question: if block graph manipulation is already under
> > > > the BQL and BlockDriver callbacks don't need the AioContext anymore, why
> > > 
> > > I don't believe that block drivers are thread-safe now. They have some mutexes.. But who make an audit of thread-safety?
> > 
> > Emanuele :)
> > 
> > FWIW I took a look at the stream, mirror, and backup jobs today and
> > couldn't find anything that's unsafe after this series. I was expecting
> > to find issues but I think Emanuele and Paolo have taken care of them.
> 
> 
> Hmm, do you mean that all jobs are thread-safe?
> 
> Looking at the mirror, what protects s->ops_in_flight for example? It's accessed from job coroutines and from mirror_top filter write operation.

You're right. I missed the bdrv_mirror_top BlockDriver:

.pwrite_zeroes -> bdrv_mirror_top_pwrite_zeroes -> active_write_prepare -> QTAILQ_INSERT_TAIL(&s->ops_in_flight, op, next)

This is not thread-safe. A CoMutex is needed here to protect the
MirrorBDSOpaque fields.
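
A minimal sketch of the kind of fix that would be needed (the ops_lock field
and its placement are assumptions, not a reviewed patch):

    /* in block/mirror.c, with a new "CoMutex ops_lock" next to ops_in_flight */
    static void coroutine_fn active_write_track_example(MirrorBlockJob *s,
                                                         MirrorOp *op)
    {
        qemu_co_mutex_lock(&s->ops_lock);
        QTAILQ_INSERT_TAIL(&s->ops_in_flight, op, next);
        qemu_co_mutex_unlock(&s->ops_lock);
    }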

Stefan
Kevin Wolf July 16, 2021, 3:23 p.m. UTC | #14
Am 13.07.2021 um 15:10 hat Stefan Hajnoczi geschrieben:
> AIO_WAIT_WHILE() requires that AioContext is acquired according to its
> documentation, but I'm not sure that's true anymore. Thread-safe/atomic
> primitives are used by AIO_WAIT_WHILE(), so as long as the condition
> being waited for is thread-safe too it should work without the
> AioContext lock.

Polling something in a different AioContext from the main thread still
temporarily drops the lock, which crashes if it isn't locked. I'm not
sure whether the documentation claims that the lock is needed in more cases;
I guess you could interpret it either way.

Kevin
Stefan Hajnoczi July 19, 2021, 9:29 a.m. UTC | #15
On Fri, Jul 16, 2021 at 05:23:50PM +0200, Kevin Wolf wrote:
> Am 13.07.2021 um 15:10 hat Stefan Hajnoczi geschrieben:
> > AIO_WAIT_WHILE() requires that AioContext is acquired according to its
> > documentation, but I'm not sure that's true anymore. Thread-safe/atomic
> > primitives are used by AIO_WAIT_WHILE(), so as long as the condition
> > being waited for is thread-safe too it should work without the
> > AioContext lock.
> 
> Polling something in a different AioContext from the main thread still
> temporarily drops the lock, which crashes if it isn't locked. I'm not
> sure if the documentation claims that the lock is needed in more cases,
> I guess you could interpret it either way.

I'm claiming that the lock doesn't need to be dropped in that case
anymore - as long as the condition we're polling is thread-safe. :)

Have I missed something that still needs locking?

We could temporarily introduce an AIO_WAIT_WHILE_UNLOCKED() macro so
that callers can be converted individually.
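
Purely as a sketch (the macro does not exist yet, and the condition is just
an example), a converted caller would not hold the AioContext lock and would
poll a thread-safe condition:

    static void bdrv_wait_io_quiesced_example(BlockDriverState *bs)
    {
        AIO_WAIT_WHILE_UNLOCKED(bdrv_get_aio_context(bs),
                                qatomic_read(&bs->in_flight) > 0);
    }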

Stefan
Kevin Wolf July 19, 2021, 2:54 p.m. UTC | #16
Am 19.07.2021 um 11:29 hat Stefan Hajnoczi geschrieben:
> On Fri, Jul 16, 2021 at 05:23:50PM +0200, Kevin Wolf wrote:
> > Am 13.07.2021 um 15:10 hat Stefan Hajnoczi geschrieben:
> > > AIO_WAIT_WHILE() requires that AioContext is acquired according to its
> > > documentation, but I'm not sure that's true anymore. Thread-safe/atomic
> > > primitives are used by AIO_WAIT_WHILE(), so as long as the condition
> > > being waited for is thread-safe too it should work without the
> > > AioContext lock.
> > 
> > Polling something in a different AioContext from the main thread still
> > temporarily drops the lock, which crashes if it isn't locked. I'm not
> > sure if the documentation claims that the lock is needed in more cases,
> > I guess you could interpret it either way.
> 
> I'm claiming that the lock doesn't need to be dropped in that case
> anymore - as long as the condition we're polling is thread-safe. :)
> 
> Have I missed something that still need locking?

I'm not sure if AIO_WAIT_WHILE() actually ever needed the locking. I
think it's more a convenience thing since the callers would already hold
the lock, so dropping it temporarily in AIO_WAIT_WHILE() means that the
callers don't have to duplicate the temporary unlock everywhere.

> We could temporarily introduce an AIO_WAIT_WHILE_UNLOCKED() macro so
> that callers can be converted individually.

Yes, this makes sense to me.

Kevin