diff mbox

linux-user: always start with parallel_cpus set to true

Message ID 1482946636-3743-1-git-send-email-laurent@vivier.eu (mailing list archive)
State New, archived
Headers show

Commit Message

Laurent Vivier Dec. 28, 2016, 5:37 p.m. UTC
We always need real atomics, as we can have shared memory between
processes.

A good test case is the example from futex(2), futex_demo.c:

the use case is

    mmap(...);
    fork();

    Parent and Child:

    while(...)
        __sync_bool_compare_and_swap(...)
        ...
        futex(...)

In this case we need real atomics in __sync_bool_compare_and_swap(),
but as parallel_cpus is set to 0, we don't have.

We also revert "b67cb68 linux-user: enable parallel code generation on clone"
as parallel_cpus in unconditionally set now.

Of course, this doesn't fix atomics that are emulated using
cpu_loop_exit_atomic() as we can't stop virtual CPUs from another processes.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
---
 linux-user/syscall.c | 8 --------
 translate-all.c      | 4 ++++
 2 files changed, 4 insertions(+), 8 deletions(-)

Comments

Alex Bennée Jan. 4, 2017, 6:39 p.m. UTC | #1
Laurent Vivier <laurent@vivier.eu> writes:

> We always need real atomics, as we can have shared memory between
> processes.
>
> A good test case is the example from futex(2), futex_demo.c:
>
> the use case is
>
>     mmap(...);
>     fork();
>
>     Parent and Child:
>
>     while(...)
>         __sync_bool_compare_and_swap(...)
>         ...
>         futex(...)
>
> In this case we need real atomics in __sync_bool_compare_and_swap(),
> but as parallel_cpus is set to 0, we don't have.
>
> We also revert "b67cb68 linux-user: enable parallel code generation on clone"
> as parallel_cpus in unconditionally set now.
>
> Of course, this doesn't fix atomics that are emulated using
> cpu_loop_exit_atomic() as we can't stop virtual CPUs from another
> processes.

This seems a bit of a hit for something that might never get called.
Could we not move b67cb68 out of the thread fork leg and do it for any
fork? After all the tb_flush will ensure that all code by both parent
and child will be using full atomics at that point?

>
> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
> ---
>  linux-user/syscall.c | 8 --------
>  translate-all.c      | 4 ++++
>  2 files changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 7b77503..db697c0 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -6164,14 +6164,6 @@ static int do_fork(CPUArchState *env, unsigned int flags, abi_ulong newsp,
>          sigfillset(&sigmask);
>          sigprocmask(SIG_BLOCK, &sigmask, &info.sigmask);
>
> -        /* If this is our first additional thread, we need to ensure we
> -         * generate code for parallel execution and flush old translations.
> -         */
> -        if (!parallel_cpus) {
> -            parallel_cpus = true;
> -            tb_flush(cpu);
> -        }
> -
>          ret = pthread_create(&info.thread, &attr, clone_func, &info);
>          /* TODO: Free new CPU state if thread creation failed.  */
>
> diff --git a/translate-all.c b/translate-all.c
> index 3dd9214..0b0bb09 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -142,7 +142,11 @@ static void *l1_map[V_L1_MAX_SIZE];
>
>  /* code generation context */
>  TCGContext tcg_ctx;
> +#ifdef CONFIG_USER_ONLY
> +bool parallel_cpus = true;
> +#else
>  bool parallel_cpus;
> +#endif
>
>  /* translation block context */
>  #ifdef CONFIG_USER_ONLY


--
Alex Bennée
Richard Henderson Jan. 4, 2017, 7:35 p.m. UTC | #2
On 12/28/2016 09:37 AM, Laurent Vivier wrote:
> the use case is
>
>     mmap(...);
>     fork();

While true, we can notice that mmap contains MAP_SHARED, and trigger it then. 
Similarly for shmat.


r~
Laurent Vivier Jan. 4, 2017, 8:21 p.m. UTC | #3
Le 04/01/2017 à 19:39, Alex Bennée a écrit :
> 
> Laurent Vivier <laurent@vivier.eu> writes:
> 
>> We always need real atomics, as we can have shared memory between
>> processes.
>>
>> A good test case is the example from futex(2), futex_demo.c:
>>
>> the use case is
>>
>>     mmap(...);
>>     fork();
>>
>>     Parent and Child:
>>
>>     while(...)
>>         __sync_bool_compare_and_swap(...)
>>         ...
>>         futex(...)
>>
>> In this case we need real atomics in __sync_bool_compare_and_swap(),
>> but as parallel_cpus is set to 0, we don't have.
>>
>> We also revert "b67cb68 linux-user: enable parallel code generation on clone"
>> as parallel_cpus in unconditionally set now.
>>
>> Of course, this doesn't fix atomics that are emulated using
>> cpu_loop_exit_atomic() as we can't stop virtual CPUs from another
>> processes.
> 
> This seems a bit of a hit for something that might never get called.
> Could we not move b67cb68 out of the thread fork leg and do it for any
> fork? After all the tb_flush will ensure that all code by both parent
> and child will be using full atomics at that point?

Yes, but we can have also two processes that are neither parents nor
children trying to communicate through shared memory. We can't know...

Laurent
Laurent Vivier Jan. 4, 2017, 8:22 p.m. UTC | #4
Le 04/01/2017 à 20:35, Richard Henderson a écrit :
> On 12/28/2016 09:37 AM, Laurent Vivier wrote:
>> the use case is
>>
>>     mmap(...);
>>     fork();
> 
> While true, we can notice that mmap contains MAP_SHARED, and trigger it
> then. Similarly for shmat.

Yes, it's a good idea, but is it really expensive to always enable the
parallel_cpus flag?

Laurent
Richard Henderson Jan. 4, 2017, 8:36 p.m. UTC | #5
On 01/04/2017 12:22 PM, Laurent Vivier wrote:
> Yes, it's a good idea, but is it really expensive to always enable the
> parallel_cpus flag?

It is somewhat expensive when the host does support the atomics being used; it 
is very expensive if the host does not.  Of course the common case is x86_64 
host, which does, so perhaps we don't really care.


r~
diff mbox

Patch

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 7b77503..db697c0 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6164,14 +6164,6 @@  static int do_fork(CPUArchState *env, unsigned int flags, abi_ulong newsp,
         sigfillset(&sigmask);
         sigprocmask(SIG_BLOCK, &sigmask, &info.sigmask);
 
-        /* If this is our first additional thread, we need to ensure we
-         * generate code for parallel execution and flush old translations.
-         */
-        if (!parallel_cpus) {
-            parallel_cpus = true;
-            tb_flush(cpu);
-        }
-
         ret = pthread_create(&info.thread, &attr, clone_func, &info);
         /* TODO: Free new CPU state if thread creation failed.  */
 
diff --git a/translate-all.c b/translate-all.c
index 3dd9214..0b0bb09 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -142,7 +142,11 @@  static void *l1_map[V_L1_MAX_SIZE];
 
 /* code generation context */
 TCGContext tcg_ctx;
+#ifdef CONFIG_USER_ONLY
+bool parallel_cpus = true;
+#else
 bool parallel_cpus;
+#endif
 
 /* translation block context */
 #ifdef CONFIG_USER_ONLY