diff mbox series

[bpf-next,1/3] bpf: Use this_cpu_{inc|dec|inc_return} for bpf_task_storage_busy

Message ID 20220829142752.330094-2-houtao@huaweicloud.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Series make bpf_task_storage_busy being preemption-safe

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 2 this patch: 2
netdev/cc_maintainers fail 2 blamed authors not CCed: song@kernel.org martin.lau@linux.dev; 2 maintainers not CCed: song@kernel.org martin.lau@linux.dev
netdev/build_clang success Errors and warnings before: 5 this patch: 5
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 2 this patch: 2
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 37 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-4 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-5 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-1 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-6 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_progs on s390x with gcc

Commit Message

Hou Tao Aug. 29, 2022, 2:27 p.m. UTC
From: Hou Tao <houtao1@huawei.com>

Now migrate_disable() does not disable preemption, and on some
architectures (e.g. arm64) __this_cpu_{inc|dec|inc_return} is neither
preemption-safe nor IRQ-safe, so concurrent lookups or updates on the
same task local storage and the same CPU may leave
bpf_task_storage_busy imbalanced, after which
bpf_task_storage_trylock() will always fail.

Fix it by using this_cpu_{inc|dec|inc_return} when manipulating
bpf_task_storage_busy.

Fixes: bc235cdb423a ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/bpf_local_storage.c | 4 ++--
 kernel/bpf/bpf_task_storage.c  | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

Comments

Martin KaFai Lau Aug. 30, 2022, 12:52 a.m. UTC | #1
On Mon, Aug 29, 2022 at 10:27:50PM +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Now migrate_disable() does not disable preemption and under some
> architecture (e.g. arm64) __this_cpu_{inc|dec|inc_return} are neither
> preemption-safe nor IRQ-safe, so the concurrent lookups or updates on
> the same task local storage and the same CPU may make
> bpf_task_storage_busy be imbalanced, and bpf_task_storage_trylock()
> will always fail.
> 
> Fixing it by using this_cpu_{inc|dec|inc_return} when manipulating
> bpf_task_storage_busy.
> 
> Fixes: bc235cdb423a ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/bpf_local_storage.c | 4 ++--
>  kernel/bpf/bpf_task_storage.c  | 8 ++++----
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> index 4ee2e7286c23..802fc15b0d73 100644
> --- a/kernel/bpf/bpf_local_storage.c
> +++ b/kernel/bpf/bpf_local_storage.c
> @@ -555,11 +555,11 @@ void bpf_local_storage_map_free(struct bpf_local_storage_map *smap,
>  				struct bpf_local_storage_elem, map_node))) {
>  			if (busy_counter) {
>  				migrate_disable();
> -				__this_cpu_inc(*busy_counter);
> +				this_cpu_inc(*busy_counter);
>  			}
>  			bpf_selem_unlink(selem, false);
>  			if (busy_counter) {
> -				__this_cpu_dec(*busy_counter);
> +				this_cpu_dec(*busy_counter);
>  				migrate_enable();
>  			}
>  			cond_resched_rcu();
> diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
> index e9014dc62682..6f290623347e 100644
> --- a/kernel/bpf/bpf_task_storage.c
> +++ b/kernel/bpf/bpf_task_storage.c
> @@ -26,20 +26,20 @@ static DEFINE_PER_CPU(int, bpf_task_storage_busy);
>  static void bpf_task_storage_lock(void)
>  {
>  	migrate_disable();
> -	__this_cpu_inc(bpf_task_storage_busy);
> +	this_cpu_inc(bpf_task_storage_busy);
>  }
>  
>  static void bpf_task_storage_unlock(void)
>  {
> -	__this_cpu_dec(bpf_task_storage_busy);
> +	this_cpu_dec(bpf_task_storage_busy);
>  	migrate_enable();
>  }
>  
>  static bool bpf_task_storage_trylock(void)
>  {
>  	migrate_disable();
> -	if (unlikely(__this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
> -		__this_cpu_dec(bpf_task_storage_busy);
> +	if (unlikely(this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
> +		this_cpu_dec(bpf_task_storage_busy);
This change is only needed here but not in the htab fix [0];
are you planning to address that separately?

[0]: https://lore.kernel.org/bpf/20220829023709.1958204-2-houtao@huaweicloud.com/
Hou Tao Aug. 30, 2022, 2:21 a.m. UTC | #2
Hi,

On 8/30/2022 8:52 AM, Martin KaFai Lau wrote:
> On Mon, Aug 29, 2022 at 10:27:50PM +0800, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Now migrate_disable() does not disable preemption and under some
>> architecture (e.g. arm64) __this_cpu_{inc|dec|inc_return} are neither
>> preemption-safe nor IRQ-safe, so the concurrent lookups or updates on
>> the same task local storage and the same CPU may make
>> bpf_task_storage_busy be imbalanced, and bpf_task_storage_trylock()
>> will always fail.
>>
>> Fixing it by using this_cpu_{inc|dec|inc_return} when manipulating
>> bpf_task_storage_busy.
SNIP
>>  static bool bpf_task_storage_trylock(void)
>>  {
>>  	migrate_disable();
>> -	if (unlikely(__this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
>> -		__this_cpu_dec(bpf_task_storage_busy);
>> +	if (unlikely(this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
>> +		this_cpu_dec(bpf_task_storage_busy);
> This change is only needed here but not in the htab fix [0]
> or you are planning to address it separately ?
>
> [0]: https://lore.kernel.org/bpf/20220829023709.1958204-2-houtao@huaweicloud.com/
> .
For htab_lock_bucket() in the hash table there is, in theory, a problem
as shown below, but I cannot reproduce it on an ARM64 host:

*p = 0

// process A
r0 = *p
r0 += 1
            // process B
            r1 = *p
// *p = 1
*p = r0
            r1 += 1
            // *p = 1
            *p = r1

// r0 = 1
r0 = *p
            // r1 = 1
            r1 = *p

In the hash table fix, migrate_disable() in htab_lock_bucket() is replaced
by preempt_disable(), so the case above becomes impossible. And if process A
is preempted by an IRQ, __this_cpu_inc_return will be OK.

Besides the bpf hash table, there are other __this_cpu_inc_return users in
bpf (e.g. __bpf_prog_enter); maybe I should fix those usages as well?
Martin KaFai Lau Aug. 31, 2022, 12:41 a.m. UTC | #3
On Tue, Aug 30, 2022 at 10:21:30AM +0800, Hou Tao wrote:
> Hi,
> 
> On 8/30/2022 8:52 AM, Martin KaFai Lau wrote:
> > On Mon, Aug 29, 2022 at 10:27:50PM +0800, Hou Tao wrote:
> >> From: Hou Tao <houtao1@huawei.com>
> >>
> >> Now migrate_disable() does not disable preemption and under some
> >> architecture (e.g. arm64) __this_cpu_{inc|dec|inc_return} are neither
> >> preemption-safe nor IRQ-safe, so the concurrent lookups or updates on
> >> the same task local storage and the same CPU may make
> >> bpf_task_storage_busy be imbalanced, and bpf_task_storage_trylock()
> >> will always fail.
> >>
> >> Fixing it by using this_cpu_{inc|dec|inc_return} when manipulating
> >> bpf_task_storage_busy.
> SNIP
> >>  static bool bpf_task_storage_trylock(void)
> >>  {
> >>  	migrate_disable();
> >> -	if (unlikely(__this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
> >> -		__this_cpu_dec(bpf_task_storage_busy);
> >> +	if (unlikely(this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
> >> +		this_cpu_dec(bpf_task_storage_busy);
> > This change is only needed here but not in the htab fix [0]
> > or you are planning to address it separately ?
> >
> > [0]: https://lore.kernel.org/bpf/20220829023709.1958204-2-houtao@huaweicloud.com/
> > .
> For htab_lock_bucket() in hash-table, in theory there will be problem as shown
> below, but I can not reproduce it on ARM64 host:
> 
> *p = 0
> 
> // process A
> r0 = *p
> r0 += 1
>             // process B
>             r1 = *p
> // *p = 1
> *p = r0
>             r1 += 1
>             // *p = 1
>             *p = r1
> 
> // r0 = 1
> r0 = *p
>             // r1 = 1
>             r1 = *p
> 
> In hash table fixes, migrate_disable() in htab_lock_bucket()  is replaced by
> preempt_disable(), so the above case will be impossible. And if process A is
> preempted by IRQ, __this_cpu_inc_return will be OK.
hmm... iiuc, it is fine for the preempted-by-IRQ case because the
operations won't be interleaved as in the process A/B example above?
That should also be true for the task-storage case here,
meaning only CONFIG_PREEMPT can reproduce it?  If that is the
case, please also mention that in the commit message.
Hou Tao Aug. 31, 2022, 2:04 a.m. UTC | #4
Hi,

On 8/31/2022 8:41 AM, Martin KaFai Lau wrote:
> On Tue, Aug 30, 2022 at 10:21:30AM +0800, Hou Tao wrote:
>> Hi,
>>
>> On 8/30/2022 8:52 AM, Martin KaFai Lau wrote:
>>> On Mon, Aug 29, 2022 at 10:27:50PM +0800, Hou Tao wrote:
>>>> From: Hou Tao <houtao1@huawei.com>
>>>>
>>>> Now migrate_disable() does not disable preemption and under some
>>>> architecture (e.g. arm64) __this_cpu_{inc|dec|inc_return} are neither
>>>> preemption-safe nor IRQ-safe, so the concurrent lookups or updates on
>>>> the same task local storage and the same CPU may make
>>>> bpf_task_storage_busy be imbalanced, and bpf_task_storage_trylock()
>>>> will always fail.
>>>>
>>>> Fixing it by using this_cpu_{inc|dec|inc_return} when manipulating
>>>> bpf_task_storage_busy.
>> SNIP
>>>>  static bool bpf_task_storage_trylock(void)
>>>>  {
>>>>  	migrate_disable();
>>>> -	if (unlikely(__this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
>>>> -		__this_cpu_dec(bpf_task_storage_busy);
>>>> +	if (unlikely(this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
>>>> +		this_cpu_dec(bpf_task_storage_busy);
>>> This change is only needed here but not in the htab fix [0]
>>> or you are planning to address it separately ?
>>>
>>> [0]: https://lore.kernel.org/bpf/20220829023709.1958204-2-houtao@huaweicloud.com/
>>> .
>> For htab_lock_bucket() in hash-table, in theory there will be problem as shown
>> below, but I can not reproduce it on ARM64 host:
>>
>> *p = 0
>>
>> // process A
>> r0 = *p
>> r0 += 1
>>             // process B
>>             r1 = *p
>> // *p = 1
>> *p = r0
>>             r1 += 1
>>             // *p = 1
>>             *p = r1
>>
>> // r0 = 1
>> r0 = *p
>>             // r1 = 1
>>             r1 = *p
>>
>> In hash table fixes, migrate_disable() in htab_lock_bucket()  is replaced by
>> preempt_disable(), so the above case will be impossible. And if process A is
>> preempted by IRQ, __this_cpu_inc_return will be OK.
> hmm... iiuc, it is fine for the preempted by IRQ case because the
> operations won't be interleaved as the process A/B example above?
> That should also be true for the task-storage case here,
> meaning only CONFIG_PREEMPT can reproduce it?  If that is the
> case, please also mention that in the commit message.
> .
Yes. There will be no interleaving if it is preempted by an IRQ, and that is
also true for the task-storage case. CONFIG_PREEMPT can reproduce it, and in
theory CONFIG_PREEMPT_RT is also possible. Will add that to the commit message.

Patch

diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 4ee2e7286c23..802fc15b0d73 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -555,11 +555,11 @@  void bpf_local_storage_map_free(struct bpf_local_storage_map *smap,
 				struct bpf_local_storage_elem, map_node))) {
 			if (busy_counter) {
 				migrate_disable();
-				__this_cpu_inc(*busy_counter);
+				this_cpu_inc(*busy_counter);
 			}
 			bpf_selem_unlink(selem, false);
 			if (busy_counter) {
-				__this_cpu_dec(*busy_counter);
+				this_cpu_dec(*busy_counter);
 				migrate_enable();
 			}
 			cond_resched_rcu();
diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
index e9014dc62682..6f290623347e 100644
--- a/kernel/bpf/bpf_task_storage.c
+++ b/kernel/bpf/bpf_task_storage.c
@@ -26,20 +26,20 @@  static DEFINE_PER_CPU(int, bpf_task_storage_busy);
 static void bpf_task_storage_lock(void)
 {
 	migrate_disable();
-	__this_cpu_inc(bpf_task_storage_busy);
+	this_cpu_inc(bpf_task_storage_busy);
 }
 
 static void bpf_task_storage_unlock(void)
 {
-	__this_cpu_dec(bpf_task_storage_busy);
+	this_cpu_dec(bpf_task_storage_busy);
 	migrate_enable();
 }
 
 static bool bpf_task_storage_trylock(void)
 {
 	migrate_disable();
-	if (unlikely(__this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
-		__this_cpu_dec(bpf_task_storage_busy);
+	if (unlikely(this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
+		this_cpu_dec(bpf_task_storage_busy);
 		migrate_enable();
 		return false;
 	}