Message ID: 20230227173632.3292573-1-surenb@google.com
Series: Per-VMA locks
On Mon, Feb 27, 2023 at 09:35:59AM -0800, Suren Baghdasaryan wrote:
<...>
> Laurent Dufour (1):
>   powerc/mm: try VMA lock-based page fault handling first

Hi,

This series, and specifically the commit above, broke docker on PPC.
It causes the docker service to get stuck while trying to activate.
Reverting this commit allows us to use docker again.

[user@ppc-135-3-200-205 ~]# sudo systemctl status docker
● docker.service - Docker Application Container Engine
     Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
     Active: activating (start) since Mon 2023-06-26 14:47:07 IDT; 3h 50min ago
TriggeredBy: ● docker.socket
       Docs: https://docs.docker.com
   Main PID: 276555 (dockerd)
     Memory: 44.2M
     CGroup: /system.slice/docker.service
             └─ 276555 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.129383166+03:00" level=info msg="Graph migration to content-addressability took 0.00 se>
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.129666160+03:00" level=warning msg="Your kernel does not support cgroup cfs period"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.129684117+03:00" level=warning msg="Your kernel does not support cgroup cfs quotas"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.129697085+03:00" level=warning msg="Your kernel does not support cgroup rt period"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.129711513+03:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.129720656+03:00" level=warning msg="Unable to find blkio cgroup in mounts"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.129805617+03:00" level=warning msg="mountpoint for pids not found"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.130199070+03:00" level=info msg="Loading containers: start."
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.132688568+03:00" level=warning msg="Running modprobe bridge br_netfilter failed with me>
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.271014050+03:00" level=info msg="Default bridge (docker0) is assigned with an IP addres>

Python script which we used for bisect:

import subprocess
import time
import sys


def run_command(cmd):
    print('running:', cmd)

    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    try:
        stdout, stderr = p.communicate(timeout=30)

    except subprocess.TimeoutExpired:
        return True

    print(stdout.decode())
    print(stderr.decode())
    print('rc:', p.returncode)

    return False


def main():
    commands = [
        'sudo systemctl stop docker',
        'sudo systemctl status docker',
        'sudo systemctl is-active docker',
        'sudo systemctl start docker',
        'sudo systemctl status docker',
    ]

    for i in range(1000):
        title = f'Try no. {i + 1}'
        print('*' * 50, title, '*' * 50)

        for cmd in commands:
            if run_command(cmd):
                print(f'Reproduced on try no. {i + 1}!')
                print(f'"{cmd}" is stuck!')

                return 1

        print('\n')
        time.sleep(30)
    return 0

if __name__ == '__main__':
    sys.exit(main())

Thanks
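The core of the reproducer is a hang check: a command counts as stuck if it has not exited within 30 seconds. Below is a minimal sketch of the same check, assuming Python 3.7+ and argv lists instead of shell strings; `is_stuck` and `reproduce` are hypothetical names, not part of the posted script. It uses `subprocess.run()`, which kills and waits for the child when the timeout expires, whereas `Popen.communicate(timeout=...)` leaves the child running.

```python
import subprocess
import time


def is_stuck(argv, timeout=30.0):
    """Return True if argv has not exited after `timeout` seconds."""
    try:
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and waited for the child here.
        return True
    return False


def reproduce(commands, tries=1000, pause=30.0, timeout=30.0):
    """Cycle through `commands` until one of them hangs.

    Returns (try_number, argv) for the first stuck command, or None if
    every command finished in time on every try.
    """
    for attempt in range(1, tries + 1):
        for argv in commands:
            if is_stuck(argv, timeout):
                return attempt, argv
        time.sleep(pause)
    return None


if __name__ == "__main__":
    # Same sequence as the bisect script; needs sudo and systemd.
    docker_cycle = [
        ["sudo", "systemctl", "stop", "docker"],
        ["sudo", "systemctl", "status", "docker"],
        ["sudo", "systemctl", "is-active", "docker"],
        ["sudo", "systemctl", "start", "docker"],
        ["sudo", "systemctl", "status", "docker"],
    ]
    result = reproduce(docker_cycle)
    if result is not None:
        attempt, argv = result
        print(f'Reproduced on try no. {attempt}: "{" ".join(argv)}" is stuck!')
    raise SystemExit(0 if result is None else 1)
```

The 0/1 exit status keeps it usable as a pass/fail check at each bisect step; for a kernel bisect, each step still needs the candidate kernel built and booted before the check runs.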
On 7/11/23 12:35, Leon Romanovsky wrote:
>
> On Mon, Feb 27, 2023 at 09:35:59AM -0800, Suren Baghdasaryan wrote:
>
> <...>
>
>> Laurent Dufour (1):
>>   powerc/mm: try VMA lock-based page fault handling first
>
> Hi,
>
> This series, and specifically the commit above, broke docker on PPC.
> It causes the docker service to get stuck while trying to activate.
> Reverting this commit allows us to use docker again.

Hi,

There have been follow-up fixes that are part of 6.4.3 stable (also
6.5-rc1). Does that version work for you?

Vlastimil

> <...>
On Tue, Jul 11, 2023 at 12:39:34PM +0200, Vlastimil Babka wrote:
> On 7/11/23 12:35, Leon Romanovsky wrote:
> >
> > <...>
> >
> > This series, and specifically the commit above, broke docker on PPC.
> > It causes the docker service to get stuck while trying to activate.
> > Reverting this commit allows us to use docker again.
>
> Hi,
>
> There have been follow-up fixes that are part of 6.4.3 stable (also
> 6.5-rc1). Does that version work for you?

I'll recheck it on a clean system, but for the record:
1. We are running 6.5-rc1 kernels.
2. PPC doesn't compile for us on -rc1 without this fix:
   https://lore.kernel.org/all/20230629124500.1.I55e2f4e7903d686c4484cb23c033c6a9e1a9d4c4@changeid/
3. I didn't see anything relevant in -rc1 with "git log arch/powerpc/mm/fault.c".

Do you have anything specific in mind to check?

Thanks
On Tue, Jul 11, 2023 at 02:01:41PM +0300, Leon Romanovsky wrote:
> On Tue, Jul 11, 2023 at 12:39:34PM +0200, Vlastimil Babka wrote:
> > <...>
> >
> > There have been follow-up fixes that are part of 6.4.3 stable (also
> > 6.5-rc1). Does that version work for you?
>
> I'll recheck it on a clean system, but for the record:
> 1. We are running 6.5-rc1 kernels.
> 2. PPC doesn't compile for us on -rc1 without this fix:
>    https://lore.kernel.org/all/20230629124500.1.I55e2f4e7903d686c4484cb23c033c6a9e1a9d4c4@changeid/

Ohh, I see it in -rc1, let's recheck.

> 3. I didn't see anything relevant in -rc1 with "git log arch/powerpc/mm/fault.c".
>
> Do you have anything specific in mind to check?
>
> Thanks
On Tue, Jul 11, 2023 at 4:09 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Jul 11, 2023 at 02:01:41PM +0300, Leon Romanovsky wrote:
> > On Tue, Jul 11, 2023 at 12:39:34PM +0200, Vlastimil Babka wrote:
> > > <...>
> > >
> > > There have been follow-up fixes that are part of 6.4.3 stable (also
> > > 6.5-rc1). Does that version work for you?
> >
> > I'll recheck it on a clean system, but for the record:
> > 1. We are running 6.5-rc1 kernels.
> > 2. PPC doesn't compile for us on -rc1 without this fix:
> >    https://lore.kernel.org/all/20230629124500.1.I55e2f4e7903d686c4484cb23c033c6a9e1a9d4c4@changeid/
>
> Ohh, I see it in -rc1, let's recheck.

Hi Leon,
Please let us know how it goes.

>
> > 3. I didn't see anything relevant in -rc1 with "git log arch/powerpc/mm/fault.c".

The fixes Vlastimil was referring to are not in fault.c; they are in
the main mm and fork code. More specifically, check that these
patches exist in the branch you are testing:

mm: lock newly mapped VMA with corrected ordering
fork: lock VMAs of the parent process when forking
mm: lock newly mapped VMA which can be modified after it becomes visible
mm: lock a vma before stack expansion

Thanks,
Suren.

> >
> > Do you have anything specific in mind to check?
> >
> > Thanks
> >
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
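Because the fixes do not touch arch/powerpc/mm/fault.c, a path-limited `git log` will not show them; checking by commit subject does. Below is a minimal sketch, assuming `git` is on PATH, that it runs inside a kernel tree, and that the stable backports keep the upstream subject lines verbatim (matching is exact, after trimming whitespace). `branch_subjects`, `missing_fixes` and the script name `check_fixes.py` are hypothetical.

```python
import subprocess
import sys

# Subjects of the follow-up fixes listed in the thread.
FIX_SUBJECTS = [
    "mm: lock newly mapped VMA with corrected ordering",
    "fork: lock VMAs of the parent process when forking",
    "mm: lock newly mapped VMA which can be modified after it becomes visible",
    "mm: lock a vma before stack expansion",
]


def branch_subjects(rev_range="HEAD", repo="."):
    """Return the subject line of every commit reachable in rev_range."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%s", rev_range],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return out.splitlines()


def missing_fixes(subjects, wanted=FIX_SUBJECTS):
    """Return the entries of `wanted` that do not appear in `subjects`."""
    present = {s.strip() for s in subjects}
    return [w for w in wanted if w not in present]


if __name__ == "__main__":
    # Usage: python3 check_fixes.py [rev-range]   (defaults to HEAD)
    rev_range = sys.argv[1] if len(sys.argv) > 1 else "HEAD"
    missing = missing_fixes(branch_subjects(rev_range))
    for subject in missing:
        print("missing:", subject)
    sys.exit(1 if missing else 0)
```

Scanning the full history of a kernel tree is slow; passing a bounded revision range speeds it up, at the risk of reporting a fix as missing if it landed before the range start. A single subject can also be checked by hand with `git log --oneline --fixed-strings --grep="<subject>"`.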
On Tue, Jul 11, 2023 at 09:35:13AM -0700, Suren Baghdasaryan wrote:
> On Tue, Jul 11, 2023 at 4:09 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > <...>
> >
> > Ohh, I see it in -rc1, let's recheck.
>
> Hi Leon,
> Please let us know how it goes.

Once we rebuilt a clean -rc1, docker worked for us.
Sorry for the noise.

> > > 3. I didn't see anything relevant in -rc1 with "git log arch/powerpc/mm/fault.c".
>
> The fixes Vlastimil was referring to are not in fault.c; they are in
> the main mm and fork code. More specifically, check that these
> patches exist in the branch you are testing:
>
> mm: lock newly mapped VMA with corrected ordering
> fork: lock VMAs of the parent process when forking
> mm: lock newly mapped VMA which can be modified after it becomes visible
> mm: lock a vma before stack expansion

Thanks

>
> Thanks,
> Suren.