LCOW: Intermittent DNS resolution failures with Alpine containers #2371

@Iristyle

Preface: I haven't yet debugged this enough to know precisely where the problem lies, but I can reproduce it trivially and wanted to at least get the ticket filed and the conversation going. It may be related to some combination of:

  • LCOW (or LCOW image / kernel / opengcs / etc)
  • Alpine 3.9
  • Environment - containers are running inside a Server 2019 Hyper-V VM that has nested virtualization enabled
  • Docker version / some nuance of the Docker DNS resolver

I'm pretty sure Alpine in particular is involved, since the same scenario does not fail when run with Ubuntu containers instead.

docker info

Client:
 Debug Mode: false
 Plugins:
  app: Docker Application (Docker Inc., v0.8.0-beta2)
  buildx: Build with BuildKit (Docker Inc., v0.2.0-6-g509c4b6-tp)

Server:
 Containers: 2
  Running: 0
  Paused: 0
  Stopped: 2
 Images: 138
 Server Version: master-dockerproject-2019-04-28
 Storage Driver: windowsfilter (windows) lcow (linux)
  Windows:
  LCOW:
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics l2bridge l2tunnel nat null overlay transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
 Swarm: inactive
 Default Isolation: hyperv
 Kernel Version: 10.0 17763 (17763.1.amd64fre.rs5_release.180914-1434)
 Operating System: Windows 10 Enterprise Version 1809 (OS Build 17763.437)
 OSType: windows
 Architecture: x86_64
 CPUs: 2
 Total Memory: 16GiB
 Name: ci-lcow-prod-1
 ID: 0ac02c9d-aaba-42f4-8749-5a64af3068d8
 Docker Root Dir: C:\ProgramData\docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

The LCOW image is built from linuxkit/lcow@d5dfdbc, which includes kernel 4.19.27 among other bits. A merged PR updates the image with newer versions of OpenGCS, Alpine, the kernel and runc, but when I built it, it didn't launch containers and I had to revert (more info in linuxkit/lcow#45 (comment)).

Compose file to demonstrate the problem

version: '3'

services:
  foo:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "while true; do nslookup bar.internal && sleep 1s; done"
    networks:
      default:
        aliases:
         - foo.internal

  bar:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "while true; do nslookup foo.internal && sleep 1s; done"
    networks:
      default:
        aliases:
         - bar.internal

Output from compose up

The problem is that DNS resolution fails fairly regularly: foo cannot resolve bar.internal, and vice versa. The log shows some successes, but also a number of failures, and which lookups fail varies from run to run.

PS C:\source\alpine-test> docker-compose -f .\docker-compose-bad.yml up
Creating network "alpine-test_default" with the default driver
Creating alpine-test_bar_1 ... done
Creating alpine-test_foo_1 ... done
Attaching to alpine-test_foo_1, alpine-test_bar_1
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | nslookup: can't resolve 'foo.internal': Name does not resolve
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | nslookup: can't resolve 'foo.internal': Name does not resolve
bar_1  |
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve
foo_1  |
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
Gracefully stopping... (press Ctrl+C again to force)
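
To put a number on "fairly regularly", a log like the one above can be tallied per service. The following is a minimal sketch, not part of the original repro: the function name `tally_lookups` and the sample text are hypothetical, and it assumes the BusyBox nslookup output format shown above, where a `Name:` line marks a successful lookup and a `can't resolve '<hostname>'` line marks a failed one. The `'(null)'` line appears alongside both successes and failures in the log, so the sketch ignores it.

```python
import re
from collections import Counter

# Matches a docker-compose log line such as "foo_1  | Name:      bar.internal".
LINE_RE = re.compile(r"^(?P<service>\S+)\s*\|\s?(?P<message>.*)$")


def tally_lookups(log_text):
    """Count successful and failed lookups per service.

    Assumes BusyBox-style nslookup output as captured by docker-compose.
    """
    counts = {}
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw)
        if not match:
            continue  # not a container log line (e.g. "Creating network ...")
        stats = counts.setdefault(match.group("service"), Counter())
        message = match.group("message")
        if message.startswith("Name:"):
            stats["ok"] += 1
        elif "can't resolve" in message and "'(null)'" not in message:
            # The '(null)' line shows up on every lookup, so only a failure
            # that names a real hostname is counted.
            stats["failed"] += 1
    return counts


# Hypothetical excerpt in the same shape as the log above.
SAMPLE_LOG = "\n".join([
    "foo_1  | nslookup: can't resolve '(null)': Name does not resolve",
    "foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve",
    "bar_1  | nslookup: can't resolve '(null)': Name does not resolve",
    "bar_1  | Name:      foo.internal",
    "bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default",
])

for service, stats in sorted(tally_lookups(SAMPLE_LOG).items()):
    print(service, "ok:", stats["ok"], "failed:", stats["failed"])
```

Saving the compose output to a file and passing its contents to `tally_lookups` would give a per-service failure count that can be compared across runs or images.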

Workaround

One way to work around the problem is to have the Alpine container install bind-tools and run dig against the other service's hostname first, which presumably caches the DNS record for later nslookup calls.

Compose file

version: '3'

services:
  foo:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "apk add bind-tools; dig bar.internal; while true; do nslookup bar.internal; sleep 2s; done"
    networks:
      default:
        aliases:
         - foo.internal

  bar:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "apk add bind-tools; dig foo.internal; while true; do nslookup foo.internal; sleep 2s; done"
    networks:
      default:
        aliases:
         - bar.internal

Output from compose up

The nslookup output has changed quite a bit, from:

bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25

to:

bar_1  | Server:                172.25.128.1
bar_1  | Address:       172.25.128.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.25.139.149
bar_1  |

Here's a longer run from the above compose file showing that nslookup no longer fails intermittently.

PS C:\source\alpine-test> docker-compose up
Creating network "alpine-test_default" with the default driver
Creating alpine-test_bar_1 ... done
Creating alpine-test_foo_1 ... done
Attaching to alpine-test_foo_1, alpine-test_bar_1
foo_1  | fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/main/x86_64/APKINDEX.tar.gz
bar_1  | fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/main/x86_64/APKINDEX.tar.gz
foo_1  | fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/community/x86_64/APKINDEX.tar.gz
bar_1  | fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/community/x86_64/APKINDEX.tar.gz
foo_1  | (1/10) Installing libgcc (8.3.0-r0)
bar_1  | (1/10) Installing libgcc (8.3.0-r0)
bar_1  | (2/10) Installing krb5-conf (1.0-r1)
foo_1  | (2/10) Installing krb5-conf (1.0-r1)
bar_1  | (3/10) Installing libcom_err (1.44.5-r0)
foo_1  | (3/10) Installing libcom_err (1.44.5-r0)
bar_1  | (4/10) Installing keyutils-libs (1.6-r0)
foo_1  | (4/10) Installing keyutils-libs (1.6-r0)
bar_1  | (5/10) Installing libverto (0.3.0-r1)
bar_1  | (6/10) Installing krb5-libs (1.15.5-r0)
foo_1  | (5/10) Installing libverto (0.3.0-r1)
foo_1  | (6/10) Installing krb5-libs (1.15.5-r0)
bar_1  | (7/10) Installing json-c (0.13.1-r0)
bar_1  | (8/10) Installing libxml2 (2.9.9-r1)
foo_1  | (7/10) Installing json-c (0.13.1-r0)
foo_1  | (8/10) Installing libxml2 (2.9.9-r1)
bar_1  | (9/10) Installing bind-libs (9.12.4_p1-r1)
foo_1  | (9/10) Installing bind-libs (9.12.4_p1-r1)
foo_1  | (10/10) Installing bind-tools (9.12.4_p1-r1)
bar_1  | (10/10) Installing bind-tools (9.12.4_p1-r1)
foo_1  | Executing busybox-1.29.3-r10.trigger
bar_1  | Executing busybox-1.29.3-r10.trigger
bar_1  | OK: 12 MiB in 24 packages
foo_1  | OK: 12 MiB in 24 packages
foo_1  |
foo_1  | ; <<>> DiG 9.12.4-P1 <<>> bar.internal
foo_1  | ;; global options: +cmd
foo_1  | ;; Got answer:
foo_1  | ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62166
foo_1  | ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
foo_1  |
foo_1  | ;; QUESTION SECTION:
foo_1  | ;bar.internal.                 IN      A
foo_1  |
foo_1  | ;; ANSWER SECTION:
foo_1  | bar.internal.          600     IN      A       172.25.137.174
foo_1  |
foo_1  | ;; Query time: 0 msec
foo_1  | ;; SERVER: 172.25.128.1#53(172.25.128.1)
foo_1  | ;; WHEN: Fri May 03 18:26:29 UTC 2019
foo_1  | ;; MSG SIZE  rcvd: 58
foo_1  |
foo_1  | Server:                172.25.128.1
foo_1  | Address:       172.25.128.1#53
foo_1  |
foo_1  | Non-authoritative answer:
foo_1  | Name:  bar.internal
foo_1  | Address: 172.25.137.174
foo_1  |
bar_1  |
bar_1  | ; <<>> DiG 9.12.4-P1 <<>> foo.internal
bar_1  | ;; global options: +cmd
bar_1  | ;; Got answer:
bar_1  | ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34929
bar_1  | ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
bar_1  |
bar_1  | ;; QUESTION SECTION:
bar_1  | ;foo.internal.                 IN      A
bar_1  |
bar_1  | ;; ANSWER SECTION:
bar_1  | foo.internal.          600     IN      A       172.25.139.149
bar_1  |
bar_1  | ;; Query time: 0 msec
bar_1  | ;; SERVER: 172.25.128.1#53(172.25.128.1)
bar_1  | ;; WHEN: Fri May 03 18:26:29 UTC 2019
bar_1  | ;; MSG SIZE  rcvd: 58
bar_1  |
bar_1  | Server:                172.25.128.1
bar_1  | Address:       172.25.128.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.25.139.149
bar_1  |
foo_1  | Server:                172.25.128.1
foo_1  | Address:       172.25.128.1#53
foo_1  |
foo_1  | Non-authoritative answer:
foo_1  | Name:  bar.internal
foo_1  | Address: 172.25.137.174
foo_1  |
bar_1  | Server:                172.25.128.1
bar_1  | Address:       172.25.128.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.25.139.149
bar_1  |
foo_1  | Server:                172.25.128.1
foo_1  | Address:       172.25.128.1#53
foo_1  |
foo_1  | Non-authoritative answer:
foo_1  | Name:  bar.internal
foo_1  | Address: 172.25.137.174
foo_1  |
bar_1  | Server:                172.25.128.1
bar_1  | Address:       172.25.128.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.25.139.149
bar_1  |
foo_1  | Server:                172.25.128.1
foo_1  | Address:       172.25.128.1#53
foo_1  |
foo_1  | Non-authoritative answer:
foo_1  | Name:  bar.internal
foo_1  | Address: 172.25.137.174
foo_1  |
bar_1  | Server:                172.25.128.1
bar_1  | Address:       172.25.128.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.25.139.149
bar_1  |
foo_1  | Server:                172.25.128.1
foo_1  | Address:       172.25.128.1#53
foo_1  |
foo_1  | Non-authoritative answer:
foo_1  | Name:  bar.internal
foo_1  | Address: 172.25.137.174
foo_1  |
bar_1  | Server:                172.25.128.1
bar_1  | Address:       172.25.128.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.25.139.149
bar_1  |
foo_1  | Server:                172.25.128.1
foo_1  | Address:       172.25.128.1#53
foo_1  |
foo_1  | Non-authoritative answer:
foo_1  | Name:  bar.internal
foo_1  | Address: 172.25.137.174
foo_1  |
bar_1  | Server:                172.25.128.1
bar_1  | Address:       172.25.128.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.25.139.149
bar_1  |

Ubuntu results

Compose file

version: '3'

services:
  foo:
    image: ubuntu:latest
    dns_search: internal
    entrypoint: sh -c "apt-get update && apt-get install -y dnsutils; while true; do nslookup 'bar.internal'; sleep 2s; done"
    networks:
      default:
        aliases:
         - foo.internal

  bar:
    image: ubuntu:latest
    dns_search: internal
    entrypoint: sh -c "apt-get update && apt-get install -y dnsutils; while true; do nslookup 'foo.internal'; sleep 2s; done"
    networks:
      default:
        aliases:
         - bar.internal

I'll spare the full log here, but after switching to Ubuntu containers, nslookup succeeds from the outset:

foo_1  | Server:                172.30.16.1
foo_1  | Address:       172.30.16.1#53
foo_1  |
foo_1  | Non-authoritative answer:
foo_1  | Name:  bar.internal
foo_1  | Address: 172.30.18.190
foo_1  |
bar_1  | Server:                172.30.16.1
bar_1  | Address:       172.30.16.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.30.28.25
bar_1  |
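
Since the nslookup output format differs between the scenarios above, a probe that goes through the C library resolver in every image may make the comparison more like-for-like. Below is a rough sketch under stated assumptions: it presumes Python 3 is available in the container (the stock alpine and ubuntu images do not ship it), and the function name `probe` and its parameters are hypothetical. `socket.getaddrinfo` is the standard-library call that delegates to the platform's `getaddrinfo`; the `resolve` and `sleep` parameters exist only so the loop can be exercised without a network.

```python
import socket
import time


def probe(hostname, attempts=10, delay=1.0,
          resolve=socket.getaddrinfo, sleep=time.sleep):
    """Resolve hostname repeatedly and return (successes, failures)."""
    successes = failures = 0
    for attempt in range(attempts):
        try:
            # Port None asks only for address resolution.
            resolve(hostname, None)
            successes += 1
        except socket.gaierror:
            # Raised when the name cannot be resolved.
            failures += 1
        if attempt < attempts - 1:
            sleep(delay)  # pause between lookups, like the sleep in the compose loop
    return successes, failures
```

Calling `probe("bar.internal")` from the foo container (and `probe("foo.internal")` from bar) in both the Alpine and the Ubuntu setup would show whether the failure rate follows the image or the tool used for the lookup.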
