This repository was archived by the owner on Jun 20, 2024. It is now read-only.

Memory leak/OOM with "Received update for IP range I own" messages in log #3659

@sferrett

What you expected to happen?

Memory usage of the weave process is expected to be stable and not grow unbounded over time.

What happened?

I had a stable Weave 2.5.0 network in my Kubernetes 1.9 cluster of about 100 nodes. Weave was initially installed by kops with a memory limit of 200 MB. There were no occurrences of "Received update for IP range I own" in the log files, and memory usage of the weave pods in the cluster had been stable for weeks.

As part of refactoring some services, about 30 nodes were removed from the cluster (bringing the cluster size down to 71 nodes). After this, the memory usage of the weave pods started growing until it exceeded the memory limit, at which point each pod was OOM killed and restarted. Each restart briefly disrupts the node on which it occurs. At the same time, the "Received update for IP range I own" message started appearing in the logs, although not from all pods (this nuance was not discovered until later).

After looking at some related tickets here (#3650, #3600, #2797), the following actions were taken:

  • The "status ipam" output was checked and found to list many "unreachable" peers.
  • The unreachable peers listed by "status ipam" were removed with rmpeer on one node. This did not clear all the unreachable entries on all nodes, so the list-and-remove process was repeated on a couple of other nodes until every node showed all 71 peers, all reachable.
  • Weave was updated to 2.5.2, as that release mentions some related-looking tickets.
  • The memory limit was increased from 200 MB to 1 GB so that OOM kills would happen less frequently.
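The list-and-remove step above can be sketched as a small parser. This is a minimal sketch, not the procedure actually used: it assumes each "status ipam" line starts with a peer ID followed by a nickname in parentheses, and that unreachable peers are flagged with the word "unreachable" on the same line. The sample text is hypothetical and is not taken from the attached files. The script only prints candidate rmpeer commands so they can be reviewed before being run.

```python
import re

# Assumed line shape: "<peer-id>(<nickname>) ... - unreachable!"
PEER_RE = re.compile(r"^\s*([0-9a-f]{2}(?::[0-9a-f]{2}){5})\(([^)]*)\)")


def unreachable_peers(ipam_output):
    """Return (peer_id, nickname) for each line flagged as unreachable."""
    peers = []
    for line in ipam_output.splitlines():
        if "unreachable" not in line:
            continue
        match = PEER_RE.match(line)
        if match:
            peers.append((match.group(1), match.group(2)))
    return peers


# Hypothetical sample output, for illustration only.
SAMPLE = """\
ea:38:6f:58:7b:81(ip-10-32-124-236.us-west-2.compute.internal)    32768 IPs (01.6% of total) (12 active)
aa:bb:cc:dd:ee:01(ip-10-0-0-1.example.internal)    16384 IPs (00.8% of total) - unreachable!
aa:bb:cc:dd:ee:02(ip-10-0-0-2.example.internal)    16384 IPs (00.8% of total) - unreachable!
"""

if __name__ == "__main__":
    # Print the commands instead of running them, so they can be checked first.
    for peer_id, nickname in unreachable_peers(SAMPLE):
        print(f"weave rmpeer {peer_id}  # {nickname}")
```

Since removing peers on one node did not clear the unreachable entries everywhere, the same check would need to be repeated against the output from other nodes.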

The weave pods continue to grow in memory usage. The new 2.5.2 pods have not hit their 1 GB limit yet but look to be heading that way. The "Received update for IP range I own" messages are still being seen; on closer inspection, however, they are coming from only 3 of the 71 pods.
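Finding which pods emit the message can be sketched as a count over per-pod log text. This is only an illustration under stated assumptions: the logs are assumed to have been collected already into a dict keyed by pod name (for example from saved log files), and the log lines in the sample are hypothetical placeholders rather than lines from the attached logs.

```python
MESSAGE = "Received update for IP range I own"


def pods_logging_message(logs_by_pod, message=MESSAGE):
    """Map pod name -> number of matching log lines, omitting pods with none."""
    counts = {}
    for pod, text in logs_by_pod.items():
        hits = sum(1 for line in text.splitlines() if message in line)
        if hits:
            counts[pod] = hits
    return counts


# Hypothetical log excerpts keyed by pod name, for illustration only.
SAMPLE_LOGS = {
    "weave-net-q56hl": (
        "INFO: (placeholder) Received update for IP range I own ...\n"
        "INFO: (placeholder) some unrelated line\n"
        "INFO: (placeholder) Received update for IP range I own ...\n"
    ),
    "weave-net-9t7d8": "INFO: (placeholder) some unrelated line\n",
}

if __name__ == "__main__":
    for pod, hits in sorted(pods_logging_message(SAMPLE_LOGS).items()):
        print(f"{pod}: {hits} occurrence(s)")
```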

How to reproduce it?

Start with a working Kubernetes cluster and delete some nodes from it.

Anything else we need to know?

Versions:


        Version: 2.5.2 (up to date; next check at 2019/07/12 18:43:12)

        Service: router
       Protocol: weave 1..2
           Name: ea:38:6f:58:7b:81(ip-10-32-124-236.us-west-2.compute.internal)
     Encryption: disabled
  PeerDiscovery: enabled
        Targets: 71
    Connections: 71 (70 established, 1 failed)
          Peers: 71 (with 4966 established, 4 pending connections)
 TrustedSubnets: none

        Service: ipam
         Status: ready
          Range: 100.96.0.0/11
  DefaultSubnet: 100.96.0.0/11
admin@ip-10-32-92-49:~$ docker version
Client:
 Version:      17.03.2-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 02:09:56 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.2-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 02:09:56 2017
 OS/Arch:      linux/amd64
 Experimental: false
Linux ip-10-32-92-49 4.4.121-k8s #1 SMP Sun Mar 11 19:39:47 UTC 2018 x86_64 GNU/Linux
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.8", GitCommit:"c138b85178156011dc934c2c9f4837476876fb07", GitTreeState:"clean", BuildDate:"2018-05-21T18:53:18Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Logs:

These are the logs from one of the weave pods that is showing the "Received update for IP range I own" messages:
weave-net-q56hl.log
This is the pprof/heap output for the above pod:
weave-net-q56hl.heap.gz
This is "status ipam" from the above pod:
weave-net-q56hl.ipam.txt
This is "status peers" from the above pod:
weave-net-q56hl.peers.txt

These are the logs from one of the weave pods not showing that message:
weave-net-9t7d8.log
This is the pprof/heap output for the above pod:
weave-net-9t7d8.heap.gz

And here's a picture showing the history of memory usage from these pods:
[Image: Screen Shot 2019-07-12 at 9 20 16 AM]
