Commit b94f636

kvm: Acquire lock when running security group Python script
It could happen that when multiple instances are starting at the same time on a KVM host, the Agent spawns multiple instances of security_group.py which all try to modify iptables/ebtables rules. This ends with one of the processes failing. The instance is still started, but it doesn't have any IP connectivity due to the failed programming of the security groups. This modification lets the script acquire an exclusive lock on a file so that only one instance of the script talks to iptables/ebtables at once. Other instances of the script which start will poll every 500ms to check whether they can obtain the lock, and otherwise execute anyway after 15 seconds.
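The waiting behaviour described in the message (poll every 500 ms, give up after 15 seconds and run anyway) can be sketched in isolation. This is a minimal illustration, not the code from the commit: `wait_for_lock`, its parameters, and the `try_acquire` callable are hypothetical names. 30 attempts at 0.5 s each give the 15-second ceiling.

```python
import time


def wait_for_lock(try_acquire, attempts=30, interval=0.5, sleep=time.sleep):
    """Poll try_acquire() until it succeeds or the attempts run out.

    try_acquire is any callable returning True once the lock is held
    (hypothetical; the commit uses a file lock for this).
    Returns True if the lock was obtained, False if the caller should
    proceed without it after attempts * interval seconds.
    """
    for _ in range(attempts):
        if try_acquire():
            return True
        sleep(interval)
    return False
```

Returning a boolean lets the caller log whether it is running with the lock or has only timed out, which the two cases in the message ("obtain the lock" versus "execute anyway") suggest is worth distinguishing.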
1 parent 7017a82 commit b94f636

1 file changed: +23 -0 lines changed
scripts/vm/network/security_group.py

Lines changed: 23 additions & 0 deletions
@@ -26,8 +26,11 @@
 from optparse import OptionParser, OptionGroup, OptParseError, BadOptionError, OptionError, OptionConflictError, OptionValueError
 import re
 import libvirt
+import fcntl
+import time
 
 logpath = "/var/run/cloud/" # FIXME: Logs should reside in /var/log/cloud
+lock_file = "/var/lock/cloudstack_security_group.lock"
 iptables = Command("iptables")
 bash = Command("/bin/bash")
 ebtables = Command("ebtables")
@@ -36,6 +39,18 @@
 hyper = cfo.getEntry("hypervisor.type")
 if hyper == "lxc":
     driver = "lxc:///"
+
+lock_handle = None
+
+def obtain_file_lock(path):
+    global lock_handle
+
+    with open(path, 'w') as lock_handle:
+        fcntl.lockf(lock_handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
+        return True
+
+    return False
+
 def execute(cmd):
     logging.debug(cmd)
     return bash("-c", cmd).stdout
@@ -1029,6 +1044,14 @@ def addFWFramework(brname):
         sys.exit(1)
     cmd = args[0]
     logging.debug("Executing command: " + str(cmd))
+
+    for i in range(0, 30):
+        if obtain_file_lock(lock_file) is False:
+            logging.warn("Lock on %s is being held by other process. Waiting for release." % lock_file)
+            time.sleep(0.5)
+        else:
+            break
+
     if cmd == "can_bridge_firewall":
         can_bridge_firewall(args[1])
     elif cmd == "default_network_rules":
