The code works when it is run on a single image. But when I run the same code in a loop over a few images from the PubLayNet dataset, results from earlier iterations seem to carry over as if they were cached (i.e., the bounding boxes overlap, and the boxes from the previous images are also drawn on the current image).
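
To make the symptom concrete, here is a minimal sketch of one way it can arise. It assumes (hypothetically) that the boxes are collected in a list created once outside the loop; this may not be the actual cause. `detect_boxes` and the file names are placeholders, not the real model or data.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def detect_boxes(image_name: str) -> List[Box]:
    """Hypothetical stand-in for the real layout model: one fake box per image."""
    n = len(image_name)
    return [(n, n, n + 10, n + 10)]


def run_with_shared_state(image_names: List[str]) -> Dict[str, List[Box]]:
    """Suspected buggy pattern: the list is created once, so boxes pile up."""
    all_boxes: List[Box] = []  # created outside the loop, never reset
    results: Dict[str, List[Box]] = {}
    for name in image_names:
        all_boxes.extend(detect_boxes(name))
        results[name] = list(all_boxes)  # snapshot of what would be drawn
    return results


def run_with_fresh_state(image_names: List[str]) -> Dict[str, List[Box]]:
    """Same loop, but each image gets its own list of boxes."""
    results: Dict[str, List[Box]] = {}
    for name in image_names:
        results[name] = detect_boxes(name)  # new list per iteration
    return results


names = ["a.png", "bb.png", "ccc.png"]  # placeholder file names
shared = run_with_shared_state(names)
fresh = run_with_fresh_state(names)

print([len(shared[n]) for n in names])  # [1, 2, 3]
print([len(fresh[n]) for n in names])   # [1, 1, 1]
```

In the shared-state version the third image ends up with the boxes of the first two as well, which matches what I see. The same effect would appear if a single image or canvas object were reused and drawn on in every iteration instead of being reloaded or copied.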