Currently, VideoHandler implements a get that accepts a single float (the frame time) and returns a single frame, whereas AudioHandler's get requires two floats (start, end) and returns a chunk of the audio recording.
I would add an optional end=None parameter to VideoHandler's get so that it can also retrieve a range of frames. The simplest implementation would convert the time range to integer frame indices and use __getitem__.
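
A rough sketch of what I have in mind. The class below is only a hypothetical stand-in, not the real VideoHandler: the in-memory frame list, the fps attribute, the _time_to_index helper, and the half-open [start, end) range are all assumptions for illustration.

```python
class VideoHandler:
    """Hypothetical stand-in for the real class; internals are assumed."""

    def __init__(self, frames, fps):
        self._frames = list(frames)  # assumption: frames held in memory
        self.fps = fps               # assumption: constant frame rate

    def __getitem__(self, index):
        # Accepts an int (single frame) or a slice (range of frames).
        return self._frames[index]

    def _time_to_index(self, t):
        # Hypothetical helper: truncate a time in seconds to a frame index.
        return int(t * self.fps)

    def get(self, start, end=None):
        first = self._time_to_index(start)
        if end is None:
            # Existing behaviour: one time in, one frame out.
            return self[first]
        if end < start:
            raise ValueError("end must not be earlier than start")
        # New behaviour: frames in the half-open range [start, end).
        return self[first:self._time_to_index(end)]


video = VideoHandler(frames=[f"frame{i}" for i in range(10)], fps=2.0)
single = video.get(1.0)      # one frame, as before
chunk = video.get(1.0, 2.5)  # a list of frames, mirroring AudioHandler's get
```

Because end defaults to None, existing single-argument calls keep working unchanged. Whether the end frame should be included, and whether times should be rounded rather than truncated, are open choices that should match whatever AudioHandler does.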