- Clone this repository
- Install the requirements

      git clone https://github.com/shukkkur/VolleyVision.git
      cd "VolleyVision\Stage II - Players & Actions"
      pip install ultralytics

- You can run inference with the Python script `main.py`:

      python main.py --model actions/yV8_medium/weights/best.pt --input_path assets/actions.jpg --output_path Output/actions.jpg --show_labels
      python main.py --model players/yV8_medium/weights/best.pt --input_path assets/players.jpg --output_path Output/players.jpg

- Since ultralytics provides a simple CLI, you can also use it:

      yolo predict model=actions\yV8_medium\weights\best.pt source=assets\actions.jpg show_conf=False show_labels=True
      yolo predict model=players\yV8_medium\weights\best.pt source=assets\players.jpg show_conf=False show_labels=False

- To run the event detection, use the `sliding_window.py` script.
- The effectiveness of this approach depends on the confidence level (`--conf`), the sliding window size and the threshold, so for experimental purposes use `sliding_window_verbose.py`. In addition to the event, it also draws `[frame_number] (who's action/yellow circle) class_name (confidence)`.
  - `--gpu` - for faster inference. It can be added to the other scripts as well (you will have to do that yourself).
  - `--imgsz` - to prevent YOLO from resizing the input to 640x640 and use a larger or smaller custom size instead.
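Since `--gpu` and `--imgsz` are left for you to add to the other scripts, here is a minimal sketch of how such flags could be parsed and turned into prediction arguments. It is not the code from this repository: `build_parser`, `predict_kwargs` and the default values are hypothetical, and the two `--imgsz` values are passed through in the order given, so check which order (width/height) your script expects. The `conf`, `device` and `imgsz` keywords are the ones accepted by the ultralytics predict call.

```python
import argparse


def build_parser():
    """Hypothetical parser; flag names follow the README, defaults are assumptions."""
    parser = argparse.ArgumentParser(description="Flag handling sketch")
    parser.add_argument("--conf", type=float, default=0.5,
                        help="confidence threshold (default is an assumption)")
    parser.add_argument("--gpu", action="store_true",
                        help="run inference on the GPU")
    parser.add_argument("--imgsz", type=int, nargs="+", default=[640],
                        help="one value for a square size, or two values")
    return parser


def predict_kwargs(args):
    """Translate parsed flags into keyword arguments for a predict call."""
    # A single value stays an int, several values become a tuple.
    imgsz = args.imgsz[0] if len(args.imgsz) == 1 else tuple(args.imgsz)
    # device=0 selects the first CUDA GPU, "cpu" forces CPU inference.
    return {"conf": args.conf, "device": 0 if args.gpu else "cpu", "imgsz": imgsz}


args = build_parser().parse_args(["--conf", "0.4", "--gpu", "--imgsz", "1920", "1080"])
kwargs = predict_kwargs(args)
print(kwargs)

# With ultralytics installed, the result could be forwarded like this:
#   from ultralytics import YOLO
#   YOLO("actions/yV8_medium/weights/best.pt").predict("assets/actions.jpg", **kwargs)
```

Keeping the flag-to-argument mapping in one small function makes it easy to reuse the same flags in `main.py` or any other script.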
| i. | ii. |
|---|---|
| ![]() | ![]() |
For event detection I used the action recognition model, but now, instead of drawing the bounding boxes on every frame, we introduce temporal information using a list (deque). Basically, we store the predictions for the last N frames and declare that an event occurred only if a certain action has been predicted several times within them. Gosh, my explanation is terrible, so please refer to the examples below.
# `-` means no detections
# `()` parentheses mark the sliding window
# let's choose a sliding window of size `5`
# and a threshold of `3`, meaning we need 3 detections within the window for an event to be declared
predictions = [-, -, spike, -, spike, spike, -, -, spike, spike]
[(-, -, spike, -, spike), spike, -, -, spike, spike] # only two detections, no event
[-, (-, spike, -, spike, spike), -, -, spike, spike] # 3 detections within the window, draw "SPIKE" on top
[-, -, (spike, -, spike, spike, -), -, spike, spike] # still 3 detections, keep drawing/printing/declaring "SPIKE"
...
I really hope this makes sense now. Anyway, this simple yet amazing idea belongs to my former mentor at BallerTV, Paul Kefer. Thank you for your patience))).
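The sliding-window idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of `sliding_window.py`: the function name `detect_events` is hypothetical, each frame is reduced to a single class name (or `None` for no detection), and the window is also evaluated while it is still filling up.

```python
from collections import Counter, deque


def detect_events(predictions, window_size=5, threshold=3):
    """Return, for every frame, the declared event name or None.

    `predictions` holds one class name per frame, or None if nothing was detected.
    """
    window = deque(maxlen=window_size)  # keeps only the last `window_size` predictions
    events = []
    for pred in predictions:
        window.append(pred)  # the oldest prediction is dropped automatically
        counts = Counter(p for p in window if p is not None)
        event = None
        if counts:
            name, hits = counts.most_common(1)[0]
            if hits >= threshold:  # enough detections inside the window
                event = name
        events.append(event)
    return events


predictions = [None, None, "spike", None, "spike", "spike", None, None, "spike", "spike"]
events = detect_events(predictions, window_size=5, threshold=3)
print(events)
# [None, None, None, None, None, 'spike', 'spike', None, 'spike', 'spike']
```

A `deque` with `maxlen` is a convenient fit here because appending a new prediction silently discards the oldest one, so the window never needs to be trimmed by hand.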
!python sliding_window.py --model actions/yV8_medium/weights/best.pt --input_path "assets/rallies/rally.mp4" --output_path Output/event_detection.mp4 --conf 0.5
!python sliding_window_verbose.py --model actions/yV8_medium/weights/best.pt --input_path "assets/rallies/rally.mp4" --output_path Output/event_detection.mp4 --conf 0.4 --gpu --imgsz 1920 1080
| sliding_window.py | sliding_window_verbose.py |
|---|---|
| ![]() | ![]() |
For any additional questions, feel free to take part in discussions, open an issue or contact me.




