- My research focuses on multimodal AI, with a particular emphasis on video-centric modeling. I aim to build AI systems that help humans interpret complex video content, enabling higher-level reasoning in applications such as sports analytics, surveillance, and media content. I am particularly interested in the following three areas of video understanding:
- Efficient Video Representation: The high computational demands of video data necessitate the use of efficient techniques like keyframe selection and tokenization.
- Perception and Reasoning in Videos: Understanding temporal information in video, such as frame continuity, causality, and diversity, remains a challenging problem.
- Multimodal Learning: Audio, when encoded in video streams, carries semantic information that complements, but is distinct from, visual semantics.
- G Lim, H Kim, J Kim, Y Choi, Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization. July 2024, ACM MM. [paper][code]
- W Jo, G Lim, G Lee, H Kim, B Ko, Y Choi, VVS: Video-to-Video Retrieval with Irrelevant Frame Suppression. February 2024, AAAI. [paper][code]
- W Jo, G Lim, Y Hwang, G Lee, J Kim, J Yun, J Jung, Y Choi, Simultaneous Video Retrieval and Alignment. March 2023, IEEE Access. [paper]
- J Kim, W Jo, G Lim, J Yun, S Kwak, S Jung, W Cheong, H Choo, J Seo, Y Choi, Compression Method for MPEG CDVA Global Feature Descriptors. May 2022, Journal of Broadcast Engineering. [paper]
- W Jo, G Lim, J Kim, J Yun, Y Choi, Exploring the Temporal Cues to Enhance Video Retrieval on Standardized CDVA. April 2022, IEEE Access. [paper][code]
Google Scholar / Blog / Homepage