- University of Macau, Macau, China
- http://www.zdzheng.xyz
- https://orcid.org/0000-0002-2434-9050
- https://scholar.google.com/citations?user=XT17oUEAAAAJ
Stars
🔥🔥🔥🔥 The first plugin to connect OpenClaw to WeChat Work (企业微信) / interoperable with personal WeChat / bot supports streaming output / supports @mentions in group chats / whitelist-based access control / fully visual configuration in Chinese
Text-Instructed Generation and Refinement for Template-Free Hand-Object Interaction
[SIGGRAPH ASIA 2024 TCS] AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data
A curated list of awesome temporal action segmentation resources.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
The official implementation of "Last-Meter Precision Navigation for UAVs: A Diffusion-Refined Aerial Visual Servoing Approach"
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is developed by the Department of Electronic Engineering at Tsin…
Training VLM agents with multi-turn reinforcement learning
"VideoAgent: All-in-One Agentic Framework for Video Understanding, Editing, and Remaking"
Official code for our paper: "SketchThinker-R1: Towards Efficient Sketch-Style Reasoning in Large Multimodal Models".
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
[Nature Reviews Bioengineering🔥] Application of Large Language Models in Medicine. A curated list of practical guide resources of Medical LLMs (Medical LLMs Tree, Tables, and Papers)
Repo for Chinese Medical ChatGLM: instruction fine-tuning of ChatGLM on Chinese medical knowledge
「TIP2023」Vision-Based UAV Self-Positioning in Low-Altitude Urban Environments
Official Code for "CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution"
[ICLR 2026 🔥 ] Official implementation of "UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing"
Native Multimodal Models are World Learners
Chinese medical dialogue dataset
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
✨ Official code for our paper: "Uncertainty-o: One Model-agnostic Framework for Unveiling Epistemic Uncertainty in Large Multimodal Models".
[IEEE TMI 2024] MultiEYE: Dataset and Benchmark for OCT-Enhanced Retinal Disease Recognition from Fundus Images
🎬 Kaka Subtitle Assistant | VideoCaptioner - an LLM-based intelligent subtitle assistant covering the full subtitle workflow: generation, sentence segmentation, correction, and translation - a powerful tool for easy and efficient video subtitling.


