YOLO-World

Developed by
- Tencent AI Lab, ARC Lab, Tencent PCG, Huazhong University of Science and Technology
Model type
- Re-parameterizable Vision-Language Path Aggregation Network
Task
- Object Detection
Model description
- A next-generation YOLO detector aimed at real-time open-vocabulary object detection.
- Pre-trained on large-scale vision-language datasets, including Objects365, GQA, Flickr30K, and CC3M, which empowers YOLO-World with strong zero-shot open-vocabulary capability and grounding ability in images.
- Achieves fast inference and supports re-parameterization for a user-defined vocabulary, which further speeds up inference and simplifies deployment.
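
The idea of fixing a vocabulary ahead of time can be illustrated with a minimal sketch. It is not the YOLO-World implementation: the class `OfflineVocabulary`, the helper `l2_normalize`, and the 3-dimensional embeddings are hypothetical stand-ins chosen for illustration. The sketch assumes text embeddings for the user's vocabulary have already been produced by some text encoder, caches them once, and then scores a region feature against them by cosine similarity, so no text encoding is needed at inference time.

```python
import math


def l2_normalize(vec):
    """Return vec scaled to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class OfflineVocabulary:
    """Caches normalized text embeddings for a fixed user vocabulary.

    Hypothetical illustration only: in a real detector the embeddings
    would come from a text encoder, not from hand-written vectors.
    """

    def __init__(self, names, text_embeddings):
        if len(names) != len(text_embeddings):
            raise ValueError("one embedding is required per class name")
        self.names = list(names)
        # Computed once, offline; reused for every image afterwards.
        self.weights = [l2_normalize(e) for e in text_embeddings]

    def classify(self, region_feature):
        """Return (best_name, cosine_similarity) for one region feature."""
        feat = l2_normalize(region_feature)
        scores = [
            sum(w_i * f_i for w_i, f_i in zip(w, feat))
            for w in self.weights
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.names[best], scores[best]


# Toy 3-dimensional embeddings, assumed purely for demonstration.
vocab = OfflineVocabulary(
    ["dog", "bicycle"],
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
)
label, score = vocab.classify([0.9, 0.1, 0.0])
print(label, round(score, 3))
```

Because the cached embeddings act like fixed classifier weights, changing the vocabulary in this sketch only means building a new `OfflineVocabulary`; the per-image scoring code is unchanged.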