
YOLO-World

  • Developed by
    • Tencent AI Lab, ARC Lab, Tencent PCG, Huazhong University of Science and Technology
  • Model type
    • Re-parameterizable Vision-Language Path Aggregation Network
  • Task
    • Object Detection
  • Model description
    • The next generation of YOLO detectors, aimed at real-time open-vocabulary object detection.
    • Pre-trained on large-scale vision-language datasets, including Objects365, GQA, Flickr30K, and CC3M, which give YOLO-World strong zero-shot open-vocabulary capability and grounding ability in images.
    • Achieves fast inference and introduces re-parameterization techniques that, given a user's vocabulary, enable even faster inference and deployment.
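To illustrate why a fixed user vocabulary helps deployment, here is a minimal NumPy sketch under simplifying assumptions. It assumes the vocabulary's text embeddings are computed once offline and stored as the weights of a plain linear layer, so no text encoder has to run per image, and that each candidate region is scored by cosine similarity against every vocabulary entry. `ReparameterizedClassifier`, `l2_normalize`, and the random embeddings are hypothetical stand-ins for illustration, not YOLO-World's actual API or head.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm along `axis` (eps avoids division by zero)."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


class ReparameterizedClassifier:
    """Hypothetical sketch: a fixed vocabulary folded into linear-layer weights.

    The text embeddings are normalized once at construction time, so inference
    is a single matrix multiply instead of a text-encoder forward pass.
    """

    def __init__(self, text_embeddings):
        # text_embeddings: (num_classes, dim), assumed precomputed offline
        self.weight = l2_normalize(np.asarray(text_embeddings, dtype=np.float64))

    def __call__(self, region_embeddings):
        # region_embeddings: (num_regions, dim), assumed to come from the detector
        regions = l2_normalize(np.asarray(region_embeddings, dtype=np.float64))
        # Cosine similarity between every region and every vocabulary entry
        return regions @ self.weight.T  # shape: (num_regions, num_classes)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab_embeddings = rng.normal(size=(3, 8))  # stand-in for encoded class names
    classifier = ReparameterizedClassifier(vocab_embeddings)
    scores = classifier(rng.normal(size=(5, 8)))
    print(scores.shape, scores.argmax(axis=1))
```

The design choice being illustrated is that changing the vocabulary only means rebuilding `weight`; the per-image cost stays constant no matter how the class names were phrased. The real model's scoring head may include extra terms (for example learned scaling) that this sketch omits.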