Posts from 2025/03/31 (4)
My Vision, Computer Vision

Evaluating Object Hallucination in Large Vision-Language Models
Inspired by the superior language abilities of large language models (LLM), large vision-language models (LVLM) have been recently explored by integrating powerful LLMs for improving the performance on complex multimodal tasks. Despite the promising progre..
Source : arxiv.org
Author : Li, Yifan, et al.
Journal : EMNLP 2023
Keyword : Hallucination,..

DINOv2: Learning Robust Visual Features without Supervision
The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing..
Source : arxiv.org
Author : Oquab, Maxime, et al.
Journal : Arxiv
Keyword : dinov2
Published..

MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders
Visual encoders are fundamental components in vision-language models (VLMs), each showcasing unique strengths derived from various pre-trained visual foundation models. To leverage the various capabilities of these encoders, recent studies incorporate mult..
Source : arxiv.org
Author : Cao, Jiajun, et al.
Journal : Arxiv
Keyword : Knowledg..

EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning
Pre-trained vision-language models (VLMs) have achieved impressive results in a range of vision-language tasks. However, popular VLMs usually consist of hundreds of millions of parameters which brings challenges for fine-tuning and deployment in real-world..
Source : arxiv.org
Author : Wang, Tiannan, ..