일 | 월 | 화 | 수 | 목 | 금 | 토 |
---|---|---|---|---|---|---|
1 | 2 | |||||
3 | 4 | 5 | 6 | 7 | 8 | 9 |
10 | 11 | 12 | 13 | 14 | 15 | 16 |
17 | 18 | 19 | 20 | 21 | 22 | 23 |
24 | 25 | 26 | 27 | 28 | 29 | 30 |
31 |
- blip-2
- grefcoco
- mobilenetv1
- res
- gsoc 2025
- clip
- grefcoco dataset
- 딥러닝 엔트로피
- 1차 미분 마스크
- 에지 검출
- 엔트로피란
- res paper
- referring expression segmentation
- 2호선 완주
- 논문 리뷰
- Object detection article
- vlm
- 원격 학습 안끊기게
- gres
- 이미지 필터링
- 기계학습
- object detection
- 논문 요약
- 대학원 일상
- 객체 검출
- clip adapter
- 딥러닝 목적함수
- gsoc midterm evaluations
- gsoc 후기
- 2호선 따라걷기
- Today
- Total
목록Paper (50)
My Vision, Computer Vision

LAVT: Language-Aware Vision Transformer for Referring Image SegmentationReferring image segmentation is a fundamental vision-language task that aims to segment out an object referred to by a natural language expression from an image. One of the key challenges behind this task is leveraging the referring expression for highligharxiv.orgAuthor : Yang, Zhao, et alJournal : CVPR 2022Keyword : PWAM, ..

A Survey on Hallucination in Large Vision-Language ModelsRecent development of Large Vision-Language Models (LVLMs) has attracted growing attention within the AI landscape for its practical implementation potential. However, ``hallucination'', or more specifically, the misalignment between factual visual contentarxiv.org Author : Liu, Hanchao, et al.Journal : ArxivKeyword : Survey, Vision Langau..

GRES: Generalized Referring Expression SegmentationReferring Expression Segmentation (RES) aims to generate a segmentation mask for the object described by a given language expression. Existing classic RES datasets and methods commonly support single-target expressions only, i.e., one expression refers toarxiv.orgAuthor : Liu, Chang, Henghui Ding, and Xudong Jiang.Journal : CVPR 2023Keyword : Re..

DINOv2: Learning Robust Visual Features without SupervisionThe recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producingarxiv.org Author : MLAOquab, Maxime, et al.Journal : ArxivKeyword : dinov2Published..

MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual EncodersVisual encoders are fundamental components in vision-language models (VLMs), each showcasing unique strengths derived from various pre-trained visual foundation models. To leverage the various capabilities of these encoders, recent studies incorporate multarxiv.orgAuthor : Cao, Jiajun, et al.Journal : ArxivKeyword : Knowledg..

EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive PruningPre-trained vision-language models (VLMs) have achieved impressive results in a range of vision-language tasks. However, popular VLMs usually consist of hundreds of millions of parameters which brings challenges for fine-tuning and deployment in real-worldarxiv.org Author : Wang, Tiannan, ..