'2025/02' Posts

List 2025/02 (16)

My Vision, Computer Vision

[Deep Learning Study] Vision-Language Evaluation Metrics (VLM Benchmark Evaluation Metrics)

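Evaluation Metrics for VLM Benchmarks. This post covers five evaluation metrics commonly used in VLM benchmarks: BLEU, METEOR, ROUGE, CIDEr, and SPICE. BLEU and METEOR were designed to measure machine translation (MT) performance. ROUGE comes in four versions and was designed to measure summarization performance. CIDEr and SPICE are evaluation metrics for image captioning models, so they target vision-language directly. What are a Candidate and a Reference? All of these metrics were created to measure model performance, so each one needs to compare the answer the model outputs against the actual ground truth, and ..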

Study 2025. 2. 28. 15:22
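To make the Candidate/Reference comparison concrete, here is a minimal Python sketch that counts how many candidate n-grams also occur in each reference. The names `ngrams` and `overlap_per_reference` are hypothetical, and lowercasing plus whitespace tokenization is an assumption; each metric below applies its own tokenization and matching rules.

```python
from collections import Counter

def ngrams(tokens, n):
    """Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap_per_reference(candidate, references, n=1):
    """For each reference, count candidate n-grams that also occur in it (clipped to the reference count)."""
    cand = ngrams(candidate.lower().split(), n)
    overlaps = []
    for ref in references:
        # Counter '&' keeps the minimum count of each shared n-gram
        overlaps.append(sum((cand & ngrams(ref.lower().split(), n)).values()))
    return overlaps

candidate = "a dog runs on the grass"              # model output (candidate)
references = ["a dog is running on the grass",     # human-written ground truths (references)
              "the brown dog plays outside"]
print(overlap_per_reference(candidate, references, n=1))
print(overlap_per_reference(candidate, references, n=2))
```

A candidate is usually compared against several references at once because the same content can be described correctly in many ways; the metrics differ mainly in how they turn such overlaps into a single score.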

[Paper Summary/Review] SPICE: Semantic Propositional Image Caption Evaluation

SPICE: Semantic Propositional Image Caption Evaluation. There is considerable interest in the task of automatically generating image captions. However, evaluation is challenging. Existing automatic evaluation metrics are primarily sensitive to n-gram overlap, which is neither necessary nor sufficient for the ta.. (arxiv.org) Journal: ECCV 2016. Published Date: September 16, 2016. Keyword: Evaluation Metric, SP..

Paper 2025. 2. 28. 13:44
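SPICE scores a caption with an F-score over semantic tuples (objects, object-attribute pairs, and subject-relation-object triples) taken from scene graphs. The sketch below assumes the tuples have already been extracted and that the reference tuples are already merged across all reference captions; the tuples themselves are hypothetical and hand-written. It also uses exact matching only, whereas SPICE additionally counts WordNet synonyms as matches, so `spice_f_score` (a hypothetical name) illustrates only the final scoring step, not the full metric.

```python
def spice_f_score(candidate_tuples, reference_tuples):
    """F1 over semantic tuples: objects (1-tuples), attributes (2-tuples), relations (3-tuples)."""
    cand, ref = set(candidate_tuples), set(reference_tuples)
    matched = len(cand & ref)  # exact matching only; real SPICE also accepts WordNet synonyms
    if matched == 0:
        return 0.0
    precision = matched / len(cand)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hypothetical hand-written tuples; SPICE derives these by parsing captions into scene graphs.
candidate_tuples = {
    ("girl",), ("court",),
    ("girl", "young"), ("court", "tennis"),
    ("girl", "on", "court"),
}
reference_tuples = {
    ("girl",), ("court",), ("racket",),
    ("girl", "young"),
    ("girl", "holding", "racket"), ("girl", "on", "court"),
}
print(spice_f_score(candidate_tuples, reference_tuples))
```

Because the comparison happens on tuples rather than word sequences, a paraphrase that yields the same scene graph gets the same score, which is the point SPICE makes against n-gram metrics.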

[Paper Summary/Review] CIDEr: Consensus-based Image Description Evaluation

CIDEr: Consensus-based Image Description Evaluation. Journal: CVPR 2015. Published Date: November 20, 2014. Keyword: CIDEr score, Evaluation Metric, Microsoft. CIDEr: Consensus-based Image Description Evaluation. Automatically describing an image with a sentence is a long-standing challenge in computer vision and natural language processing. Due to recent progress in object detection, attribute classifica..

Paper 2025. 2. 27. 19:30
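CIDEr represents each sentence as TF-IDF weighted n-gram vectors, averages the cosine similarity between the candidate and each reference, and then averages over n = 1 to 4 with uniform weights. Below is a rough sketch of that original formulation (not CIDEr-D, which adds clipping, a length penalty, and a x10 scale). Assumptions: whitespace tokenization without the stemming the paper applies, a `max(1, df)` guard for n-grams that never appear in any reference set, and the hypothetical function names and toy captions.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def tfidf_vector(counts, doc_freq, num_images):
    """TF within the sentence times IDF = log(|I| / number of images whose references contain the n-gram)."""
    total = sum(counts.values())
    # max(1, df) avoids division by zero for n-grams absent from every reference set (assumption)
    return {g: (c / total) * math.log(num_images / max(1, doc_freq[g]))
            for g, c in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cider(candidates, references, max_n=4):
    """candidates: {image_id: caption}; references: {image_id: [caption, ...]}."""
    num_images = len(references)
    scores = {image_id: 0.0 for image_id in candidates}
    for n in range(1, max_n + 1):
        # document frequency: in how many images' reference sets does each n-gram occur?
        doc_freq = Counter()
        for refs in references.values():
            seen = set()
            for ref in refs:
                seen.update(ngram_counts(ref.split(), n))
            doc_freq.update(seen)
        for image_id, cand in candidates.items():
            cand_vec = tfidf_vector(ngram_counts(cand.split(), n), doc_freq, num_images)
            sims = [cosine(cand_vec, tfidf_vector(ngram_counts(r.split(), n), doc_freq, num_images))
                    for r in references[image_id]]
            # CIDEr_n averages over the references; CIDEr averages CIDEr_n with w_n = 1/N
            scores[image_id] += (sum(sims) / len(sims)) / max_n
    return scores

references = {
    "img1": ["a cat sits on a mat", "a cat on the mat"],
    "img2": ["a dog runs in the park", "a dog plays in a park"],
}
candidates = {"img1": "a cat on the mat", "img2": "a cat on the mat"}
print(cider(candidates, references))
```

Since "a" occurs in every image's references, its IDF is log(1) = 0 and it contributes nothing; the score is driven by content n-grams such as "cat" and "mat", which is the consensus idea behind CIDEr.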

[Paper Summary/Review] ROUGE: A Package for Automatic Evaluation of Summaries

ROUGE: A Package for Automatic Evaluation of Summaries. Published Date: July 1, 2004. ROUGE: A Package for Automatic Evaluation of Summaries. Chin-Yew Lin. Text Summarization Branches Out. 2004. (aclanthology.org) Abstract: ROUGE is an evaluation metric created to measure machine summarization performance. ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It counts the overlap between a machine-generated summary and a human-written (ideal) summary. Methods: The paper proposes four ROUGE variants, ROU..

Study 2025. 2. 27. 19:22
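Two of the four ROUGE variants are easy to sketch: ROUGE-N, the recall of reference n-grams found in the candidate, and ROUGE-L, an F-measure built from the longest common subsequence (LCS). This minimal Python sketch assumes whitespace tokenization, no stemming or stopword removal, and `beta=1.0` for ROUGE-L (the paper leaves β as a parameter); the function names are hypothetical.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, references, n=2):
    """ROUGE-N recall: matched reference n-grams / total reference n-grams, summed over references."""
    cand = ngram_counts(candidate.split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngram_counts(ref.split(), n)
        matched += sum((cand & ref_counts).values())  # Count_match = min(candidate count, reference count)
        total += sum(ref_counts.values())
    return matched / total if total else 0.0

def lcs_length(a, b):
    """Length of the longest common subsequence via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-measure from LCS-based recall and precision."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    recall, precision = lcs / len(ref), lcs / len(cand)
    return (1 + beta ** 2) * recall * precision / (recall + beta ** 2 * precision)

reference = "police killed the gunman"
for candidate in ["police kill the gunman", "the gunman kill police"]:
    print(candidate, rouge_n(candidate, [reference], n=2), rouge_l(candidate, reference))
```

Both candidates get the same ROUGE-2 recall (each shares only the bigram "the gunman" with the reference), but ROUGE-L ranks the first higher because its LCS with the reference is longer.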

[Paper Summary/Review] METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments

METEOR paper link: https://aclanthology.org/W05-0909.pdf Published Date: June 1, 2005. Keyword: Evaluation Metric, METEOR score. This paper explains the limitations of BLEU and proposes METEOR, an evaluation metric that addresses them. Problem: Since BLEU was proposed, interest in automatic metrics for machine translation has grown. The key requirement for an automatic metric in machine translation (MT) is that it correlate closely with human judgments. However, BLEU does not take recall into account. In addition, it evaluates word order using higher-order n-grams (up to 4), which..

Uncategorized 2025. 2. 25. 17:02
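METEOR aligns candidate and reference unigrams, computes precision P and recall R, combines them as Fmean = 10PR / (R + 9P) so that recall dominates, and applies a fragmentation penalty Penalty = 0.5 x (chunks / matched unigrams)^3, giving Score = Fmean x (1 - Penalty). The sketch below is a simplification: it uses only the exact-match stage (no Porter stemming or WordNet synonym stages) and a greedy alignment heuristic that prefers continuing the previous match, instead of the paper's rule of choosing the alignment with the fewest crossings. The function names are hypothetical.

```python
def align_exact(candidate, reference):
    """Greedy exact-match alignment (heuristic): prefer the reference word right after the previous
    match, otherwise take the first unused identical reference word."""
    used, pairs, prev_j = set(), [], -2
    for i, word in enumerate(candidate):
        nxt = prev_j + 1
        if 0 <= nxt < len(reference) and nxt not in used and reference[nxt] == word:
            j = nxt
        else:
            j = next((k for k, w in enumerate(reference) if k not in used and w == word), None)
        if j is None:
            continue  # candidate word has no exact match
        used.add(j)
        pairs.append((i, j))
        prev_j = j
    return pairs

def count_chunks(pairs):
    """Chunks: runs of matches that are adjacent and in the same order in both sentences."""
    if not pairs:
        return 0
    chunks = 1
    for (i0, j0), (i1, j1) in zip(pairs, pairs[1:]):
        if not (i1 == i0 + 1 and j1 == j0 + 1):
            chunks += 1
    return chunks

def meteor(candidate, reference):
    cand, ref = candidate.lower().split(), reference.lower().split()
    pairs = align_exact(cand, ref)
    matched = len(pairs)
    if matched == 0:
        return 0.0
    precision, recall = matched / len(cand), matched / len(ref)
    fmean = 10 * precision * recall / (recall + 9 * precision)  # harmonic mean, recall weighted 9x
    penalty = 0.5 * (count_chunks(pairs) / matched) ** 3
    return fmean * (1 - penalty)

reference = "the cat sat on the mat"
print(meteor("the cat sat on the mat", reference))
print(meteor("on the mat the cat sat", reference))
```

Reordering the same six words keeps P and R at 1 but splits the match into two chunks, so the score drops; this is how METEOR accounts for word order without relying on higher-order n-grams.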

[Paper Summary/Review] BLEU: a Method for Automatic Evaluation of Machine Translation

BLEU | Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. We present the results of an experiment on extending the automatic method of Machine Translation evaluation BLUE with statistical weights for lexical items, such as tf.idf scores. We show that this extension gives additional information about evaluated ... (dl.acm.org) Published Date: July 1, 2002. Keyword: BLE..

Paper 2025. 2. 25. 16:43
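BLEU combines clipped (modified) n-gram precisions for n = 1 to 4 through a geometric mean with uniform weights, then multiplies by a brevity penalty BP = exp(1 - r/c) when the candidate length c is not longer than the reference length r. The paper computes BLEU over a whole corpus; the sketch below applies the same formulas to a single sentence, which is an assumption, and returns 0 whenever any precision is 0 because it does no smoothing. The function names are hypothetical.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as often as in any single reference."""
    cand = ngram_counts(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        max_ref |= ngram_counts(ref, n)  # '|' keeps the maximum count of each n-gram
    clipped = sum(min(count, max_ref[g]) for g, count in cand.items())
    return clipped / sum(cand.values())

def brevity_penalty(candidate, references):
    """BP = 1 if c > r, else exp(1 - r/c); r is the reference length closest to c (ties -> shorter)."""
    c = len(candidate)
    if c == 0:
        return 0.0
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c > r else math.exp(1 - r / c)

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights w_n = 1/max_n and no smoothing (assumption)."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # the geometric mean is zero if any precision is zero
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)

references = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(modified_precision("the the the the the the the".split(), references, 1))
print(bleu("the cat is on the mat".split(), references))
```

The first call reproduces the paper's motivating example: a candidate made only of "the" would have unigram precision 7/7 without clipping, but capping "the" at its maximum count in a single reference (2) gives 2/7.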
