Improving Unsupervised Out-of-domain Detection through Pseudo Labeling and Learning
Published in Findings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
We propose a novel two-stage framework for unsupervised out-of-domain (OOD) detection that addresses the limitations of traditional one-class classification approaches. Our method leverages pseudo labeling to improve detection performance in the more challenging multi-class scenarios.
Unsupervised out-of-domain (OOD) detection is the task of determining whether a given sample comes from the in-domain distribution, without access to categorical labels for the in-domain instances. Because no labels are available to train a classifier, previous work on unsupervised OOD detection adopted the one-class classification (OCC) approach, which assumes that all training samples come from a single class. However, this assumption is often violated in real-world scenarios where the training data contains multiple classes. We instead assign pseudo labels to the unlabeled in-domain samples in a first stage and learn a classifier from those labels in a second stage. Our empirical results on three datasets show that our two-stage framework significantly outperforms baseline models in these more challenging multi-class scenarios.
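To make the two-stage idea concrete, below is a minimal sketch in Python. It is not the paper's actual pipeline: the TF-IDF features, the KMeans clustering for pseudo labeling, the logistic-regression classifier, the cluster count, and the decision threshold are all illustrative assumptions standing in for the components used in the paper.

```python
# A minimal sketch of the two-stage idea, under illustrative assumptions:
# Stage 1 assigns pseudo labels to unlabeled in-domain text via clustering;
# Stage 2 trains a classifier on those pseudo labels and scores OOD-ness
# with the classifier's maximum predicted class probability.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

in_domain = [
    "book a flight to Boston",
    "cancel my flight reservation",
    "what is the weather tomorrow",
    "will it rain this weekend",
]
test = ["reserve a seat on the next flight", "play some jazz music"]

# Featurize the raw text (a pretrained encoder would be used in practice).
vec = TfidfVectorizer().fit(in_domain)
X_train = vec.transform(in_domain)

# Stage 1: cluster the unlabeled in-domain data to obtain pseudo labels.
# The number of clusters (2 here) is an assumption for this toy example.
pseudo_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_train)

# Stage 2: learn a multi-class classifier from the pseudo labels.
clf = LogisticRegression(max_iter=1000).fit(X_train, pseudo_labels)

# Score: a low maximum class probability suggests an out-of-domain sample.
# The 0.6 threshold is arbitrary and would be tuned on held-out data.
max_prob = clf.predict_proba(vec.transform(test)).max(axis=1)
for text, p in zip(test, max_prob):
    print(f"{p:.3f}  {'OOD' if p < 0.6 else 'in-domain'}  {text!r}")
```

The point of the sketch is the division of labor: clustering supplies the labels that OCC methods lack, so the second-stage classifier can exploit multi-class structure in the in-domain data rather than collapsing it into a single class.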
Recommended citation: Byounghan Lee, Jaesik Kim, Junekyu Park, Kyung-Ah Sohn. (2023). "Improving Unsupervised Out-of-domain Detection through Pseudo Labeling and Learning." Findings of the Association for Computational Linguistics: EACL 2023, pages 1031–1041.
Download Paper