Identifying Ambiguous Queries in Web Search
- Ruihua Song (Shanghai Jiao Tong University, China)
- Zhenxiao Luo (Fudan University, China)
- Ji-Rong Wen (Microsoft Research Asia)
- Yong Yu (Shanghai Jiao Tong University)
- Hsiao-Wuen Hon (Microsoft Research Asia)
It is widely believed that some queries submitted to search engines are ambiguous by nature (e.g., java, apple). However, few studies have investigated two questions: how many queries are ambiguous, and how can an ambiguous query be identified automatically? This paper addresses both. First, we construct a taxonomy of query ambiguity and ask human annotators to classify queries manually according to it. The manually labeled results show that query ambiguity is to some extent predictable. We then use a supervised learning approach to automatically classify queries as ambiguous or not. Experimental results show that this approach correctly classifies 87% of the labeled queries. Finally, we estimate that about 16% of queries in a real search log are ambiguous.
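The two figures reported above are of different kinds: one is an accuracy measured against human labels, the other is a share estimated by applying a classifier to unlabeled log queries. The following is a minimal sketch of how such quantities could be computed for any binary ambiguity classifier. It is not the method of this paper: the function names, the toy classifier, and the example queries are all assumptions made for illustration, and the toy classifier is a lookup table standing in for a trained supervised model.

```python
from typing import Callable, Sequence, Tuple

Classifier = Callable[[str], bool]  # True means "ambiguous"


def accuracy(classify: Classifier, labeled: Sequence[Tuple[str, bool]]) -> float:
    """Fraction of human-labeled queries whose predicted label matches."""
    if not labeled:
        raise ValueError("labeled set is empty")
    correct = sum(1 for query, label in labeled if classify(query) == label)
    return correct / len(labeled)


def ambiguous_share(classify: Classifier, queries: Sequence[str]) -> float:
    """Fraction of (unlabeled) log queries predicted to be ambiguous."""
    if not queries:
        raise ValueError("query sample is empty")
    return sum(1 for query in queries if classify(query)) / len(queries)


# Hypothetical stand-in for a trained model: a fixed lookup table.
TOY_AMBIGUOUS = {"java", "apple"}


def toy_classifier(query: str) -> bool:
    return query.lower() in TOY_AMBIGUOUS


# Illustrative data only; not taken from the paper.
labeled_sample = [
    ("java", True),
    ("apple", True),
    ("weather in boston", False),
    ("jaguar", True),  # the toy classifier misses this one
]
log_sample = ["java", "apple pie recipe", "maps", "apple"]

print(accuracy(toy_classifier, labeled_sample))
print(ambiguous_share(toy_classifier, log_sample))
```

Keeping the two functions separate reflects that the accuracy needs labeled data, while the share of ambiguous queries in a log depends only on the classifier's predictions and therefore inherits its errors.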