Natural language conversation between a human and a computer is one of the most challenging AI problems: it involves language understanding, reasoning, and the use of common-sense knowledge. Despite significant research effort over the past decades, progress on the problem has unfortunately been quite limited. One of the major reasons is the lack of a large volume of real conversation data.
In this task, we consider a much simplified version of the problem: one round of conversation formed by two short texts, the former being an initial post from a user and the latter a comment given by the computer. We refer to this as short text conversation (STC). Thanks to the extremely large amount of short text conversation data available on social media such as Twitter and Weibo, we anticipate that significant progress can be made on the problem with the use of this big data, much as has happened in machine translation, community question answering, and related fields.
One simple approach to STC, and perhaps the first one would want to try, is to treat it as an IR problem: maintain a large repository of short text conversation data and develop a conversation system based mainly on IR technologies. Given an initial post A, the system searches the repository and returns the most suitable comment. The comments in the repository were originally posted in response to posts other than A, but we assume that they can be reused as reasonable comments to A. That is, rather than pursuing generation-based STC (i.e., creating a suitable comment given an initial post from the user), we tackle the simpler problem of retrieval-based STC. With advanced IR technologies and big data, even retrieval-based STC systems may eventually behave like a human in each round of conversation.
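The retrieval idea above can be sketched in a few lines. The following is a minimal, self-contained illustration, not an actual participant system: the toy repository, the `retrieve_comments` helper, and the TF-IDF/cosine matching scheme are all our own assumptions. It ranks stored comments by the similarity between the new post and the post each comment originally responded to; a real system would use a far larger repository and much richer matching models.

```python
import math
import re
from collections import Counter

# Hypothetical toy repository of (post, comment) pairs; a real STC
# repository would contain millions of pairs from Weibo or Twitter.
repository = [
    ("I just moved to a new city and feel lonely",
     "Join a local club, you will meet people fast!"),
    ("My laptop battery dies in an hour",
     "Try dimming the screen and closing background apps."),
    ("Any tips for learning to cook?",
     "Start with simple one-pot recipes and build up."),
]

def tokenize(text):
    return re.findall(r"\w+", text.lower())

docs = [tokenize(post) for post, _ in repository]
n_docs = len(docs)

# Document frequency of each term, used for IDF weighting.
df = Counter()
for doc in docs:
    df.update(set(doc))

def tfidf(tokens):
    """Map each term to a smoothed TF-IDF weight."""
    tf = Counter(tokens)
    return {t: (c / len(tokens)) * (1.0 + math.log((1 + n_docs) / (1 + df[t])))
            for t, c in tf.items()}

def cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

doc_vecs = [tfidf(d) for d in docs]

def retrieve_comments(new_post, top_k=2):
    """Rank repository comments by the similarity between the new
    post and the post each comment originally responded to."""
    query = tfidf(tokenize(new_post))
    ranked = sorted(range(n_docs),
                    key=lambda i: cosine(query, doc_vecs[i]),
                    reverse=True)
    return [repository[i][1] for i in ranked[:top_k]]

print(retrieve_comments("Any advice on how to cook pasta at home?", top_k=1))
```

Matching the query against the stored post (rather than the comment itself) is only one possible design; post-comment and query-comment matching signals can also be combined.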
The key research questions addressed here are: Given a new post, can an appropriate (i.e., human-like) comment be returned by searching a post-comment repository? What are the challenges and limitations of retrieval-based STC?
The main purpose of STC@NTCIR is to bring together IR, NLP, and machine learning researchers working on or interested in natural language conversation, to share the latest research results, exchange opinions on related issues, and discuss future directions.
As the first step, STC is defined as an IR task, as depicted in the figure above. A repository of post-comment pairs is prepared from Sina Weibo for the Chinese task (and from Twitter for the Japanese task; see http://ntcir12.noahlab.com.hk/japanese/stc-jpn.htm for details). Each participating team receives the repository in advance.
(1): In the training period, teams can build their own conversation systems based on IR technologies, using the given post-comment pairs as training data.
(2): In the test period, each team is given 50-100 test queries (posts) that have been held out from the repository, and is asked to provide a ranked list of ten results (comments) for each query. The comments must come from the repository.
(3): In the evaluation period, the results from all the participating teams are pooled and labelled by multiple judges with 0 (inappropriate), 1 (appropriate in some context), or 2 (appropriate). Graded-relevance IR measures are used for evaluation.
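To make the graded-relevance evaluation concrete, here is a minimal sketch of one common graded-relevance measure, nDCG, applied to a hypothetical list of ten judged comments. This is an illustration only; the task's official measures and their exact parameterization may differ.

```python
import math

def dcg(labels):
    # DCG with the standard 2^rel - 1 gain and log2 rank discount.
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(labels))

def ndcg(labels, k=10):
    """nDCG@k over graded labels: 0 = inappropriate,
    1 = appropriate in some context, 2 = appropriate."""
    ideal = sorted(labels, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    return dcg(labels[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical judge labels for one team's ranked list of ten comments.
run = [2, 0, 1, 2, 0, 0, 1, 0, 0, 0]
print(ndcg(run))
```

A perfectly ordered list scores 1.0; placing the appropriate comments lower in the ranking reduces the score because of the logarithmic rank discount.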
The original Web texts are in Chinese (for the Chinese task) and Japanese (for the Japanese task). To help participants who do not read these languages, we also provide English machine translations of the original texts. Non-native speakers can get a rough idea of the content from the translations and can still participate in the task.
Hang Li, Noah's Ark Lab, Huawei, Hong Kong
Tetsuya Sakai, Waseda University, Japan
Zhengdong Lu, Noah's Ark Lab, Huawei, Hong Kong
Lifeng Shang, Noah's Ark Lab, Huawei, Hong Kong
Yusuke Miyao, National Institute of Informatics, Japan
Ryuichiro Higashinaka, Nippon Telegraph and Telephone Corporation, Japan
Please follow us on Twitter: @ntcirstc