Outstanding and best papers and the decision process

Dear readers:

This will be our final post as PC Chairs with regard to the transparency of the conference organization of ACL 2017.  Today (Wed, 2 Aug), our membership had the privilege of listening to presentations of the 22 papers selected as outstanding papers from our community, representing approximately 1.5% of the submissions to ACL 2017.

The outstanding paper tier was created in 2016 to recognize a larger group of excellent papers beyond the best paper awards, given the growing size of our community.  It is with great pleasure that we continue this practice, which we hope will become an annual tradition for our field.

Our 61 Area Chairs (ACs) helped to select the top papers in their areas that satisfied minimum score requirements and were nominated by at least one primary reviewer.  ACs re-read the finalists in their area and discussed the merits of the nominees' work and the primary reviews within their groups.  As PC chairs, we balanced the ACs' nominations for diversity and representativeness among areas and for review consistency.  This allowed us to form the four Outstanding Paper sessions, which ran in only two parallel tracks so that our membership had a better chance of attending the papers of interest to them.  The 22 papers are:

Long Papers (15):

  1. Ryan Lowe, Michael Noseworthy, Iulian Vlad Serban, Nicolas Angelard-Gontier, Yoshua Bengio and Joelle Pineau
    Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses
  2. Daniel Hershcovich, Omri Abend and Ari Rappoport
    A Transition-Based Directed Acyclic Graph Parser for UCCA
  3. Maxim Rabinovich, Mitchell Stern and Dan Klein
    Abstract Syntax Networks for Code Generation and Semantic Parsing
  4. Yanzhuo Ding, Yang Liu, Huanbo Luan and Maosong Sun
    Visualizing and Understanding Neural Machine Translation
  5. Ines Rehbein and Josef Ruppenhofer
    Detecting annotation noise in automatically labelled data
  6. Suncong Zheng, Feng Wang and Hongyun Bao
    Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme
  7. Mingbin Xu, Hui Jiang and Sedtawut Watcharawittayakul
    A Local Detection Approach for Named Entity Recognition and Mention Detection
  8. Milan Gritta, Mohammad Taher Pilehvar, Nut Limsopatham and Nigel Collier
    Vancouver Welcomes You! Minimalist Location Metonymy Resolution
  9. Yasuhide Miura, Motoki Taniguchi, Tomoki Taniguchi and Tomoko Ohkuma
    Unifying Text, Metadata, and User Network Representations with a Neural Network for Geolocation Prediction
  10. Ramakanth Pasunuru and Mohit Bansal
    Multi-Task Video Captioning with Visual and Textual Entailment
  11. Jiwei Tan and Xiaojun Wan
    Abstractive Document Summarization with a Graph-Based Attentional Neural Model
  12. Ryan Cotterell and Jason Eisner
    Probabilistic Typology: Deep Generative Models of Vowel Inventories
  13. Xinchi Chen, Zhan Shi, Xipeng Qiu and Xuanjing Huang
    Adversarial Multi-Criteria Learning for Chinese Word Segmentation
  14. Shuhei Kurita, Daisuke Kawahara and Sadao Kurohashi
    Neural Joint Model for Transition-based Chinese Syntactic Analysis
  15. Jan Buys and Phil Blunsom
    Robust Incremental Neural Semantic Graph Parsing

Short Papers (7):

  1. Bogdan Ludusan, Reiko Mazuka, Mathieu Bernard, Alejandrina Cristia and Emmanuel Dupoux
    The Role of Prosody and Speech Register in Word Segmentation: A Computational Modelling Perspective
  2. Yizhong Wang and Sujian Li
    A Two-stage Parsing Method for Text-level Discourse Analysis
  3. Keisuke Sakaguchi, Matt Post and Benjamin Van Durme
    Error-repair Dependency Parsing for Ungrammatical Texts
  4. Jindřich Libovický and Jindřich Helcl
    Attention Strategies for Multi-Source Sequence-to-Sequence Learning
  5. Xinyu Hua and Lu Wang
    Understanding and Detecting Diverse Supporting Arguments on Controversial Issues
  6. Afshin Rahimi, Trevor Cohn and Timothy Baldwin
    A Neural Model for User Geolocation and Lexical Dialectology
  7. Alane Suhr, Mike Lewis, James Yeh and Yoav Artzi
    A Corpus of Natural Language for Visual Reasoning

As is tradition, we formed a small committee of five senior scholars (Key-Sun Choi, Jing Jiang, Graham Neubig, Emily Pitler and Bonnie Webber) to select the final best papers.  Due to personal difficulties, one member of the committee was unable to render reviews, so Min joined the evaluation committee to keep the number of members odd and thus break any ties.

We ran a two-stage selection process.  In the first stage, each member read 4-6 papers of their own choosing.  After forming their initial impressions, each member recommended at most one short and one long paper from their respective pool for the second, final selection round.  Although it was difficult to choose among such high-quality work, the committee reached consensus rather easily and selected a best short paper and a best long paper.  In the process, it also decided to create a best resource paper award in recognition of the central importance of linguistic resources as an enabler of downstream research.

Mohit and Heng, our co-chairs for system demonstrations, also ran a decision process for the best demonstration awards.  In addition to looking at reviewer scores, they carefully combed through the reviewer comments to arrive at their decisions for the demonstration awards.

With this preamble, we are now pleased to announce the best paper awards for ACL 2017 and to discuss the deciding factors behind them.

Best Demonstration Paper (Runner Up):

Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart and Alexander Rush,
OpenNMT: Open-Source Toolkit for Neural Machine Translation

Best Demonstration Paper:

Marjan Ghazvininejad, Xing Shi, Jay Priyadarshi and Kevin Knight,
Hafez: an interactive poetry generation system

Best Resource Paper:

Alane Suhr, Mike Lewis, James Yeh and Yoav Artzi,
A Corpus of Natural Language for Visual Reasoning

In awarding Suhr et al. the best resource award, one committee member wrote:
One thing I really appreciate is the sanity checks that went in to make sure they aren’t creating datasets that can be solved without any understanding.  I found it very reassuring that the ‘text only’ and ‘image only’ baselines almost exactly matched the majority class baseline, and that the more involved baselines like [prior] neural module network approach …  does better, but still with large room for improvement.  I also appreciated [that they showed] a variety of language issues in this dataset compared with the visual QA dataset.

Best Short Paper:

Bogdan Ludusan, Reiko Mazuka, Mathieu Bernard, Alejandrina Cristia and Emmanuel Dupoux,
The Role of Prosody and Speech Register in Word Segmentation: A Computational Modelling Perspective

It is well recognized that it is difficult to encapsulate research in the short paper format.  Ludusan et al. rises to this challenge to a tee.  One committee member writes:

Of the short papers recommended by committee members, including myself, only ONE meets my criteria for “best short paper”, meaning that
(a) it is written as a short paper (not a long paper with details omitted);
(b) that it contains everything appropriate to a research paper; and
(c) that it is “best” with respect to being a short paper.
It is an elegant paper that fits perfectly into the short paper format. It stands out as an example of what a short paper should be, in terms of presenting a methodologically-sound experimental design, a complete set of results and well-informed conclusions.

Best Long Paper:

Ryan Cotterell and Jason Eisner,
Probabilistic Typology: Deep Generative Models of Vowel Inventories

In choosing the Cotterell and Eisner paper as the best paper, we noted that the style of work was original and apt: it uses well-established “classical” generative models combined with neural networks in a sophisticated way to get at linguistic questions.  These excerpts from the first reviewer's comments closely summarize our views:

I was really impressed with this paper. It uses modern deep learning tools, but in a subtle and appropriate way. The computation is aimed towards a clear and meaningful goal that hasn’t been approachable in previous ways, as far as I can tell. Rather than looking at the conditional likelihood of one vowel given another, we can now evaluate the joint likelihood of a complete inventory. … [The] combination of positive and negative results is what we need to determine the bounds of what works and what doesn’t.

In addition to their recognition by the community, the best system demonstration and best resource papers each carry a cash award of USD 500, and the best short and best long papers each carry a cash award of USD 1000.

Please join us in congratulating all of the authors of both outstanding and best papers on their contributions to furthering research in our community!  They all deserve our attention and careful reading.