Natural language generation for task-oriented dialogue systems aims to effectively realize system dialogue actions. All natural language generators (NLGs) must produce grammatical, natural, and appropriate output, but generators for task-oriented dialogue must additionally perform a specific dialogue act that faithfully conveys specific semantic information, as dictated by the dialogue policy of the system's dialogue manager. Most previous work on deep learning methods for task-oriented NLG has assumed that the generation output can be an "utterance skeleton" (i.e., a delexicalized utterance) with variable names standing in for slots, which are then replaced with actual values in post-processing. In practice, however, slot values can influence lexical selection in the surrounding context as well as the overall sentence plan. To model this effect, in this paper we investigate sequence-to-sequence (seq2seq) models in which slot values are included both in the input sequence and in the output surface form. Furthermore, we study whether a separate sentence-planning module that decides how to group slot-value mentions in the input to the seq2seq model can produce more natural sentences than a seq2seq model that aims to jointly learn the plan and the surface realization.
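As a minimal sketch of the delexicalization-and-post-processing pipeline described above (the bracketed placeholder format, slot names, and example values are illustrative assumptions, not taken from any particular system):

```python
# Hypothetical illustration of the "utterance skeleton" approach:
# a generator emits a delexicalized template, and post-processing
# substitutes the actual slot values from the dialogue act.

def relexicalize(skeleton: str, slot_values: dict) -> str:
    """Replace bracketed slot placeholders (e.g. [NAME]) with values."""
    utterance = skeleton
    for slot, value in slot_values.items():
        utterance = utterance.replace(f"[{slot}]", value)
    return utterance

# A generator trained on delexicalized data produces a skeleton ...
skeleton = "[NAME] is a [CUISINE] restaurant in [AREA]."
# ... which is filled in after generation (example values assumed).
slot_values = {"NAME": "Zizzi", "CUISINE": "Italian", "AREA": "the centre"}

print(relexicalize(skeleton, slot_values))
```

Note the failure mode this sketch exposes: the fixed context "a [CUISINE]" cannot adapt to the substituted value, yielding "a Italian restaurant" rather than "an Italian restaurant." This is exactly the kind of interaction between slot values and the surrounding lexical context that motivates including values directly in the seq2seq input and output.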