Many techniques have been developed over the years that use side information about users and/or items as inputs to recommenders in order to improve recommendation quality, especially for cold-start items and users. In this work, we test the approach of utilizing item side information, specifically categorical attributes, in the output of recommendation models, through either multi-task learning or hierarchical classification. We first demonstrate the efficacy of these approaches for both matrix factorization and neural networks on a medium-size real-world dataset. We then show that they improve a neural-network-based production model in an industrial-scale recommender system. We demonstrate the robustness of the hierarchical classification approach by introducing noise when building the hierarchy. Lastly, we investigate the generalizability of hierarchical classification on a simulated dataset by building two user models in which we can fully control the generative process of user-item interactions.