Quantifying the value of data is a fundamental problem in machine learning. Beyond providing insights into the learning task, data valuation has applications in diverse use cases, such as domain adaptation, corrupted sample discovery, and robust learning. To adaptively learn data values jointly with the predictive model, we propose a meta-learning framework named Data Valuator using Reinforcement Learning (DVRL). We employ a data value estimator, modeled by a deep neural network, that outputs the probability that each datum is used in training the predictive model. The data value estimator is trained with a reinforcement signal: a reward obtained directly from the predictor's performance on the target task. We evaluate DVRL in various applications across multiple types of datasets. DVRL yields higher-quality data value estimates than alternative methods. In many regimes, the corrupted sample discovery performance of DVRL is close to optimal (i.e., as if the noisy samples were known a priori). For domain adaptation and robust learning, DVRL significantly outperforms the alternatives, with average performance improvements of 14.6\% and 10.8\%, respectively.
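The selection-and-reward loop described above can be illustrated with a toy sketch. This is not the DVRL implementation; it makes several loudly stated simplifications. A table of per-sample logits stands in for the deep-network data value estimator. A no-intercept least-squares slope stands in for the predictive model. The reward is the negative validation MSE, compared against a moving-average baseline, and the estimator is updated with the REINFORCE gradient of the Bernoulli selection policy. The synthetic data (ten training samples with sign-flipped targets) and all function names (`dvrl_sketch`, `fit_slope`, `val_reward`, `make_data`) are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_data(rng, n, slope, corrupt_idx=()):
    # Hypothetical toy data: y = slope * x + small noise.
    # Samples whose index is in corrupt_idx get the sign-flipped slope (corrupted targets).
    xs, ys = [], []
    for i in range(n):
        x = rng.uniform(-2.0, 2.0)
        s = -slope if i in corrupt_idx else slope
        xs.append(x)
        ys.append(s * x + rng.gauss(0.0, 0.1))
    return xs, ys


def fit_slope(xs, ys, mask):
    # Stand-in predictive model: least-squares slope (no intercept) on selected samples only.
    num = sum(m * x * y for m, x, y in zip(mask, xs, ys))
    den = sum(m * x * x for m, x in zip(mask, xs))
    return num / den if den > 0 else 0.0


def val_reward(w, xv, yv):
    # Reward = negative MSE on a clean validation set (assumed available, as in the text).
    return -sum((y - w * x) ** 2 for x, y in zip(xv, yv)) / len(xv)


def dvrl_sketch(xs, ys, xv, yv, steps=3000, lr=0.01, decay=0.9, seed=0):
    rng = random.Random(seed)
    # Simplification: one selection logit per training sample instead of a DNN on (x, y).
    theta = [0.0] * len(xs)
    baseline = val_reward(fit_slope(xs, ys, [1] * len(xs)), xv, yv)
    for _ in range(steps):
        probs = [sigmoid(t) for t in theta]
        # Sample which data points are used to train the predictor this round.
        mask = [1 if rng.random() < p else 0 for p in probs]
        r = val_reward(fit_slope(xs, ys, mask), xv, yv)
        adv = r - baseline
        # REINFORCE: d/dtheta_i log Bernoulli(s_i; sigmoid(theta_i)) = s_i - p_i.
        for i, (s, p) in enumerate(zip(mask, probs)):
            theta[i] += lr * adv * (s - p)
        # Moving-average baseline reduces the variance of the policy gradient.
        baseline = decay * baseline + (1 - decay) * r
    # Data values = learned selection probabilities.
    return [sigmoid(t) for t in theta]


# Usage: 30 training samples, the first 10 corrupted; 50 clean validation samples.
data_rng = random.Random(42)
corrupt = set(range(10))
xs, ys = make_data(data_rng, 30, 3.0, corrupt)
xv, yv = make_data(data_rng, 50, 3.0)
values = dvrl_sketch(xs, ys, xv, yv)
clean_mean = sum(v for i, v in enumerate(values) if i not in corrupt) / 20
noisy_mean = sum(values[i] for i in corrupt) / 10
print(f"mean value of clean samples: {clean_mean:.2f}, corrupted: {noisy_mean:.2f}")
```

In this toy setting, including a corrupted sample always pulls the fitted slope away from the true one, so its selection probability tends to fall while clean samples' probabilities tend to rise. Ranking samples by `values` is then one way to surface corrupted-sample candidates.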