Simple perceptual decision-making tasks such as the Stroop and flanker tasks are widely used to measure individual variation in the processing of conflicting visual stimuli, typically through summary statistics such as the difference in accuracy between stimuli with and without conflict. A major challenge in applying these tasks, for instance to compare two populations of subjects, is the low reliability of these nonparametric performance measures. Here, we model the dynamic adjustments in decision policy often seen in human behavior, capturing trial-by-trial variation in addition to the classically used average statistics. We propose a recurrent network model and a novel meta-learning algorithm, MixMP, that capture behavioral strategies in the task in a model-agnostic manner and overcome small-sample learning challenges by pooling across subjects. We show that splitting learning into a complex, shared metamodel and simple subject-specific parameters yields significantly better predictive models and identifies latent dimensions indexing the decision policy, which may serve as a better measure of individual differences in the task.
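
For concreteness, the classical summary statistic mentioned above (the difference in accuracy on stimuli with and without conflict) can be sketched as follows. This is a minimal illustration, not the proposed model or MixMP: the function name `conflict_effect`, the `(subject_id, has_conflict, correct)` record layout, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def conflict_effect(trials):
    """Per-subject accuracy on non-conflict trials minus accuracy on conflict trials.

    `trials` is an iterable of (subject_id, has_conflict, correct) tuples.
    This record layout is hypothetical and chosen only for illustration.
    """
    # counts[subject][has_conflict] = [number correct, number of trials]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for subject, has_conflict, correct in trials:
        cell = counts[subject][bool(has_conflict)]
        cell[0] += int(bool(correct))
        cell[1] += 1

    effects = {}
    for subject, by_condition in counts.items():
        conflict_hits, conflict_n = by_condition[True]
        plain_hits, plain_n = by_condition[False]
        if conflict_n == 0 or plain_n == 0:
            continue  # the effect is undefined without trials in both conditions
        effects[subject] = plain_hits / plain_n - conflict_hits / conflict_n
    return effects


# Toy data (fabricated for illustration only, not experimental results).
trials = [
    ("s1", False, 1), ("s1", False, 1), ("s1", True, 1), ("s1", True, 0),
    ("s2", False, 1), ("s2", False, 0), ("s2", True, 0), ("s2", True, 0),
]
effects = conflict_effect(trials)
```

Each subject is reduced to a single number estimated from a limited set of trials, so any trial-by-trial structure in the behavior is discarded. This is the kind of average statistic that the trial-level modeling described above is meant to complement.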