Service robots are playing an increasingly important role in our lives. However, they still struggle to adapt to their users. Although substantial work has focused on intelligent service robots, most proposed approaches are user-independent. Our work is part of the FUI-RoboPopuli project, which focuses on endowing entertainment companion robots with adaptive and social behaviour. In particular, we are interested in robots that are able to learn and plan so that they can adapt and personalise their behaviour to their users. Markov Decision Processes (MDPs) are widely used for adaptive robot applications. However, one key challenge is reducing the sample complexity required to learn an MDP model, in particular the reward function. In this article, we present our contribution to representing and learning the reward function by analysing interaction traces (i.e. the history of interactions between the robot and its users, including users’ feedback). Our approach generalises the learned rewards so that, when new users are introduced, the robot can quickly adapt using what it has learned from previous experiences with other users. We propose two algorithms to learn the reward function. The first is direct and certain: with a given user, the robot applies what it learned from interactions with the same kind of users (i.e. users with similar profiles). The second generalises what it learns so that it can be applied to all kinds of users. Through simulation, we show that the generalised algorithm converges to an optimal reward function with fewer than half the samples needed by the direct algorithm.
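To make the distinction between the two estimation strategies concrete, the following is a minimal, hypothetical sketch rather than the paper's actual algorithms. It assumes interaction traces stored as (profile, state, action, feedback) tuples and an assumed profile-similarity function; the "direct" estimate only averages feedback from users with the same profile, while the "generalised" estimate pools feedback across all profiles, weighted by similarity, so that experience with previous users can transfer to new ones.

```python
# Hypothetical illustration (not the paper's exact algorithms): estimating an
# MDP reward function from interaction traces, where each trace entry is
# (user_profile, state, action, feedback).
from typing import Callable, Hashable, Iterable, Tuple

Trace = Tuple[Hashable, Hashable, Hashable, float]  # (profile, state, action, feedback)

def direct_reward(traces: Iterable[Trace], profile, state, action) -> float:
    """Average feedback for (state, action) observed with users of the same profile."""
    values = [f for p, s, a, f in traces if (p, s, a) == (profile, state, action)]
    return sum(values) / len(values) if values else 0.0

def generalised_reward(traces: Iterable[Trace], profile, state, action,
                       similarity: Callable[[Hashable, Hashable], float]) -> float:
    """Similarity-weighted average of feedback over all profiles for (state, action)."""
    num, den = 0.0, 0.0
    for p, s, a, f in traces:
        if (s, a) == (state, action):
            w = similarity(profile, p)  # assumed similarity score in [0, 1]
            num += w * f
            den += w
    return num / den if den > 0.0 else 0.0
```

Under these assumptions, the generalised estimate can return a non-trivial reward for a profile that has never been observed, whereas the direct estimate must wait for same-profile samples, which is consistent with the reduced sample complexity reported above.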