Until recently, proposals concerning intelligent service companions, such as robots, were mostly user- and environment-independent. Our work is part of the FUI-RoboPopuli project, which concentrates on endowing entertainment companion robots with adaptive and social behavior. More precisely, we focus on the capacity of a robotic system to learn how to personalize and adapt its behavior/actions according to its interaction situation, which describes (a) the current user(s) and (b) the current environment settings. Our approach is based on Markov Decision Processes (MDPs), which are widely used for adaptive robot applications. To obtain relevant adaptive behavior as quickly as possible, whatever the interaction situation, several approaches have been proposed to reduce the sample complexity required to learn the MDP model, including its reward function. In this spirit, we propose two algorithms that learn the MDP reward function by analyzing interaction traces (i.e., the history of interactions between the robot and its users, including their feedback on the robot's actions). The first algorithm is direct and certain, but does not exploit its acquired knowledge to adapt to unknown situations. The second detects which pieces of situation information are important for adaptation; this detection is used to generalize the learned reward function to unknown situations (i.e., unknown users and/or environment settings). In this paper, we present both learning algorithms, simulated experiments, and an experiment with the EMOX robot, which was upgraded during the FUI-RoboPopuli project. The results of these experiments show that the proposed algorithms can learn, through interactions in simulated and real situations, a reward function that leads to adapted and personalized behavior. We also present a scaling analysis in which we identify the main parameters of the proposed representation and study how much each of them affects learning and convergence complexity.
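To fix ideas, a minimal sketch of reward learning from feedback-labeled traces could be written as follows; the notation ($s$ for the interaction situation, $a$ for the robot action, $f_t$ for the user feedback of trace entry $t$, and $T_{s,a}$ for the set of trace entries matching $(s,a)$) is an illustrative assumption and not necessarily the exact estimator defined in the paper:

\[
\hat{R}(s, a) \;=\; \frac{1}{|T_{s,a}|} \sum_{t \in T_{s,a}} f_t ,
\]

where $\hat{R}(s,a)$ is the learned reward for executing action $a$ in situation $s$, obtained by averaging the feedback observed in the corresponding trace entries; the generalization step would then transfer such estimates to unseen situations that share the situation features detected as important.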