To accelerate the adoption of robots in everyday tasks, they must be able to cope with unstructured, unpredictable, and continuously changing environments. This requires robots that perform independently and learn both how to respond to the world and how the world responds to the actions the
robots undertake. One approach to learning is reinforcement learning (RL), in which the robot acts through a process guided by reinforcements from the environment that indicate how well it is performing the required task. Common RL algorithms in robotic systems include Q-learning and its variant Q(λ)-learning, model-free off-policy algorithms that learn an optimal policy while selecting actions according to a separate, exploratory behavior policy. Although Q- and Q(λ)-learning have been used in many robotic applications, they have notable drawbacks: (i) high computational cost, (ii) large state-action spaces, and (iii) long learning times until convergence to an optimal policy.

This thesis presents a new collaborative learning algorithm, denoted the CQ(λ) algorithm, that builds on Q(λ)-learning. The CQ(λ)-learning algorithm was developed, tested, and applied in two frameworks: (i) learning by multiple agents, and (ii) learning by human-robot systems. In the first framework, collaboration consists of taking the maximum of the state-action values, i.e., the Q-values, across all learning agents at each update step. In the second framework, two levels of collaboration are defined for a human-robot learning system: (i) autonomous, in which the robot decides which actions to take according to its Q(λ) learning function, and (ii) semi-autonomous, in which a human operator (HO) suggests an action or a policy and the robot uses the suggestion in place of its own exploration process. The key idea is to give the robot enough self-awareness to adaptively switch its collaboration level from autonomous (self-performing) to semi-autonomous (human intervention and guidance); this awareness is realized as a self-test of the robot's learning performance. The approach of variable autonomy is demonstrated in the context of an intelligent environment using mobile and fixed-arm robots.
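The multi-agent collaboration step described above can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the thesis's actual implementation): each agent runs tabular Watkins-style Q(λ)-learning with eligibility traces, and a separate `cq_update` step has every agent adopt the elementwise maximum of the Q-tables across all learners. The class name, parameter values, and trace-handling details are assumptions for the sake of a runnable example.

```python
import numpy as np

class QLambdaAgent:
    """Minimal tabular Q(lambda) learner with accumulating eligibility
    traces; a sketch, not the thesis's exact algorithm."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95,
                 lam=0.8, epsilon=0.1, seed=0):
        self.Q = np.zeros((n_states, n_actions))  # state-action values
        self.E = np.zeros_like(self.Q)            # eligibility traces
        self.alpha, self.gamma = alpha, gamma
        self.lam, self.epsilon = lam, epsilon
        self.rng = np.random.default_rng(seed)

    def act(self, s):
        # epsilon-greedy: explore with probability epsilon, else greedy
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.Q.shape[1]))
        return int(np.argmax(self.Q[s]))

    def update(self, s, a, r, s_next):
        # one-step TD error toward the greedy successor action
        a_star = int(np.argmax(self.Q[s_next]))
        delta = r + self.gamma * self.Q[s_next, a_star] - self.Q[s, a]
        self.E[s, a] += 1.0                # bump trace for visited pair
        self.Q += self.alpha * delta * self.E
        self.E *= self.gamma * self.lam    # decay all traces

def cq_update(agents):
    """CQ(lambda) collaboration step: every learner adopts the
    elementwise maximum Q-value over all agents."""
    Q_max = np.maximum.reduce([ag.Q for ag in agents])
    for ag in agents:
        ag.Q = Q_max.copy()
```

In this sketch, `cq_update` would be invoked periodically (e.g., after each learning episode), letting a slow learner immediately inherit the best value estimate any agent has found for each state-action pair.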
Extensive experimentation with different robotic systems in a variety of applications revealed the strengths and weaknesses of the algorithm. Applications developed specifically for testing the CQ(λ)-learning algorithm are demonstrated in the context of an intelligent environment: a mobile robot for navigation and a fixed-arm robot for the inspection of suspicious objects. The results show that CQ(λ) is superior to the standard Q(λ) algorithm. The suggested learning method is expected to reduce both the number of trials and the time a robot requires to learn a task.