This paper presents a reinforcement learning algorithm for scheduling the execution of complex tasks. The algorithm addresses the high-level learning task of scheduling a single transfer agent (a robot arm) through a set of sub-tasks in a sequence that minimizes task execution time. In place of fixed interprocess job transfers, the robot allows jobs to be moved at any point in time. Execution of a complex task was demonstrated using a Motoman UP-6 six-degree-of-freedom fixed-arm robot applied to a toast-making system. The algorithm scheduled a sequence of toast transitions with the objective of minimal completion time. Experiments examined the trade-off between exploration of the state-space and exploitation of the information already gathered, and its effect on the algorithm’s performance. Comparison of the suggested algorithm with the Monte-Carlo method and a random search method demonstrated the superiority of the algorithm over a wide range of learning conditions. The results were assessed against the optimal solution obtained by Branch and Bound.
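The exploration/exploitation trade-off described above can be illustrated with a minimal sketch. It is not the algorithm presented in this paper: it substitutes generic tabular Q-learning with an epsilon-greedy policy, and it finds the reference optimum by exhaustive enumeration rather than Branch and Bound. The transfer-time matrix `TIMES`, the function names, and all parameter values are hypothetical and chosen only for illustration.

```python
import itertools
import random

# Hypothetical transfer times between stations; station 0 is the arm's start.
TIMES = [
    [0, 4, 7, 3],
    [4, 0, 2, 6],
    [7, 2, 0, 5],
    [3, 6, 5, 0],
]
N = len(TIMES)


def sequence_time(seq):
    """Total transfer time when visiting stations in the order given, from station 0."""
    total, here = 0, 0
    for nxt in seq:
        total += TIMES[here][nxt]
        here = nxt
    return total


def optimal_time():
    """Reference optimum by exhaustive enumeration (stand-in for Branch and Bound)."""
    return min(sequence_time(p) for p in itertools.permutations(range(1, N)))


def learn_schedule(episodes=2000, epsilon=0.2, alpha=0.1, seed=0):
    """Tabular Q-learning; epsilon sets how often a random sub-task is explored."""
    rng = random.Random(seed)
    q = {}  # maps ((station, remaining sub-tasks), action) to an estimated value

    def value(state, action):
        return q.get((state, action), 0.0)

    def greedy(state):
        return max(sorted(state[1]), key=lambda a: value(state, a))

    for _ in range(episodes):
        state = (0, frozenset(range(1, N)))
        while state[1]:
            if rng.random() < epsilon:
                action = rng.choice(sorted(state[1]))  # explore
            else:
                action = greedy(state)                 # exploit
            reward = -TIMES[state[0]][action]          # shorter transfers score higher
            nxt = (action, state[1] - {action})
            best_next = max((value(nxt, b) for b in nxt[1]), default=0.0)
            q[(state, action)] = value(state, action) + alpha * (
                reward + best_next - value(state, action)
            )
            state = nxt

    # Read out the learned schedule by following the greedy policy.
    state, schedule = (0, frozenset(range(1, N))), []
    while state[1]:
        action = greedy(state)
        schedule.append(action)
        state = (action, state[1] - {action})
    return schedule


if __name__ == "__main__":
    learned = learn_schedule()
    print("learned schedule:", learned, "time:", sequence_time(learned))
    print("optimal time:", optimal_time())
```

Raising `epsilon` makes the agent sample more alternative orderings at the cost of slower convergence, while lowering it relies more heavily on the values already learned; comparing `sequence_time(learn_schedule(...))` with `optimal_time()` mirrors, in a toy setting, the kind of assessment against an optimal solution described above.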