We present parts of our ongoing work to facilitate the large-scale analysis of smooth pursuit eye movements made while viewing dynamic natural scenes. Classifying smooth pursuit episodes can be difficult in the presence of eye-tracking noise, and we therefore recently proposed an algorithm that clusters gaze recordings from several observers in order to improve classification robustness. We have now implemented a publicly available tool that allows the generation of a ground-truth benchmark by assisted hand-labelling of video gaze data. Based on the labelling produced with this tool, we present preliminary evaluation results for our smooth pursuit classification approach in comparison to state-of-the-art algorithms. Overall, human observers spend more than 12% of their viewing time performing smooth pursuit, which emphasizes the importance of investigating smooth pursuit behaviour in naturalistic contexts.