Sprint 2019-03-SystemScalability

August 23, 2019

Ended: August 23, 2019 (extended 2 days from Aug 21)

This sprint is focused on system scalability, in particular enabling elastic Kubernetes on AWS. Our next sprint builds on this work as we look at system behavior under resource saturation.

Objectives going in:

In this sprint we will:

() Move the 2 issues left stranded on the board through to completion.
() Work on usability issues for the sid Apps.
() Work within a shortened sprint of a week and a half, versus our regular two-week sprints.

Review of Results:

Significant features related to scaling the system were incorporated this sprint.  The next sprint will build on these as we look at system saturation behavior.

() Elastic Kubernetes was enabled. This allows the environment to adapt to the number of users on the system automatically.
() A solution to the Kubernetes worker node disk capacity problem was reached.  This goes a long way toward preventing a user job from overutilizing worker node disk capacity.
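
As a rough illustration of the disk capacity defaults named in story #603 (a 16GB limit and a request of 4GB or the image size), the sketch below builds a Kubernetes-style resources stanza for one user job. The helper name job_storage_resources is hypothetical. The sketch assumes the defaults apply to the ephemeral-storage resource, that the request is the larger of 4GB and the image size, that "GB" means GiB, and that the request is capped at the limit. It is not the team's actual implementation.

```python
GIB = 1024 ** 3  # assumption: "GB" in story #603 is treated as GiB

DEFAULT_LIMIT_BYTES = 16 * GIB    # 16GB limit (maximum) from story #603
DEFAULT_REQUEST_BYTES = 4 * GIB   # 4GB request (minimum) from story #603


def job_storage_resources(image_size_bytes: int) -> dict:
    """Build a Kubernetes-style resources stanza for one user job.

    Hypothetical helper for illustration only.
    """
    # Assumption: request the larger of 4GB and the uncompressed image size.
    request = max(DEFAULT_REQUEST_BYTES, image_size_bytes)
    # Kubernetes requires request <= limit, so cap the request at the limit.
    request = min(request, DEFAULT_LIMIT_BYTES)
    return {
        # Plain integer strings are interpreted by Kubernetes as bytes.
        "requests": {"ephemeral-storage": str(request)},
        "limits": {"ephemeral-storage": str(DEFAULT_LIMIT_BYTES)},
    }
```

With these assumptions, a job running a 6GB image would request 6GB of node disk, while a job running a 1GB image would still request the 4GB minimum. Both are held to the 16GB limit.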

The team was short-staffed throughout the sprint.

List of stories closed out:

Remove evicted pods automatically #571
Analyze RCE Job statistics and size of environment (epic) #640
Pods are not terminated when failed #560
Does the pod disk reservation include the size of the uncompressed docker image being run? #604
Set a default of 16GB limit (maximum) and a 4GB (or the size of the image) request (minimum) for all user jobs #603
As Dev/Ops, I want a consistent way to tag containers and container repositories #418
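
For stories #571 and #560, one possible way to identify the pods to clean up is sketched below. It assumes the input is a pod list in the JSON shape produced by "kubectl get pods --all-namespaces -o json", and relies on evicted pods being reported with status.phase "Failed" and status.reason "Evicted". The function name evicted_pods is hypothetical, and the mechanism the team actually adopted is not described in this report.

```python
def evicted_pods(pod_list: dict) -> list:
    """Return (namespace, name) pairs for evicted pods in a PodList-shaped dict.

    Hypothetical helper for illustration only.
    """
    found = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # Evicted pods stay in the Failed phase with reason "Evicted".
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            meta = pod.get("metadata", {})
            found.append((meta.get("namespace", "default"), meta.get("name")))
    return found
```

The returned pairs could then be passed to whatever deletion step the environment uses, for example a scheduled job that deletes each listed pod.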