In recent years, the Hamiltonian Monte Carlo (HMC) algorithm has been found to generate samples from a posterior distribution more efficiently than other popular Markov chain Monte Carlo (MCMC) methods, such as random walk Metropolis-Hastings. A general framework for HMC based on the use of graphics processing units (GPUs) is shown to greatly reduce the computing time needed for Bayesian inference. The most expensive computational tasks in HMC are evaluating the posterior kernel and computing its gradient with respect to the parameters of interest. One of the primary goals of this article is to show that by expressing each of these tasks in terms of simple matrix or element-wise operations and maintaining persistent objects in GPU memory, the computational time can be drastically reduced. By using GPU objects to perform the entire HMC simulation, most of the latency penalties associated with transferring data from main to GPU memory can be avoided. The proposed computational framework is thus conceptually simple, yet general enough to apply to most problems that use HMC sampling. For clarity of exposition, the effectiveness of the proposed approach is demonstrated in a high-dimensional setting on a standard statistical model: multinomial regression. Using GPUs, analyses of data sets that were previously intractable for fully Bayesian approaches due to prohibitively high computational cost are now feasible.
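To make the central idea concrete, the following is a minimal sketch (not the article's implementation) of how the log-posterior kernel and its gradient for multinomial regression can be written entirely as matrix and element-wise operations. NumPy is used here for portability; the point of the framework is that a GPU array library with a NumPy-compatible interface (e.g., CuPy) could be dropped in so that every array stays resident in GPU memory across HMC iterations. All variable names and the toy problem sizes below are illustrative assumptions.

```python
import numpy as np

# Toy multinomial (softmax) regression problem; sizes are illustrative.
rng = np.random.default_rng(0)
N, P, K = 200, 5, 3              # observations, predictors, classes
X = rng.normal(size=(N, P))      # design matrix (would live in GPU memory)
y = rng.integers(0, K, size=N)
Y = np.eye(K)[y]                 # one-hot responses, N x K
sigma2 = 10.0                    # Gaussian prior variance on coefficients

def log_posterior(B):
    """Unnormalized log posterior; B is a P x K coefficient matrix.
    Only matrix products and element-wise operations are used."""
    Z = X @ B                                  # N x K linear predictor
    Z = Z - Z.max(axis=1, keepdims=True)       # stabilize the softmax
    log_norm = np.log(np.exp(Z).sum(axis=1))   # row-wise log-sum-exp
    loglik = (Y * Z).sum() - log_norm.sum()
    logprior = -0.5 * (B ** 2).sum() / sigma2
    return loglik + logprior

def grad_log_posterior(B):
    """Gradient with respect to B, again only matrix / element-wise ops,
    so it maps directly onto GPU kernels."""
    Z = X @ B
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    Pmat = E / E.sum(axis=1, keepdims=True)    # softmax probabilities
    return X.T @ (Y - Pmat) - B / sigma2

def leapfrog(B, M, step):
    """One leapfrog step of HMC using only the two functions above;
    B is the position, M the momentum."""
    M = M + 0.5 * step * grad_log_posterior(B)
    B = B + step * M
    M = M + 0.5 * step * grad_log_posterior(B)
    return B, M
```

Because the entire leapfrog update is built from the same array operations, an HMC chain can run with no host-device transfers beyond the occasional read-back of accepted samples, which is where the latency savings described above come from.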