Recent publications

Price D, Clark M, Barsdell B, Babich R, Greenhill L. Optimizing performance per watt on GPUs in High Performance Computing: temperature, frequency and voltage effects. -- [Internet]. Submitted.

The magnitude of the real-time digital signal processing challenge attached to large radio astronomical antenna arrays motivates use of high performance computing (HPC) systems. The need for high power efficiency (performance per watt) at remote observatory sites parallels that in HPC broadly, where efficiency is an emerging critical metric. We investigate how the performance per watt of graphics processing units (GPUs) is affected by temperature, core clock frequency and voltage. Our results highlight how the underlying physical processes that govern transistor operation affect power efficiency. In particular, we show experimentally that GPU power consumption grows non-linearly with both temperature and supply voltage, as predicted by physical transistor models. We show that lowering GPU supply voltage and increasing clock frequency while maintaining a low die temperature increases the power efficiency of an NVIDIA K20 GPU by up to 37-48% over default settings when running xGPU, a compute-bound code used in radio astronomy. We discuss how temperature-aware power models could be used to reduce power consumption for future HPC installations. Automatic temperature-aware and application-dependent voltage and frequency scaling (T-DVFS and A-DVFS) may provide a mechanism to achieve better power efficiency for a wider range of codes running on GPUs.
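
To make the voltage, frequency and temperature dependence concrete, here is a minimal sketch of a textbook CMOS-style power model: dynamic power proportional to C·V²·f plus a static leakage term that grows with die temperature. The exponential temperature dependence and every coefficient below (C_EFF, I_LEAK_REF, T_REF_C, T_SCALE_C) are hypothetical placeholders chosen for illustration; they are not the model or the K20 measurements from the paper.

```python
import math

# Hypothetical coefficients for illustration only (not measured K20 values).
C_EFF = 60.0        # effective switched-capacitance term, W / (V^2 * GHz)
I_LEAK_REF = 10.0   # leakage current at the reference temperature, A
T_REF_C = 40.0      # reference die temperature, degrees C
T_SCALE_C = 40.0    # assumed temperature scale of exponential leakage growth, degrees C


def gpu_power(volts: float, freq_ghz: float, temp_c: float) -> float:
    """Total power: dynamic (C * V^2 * f) plus static leakage (V * I_leak(T))."""
    dynamic = C_EFF * volts**2 * freq_ghz
    leakage_current = I_LEAK_REF * math.exp((temp_c - T_REF_C) / T_SCALE_C)
    static = volts * leakage_current
    return dynamic + static


def perf_per_watt(volts: float, freq_ghz: float, temp_c: float) -> float:
    """Assume a compute-bound kernel whose throughput scales with clock frequency."""
    return freq_ghz / gpu_power(volts, freq_ghz, temp_c)


if __name__ == "__main__":
    default = perf_per_watt(1.0, 0.7, 70.0)
    tuned = perf_per_watt(0.9, 0.75, 45.0)   # lower voltage, higher clock, cooler die
    print(f"relative change in performance per watt: {tuned / default - 1:+.1%}")
```

In this toy model both terms shrink when the voltage drops, and the leakage term also shrinks when the die is kept cool, so efficiency improves even with a slightly higher clock. The size of the printed change depends entirely on the placeholder coefficients.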

Kocz J, Greenhill L, Barsdell B, Price D, Bernardi G, Bourke S, Clark M, et al. Digital Signal Processing using Stream High Performance Computing: A 512-input Broadband Correlator for Radio Astronomy. JAI [Internet]. 2015;4 (1) :id.

A "large-N" correlator that makes use of Field Programmable Gate Arrays and Graphics Processing Units has been deployed as the digital signal processing system for the Long Wavelength Array station at Owens Valley Radio Observatory (LWA-OV), to enable the Large Aperture Experiment to Detect the Dark Ages (LEDA). The system samples a ~100 MHz baseband and processes signals from 512 antennas (256 dual polarization) over a ~58 MHz instantaneous sub-band, achieving 16.8 Tops/s and 0.236 Tbit/s throughput in a 9 kW envelope and single rack footprint. The output data rate is 260 MB/s for 9-second time averaging of cross-power and 1-second averaging of total-power data. At deployment, the LWA-OV correlator was the largest in production in terms of N and is the third largest in terms of complex multiply accumulations, after the Very Large Array and Atacama Large Millimeter Array. The correlator's comparatively fast development time and low cost establish a practical foundation for the scalability of a modular, heterogeneous, computing architecture.
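
The "large-N" label reflects the quadratic growth in the number of signal pairs a correlator must form. A minimal counting sketch, assuming every pair of the 512 inputs (including each input with itself) is correlated in every frequency channel; the exact product layout of the LWA-OV system is not described here.

```python
def correlation_products(n_inputs: int) -> int:
    """Unique entries of the Hermitian N x N correlation matrix, autocorrelations included."""
    return n_inputs * (n_inputs + 1) // 2


def antenna_baselines(n_antennas: int, include_autos: bool = True) -> int:
    """Number of antenna pairs: i <= j if autocorrelations are included, i < j otherwise."""
    if include_autos:
        return n_antennas * (n_antennas + 1) // 2
    return n_antennas * (n_antennas - 1) // 2


if __name__ == "__main__":
    print(correlation_products(512))                    # 131328 input pairs per channel
    print(antenna_baselines(256, include_autos=False))  # 32640 cross-baselines between antennas
```

Doubling the number of inputs roughly quadruples the work per channel, which is why the cross-multiply stage dominates at this scale.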

Bernardi G, McQuinn M, Greenhill L. Foreground Model and Antenna Calibration Errors in the Measurement of the Sky-averaged λ21 cm Signal at z ~ 20. ApJ [Internet]. 2015;799 (1) :id.90.

The most promising near-term observable of the cosmic dark age prior to widespread reionization (z ~ 15-200) is the sky-averaged λ21 cm background arising from hydrogen in the intergalactic medium. Though an individual antenna could in principle detect the line signature, data analysis must separate foregrounds that are orders of magnitude brighter than the λ21 cm background (but that are anticipated to vary monotonically and gradually with frequency, i.e., they are considered "spectrally smooth"). Using more physically motivated models for foregrounds than in previous studies, we show that the intrinsic spectral smoothness of the foregrounds is likely not a concern, and that data analysis for an ideal antenna should be able to detect the λ21 cm signal after subtracting a ~fifth-order polynomial in log ν. However, we find that the foreground signal is corrupted by the angular and frequency-dependent response of a real antenna. The frequency dependence complicates modeling of foregrounds commonly based on the assumption of spectral smoothness. Our calculations focus on the Large-aperture Experiment to detect the Dark Age, which combines both radiometric and interferometric measurements. We show that statistical uncertainty remaining after fitting antenna gain patterns to interferometric measurements is not anticipated to compromise extraction of the λ21 cm signal for a range of cosmological models after fitting a seventh-order polynomial to radiometric data. Our results generalize to most efforts to measure the sky-averaged spectrum.
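
The foreground-subtraction step can be sketched as a least-squares polynomial fit. One common form, assumed here rather than taken from the paper, fits log T as a fifth-order polynomial in log ν. The foreground amplitude, spectral index, curvature, band and the Gaussian "trough" standing in for the λ21 cm signal are all hypothetical values, and the idealized spectrum ignores the antenna chromaticity the abstract warns about.

```python
import numpy as np

# Hypothetical smooth foreground: power law with mild curvature in log(nu).
freq_mhz = np.linspace(40.0, 90.0, 201)
NU_REF_MHZ = 70.0
x = np.log(freq_mhz / NU_REF_MHZ)
t_fg = np.exp(np.log(2000.0) - 2.5 * x + 0.1 * x**2)   # sky temperature, K

# Hypothetical absorption trough standing in for the sky-averaged 21 cm signal.
trough = -0.1 * np.exp(-0.5 * ((freq_mhz - 68.0) / 5.0) ** 2)


def subtract_log_poly(t_sky, freq, order=5, nu_ref=NU_REF_MHZ):
    """Fit log T as a polynomial in log(nu / nu_ref) and return T minus the fitted model."""
    xx = np.log(freq / nu_ref)
    coeffs = np.polynomial.polynomial.polyfit(xx, np.log(t_sky), order)
    model = np.exp(np.polynomial.polynomial.polyval(xx, coeffs))
    return t_sky - model


res_fg = subtract_log_poly(t_fg, freq_mhz)            # smooth foreground alone
res_sig = subtract_log_poly(t_fg + trough, freq_mhz)  # foreground plus trough

print(f"max |residual|, foreground only: {np.max(np.abs(res_fg)):.2e} K")
print(f"rms residual, with trough:       {np.std(res_sig):.2e} K")
```

A smooth foreground is removed almost exactly, while part of the narrower trough survives the fit; a frequency-dependent antenna response would add structure that such a low-order polynomial cannot absorb.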

Kocz J, Greenhill L, Barsdell B, Bernardi G, Jameson A, Clark M, Craig J, et al. A Scalable FPGA/GPU FX Correlator. JAI [Internet]. 2014;3 (1) :id.1450002-330.

Radio astronomical imaging arrays comprising large numbers of antennas, O(10²–10³), have posed a signal processing challenge because of the required O(N²) cross correlation of signals from each antenna and requisite signal routing. This motivated the implementation of a Packetized Correlator architecture that applies Field Programmable Gate Arrays (FPGAs) to the O(N) "F-stage" transforming time domain to frequency domain data, and Graphics Processing Units (GPUs) to the O(N²) "X-stage" performing an outer product among spectra for each antenna. The design is readily scalable to at least O(10³) antennas. Fringes, visibility amplitudes and sky image results obtained during field testing are presented.
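
The two stages of an FX correlator can be illustrated in a few lines of NumPy: an F-stage that channelizes each input independently (linear in the number of inputs) and an X-stage that forms per-channel outer products (quadratic in the number of inputs). This is a CPU sketch under simplifying assumptions: a plain FFT stands in for the polyphase filter bank a production F-stage would normally use, and the inputs are synthetic complex noise.

```python
import numpy as np


def f_stage(voltages: np.ndarray, n_chan: int) -> np.ndarray:
    """F-stage: channelize each input's complex time series with an FFT.

    voltages has shape (n_inputs, n_samples); returns spectra with shape
    (n_frames, n_chan, n_inputs). Cost grows linearly with n_inputs.
    """
    n_inputs, n_samples = voltages.shape
    n_frames = n_samples // n_chan
    frames = voltages[:, : n_frames * n_chan].reshape(n_inputs, n_frames, n_chan)
    spectra = np.fft.fft(frames, axis=-1)
    return spectra.transpose(1, 2, 0)


def x_stage(spectra: np.ndarray) -> np.ndarray:
    """X-stage: per-channel outer product of all input spectra, summed over frames.

    Returns vis[c, i, j] = sum_t X[t, c, i] * conj(X[t, c, j]); cost grows as n_inputs**2.
    """
    return np.einsum("tci,tcj->cij", spectra, spectra.conj())


rng = np.random.default_rng(0)
n_inputs, n_chan, n_samples = 8, 16, 16 * 100
voltages = rng.standard_normal((n_inputs, n_samples)) + 1j * rng.standard_normal((n_inputs, n_samples))
spectra = f_stage(voltages, n_chan)
vis = x_stage(spectra)
print(vis.shape)  # (16, 8, 8): one Hermitian correlation matrix per channel
```

The diagonal of each matrix holds the total power of each input, and the off-diagonal entries are the cross-power visibilities used for imaging.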

Bernardi G, Greenhill L, Mitchell D, Ord S, et al. A 189 MHz, 2400 deg² Polarization Survey with the Murchison Widefield Array 32-element Prototype. ApJ [Internet]. 2013;771 (2) :id.105.

We present a Stokes I, Q and U survey at 189 MHz with the Murchison Widefield Array 32-element prototype covering 2400 deg². The survey has a 15.6 arcmin angular resolution and achieves a noise level of 15 mJy beam⁻¹. We demonstrate a novel interferometric data analysis that involves calibration of drift scan data, integration through the co-addition of warped snapshot images, and deconvolution of the point-spread function through forward modeling. We present a point source catalog down to a flux limit of 4 Jy. We detect polarization from only one of the sources, PMN J0351-2744, at a level of 1.8% ± 0.4%, whereas the remaining sources have a polarization fraction below 2%. Compared to a reported average value of 7% at 1.4 GHz, the polarization fraction of compact sources significantly decreases at low frequencies. We find a wealth of diffuse polarized emission across a large area of the survey with a maximum peak of ~13 K, primarily with positive rotation measure values smaller than +10 rad m⁻². The small values observed indicate that the emission is likely to have a local origin (closer than a few hundred parsecs). There is a large sky area at α ≥ 2h30m where the diffuse polarized emission rms is fainter than 1 K. Within this area of low Galactic polarization we characterize the foreground properties in a cold sky patch at (α, δ) = (4h, –27.6°) in terms of three-dimensional power spectra.
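
The quantities in this abstract follow from standard definitions: the linear polarization fraction p = √(Q² + U²)/I, the polarization angle χ = ½ atan2(U, Q), and Faraday rotation of that angle by RM·λ². A minimal sketch of these relations; the Stokes values passed in below are hypothetical and do not come from the survey catalog.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light


def linear_polarization(stokes_i: float, stokes_q: float, stokes_u: float):
    """Return (fractional linear polarization, polarization angle in radians)."""
    p = math.hypot(stokes_q, stokes_u) / stokes_i
    chi = 0.5 * math.atan2(stokes_u, stokes_q)
    return p, chi


def faraday_rotation(rm_rad_m2: float, freq_hz: float) -> float:
    """Rotation of the polarization angle, RM * lambda^2, in radians."""
    wavelength_m = C_M_PER_S / freq_hz
    return rm_rad_m2 * wavelength_m**2


p, chi = linear_polarization(1.0, 0.018, 0.0)   # hypothetical source with p of about 1.8%
print(f"p = {p:.3f}, chi = {chi:.3f} rad")
print(f"rotation at 189 MHz for RM = +10 rad m^-2: {faraday_rotation(10.0, 189e6):.1f} rad")
```

At 189 MHz (λ ≈ 1.59 m) even a rotation measure of +10 rad m⁻² turns the polarization angle by roughly 25 rad, so low-frequency polarimetry is sensitive to small RM values.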

Clark M, LaPlante P, Greenhill L. Accelerating radio astronomy cross-correlation with graphics processing units. IJHPCA [Internet]. 2013;27 :178-192.

We present a highly parallel implementation of the cross-correlation of time-series data using graphics processing units (GPUs), which is scalable to hundreds of independent inputs and suitable for the processing of signals from ‘large-N’ arrays of many radio antennas. The computational part of the algorithm, the X-engine, is implemented efficiently on NVIDIA’s Fermi architecture, sustaining up to 79% of the peak single-precision floating-point throughput. We compare performance obtained for hardware- and software-managed caches, observing significantly better performance for the latter. The high performance reported involves use of a multi-level data tiling strategy in memory and use of a pipelined algorithm with simultaneous computation and transfer of data from host to device memory. The speed of code development, flexibility, and low cost of the GPU implementations compared with application-specific integrated circuit (ASIC) and field programmable gate array (FPGA) implementations have the potential to greatly shorten the cycle of correlator development and deployment, for cases where some power-consumption penalty can be tolerated.
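
The idea behind tiling the X-engine can be illustrated on the CPU: split the correlation matrix into square tiles and compute only the tiles on or below the diagonal, since the matrix is Hermitian and the rest follows by conjugate transposition. This NumPy sketch shows one level of that idea under our own assumptions (the tile size and array layout are illustrative); it is not the GPU kernel described in the paper, whose multi-level tiling also targets registers and on-chip memory.

```python
import numpy as np


def x_engine_tiled(spectra: np.ndarray, tile: int) -> np.ndarray:
    """Compute only the lower-triangular tiles of the per-channel correlation matrix.

    spectra has shape (n_frames, n_chan, n_inputs). Tiles strictly above the
    diagonal are skipped; entries above the main diagonal inside diagonal
    tiles are computed as a by-product.
    """
    n_frames, n_chan, n_inputs = spectra.shape
    vis = np.zeros((n_chan, n_inputs, n_inputs), dtype=np.complex128)
    for i0 in range(0, n_inputs, tile):
        for j0 in range(0, i0 + 1, tile):  # j0 <= i0: lower triangle of tiles only
            rows = spectra[:, :, i0 : i0 + tile]
            cols = spectra[:, :, j0 : j0 + tile]
            # Each tile reads a small block of inputs and reuses every element
            # `tile` times, the data reuse that makes tiling pay off in fast memory.
            vis[:, i0 : i0 + tile, j0 : j0 + tile] = np.einsum("tci,tcj->cij", rows, cols.conj())
    return vis


rng = np.random.default_rng(1)
spectra = rng.standard_normal((20, 4, 10)) + 1j * rng.standard_normal((20, 4, 10))
tiled = x_engine_tiled(spectra, tile=4)
full = np.einsum("tci,tcj->cij", spectra, spectra.conj())
print(np.allclose(np.tril(tiled), np.tril(full)))  # lower triangles agree
```

Skipping the upper tiles roughly halves the arithmetic for large N, and larger tiles raise data reuse at the cost of more fast memory per tile.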

Greenhill L, Goddi C, Chandler C, Matthews L, Humphreys E. Dynamical Evidence for a Magnetocentrifugal Wind from a 20 M☉ Binary Young Stellar Object. ApJL [Internet]. 2013;770 (2) :id.32.

In Orion BN/KL, proper motions of λ7 mm vibrationally excited SiO masers trace the rotation of a nearly edge-on disk and a bipolar wide-angle outflow 10-100 AU from radio source I, a binary young stellar object of ~20 M☉. Here we map ground-state λ7 mm SiO emission with the Very Large Array and track proper motions over 9 yr. The innermost and strongest emission lies in two extended arcs bracketing Source I. The proper motions trace a northeast-southwest bipolar outflow 100-1000 AU from Source I with a median three-dimensional motion of ~18 km s⁻¹. An overlying distribution of λ1.3 cm H2O masers betrays similar flow characteristics. Gas dynamics and emission morphology traced by the masers suggest the presence of a magnetocentrifugal disk wind. Reinforcing evidence lies in the collinearity of the flow, apparent rotation across the flow parallel to the disk rotation, and recollimation that narrows the flow opening angle ~120 AU downstream. The arcs of ground-state SiO emission may mark the transition point to a shocked super-Alfvénic outflow.
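
A three-dimensional motion combines the sky-plane speed implied by a proper motion at a known distance with the line-of-sight velocity. A minimal sketch of that conversion, using the standard factor of 4.74047 km s⁻¹ per (arcsec yr⁻¹ × pc); the distance is left as an input because the value adopted in the paper is not given here, and the example numbers are hypothetical.

```python
import math

# 1 AU per Julian year in km/s: the speed of 1 arcsec/yr of proper motion at 1 pc.
KMS_PER_AU_PER_YR = 4.740470


def transverse_velocity_kms(mu_mas_per_yr: float, distance_pc: float) -> float:
    """Sky-plane speed (km/s) from a proper motion (mas/yr) at a distance (pc)."""
    return KMS_PER_AU_PER_YR * (mu_mas_per_yr / 1000.0) * distance_pc


def speed_3d_kms(mu_ra_mas_yr: float, mu_dec_mas_yr: float,
                 v_los_kms: float, distance_pc: float) -> float:
    """Total speed from two proper-motion components and a line-of-sight velocity,
    all measured relative to the same reference (e.g. the driving source)."""
    mu_total = math.hypot(mu_ra_mas_yr, mu_dec_mas_yr)
    v_t = transverse_velocity_kms(mu_total, distance_pc)
    return math.hypot(v_t, v_los_kms)


# Hypothetical inputs: 3 and 4 mas/yr, 5 km/s along the line of sight, 400 pc.
print(f"{speed_3d_kms(3.0, 4.0, 5.0, 400.0):.1f} km/s")
```

Because the transverse term scales linearly with distance, any uncertainty in the adopted distance carries directly into the derived outflow speeds.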

Humphreys E, Reid M, Moran J, Greenhill L, Argon A. Toward a New Geometric Distance to the Active Galaxy NGC 4258. III. Final Results and the Hubble Constant. ApJ [Internet]. 2013;775 (1) :id.13.

We report a new geometric maser distance estimate to the active galaxy NGC 4258. The data for the new model are maser line-of-sight (LOS) velocities and sky positions from 18 epochs of very long baseline interferometry observations, and LOS accelerations measured from a 10 yr monitoring program of the 22 GHz maser emission of NGC 4258. The new model includes both disk warping and confocal elliptical maser orbits with differential precession. The distance to NGC 4258 is 7.60 ± 0.17 ± 0.15 Mpc, a 3% uncertainty including formal fitting and systematic terms. The resulting Hubble constant, based on the use of the Cepheid variables in NGC 4258 to recalibrate the Cepheid distance scale, is H₀ = 72.0 ± 3.0 km s⁻¹ Mpc⁻¹.
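
The quoted 3% total uncertainty can be checked with a short calculation, assuming the formal (±0.17 Mpc) and systematic (±0.15 Mpc) terms are independent and combined in quadrature:

```python
import math


def combined_fractional_error(value: float, *errors: float) -> float:
    """Combine independent error terms in quadrature and express the result as a fraction of value."""
    return math.sqrt(sum(e * e for e in errors)) / value


frac = combined_fractional_error(7.60, 0.17, 0.15)
print(f"{frac:.1%}")  # 3.0%
```

The combined error is about 0.23 Mpc, or 3.0% of 7.60 Mpc, consistent with the figure stated in the abstract.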