NVIDIA @ ISC 2013: CUDA 5.5 Released & More

As the 2013 International Supercomputing Conference continues this week, product and technology announcements continue to trickle out of the show. NVIDIA of course is no stranger to this show, and coming off their success with Titan last year are ever increasing their presence to try to capture a larger share of the lucrative HPC market. To that end NVIDIA is releasing several announcements this morning that we wanted to briefly cover. The big news out of ISC 2013 for NVIDIA is that CUDA 5.5 is now out of private beta and onto its public release candidate. Though CUDA 5.5 is just a point release for CUDA, it does bring several significant changes for developers. The biggest change of course is that this is the first version of CUDA to offer ARM support, going hand in hand with the launch of the Kayla development platform and ahead of next year’s launch of NVIDIA’s Logan SoC. CUDA on ARM is a significant point of interest for NVIDIA for a couple of reasons. On the consumer side of things NVIDIA is hoping to ultimately leverage CUDA for compute on SoC based devices, similar to what they have done in the PC space over the last half-decade. For the ISC crowd however the focus is on what this means for NVIDIA’s HPC ambitions, as an ARM based HPC environment is something that can be powered exclusively by NVIDIA processors, as opposed to today’s common scenario of pairing Tesla compute cards with x86 AMD and Intel processors. Though if nothing else, in the more immediate future this is NVIDIA ensuring they aren’t left behind by the continuing growth of ARM device sales. Along with bringing ARM support to CUDA, CUDA 5.5 also introduces cross compilation support to the toolkit, allowing ARM binaries to be built either natively on ARM systems or much more quickly on faster x86 systems. Other changes include several different improvements in MPI and HyperQ, such as MPI workload prioritization and HyperQ gaining the ability to receive jobs from multiple MPI processes on Linux systems. Finally, on a broader view CUDA 5.5 will also be bringing some small but important changes that most developers will see in one way or another. On the development side of things NVIDIA is rolling out a new guided performance analysis tool for use with their Visual Profiler tool and with the Nsight Eclipse Edition IDE in order to help developers better identify and resolve performance bottlenecks. Meanwhile on the deployment side of things NVIDIA is finally rolling out a static compilation option, which should simplify the distribution of CUDA appliations, allowing the necessary CUDA libraries to be statically linked in applications, rather than relying solely on dynamic linking and requiring that the necessary libraries be bundled with the application or the CUDA toolkit installed on the target computer. Moving on, along with the CUDA 5.5 announcements NVIDIA is also using ISC to showcase some of the latest projects being developed with NVIDIA’s GPUs. NVIDIA’s major theme for ISC is neural net computing, with a pair of announcements relating to that. On the academic front, Stanford has put together a new cluster to model neural networking for researching how the human brain learns. The 16 server cluster is capable of modeling a 11.2 billion parameter neutral network, which is 6.5 times bigger than the second largest such network, a 1.7 billion parameter model put together by Google in 2012. The cluster is the basis of a separate paper being released this week for the International Conference on Machine Learning, which is simultaneously taking place this week in Atlanta. Meanwhile on the business front, Nuance, the company behind speech recognition software such as the Dragon series, is being tapped for ISC as a neural network case study. Nuance has used neural networking techniques for years as the basis of the machine learning systems their software uses to train itself, and more recently the company has begun integrating GPUs into that work. Specifically, the company is now using NVIDIA’s GPUs to accelerate the training process, cutting down the amount of time needed to train a model from weeks to days, and in turn allowing the company to experiment with new many more models in the same period of time. The ultimate result being that the company can test and refine models more frequently, with these refined models becoming the basis of future products. Finally, although Titan is no longer the #1 supercomputer in the world – having been bumped down to merely #2 on the latest Top500 list – word comes from NVIDIA that Titan has finally passed all of its necessary acceptance tests. As is typical for supercomputers, they are unveiled and listed before undergoing full acceptance testing, which means final acceptance may not come until months later. In the case of Titan some unexpected issues were discovered with the PCIe connectors on its motherboards, with excess gold in the connectors leading to solder issues. The issue was repaired in April, and Titan was resubmitted for an acceptance testing pass, which as of last week it has since passed and finally entered full production. ]]> http://www.anandtech.com/show/7080/nvidia-isc-2013-cuda-55-more