SC Best Papers Collection

HPC is the underpinning for many of today's most exciting new research areas, from machine learning to artificial intelligence to quantum computing. Working with science teams across disciplines, we are finding new ways to fight disease, combat poverty and homelessness, grow heartier crops, and better predict natural disasters. The entries below collect best papers from the SC conference series.
SC, (2015)
Microstructures forming during ternary eutectic directional solidification processes have significant influence on the macroscopic mechanical properties of metal alloys. For a realistic simulation, we use the well-established, thermodynamically consistent phase-field method and im...
Cited by 33
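The abstract only gestures at the method, but the basic phase-field mechanics can be sketched compactly. Below is a minimal one-dimensional Allen-Cahn relaxation in Python; it is a toy stand-in for the paper's thermodynamically consistent multi-phase model, and every parameter is an illustrative assumption.

```python
import numpy as np

# Minimal 1-D Allen-Cahn sketch of the phase-field idea (illustrative
# numbers, one order parameter; the paper evolves a multi-phase model in 3-D):
#   phi_t = eps^2 * phi_xx - W'(phi),  with double well W(phi) = phi^2 (1 - phi)^2
n, dx, dt, eps2 = 256, 0.1, 1e-3, 0.25
phi = np.where(np.arange(n) < n // 2, 0.0, 1.0)   # sharp initial interface

for _ in range(5000):
    lap = (np.roll(phi, 1) - 2 * phi + np.roll(phi, -1)) / dx**2  # periodic Laplacian
    dW = 2 * phi * (1 - phi) * (1 - 2 * phi)      # derivative of the double well
    phi += dt * (eps2 * lap - dW)                 # explicit Euler update

# the sharp step relaxes toward a smooth tanh-like interface profile
print("diffuse-interface cells:", int(np.sum((phi > 0.05) & (phi < 0.95))))
```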
SC, pp.237-248, (2014)
The growing size of modern storage systems is expected to exceed billions of objects, making metadata scalability critical to overall performance. Many existing distributed file systems only focus on providing highly parallel fast access to file data, and lack a scalable metadata...
Cited by 94
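One common way to make a metadata service scale, sketched below purely for illustration (this is not the paper's design), is to hash-partition the namespace across metadata servers so each lookup touches a single server rather than a central one.

```python
import hashlib

# Toy namespace partitioning: hash each path onto one of N metadata
# servers so entries and lookups spread evenly. Illustrative only.
NUM_MD_SERVERS = 8
tables = [dict() for _ in range(NUM_MD_SERVERS)]

def md_server_for(path: str) -> int:
    digest = hashlib.sha1(path.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_MD_SERVERS

def create(path: str, inode: dict) -> None:
    tables[md_server_for(path)][path] = inode   # exactly one server owns each entry

def lookup(path: str) -> dict:
    return tables[md_server_for(path)][path]    # single-server lookup, no global lock

create("/data/run1/out.bin", {"size": 0, "mode": 0o644})
print(lookup("/data/run1/out.bin"))
```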
SC, pp.1-12, (2014)
We report on improvements made over the past two decades to our adaptive treecode N-body method.
Cited by 40
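For readers unfamiliar with treecodes, a generic Barnes-Hut-style sketch follows; it is not the authors' code, and the opening angle THETA and softening are illustrative choices. Distant groups of bodies are replaced by their center of mass, cutting the pairwise O(N^2) force sum to roughly O(N log N).

```python
import numpy as np

THETA, SOFT = 0.5, 1e-6   # opening criterion and softening (illustrative)

class Node:
    def __init__(self, x0, y0, size):
        self.x0, self.y0, self.size = x0, y0, size   # square cell
        self.mass, self.comx, self.comy = 0.0, 0.0, 0.0
        self.body, self.kids = None, None

    def _child(self, x, y):
        h = self.size / 2
        i = 1 if x >= self.x0 + h else 0
        j = 1 if y >= self.y0 + h else 0
        return self.kids[2 * j + i]

    def insert(self, x, y, m):
        if self.kids is None and self.body is None:
            self.body = (x, y, m)                    # empty leaf takes the body
        elif self.kids is None:
            h = self.size / 2                        # occupied leaf: split it
            self.kids = [Node(self.x0 + i * h, self.y0 + j * h, h)
                         for j in (0, 1) for i in (0, 1)]
            ox, oy, om = self.body
            self.body = None
            self._child(ox, oy).insert(ox, oy, om)   # push old body down
            self._child(x, y).insert(x, y, m)
        else:
            self._child(x, y).insert(x, y, m)
        tot = self.mass + m                          # maintain monopole data
        self.comx = (self.comx * self.mass + x * m) / tot
        self.comy = (self.comy * self.mass + y * m) / tot
        self.mass = tot

def accel(node, x, y):
    # acceleration at (x, y) from the mass in node (G = 1)
    if node is None or node.mass == 0.0:
        return 0.0, 0.0
    dx, dy = node.comx - x, node.comy - y
    r = (dx * dx + dy * dy + SOFT) ** 0.5
    # far cell (or single body): use its monopole; near cell: open it
    if node.kids is None or node.size / r < THETA:
        if node.body is not None and node.body[:2] == (x, y):
            return 0.0, 0.0                          # skip self-interaction
        f = node.mass / r**3
        return f * dx, f * dy
    ax = ay = 0.0
    for k in node.kids:
        kx, ky = accel(k, x, y)
        ax, ay = ax + kx, ay + ky
    return ax, ay

rng = np.random.default_rng(0)
root = Node(0.0, 0.0, 1.0)
for px, py in rng.random((200, 2)):
    root.insert(px, py, 1.0)
print(accel(root, 0.5, 0.5))
```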
Scientific Programming - Selected Papers from Supercomputing 2012, no. 3-4 (2012): 1-12
Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation -- ad...
Cited by 57
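The granularity trade-off behind "scheduling overheads" is easy to demonstrate. The sketch below splits a fixed amount of work into progressively finer tasks; Python threads do not actually run this compute in parallel because of the GIL, but the growth in per-task overhead is the point, and the numbers are illustrative only.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Same total work, varying task granularity: many tiny tasks pay
# scheduling overhead per task, coarser tasks amortize it.
def work(n):
    s = 0
    for i in range(n):
        s += i * i
    return s

TOTAL = 2_000_000

def run(num_tasks):
    chunk = TOTAL // num_tasks
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(work, [chunk] * num_tasks))
    return time.perf_counter() - t0

for tasks in (4, 100, 10_000):
    print(f"{tasks:6d} tasks: {run(tasks):.3f} s")
```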
Scientific Programming - Selected Papers from Supercomputing 2012, no. 3-4 (2012): 1-12
In high-performance computing on distributed-memory systems, communication often represents a significant part of the overall execution time. The relative cost of communication will certainly continue to rise as compute-density growth follows the current technology and industry t...
Cited by 18
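One standard mitigation for communication cost, shown here only as background (the truncated abstract does not say which technique the paper develops), is to overlap communication with computation using non-blocking operations. A minimal mpi4py ring exchange:

```python
# Non-blocking halo exchange with mpi4py (assumed installed); save as,
# e.g., overlap.py and run with: mpiexec -n 2 python overlap.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

send = np.full(1024, rank, dtype=np.float64)
recv = np.empty(1024, dtype=np.float64)

# post communication first ...
reqs = [comm.Isend(send, dest=(rank + 1) % size),
        comm.Irecv(recv, source=(rank - 1) % size)]

# ... then do interior work while messages are in flight
interior = np.sum(np.sin(np.linspace(0, 1, 100_000)))

MPI.Request.Waitall(reqs)            # complete before touching recv
print(rank, "got halo from rank", int(recv[0]), "interior =", round(interior, 3))
```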
SC, pp.1-12, (2011)
Most pseudorandom number generators (PRNGs) scale poorly to massively parallel high-performance computation because they are designed as sequentially dependent state transformations. We demonstrate that independent, keyed transformations of counters produce a large alternative cl...
Cited by 175
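The counter-based idea is simple to sketch: the state is just a (key, counter) pair, and each draw applies an independent keyed transformation to the counter, so any worker can generate any sample with no shared state. Using a keyed hash (BLAKE2b) as that transformation is my substitution for illustration; the paper constructs much cheaper special-purpose functions.

```python
import hashlib
import struct

# Counter-based PRNG sketch: draw n of stream `key` is a pure function
# of (key, n), so generation is embarrassingly parallel and random-access.
def cbrng(key: bytes, counter: int) -> float:
    h = hashlib.blake2b(counter.to_bytes(16, "little"), key=key, digest_size=8)
    (u,) = struct.unpack("<Q", h.digest())
    return u / 2**64                 # uniform in [0, 1)

key = b"stream-42"
print([cbrng(key, n) for n in range(4)])
print(cbrng(key, 2))                 # reproduce any draw directly, no replay
```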
SC, pp.1-11, (2011)
Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed-memory architectures. Inspired by MapReduce and other simplified parallel programming approaches, we hav...
Cited by 36
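A MapReduce-style skeleton is compact enough to show inline. The single-process sketch below is a generic illustration, not the paper's system: mapper output is grouped by key and each group is reduced, with a toy "domain traversal" that bins particles into spatial cells.

```python
from collections import defaultdict
from functools import reduce

def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for rec in records:
        for key, val in mapper(rec):                 # map: emit (key, value)
            groups[key].append(val)                  # shuffle: group by key
    return {k: reduce(reducer, vs) for k, vs in groups.items()}  # reduce

# Toy traversal: count particles per spatial cell of a 2x2 grid.
particles = [(0.1, 0.2), (0.8, 0.7), (0.15, 0.25), (0.9, 0.1)]
cells = map_reduce(
    particles,
    mapper=lambda p: [((int(p[0] * 2), int(p[1] * 2)), 1)],
    reducer=lambda a, b: a + b,
)
print(cells)   # {(0, 0): 2, (1, 1): 1, (1, 0): 1}
```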
SC, pp.1-11, (2010)
General-Purpose Graphics Processing Units (GPGPUs) are promising parallel platforms for high performance computing. The CUDA (Compute Unified Device Architecture) programming model provides improved programmability for general computing on GPGPUs. However, its unique execution mo...
Cited by 279
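The programming model in question is easy to show concretely. Below is a minimal CUDA kernel written through Numba's Python bindings (my choice, to keep the examples in one language); it illustrates the grid/block/thread hierarchy the abstract refers to and needs an NVIDIA GPU plus numba installed to run.

```python
import numpy as np
from numba import cuda

@cuda.jit
def saxpy(a, x, y, out):
    i = cuda.grid(1)                  # global thread index across the grid
    if i < out.size:                  # guard: the grid may overshoot the array
        out[i] = a * x[i] + y[i]

n = 1 << 20
x = np.ones(n, dtype=np.float32)
y = np.full(n, 2.0, dtype=np.float32)
out = np.empty_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
saxpy[blocks, threads_per_block](np.float32(3.0), x, y, out)  # implicit transfers
print(out[:4])                        # [5. 5. 5. 5.]
```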
SC, pp.1-11, (2010)
This paper presents an in-depth analysis of the impact of system noise on large-scale parallel application performance in realistic settings. Our analytical model shows that not only collective operations but also point-to-point communications influence the application's sensitiv...
Cited by 245
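The amplification effect the abstract describes can be seen with a back-of-envelope Monte Carlo: a synchronizing collective finishes only when the slowest rank arrives, so rare local noise events are felt on almost every step at large scale. All numbers below are invented for illustration.

```python
import random

random.seed(1)
WORK = 1.0          # per-rank compute time between collectives
P_NOISE = 0.01      # chance a given rank is hit by a noise event
DETOUR = 0.5        # extra time such an event costs

def step(num_ranks):
    arrivals = [WORK + (DETOUR if random.random() < P_NOISE else 0.0)
                for _ in range(num_ranks)]
    return max(arrivals)             # the collective completes with the last rank

for p in (16, 256, 4096, 65536):
    t = sum(step(p) for _ in range(200)) / 200
    print(f"{p:6d} ranks: mean step time {t:.3f} (ideal {WORK})")
```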
SC, pp.1-11, (2009)
Anton is a recently completed special-purpose supercomputer designed for molecular dynamics (MD) simulations of biomolecular systems. The machine's specialized hardware dramatically increases the speed of MD calculations, making possible for the first time the simulation of biolo...
Cited by 449
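The computation Anton accelerates in hardware looks, in generic software form, like the velocity-Verlet timestep sketched below over pairwise Lennard-Jones forces. Parameters and units are illustrative, and this is in no way Anton's pipeline.

```python
import numpy as np

def lj_forces(pos, eps=1.0, sigma=1.0):
    # pairwise Lennard-Jones forces, F_ij along (pos_i - pos_j)
    d = pos[:, None, :] - pos[None, :, :]
    r2 = (d ** 2).sum(-1) + np.eye(len(pos))         # dummy diagonal, no self-div
    inv6 = (sigma ** 2 / r2) ** 3
    mag = 24 * eps * (2 * inv6 ** 2 - inv6) / r2
    mag[np.eye(len(pos), dtype=bool)] = 0.0          # zero self-interaction
    return (mag[:, :, None] * d).sum(axis=1)

def verlet_step(pos, vel, dt=1e-3, mass=1.0):
    # velocity Verlet: half-kick, drift, half-kick
    vel = vel + 0.5 * dt * lj_forces(pos) / mass
    pos = pos + dt * vel
    vel = vel + 0.5 * dt * lj_forces(pos) / mass
    return pos, vel

pos = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.6, 0.0]])
vel = np.zeros_like(pos)
for _ in range(100):
    pos, vel = verlet_step(pos, vel)
print(pos.round(3))
```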
SC, Portland, OR, pp.1-11, (2009)
Manycore processors with wide SIMD cores are becoming a popular choice for the next generation of throughput oriented architectures. We introduce a hardware technique called "diverge on miss" that allows SIMD cores to better tolerate memory latency for workloads with non-contiguo...
Cited by 69
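A toy timing model conveys the intuition (latencies and miss rates below are invented, and real hardware is far more constrained): a lockstep SIMD group pays the slowest lane's latency on every access, while letting lanes that hit run ahead bounds the group by the busiest lane's total time instead.

```python
import random

random.seed(7)
LANES, ACCESSES = 32, 1000
HIT, MISS_LAT, MISS_RATE = 1, 100, 0.05

def lane_latencies():
    return [MISS_LAT if random.random() < MISS_RATE else HIT
            for _ in range(LANES)]

# baseline: lockstep SIMD, every access costs the worst lane's latency
lockstep = sum(max(lane_latencies()) for _ in range(ACCESSES))

# idealized diverge-on-miss: lanes proceed independently, so the group
# finishes with the busiest lane's *total* time, not per-access maxima
per_lane = [0] * LANES
for _ in range(ACCESSES):
    for i, lat in enumerate(lane_latencies()):
        per_lane[i] += lat
diverge = max(per_lane)

print(f"lockstep {lockstep} cycles, diverge-on-miss ~{diverge} cycles "
      f"({lockstep / diverge:.1f}x)")
```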