[10] High Performance Computing & Big Data

[10-1] February 28, 16:40-17:10


High Performance Data Assimilation: A Computer Science Perspective of Parallel Data Coupling

Tom Peterka (Argonne National Laboratory)


All science problems of sufficient difficulty consist of workflows of interconnected tasks. Examples are simulations coupled with in situ analytics and visualization, multiphysics codes in complex interrelated systems, and wide-area experiments integrated with HPC simulations. As data move between tasks, they are filtered, redistributed, or completely transformed in their representation or meaning. Efficient and semantically valid data assimilation in general-purpose software---not limited to a single application or specific data model---is one of my group's research areas and the topic of this talk. Workflow software that couples heterogeneous codes and data models requires data to be transformed and reorganized multiple times along the data path. A wide variety of data models encountered in simulation, experimental, and analytics codes need to be correctly split and merged and efficiently communicated among workflow tasks. However, even simple redistribution of unstructured data across tasks with different numbers of MPI ranks can easily destroy the semantics of the data. In this talk I will present our solutions for semantics-preserving data redistribution and efficient parallel data communication. Examples from molecular dynamics, cosmology, and nanoscale materials science will motivate the discussion.
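The failure mode described above can be sketched in a few lines. The following is a hypothetical illustration, not the speaker's actual software: an unstructured dataset of two-particle bonds is redistributed from producers to a different number of consumers. A naive split of the flattened element stream can cut a bond across two consumers, destroying its meaning, while a record-aware split preserves every bond intact.

```python
# Hypothetical sketch of semantics-destroying vs. semantics-preserving
# redistribution. "Records" here are two-particle bonds; the names and data
# layout are illustrative assumptions, not taken from the talk.

def naive_split(stream, n):
    """Split a flat element stream into n near-equal chunks,
    ignoring record boundaries (may cut a bond in half)."""
    k, r = divmod(len(stream), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        chunks.append(stream[start:end])
        start = end
    return chunks

def record_split(records, n):
    """Split a list of whole records into n near-equal chunks,
    never cutting a record across chunks."""
    k, r = divmod(len(records), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        chunks.append(records[start:end])
        start = end
    return chunks

bonds = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]   # 5 bonds on the producers
flat = [p for bond in bonds for p in bond]          # flattened to 10 elements

# Redistribute to 2 consumers: the naive split cuts bond (4, 5) in half.
naive = naive_split(flat, 2)        # [[0,1,2,3,4], [5,6,7,8,9]]

# The record-aware split keeps every bond whole.
safe = record_split(bonds, 2)       # [[(0,1),(2,3),(4,5)], [(6,7),(8,9)]]
```

In a real workflow the producers and consumers would be separate groups of MPI ranks; the point of the sketch is only that the redistribution layer must know record boundaries, not just byte counts.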

  Presentation file: 10_1_T.Peterka.pdf