Arc: An IR for Batch and Stream Programming

Lars Kroll, Klas Segeljakt, Paris Carbone, Christian Schulte, Seif Haridi.

[pdf | bibtex]

In big data analytics, there is currently a large number of data programming models and their respective frontends such as relational tables, graphs, tensors, and streams. This has lead to a plethora of runtimes that typically focus on the efficient execution of just a single frontend. This fragmentation manifests itself today by highly complex pipelines that bundle multiple runtimes to support the necessary models. Hence, joint optimization and execution of such pipelines across these frontend-bound runtimes is infeasible. We propose Arc as the first unified Intermediate Representation (IR) for data analytics that incorporates stream semantics based on a modern specification of streams, windows and stream aggregation, to combine batch and stream computation models. Arc extends Weld, an IR for batch computation and adds support for partitioned, out-of-order stream and window operators which are the most fundamental building blocks in contemporary data streaming.

In: Seventeenth ACM SIGPLAN International Symposium on Database Programming Languages, Phoenix, AZ, USA. ACM Press, June, 2019.

Copyright ACM, 2019. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definite version was published as described above.