Apache Arrow

Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware such as CPUs and GPUs. The Arrow memory format also supports zero-copy reads, so data can be accessed without serialization overhead.

Created: Apr 26, 2023 by Pradeep Gowda | Updated: Mar 30, 2024 | Tagged: bigdata · analytics · arrow

Apache Arrow Project: https://arrow.apache.org/
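To make "columnar" and "zero-copy" concrete, here is a minimal sketch using only the Python standard library. It does not use the Arrow API; the `rows` data and the variable names are made up for illustration. It shows the two ideas the description relies on: each column lives in its own contiguous typed buffer, and a reader can get a view onto that buffer without copying it.

```python
from array import array

# Row-oriented records, as an application might hold them (hypothetical data).
rows = [(1, 10.5), (2, 20.0), (3, 7.25), (4, 2.0)]

# Columnar layout: one contiguous, typed buffer per column.
ids = array("q", (r[0] for r in rows))      # 64-bit signed integers
prices = array("d", (r[1] for r in rows))   # 64-bit floats

# A memoryview exposes the column's buffer without copying it.
price_view = memoryview(prices)

# Slicing a memoryview yields another view onto the same memory (zero-copy).
tail = price_view[2:]

# A write to the underlying buffer is visible through the view,
# which shows that the view shares memory rather than holding a copy.
prices[2] = 99.0
assert tail[0] == 99.0

# An analytic scan reads only the column it needs, not whole rows.
total = sum(price_view)
```

Arrow applies the same principle across languages and processes by standardizing the buffer layout, so a consumer can read the producer's memory directly instead of deserializing it.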

Sub Projects

Apache Arrow Ballista (Arrow DataFusion documentation): “Although Ballista is largely inspired by Apache Spark, there are some key differences.”
Apache Arrow DataFusion (Arrow DataFusion documentation)
Apache DataFusion Comet, an accelerator for Apache Spark; GitHub
Apache Arrow Flight
Substrait: Cross-Language Serialization for Relational Algebra

See also

Articles

Apache Arrow Uses