I may end up building something similar to a database at work. Besides that, I’m interested in writing more Rust, and I will use toy databases and database concepts as an excuse. This document is sorted by time, not by topic.
My reading notes on database-internals are on its own page, and not here.
arrow: Arrow seems to be on everybody’s lips at the moment. Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.
Influx iox: Influx iox is the next-gen storage engine for InfluxDB. They announced it here.
Their stack is: (1) Rust; (2) Apache Arrow; (3) DataFusion; (4) Parquet.
Details of their arch: https://www.influxdata.com/blog/influxdb-engine/
It’s interesting that they say they now support event data: “we’ve always had the vision that InfluxDB should be useful for event data (i.e. irregular time series) as well as metric data (i.e. regular time series)”. Their previous version had some issues with high-cardinality data (which smells like events to me). So it seems like they fixed this with the new version! Pretty cool!!
Also, it seems like it took at least two and a half years to get to a publicly usable state. So if I ever need to estimate a similar project, I would probably bet something around that 😛.
Their documentation has some talks that may be interesting. I will watch them later.
More links on Influx IOX
Arrow-DataFusion: DataFusion is a very fast, extensible query engine for building high-quality data-centric systems in Rust, using the Apache Arrow in-memory format. DataFusion offers SQL and Dataframe APIs, excellent performance, built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community.
A lot of projects (besides Influx iox) also use DataFusion as a building block for distributed databases. Since I’m interested in time series at the moment:
- Ballista Distributed SQL Query Engine
- CeresDB Distributed Time-Series Database
- Greptime Open Source & Cloud Native Distributed Time Series Database
- Ballista is a distributed compute platform built on DataFusion, which may serve as a building block for other software instead of writing it from scratch
DataFusion supports different types of sources, including in-memory only
The DataFusion flow is normally SQL -> DataFrame -> LogicalPlan
Extensions of DataFusion make the ExecutionPlan distributed
The Flight protocol would allow clients to pull the results from different machines (avoiding fetching everything from the leader node/scheduler)
A query engine is made of three main parts:
- Frontend (the query language parser)
- Intermediary query representation (the logical plan)
- Concrete execution operators
For DataFusion, those are represented by the following Rust types:
- LogicalPlan and Expr
- ExecutionPlan and RecordBatch (this last one is from Arrow)
DataFusion reads RecordBatches (Arrow) and outputs RecordBatches.
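To make those layers concrete, here is a toy sketch of a logical plan as a tree of relational operators over expressions. The names mirror DataFusion’s LogicalPlan/Expr, but these are simplified stand-ins I made up for illustration, not the real API:

```rust
// Toy sketch of a logical plan: a tree of relational operators over
// expressions. Hypothetical simplified types, not DataFusion's real ones.
#[derive(Debug)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
}

#[derive(Debug)]
enum LogicalPlan {
    Scan { table: String },
    Filter { predicate: Expr, input: Box<LogicalPlan> },
    Projection { columns: Vec<String>, input: Box<LogicalPlan> },
}

// Hand-built plan for: SELECT host FROM cpu WHERE usage > 90
// (this is what the SQL frontend would produce from the query text)
fn plan_for_toy_query() -> LogicalPlan {
    LogicalPlan::Projection {
        columns: vec!["host".to_string()],
        input: Box::new(LogicalPlan::Filter {
            predicate: Expr::Gt(
                Box::new(Expr::Column("usage".to_string())),
                Box::new(Expr::Literal(90)),
            ),
            input: Box::new(LogicalPlan::Scan { table: "cpu".to_string() }),
        }),
    }
}

fn main() {
    println!("{:?}", plan_for_toy_query());
}
```

The point is that the frontend, the optimizer, and the executor all talk through this one tree-shaped intermediary representation.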
You can create an external table (pointing to a Parquet file)
explain: you read the LogicalPlan from the bottom up (data flows from the leaves to the root of the tree)
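A minimal sketch of that bottom-up flow, with toy types I invented for illustration (not DataFusion’s): the leaf scan at the bottom produces the rows, and each parent operator consumes its child’s output on the way up to the root:

```rust
// Toy bottom-up execution: the leaf (Scan) sits at the bottom of the
// EXPLAIN output and produces rows; each parent pulls from its child,
// so data flows leaf -> root. Hypothetical types, for illustration only.
enum Plan {
    Scan(Vec<i64>),                      // leaf: produces rows
    Filter(fn(&i64) -> bool, Box<Plan>), // keeps matching rows
    Limit(usize, Box<Plan>),             // root: truncates the output
}

fn execute(plan: &Plan) -> Vec<i64> {
    match plan {
        Plan::Scan(rows) => rows.clone(),
        Plan::Filter(pred, input) => {
            execute(input).into_iter().filter(|r| pred(r)).collect()
        }
        Plan::Limit(n, input) => execute(input).into_iter().take(*n).collect(),
    }
}

fn main() {
    // Limit(2) <- Filter(> 10) <- Scan: read it bottom-up.
    let plan = Plan::Limit(
        2,
        Box::new(Plan::Filter(
            |r: &i64| *r > 10,
            Box::new(Plan::Scan(vec![5, 20, 30, 40])),
        )),
    );
    println!("{:?}", execute(&plan)); // -> [20, 30]
}
```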
DataFusion also supports explain verbose to see the optimizations applied
- Declarative: you describe what you want; the system figures out how
- Procedural: you describe how directly (with LogicalPlanBuilder or the DataFrame API)
Type coercion: DataFusion adds typecasts automatically when possible
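A toy sketch of what that coercion step does, with made-up types (DataFusion’s real coercion rules are more elaborate): when the two sides of a comparison have different numeric types, the planner rewrites the expression to cast the narrower side instead of rejecting the query:

```rust
// Toy sketch of type coercion (hypothetical types): an Int32 column
// compared to an Int64 literal gets a cast inserted on the narrower side.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType { Int32, Int64 }

#[derive(Debug)]
enum Expr {
    Column(&'static str, DataType),
    Literal(i64), // literals are Int64 in this sketch
    Cast(Box<Expr>, DataType),
    Eq(Box<Expr>, Box<Expr>),
}

fn type_of(e: &Expr) -> DataType {
    match e {
        Expr::Column(_, t) | Expr::Cast(_, t) => *t,
        Expr::Literal(_) | Expr::Eq(_, _) => DataType::Int64,
    }
}

// Rewrite `Eq` so both sides share the wider type (Int64 here).
fn coerce(e: Expr) -> Expr {
    match e {
        Expr::Eq(l, r) => {
            let widen = |side: Box<Expr>| {
                if type_of(&side) == DataType::Int32 {
                    Box::new(Expr::Cast(side, DataType::Int64))
                } else {
                    side
                }
            };
            Expr::Eq(widen(l), widen(r))
        }
        other => other,
    }
}

fn main() {
    // usage is Int32, the literal is Int64: a cast is added around usage.
    let e = Expr::Eq(
        Box::new(Expr::Column("usage", DataType::Int32)),
        Box::new(Expr::Literal(90)),
    );
    println!("{:?}", coerce(e));
}
```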
DataFusion is Async, Vectorized, Eager Pull, Partitioned, and Multi-Core
- Arrow: standard for representing Tabular Data
- Substrait: a standard for representing relational operations on tabular data
If Python can generate Substrait plans -> pass them to the query engine, which could divide the work between different nodes -> and pass it down to the storage, we would have a very good world 😃
- Measurements are tables in the new system
- Chunks are partitioned by date (daily, for example) and by table
- Chunks are never overwritten and are sorted by age of data.
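That chunk organization can be sketched like this, with hypothetical types I made up (IOx’s real catalog is of course richer): each chunk lives in a partition keyed by (table, day), and within a partition chunk ids only grow, since older chunks are never rewritten in place:

```rust
// Toy sketch of IOx-style chunk partitioning (hypothetical types): data is
// split into partitions keyed by (table, day); within a partition, chunk
// ids only grow, so existing chunks are never overwritten.
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct PartitionKey<'a> {
    table: &'a str, // the measurement name
    day: &'a str,   // e.g. "2022-01-02"
}

fn main() {
    let mut chunks: BTreeMap<PartitionKey, Vec<u64>> = BTreeMap::new();
    // Each write lands in the partition for its (table, day); new chunk
    // ids are appended, never replacing older ones.
    for (table, day, chunk_id) in [
        ("cpu", "2022-01-01", 1),
        ("cpu", "2022-01-02", 2),
        ("mem", "2022-01-01", 3),
        ("cpu", "2022-01-02", 4),
    ] {
        chunks
            .entry(PartitionKey { table, day })
            .or_default()
            .push(chunk_id);
    }
    for (key, ids) in &chunks {
        println!("{:?} -> {:?}", key, ids);
    }
}
```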
- The query engine is DataFusion + Arrow, executed as Rust async streams using the tokio executor
Query processing in IOx has three frontends:
- Storage gRPC frontend
- SQL FrontEnd (DataFusion)
- Reorg Frontend
All of them generate a DataFusion LogicalPlan that can be optimized, physically planned, and executed the same way.
The output changes, though:
- Storage gRPC frontend: gRPC output with Arrow RecordBatches (SeriesFrame)
- SQL frontend: Arrow Flight IPC
- Reorg frontend: ReadBuffer/ParquetWriter
One of the query optimization features is to not even look at irrelevant chunks when the user queries by time, since the chunks are partitioned by time (and each of them has metadata with the range of dates it contains)
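That pruning step can be sketched in a few lines, with a hypothetical Chunk struct of my own: each chunk keeps its min/max timestamps as metadata, so a time-bounded query only scans chunks whose range overlaps the predicate:

```rust
// Toy sketch of time-based chunk pruning (hypothetical struct): chunks
// whose [min_time, max_time] range does not overlap the query's time
// bounds are skipped without ever being read.
#[derive(Debug)]
struct Chunk {
    name: &'static str,
    min_time: i64, // per-chunk metadata
    max_time: i64,
}

// Keep only chunks whose range overlaps the half-open interval [start, end).
fn prune(chunks: &[Chunk], start: i64, end: i64) -> Vec<&Chunk> {
    chunks
        .iter()
        .filter(|c| c.max_time >= start && c.min_time < end)
        .collect()
}

fn main() {
    let chunks = [
        Chunk { name: "2022-01-01", min_time: 0, max_time: 99 },
        Chunk { name: "2022-01-02", min_time: 100, max_time: 199 },
        Chunk { name: "2022-01-03", min_time: 200, max_time: 299 },
    ];
    // The query only touches [120, 180): two of three chunks are pruned.
    for c in prune(&chunks, 120, 180) {
        println!("scan {}", c.name); // prints "scan 2022-01-02"
    }
}
```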
IOx answers queries by combining data from Parquet files + an in-memory cache
Flux and InfluxQL talk to IOx via gRPC calls.