@infect/infect-rda-sample-storage
INFECT Sample Storage for RDA
# Infect RDA Sample Storage Service
This service stores the data for RDA, along with the functions that process
that data on the RDA compute nodes.
## Data Sets
RDA processes data based on data sets. A data set is a collection of data that
is kept separate from all other data. There may be, for example, a data set for
each tenant or data sets for testing purposes.
A data set has a set of fields that are stored for it. Fields are properties of
data stored on each record of the data set. Fields may differ per data set. A
data set thus defines a schema for the data stored in it.
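
As a rough illustration of a data set acting as a schema, the following sketch models a data set as a name plus a set of field names. The `DataSet` class, the `accepts` method, the field names, and the rule that a record must carry exactly the declared fields are all assumptions made for this example, not part of the service's API.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DataSet:
    """Hypothetical model: a named data set whose field list acts as its schema."""
    identifier: str
    fields: FrozenSet[str]

    def accepts(self, record: dict) -> bool:
        # Assumption: a record fits when it carries exactly the declared fields.
        return set(record) == self.fields


# Two data sets with different schemas, e.g. one per tenant and one for tests.
# The field names are invented for illustration.
tenant_a = DataSet("tenant-a", frozenset({"bacteriumId", "compoundId", "resistance"}))
testing = DataSet("testing", frozenset({"bacteriumId", "resistance"}))

record = {"bacteriumId": 1, "compoundId": 7, "resistance": 2}
print(tenant_a.accepts(record))  # True
print(testing.accepts(record))   # False
```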
## Data Version
A data version is the set of records that were added to the storage in a single
import. Each time data is imported into the storage, a new version is created. A
version belongs to a data set. Each version has a status that describes whether
it should be loaded into the compute nodes for processing.
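
The relationship between versions, data sets, and the load status could be sketched as follows. The status values (`BUILDING`, `ACTIVE`, `DELETED`), the `DataVersion` fields, and the `versions_to_load` helper are hypothetical; the actual status names used by the service are not documented here.

```python
from dataclasses import dataclass
from enum import Enum


class VersionStatus(Enum):
    # Hypothetical status values, chosen only for this sketch.
    BUILDING = "building"
    ACTIVE = "active"
    DELETED = "deleted"


@dataclass
class DataVersion:
    identifier: str
    data_set: str          # the data set this version belongs to
    status: VersionStatus  # decides whether compute nodes should load it
    record_count: int      # the n records added by this import


def versions_to_load(versions, data_set):
    """Return the versions of one data set that are flagged for loading."""
    return [
        v for v in versions
        if v.data_set == data_set and v.status is VersionStatus.ACTIVE
    ]


versions = [
    DataVersion("v1", "tenant-a", VersionStatus.ACTIVE, 1200),
    DataVersion("v2", "tenant-a", VersionStatus.BUILDING, 300),
    DataVersion("v3", "testing", VersionStatus.ACTIVE, 50),
]
print([v.identifier for v in versions_to_load(versions, "tenant-a")])  # ['v1']
```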
## Data Groups
A data group is a set of records. Data groups are used to distribute data
quickly across the compute nodes: records are loaded into the compute nodes
group by group. For each group, a [rendezvous
hash](https://en.wikipedia.org/wiki/Rendezvous_hashing) is computed to assign
the group to a shard (compute node), which spreads the load evenly across the
shards.
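
A minimal sketch of rendezvous (highest random weight) hashing applied to data groups is shown below. It assumes string identifiers for groups and shards and uses SHA-256 as the scoring hash; the hash function, the identifier format, and the function names are assumptions for illustration and may differ from what the service actually does.

```python
import hashlib
from collections import defaultdict


def score(shard_id: str, group_id: str) -> int:
    """Deterministic weight for a (shard, group) pair."""
    digest = hashlib.sha256(f"{shard_id}:{group_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign_groups(group_ids, shard_ids):
    """Assign each data group to the shard with the highest score for it."""
    assignment = defaultdict(list)
    for group_id in group_ids:
        winner = max(shard_ids, key=lambda shard_id: score(shard_id, group_id))
        assignment[winner].append(group_id)
    return dict(assignment)


groups = [f"group-{i}" for i in range(1000)]
shards = ["shard-a", "shard-b", "shard-c"]

# Print how many groups each shard received.
for shard_id, assigned in sorted(assign_groups(groups, shards).items()):
    print(shard_id, len(assigned))
```

Because every group is scored against every shard independently, removing a shard only moves the groups that were assigned to that shard; all other groups keep their assignment.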
## Shards
A shard is the portion of the data that is loaded into a single compute node.
When a cluster is created, it requests shards from the storage. Each shard is
made up of data groups, which in turn are groups of records.
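
The composition of shards, data groups, and records can be pictured with a small in-memory model. Everything here (`DataGroup`, `Shard`, `Storage`, `request_shard`, and the identifiers) is a hypothetical stand-in used to show the structure, not the service's real interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataGroup:
    identifier: str
    record_ids: List[int] = field(default_factory=list)


@dataclass
class Shard:
    identifier: str
    groups: List[DataGroup] = field(default_factory=list)

    def record_ids(self) -> List[int]:
        # A shard's records are the records of all of its data groups.
        return [rid for group in self.groups for rid in group.record_ids]


class Storage:
    """Hypothetical in-memory stand-in for the storage service."""

    def __init__(self, shards: Dict[str, Shard]):
        self._shards = shards

    def request_shard(self, shard_id: str) -> Shard:
        # A newly created cluster would ask the storage for its shards.
        return self._shards[shard_id]


storage = Storage({
    "shard-a": Shard("shard-a", [DataGroup("g1", [1, 2]), DataGroup("g2", [3])]),
    "shard-b": Shard("shard-b", [DataGroup("g3", [4, 5])]),
})
print(storage.request_shard("shard-a").record_ids())  # [1, 2, 3]
```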