How to create a schema with a billion columns in Cassandra?
I see that in Cassandra, each row key can hold billions of column name/column value pairs. I also understand that we need to create the schema before we can persist data in Cassandra. So how are we supposed to create a schema with billions of columns? That doesn't seem to make sense.
Specifically, I am working on a use case where I want to save events generated at one-minute intervals, and every minute can have millions (if not billions) of events. So I am wondering how to model this correctly:
2022-05-22-05-55 --> <event id as column name, event value as column value>
Please help.
Solution 1
First, Cassandra needs the table schema up front, so you must declare all columns in advance (changing the schema at runtime is a bad idea). Second, even tens of thousands of columns is a bad idea, as it leads to significant overhead.
But really, you can simply use the following schema:
```
create table events (
    minute text,
    event_id text,
    value <some type>,
    primary key (minute, event_id));
```
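For illustration, writing and reading with this table could look like the following; this is only a sketch assuming `value` is declared as `text`, and the minute and event id literals are invented for the example:

```
-- store one event under its minute partition (hypothetical values)
insert into events (minute, event_id, value)
values ('2022-05-22-05-55', 'event-42', 'some payload');

-- fetch every event recorded in that minute
select event_id, value
from events
where minute = '2022-05-22-05-55';
```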
but there are still open issues:
- it's not recommended to have more than ~100K cells per partition (perhaps a few million with Cassandra 4.0)
- this partitioning scheme isn't efficient: in any given minute, only the N nodes holding the replicas of that minute's partition handle writes while the others sit idle; see the bucketing sketch after this list
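One common mitigation, shown here as a sketch rather than as part of the original answer, is to add a synthetic bucket to the partition key so each minute is spread across several partitions (the bucket count of 16 and the `text` value type are assumptions):

```
create table events_bucketed (
    minute text,
    bucket int,          -- writer computes e.g. hash(event_id) % 16
    event_id text,
    value text,
    primary key ((minute, bucket), event_id));
```

Writers derive the bucket from the event id, and readers fan out one query per bucket for the minute they need, trading a little read complexity for evenly distributed writes.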
I would recommend starting by describing your use case in detail, and only then deciding on a schema.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Alex Ott |
