How to create a schema with a billion columns in Cassandra?

I see that in Cassandra, for every row key we can store billions of column name/column value pairs.

Also, we need to create the schema before we can persist data in Cassandra.

So I am wondering: how are we supposed to create a schema with billions of columns?

That does not seem to make sense. I am specifically working on a use case where I want to save events generated at one-minute intervals, and each minute can have millions (if not billions) of events. So I am wondering how to model this correctly:

2022-05-22-05-55 --> <event id as column name, event value as column value>

Please help.



Solution 1:[1]

First, Cassandra needs the schema of the table, so you need to provide all columns in advance (making schema changes at runtime is a bad idea). Second, even tens of thousands of columns is a bad idea, as it will lead to significant overhead.

But really, you can just use the following schema:

create table events (
  minute text,        -- partition key: one partition per minute
  event_id text,      -- clustering column: orders events within a partition
  value <some type>,  -- placeholder: substitute the actual value type
  primary key(minute, event_id));
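For illustration, reads and writes against that table might look like this (a sketch assuming value is declared as text; the literal values are made up):

-- one row per event; all rows sharing a minute live in the same partition
insert into events (minute, event_id, value)
  values ('2022-05-22-05-55', 'event-0001', 'some payload');

-- fetch every event recorded in a given minute, sorted by event_id
select event_id, value from events
  where minute = '2022-05-22-05-55';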

But there are still open items with this schema:

  • it's not recommended to have more than ~100k cells per partition (maybe a few million with Cassandra 4.0)
  • your partitioning scheme isn't efficient - in any given minute, only the few nodes that own that minute's partition will handle all writes while the others sit idle (see the bucketing sketch after this list).
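One common way to mitigate both items is to split each minute across a fixed number of synthetic buckets, so that writes for a single minute spread over many partitions. A minimal sketch, assuming 100 buckets derived from the event id at write time (both the bucket count and the derivation are assumptions to tune for your workload):

create table events (
  minute text,
  bucket int,         -- e.g. hash(event_id) % 100, computed by the writer
  event_id text,
  value <some type>,  -- placeholder: substitute the actual value type
  primary key((minute, bucket), event_id));

Reading a full minute then means querying all buckets (for example, in parallel or with an IN clause), which is the usual trade-off for spreading the write load.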

I would recommend starting by describing your use case, and then deciding on the schema.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: Alex Ott