Category "dataproc"

Dataproc: Can a user create workers of different instance types?

Scenario: master: x1 machine type; workers: x2 machine type, x3 machine type. For the above scenario, AWS EMR instance fleets allow users to create different wor

Apache Beam: run Docker in pipeline

The Apache Beam pipeline (Python) I'm currently working on contains a transformation which runs a Docker container. While that works well during local testing w

Google Cloud Dataproc cluster created with an environment.yaml with a Jupyter resource but environment not available as a Jupyter kernel

I have created a new Dataproc cluster with a specific environment.yaml. Here is the command that I have used to create that cluster: gcloud dataproc clusters cr