Reading a CSV file in R from a Google Storage bucket with cloudml or googleCloudStorageR

I want to read a CSV file from a Google Storage bucket.

With the googleCloudStorageR library:

  bucket_name  <- "xxxxx"
  gfs_tmp_file <- "xxx.csv"
  # Set the default bucket
  googleCloudStorageR::gcs_global_bucket(bucket_name)
  # Download the object by name; the result is held in memory
  gfs_file <- googleCloudStorageR::gcs_get_object(gfs_tmp_file)

But here gfs_file contains raw data, and I don't know how to convert it to an R data.frame:

√ Downloaded and parsed gfs_data_temp.csv into R object of class: raw
   [1] 2c 44 41 54 5f 52 55 4e 2c 44 41 54 5f 46 4f 52 45 43 41 53 54 2c 4c 49 42 5f 53 4f 55 52 43 45 2c 4d 45 53 5f 4c
  [39] 4f 4e 47 49 54 55 44 45 2c 4d 45 53 5f 4c 41 54 49 54 55 44 45 2c 4d 45 53 5f 54 45 4d 50 45 52 41 54 55 52 45 2c
  [77] 4d 45 53 5f 48 55 4d 49 44 49 54 45 2c 4d 45 53 5f 50 4c 55 49 45 2c 4d 45 53 5f 56 49 54 45 53 53 45 5f 56 45 4e
With the cloudml library, it seems easier.

Not tested:

library(cloudml)
# bucket_name and gfs_tmp_file are the values defined earlier
data_dir <- gs_data_dir(paste0("gs://", bucket_name))
gfs_path <- file.path(data_dir, gfs_tmp_file)
gfs_dataset <- csv_dataset(gfs_path)

So what is the best way to download a file from a Google Storage bucket and load it into an R data.frame?



Solution 1:[1]

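With the googleCloudStorageR library, the file you read comes back as raw data. A raw vector holds the bytes of the CSV text (the output in the question starts with 2c 44 41 54 5f 52 55 4e, which is ",DAT_RUN"), so it has to be decoded into values first. Once you have the values as vectors, you can put them into a data frame like this: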

data_frame <- data.frame( column_name1 = vector1, column_name2 = vector2 )

Where:

  • column_name1, column_name2: the names of the columns in the data frame.
  • vector1, vector2: the vectors that hold the values for each column.

See the R documentation for data.frame() for more information.
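
One way to get from the raw vector to a data frame is to decode the bytes with rawToChar() and parse the resulting text with read.csv(). The sketch below is not part of the original answer: raw_to_data_frame is a hypothetical helper name, the sample rows are made up, and only the column names are taken from the output in the question. It assumes the object really is plain CSV text with no embedded NUL bytes.

```r
# Hypothetical helper: turn a raw vector holding CSV bytes into a data.frame.
raw_to_data_frame <- function(raw_bytes, ...) {
  stopifnot(is.raw(raw_bytes))
  # rawToChar() joins the bytes into a single character string
  csv_text <- rawToChar(raw_bytes)
  # read.csv() can parse a string directly through its `text` argument
  utils::read.csv(text = csv_text, stringsAsFactors = FALSE, ...)
}

# Stand-in for the raw vector returned by gcs_get_object();
# the two data rows are invented for illustration only.
example_raw <- charToRaw(
  "DAT_RUN,MES_TEMPERATURE\n2020-01-01,12.5\n2020-01-02,13.1\n"
)
example_df <- raw_to_data_frame(example_raw)
str(example_df)
```

With the question's code, the call would presumably be raw_to_data_frame(gfs_file). Extra arguments such as sep or encoding are passed through to read.csv() if the file needs them.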

Additionally, the cloudml documentation doesn't say what form the data comes back in, so you should try it and check whether it returns the data the way you want or whether you still need to build the data frame manually.
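
If building the data frame by hand turns out to be awkward, another option that stays within googleCloudStorageR is to skip the in-memory raw vector: gcs_get_object() has a saveToDisk argument that writes the object to a local path, which read.csv() can then read. The following is only a sketch under assumptions. read_gcs_csv is a hypothetical wrapper name, and the fetch parameter exists only so the function can be tried without credentials; it is assumed that authentication (for example via gcs_auth()) has already been set up.

```r
# Hypothetical wrapper: download a CSV object to a temporary file, then parse it.
# `fetch` defaults to googleCloudStorageR::gcs_get_object; it is a parameter only
# so a stand-in function can be supplied when no bucket is available.
read_gcs_csv <- function(object_name, bucket,
                         fetch = googleCloudStorageR::gcs_get_object, ...) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)  # remove the temporary file afterwards
  # saveToDisk writes the object to a local path instead of returning it
  fetch(object_name, bucket = bucket, saveToDisk = tmp, overwrite = TRUE)
  utils::read.csv(tmp, stringsAsFactors = FALSE, ...)
}
```

With the placeholder names from the question, the call would look like read_gcs_csv("xxx.csv", bucket = "xxxxx").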

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jose Gutierrez Paliza