How do I load this file into Hive (Serde) [duplicate]

I'm struggling to create a schema for a comma-delimited file that I need to load into Hive. The content looks something like this. The first few columns hold clean, well-formed values:

2021-09-13,11111111,111111,2244,2186,xxxxx,xxxxx,2000106,xxx,2018-06-25 10:54:54,2018-06-25 07:24:00,2021-09-13 01:28:00,0,CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:N,,,false,

Then there's a column holding a huge chunk of data that contains end-of-line characters, commas and whatnot. It is wrapped in double quotes, with embedded quotes doubled:

"1. Navigate to the following URL:

https://sample.com/home.html

2. Review the server HTTP response headers:

HTTP/1.1 200 OK
Server: xxxxxxxxxxxxxxx
Pragma: No-cache
Cache-Control: no-cache, public
Expires: xxxxxxxxxxxxxxxxxxx
Content-Length: 2911
X-Cnection: close
Content-Type: text/html;charset=UTF-8
Vary: Accept-Encoding
Date: xxxxxxxxxxxxxxxxx
Connection: close
Set-Cookie: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx; xxxxxxxxxxxxxxxxxxxxxx
Set-Cookie: bm_sv=xxxxxxxxxxxxxxxxxxxxxxxx=; Domain=.xxxxxx.com; Path=/; Max-Age=4737; HttpOnly

3. Note that neither the ""X-Frame-Options"" nor ""frame-ancestors"" headers appear to be present","xxxxxxxxxxxxxxxxxxxxxxxxx.",xxxxxxxxxxxxxxxxxx,"Missing ""X-Frame-Options""",443,6,http,"xxxm.
xxxxxxxxxxxxxxxx.",Information,false,"To remediate this issue, (re)configure the web application to use xxxxxx ""self"" :

Content-Security-Policy: frame-ancestors 'self' xxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxxxx."

And then the rest of it:

,12.0,2021-09-13 13:03:49,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
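Taken together, the three pieces above form a single quoted-CSV record that spans many physical lines. As a minimal sketch of what that means for parsing, assuming Python's standard csv module and a made-up, shortened sample (SAMPLE and flatten_newlines are hypothetical names, not part of my real pipeline), the record parses as one row, and the embedded newlines can be flattened so that a line-oriented reader sees one row per line:

```python
import csv
import io

# Hypothetical, shortened sample that mimics the structure of the real file:
# plain columns, then a quoted field with embedded newlines, commas and
# doubled quotes, then more columns.
SAMPLE = (
    '2021-09-13,11111111,false,"1. Navigate to the following URL:\n'
    '\n'
    'https://sample.com/home.html\n'
    '\n'
    '3. Note that ""X-Frame-Options"" is missing","Missing ""X-Frame-Options""",443\n'
)


def flatten_newlines(text, replacement=" "):
    """Re-emit CSV text with newlines inside fields replaced by `replacement`."""
    # newline="" keeps embedded line breaks exactly as they appear in the data.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        cleaned = [
            field.replace("\r\n", replacement)
                 .replace("\n", replacement)
                 .replace("\r", replacement)
            for field in row
        ]
        writer.writerow(cleaned)
    return out.getvalue()


# The multi-line sample is one logical row with six fields.
rows = list(csv.reader(io.StringIO(SAMPLE, newline="")))

# After flattening, the same row occupies exactly one physical line.
flat = flatten_newlines(SAMPLE)
```

As far as I understand, OpenCSVSerde (org.apache.hadoop.hive.serde2.OpenCSVSerde) copes with quoted commas and doubled quotes, but Hive's default text input format still splits records on newline characters, so some preprocessing along these lines may be needed before any of the SerDes can see the whole record as one row.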

Could anyone please advise how to write the DDL for this, and which SerDe to use (LazySimpleSerDe, OpenCSVSerDe, RegexSerDe)?

Thanks in advance, Gal

hive hive-serde

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
