How to parse HTTP headers using Bash?

I need to get 2 values from a web page header that I am getting using curl. I have been able to get the values individually using:

response1=$(curl -I -s http://www.example.com | grep 'HTTP/1.1' | awk '{print $2}')
response2=$(curl -I -s http://www.example.com | grep 'Server:' | awk '{print $2}')

But I cannot figure out how to grep the values separately using a single curl request like:

response=$(curl -I -s http://www.example.com)
http_status=$response | grep HTTP/1.1 | awk {'print $2'}
server=$response | grep Server: | awk {'print $2'}

Every attempt either leads to an error message or to empty values. I am sure it is just a syntax issue.
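The usual fix is a here-string: grep reads from stdin, so feed it the saved variable with <<< instead of running curl twice. A minimal sketch, with canned headers standing in for the live curl -I call:

```shell
# Canned headers standing in for: response=$(curl -I -s http://www.example.com)
response=$'HTTP/1.1 200 OK\r\nServer: ECS (dcb/7EA2)\r\nContent-Type: text/html'

# <<< feeds the variable's contents to grep's stdin; $(...) captures the result.
http_status=$(grep 'HTTP/1.1' <<< "$response" | awk '{print $2}')
server=$(grep 'Server:' <<< "$response" | awk '{print $2}')

echo "$http_status"  # 200
echo "$server"       # ECS
```

The broken attempts above fail because `$response | grep ...` tries to run the variable's contents as a command; the here-string (or `printf '%s\n' "$response" | grep ...`) is what pipes the text in.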



Solution 1:[1]

If you wanted to extract more than a couple of headers, you could stuff all the headers into a bash associative array. Here's a simple-minded function which assumes that any given header only occurs once. (Don't use it for Set-Cookie; see below.)

# Call this as: headers ARRAY URL
headers () {
  {
    # (Re)define the specified variable as an associative array.
    unset $1;
    declare -gA $1;
    local line rest

    # Get the first line, assuming HTTP/1.0 or above. Note that these fields
    # have Capitalized names.
    IFS=$' \t\n\r' read -r $1[Proto] $1[Status] rest
    # Drop the CR from the message, if there was one.
    declare -g $1[Message]="${rest%$'\r'}"
    # Now read the rest of the headers. 
    while true; do
      # Get rid of the trailing CR if there is one.
      IFS=$'\r' read -r line rest;
      # Stop when we hit an empty line
      if [[ -z $line ]]; then break; fi
      # Make sure it looks like a header
      # This regex also strips leading and trailing spaces from the value
      if [[ $line =~ ^([[:alnum:]_-]+):\ *(( *[^ ]+)*)\ *$ ]]; then
        # Force the header to lower case, since headers are case-insensitive,
        # and store it into the array
        declare -g $1[${BASH_REMATCH[1],,}]="${BASH_REMATCH[2]}"
      else
        printf 'Ignoring non-header line: %q\n' "$line" >&2
      fi
    done
  } < <(curl -Is "$2")
}

Example:

$ headers so http://stackoverflow.com/
$ for h in ${!so[@]}; do printf "%s=%s\n" $h "${so[$h]}"; done | sort
Message=OK
Proto=HTTP/1.1
Status=200
cache-control=public, no-cache="Set-Cookie", max-age=43
content-length=224904
content-type=text/html; charset=utf-8
date=Fri, 25 Jul 2014 17:35:16 GMT
expires=Fri, 25 Jul 2014 17:36:00 GMT
last-modified=Fri, 25 Jul 2014 17:35:00 GMT
set-cookie=prov=205fd7f3-10d4-4197-b03a-252b60df7653; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
vary=*
x-frame-options=SAMEORIGIN

Note that the SO response includes one or more cookies, in Set-Cookie headers, but we can only see the last one because the naive script overwrites entries with the same header name. (As it happens, there was only one, but we can't know that.) While it would be possible to augment the script to special-case Set-Cookie, a better approach would probably be to provide a cookie-jar file and use curl's -b and -c options to maintain it.
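For completeness, one way that special-casing could look (this is a sketch, not part of the original answer): append repeated header values with a newline separator instead of overwriting, on the assumption that a newline-joined value is acceptable to the caller.

```shell
# Sketch: accumulate repeated header names (e.g. set-cookie) instead of
# overwriting. Assumes a newline-joined value suits the caller.
declare -A hdr
add_header() {
  local -l k=$1      # lowercase the name, as the function above does
  local v=$2
  if [[ -v hdr[$k] ]]; then
    hdr[$k]+=$'\n'"$v"   # repeated header: append
  else
    hdr[$k]=$v           # first occurrence: plain assignment
  fi
}
add_header Set-Cookie 'a=1; Path=/'
add_header Set-Cookie 'b=2; Path=/'
printf '%s\n' "${hdr[set-cookie]}"
```

Note that `[[ -v hdr[$k] ]]` requires Bash 4.3 or later.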

Solution 2:[2]

Using process substitution (<( ... )), you can read the output directly into shell variables:

sh$ read STATUS SERVER < <(
      curl -sI http://www.google.com | 
      awk '/^HTTP/ { STATUS = $2 } 
           /^Server:/ { SERVER = $2 } 
           END { printf("%s %s\n",STATUS, SERVER) }'
    )

sh$ echo $STATUS
302
sh$ echo $SERVER
GFE/2.0
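Since the awk program only reads stdin, the pipeline can be exercised without a network round-trip by piping in canned headers. (Real curl output terminates each line with a CR, which you may want to strip first with tr -d '\r'; the sample below omits the CRs.)

```shell
# Canned headers in place of the live curl call; no trailing CRs here.
read STATUS SERVER < <(
  printf 'HTTP/1.1 302 Found\nServer: GFE/2.0\n\n' |
  awk '/^HTTP/    { STATUS = $2 }
       /^Server:/ { SERVER = $2 }
       END { printf("%s %s\n", STATUS, SERVER) }'
)
echo "$STATUS"  # 302
echo "$SERVER"  # GFE/2.0
```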

Solution 3:[3]

An improved and modernized version of @rici's answer, using Bash >= 4.2 features:

  • Use a declare -n nameref variable to reference the associative array.
  • Use declare -l to automatically lowercase the variable's value.
  • Use ${var@a} to query a variable's declaration attributes.
  • Process an input stream rather than calling the curl command directly.
  • Make it compatible with RFC 2822 folded headers.

#!/usr/bin/env bash

shopt -s extglob # Requires extended globbing

# Process the input headers stream into an associative ARRAY
# @Arguments
# $1: The associative array receiving headers
# @Input
# &1: The headers stream
parse_headers() {
  if [ $# -ne 1 ]; then
    printf 'Need an associative array name argument\n' >&2
    return 1
  fi
  local -n header=$1 # Nameref argument
  # Check that argument is the name of an associative array
  case ${header@a} in
    A | At) ;;
    *)
      printf \
      'Variable %s with attributes %s is not a suitable associative array\n' \
      "${!header}" "${header@a}" >&2
      return 1
      ;;
  esac
  header=() # Clear the associative array
  local -- line rest v
  local -l k # Automatically lowercased

  # Get the first line, assuming HTTP/1.0 or above. Note that these fields
  # have Capitalized names.
  IFS=$' \t\n\r' read -r header['Proto'] header['Status'] rest
  # Drop the CR from the message, if there was one.
  header['Message']="${rest%%*([[:space:]])}"
  # Now read the rest of the headers.
  while IFS=$'\r\n: ' read -d $'\r' -r line rest && [ -n "$line$rest" ]; do
    rest=${rest%%*([[:space:]])}
    rest=${rest##*([[:space:]])}
    line=${line%%*([[:space:]])}
    [ -z "$line" ] && break # Blank line is end of headers stream
    if [ -n "$rest" ]; then
      k=$line
      v=$rest
    else
      # Handle folded header
      # See: https://www.rfc-editor.org/rfc/rfc2822#section-2.2.3
      v+=" ${line##*([[:space:]])}"
    fi
    header["$k"]="$v"
  done
}

declare -A HTTP_HEADERS

parse_headers HTTP_HEADERS < <(
  curl \
    --silent \
    --head \
    --location \
    https://stackoverflow.com/q/24943170/7939871
)

for k in "${!HTTP_HEADERS[@]}"; do
  printf '[%q]=%q\n' "$k" "${HTTP_HEADERS[$k]}"
done

typeset -p HTTP_HEADERS
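Because declare -l lowercases every key, individual lookups always use lowercase names, regardless of how the server capitalized the headers. A small standalone sketch (the values here are illustrative, not from a live request):

```shell
# Illustrative values; a real run would populate this via parse_headers.
declare -A HTTP_HEADERS=(
  [content-type]='text/html; charset=utf-8'
  [server]='cloudflare'
)

# Lookups use the lowercased key, whatever the wire capitalization was.
echo "${HTTP_HEADERS[content-type]}"  # text/html; charset=utf-8
echo "${HTTP_HEADERS[server]}"        # cloudflare
```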

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 rici
Solution 2
Solution 3