'splitstring by space inmultiple columns in sas

Hi I have one doubt in sas How to split string into multiple columns in sas? Here before first space value consider as firstname and last space after values consider as lastname and between first and lastspace values consider as middle name.

data my_data1;
    input name $500.;
    datalines;
Andy Lincoln Bernard ravni
Barry Michael
Chad Simpson Smith
Eric
Frank Giovanni Goodwill
;
run;
proc print data=my_data1;

based on data expecte out like below :

Fname | Middlename            | lname 
Andy  | Lincoln Bernard       |ravni  
Barry |                      |Michael
Chad | Simpson               |Smith
Eric |                       |
Frank|Giovanni               |Goodwill

    

I tried like below

data my_data2;
    set my_data1;
    Fname=scan(name, 1, ' ');
    Middlename=scan(name, 2, ' ');
    Lname=scan(name, -1, ' ');
run;

proc print data=my_data2;

above logic not give expected out put.

can you please tell me how to write code achive this task in sas



Solution 1:[1]

Code:

data want;
    length first_name middle_name last_name $50.;

    set have;

    n_names = countw(name);

    if(n_names) = 1 then first_name = name;
        else if(n_names = 2) then do;
            first_name = scan(name, 1);
            last_name  = scan(name, -1);
        end;
            else do;
                first_name  = scan(name, 1);
                last_name   = scan(name, -1);
                middle_name = substr(name, length(first_name)+2, length(name) - (length(first_name) + length(last_name))-2 );
            end;
run;

How it works

We know:

  • If there's one word, it's a first name
  • If there are two words, it's a first and last name
  • If there are three or more words, it's a first, last, and middle name

To get the middle name, we know:

  • Where the first name starts and how long it is
  • Where the last name starts and how long it is
  • How long the entire name is

By simply doing some subtraction, we can get a substring of the middle name:

Len  ----------------- 17
     ----5        ---4
     First Middle Last
Pos        7    12

The length of the string is 17. "Middle" starts at 7 and ends at 12. We can get the length of the middle name by simply substracting the lengths of the first and last names from the total length of the string. We subtract 2 to account for the space at the end of the middle name.

17 - (5 + 4) - 2 = 6

Our start position is 5 + 2 (i.e. the first name + 2) to account for the space. Translating this to substr:

substr(name, length(first_name)+2, length(name) - (length(first_name) + length(last_name))-2 )

Solution 2:[2]

Adapted from How to separate first name and middle name and last name

data want;
   set my_data1;
   length first middle middle1 middle2 last $ 40;
   array parts[*] first middle1 middle2 last;

   do i = 1 to countw(name);
      if i = countw(name) and i < dim(parts) then do;
         parts[dim(parts)] = scan(name, i);
      end;
      else do;
         parts[i] = scan(name, i);
      end;
   end;
   
   if middle1 ne "" and middle2 ne "" then middle = catx(" ", middle1, middle2);
   else middle = middle1;
   if first = "" and last ne "" then do;
        first = last;
        last = "";
   end;

   drop name i middle1 middle2;
run;

enter image description here

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Kermit