'SaS 9.4: How to use different weights on the same variable without datastep or proc sql

I can't find a way to summarize the same variable using different weights.

I try to explain it with an example (of 3 records):

data pippo;
    a=10;  
    wgt1=0.5;  
    wgt2=1;  
    wgt3=0;
    output;
    a=3;  
    wgt1=0;  
    wgt2=0;  
    wgt3=1;
    output;
    a=8.9;  
    wgt1=1.2;  
    wgt2=0.3;  
    wgt3=0.1;
    output;
run; 

I tried the following:

proc summary data=pippo missing nway;  
      var a /weight=wgt1;  
      var a /weight=wgt2;  
      var a /weight=wgt3;  
output out=pluto (drop=_freq_ _type_) sum()=;  
run;

Obviously it gives me a warning because I used the same variable "a" (I can't rename it!).

I've to save a huge amount of data and not so much physical space and I should construct like 120 field (a0-a6,b0-b6 etc) that are the same variables just with fixed weight (wgt0-wgt5).

I want to store a dataset with 20 columns (a,b,c..) and 6 weight (wgt0-wgt5) and, on demand, processing a "summary" without an intermediate datastep that oblige me to create 120 fields.

Due to the huge amount of data (more or less 55Gb every month) I'd like also not to use proc sql statement:

proc sql;  
create table pluto  
as select sum(db.a * wgt1) as a0, sum(db.a * wgt1) as a1 , etc.  
quit;

There is a "Super proc summary" that can summarize the same field with different weights?

Thanks in advance,
Paolo

sas


Solution 1:[1]

I think there are a few options. One is the data step view that data_null_ mentions. Another is just running the proc summary however many times you have weights, and either using ods output with the persist=proc or 20 output datasets and then setting them together.

A third option, though, is to roll your own summarization. This is advantageous in that it only sees the data once - so it's faster. It's disadvantageous in that there's a bit of work involved and it's more complicated.

Here's an example of doing this with sashelp.baseball. In your actual case you'll want to use code to generate the array reference for the variables, and possibly for the weights, if they're not easily creatable using a variable list or similar. This assumes you have no CLASS variable, but it's easy to add that into the key if you do have a single (set of) class variable(s) that you want NWAY combinations of only.

data test;
  set sashelp.baseball;
  array w[5];
  do _i = 1 to dim(w);
    w[_i] = rand('Uniform')*100+50;    
  end;
  output;
run;



data want;
  set test end=eof;
  i = .;
  length varname $32;
  sumval = 0 ;
  sum=0;
  if _n_ eq 1 then do;
    declare hash h_summary(suminc:'sumval',keysum:'sum',ordered:'a');;
    h_summary.defineKey('i','varname');   *also would use any CLASS variable in the key;
    h_summary.defineData('i','varname');  *also would include any CLASS variable in the key;
    h_summary.defineDone();
  end;
  array w[5];                        *if weights are not named in easy fashion like this generate this with code;
  array vars[*] nHits nHome nRuns;   *generate this with code for the real dataset;
  do i = 1 to dim(w);
    do j = 1 to dim(vars);
    
      varname = vname(vars[j]);
      sumval = vars[j]*w[i];
      rc = h_summary.ref();
      if i=1 then put varname= sumval= vars[j]= w[i]=;
    end;
  end;
  
  if eof then do;
     rc = h_summary.output(dataset:'summary_output');
  end;
run;

One other thing to mention though... if you're doing this because you're doing something like jackknife variance estimation or that sort of thing, or anything that uses replicate weights, consider using PROC SURVEYMEANS which can handle replicate weights for you.

Solution 2:[2]

You can SCORE your data set using a customized SCORE data set that you can generate

with a data step.

options center=0;
data pippo;
   retain a 10 b 1.75 c 5 d 3 e 32;
   run;

data score;
   if 0 then set pippo;
   array v[*] _numeric_;
   retain _TYPE_ 'SCORE';
   length _name_ $32;
   array wt[3] _temporary_ (.5 1 .333);
   do i = 1 to dim(v);
      call missing(of v[*]);
      do j = 1 to dim(wt);
         _name_ = catx('_',vname(v[i]),'WGT',j);
         v[i] = wt[j];
         output;
         end;
      end;
   drop i j;
   run;
proc print;[enter image description here][1]
   run;
proc score data=pippo score=score;
   id a--e;
   var a--e;
   run;
proc print;
   run;

proc means stackods sum;
   ods exclude summary;
   ods output summary=summary;
   run;
proc print;
   run;
   

enter image description here

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Joe
Solution 2 Data Null