How can I loop multiple datasets through a data step or PROC SQL query in SAS?
I have multiple datasets (100+) that all contain the same 3 columns (code_num, replicate, total_qty) each with a distinct code (code_num).
data code_num_1
code_num replicate total_qty
12345 376 45
12345 76 67
12345 943 300
.
.
data code_num_2
code_num replicate total_qty
12234 85 746
12234 900 35
12234 726 273
.
.
and etc.
I would like to run those datasets through a data step if possible:
data test;
set test_; <-- datasets will go here...
if _N_ in(&PercentileRow10,&PercentileRow20,&PercentileRow30,&PercentileRow40,&PercentileRow50,&PercentileRow60,&PercentileRow70, &PercentileRow80,&PercentileRow90);
run;
*Note: &percentilerow is a macro variable that will obtain the percentiles from the datasets. The column quantity will determine percentiles. I have this step beforehand:
proc sql noprint;
create table ___ as select code_num, replicate, sum(qty) as total_qty from ____ group by code_num, replicate order by total_qty; quit;
Ideally, I would like to obtain the percentiles of each dataset and create a new dataset that holds each percentile, the replicate at which it occurred, and the total quantity. Could I use a macro and a do loop to run my datasets through this data step and produce the new datasets?
data code_num_1_perc
percentile replicate qty
10 87 45
20 933 65
30 34 100
.
.
90 467 837
This is my ideal output for each dataset code_num_#, if possible.
Solution 1:[1]
If I understand the requirements correctly, the proposed methodology is flawed.
For example, the median (50th percentile) of a series such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 is 5.5. 5.5 is not a value in the data set so how would a replicate number be selected?
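The point about the median can be checked with a small sketch. The data set name `series` and variable `x` are made up for illustration. With the default percentile definition, PROC MEANS should report a median of 5.5 for this series, a value that appears in no row. The second call uses `qntldef=3`, which picks an observed value instead of averaging.

```sas
/* Hypothetical test series: the integers 1 through 10 */
data series;
  do x = 1 to 10;
    output;
  end;
run;

/* Default definition (QNTLDEF=5) averages the two middle values */
proc means data=series median;
  var x;
run;

/* QNTLDEF=3 returns an observed data value rather than an average */
proc means data=series median qntldef=3;
  var x;
run;
```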
My recommendation would be a different process altogether. Look into PROC RANK to see how it handles ties and decide how you'd like them handled. You didn't specify which variable would be used to calculate the percentiles.
- Combine all data sets into one, adding in a data set identifier to uniquely identify each data set.
data combined;
length source data_set_name $50.;
set code_num_: indsname = source;
data_set_name = source;
run;
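One detail worth knowing: the INDSNAME= variable holds the two-level name, such as WORK.CODE_NUM_1. If only the member name is wanted as the identifier, a variation along these lines might work. It assumes the input data sets all share the `code_num_` prefix and that no other data sets in the library start with it.

```sas
data combined;
  length source data_set_name $50;
  /* The colon picks up every data set whose name starts with code_num_ */
  set code_num_: indsname=source;
  /* Keep only the member name: WORK.CODE_NUM_1 -> CODE_NUM_1 */
  data_set_name = scan(source, 2, '.');
run;
```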
- Use PROC RANK to group into deciles
proc rank data=combined out=combined_deciles groups=10;
by data_set_name;
var total_qty;
ranks PRanks;
run;
- Get the first (or last, based on requirements) value for each rank
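/* PROC RANK keeps the input order, so sort before using BY-group logic */
proc sort data=combined_deciles;
by data_set_name PRanks total_qty;
run;
data want;
set combined_deciles;
by data_set_name PRanks;
if first.PRanks;
run;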
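To get closer to the percentile / replicate / qty layout asked for in the question, the decile output could be reshaped roughly as follows. This is a sketch under assumptions, not a definitive method: it takes `combined_deciles` from the PROC RANK step, relies on GROUPS=10 numbering the groups 0 through 9, and treats the highest total_qty in each of the lower nine groups as an approximation of the 10th to 90th percentiles. The names `deciles_sorted`, `want_perc`, and `percentile` are made up for illustration.

```sas
/* Order rows so that last.PRanks is the largest total_qty in each decile */
proc sort data=combined_deciles out=deciles_sorted;
  by data_set_name PRanks total_qty;
run;

data want_perc;
  set deciles_sorted;
  by data_set_name PRanks;
  /* Keep the top row of groups 0-8; skip rows with a missing rank */
  if last.PRanks and not missing(PRanks) and PRanks < 9;
  /* Map group 0 -> 10, 1 -> 20, ..., 8 -> 90 */
  percentile = (PRanks + 1) * 10;
  keep data_set_name code_num percentile replicate total_qty;
run;
```

With heavy ties or very small data sets, some groups may be empty, so a given data set can end up with fewer than nine rows.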
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
