I am trying to sample a fixed number of units from certain subgroups within
one datafile. Units consist of just an ID for each case and its size
(population). The file is sorted by size.
What I would like to do is to sample from three groups defined by size (pop).
Let's say I want 10 units from the biggest 50, 10 units from those having
pop between a and b, and 10 units from the smallest ones (less than b).
As a result, I would like to have in the original file a "filter" variable
for each subsample (1=selected, else=0).
*SOLUTION by rlevesque@videotron.ca posted to SPSSX-L on 2001/05/14.
* www.spsstools.net
*.
* Define some dummy data for illustration purposes.
INPUT PROGRAM.
LOOP id=1 TO 200.
COMPUTE pop=5+TRUNC(UNIFORM(95)).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
LIST.
SORT CASES BY pop(D).
* The solution starts here.
* Define a macro to do the job.
*//////////////////////.
DEFINE !sample (size=!TOKENS(1) /larger=!TOKENS(1) /b=!TOKENS(1)).
* Rank the pop to know which cases are in largest 50.
RANK
VARIABLES=pop (D) /RANK INTO rpop /PRINT=YES
/TIES=MEAN .
* Assign a categ to each case.
COMPUTE categ=2.
DO IF rpop LE !larger.
COMPUTE categ=1.
ELSE IF pop LT !b.
COMPUTE categ=3.
END IF.
* Get the random samples.
COMPUTE draw=UNIFORM(1).
RANK VARIABLES=draw (A) BY categ /RANK INTO rdraw.
COMPUTE filter1=(rdraw LE !size).
VALUE LABEL filter1 1 'Selected' 0 'Not selected'.
EXECUTE.
!ENDDEFINE.
*//////////////////////.
* Example: draw 10 cases (size=10) from each category, where.
* categ 1 = cases whose pop ranks among the 50 largest (larger=50).
* categ 3 = cases with pop < 20 (b=20).
* categ 2 = all remaining cases.
* Call the macro.
!sample size=10 larger=50 b=20.
* This crosstab shows that 10 cases from each categ were selected.
CROSSTABS
/TABLES=filter1 BY categ
/FORMAT= AVALUE TABLES
/CELLS= COUNT .
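* A minimal sketch (not part of the original post) of how the filter variable.
* could then be used: only the selected cases are analysed while the filter is on.
FILTER BY filter1.
FREQUENCIES VARIABLES=categ.
FILTER OFF.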
* Change the parameters of the macro if you want other types of samples.
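* For example (a sketch, assuming a release of SPSS that has DELETE VARIABLES).
* The macro creates rpop, categ, draw, rdraw and filter1, and RANK expects new.
* names after INTO, so those variables are removed before the macro is called again.
* The parameter values below are arbitrary.
DELETE VARIABLES rpop categ draw rdraw filter1.
!sample size=5 larger=30 b=25.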