The issue is that you're misunderstanding how URS (unrestricted random sampling, i.e. sampling with replacement) works; I recommend reading through the documentation.
Take this (extreme) example:
proc surveyselect data=sashelp.cars method=urs out=sample_cars sampsize=10000 seed=100;
run;
NOTE: The sample size, 10000, is greater than the number of sampling units, 428.
NOTE: The data set WORK.SAMPLE_CARS has 428 observations and 16 variables.
NOTE: PROCEDURE SURVEYSELECT used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
Here I ask for 10,000 records (out of only 428!) and get... 428 records. The important detail is the NumberHits variable, which records how many times each record was selected.
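To see this for yourself, you can total NumberHits. The query below is a small sketch of my own (not taken from the documentation). It reruns the first call so it is self-contained. Under URS each of the SAMPSIZE draws adds one hit to some unit, so total_hits should come out to the requested 10,000 even though the data set has only 428 rows.

```sas
/* Same call as above: default URS output, one row per selected unit */
proc surveyselect data=sashelp.cars method=urs out=sample_cars
                  sampsize=10000 seed=100;
run;

/* Count the distinct rows and add up the hits across them.          */
/* For METHOD=URS the hits should sum to SAMPSIZE.                    */
proc sql;
  select count(*)        as distinct_units,
         sum(NumberHits) as total_hits
    from sample_cars;
quit;
```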
If you want one output record per hit (that is, you want those duplicates), add the OUTHITS option to your PROC SURVEYSELECT statement. From the documentation on URS:
For unrestricted random sampling, by default, the output data set contains a single copy of each unit selected, even when a unit is selected more than once, and the variable NumberHits records the number of hits (selections) for each unit. If you specify the OUTHITS option, the output data set contains m copies of a sampling unit for which NumberHits is m; for example, the output data set contains three copies of a sampling unit that is selected three times (NumberHits is three). For information about the contents of the output data set, see the section Sample Output Data Set.
Here is my example modified to do just that.
proc surveyselect data=sashelp.cars method=urs out=sample_cars sampsize=10000 seed=100 outhits;
run;
NOTE: The sample size, 10000, is greater than the number of sampling units, 428.
NOTE: The data set WORK.SAMPLE_CARS has 10000 observations and 16 variables.
NOTE: PROCEDURE SURVEYSELECT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
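If you'd rather keep the default one-row-per-unit output and expand it yourself, a DATA step can do what OUTHITS does by writing each row NumberHits times. This is a sketch of an alternative, not something the documentation prescribes, and the data set names sample_units and sample_cars_hits are hypothetical. Per the documentation quoted above, the result should have the same number of rows as the OUTHITS output (10,000).

```sas
/* Default URS output: one row per unit, with NumberHits on each row */
proc surveyselect data=sashelp.cars method=urs out=sample_units
                  sampsize=10000 seed=100;
run;

/* Expand by hand: write each unit once per hit, like OUTHITS */
data sample_cars_hits;
  set sample_units;
  do _hit = 1 to NumberHits;  /* loop runs NumberHits times */
    output;
  end;
  drop _hit;
run;
```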