SPSS AnswerNet: Result
Solution ID: 100000483
Product: SPSS Base
Version:
O/S:
Question Type: Syntax/Batch/Scripting
Question Subtype: Date and Time
Title: Generating random dates from a date range
Description:

Q. How can I generate 100 dates randomly, drawing from the range of Jan. 1, 1920 to Dec. 31, 1989, inclusive?

A. Two syntax jobs are presented below. The first samples dates in the range with replacement, so a given date may appear more than once. The second samples dates without replacement, so each date appears at most once. Code at the end of the note gives a rough check of the distribution of dates generated by either job.

Job 1: Dates Sampled with Replacement:

Dates are stored in SPSS as the number of seconds since the beginning of the Gregorian calendar, i.e., midnight on Oct. 14, 1582. To create each random date, this program generates a random number of seconds in the range implied by the dates that you specify and stores it in the variable RDAY. The XDATE.DATE function drops the time-of-day portion, so RDAY becomes the number of seconds up to midnight at the start of that same day. Finally, RDAY is displayed with an SPSS date format (ADATE10 in this example).

Note that the end date specified is one day later than the last date in the desired range. This is because the uniform random number generator rv.uniform(a,b) returns a value between a and b, exclusive. The largest value that can be generated therefore falls just before midnight at the start of Jan. 1, 1990, i.e., late in the evening of Dec. 31, 1989, and XDATE.DATE truncates it to Dec. 31, 1989.

For the starting point, however, you do not need to specify the day before your desired start date. rv.uniform(date.dmy(1,1,1920),date.dmy(1,1,1990)) cannot generate the exact stroke of midnight on Jan. 1, 1920. It can generate any later moment on that day, and XDATE.DATE maps each of these to Jan. 1, 1920.

* generate 100 random dates from 1/1/1920 to 12/31/1989 (mm/dd/yyyy).
* with replacement.
new file.
input program.
loop #i = 1 to 100.
compute rday = xdate.date(rv.uniform(date.dmy(1,1,1920),date.dmy(1,1,1990))).
end case.
end loop.
end file.
end input program.
execute.
formats rday (adate10).

Job 2: Dates Sampled without Replacement:

To sample without replacement, the job first builds a file in which every date in the range appears exactly once, in a variable named RDAY. At this stage RDAY holds a day count rather than seconds. It is the number of days from the beginning of the Gregorian calendar to the date for that case: the first case is 1/1/1920, and each subsequent case is one day later. Using the YRMODA function saves you the work of calculating the day counts for the start and end of the loop.

A random number X is generated for each case as the case is added to the active file. After all cases are added, X is ranked, and the rank is stored in the variable RX. These ranks provide a random ordering of the days, so selecting cases with RX less than or equal to 100 keeps the first 100 randomly ordered days from the full set. Multiplying RDAY by 86,400 (the number of seconds in a day) converts it from days to seconds, which is SPSS's date scale, so a standard SPSS date format can be applied.

In contrast to the generation of RDAY in Job 1, the scale of the random number X in Job 2 is irrelevant, because only its rank order is used. Also, the loop includes both its start and end values, so both endpoints are in the file before sampling. You do not have to ask for Jan. 1, 1990 to include Dec. 31, 1989.

* generate 100 random dates from 1/1/1920 to 12/31/1989 (mm/dd/yyyy).
* without replacement.
new file.
input program.
loop rday = yrmoda(1920,1,1) to yrmoda(1989,12,31).
compute x = rv.uniform(0,1).
end case.
end loop.
end file.
end input program.
execute.
rank variables = x /rank into rx.
select if (rx <= 100).
execute.
compute rday = rday*86400.
formats rday (adate10).
execute.
* you can delete x and rx when you save the file.

* For both jobs, the following gives you a rough check
* of the distribution of your generated dates.
* To get a distribution by decade, collapse each year to the start of its decade.
compute decade = trunc(xdate.year(rday)/10)*10.
formats decade (f8).
frequencies decade.
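For readers who want to check the arithmetic of both jobs outside SPSS, the following Python sketch mirrors them, along with the decade check. It is an illustration, not SPSS code, and it rests on several assumptions:

- The date origin is midnight, Oct. 14, 1582, as described above, computed with Python's proleptic Gregorian datetime.
- Python's random.randrange stands in for rv.uniform. It draws a whole number of seconds and excludes its upper bound.
- Sorting a random key stands in for RANK followed by SELECT IF.
- The function names (spss_seconds, spss_days, sample_with_replacement, and so on) are hypothetical.

Because the random generator differs from SPSS's, the specific dates drawn will not match an SPSS run.

```python
import random
from collections import Counter
from datetime import date, datetime, timedelta

# Assumption: SPSS date values count seconds from midnight, Oct. 14, 1582.
SPSS_ORIGIN = datetime(1582, 10, 14)
SECONDS_PER_DAY = 86_400


def spss_days(d: date) -> int:
    """Days from the SPSS origin to d (the role YRMODA plays in Job 2)."""
    return (d - SPSS_ORIGIN.date()).days


def spss_seconds(d: date) -> int:
    """Seconds from the SPSS origin to midnight starting day d (like DATE.DMY)."""
    return spss_days(d) * SECONDS_PER_DAY


def from_spss_seconds(s: int) -> date:
    """Convert an SPSS-style seconds value back to a calendar date."""
    return (SPSS_ORIGIN + timedelta(seconds=s)).date()


def sample_with_replacement(n, first, last, rng):
    """Job 1 analogue: draw n dates from [first, last]; repeats are possible."""
    lo = spss_seconds(first)
    hi = spss_seconds(last + timedelta(days=1))  # one day past the last wanted date
    dates = []
    for _ in range(n):
        s = rng.randrange(lo, hi)     # random second; hi itself is never drawn
        s -= s % SECONDS_PER_DAY      # drop the time of day, like XDATE.DATE
        dates.append(from_spss_seconds(s))
    return dates


def sample_without_replacement(n, first, last, rng):
    """Job 2 analogue: list every day once, attach a random key, keep n smallest."""
    days = range(spss_days(first), spss_days(last) + 1)  # both endpoints included
    keyed = [(rng.random(), d) for d in days]            # random X for each case
    keyed.sort()                                         # order by X, like RANK
    # Keep the first n in random order (like RX <= n), then convert days to seconds.
    return [from_spss_seconds(d * SECONDS_PER_DAY) for _, d in keyed[:n]]


def decade_counts(dates):
    """Rough distribution check: count dates by decade start, like TRUNC(year/10)*10."""
    return Counter((d.year // 10) * 10 for d in dates)


if __name__ == "__main__":
    rng = random.Random(1999)  # fixed seed so runs are repeatable
    start, end = date(1920, 1, 1), date(1989, 12, 31)
    with_repl = sample_with_replacement(100, start, end, rng)
    without_repl = sample_without_replacement(100, start, end, rng)
    print("with replacement:", sorted(decade_counts(with_repl).items()))
    print("without replacement:", sorted(decade_counts(without_repl).items()))
```

The without-replacement sketch builds the full list of about 25,500 days before sampling, just as Job 2 builds the full file. That is cheap at this size. For a much wider range, drawing day numbers directly (for example with random.sample over a range) would avoid materializing every day.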
Created on: 08/25/1999