General SPSS Learning Resources
One of the keys to syntax writing is the Paste button, which exists in most windows of the Graphical User Interface (GUI). As far as I am concerned, the main purpose of the GUI is to facilitate writing syntax! Another key point is to make sure the commands are printed in the Output window (see Keep the Log!).
Let's write a syntax file:
- Using the menu, load the file "employee data.sav" which came with SPSS.
- Using the data editor menu, select FILE>NEW>SYNTAX (this opens a new syntax window)
- Check the Log in the Output window; it will contain
GET FILE='C:\Program Files\SPSS\Employee data.sav'.
(Of course, the path on your system may be different.) If you do not see the Log, refer to Keep the Log! for instructions.
- Double-click the Log listing (on the right-hand side of the Output window); select and copy the GET FILE command.
- Paste the command in the syntax window.
- Using the menu, select ANALYZE>DESCRIPTIVE STATISTICS>FREQUENCIES, then move the variable jobcat to the Variable(s) text box.
- Click the Paste button. The syntax is now in the syntax window.
- Save the syntax file (using the menu of the syntax window): FILE>SAVE (or use Ctrl-S)
(For a good description of how to use the Syntax window, SPSS's journal and the Log, see Syntax Editor Window on Central Michigan University's site.)
This is the content of the syntax file created above:
GET FILE='C:\program files\spss\employee data.sav'.
FREQUENCIES VARIABLES=jobcat /ORDER= ANALYSIS.
The need to use syntax cannot be overstated. The more one uses SPSS, the more one needs to use syntax files as opposed to the "point and click" method.
Imagine John uses "point and click" to execute data transformations, add value and variable labels, and run 50 procedures on a company-wide sales data file. Many things may happen after the analysis is completed:
- a) John's manager finds the analysis so interesting that
  - he requests the whole thing be redone, but separately for each operating division
  - the analysis will have to be done weekly from now on
- b) some sales information was missing from (or incorrect in) the file and he has to redo the analysis.
If John has saved his analysis in a syntax file, he will be able to do any of the above in 2–3% of the time he would have needed to redo it manually using "point and click". It should be obvious that syntax leads to huge gains in productivity.
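As a rough sketch of scenario a), rerunning a saved analysis separately for each division could take only a few extra lines. The file path and the variable names division and product are hypothetical; substitute your own.

```spss
* Hypothetical file and variable names; adjust to your own data.
GET FILE='C:\data\sales.sav'.
* SPLIT FILE requires the file to be sorted by the split variable.
SORT CASES BY division.
SPLIT FILE SEPARATE BY division.
* Every procedure run from here on produces one set of output per division.
FREQUENCIES VARIABLES=product.
SPLIT FILE OFF.
```

The procedures John already saved would simply go between the two SPLIT FILE commands, unchanged.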
Even if John had not saved his analysis in a syntax file, it is easy to recover the code from the journal file. See Syntax Editor Window on Central Michigan University's site if you do not know how to do that.
Other benefits of using syntax files are:
- documentation (the syntax file automatically documents what was done)
- reproducibility of results (this is a corollary of the previous point; try to redo a one-hour "point and click" session!)
- batch processing (syntax files can be automatically run at night when system resources are greater)
- opens the door to macros (getting used to syntax is a prerequisite to accessing the macro world)
- access to all SPSS features (some features of SPSS are only available through syntax)
- efficient method of communication (the SPSSX-L list and newsgroup use syntax to post answers; these "answers" can be understood by people using non-English versions of SPSS)
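To illustrate the batch-processing point, a saved syntax file can itself be called from another syntax file. The sketch below assumes a release that supports the INSERT command (older releases use INCLUDE instead); the path and file name are hypothetical.

```spss
* Hypothetical path: run a previously saved syntax file.
INSERT FILE='C:\syntax\weekly_sales.sps'.
```

A short "master" file made of several such lines is one simple way to chain the steps of a recurring job.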
I am the author of SPSS Programming and Data Management, published by SPSS/IBM. You may download a free PDF version with the related examples.
The following books have been written by SPSS's Training Department and are available at $99 US each:
I have not seen the first book above, so I cannot comment on its usefulness. On the other hand, I find the second one very good. Here are the chapter titles of this 154-page spiral-bound book:
- Chapter 1: Introduction and Syntax Review
- Chapter 2: Basic SPSS Programming Concepts
- Chapter 3: Complex File Types
- Chapter 4: Input Programs
- Chapter 5: Advanced Data Manipulation
- Chapter 6: Introduction to Macros
- Chapter 7: Advanced Macros
- Chapter 8: Macro Tricks
The Syntax Reference Guide is also a valuable resource but its usefulness increases with the user's knowledge...
Other books are available from IBM/SPSS's site.
Of course, a good method to learn is to look at existing syntax. Look at various syntax files, even if they do not seem useful to you at the particular time.
String Manipulation Tutorial (see also Parse Data)
Let's look at the code and the output of the syntax String Manipulation Tutorial. The following operations are covered here:
- Deleting leading zeros (or any other leading character)
- Replacing dots "." by commas "," (or any other char "x" by char "y")
- Deleting the "/" and everything to the left (or any other ... you know what I mean?)
- Deleting the "/" and everything to the right
- Concatenating strings str1 and str2
* String manipulation tutorial.
* Replacing / deleting certain characters in strings; combining strings.
* Raynald Levesque.
First, define some dummy data to work with.
DATA LIST FIXED /name 1-25 (A).
BEGIN DATA
000John Doe /10.14.12
Mary Poppins /17.21
Billy Joe /21.25
Peter Pan /10.35
END DATA.
LIST.
This is the output produced by the LIST command:
NAME
000John Doe /10.14.12
Mary Poppins /17.21
Billy Joe /21.25
Peter Pan /10.35
String variables must be defined before being used.
STRING name1 TO name4 (A25).
VARIABLE LABEL name 'Original value'
 name1 'Without leading zeros'
 name2 'Replace . by ,'
 name3 'Delete up to "/"'
 name4 'Delete from "/"'.
Task 1. Delete leading zeros.
COMPUTE name1=LTRIM(name,"0").
LIST name name1.
NAME                      NAME1
000John Doe /10.14.12     John Doe /10.14.12
Mary Poppins /17.21       Mary Poppins /17.21
Billy Joe /21.25          Billy Joe /21.25
Peter Pan /10.35          Peter Pan /10.35
Task 2. Replace dots "." by commas ",".
COMPUTE name2=name1.
* The loop ensures that multiple occurrences are covered.
* The "+" ensures that the code would also work as an INCLUDE file.
LOOP IF INDEX(name2,".")>0.
+ COMPUTE SUBSTR(name2,INDEX(name2,"."),1)=",".
END LOOP.
LIST name1 name2.
NAME1                     NAME2
John Doe /10.14.12        John Doe /10,14,12
Mary Poppins /17.21       Mary Poppins /17,21
Billy Joe /21.25          Billy Joe /21,25
Peter Pan /10.35          Peter Pan /10,35
The above looks simple enough. Three lines are sufficient to replace all occurrences of "." (or of any other character, of course), but what if you need to do this for 400 variables? The solution is a macro. See Macro Tutorial.
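As a minimal sketch of what such a macro might look like (this is not the code from the Macro Tutorial), the loop from Task 2 can be wrapped in a macro that repeats it for every variable named in the call. The macro name !dot2comma, its vars argument and the variables code1 and code2 are all made up for this illustration.

```spss
* Hypothetical data: two string variables containing dots.
DATA LIST LIST /code1 (A10) code2 (A10).
BEGIN DATA
"1.2.3" "4.5"
"a.b" "c"
END DATA.

* Hypothetical macro: runs the Task 2 loop once per variable listed in vars.
DEFINE !dot2comma (vars=!CMDEND)
!DO !v !IN (!vars)
LOOP IF INDEX(!v,".")>0.
+ COMPUTE SUBSTR(!v,INDEX(!v,"."),1)=",".
END LOOP.
!DOEND
!ENDDEFINE.

!dot2comma vars=code1 code2.
LIST.
```

With 400 variables, only the macro call changes; the loop itself is written once.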
Task 3. Delete the "/" and everything to the left.
COMPUTE name3=SUBSTR(name1,INDEX(name1,"/")+1).
LIST name1 name3.
NAME1                     NAME3
John Doe /10.14.12        10.14.12
Mary Poppins /17.21       17.21
Billy Joe /21.25          21.25
Peter Pan /10.35          10.35
Task 4. Delete the "/" and everything to the right.
COMPUTE name4=SUBSTR(name1,1,INDEX(name1,"/")-1).
LIST name1 name4.
NAME1                     NAME4
John Doe /10.14.12        John Doe
Mary Poppins /17.21       Mary Poppins
Billy Joe /21.25          Billy Joe
Peter Pan /10.35          Peter Pan
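Task 4 assumes that every value contains a "/". When the character is absent, INDEX returns 0, so the length passed to SUBSTR becomes -1, which is probably not what you want. One possible guard, sketched here on hypothetical data, is to test for the "/" first and keep the original value when there is none.

```spss
* Hypothetical data: the second case contains no "/".
DATA LIST FIXED /name1 1-25 (A).
BEGIN DATA
John Doe /10.14.12
Mary Poppins
END DATA.
STRING name4 (A25).
* Only cut the string when a "/" is actually present.
DO IF INDEX(name1,"/")>0.
+ COMPUTE name4=SUBSTR(name1,1,INDEX(name1,"/")-1).
ELSE.
+ COMPUTE name4=name1.
END IF.
LIST.
```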
Task 5. Concatenate strings str1 and str2.
STRING str1 str2 str3 str4 (A2).
COMPUTE str1="A".
COMPUTE str2="B".
Note that the following does NOT work.
COMPUTE str3=CONCAT(str1,str2).
It does not work because str1 is actually equal to "A" followed by a space. Similarly, str2 is "B" followed by a space.
CONCAT(str1, str2) results in the 4-character string "A B ", which is truncated to the 2-character string "A " to fit the width of str3.
The following DOES work.
COMPUTE str4=CONCAT(RTRIM(str1),str2).
LIST str1 str2 str3 str4.
STR1 STR2 STR3 STR4
A    B    A    AB
A    B    A    AB
A    B    A    AB
A    B    A    AB
Consider performing several operations with dates:
1. Create a date variable from a numeric variable such as 19901204 (see 6 for the reverse operation)
2. Create a date variable from 3 numeric variables containing day, month and year
3. Convert a string containing a date into a date variable (see also 7)
4. Calculate age
5. Add one day to a date
6. Create a numeric variable such as 19901204 from a date variable
7. Create a string variable such as 19901204 from a date variable
Note: the purpose of the following examples is to help you understand what is going on (not to write the most condensed code).
1. Create a date variable from a numeric variable such as 19901204
DATA LIST LIST /date1.
BEGIN DATA
19901204
20000131
END DATA.
LIST.
COMPUTE day1=MOD(date1,100).
COMPUTE month1=MOD(TRUNC(date1/100),100).
COMPUTE year1=TRUNC(date1/10000).
COMPUTE date2=DATE.DMY(day1,month1,year1).
FORMATS date2(SDATE10).
VARIABLE WIDTH date2(11).
EXECUTE.
2. Create a date variable from 3 numeric variables containing day, month and year
DATA LIST LIST /year1 month1 day1.
BEGIN DATA
1999 12 07
2000 10 18
2000 07 10
2001 02 02
END DATA.
LIST.
COMPUTE mydate=DATE.DMY(day1,month1,year1).
FORMATS mydate(DATE11).
VARIABLE WIDTH mydate(11).
EXECUTE.
* Using another date format.
COMPUTE mydate2=mydate.
FORMATS mydate2(ADATE11).
VARIABLE WIDTH mydate2(11).
EXECUTE.
3. Convert a string containing a date into a date variable
DATA LIST LIST /datestr(A10).
BEGIN DATA
11/26/1966
01/15/1981
END DATA.
LIST.
* Method 1 (a general method).
COMPUTE mth=NUMBER(SUBSTR(datestr,1,2),F8.0).
COMPUTE day=NUMBER(SUBSTR(datestr,4,2),F8.0).
COMPUTE yr=NUMBER(SUBSTR(datestr,7),F8.0).
COMPUTE mydate=DATE.DMY(day,mth,yr).
FORMATS mydate(SDATE11).
VARIABLE WIDTH mydate (11).
EXECUTE.
* The date in the above string variable has the form mm/dd/yyyy. The code works as
* is if the initial format is mm.dd.yyyy or mm-dd-yyyy. It is easy to modify the above
* to handle variations such as yyyy/mm/dd, dd/mm/yyyy.
* Method 2 (works only when data fit an existing SPSS date format).
COMPUTE mydate=NUMBER(datestr,ADATE10).
FORMATS mydate(ADATE10).
* The purpose of the next line is to display all 4 digits of the year in the Data Editor.
VARIABLE WIDTH mydate(10).
EXECUTE.
4. Calculate age
Date variables contain the number of seconds since October 14, 1582. (This weird date corresponds to the beginning of the Gregorian Calendar). The internal value of a date variable is the same whatever date format is used. For instance the following are true about a date variable containing the date 11/26/1966.
|Internal value|Format|What you see in the Data Editor|
|12121574400|WKYR8|48 WK 66|
Thus a command such as
COMPUTE agesec=DATE.DMY(1,7,2001) - dtbirth.
calculates the number of seconds between the date of birth and July 1, 2001.
To convert to a number of years (including the fraction of a year), one option is to divide by the number of seconds in a year:
COMPUTE age1=agesec/(365.25*24*60*60).
In the above, a year is assumed to have 365.25 days to account for leap years.
A better way is to ask SPSS to convert the duration in days, then divide by 365.25:
COMPUTE age2=CTIME.DAYS(DATE.DMY(1,1,2001) - dtbirth)/365.25.
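If only the age in completed years is needed, the DATEDIFF function found in more recent releases of SPSS may be an alternative; the sketch below assumes your release has it. It returns whole units with the fraction truncated, so it does not reproduce the fractional ages computed above. The data and the variable name age3 are made up for this illustration.

```spss
* Hypothetical data: two dates of birth read as date variables.
DATA LIST LIST /dtbirth (ADATE10).
BEGIN DATA
11/26/1966
01/15/1981
END DATA.
* Completed years between the date of birth and July 1, 2001.
COMPUTE age3=DATEDIFF(DATE.DMY(1,7,2001),dtbirth,"years").
FORMATS age3 (F3.0).
LIST.
```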
If you need a categorical variable (say agegr) such that agegr equals 0 when ages are between 0 and 4.99, 1 when ages are between 5 and 9.99, etc., do the following:
COMPUTE agegr=TRUNC(age2/5).
VALUE LABELS agegr 0 '0-4.99' 1 '5-9.99' 2 '10-14.99'.
(For a general method, see Group Data and Define Corresponding Value Labels.SPS).
5. Adding one day to a date
COMPUTE date1=date1 + 60*60*24.
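The same addition can be written with TIME.DAYS, which converts a number of days into seconds and so avoids spelling out 60*60*24. This is a small sketch on hypothetical data; the result is stored in a new variable, date2, so that the original date is kept.

```spss
* Hypothetical data: two dates read as date variables.
DATA LIST LIST /date1 (ADATE10).
BEGIN DATA
12/31/1999
02/28/2000
END DATA.
* TIME.DAYS(1) is the number of seconds in one day.
COMPUTE date2=date1 + TIME.DAYS(1).
FORMATS date2 (ADATE10).
LIST.
```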
6. Create a numeric variable such as 19901204 from a date variable
COMPUTE numb1=XDATE.YEAR(date1)*10000 + XDATE.MONTH(date1)*100 + XDATE.MDAY(date1).
7. Create a string variable such as 19901204 from a date variable
* Continuing from 6.
STRING str1(A8).
COMPUTE str1=STRING(numb1,F8.0).
Converting strings into numbers
The next examples are in the following syntax file.
See also the SPSS macro tutorial for a generalization of this approach.
DATA LIST FIXED /mydata 1-10 (A).
BEGIN DATA
6,188
400
12,125.25
END DATA.
LIST.
COMPUTE nb=NUMBER(mydata,COMMA10).
LIST.

DATA LIST FIXED /mydata 1-10 (A).
BEGIN DATA
6188
400
12125.25
END DATA.
LIST.
COMPUTE nb=NUMBER(mydata,F10).
LIST.

DATA LIST FIXED /mydata 1-10 (A).
BEGIN DATA
6.188
400
12.125,25
END DATA.
LIST.
COMPUTE nb=NUMBER(mydata,DOT10).
LIST.

DATA LIST FIXED /mydata 1-10 (A).
BEGIN DATA
$6,188
$400
$12,125.25
END DATA.
LIST.
COMPUTE nb=NUMBER(mydata,DOLLAR10).
LIST.

DATA LIST FIXED /mydata 1-10 (A).
BEGIN DATA
6.188%
400%
12.12525%
END DATA.
LIST.
COMPUTE nb=NUMBER(mydata,PCT10).
LIST.
FORMATS nb(PCT10.5).
LIST.