Learning Syntax | Raynald's SPSS Tools

Index

General SPSS Learning Resources

Key Items (includes why syntax is a must)
Books (free download!)
Mailing List

Syntax Tutorials

String Manipulation
Dates, Time and Age Tutorial
Converting strings into numbers

General SPSS Learning Resources

Key Items

One of the keys to syntax writing is the Paste button, which exists in most windows of the Graphical User Interface (GUI). As far as I am concerned, the main purpose of the GUI is to facilitate writing syntax! Another key point is to make sure the commands are printed in the Output window (see Keep the Log!).

Let's write a syntax file:

Using the menu, load the file "employee data.sav" which came with SPSS.
Using the data editor menu, select FILE>NEW>SYNTAX (this opens a new syntax window)
Check the Log in the Output window; it will contain GET FILE='C:\Program Files\SPSS\Employee data.sav'. (Of course, the path on your system may be different.) If you do not see the Log, refer to Keep the Log! for instructions.
Double-click the Log listing (on the right-hand side of the Output window); select and copy the GET FILE command.
Paste the command in the syntax window.
Using the menu, select ANALYZE>DESCRIPTIVE STATISTICS>FREQUENCIES, then move the variable jobcat to the Variable(s) text box.
Click the Paste button. The syntax is now in the syntax window.
Save the syntax file (using the menu of the syntax window): FILE>SAVE (or use Ctrl-S)

(For a good description of how to use the Syntax window, SPSS's journal and the Log, see Syntax Editor Window on Central Michigan University's site.)

This is the content of the syntax file created above:

GET FILE='C:\program files\spss\employee data.sav'.
FREQUENCIES
    VARIABLES=jobcat 
    /ORDER= ANALYSIS.

The need to use syntax cannot be overstated. The more one uses SPSS, the more one needs to use syntax files as opposed to the "point and click" method.

Imagine John uses "point and click" to execute data transformations, add value and variable labels, and run 50 procedures on a company-wide sales data file. Many things may happen after the analysis is completed:

a) John's manager finds the analysis so interesting that
- he requests that the whole thing be redone, but separately for each operating division
- the analysis will have to be done weekly from now on
b) some sales information was missing from (or incorrect in) the file and he has to redo the analysis.

If John has saved his analysis in a syntax file, he will be able to do any of the above in 2–3% of the time he would have needed to redo it manually using "point and click". It should be obvious that syntax leads to huge productivity gains.
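As a rough sketch of the first request in (a), a saved syntax file can be rerun group by group by wrapping its procedures in SPLIT FILE. The example below reuses the Employee data.sav file from above and treats gender as a stand-in for John's operating divisions. That substitution is an assumption made for illustration, and the path may differ on your system.

```spss
* Sketch only: gender stands in for an operating division variable.
GET FILE='C:\Program Files\SPSS\Employee data.sav'.
* SPLIT FILE expects the cases to be sorted by the grouping variable.
SORT CASES BY gender.
SPLIT FILE SEPARATE BY gender.
FREQUENCIES VARIABLES=jobcat.
SPLIT FILE OFF.
```

Every procedure placed between SPLIT FILE and SPLIT FILE OFF is repeated for each group, so the 50 procedures would not need to be touched individually.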

Even if John had not saved his analysis in a syntax file, it is easy to recover the code from the journal file. See Syntax Editor Window on Central Michigan University's site if you do not know how to do that.

Other benefits of using syntax files are:

1. documentation (the syntax file automatically documents what was done)
2. reproducibility of results (this is a corollary of 1; try to redo a one-hour "point and click" session!)
3. batch processing (syntax files can be run automatically at night, when more system resources are available)
4. opens the door to macros (getting used to syntax is a prerequisite to accessing the macro world)
5. allows access to all SPSS features (some features of SPSS are only available through syntax)
6. efficient method of communication (the SPSSX-L list and newsgroup use syntax to post answers; these "answers" can be understood by people using non-English versions of SPSS)

Books

I am the author of SPSS Programming and Data Management, published by SPSS/IBM. You may download a free PDF version with the related examples.

The following books have been written by SPSS's Training Department and are available at $99 US each:

I have not seen the first book above so I cannot comment on its usefulness. On the other hand, I find the second one very good. Here are the chapter titles of this 154-page spiral-bound book:

Chapter 1: Introduction and Syntax Review
Chapter 2: Basic SPSS Programming Concepts
Chapter 3: Complex File Types
Chapter 4: Input Programs
Chapter 5: Advanced Data Manipulation
Chapter 6: Introduction to Macros
Chapter 7: Advanced Macros
Chapter 8: Macro Tricks
Exercises

The Syntax Reference Guide is also a valuable resource but its usefulness increases with the user's knowledge...

Other books are available from IBM/SPSS's site.

Mailing List

Of course, a good method to learn is to look at existing syntax. Look at various syntax files, even if they do not seem useful to you at the particular time.

Syntax Tutorials

String Manipulation Tutorial (see also Parse Data)

Let's look at the code and the output of the syntax String Manipulation Tutorial. The following operations are covered here:

Deleting leading zeros (or any other leading character)
Replacing dots "." by commas "," (or any other char "x" by char "y")
To delete the "/" and everything to the left (or any other ... you know what I mean?)
To delete the "/" and everything to the right
To concatenate strings str1 and str2

* String manipulation tutorial.
* Replacing / deleting certain characters in strings; combining strings.
* Raynald Levesque.

First, define some dummy data to work with.

DATA LIST FIXED /name 1-25 (A).
BEGIN DATA
000John Doe /10.14.12
Mary Poppins /17.21
Billy Joe /21.25
Peter Pan /10.35
END DATA.

LIST.

This is the output produced by the LIST command.

NAME
000John Doe /10.14.12
Mary Poppins /17.21
Billy Joe /21.25
Peter Pan /10.35

String variables must be defined before being used.

STRING name1 TO name4 (A25).
VARIABLE LABEL name 'Original value'
    name1 'Without leading zeros' 
    name2 'Replace . by ,' 
    name3 'Delete up to "/"' 
    name4 'Delete from "/"'.

Task 1. Delete leading zeros.

COMPUTE name1=LTRIM(name,"0").
LIST name name1.

NAME                   NAME1
000John Doe /10.14.12  John Doe /10.14.12
Mary Poppins /17.21    Mary Poppins /17.21
Billy Joe /21.25       Billy Joe /21.25
Peter Pan /10.35       Peter Pan /10.35

Task 2. Replace dots "." by commas ",".

COMPUTE name2=name1.
* The loop ensures that multiple occurrences are covered.
* The "+" ensures that the code would also work as an INCLUDE file.
LOOP IF INDEX(name2,".")>0.
+ COMPUTE SUBSTR(name2,INDEX(name2,"."),1)=",".
END LOOP.

LIST name1 name2.

NAME1                 NAME2
John Doe /10.14.12    John Doe /10,14,12
Mary Poppins /17.21   Mary Poppins /17,21
Billy Joe /21.25      Billy Joe /21,25
Peter Pan /10.35      Peter Pan /10,35

The above looks simple enough. Three lines are sufficient to replace all occurrences of "." (or any other character, of course), but what if you need to do this for 400 variables? The solution is a macro. See Macro Tutorial.
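For a single variable, releases of SPSS that provide the REPLACE string function allow the loop to be written as one COMPUTE. The sketch below assumes your release has that function (check the Command Syntax Reference for your version) and rebuilds a small copy of the data so that it can be run on its own.

```spss
* Assumes an SPSS release that provides the REPLACE function.
DATA LIST FIXED /name1 1-25 (A).
BEGIN DATA
John Doe /10.14.12
Mary Poppins /17.21
END DATA.
STRING name2 (A25).
* Replace every "." in name1 by ",".
COMPUTE name2=REPLACE(name1,".",",").
LIST name1 name2.
```

The LOOP version above remains useful where REPLACE is not available.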

Task 3. Delete the "/" and everything to the left.

COMPUTE name3=SUBSTR(name1,INDEX(name1,"/")+1).
LIST name1 name3.

NAME1                NAME3
John Doe /10.14.12   10.14.12
Mary Poppins /17.21  17.21
Billy Joe /21.25     21.25
Peter Pan /10.35     10.35

Task 4. Delete the "/" and everything to the right.

COMPUTE name4=SUBSTR(name1,1,INDEX(name1,"/")-1).
LIST name1 name4.

NAME1                 NAME4
John Doe /10.14.12    John Doe
Mary Poppins /17.21   Mary Poppins
Billy Joe /21.25      Billy Joe
Peter Pan /10.35      Peter Pan
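Every value in the tutorial data contains a "/". When that is not guaranteed, INDEX returns 0 and the length passed to SUBSTR becomes -1. Rather than rely on how SUBSTR treats that, one option is to test for the "/" first. This is a defensive sketch, not part of the original tutorial file, and the second data line is a made-up value without a "/".

```spss
DATA LIST FIXED /name1 1-25 (A).
BEGIN DATA
John Doe /10.14.12
Tinker Bell
END DATA.
STRING name4 (A25).
* Only cut at the "/" when one is present; otherwise keep the whole value.
DO IF INDEX(name1,"/")>0.
+ COMPUTE name4=SUBSTR(name1,1,INDEX(name1,"/")-1).
ELSE.
+ COMPUTE name4=name1.
END IF.
LIST name1 name4.
```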

Task 5. Concatenate strings str1 and str2.

STRING str1 str2 str3 str4 (A2).
COMPUTE str1="A".
COMPUTE str2="B".

Note that the following does NOT work.

 COMPUTE str3=CONCAT(str1,str2).

It does not work because str1 is actually equal to "A" followed by a space. Similarly, str2 is "B" followed by a space.

Thus CONCAT(str1,str2) results in the 4-character string "A B ", which is truncated to the two-character string "A " to fit the width of str3.

The following DOES work.

COMPUTE str4=CONCAT(RTRIM(str1),str2).
LIST str1 str2 str3 str4.

STR1 STR2 STR3 STR4
A    B    A    AB
A    B    A    AB
A    B    A    AB
A    B    A    AB
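The same padding issue applies when a separator is placed between the two parts. Here is a minimal sketch, in which str5 is a hypothetical variable name and the target is declared wide enough to hold both parts plus the separator:

```spss
DATA LIST LIST /str1 (A2) str2 (A2).
BEGIN DATA
A B
END DATA.
* The target must be wide enough for both parts plus the separator.
STRING str5 (A5).
* Trim each part so that no padding blanks end up inside the result.
COMPUTE str5=CONCAT(RTRIM(str1),"-",RTRIM(str2)).
LIST.
```

Without the first RTRIM, the blank that pads str1 would sit between the "A" and the "-".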

Dates Tutorial

Consider performing several operations with dates:

1. Create a date variable from a numeric variable such as 19901204 (see 6 for the reverse operation)
2. Create a date variable from 3 numeric variables containing day, month and year
3. Convert a string containing a date into a date variable (see also 7)
4. Calculate age
5. Adding one day to a date
6. Create a numeric variable such as 19901204 from a date variable
7. Create a string variable such as 19901204 from a date variable

Note: the purpose of the following examples is to help you understand what is going on (not to write the most condensed code).

1. Create a date variable from a numeric variable such as 19901204

DATA LIST LIST /date1.
BEGIN DATA
19901204
20000131
END DATA.
LIST.
COMPUTE day1=MOD(date1,100).
COMPUTE month1=MOD(TRUNC(date1/100),100).
COMPUTE year1=TRUNC(date1/10000).
COMPUTE date2=DATE.DMY(day1,month1,year1).
FORMATS date2(SDATE10).
VARIABLE WIDTH date2(11).
EXECUTE.
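Once the three intermediate steps are understood, they can be nested in a single expression. The following is only a condensed sketch of the same arithmetic; date3 is a hypothetical variable name.

```spss
DATA LIST LIST /date1.
BEGIN DATA
19901204
20000131
END DATA.
* Same arithmetic as above, nested in one expression.
* For 19901204: day is 4, month is 12 and year is 1990.
COMPUTE date3=DATE.DMY(MOD(date1,100),MOD(TRUNC(date1/100),100),TRUNC(date1/10000)).
FORMATS date3(SDATE10).
LIST.
```

The longer version is easier to debug because day1, month1 and year1 can be inspected in the data editor.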

2. Create a date variable from 3 numeric variables containing day, month and year

DATA LIST LIST /year1 month1 day1.
BEGIN DATA
1999 12 07
2000 10 18
2000 07 10
2001 02 02
END DATA.
LIST.
COMPUTE mydate=DATE.DMY(day1,month1,year1).
FORMATS mydate(DATE11).
VARIABLE WIDTH mydate(11).
EXECUTE.

* Using another date format.
COMPUTE mydate2=mydate.
FORMATS mydate2(ADATE11).
VARIABLE WIDTH mydate2(11).
EXECUTE.

3. Convert a string containing a date into a date variable

DATA LIST LIST /datestr(A10).
BEGIN DATA
11/26/1966
01/15/1981
END DATA.
LIST.

* Method 1 (a general method).
COMPUTE mth=NUMBER(SUBSTR(datestr,1,2),F8.0).
COMPUTE day=NUMBER(SUBSTR(datestr,4,2),F8.0).
COMPUTE yr=NUMBER(SUBSTR(datestr,7),F8.0).
COMPUTE mydate=DATE.DMY(day,mth,yr).
FORMATS mydate(SDATE11).
VARIABLE WIDTH mydate (11).
EXECUTE.

* The date in the above string variable has the form mm/dd/yyyy. The code works as
* is if the initial format is mm.dd.yyyy or mm-dd-yyyy. It is easy to modify the above
* to handle variations such as yyyy/mm/dd, dd/mm/yyyy.

* method 2 (works only when data fit an existing SPSS date format).
COMPUTE mydate=NUMBER(datestr,ADATE10).
FORMATS mydate(ADATE10).
* The next line makes the data editor display all 4 digits of the year.
VARIABLE WIDTH mydate(10).
EXECUTE.
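The yyyy/mm/dd variation mentioned under Method 1 could be handled as sketched below. Only the SUBSTR positions change. The variable name datestr2 and the two data lines are made up for this illustration.

```spss
DATA LIST LIST /datestr2(A10).
BEGIN DATA
1966/11/26
1981/01/15
END DATA.
* Same idea as Method 1, with the SUBSTR positions moved for yyyy/mm/dd.
COMPUTE yr=NUMBER(SUBSTR(datestr2,1,4),F8.0).
COMPUTE mth=NUMBER(SUBSTR(datestr2,6,2),F8.0).
COMPUTE day=NUMBER(SUBSTR(datestr2,9,2),F8.0).
COMPUTE mydate=DATE.DMY(day,mth,yr).
FORMATS mydate(SDATE10).
LIST.
```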

4. Calculate age

Date variables contain the number of seconds since October 14, 1582. (This weird date corresponds to the beginning of the Gregorian Calendar). The internal value of a date variable is the same whatever date format is used. For instance the following are true about a date variable containing the date 11/26/1966.

Internal value	Format	What you see in the Data Editor
12121574400	ADATE11	11/26/1966
12121574400	SDATE11	1966/11/26
12121574400	MOYR8	NOV 1966
12121574400	WKYR8	48 WK 66

Thus a command such as COMPUTE agesec=DATE.DMY(1,7,2001) - dtbirth calculates the number of seconds between the date of birth and July 1, 2001.

To convert to number of years and fraction of years, one option is to divide by the number of seconds in a year: COMPUTE age1=agesec/(365.25*24*60*60). In the above, a year is assumed to have 365.25 days to account for leap years.

A better way is to ask SPSS to convert the duration to days, then divide by 365.25: COMPUTE age2=CTIME.DAYS(DATE.DMY(1,7,2001) - dtbirth)/365.25.
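When only completed years are needed, the DATEDIFF function may be an alternative, assuming it is available in your release. The sketch below computes both the fractional age and the completed years at July 1, 2001 for one made-up case; age3 is a hypothetical variable name.

```spss
DATA LIST LIST /dtbirth(ADATE10).
BEGIN DATA
11/26/1966
END DATA.
* Age in years with a fraction, measured at July 1, 2001.
COMPUTE age2=CTIME.DAYS(DATE.DMY(1,7,2001) - dtbirth)/365.25.
* Completed years only, assuming the DATEDIFF function is available.
COMPUTE age3=DATEDIFF(DATE.DMY(1,7,2001),dtbirth,"years").
LIST.
```

A person born on 11/26/1966 has completed 34 years by July 1, 2001, so age3 should be the whole-number part of age2 here.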

If you need a categorical variable (say agegr) such that agegr equals 0 when ages are between 0 and 4.99, 1 when ages are between 5 and 9.99, etc., do the following:

COMPUTE agegr=TRUNC(age2/5).
VALUE LABELS agegr 0 '0-4.99' 1 '5-9.99' 2 '10-14.99'.

(For a general method, see Group Data and Define Corresponding Value Labels.SPS).

5. Adding one day to a date

COMPUTE date1=date1 + 60*60*24.
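The constant 60*60*24 is the number of seconds in a day. Two other ways to express the same step are sketched below, assuming the TIME.DAYS and DATESUM functions are available in your release. The results go into new, hypothetical variables (nextday1 and nextday2) so that date1 itself is left untouched.

```spss
DATA LIST LIST /date1(SDATE10).
BEGIN DATA
1990/12/04
2000/02/28
END DATA.
* TIME.DAYS(1) is the number of seconds in one day, that is 60*60*24.
COMPUTE nextday1=date1 + TIME.DAYS(1).
* DATESUM is assumed to be available in your release.
COMPUTE nextday2=DATESUM(date1,1,"days").
FORMATS nextday1 nextday2 (SDATE10).
LIST.
```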

6. Create a numeric variable such as 19901204 from a date variable

COMPUTE numb1=XDATE.YEAR(date1)*10000 + XDATE.MONTH(date1)*100 + XDATE.MDAY(date1).

7. Create a string variable such as 19901204 from a date variable

*Continuing from 6. 

STRING str1(A8). 

COMPUTE str1=STRING(numb1,F8.0).

Converting strings into numbers

The next examples are in the following syntax file.

See also the SPSS macro tutorial for a generalization of this approach.

DATA LIST FIXED /mydata 1-10 (A).
BEGIN DATA 
6,188
400
12,125.25
END DATA.
LIST.
COMPUTE nb=NUMBER(mydata,COMMA10).
LIST.

DATA LIST FIXED /mydata 1-10 (A).
BEGIN DATA
6188
400
12125.25
END DATA.
LIST.
COMPUTE nb=NUMBER(mydata,F10).
LIST.

DATA LIST FIXED /mydata 1-10 (A).
BEGIN DATA
6.188
400
12.125,25
END DATA.
LIST.
COMPUTE nb=NUMBER(mydata,DOT10).
LIST.

DATA LIST FIXED /mydata 1-10 (A).
BEGIN DATA
$6,188
$400
$12,125.25
END DATA.
LIST.
COMPUTE nb=NUMBER(mydata,DOLLAR10).
LIST.

DATA LIST FIXED /mydata 1-10 (A).
BEGIN DATA
6.188%
400%
12.12525%
END DATA.
LIST.
COMPUTE nb=NUMBER(mydata,PCT10).
LIST.
FORMATS nb(PCT10.5).
LIST.
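NUMBER returns a system-missing value when the string cannot be read with the given format, so it can be worth flagging such cases after a conversion. This is a hedged sketch: badval is a hypothetical flag variable and "n/a" is a made-up example of an unreadable value.

```spss
DATA LIST FIXED /mydata 1-10 (A).
BEGIN DATA
6,188
n/a
END DATA.
COMPUTE nb=NUMBER(mydata,COMMA10).
* badval is a hypothetical flag: 1 when a non-blank string gave no number.
COMPUTE badval=(MISSING(nb) AND mydata NE " ").
LIST.
```

Cases with badval equal to 1 can then be listed or corrected before the analysis continues.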

...
