I have not seen the first book above so I cannot comment on its usefulness. On the
other hand, I find the second one very good.
Here are the chapter titles of this 154-page spiral-bound book.
The Syntax Reference Guide is also a valuable resource
but its usefulness increases with the user's knowledge...
Other books are available from IBM/SPSS's site.
One of the keys to syntax writing is the Paste button, which exists in
most windows of the Graphical User Interface (GUI).
As far as I am concerned, the main purpose of the GUI is to facilitate writing syntax!
Another key point is to make sure the commands are printed in the Output Window (see Keep the Log!).
Let's write a syntax file:
Using the menu, load the file "employee data.sav" which came with SPSS.
Using the data editor menu, select FILE>NEW>SYNTAX (this opens a new syntax
window)
Check the Log in the Output window; it will contain
GET FILE='C:\Program Files\SPSS\Employee data.sav'.
(Of course, the path on your system may be different.) If you do not see the Log, refer to Keep the Log! for instructions.
Double click the Log listing (on the right hand side of the Output Window); select and
copy the GET FILE command.
Paste the command in the syntax window.
Using the menu, select ANALYZE>DESCRIPTIVE STATISTICS>FREQUENCIES; move the variable jobcat
to the Variable(s) Text Box.
Click the Paste button. The syntax is now in the syntax window.
Save the syntax file (using the menu of the syntax window): FILE>SAVE (or use Ctrl-S)
(For a good description of how to use the Syntax window, SPSS's journal and the Log,
see Syntax Editor Window on Central Michigan University's site.)
This is the content of the syntax file created above:
GET
FILE='C:\program files\spss\employee data.sav'.
FREQUENCIES
VARIABLES=jobcat
/ORDER= ANALYSIS .
The need to use syntax cannot be overstated. The more
one uses SPSS, the more one needs to use syntax files as opposed to the "point and
click" method.
Imagine John uses "point and click" to execute data transformations, add
value and variable labels, and run 50 procedures on a company-wide sales data file. Many
things may happen after the analysis is completed:
a) John's manager finds the analysis so interesting that
---> he requests that the whole thing be redone, but separately for each operating division
---> the analysis will have to be done weekly from now on
b) some sales information was missing from (or incorrect in) the file and he has to redo
the analysis.
If John has saved his analysis in a syntax file, he will be able to do any of the above
in 2-3% of the time he would have needed to redo it manually using "point and
click". It should be obvious that syntax leads to huge productivity gains.
Even if John had not saved his analysis in a syntax file, it is easy to recover the
code from the journal file. See Syntax Editor Window
on Central Michigan University's site if you do not know how to do that.
Other benefits of using syntax files are:
1. documentation (the syntax file automatically documents what was done)
2. reproducibility of results (this is a corollary of 1; try to redo a one hour "point and click" session!)
3. batch processing (syntax files can be run automatically at night, when more
system resources are available)
4. opens the door to macros (getting used to syntax is a prerequisite to
accessing the macro world)
5. allows access to all SPSS features (some features of SPSS are only
available through syntax)
6. efficient method of communication (the SPSSX-L list and newsgroup use
syntax to post answers; these "answers" can be understood by people using
non-English versions of SPSS)
Of course, a good method to learn is to look at existing syntax. Look at various syntax
files, even if they do not seem useful to you at the particular time.
* String manipulation tutorial.
* Replacing / deleting certain characters in strings; combining strings.
* Raynald Levesque.
* Define some dummy data to work with.
DATA LIST FIXED /name 1-25 (A).
BEGIN DATA
000John Doe /10.14.12
Mary Poppins /17.21
Billy Joe /21.25
Peter Pan /10.35
END DATA.
LIST.
NAME
000John Doe /10.14.12
Mary Poppins /17.21
Billy Joe /21.25
Peter Pan /10.35
* String variables must be defined before being used.
STRING name1 TO name4 (A25).
VARIABLE LABEL name 'Original value'
name1 'Without leading zeros'
name2 'Replace . by ,'
name3 'Delete up to "/"'
name4 'Delete from "/"'.
* 1. To delete leading zeros.
COMPUTE name1=LTRIM(name,"0").
LIST name name1.
NAME                      NAME1
000John Doe /10.14.12     John Doe /10.14.12
Mary Poppins /17.21       Mary Poppins /17.21
Billy Joe /21.25          Billy Joe /21.25
Peter Pan /10.35          Peter Pan /10.35
COMPUTE name2=name1.
* The loop ensures that multiple occurrences are covered.
* The "+" ensures that the code would also work as an INCLUDE file.
LOOP IF INDEX(name2,".")>0.
+ COMPUTE SUBSTR(name2,INDEX(name2,"."),1)=",".
END LOOP.
LIST name1 name2.
NAME1                     NAME2
John Doe /10.14.12        John Doe /10,14,12
Mary Poppins /17.21       Mary Poppins /17,21
Billy Joe /21.25          Billy Joe /21,25
Peter Pan /10.35          Peter Pan /10,35
* The above looks simple enough. Three lines are sufficient to replace all
* occurrences of "." (or any other character, of course), but what if you need to
* do this for 400 variables? The solution is a macro. See Macro Tutorial.
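
The logic of the replacement loop can be checked outside SPSS. The following is a minimal Python sketch, not part of the original tutorial; the function name replace_char is hypothetical. It mirrors the LOOP above: find the next ".", overwrite that single position, and repeat until none is left.

```python
def replace_char(value: str, old: str, new: str) -> str:
    """Replace each occurrence of `old` with `new`, one position at a time."""
    # Assumption: `new` does not itself contain `old`; otherwise the loop never ends.
    pos = value.find(old)      # plays the role of INDEX(name2,".") (0-based, -1 if absent)
    while pos >= 0:            # corresponds to LOOP IF INDEX(name2,".")>0
        value = value[:pos] + new + value[pos + 1:]
        pos = value.find(old)
    return value


names = ["John Doe /10.14.12", "Mary Poppins /17.21", "Billy Joe /21.25", "Peter Pan /10.35"]
replaced = [replace_char(name, ".", ",") for name in names]
print(replaced[0])  # John Doe /10,14,12
```

As in the SPSS version, the loop matters because a single find-and-overwrite step only handles the first occurrence.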
COMPUTE name3=SUBSTR(name1,INDEX(name1,"/")+1).
LIST name1 name3.
NAME1                     NAME3
John Doe /10.14.12        10.14.12
Mary Poppins /17.21       17.21
Billy Joe /21.25          21.25
Peter Pan /10.35          10.35
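
The same "keep everything after the slash" step can be sketched in Python to make the index arithmetic explicit. This is an illustration only; the function name after_slash is hypothetical, and the trailing blanks that pad SPSS strings to their declared width are ignored here.

```python
def after_slash(value: str) -> str:
    """Return the text following the first "/" (the whole string if there is none)."""
    pos = value.find("/")   # -1 when "/" is absent
    return value[pos + 1:]  # pos + 1 skips the slash itself; 0 keeps the whole string


print(after_slash("John Doe /10.14.12"))  # 10.14.12
```

The "+1" plays the same role as in SUBSTR(name1,INDEX(name1,"/")+1): it starts the extraction one character past the slash.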
*------- Note that the following does NOT work.
COMPUTE str3=CONCAT(str1,str2).
* It does not work because str1 is actually equal to "A" followed by a space.
* Similarly, str2 is "B" followed by a space.
* Thus CONCAT(str1,str2) results in the 4-character string "A B ", which
* is truncated to the 2-character string "A " to fit the width of str3.
*------- The following DOES work.
COMPUTE str4=CONCAT(RTRIM(str1),str2).
LIST str1 str2 str3 str4.
STR1 STR2 STR3 STR4
A    B    A    AB
A    B    A    AB
A    B    A    AB
A    B    A    AB
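
The padding effect is easier to see when fixed-width strings are imitated explicitly. The Python sketch below is an illustration, not SPSS code; it assumes all four variables are 2 characters wide (as the comments above imply), and the helper name pad is hypothetical.

```python
def pad(value: str, width: int) -> str:
    """Imitate a fixed-width string variable: pad with blanks, then cut to `width`."""
    return value.ljust(width)[:width]


str1 = pad("A", 2)                    # "A " (the blank is part of the value)
str2 = pad("B", 2)                    # "B "
str3 = pad(str1 + str2, 2)            # "A B " is cut back to "A "
str4 = pad(str1.rstrip() + str2, 2)   # trimming first gives "AB " and then "AB"
print(repr(str3), repr(str4))         # 'A ' 'AB'
```

Trimming the first operand before concatenating is what RTRIM(str1) does in the working COMPUTE above.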
DATA LIST LIST /date1.
BEGIN DATA
19901204
20000131
END DATA.
LIST.
COMPUTE day1=MOD(date1,100).
COMPUTE month1=MOD(TRUNC(date1/100),100).
COMPUTE year1=TRUNC(date1/10000).
COMPUTE date2=DATE.DMY(day1,month1,year1).
FORMATS date2(SDATE10).
VARIABLE WIDTH date2(11).
EXECUTE.
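
The MOD/TRUNC arithmetic above can be verified with a few lines of Python. This is only a sketch for checking the digit extraction (the function name split_yyyymmdd is hypothetical); in SPSS the work is done by the COMPUTE commands themselves.

```python
from datetime import date


def split_yyyymmdd(value: int) -> date:
    """Split a number of the form yyyymmdd into its parts and build a date."""
    day = value % 100              # MOD(date1,100)
    month = (value // 100) % 100   # MOD(TRUNC(date1/100),100)
    year = value // 10000          # TRUNC(date1/10000)
    return date(year, month, day)


print(split_yyyymmdd(19901204))  # 1990-12-04
print(split_yyyymmdd(20000131))  # 2000-01-31
```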
* The date in the above string variable has the form mm/dd/yyyy. The code works as
* is if the initial format is mm.dd.yyyy or mm-dd-yyyy. It is easy to modify the above
* to handle variations such as yyyy/mm/dd, dd/mm/yyyy.
--> method 2 (works only when data fit an existing SPSS date format)
COMPUTE mydate=NUMBER(datestr,ADATE10).
FORMATS mydate(ADATE10).
* The purpose of the next line is to display all 4 digits of the year in the data editor.
VARIABLE WIDTH mydate(10).
EXECUTE.
* 4. Calculate age.
Date variables contain the number of seconds since October 14, 1582. (This weird date
corresponds to the beginning of the Gregorian calendar.) The internal value of a date
variable is the same whatever date format is used. For instance, the following are true
of a date variable containing the date 11/26/1966.
internal value   format    what you see in the data editor
12121574400      ADATE11   11/26/1966
12121574400      SDATE11   1966/11/26
12121574400      MOYR8     NOV 1966
12121574400      WKYR8     48 WK 66
Thus a command such as.
COMPUTE agesec=DATE.DMY(1,7,2001) - dtbirth.
* calculates the number of seconds between the date of birth and July 1, 2001.
* To convert to number of years and fraction of years, one option is to divide by the
number of seconds in a year.
COMPUTE age1=agesec/(365.25*24*60*60).
* In the above, a year is assumed to have 365.25 days to account for leap years.
* A better way is to ask SPSS to convert the duration in days, then divide by 365.25.
COMPUTE age2=CTIME.DAYS(DATE.DMY(1,7,2001) - dtbirth)/365.25.
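
The internal value quoted earlier and the two age formulas can be reproduced in Python as a cross-check. This sketch is an assumption-laden illustration, not SPSS code: the names SPSS_EPOCH and spss_seconds are hypothetical, the date of birth is the 11/26/1966 example used above, and the reference date is July 1, 2001.

```python
from datetime import datetime

# Assumed origin of SPSS date values: midnight, October 14, 1582.
SPSS_EPOCH = datetime(1582, 10, 14)


def spss_seconds(moment: datetime) -> int:
    """Number of seconds between the assumed SPSS origin and `moment`."""
    return int((moment - SPSS_EPOCH).total_seconds())


birth = datetime(1966, 11, 26)       # hypothetical dtbirth
reference = datetime(2001, 7, 1)     # DATE.DMY(1,7,2001)

agesec = spss_seconds(reference) - spss_seconds(birth)
age1 = agesec / (365.25 * 24 * 60 * 60)   # divide by the seconds in an average year
age2 = (agesec / 86400) / 365.25          # convert to days first, as CTIME.DAYS does

print(spss_seconds(birth))  # 12121574400
```

Both formulas give the same age when the same reference date is used; the second simply states the unit conversion more clearly.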
If you need a categorical variable (say agegr) such that agegr equals 0
when ages are between 0 and 4.99, 1 when ages are between 5 and 9.99, etc.,
do the following:
COMPUTE agegr=TRUNC(age2/5).
VALUE LABELS agegr 0 '0-4.99' 1 '5-9.99' 2 '10-14.99'