## Index

If you don't measure it… you can't improve it!

## Library

### Block Designs

1. Completely Randomized Designs (equal or unequal n per treatment)
2. Random assignment of units to experimental treatments (for Randomized Block Designs, Simple & Generalized, and for Completely Randomized Designs with equal n per treatment)
3. Bootstrap confidence interval for the variance of a variable

### Charts and Tables

1. Bar charts for school types by sex where percentages of each sex add up to 100 percent
2. Blank bar for unselected category (from AnswerNet)
3. Blank bar for unselected category (generalized) (note, however, that showing empty categories in CTABLES is trivial: simply specify EMPTY=INCLUDE on the /CATEGORIES subcommand)
4. Compare (superimpose) two histograms (a population pyramid could also be used; see the IGRAPH section below for an example)
5. Count outliers (show number of outliers in a boxplot)
6. Do bar charts excluding categories with a small number of cases
7. Do many histograms with the same axis boundaries (this demonstrates the use of the macro Dograph)
8. Graph cumulative percentage retired at attained age by categorical variable
9. Graph cumulative percent on X axis
10. Graph survey question
11. Histogram with percent on y axis instead of numbers
12. Identify your own data in the chart
13. Identify your own data in the chart, version 2 (a generalization of the above syntax)
14. Print current date and time in chart title (Same technique can be used with Tables)
15. Print current date in chart title
16. Print histogram or bar chart depending on data (A good macro example)
17. Print school names as part of graph titles
18. Show mean values in line graph
19. Show 2 categories on same histogram
20. ZIPF law and graph
21. Construct a table "manually" in the data editor (A good example of data restructuring)
22. Find population frequency when multiple response with long strings
23. Construct a table "manually" example no 2
24. List variables in frequency table by order of medians
25. Put 4 variables in the same frequency table
26. Print mean plus minus standard deviation in Table
27. Print actual name, group and id in heading of each listing
28. Show empty category in tables (from AnswerNet) (Note: this is trivial with CTABLES)
29. Sort categories by decreasing count but with Others as last one
30. Show number of valid cases in table footnote
31. Get statistics for grouping of variables
32. Table where list of variables is generated by macro (Illustrates the !IF ...!ELSE... !IFEND macro command)
33. Using Macros and CTABLES
34. Do not display value labels in pivot tables
35. Show empty categories in tables (second method)
36. Hide Cell With Less Than N Persons
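For the CTABLES route mentioned in items 3 and 28, here is a minimal sketch (it assumes the Tables add-on and a hypothetical variable q1 whose value labels define the categories):

```spss
* EMPTY=INCLUDE displays labeled categories even when no case falls in them.
CTABLES
  /TABLE q1 [COUNT]
  /CATEGORIES VARIABLES=q1 EMPTY=INCLUDE.
```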

### Combinations, Permutations, Interactions

1. All combinations of 3 numbers out of n (see "Find all combinations ..." below for a generalization)
2. Find all combinations of 1 up to n items out of m items (high-power stuff!)
3. Find all combinations of n items out of m items (high-power stuff!)
4. All combinations of 3 letters out of n (with replacement)
5. Calculate interaction terms between 2 categorical variables (within a regression context)
6. Create a new variable for each combination of 2 variables
7. Find all permutations of integers 1 to n (maximum value of n is 7). Combined with RECODE, this can find permutations of any strings or numbers.
8. Generate orders for block of trials
9. Get all possible crossproducts of pairs of variables (contains a fair amount of comments)
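For products of variable pairs (item 9), DO REPEAT is the usual building block. A simplified sketch, assuming hypothetical variables x1 to x3 and y1 to y3 and computing only the parallel products:

```spss
* Each iteration multiplies one x by the matching y.
DO REPEAT a = x1 x2 x3
        / b = y1 y2 y3
        / xy = x1y1 x2y2 x3y3.
COMPUTE xy = a * b.
END REPEAT.
EXECUTE.
```

The linked syntax generalizes this to all pairs rather than just matched ones.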

### Compute

1. Find the cubic root
2. Reverse the digits of an integer
3. Create a new variable equal to the mean of another variable
4. Compute average of m variables where m is a variable in the data file
5. Compute distances between 2 points on earth (with thanks to Simon Freidin)
6. Compute percentage of patients having each fracture category
7. Automatically compute sample weights to approximate population
8. Box-Cox transformation (transforms var1 using each of the 31 values of lambda between -2 and 1, in increments of 0.1)
9. Compute z = x / max(y) where max(y) is over all cases
10. Count number of distinct values across 400 variables
11. Weight data based on 2 or more vars
12. Find LAG(var1, var2) (variable lag)
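The Box-Cox entry (item 8) can be sketched with VECTOR and LOOP; var1 is the variable named in the item and is assumed to be positive:

```spss
* Creates bc1 to bc31, one per lambda from -2 to 1 in steps of 0.1.
VECTOR bc(31).
LOOP #i = 1 TO 31.
  COMPUTE #lambda = -2 + (#i - 1) * 0.1.
  DO IF ABS(#lambda) < 1E-8.
    COMPUTE bc(#i) = LN(var1).         /* lambda = 0 uses the log transform */
  ELSE.
    COMPUTE bc(#i) = (var1 ** #lambda - 1) / #lambda.
  END IF.
END LOOP.
EXECUTE.
```

The tolerance test rather than `#lambda = 0` avoids a floating-point miss at lambda = 0.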

### Unclassified

1. Calculate average percent score
2. Calculations on dynamic columns
3. Fill in the gaps (information in the file has been left blank when it equals the information in the preceding case; this syntax fills the gaps)
4. Fill in the gaps (within ID)
5. Interaction in factorial designs when dependent variable is not normal Thanks to Marta Garcia-Granero for this code.
6. Stop or resume generating outputs in the output window
7. P-value adjustments for Multiple Comparisons

### IGRAPH

1. Clustered bars with percent based on total in cluster
2. Example of surface plot
3. Graphing an arbitrary function
4. Graph showing interaction in multiple regression
5. How to speed up IGRAPH (A similar approach could be used for other type of graphs)
6. Produce long IGRAPHs
7. Population pyramids
8. Separate box plot graph for each category value (syntax can be adapted to any other type of graph)

### Concatenate/modify string variables

1. Apparent problem with CONCAT (newbies should take a look at this example)
2. Combine a string variable and a numeric variable
3. Concatenate content of cases with same id
4. Concatenate numbers
5. Concatenate 22 variables
6. Convert first letter of each word to uppercase (Thanks to A. Paul Beaulne for sending me this code)
7. Create an id using name and dob
8. Normalize string (delete spaces at beginning, remove period at end, capitalize all letters)
9. Normalise alpha (Capitalise the first letter of each word, use lower case for the other letters)
10. Remove initial from name
11. Remove period from string (can be modified to remove any other characters)
12. Reorganize names (place family name at the beginning of the string)
13. Transform ascii codes into characters
14. Concatenate All Values Into Constant

### Parse or Flag data

1. Extract bits from an integer
2. Extract portion of string (string contains first and last name, want first 3 letters of last name)
3. Extract portion of string starting with a digit
4. Extract Zip code from address field
5. Extract two numbers from a string (e.g. string "120/90" becomes numbers 120 and 90)
6. Flag if last characters of string are 'Esq'
7. Parse a string into one letter per variable
8. Parse comma separated numbers
9. Parse data separated by slashes
10. Parse domain name from email addresses
11. Parse comma separated strings then autorecode results
12. Parsing a variable which has embedded line feeds (thanks to Bjarte Aagnes)
13. Remove letter at end of string and convert remaining string to a number
14. Split a string variable into plaintiff and defendant portions
15. String variable contains items separated by a slash (there is a variable number of items from one case to the next)
16. Weed out letters in a string and create a number with remaining digits
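Item 5's "120/90" split can be sketched with INDEX, SUBSTR and NUMBER; bp is a hypothetical string variable holding the readings:

```spss
* Everything before the slash becomes systolic, everything after it diastolic.
COMPUTE systolic  = NUMBER(SUBSTR(bp, 1, INDEX(bp, '/') - 1), F8.0).
COMPUTE diastolic = NUMBER(SUBSTR(bp, INDEX(bp, '/') + 1), F8.0).
EXECUTE.
```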

### Distributions

1. Add variables containing lower and upper CI for mean
2. Bayes estimates for proportions and their CI (with thanks to Evgeny Ivashkevich; this also calculates confidence intervals for a category not present in the sample)
3. Calculate Chi-square significance given q and df
4. Calculate 95 percent confidence interval for the median (thanks to Marta Garcia-Granero)
5. Calculate McNemar Chi-Square test (thanks to Marta)
6. Hodges-Lehmann Confidence Interval for Median difference (thanks to Marta Garcia-Granero)
7. Exact Confidence Limits for a Binomial Parameter
8. Goodness of Fit Test for Poisson Distribution
9. Inferences and Confidence Intervals for Proportions
10. Fitting Models with Overdispersion
11. Tests of General Linear Hypotheses
12. Normalization of raw scores

### Flag or Select Cases

1. Exclude "outliers" from analysis (where outliers are defined as cases outside Mean +/- 2 SD)
2. Flag cases where a given string variable contains a given word
3. Flag cases where any of a list of variables have same value
4. Flag cases meeting a certain condition as well as preceding and following case for the same person
5. Flag cases where salary is in top 95 percentile
6. Flag first and last dates (within each ID)
7. Keep only duplicate cases
8. Print frequency table of the n most (least) frequent items
9. Select cases where same letter appears twice in string
10. Select patients where drug1 was given before drug2
11. Sophisticated search in string variable (data were scanned, portion of strings include letters (eg B) instead of numbers (eg 8); this syntax flags the errors)
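The mean ± 2 SD rule of item 1 is typically a two-step job: AGGREGATE with MODE=ADDVARIABLES (available since SPSS 14) appends the grand mean and SD to every case, then a logical COMPUTE flags the outliers. A sketch with a hypothetical variable var1:

```spss
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES
  /gmean = MEAN(var1)
  /gsd   = SD(var1).
* outlier is 1 for cases outside mean +/- 2 SD, 0 otherwise.
COMPUTE outlier = (var1 < gmean - 2*gsd OR var1 > gmean + 2*gsd).
EXECUTE.
```

`SELECT IF outlier = 0.` would then drop the flagged cases.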

### Item Analysis

1. Syntax for item analysis
2. Syntax for Item Analysis V6 (a much improved version of the above: fully automated, developed and tested using SPSS 15)

### Matching data files

1. Create data file if double entries are equal (entries done by 2 different persons in 2 different files)
2. Double entry check
3. Find errors in 2 files (data entered twice)
4. Match one to many where key has 4 variables
5. Match 2 files using between-dates criteria
6. Merge 2 data files based on many to many relationship
7. Compare 2 Data Files with thanks to Simon Freidin

### Read, Write or Create Data

1. Adding new cases using syntax
2. Add variable equal to function of an existing Var
3. Copy some variables from each record type 1 to add a new record of type 0
4. A few simple examples of INPUT PROGRAM (a short tutorial)
5. Create consecutive records at the end of the file
6. Create constants for each non missing date
7. Define new variables in empty data set
8. Define varx to vary
9. Duplicate cases n times where n is a variable (see also Expand Crosstab Data below)
10. Expand crosstab data into original data file (disaggregate data)
11. Expand data x and y times (e.g. from a case where age=20, males=5 and females=6, create 5 cases with age=20, sex=1 and 6 cases with age=20, sex=0)
12. Fill the gaps when AGGREGATE has empty categories (syntax creates cases to fill the gaps)
13. Generate random dates
14. INPUT program (to generate a random data file)
15. Insert missing cases (within id)
16. Insert missing dates (within id)
17. Printing date time in output
18. Read ASCII (logical case is made up of 5 rows of 10 cases)
19. Read ASCII file using FILE TYPE
20. Read ASCII file using INPUT PROGRAM
21. Read ASCII file with a forward slash delimiter
22. Read a variable number of records per case
23. Example of data list
24. Example of INPUT program
25. Read data inline File Type MIXED Records
26. Read ASCII file with comma or dash delimited data
27. Read ASCII file with comma separated data (within quotes)
28. Read ASCII file with fixed and free data
29. Read ASCII file with FIXED Data
30. Read ASCII with comma and dot separated decimals
31. Read text file where n columns are to be ignored (n is a variable which varies by file)
32. Skip first 6 Records
33. Skip one line of data
34. Read ASCII file with REPEATING data
35. Read data files that have no carriage returns (from AnswerNet) (data is just one long stream, with no separation between records or fields, and no carriage returns)
36. Read data produced by CGI script
37. Read data where each case has 4 numeric records and a variable number of string records (this illustrates the use of the REREAD command)
38. Write comma or tab delimited file
39. Write frequency percentages to data file
40. Write missing values as a dot
41. Write special ASCII file
42. Writing value labels instead of values
43. Read comma delimited fields with commas inside quoted strings
44. Read data list free with consecutive commas
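The expansion in item 11 is the classic LOOP + XSAVE pattern. A sketch assuming input variables age, males and females, and a hypothetical output file expanded.sav:

```spss
* Writes males + females cases per input case:
* first the males (sex=1), then the females (sex=0).
LOOP #i = 1 TO males + females.
  COMPUTE sex = (#i <= males).
  XSAVE OUTFILE='expanded.sav' /KEEP=age sex.
END LOOP.
EXECUTE.
GET FILE='expanded.sav'.
```

XSAVE is used instead of SAVE because it writes cases during the transformation pass instead of forcing a separate data pass.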

### Regression, Repeated Measures

1. Add casewise regression coefficients to data file
2. Breusch-Pagan & Koenker test (thanks to Marta Garcia-Granero)
3. Calculate predicted values (unianova)
4. Compare coefficients generated by various groups
5. Compare regression coefficients (thanks to A. Paul Beaulne for sending me this code)
6. Conditional logistic regression
7. Do All-Subsets regressions
8. Do all univariate linear and logistic regressions (thanks to Marta Garcia-Granero)
9. Logistic regression by macro
10. Regression calculates table of predicted values
11. Regression in a loop
12. Regression when holding out k cases
13. Regression with correlation matrix as input
14. Regression with normed weight
15. Repeated-measures macro
16. Chow test
17. White's test: calculate the statistic and its significance (thanks to Marta Garcia-Granero)
18. White's standard errors (full OLS and White's SE output) (thanks to Gwilym Pryce)
19. Testing individual regressors in logistic regression
20. Non-linear regression (NLR) with variance of residuals as the loss function (this is not trivial)
21. Piecewise regression (also known as "spline regression" and "piecewise polynomials")

### Tests of Inequality

1. Many tests of inequality v5 (this chart template is used by the syntax)
2. Dissimilarity Index

### Working with Many Files

1. Combine 2 data files many to many
2. Combine any number of consecutively named sav files 50 at a time
3. Combine many data files with same variables
4. Combine many xls files into a single sav file
5. Data list is outside the main syntax (illustrates how a syntax file can be modified by syntax)
6. Delete cases contained in file2 from the main data file
7. Erase files
8. Example 1 using UPDATE command
9. Example 2 using UPDATE command
10. Get mean from 3 different files
11. Keep only cases from Master file whose id are in second file
12. Macro to delete a list of files
13. Many folders and many files
14. Run a macro on every file whose name is in a sav file
15. Run syntax on files whose names are derived from a data file
16. Show number of differences, if any, between 2 files (to check double entry of data).
17. Split big files into separate categories (create a different sav file for each value of a numeric categorical variable)
18. Split big files into separate categories, string var (create a different sav file for each value of a string categorical variable)
19. Split file with kn cases into k files of n cases each
20. Unusual file merge
21. Include 200 syntax files by macro
22. Process All .xls Files in a Given Folder (script)

### Meta Analysis

1. Meta Analysis: fixed and random effects models (With thanks to Valentim R. Alferes) This SPSS syntax does a meta-analysis on a set of studies comparing two independent means. It produces results for both fixed and random effects models, using Cohen's d statistics. The user has a total of 10 modes for entering summary data.
2. Meta-SPSS An exhaustive set of syntax files written by Marta Garcia-Granero as well as sample data files and supporting documents.

### Transform variable

1. Constrain a variable to a given interval (syntax is first given, then it is generalized using 2 macros)
2. Convert numbers to string with leading zeros
3. Create variable equal to z-scores of an existing variable
4. Extract first digit or first 2 digits of a large integer
5. Global autorecode (a nice problem: autorecode many string variables where the recode formula (e.g. a=1, b=2, etc.) is the same for all variables even though none of the variables has all possible values)
6. Replace confidential information, e.g. an SSN, by a new (known) id
7. Replace values higher than n by the mean of the other values
8. Automatically rescale variable to be between 0 and 1
9. Examples of Converting Strings To Numbers
10. Replace a Letter to 9999 and Convert To Number
11. Transform Alphanumeric Codes to Numeric
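Item 2's leading zeros come free with the N output format. A sketch assuming a numeric variable id that fits in 5 digits:

```spss
STRING idstr (A5).
* N5 pads to width 5 with leading zeros, e.g. 42 becomes '00042'.
COMPUTE idstr = STRING(id, N5).
EXECUTE.
```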

### RFM Analysis

1. RFM-analysis on aggregated data (comments are in Russian)

### Remove Characters, Duplicates or Variables

1. Delete cases with offset cases
2. Delete double entries (thanks to Maciek Lobinski): for instance, if for a given case var1 equals var2, the syntax replaces var2 by sysmis.
3. Find duplicates
4. Remove double quotes
5. Remove duplicate records
6. Remove unused variables from many files
7. Replace character in string
8. Replace consecutive spaces in string by a single space
9. Save duplicates in a separate file
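The usual idiom behind items 3 and 5 is MATCH FILES with /FIRST. A sketch that keeps the first case per (hypothetical) id:

```spss
SORT CASES BY id.
* first = 1 for the first case of each id, 0 for its duplicates.
MATCH FILES /FILE=* /BY id /FIRST=first.
SELECT IF first = 1.
EXECUTE.
```

Flipping the test to `SELECT IF first = 0.` instead saves only the duplicates, as in item 9.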

### Strings

1. Soundex Phonetic Comparison
2. Convert string to numeric variable
3. Convert numbers to strings
4. Convert string '250 million' into a number (or '16 billion', etc.)
5. Change all strings in data file to lower case
6. Are All Words Present? (tests whether all words passed to the macro are present within a given string variable)
7. Are All Words Present? - to Dichotomy Vars (similar to the above but creates one dichotomy variable for each target word)

### Restructure File

1. Allocate dummy variables to 24 hours
2. Automated data transform from tall to wide
3. Automated restructure from long to wide with thanks to Hillel Vardi.
4. Collapse empty variables within a case
5. Deduplicate cases while keeping all the information (a cute little problem)
6. Each variable occupies 5 rows of 10 columns (another nice little problem)
7. Find beginning and end of continuous periods
8. From many to one example1
9. From many to one example2
10. From many to one with alpha data
11. From many to one with specific order of new variables
12. From one to Many simple
13. From one to many with indicator variable
14. Restructure data file, example 1
15. Restructure data file, example 2
16. Restructure data file, example 3
17. Restructure data file, example 4
18. Restructure from tall to wide (general solution) (non-trivial macro code...)
19. Restructure time periods to a time matrix
20. Restructure to calculate Kappa
21. Transpose(FLIP) string variables
22. VarsToCases and CasesToVars
23. Automated Data Restructure (thanks to Kevin Hynes) This example maintains a grouping factor while restructuring data from tall to wide
24. Use Former Variable Names As Value Labels
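Item 22's two workhorses (SPSS 12 and later) can be sketched on a hypothetical wide file holding id and score1 to score4:

```spss
* Wide to long: each case becomes four cases, indexed by time.
VARSTOCASES
  /MAKE score FROM score1 score2 score3 score4
  /INDEX = time
  /KEEP = id.
* Long back to wide:
* CASESTOVARS /ID = id /INDEX = time.
```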

### Sample Size and Power

1. Power analysis examples With thanks to Bruce Weaver
2. Sample size for means With thanks to Marta Garcia-Granero. This is a collection of several short macros that perform sample size calculations for confidence interval estimation and one sample / two samples tests for means (this last one with equal or unequal sample sizes).
3. Sample size for proportions With thanks to Marta Garcia-Granero. A collection of macros that perform sample size calculation for the estimation of one proportion and one or two samples hypothesis testing, as well as the calculation of the power of a test.
4. Sample size for correlation hypothesis testing thanks to Marta!

### Conjoint Analysis

1. Textbook Example Analysis of Plan 2 by 2 (comments are in Russian)

### T-Test or Means or ANOVA

1. ANOVA A*B (thanks to Valentim Alferes) This does an A*B Factorial ANOVA and calculates variance components, measures of association, measures of effect size and observed power. Works with raw data or published summary statistics.
2. T-Tests and Likert scales
3. Compare mean of each hospital with mean of all other hospitals (nice little macro)
4. ANOVA Tables using 4 methods (thanks to Valentim Alferes): method 1 for Ns, Means and SDs; method 2 for Ns, Means and Variances; method 3 for Ns, Means and MS Error; method 4 for Means, Df num, Df den and MS Error.
5. Standardized effects size (Cohen Glass and Hedges's d) (with thanks to Marta) The effects size and their standard errors are added to the data file.
6. ONEWAY with summary data 2: performs several ONEWAY ANOVAs plus several homogeneity of variances tests on summary data. Any number of variables can be analysed. Thanks to Marta Garcia-Granero.
7. ONEWAY with summary data 1: performs a ONEWAY ANOVA plus several homogeneity of variances tests on summary data. Thanks to Marta Garcia-Granero.
8. Multiple Mann-Whitney tests (using a macro to have a procedure inside a LOOP)
9. Hotelling's T**2 & Profile Analysis (thanks to Richard MacLennan)
10. Do a T-Test with only the Means, SD and Ns (uses ANOVA)
11. Do T-Test with only means, SD and Ns (thanks to Marta Garcia-Granero): includes Hartley's F test, the standard T-test and the Welch test; asymptotic and non-asymptotic 95% CIs are calculated.
12. Cochran Hartley Critical Values This gives the tabulated critical values at 5% and 1% for both HOV tests. Thanks to Marta Garcia-Granero
13. T Test: Measures of Effect Size and Nonoverlap, and Observed Power (thanks to Valentim Alferes) User can either analyse raw data or reproduce the SPSS T-Test standard output using summary statistics in published articles