
How To: Use the psych package for Factor Analysis and data

reduction

William Revelle

Department of Psychology

Northwestern University

November 20, 2016

Contents

1 Overview of this and related documents
  1.1 Jump starting the psych package–a guide for the impatient
2 Overview of this and related documents
3 Getting started
4 Basic data analysis
  4.1 Data input from the clipboard
  4.2 Basic descriptive statistics
  4.3 Simple descriptive graphics
    4.3.1 Scatter Plot Matrices
    4.3.2 Correlational structure
    4.3.3 Heatmap displays of correlational structure
  4.4 Polychoric, tetrachoric, polyserial, and biserial correlations
5 Item and scale analysis
  5.1 Dimension reduction through factor analysis and cluster analysis
    5.1.1 Minimum Residual Factor Analysis
    5.1.2 Principal Axis Factor Analysis
    5.1.3 Weighted Least Squares Factor Analysis
    5.1.4 Principal Components analysis (PCA)
    5.1.5 Hierarchical and bi-factor solutions
    5.1.6 Item Cluster Analysis: iclust
  5.2 Confidence intervals using bootstrapping techniques
  5.3 Comparing factor/component/cluster solutions
  5.4 Determining the number of dimensions to extract
    5.4.1 Very Simple Structure
    5.4.2 Parallel Analysis
  5.5 Factor extension
6 Classical Test Theory and Reliability
  6.1 Reliability of a single scale
  6.2 Using omega to find the reliability of a single scale
  6.3 Estimating ωh using Confirmatory Factor Analysis
    6.3.1 Other estimates of reliability
  6.4 Reliability and correlations of multiple scales within an inventory
    6.4.1 Scoring from raw data
    6.4.2 Forming scales from a correlation matrix
  6.5 Scoring Multiple Choice Items
  6.6 Item analysis
7 Item Response Theory analysis
  7.1 Factor analysis and Item Response Theory
  7.2 Speeding up analyses
  7.3 IRT based scoring
8 Multilevel modeling
  8.1 Decomposing data into within and between level correlations using statsBy
  8.2 Generating and displaying multilevel data
9 Set Correlation and Multiple Regression from the correlation matrix
10 Simulation functions
11 Graphical Displays
12 Miscellaneous functions
13 Data sets
14 Development version and a users guide
15 Psychometric Theory
16 SessionInfo

1 Overview of this and related documents

To do basic and advanced personality and psychological research using R is not as complicated as some think. This is one of a set of "How To" guides for doing various things using R (R Core Team, 2016), particularly using the psych (Revelle, 2016) package.

The current list of How To’s includes:

1. Installing R and some useful packages
   (http://personality-project.org/r/psych/HowTo/getting_started.pdf)

2. Using R and the psych package to find ωh and ωt.
   (http://personality-project.org/r/psych/HowTo/omega.pdf)

3. Using R and the psych package for factor analysis and principal components analysis (this document).
   (http://personality-project.org/r/psych/HowTo/factor.pdf)

4. Using the score.items function to find scale scores and scale statistics.
   (http://personality-project.org/r/psych/HowTo/scoring.pdf)

5. An overview (vignette) of the psych package
   (http://personality-project.org/r/psych/overview.pdf)

1.1 Jump starting the psych package–a guide for the impatient

You have installed psych (section 3) and you want to use it without reading much more.

What should you do?

1. Activate the psych package:

R code

library(psych)

2. Input your data (section 4.1). Go to your friendly text editor or data manipulation

program (e.g., Excel) and copy the data to the clipboard. Include a first line that has

the variable labels. Paste it into psych using the read.clipboard.tab command:

R code

myData <- read.clipboard.tab()
3. Make sure that what you just read is right. Describe it (section 4.2) and perhaps
look at the first and last few lines:
R code
describe(myData)
headTail(myData)
4. Look at the patterns in the data. If you have fewer than about 10 variables, look at
the SPLOM (Scatter Plot Matrix) of the data using pairs.panels (section 4.3.1).
R code
pairs.panels(myData)
5. Find the correlations of all of your data.
• Descriptively (just the values) (section 4.3.2)
R code
r <- lowerCor(myData)
• Graphically (section 4.3.3)
R code
corPlot(r)
6. Test for the number of factors in your data using parallel analysis (fa.parallel, section 5.4.2) or Very Simple Structure (vss, section 5.4.1).
R code
fa.parallel(myData)
vss(myData)
7. Factor analyze (see section 5.1) the data with a specified number of factors (the default is 1); the default method is minimum residual and the default rotation for more than one factor is oblimin. There are many more possibilities (see sections 5.1.1-5.1.3). Compare the solution to a hierarchical cluster analysis using the ICLUST algorithm (Revelle, 1979) (see section 5.1.6). Also consider a hierarchical factor solution to find coefficient ω (see 5.1.5).
R code
fa(myData)
iclust(myData)
omega(myData)
8. Some people like to find coefficient α as an estimate of reliability. This may be done for a single scale using the alpha function (see 6.1). Perhaps more useful is the ability to create several scales as unweighted averages of specified items using the scoreItems function (see 6.4) and to find various estimates of internal consistency for these scales, find their intercorrelations, and find scores for all the subjects.
R code
alpha(myData) #score all of the items as part of one scale.
myKeys <- make.keys(nvar=20,list(first = c(1,-3,5,-7,8:10),second=c(2,4,-6,11:15,-16)))
my.scores <- scoreItems(myKeys,myData) #form several scales
my.scores #show the highlights of the results
At this point you have had a chance to see the highlights of the psych package and to do some basic (and advanced) data analysis. You might find reading the entire overview vignette helpful to get a broader understanding of what can be done in R using the psych package. Remember that the help command (?) is available for every function. Try running the examples for each help page.
2 Overview of this and related documents
The psych package (Revelle, 2016) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, prep), a draft of which is available at http://personality-project.org/r/book/.

Some of the functions (e.g., read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.
Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes five methods of factor analysis (minimum residual, principal axis, weighted least squares, generalized least squares and maximum likelihood factor analysis). Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP) or parallel analysis (fa.parallel) criteria. Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa. Bifactor and hierarchical factor structures may be estimated by using Schmid Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937).

Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust) and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.
Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (cor.plot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, and structure.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.
3 Getting started
Some of the functions described in this overview require other packages. Particularly useful for rotating the results of factor analyses (from e.g., fa or principal) or hierarchical factor models using omega or schmid, is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.
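
As a sketch, the installation steps just described might look like the following (the install.views call assumes the interface documented for the ctv package; none of these commands appear elsewhere in this document):

```r
# Install psych and GPArotation (used for factor rotations) from CRAN
install.packages(c("psych", "GPArotation"))

# Optionally install everything in the "Psychometrics" task view
# via the ctv package -- convenient, but, as noted above, not necessary
install.packages("ctv")
ctv::install.views("Psychometrics")
```

Installing the full task view downloads many packages; psych plus GPArotation alone is enough for the analyses in this document.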
4 Basic data analysis
A number of psych functions facilitate the entry of data and finding basic descriptive
statistics.
Remember, to run any of the psych functions, it is necessary to make the package active
by using the library command:
R code
library(psych)
The other packages, once installed, will be called automatically by psych.
It is possible to automatically load psych and other functions by creating and then saving
a “.First” function: e.g.,
R code
.First <- function(x) {library(psych)}
4.1 Data input from the clipboard
There are of course many ways to enter data into R. Reading from a local file using
read.table is perhaps the most preferred. However, many users will enter their data
in a text editor or spreadsheet program and then want to copy and paste into R. This
may be done by using read.table and specifying the input file as “clipboard” (PCs) or
“pipe(pbpaste)” (Macs). Alternatively, the read.clipboard set of functions are perhaps
more user friendly:
read.clipboard is the base function for reading data from the clipboard.
read.clipboard.csv for reading text that is comma delimited.
read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an
Excel file).
read.clipboard.lower for reading input of a lower triangular matrix with or without a
diagonal. The resulting object is a square matrix.
read.clipboard.upper for reading input of an upper triangular matrix.
read.clipboard.fwf for reading in fixed width fields (some very old data sets).
For example, given a data set copied to the clipboard from a spreadsheet, just enter the
command
R code
my.data <- read.clipboard()
This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited by using the read.clipboard.tab function.
R code
my.data <- read.clipboard(sep="\t") #specify a tab as the separator, or
my.tab.data <- read.clipboard.tab() #just use the alternative function
For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column each, and the last 4 are 3 columns).
R code
my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
4.2 Basic descriptive statistics
Once the data are read in, then describe will provide basic descriptive statistics arranged
in a data frame format. Consider the data set sat.act which includes data from 700 web
based participants on 3 demographic variables and 3 ability measures.
describe reports means, standard deviations, medians, min, max, range, skew, kurtosis
and standard errors for integer or real data. Non-numeric data, although the statistics
are meaningless, will be treated as if numeric (based upon the categorical coding of
the data), and will be flagged with an *.
It is very important to describe your data before you continue on to more complicated multivariate statistics. The problem of outliers and bad data cannot be overemphasized.
> library(psych)
> data(sat.act)
> describe(sat.act) #basic descriptive statistics
          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41

4.3 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries.
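
The error.bars.by display mentioned above can be sketched with the sat.act data described in section 4.2; grouping by gender and the choice of the two SAT scores are illustrative assumptions, not choices made in the text:

```r
library(psych)
data(sat.act)
# Group means of the two SAT scores with 95% confidence intervals,
# one set of bars per gender group
error.bars.by(sat.act[c("SATV", "SATQ")], group = sat.act$gender,
              bars = TRUE, main = "SAT scores by gender")
```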

4.3.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean with the axis length reflecting one standard deviation of the x and y variables is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 1). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 1 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic.

4.3.2 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to 2 digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal, and the other group above the diagonal. Use lowerUpper to do this:

> png('pairspanels.png')
> pairs.panels(sat.act,pch='.')
> dev.off()
null device
          1

Figure 1: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. The plot character was set to a period (pch='.') in order to make a cleaner graph.

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)

          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)

          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

4.3.3 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 2) or a simulated data set of 24 variables with a circumplex structure (Figure 3). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

> png('corplot.png')
> cor.plot(Thurstone,numbers=TRUE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 2: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> cor.plot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 3: Using the cor.plot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect.

4.4 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, φ will underestimate the value of the Pearson correlation applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
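
As a sketch of these functions (the bfi data set and the particular items are illustrative choices, not ones made in the text):

```r
library(psych)
data(bfi)  # 25 six-point personality items plus gender, education, and age
# Polychoric correlations of the five Agreeableness items
pc <- polychoric(bfi[1:5])
pc$rho  # the polychoric correlation matrix
pc$tau  # the estimated thresholds (cut points)
# Dichotomizing the items first gives the tetrachoric equivalent
tet <- tetrachoric(ifelse(bfi[1:5] > 3, 1, 0))
```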

A correlation matrix based upon a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will also happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
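
A minimal sketch of the burt example just described:

```r
library(psych)
data(burt)
eigen(burt)$values         # the smallest eigenvalue is negative:
                           # the matrix is not positive semi-definite
burt.s <- cor.smooth(burt) # adjust and rescale the eigenvalues
min(eigen(burt.s)$values)  # now (essentially) non-negative
```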

5 Item and scale analysis

The main functions in the psych package are for analyzing the structure of items and of scales and for finding various estimates of scale reliability. These may be considered as problems of dimension reduction (e.g., factor analysis, cluster analysis, principal components analysis) and of forming and estimating the reliability of the resulting composite scales.

5.1 Dimension reduction through factor analysis and cluster analysis

Parsimony of description has been a goal of science since at least the famous dictum commonly attributed to William of Ockham to not multiply entities beyond necessity¹. The goal for parsimony is seen in psychometrics as an attempt either to describe (components) or to explain (factors) the relationships between many observed variables in terms of a more limited set of components or latent factors.

¹Although probably neither original with Ockham nor directly stated by him (Thorburn, 1918), Ockham's razor remains a fundamental principle of science.

The typical data matrix represents multiple items or scales usually thought to reflect fewer underlying constructs². At the most simple, a set of items can be thought to represent a random sample from one underlying domain or perhaps a small set of domains. The question for the psychometrician is how many domains are represented and how well does each item represent the domains. Solutions to this problem are examples of factor analysis (FA), principal components analysis (PCA), and cluster analysis (CA). All of these procedures aim to reduce the complexity of the observed data. In the case of FA, the goal is to identify fewer underlying constructs to explain the observed data. In the case of PCA, the goal can be mere data reduction, but the interpretation of components is frequently done in terms similar to those used when describing the latent variables estimated by FA. Cluster analytic techniques, although usually used to partition the subject space rather than the variable space, can also be used to group variables to reduce the complexity of the data by forming fewer and more homogeneous sets of tests or items.

At the data level the data reduction problem may be solved as a Singular Value Decomposition of the original matrix, although the more typical solution is to find either the principal components or factors of the covariance or correlation matrices. Given the pattern of regression weights from the variables to the components or from the factors to the variables, it is then possible to find (for components) individual component or cluster scores or estimate (for factors) factor scores.
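
The component score versus factor score distinction can be sketched with the sat.act data from section 4.2 (the one-dimensional model of the three ability measures is an illustrative assumption):

```r
library(psych)
data(sat.act)
# Components: scores are exact linear combinations of the observed variables
pc1 <- principal(sat.act[4:6], nfactors = 1, scores = TRUE)
head(pc1$scores)
# Factors: scores can only be estimated; "regression" is the default method
f1 <- fa(sat.act[4:6], nfactors = 1, scores = "regression")
head(f1$scores)
```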

Several of the functions in psych address the problem of data reduction.

fa incorporates five alternative algorithms: minres factor analysis, principal axis factor analysis, weighted least squares factor analysis, generalized least squares factor analysis and maximum likelihood factor analysis. That is, it includes the functionality of three other functions that will be eventually phased out.

principal Principal Components Analysis reports the largest n eigen vectors rescaled by the square root of their eigen values.

factor.congruence The congruence between two factors is the cosine of the angle between them. This is just the cross products of the loadings divided by the sum of the squared loadings. This differs from the correlation coefficient in that the mean loading is not subtracted before taking the products. factor.congruence will find the cosines between two (or more) sets of factor loadings.

vss Very Simple Structure (Revelle and Rocklin, 1979) applies a goodness of fit test to determine the optimal number of factors to extract. It can be thought of as a quasi-confirmatory model, in that it fits the very simple structure (all except the biggest c loadings per item are set to zero, where c is the level of complexity of the item) of a factor pattern matrix to the original correlation matrix. For items where the model is usually of complexity one, this is equivalent to making all except the largest loading for each item 0. This is typically the solution that the user wants to interpret. The analysis includes the MAP criterion of Velicer (1976) and a χ² estimate.

²Cattell (1978) as well as MacCallum et al. (2007) argue that the data are the result of many more factors than observed variables, but are willing to estimate the major underlying factors.

fa.parallel The parallel factors technique compares the observed eigen values of a correlation matrix with those from random data.

fa.plot will plot the loadings from a factor, principal components, or cluster analysis (just a call to plot will suffice). If there are more than two factors, then a SPLOM of the loadings is generated.

nfactors A number of different tests for the number of factors problem are run.

fa.diagram replaces fa.graph and will draw a path diagram representing the factor structure. It does not require Rgraphviz and thus is probably preferred.

fa.graph requires Rgraphviz and will draw a graphic representation of the factor structure. If factors are correlated, this will be represented as well.

iclust is meant to do item cluster analysis using a hierarchical clustering algorithm specifically asking questions about the reliability of the clusters (Revelle, 1979). Clusters are formed until either coefficient α (Cronbach, 1951) or β (Revelle, 1979) fail to increase.
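
Several of the functions listed above can be sketched on the Thurstone correlation matrix introduced in section 5.1.1 (213 is the sample size reported for that data set):

```r
library(psych)
# How many factors? Compare observed eigenvalues with random data,
# and run the Very Simple Structure and MAP criteria
fa.parallel(Thurstone, n.obs = 213)
vss(Thurstone, n.obs = 213)
# Congruence (cosine) between a factor and a component solution
f3 <- fa(Thurstone, nfactors = 3, n.obs = 213)
p3 <- principal(Thurstone, nfactors = 3)
factor.congruence(f3, p3)
```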

5.1.1 Minimum Residual Factor Analysis

The factor model is an approximation of a correlation matrix by a matrix of lower rank. That is, can the correlation matrix nRn be approximated by the product of a factor matrix nFk and its transpose plus a diagonal matrix of uniquenesses?

R = FF′ + U²    (1)

The maximum likelihood solution to this equation is found by factanal in the stats package. Five alternatives are provided in psych; all of them are included in the fa function and are called by specifying the factor method (e.g., fm="minres", fm="pa", fm="wls", fm="gls" and fm="ml"). In the discussion of the other algorithms, the calls shown are to the fa function specifying the appropriate method.
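
Equation 1 can be checked numerically for any fitted solution. A varimax (orthogonal) rotation is used in this sketch so that the factor intercorrelation matrix drops out of the model:

```r
library(psych)
# Fit three minres factors to the 9 Thurstone variables (introduced below)
f3 <- fa(Thurstone, nfactors = 3, n.obs = 213, rotate = "varimax")
# Model implied matrix: FF' with the uniquenesses added to the diagonal
model <- f3$loadings %*% t(f3$loadings)
diag(model) <- diag(model) + f3$uniquenesses
round(Thurstone - model, 2)  # residuals are near zero
```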

factor.minres attempts to minimize the off diagonal residual correlation matrix by adjusting the eigen values of the original correlation matrix. This is similar to what is done in factanal, but uses an ordinary least squares instead of a maximum likelihood fit function. The solutions tend to be more similar to the MLE solutions than are the factor.pa solutions. minres is the default for the fa function.

A classic data set, collected by Thurstone and Thurstone (1941) and then reanalyzed by Bechtoldt (1961) and discussed by McDonald (1999), is a set of 9 cognitive variables with a clear bi-factor structure (Holzinger and Swineford, 1937). The minimum residual solution was transformed into an oblique solution using the default option on rotate, which uses an oblimin transformation (Table 1). Alternative rotations and transformations include "none", "varimax", "quartimax", "bentlerT", and "geominT" (all of which are orthogonal rotations), as well as "promax", "oblimin", "simplimax", "bentlerQ", "geominQ" and "cluster", which are possible oblique transformations of the solution. The default is to do an oblimin transformation, although prior versions defaulted to varimax. The measures of factor adequacy reflect the multiple correlations of the factors with the best fitting linear regression estimates of the factor scores (Grice, 2001).
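
The rotation alternatives just listed are all passed through the rotate argument; a sketch with the Thurstone data (three factors, as in Table 1):

```r
library(psych)
f.oblimin <- fa(Thurstone, nfactors = 3, n.obs = 213, rotate = "oblimin")
f.varimax <- fa(Thurstone, nfactors = 3, n.obs = 213, rotate = "varimax")
# Oblique solutions also report the factor intercorrelations
round(f.oblimin$Phi, 2)
```

Note that oblimin (the default) uses the GPArotation package discussed in section 3.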

5.1.2 Principal Axis Factor Analysis

An alternative, least squares algorithm, factor.pa (incorporated into fa as an option, fm = "pa"), does a Principal Axis factor analysis by iteratively doing an eigen value decomposition of the correlation matrix with the diagonal replaced by the values estimated by the factors of the previous iteration. This OLS solution is not as sensitive to improper matrices as is the maximum likelihood method, and will sometimes produce more interpretable results. It seems as if the SAS example for PA uses only one iteration. Setting the max.iter parameter to 1 produces the SAS solution.

The solutions from the fa, the factor.minres and factor.pa as well as the principal functions can be rotated or transformed with a number of options. Some of these call the GPArotation package. Orthogonal rotations are varimax and quartimax. Oblique transformations include oblimin, quartimin and then two targeted rotation functions Promax and target.rot. The latter of these will transform a loadings matrix towards an arbitrary target matrix. The default is to transform towards an independent cluster solution.
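
A sketch of the principal axis option and the targeted transformation just described:

```r
library(psych)
# Principal axis factoring; the default is to iterate to convergence
f.pa <- fa(Thurstone, nfactors = 3, n.obs = 213, fm = "pa", rotate = "none")
# A single iteration reproduces the SAS-style solution mentioned above
f.sas <- fa(Thurstone, nfactors = 3, fm = "pa", max.iter = 1, rotate = "none")
# Transform the loadings toward an independent cluster target
target.rot(f.pa$loadings)
```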

Using the Thurstone data set, three factors were requested and then transformed into an

independent clusters solution using targ