Factor Rotation Methods in Factor Analysis: Quartimax, Varimax, Equamax, Parsimax


Solution 1:

This response builds upon a previous inquiry regarding rotations in factor analysis (which you should review) and gives a brief overview of various specialized techniques.

Rotations are performed iteratively on each pair of factors (columns of the loading matrix), since optimizing the objective criterion simultaneously over all factors would be mathematically difficult. The final rotation matrix $\bf Q$ is nevertheless accrued, so that the whole rotation can be replicated by multiplying the extracted loadings $\bf A$ by it, giving the rotated factor structure matrix $\bf S$. The objective criterion is a property of the elements (loadings) of this resultant matrix $\bf S$.
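As a minimal numerical illustration (a NumPy sketch with made-up numbers, not from the original answer): multiplying the extracted loadings by an accrued orthogonal $\bf Q$ reproduces the rotated structure $\bf S$, while the communalities (row sums of squared loadings) stay intact.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))                # extracted loadings: p=6 variables, m=2 factors

theta = 0.4                                # an arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthogonal rotation matrix

S = A @ Q                                  # rotated loading matrix S

# Communalities are invariant under orthogonal rotation:
print(np.allclose((A**2).sum(axis=1), (S**2).sum(axis=1)))  # True
```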

The Quartimax orthogonal rotation method maximizes the sum of all loadings raised to the fourth power in matrix S; the "quarti" in the name refers to this fourth power. The method satisfies Thurstone's third criterion of simple structure: for every pair of factors, several variables should have loadings near zero on one of the factors and far from zero on the other. Quartimax achieves this by minimizing the number of factors needed to explain a variable, i.e. it simplifies the rows of the loading matrix. However, it often produces a "general factor", which is usually undesirable in FA of variables, though more welcome in Q-mode FA of respondents.

The Varimax orthogonal rotation method is named after its goal: to maximize the variance of the squared loadings within each factor (column) of matrix S. As a result, each factor ends up with a small number of variables loading it highly, which simplifies the columns of the loading matrix and improves the interpretability of the factors. On a loading plot, Varimax spreads points widely along each factor axis, polarizing them into near-zero and far-from-zero values. This property largely satisfies Thurstone's simple-structure points. Varimax is not completely safe, however, from producing complex variables loaded highly by more than one factor; whether that is a drawback or a virtue depends on the field of study. Varimax is often combined with Kaiser's normalization, which temporarily equalizes communalities while rotating; the normalization is recommended with Varimax and with other rotation techniques as well. Varimax is a popular orthogonal rotation method, particularly in psychometrics and the social sciences.
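To make the two objectives concrete, here is a minimal NumPy sketch (the function names are mine, not from any library): quartimax scores a loading matrix by its total fourth power, varimax by the summed column-wise variance of the squared loadings.

```python
import numpy as np

def quartimax_criterion(S):
    """Sum of all loadings raised to the 4th power (to be maximized)."""
    return (S**4).sum()

def varimax_criterion(S):
    """Sum, over factors (columns), of the variance of squared loadings."""
    return (S**2).var(axis=0).sum()
```

A rotation algorithm then searches over orthogonal matrices for the rotation whose resulting S maximizes the chosen score.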

Equamax, also known as Equimax, is an orthogonal rotation method created to enhance certain aspects of varimax. Saunders (1962) introduced equalization, a specific weighting, into the algorithm's working formula. Equamax adjusts itself to the number of factors being rotated, and it distributes highly loaded variables more evenly between factors than varimax does, making it less prone to producing "general" factors. At the same time it still aims to simplify rows, as quartimax does: it is a combination of varimax and quartimax rather than something in between. Equamax is considered less reliable or stable than varimax or quartimax: depending on the data, it can produce anything from disastrous factors to clean, easily interpretable ones with simple structure. Parsimax, a similar method, is even more daring in its pursuit of simple structure (Mulaik, 2010).

Apologies for not delving into the oblique methods, specifically oblimin (an oblique method built around minimizing a criterion) and promax (an oblique procrustes rotation of the varimax solution toward a simplified target). Explaining these techniques would require longer paragraphs, and I didn't plan on a lengthy response today. Both methods are mentioned in Footnote 5 of this answer. If you're interested, I suggest Mulaik's "Foundations of Factor Analysis" (2010), Harman's "Modern Factor Analysis" (1976), or an internet search.

Additionally, for further information on factor analysis rotations, you may refer to the distinction between oblimin and varimax rotations, as well as the definition of “varimax” specifically in SPSS.

Later addendum, with the history and the formulae, for the meticulous reader


Quartimax

During the 1950s, numerous specialists in factor analysis attempted to convert Thurstone’s subjective attributes of a “simple structure” (refer to footnote 1 in this document) into precise and measurable standards.

  • Ferguson proposed that the optimal arrangement of loadings in the factor space is achieved when, for most factor pairs, each axis pierces its own cluster of points, maximizing the coordinates on itself and minimizing them on the perpendicular axis. To achieve this, he recommended minimizing the sum, over all pairs of factors $(i,j)$, of the products of loadings across all variables:
    $\sum^p \sum_{i,j;\,i<j} a_i a_j$

  • Carroll's approach considered pairs of factors $(i,j)$ and aimed to minimize $\sum_{i,j;\,i<j} \sum^p a_i^2 a_j^2$, the sum, over factor pairs, of the products of squared loadings, so that a variable loading high on one factor of a pair tends to have a near-zero loading on the other.

  • Neuhaus and Wrigley aimed to split the loadings of the entire matrix $\bf A$ into large and near-zero values by maximizing the variance of the squared loadings over the whole matrix.
  • Kaiser's preference was also for variance, specifically the variance of the squared loadings within each row of $\bf A$; the goal was to maximize the total sum of these row variances.
  • Saunders proposed increasing the kurtosis of the distribution of the loadings, with each loading from matrix A entered twice, once with a positive and once with a negative sign (the sign of a loading being arbitrary). This symmetric distribution, centered at zero, gains kurtosis as the share of extreme and near-zero loadings grows at the expense of moderate-size loadings.

It can be shown mathematically that, in the context of orthogonal rotation, optimizing any of the five criteria is equivalent from the "argmax" standpoint. Consequently, all of them reduce to the maximization of a single quantity:

$\sum^p \sum^m a^4$ — the sum, across all $p$ variables and $m$ factors, of the loadings raised to the fourth power.

The quartimax criterion is thus the sum of loadings raised to the fourth power. Maximizing it simplifies the rows of the loading matrix, reducing the number of factors needed to explain a variable; the price, sometimes, is the emergence of a "general factor".


Varimax

Kaiser noticed that quartimax simplifies rows (variables) effectively but is susceptible to a "general factor." He therefore proposed simplifying the columns (factors) of matrix $\bf A$ instead. Recall from the bullet list above that Kaiser's formulation of quartimax maximized the summed variance of squared loadings in the rows of $\bf A$; transposing that proposal, varimax maximizes the summed variance of squared loadings in the columns of $\bf A$, i.e. $\sum^m\left[\frac{1}{p} \sum^p (a^2)^2 - \frac{1}{p^2} \left(\sum^p a^2\right)^2\right]$ (the bracketed part is the formula for the variance of the squared values of $a$), or, for convenience, that quantity multiplied by $p^2$.

Multiplied by $p^2$, the criterion becomes $V = pQ - W$, where $Q = \sum^m \sum^p a^4$ is the sum of the fourth powers of the loadings (over the $m$ sets of $p$ elements each) and $W = \sum^m \left(\sum^p a^2\right)^2$ is the sum of the squares of the column sums of squared loadings.

Here $V$ is the varimax criterion, $Q$ the quartimax criterion, and $W$ the sum of squared factor variances after rotation (the variance of a factor being the sum of its squared loadings).
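The identity is easy to verify numerically; a short NumPy sketch (variable names are mine): $p^2$ times the summed column variances of squared loadings equals $pQ - W$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, m = 10, 3
A = rng.normal(size=(p, m))

sq = A**2
V_scaled = p**2 * sq.var(axis=0).sum()    # p^2 * summed variances of squared loadings
Q = (A**4).sum()                          # quartimax term
W = (sq.sum(axis=0)**2).sum()             # sum of squared factor variances
print(np.allclose(V_scaled, p*Q - W))     # True
```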

As noted, Kaiser arrived at varimax by transposing the quartimax problem, simplifying columns instead of rows. Transposing back, i.e. switching the roles of $m$ and $p$ in the formula for $V$, gives the corresponding expression for quartimax: $mQ - W^*$. Here the term $W^*$, the sum of squared communalities of the variables, does not change under rotation, since we rotate columns, not rows. It can therefore be dropped from the objective, along with the constant multiplier $m$, leaving only $Q$: the essence of quartimax. In varimax, by contrast, $W$ does change with rotation and remains a crucial part of the quantity being optimized.
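A quick numerical check of this invariance (a NumPy sketch with arbitrary numbers; the helper name is mine):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(8, 2))
t = 0.7                                    # an arbitrary rotation angle
Qrot = np.array([[np.cos(t), -np.sin(t)],
                 [np.sin(t),  np.cos(t)]])
S = A @ Qrot

def w_star(M):                             # sum of squared communalities
    return ((M**2).sum(axis=1)**2).sum()

print(np.allclose(w_star(A), w_star(S)))   # True: W* ignores the rotation
```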

Kaiser was dissatisfied that variables with large communalities influenced the rotation by the $V$ criterion more than variables with small communalities. To address this, he proposed normalizing all communalities to unity before maximizing $V$. After the rotation, the communalities are de-normalized back to their original values, since they do not change in an orthogonal rotation. Kaiser normalization is typically recommended for varimax, quartimax, and other rotation methods, as it is not tied exclusively to varimax. Whether the technique brings any real benefit, however, remains a matter of debate. Some software applies the normalization automatically, some only for varimax, and some does not make it a default option. (Note: I return to normalization at the end of this answer.)
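In code, Kaiser normalization is a thin wrapper around any orthogonal rotation routine; a minimal sketch where `rotate_fn` is a hypothetical stand-in returning the rotated loadings and the rotation matrix:

```python
import numpy as np

def rotate_with_kaiser_normalization(A, rotate_fn):
    h = np.sqrt((A**2).sum(axis=1, keepdims=True))  # sqrt of communalities
    S_norm, R = rotate_fn(A / h)                    # rotate unit-communality rows
    return S_norm * h, R                            # de-normalize afterwards
```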

Varimax maximizes the variances of squared loadings in the columns of matrix $\bf A$, simplifying the factors; quartimax maximizes those variances in the rows of $\bf A$, simplifying the variables. Kaiser showed that when the population factor structure is sharp, with variables clustering around different factors, varimax is more stable than quartimax under removal of some variables from the rotation.

Equamax and Parsimax

Saunders pointed out that quartimax and varimax are special cases of one formula, $pQ - cW$, with $c=0$ for quartimax and $c=1$ for varimax. Experimenting with factor-analytic data to accent the non-quartimaxian side of the criterion, he found that $c=m/2$ often produced more interpretable factors than either varimax or quartimax. He called this rotation equamax. Making the coefficient $c$ depend on the number of factors $m$ ties it to the expected proportion of variables loaded by any one factor, compensating for the fact that this proportion shrinks as $m$ grows while the number of variables stays constant.

Crawford devised a further value of the coefficient $c$, aiming to improve the generic criterion, this time dependent on both $m$ and $p$: $c = p(m-1)/(p+m-2)$. The revised criterion was named "parsimax".

Another option is to set $c=p$, yielding the criterion known as "facpars" (factor parsimony); to my knowledge it is rarely used.
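All five named methods thus reduce to one function of the loading matrix and the coefficient $c$; a small NumPy sketch (function and variable names are mine), with the $c$ values exactly as listed in the pseudocode at the end of this answer:

```python
import numpy as np

def orthomax_criterion(A, c):
    """Generic criterion p*Q - c*W over a p x m loading matrix A."""
    p = A.shape[0]
    Q = (A**4).sum()                      # quartimax term
    W = ((A**2).sum(axis=0)**2).sum()     # sum of squared factor variances
    return p*Q - c*W

def coefficient_c(method, p, m):
    return {"quartimax": 0.0, "varimax": 1.0, "equamax": m/2,
            "parsimax": p*(m - 1)/(p + m - 2), "facpars": float(p)}[method]
```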

Whether equamax or parsimax is superior to varimax remains uncertain and may depend on circumstances. Their dependence on the parameters $m$ and $p$ is "self-tuning" to supporters and "unpredictable behavior" to opponents. Raising $c$ aims at more equally distributed factor variances, but it does not make the criterion more varimax, nor a balance between varimax and quartimax: varimax and quartimax each optimize exactly the criterion they were designed for, and raising $c$ does not alter those objectives in any mathematical or data-general sense.

The generic criterion $pQ - cW$ thus covers quartimax, varimax, equamax, parsimax, and facpars as versions differing only in the coefficient $c$, which can in principle take any value. As $c$ approaches $+\infty$, the resulting factors have equal variances; as $c$ approaches $-\infty$, the loadings coincide with the principal components obtained by PCA without centering the columns. The value of $c$ therefore sets the position on the "great general factor vs. all factors of equal strength" dimension. In the formula, $Q$ is the quartimax term and $W = \sum^m\left(\sum^p a^2\right)^2$ is the sum of squared factor variances.

In their important 1970 paper, Crawford and Ferguson introduced the general coefficient kappa, which extends the varying-$c$ criterion to nonorthogonal (oblique) factor rotations.

Literature

  • Harman, H.H. (1976). Modern Factor Analysis. Chicago: The University of Chicago Press.
  • Mulaik, S.A. (2010). Foundations of Factor Analysis (2nd ed.). Boca Raton: Chapman & Hall/CRC.
  • Clarkson, D.B., & Jennrich, R.I. (1988). Quartic rotation criteria and algorithms. Psychometrika, 53(2), 251-259.
  • Crawford, C.B., & Ferguson, G.A. (1970). A general rotation criterion and its application in orthogonal rotation. Psychometrika, 35(3), 321-332.

Comparing main characteristics of the criteria

Fifty random loading matrices, uniformly distributed, were generated for each combination of $p$ and the $m/p$ proportion. Each matrix was rotated by quartimax (Q), varimax (V), equamax (E), parsimax (P), and facpars (F), all with Kaiser normalization; quartimax (Q0) and varimax (V0) were also tried without Kaiser normalization. The rotated matrices were assessed on three characteristics, with the 7 values obtained for each matrix rescaled to the 0-1 range. The comparisons below show means across the 50 simulations with 95% CIs. Note that because the loading matrices were generated from a uniform distribution, the initial factor structure was not sharp and clean.

Fig. 1 shows how well each method maximizes the quartimax objective: the sum of variances of squared loadings in the rows.

Quartimax tends to outperform the other criteria here, especially as either $p$ or $m/p$ increases. Varimax is generally the second best, while equamax and parsimax exhibit comparable performance.

The objective of varimax is to maximize the sum of variances of squared loadings in columns, as shown in Fig.2.

Varimax proves superior to the other criteria on this objective as $p$ or $m/p$ increases, while quartimax loses ground as the parameters grow: in the bottom-right part it performs worst, failing to mimic the "varimaxian" job in large-scale factor analysis. Equamax and parsimax are again quite similar in their performance.

Fig. 3 compares the inequality of the factor variances, the property governed by coefficient $c$; here "inequality" was measured as the variance of the $m$ factor variances.

As $c$ increases along the sequence Q, V, E, P, F, the inequality of factor variances decreases. Q leads in inequality, confirming its inclination toward a "general factor"; moreover, the gap between Q and the other criteria widens as either $p$ or $m/p$ grows.

As an alternative, more direct test for the presence of the "general factor", the inequality of factor variances influenced by coefficient $c$ was also measured as a proportion: the sum of absolute loadings of the strongest factor divided by the average of such sums across the remaining $m-1$ factors. The outcome pattern was nearly identical to Fig. 3, so no separate image is shown.
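A minimal sketch of this proportion in NumPy (the function name is mine):

```python
import numpy as np

def strongest_factor_ratio(A):
    """Sum of |loadings| of the strongest factor over the average
    of such sums across the remaining m-1 factors."""
    sums = np.abs(A).sum(axis=0)           # per-factor sums of absolute loadings
    top = sums.max()
    rest_avg = (sums.sum() - top) / (len(sums) - 1)
    return top / rest_avg
```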

Please note that the trials shown in the figures above used loading matrices with random, non-sharp factor structures: there were no predefined distinct clusters of variables or any other particular structure among the loadings.

Based on the information presented in Fig.1-2, it can be observed that quartimax and varimax versions perform better in their maximization tasks when not accompanied by Kaiser normalization. However, it is worth noting that the absence of normalization slightly increases the likelihood of a “general factor,” as depicted in Fig.3.

I am uncertain about whether and when Kaiser normalization should be used. It may be worth testing both approaches, with and without normalization, to see which factor interpretation is more satisfactory. Where mathematical reasoning is inconclusive, as is often the case, we must turn to philosophical considerations. Two stances are possible.

  • Against normalization: a variable with high uniqueness and small communality contains only traces of the totality of the factors. Having no chance of large loadings, it can contribute little to the interpretation of factors, so it is fair that it influences the rotation less; one might even justify excluding such variables from rotation altogether. Kaiser normalization opposes this motive.
  • For normalization: a variable's communality measures its inclination toward the factors' common space, i.e. the magnitude of its projection into that $m$-space, and it is not affected by any rotation of axes within that space. Whether a variable's communality is high or low, the rotation is equally relevant to it, because the rotation decides which of the $m$ factors will load it. Since this internal decision is equally sharp for all variables irrespective of their external inclination, there is no reason to weight variables differently during rotation: an interpreter of factors reads the essence of a factor in a variable regardless of the loading's absolute size.

Orthogonal analytic rotations (Orthomax) algorithm pseudocode

Shorthand notation:
*    matrix multiplication (or simple multiplication, for scalars)
&*   elementwise (Hadamard) multiplication
^    exponentiation of elements
sqrt(M)    square roots of elements in matrix M
rsum(M)    row sums of elements in matrix M
csum(M)    column sums of elements in matrix M
rssq(M)    row sums of squares in matrix M, = rsum(M^2)
cssq(M)    column sums of squares in matrix M, = csum(M^2)
msum(M)    sum of elements in matrix M
make(nr,nc,val)   create nr x nc matrix populated with value val
A is p x m loading matrix with m orthogonal factors, p variables
If Kaiser normalization is requested:
    h = sqrt(rssq(A)). /*sqrt(communalities), column vector
    A = A/(h*make(1,m,1)). /*Bring all communalities to unit
R is the orthogonal rotation matrix to accrue:
Initialize it as m x m identity matrix
Compute the initial value of the criterion Crit;
the coefficient c is: 0 for Quartimax, 1 for Varimax, m/2 for Equamax,
p(m-1)/(p+m-2) for Parsimax, p for Facpars; or you may choose arbitrary c:
Q = msum(A^4)
If “Quartimax”
    Crit = Q
Else
    W = rssq(cssq(A))
    Crit = p*Q - c*W
Begin iterations
    For each pair of factors (columns of A) i, j (i<j):
        ai = A(:,i); aj = A(:,j) /*Copy out the pair of columns
        u = ai^2 - aj^2 /*Two p-length working vectors
        v = 2 * (ai &* aj)
        Compute the optimal angle Phi of the pair rotation (the classic orthomax angle);
        Phi4 is 4 times that angle:
        num = 2 * (msum(u &* v) - c/p * msum(u)*msum(v)) /*numerator of tan(Phi4)
        den = msum(u^2 - v^2) - c/p * (msum(u)^2 - msum(v)^2) /*denominator of tan(Phi4)
        Phi4 = arctan(num/den)
        If den>0 /*4Phi is in the 1st or the 4th quadrant
            Phi = Phi4/4
        Else if num>0 /*4Phi is in the 2nd quadrant (pi is the pi value)
            Phi = (pi + Phi4)/4
        Else /*4Phi is in the 3rd quadrant
            Phi = (Phi4 - pi)/4
        Perform the rotation of the pair (rotate if Phi is not negligible):
        @sin = sin(Phi)
        @cos = cos(Phi)
        r_ij = {@cos,-@sin;@sin,@cos} /*The 2 x 2 rotation matrix
        A(:,{i,j}) = {ai,aj} * r_ij /*Rotate factors (columns) i and j in A
        R(:,{i,j}) = R(:,{i,j}) * r_ij /*Update also the columns of the being accrued R
        Go to consider next pair of factors i, j, again copying them out, etc.
    When all pairs are through, compute the criterion:
    Crit = … (see as defined above)
End iterations when Crit has effectively stopped growing (say, an increase no greater than
0.0001 versus the previous iteration), or when the stock of iterations (say, 50) is exhausted.
If Kaiser normalization was requested:
     A = A &* (h*make(1,m,1)) /*De-normalize
Ready. A has been rotated. A(input)*R = A(output)
Optional post-actions, for convenience:
1) Reorder factors by decreasing their variances (i.e., their cssq(A)).
2) Switch sign of the loadings so that positive loadings prevail in each factor.
The Quartimax and Varimax criterion values are always positive; the other criteria can take negative values.
All the criteria grow on iterations.
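For convenience, here is a direct NumPy transcription of the pseudocode above: a sketch under the same conventions, not a vetted library implementation. `np.arctan2` replaces the explicit quadrant checks, and the quartimax case uses the equivalent criterion p*Q rather than Q.

```python
import numpy as np

def orthomax_rotate(A, c=1.0, kaiser=True, tol=1e-4, max_iter=50):
    """Pairwise orthomax rotation of a p x m loading matrix A.
    c: 0 quartimax, 1 varimax, m/2 equamax, p(m-1)/(p+m-2) parsimax, p facpars."""
    A = A.astype(float).copy()
    p, m = A.shape
    if kaiser:                                   # bring communalities to unity
        h = np.sqrt((A**2).sum(axis=1, keepdims=True))
        A /= h
    R = np.eye(m)                                # rotation matrix to accrue

    def crit(M):                                 # generic criterion p*Q - c*W
        return p*(M**4).sum() - c*((M**2).sum(axis=0)**2).sum()

    old = crit(A)
    for _ in range(max_iter):
        for i in range(m - 1):
            for j in range(i + 1, m):
                ai, aj = A[:, i], A[:, j]
                u = ai**2 - aj**2                # helper vectors for the pair
                v = 2*ai*aj
                num = 2*((u*v).sum() - c/p*u.sum()*v.sum())
                den = (u**2 - v**2).sum() - c/p*(u.sum()**2 - v.sum()**2)
                phi = np.arctan2(num, den)/4     # atan2 = the quadrant logic above
                if abs(phi) < 1e-12:
                    continue                     # negligible angle: skip the pair
                r = np.array([[np.cos(phi), -np.sin(phi)],
                              [np.sin(phi),  np.cos(phi)]])
                A[:, [i, j]] = A[:, [i, j]] @ r  # rotate the pair of columns
                R[:, [i, j]] = R[:, [i, j]] @ r  # update accrued R the same way
        new = crit(A)
        if new - old <= tol:                     # criterion stopped growing
            break
        old = new
    if kaiser:
        A *= h                                   # de-normalize communalities
    return A, R                                  # A(input) @ R == A(output)
```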


Solution 2:


The objective of rotation methods is to simplify factor loadings by optimizing heuristic functions. There are various interpretations of simplicity; the most frequently used are sparsity, column simplicity, and parsimony, as defined by Thurstone [2]. The majority of rotation criteria focus on one of these aspects, and their names are of little significance.

There are families of criteria that contain the individual criteria as special cases; the Crawford-Ferguson family is the most comprehensive, and for orthogonal rotations it is equivalent to the Orthomax family. These families weight both simplicity requirements, governed by parameters; by varying the parameters, nearly all recognized rotation criteria can be obtained. Browne's paper [1] offers an excellent, accessible summary of rotation techniques.

[1] Browne, M.W. (2001). An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research, 36(1), 111-150.

[2] Thurstone, L.L. (1947). Multiple-Factor Analysis. Chicago: The University of Chicago Press.
