Alignment method for measurement invariance: Tutorial

Measurement invariance alignment was introduced back in 2014, but few researchers have applied it in practice. Among the ~200 citations of the original alignment paper there are only a few substantive applications. It is a pity, because alignment usually gives more optimistic results than the conventional (frequentist, exact) measurement invariance techniques. I guess this is due to the statistical complexity and the lack of simple guidelines. In this post I summarize, in an approachable way, the steps necessary to apply the alignment procedure. In addition, I provide a couple of R functions that automate the preparation of Mplus code and the extraction of useful information from the outputs.


Step 1. Find an acceptable configural invariance model
Step 2. Set up “FREE” alignment model in Mplus
Step 3. Set up “FIXED” alignment model
Step 4. Interpret the “Approximate measurement invariance” output
Step 5. Interpret “FACTOR MEAN COMPARISON” output
Step 6. Interpret “ALIGNMENT OUTPUT” output
Step 7. Checking the reliability of the results with simulation
Example Mplus files
Additional options (Bayesian estimation, estimation fine tuning,  extra mean ranking table, fit function contribution, categorical indicators)
Software, including automation in R


The typical starting point for applying the alignment procedure is: OK, I tested my model for invariance with multiple group confirmatory factor analysis (MGCFA) and the equality of loadings and/or intercepts was rejected, so what’s next? One or several of the following:

If you believe that most parameters are invariant and few are non-invariant, try alignment, which will show you the possible set of groups in which invariance holds. It will also provide approximate latent means, even if there is no exact measurement invariance.

If you believe the model is likely to be invariant, but your data are noisy (i.e. many small meaningless residual correlations, cross-loadings, etc.), consider applying approximate invariance. It is based on Bayesian statistics and allows small differences in loadings/intercepts across groups. One sign that the approximate approach is worth trying is when you experience problems even with the configural invariance model (but you are sure your model is properly specified).

If you believe that most parameters are invariant AND your data are noisy, there is the option of Bayesian alignment, which combines approximate invariance with the alignment approach. I describe it in the “Additional options” section.

In case your data contain very many groups (>100), consider using multilevel CFA with random loadings and intercepts (random effects model).

At some point you might admit there is no invariance, at least for some groups. In that case I would usually try to explain why. You may speculate about differences in meaning, conduct cognitive interviews to understand them, or apply multilevel CFA with a group-level covariate as in Davidov et al. 2012.

Below, I focus on the alignment method in Mplus. Alignment estimates a configural invariance model and then modifies the factor loadings and intercepts to make them as similar as possible across groups without deteriorating the model fit. Conceptually, the procedure resembles target factor rotation where the target is across-group similarity of loadings and intercepts.

Step 1. Find an acceptable configural invariance model

This is crucial because the alignment procedure is based on the configural model, and the models with aligned parameters have the same fit (alignment doesn’t affect fit, similar to factor rotation). If the fit of the configural model is not good enough, consider fitting the configural model with the Bayesian approach and testing approximate Bayesian invariance (possibly followed by alignment). Dropping items and groups is a drastic measure; apply it only if it is justified substantively.

My example: I have data from the World Values Survey wave 5 on 10 countries (the sample was randomly reduced to 5000 respondents for the speed of computation). The model is a single factor of sexual and reproductive morality. It has 4 indicators: justifiability of homosexuality, prostitution, abortion, and divorce. The residuals of abortion and divorce are allowed to covary. The configural measurement invariance model shows an acceptable fit (CFI = 0.999, RMSEA = 0.037); however, constraining factor loadings to equality across groups, that is, setting up a metric invariance model, ruins the fit (CFI = 0.975, RMSEA = 0.107). So I have to reject the metric invariance hypothesis. Still, the good fit of the configural model gives some hope, so I can reach for the alignment procedure.
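As a quick arithmetic check of the comparison above, the change in fit between the configural and metric models can be computed directly. This is only a sketch in Python (not part of the Mplus workflow). The cutoffs of 0.010 for the CFI drop and 0.015 for the RMSEA rise are a commonly cited rule of thumb that I assume here for illustration; Mplus does not report them.

```python
# Fit indices reported above for the two models.
configural = {"CFI": 0.999, "RMSEA": 0.037}
metric = {"CFI": 0.975, "RMSEA": 0.107}

# Assumed rule-of-thumb cutoffs (illustrative, not produced by Mplus).
MAX_CFI_DROP = 0.010
MAX_RMSEA_RISE = 0.015

cfi_drop = round(configural["CFI"] - metric["CFI"], 3)
rmsea_rise = round(metric["RMSEA"] - configural["RMSEA"], 3)

# Metric invariance is rejected if either index deteriorates too much.
metric_rejected = cfi_drop > MAX_CFI_DROP or rmsea_rise > MAX_RMSEA_RISE

print(f"CFI drop = {cfi_drop}, RMSEA rise = {rmsea_rise}, rejected = {metric_rejected}")
```

Both changes are well above the assumed cutoffs, which matches the conclusion that metric invariance does not hold here.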

Before going further, keep in mind that alignment cannot handle cross-loadings (or anything beyond a plain factor model), but it is fine to have residual covariances. It can also deal with categorical (binary or ordinal) indicators.

Step 2. Set up “FREE” alignment model in Mplus

There are two kinds of alignment models, Free and Fixed. They differ in the set of constraints placed on the MGCFA model. It is advised to run Free alignment first, and then Fixed.

In general, the Free model works better with a large non-invariance, so if this model doesn’t converge, skip to the next step.

Mplus code would look like this:

DATA: FILE = "";
VARIABLE:
  NAMES = country prostit homosex abortion divorce; 
  classes = c(10);! Type the number of groups in your data in parentheses  
  knownclass = c(country = 56 21 22 53 23 40 13 54 14 18);
! The classes are not actually latent, they are *known* and 
! it is just a grouping variable. So put your grouping variable  
! in place of 'country' above and list the categories of 
! this variable (list all groups).
ANALYSIS:
  TYPE = mixture; ! Actually it is a multiple group model, 
                  ! but for technical reasons it is specified as a mixture.   
  ESTIMATOR = ml; ! it can be mlr or mlf, or Bayes as well. See Additional options   
  ALIGNMENT = FREE; ! this line makes Mplus actually run alignment.
MODEL:
    %OVERALL% ! it means the CFA model specified below applies to every group
     Moral BY prostit homosex abortion divorce;
     abortion WITH divorce;
OUTPUT:
   align; ! This line requests the detailed info on alignment

Step 3. Set up “FIXED” alignment model

In the output of the previous “free alignment” model you can find a message –


It is self-explanatory: follow the recommendation and, in the ANALYSIS section of the above Mplus input, replace the line ALIGNMENT = FREE; with ALIGNMENT = FIXED(22);, putting in parentheses the group recommended by Mplus (it is simply the group with the smallest estimated latent mean). Sometimes Mplus doesn’t suggest a specific group; in that case just choose the one with the smallest latent mean(s). Run this new code.
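When Mplus gives no suggestion, picking the reference group by hand amounts to finding the smallest latent mean. A minimal Python sketch of that rule follows; the helper name is hypothetical, and the group labels and means are simply the ones listed in the factor mean comparison table of Step 5, used here for illustration only (in practice you would take them from your own free alignment output).

```python
# Group labels and latent means, copied from the Step 5 table for illustration.
factor_means = {14: 3.939, 13: 3.006, 56: 2.703, 53: 2.465, 54: 2.056,
                21: 1.167, 18: 1.093, 40: 0.671, 23: 0.145, 22: 0.000}


def fixed_alignment_line(means):
    """Build the ANALYSIS line that fixes the group with the smallest latent mean."""
    reference_group = min(means, key=means.get)
    return f"ALIGNMENT = FIXED({reference_group});"


print(fixed_alignment_line(factor_means))  # ALIGNMENT = FIXED(22);
```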

Step 4. Interpret the “Approximate measurement invariance” output

In the output you will find this specific section of the alignment results.


   PROSTIT     56 21 22 53 23 40 13 54 14 18
   HOMOSEX     56 21 22 53 23 40 13 54 14 18
   ABORTION    56 21 (22) 53 23 (40) 13 54 14 (18)
   DIVORCE     56 (21) 22 53 (23) (40) 13 54 14 18

 Loadings for MORAL1
   PROSTIT     56 21 22 53 23 40 (13) (54) (14) 18
   HOMOSEX     56 21 22 53 23 40 13 54 14 18
   ABORTION    56 21 22 53 23 40 13 54 14 18
   DIVORCE     56 21 22 53 23 40 13 54 14 18

It can be hard to read, but it is meant to simplify the results: this is a table of the intercepts (the first block) and the loadings (the second block) by groups. The groups in which the given parameter is NOT invariant even after alignment are in parentheses. In my example, the factor loadings of the PROSTIT indicator are significantly different in groups 13, 54, and 14, while in the other groups they are approximately the same.
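With many groups it is easier to let a script pick out the parenthesized entries. The following Python sketch (the function name is my own, and it assumes each row has exactly the layout shown above: indicator name followed by group labels) returns the non-invariant groups for one row.

```python
import re


def noninvariant_groups(row):
    """Return (indicator, groups shown in parentheses) for one table row."""
    indicator = row.split()[0]
    flagged = [int(g) for g in re.findall(r"\((\d+)\)", row)]
    return indicator, flagged


loadings = [
    "PROSTIT     56 21 22 53 23 40 (13) (54) (14) 18",
    "HOMOSEX     56 21 22 53 23 40 13 54 14 18",
]
for row in loadings:
    print(noninvariant_groups(row))
```

For the two rows above this gives groups 13, 54, and 14 for PROSTIT and an empty list for HOMOSEX.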

Step 5. Interpret “FACTOR MEAN COMPARISON” output


 Results for Factor MORAL1

           Latent    Group      Factor
 Ranking    Class    Value       Mean     Groups With Significantly Smaller Factor Mean
     1         9        14       3.939    56 53 54 21 18 40 23 22
     2         7        13       3.006    54 21 18 40 23 22
     3         1        56       2.703    54 21 18 40 23 22
     4         4        53       2.465    21 18 40 23 22
     5         8        54       2.056    21 18 40 23 22
     6         2        21       1.167    40 23 22
     7        10        18       1.093    40 23 22
     8         6        40       0.671    23 22
     9         5        23       0.145    22
    10         3        22       0.000

For each factor in the model, alignment produces this comparison of the estimated means. The same information in a different form can be requested with the RANKING option of the SAVEDATA section.

!! Be careful: these latent means are estimated regardless of the remaining measurement non-invariance. Their presence in the output does not mean they are reliable or fully invariant; they are estimated just for reference. They can be taken seriously only if the other checks support approximate measurement invariance.

The table also provides a pairwise comparison of every group’s mean with all the other groups’ means. You can find the same means higher up in the output, in the MODEL RESULTS section.

Step 6. Interpret “ALIGNMENT OUTPUT”

This section is produced if you add the line OUTPUT: ALIGN; to the input code. It provides detailed information on the results of alignment for each parameter. For each parameter, it shows three things: pairwise comparisons, summarized invariance information, and the parameter values that were aligned across groups.

6.1. Pairwise comparison

First, it is a large table which begins like this:



 Intercept for PROSTIT
 Group     Group      Value      Value     Difference  SE       P-value
     21        56      1.403      1.253      0.150      0.096      0.119
     22        56      1.447      1.253      0.195      0.123      0.113
     22        21      1.447      1.403      0.044      0.079      0.574
     53        56      1.226      1.253     -0.027      0.199      0.892
     53        21      1.226      1.403     -0.177      0.246      0.471
     53        22      1.226      1.447     -0.221      0.269      0.411
     23        56      1.354      1.253      0.102      0.075      0.173
     23        21      1.354      1.403     -0.048      0.044      0.270
     23        22      1.354      1.447     -0.093      0.077      0.226
     23        53      1.354      1.226      0.129      0.241      0.593
     40        56      1.313      1.253      0.061      0.073      0.406
     40        21      1.313      1.403     -0.089      0.056      0.111
     40        22      1.313      1.447     -0.134      0.088      0.127
     40        53      1.313      1.226      0.088      0.236      0.710

This table compares parameters and statistically tests their equality across every possible pair of groups. The first line in this example compares the intercept for PROSTIT in group 21 and in group 56, and we can see that the difference is 0.150, which is far from being significant. Yihaa, we found one invariant parameter across two groups. If only it always worked like this.
This table can be really large, because it contains every possible pair of groups, so for 10 groups there will be 45 lines for each parameter. The table is very detailed, so I would ignore it at this stage and return to it only if other things fail to help.
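Two numbers in this section can be reproduced by hand. The Python sketch below counts the pairs of groups and recomputes a p-value from a difference and its standard error. It assumes the test is a two-sided normal (Wald-type) test, which is my assumption rather than something stated in the output; the recomputed values agree with the printed ones up to rounding of the inputs.

```python
import math


def n_pairs(n_groups):
    """Number of distinct pairs of groups, n * (n - 1) / 2."""
    return n_groups * (n_groups - 1) // 2


def two_sided_p(difference, se):
    """Two-sided p-value of difference / se under a standard normal (assumed test)."""
    z = abs(difference / se)
    return math.erfc(z / math.sqrt(2))


print(n_pairs(10))  # 45

# First row of the table: groups 21 vs 56, difference 0.150, SE 0.096.
p = two_sided_p(0.150, 0.096)
print(round(p, 2))
```

The recomputed p-value for the first row is roughly 0.12, in line with the 0.119 printed by Mplus.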

6.2. Summarized invariance info

 Approximate Measurement Invariance Holds For Groups:
 56 21 22 53 23 40 13 54 14 18

Below the pairwise comparisons there is a list of groups in which the given parameter was found invariant after alignment. We have already seen this information above, at Step 4. Sometimes it is not very useful, especially if you have many groups and only a few of them are non-invariant: imagine trying to identify the group(s) absent from the list (answer: none). So just ignore it and refer to Step 4 above.

Weighted Average Value Across Invariant Groups:       1.342

This is the aligned value that can be considered common to all the invariant groups listed on the previous line. Note that this value is applicable only to the invariant groups!

R-square/Explained variance/Invariance index:       0.188

This R² indicates the degree of invariance of the given parameter. Muthén interpreted this index as the degree to which “the variation across groups in the configural model intercepts and loadings for this item is explained by variation in the factor mean and factor variance [respectively] across groups.” It can be a little confusing that this R² can be really small even if the corresponding parameter is highly invariant. In my example, the intercept of the indicator PROSTIT was shown to be invariant across all 10 groups, but its R² is only 0.188*.

6.3. Aligned parameter values

Invariant Group Values, Difference to Average and Significance
 Group      Value Difference         SE    P-value
     56      1.253     -0.089      0.059      0.131
     21      1.403      0.061      0.049      0.207
     22      1.447      0.106      0.090      0.241
     53      1.226     -0.116      0.222      0.601
     23      1.354      0.013      0.026      0.619
     40      1.313     -0.028      0.034      0.405
     13      1.215     -0.126      0.237      0.594
     54      1.341     -0.001      0.122      0.994
     14      1.382      0.041      0.177      0.818
     18      1.342      0.000      0.031      0.992

Here, the parameter estimates are listed, but only for those groups which were found to be invariant. This table is meant to demonstrate the invariance of the invariant parameter. That’s why the values of parameters in the non-invariant groups are not included in this table (but you can find them in the main output “MODEL RESULTS” where all the parameters are listed).
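The Difference column appears to be each group’s value minus the weighted average reported above (1.342). A small Python check of that reading against the printed table follows; the values are copied from the output, and the tolerance only allows for the three-decimal rounding.

```python
weighted_average = 1.342

# (group, value, reported difference) copied from the table above.
rows = [
    (56, 1.253, -0.089), (21, 1.403, 0.061), (22, 1.447, 0.106),
    (53, 1.226, -0.116), (23, 1.354, 0.013), (40, 1.313, -0.028),
    (13, 1.215, -0.126), (54, 1.341, -0.001), (14, 1.382, 0.041),
    (18, 1.342, 0.000),
]

# Largest gap between the recomputed and the reported difference.
max_gap = max(abs((value - weighted_average) - reported)
              for _, value, reported in rows)
print(f"largest discrepancy: {max_gap:.4f}")
```

The largest discrepancy is about 0.001, which is what rounding to three decimals would produce.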

6.4. Average Invariance index

The tables 6.1-6.3 are repeated for each factor loading and each indicator intercept. At the very end of the Alignment Output you will find

Average Invariance index: 0.625

This is the average R² across all the parameters. It is a handy global score of both metric and scalar invariance. Here, 1 stands for perfect scalar invariance and 0 for (quite impossible) full non-invariance. In general, one may interpret this index as the degree of confidence with which the means can be meaningfully compared across the given set of groups.

Step 7. Checking the reliability of the results with simulation

The issue with alignment is that it is tied to the current dataset, so its external validity is questionable. For example, if you have small samples within groups, the standard errors of the loadings may be underestimated, so alignment can find invariance where it is not present. To check whether this is the case, it is recommended to run a simulation study. Mplus makes this quite easy.

7.1. Set up a simulation study

First, you need to re-run your last alignment model, adding the SVALUES command to the OUTPUT section; it prints the parameter estimates in the form of input commands for a simulation study. After running this updated code, navigate in the output to the section “MODEL COMMAND WITH FINAL ESTIMATES USED AS STARTING VALUES” and copy the whole section (it is usually very large).

Next, you need to make several modifications to it: (a) remove the intercepts part from the %OVERALL% section, (b) add starting values to the loadings in this section, and (c) replace C# with G# in the names of classes. The changes are marked with comments below.


     moral BY prostit*1;  ! Added *1 for every loading
     moral BY homosex*1;
     moral BY abortion*1;
     moral BY divorce*1;

     [ c#1*-0.05407 ];
     [ c#2*0.48102 ];
     [ c#3*0.56789 ];
     [ c#4*-0.07774 ];
     [ c#5*0.88991 ];
     [ c#6*0.62138 ];
     [ c#7*-0.13948 ];
     [ c#8*0.09506 ];
     [ c#9*-0.06583 ];

     %G#1% ! Was %C#1%. This change should be done for the rest of the code as well

     moral1 BY prostit*1.44592;
     moral1 BY homosex*1.38575;
     moral1 BY abortion*0.99752;
     moral1 BY divorce*1.00391;

     abortion WITH divorce*0.72657;

     [ prostit*1.25253 ];

Okay, now we are ready to combine it with the simulation code.

Next, create a new input file:

MONTECARLO:
NAMES = prostit homosex abortion divorce; ! Names of indicator variables (only)
ngroups = 10; ! Your number of groups
NOBSERVATIONS = 10(100); ! Number of groups and, in parentheses, sample size per group
NREPS = 500; ! How many times the data generation and analysis are repeated

ANALYSIS:
alignment = fixed;

MODEL POPULATION:! This section includes a model to generate data

! Paste here the code that we created just before using svalues output
! - it looks like this:
       moral BY prostit*1;
       moral BY homosex*1;
       moral BY abortion*1;
       moral BY divorce*1;


     moral1 BY prostit*1.44592;
     moral1 BY homosex*1.38575;

MODEL: ! This section includes a model to analyze

! AND again, paste here the same svalues code -  
       moral BY prostit*1;
       moral BY homosex*1;
       moral BY abortion*1;
       moral BY divorce*1;


     moral1 BY prostit*1.44592;
     moral1 BY homosex*1.38575;

And run it. It will take some time. Save the output, change the sample size in the parentheses of NOBSERVATIONS = 10(500); and run again. Then change the sample size again and run again. You will end up with three or more outputs.
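Editing the sample size by hand for every run is error-prone, so the variants can be generated from one template. The Python sketch below only rewrites the NOBSERVATIONS line in a string; the function name and the shortened template are hypothetical, and writing the files and launching Mplus are left out.

```python
import re


def with_group_size(montecarlo_input, n_per_group):
    """Return the input text with the per-group sample size replaced."""
    return re.sub(
        r"(NOBSERVATIONS\s*=\s*\d+\()\d+(\))",
        rf"\g<1>{n_per_group}\g<2>",
        montecarlo_input,
        flags=re.IGNORECASE,
    )


# Shortened stand-in for the full simulation input shown above.
template = "ngroups = 10;\nNOBSERVATIONS = 10(100);\nNREPS = 500;"

# One input text per group sample size.
variants = {n: with_group_size(template, n) for n in (100, 500, 1000)}
print(variants[500])
```

Each entry of `variants` can then be saved under its own file name and run as a separate simulation.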

7.2. Interpret outputs of the simulation

Locate in the output file the following tables:


                        CORRELATIONS                MEAN SQUARE ERROR
                    Average    Std. Dev.           Average    Std. Dev.
    Mean             0.9545      0.0201             2.2674       0.130
    Variance         0.8580      0.1056             3.5743      21.390


         MORAL1 Mean                         0.970        2.262
         MORAL1 Variance                     0.758        1.568

These are two sets of measures of reliability of latent means estimated in the previous steps with alignment. First table results from two-stage computation:

  1. it extracts the latent means across groups, which were estimated in a single simulation run, and correlates them with the true (population) means, and then
  2. these correlations are averaged across all the simulations (in my case 500).

The measure for the latent variances is found in the same way. Std. Dev. here refers to the standard deviation of these correlations across simulation runs. It seems correct to interpret these scores as a measure of reliability of the latent means estimated by alignment. These correlations are typically very high, so I would be worried when they are less than 0.95 (Asparouhov and Muthén, 2013, suggest that correlations should be no less than 0.98). In my example, there is something disturbing going on with the estimated variances. However, when I run a simulation with 500 cases in each group (which is closer to my actual data), this correlation gets very close to 1 (0.968). It means that such a model wouldn’t work if I had fewer respondents.

Additionally, the mean square error is calculated; it is an absolute measure of discrepancy, so, unlike the correlations, lower values are better.

Second table is a product of

  1. averaging latent means across all the simulation runs (500 in my case), and
  2. correlating it with the true values.

First column lists these correlations, the second column is (apparently) mean square error. These measures seem to indicate reliability of the simulation itself, and reliability of the measurement model in general.
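The difference between the two tables is only the order of averaging and correlating. The toy Python sketch below illustrates this with made-up "estimates": the true means are the ones from Step 5, and each replication just adds random noise to them. The noise level is arbitrary, and this is not what Mplus does internally (Mplus re-estimates the whole model in every replication).

```python
import random


def pearson(x, y):
    """Pearson correlation of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


random.seed(1)
true_means = [3.939, 3.006, 2.703, 2.465, 2.056, 1.167, 1.093, 0.671, 0.145, 0.0]
n_reps, noise_sd = 500, 0.3  # noise_sd is an arbitrary illustrative value

# Each replication: "estimated" group means = true means + random error.
replications = [[m + random.gauss(0, noise_sd) for m in true_means]
                for _ in range(n_reps)]

# First table: correlate within each replication, then average the correlations.
average_of_correlations = sum(pearson(rep, true_means)
                              for rep in replications) / n_reps

# Second table: average the estimates across replications, then correlate once.
averaged_estimates = [sum(col) / n_reps for col in zip(*replications)]
correlation_of_averages = pearson(averaged_estimates, true_means)

print(round(average_of_correlations, 3), round(correlation_of_averages, 3))
```

Because averaging over replications cancels most of the noise, the second correlation is closer to 1 than the first, which is why the first table is the more informative one about a single real dataset.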

Sometimes all these correlations are zeros. If this is the case scroll down to the errors section; it might be that the model was misspecified somehow or none of the models converged.

That’s it.

If the news is good and alignment helped to locate problematic parameters/groups, you may proceed with dropping the corresponding groups or modifying the model to achieve higher levels of invariance. If you are happy with what you get with alignment, the next step might be predicting factor scores based on alignment and then using them as a reliable (though not perfect) substitute for the factor scores. This can be done in the standard Mplus way by adding SAVE = FSCORES; to the SAVEDATA: section.

Example Mplus files

Here is the list of the files used in the examples above

Additional options

Bayesian estimation

This is pretty much uncharted territory because only a few publications have explored this analysis. One may consider using Bayesian alignment if the data are noisy and even the configural model does not show a great fit to the data. The next step in this case would be setting up a Bayesian approximate invariance model with large prior variances of parameters across groups, and then running the alignment to find a better solution. Check this paper with its supplementary materials for the full example. In our case the input will look something like this:

Ranking table

You can request it by adding SAVEDATA: RANKING IS ranking.dat; in the input file of fixed or free alignment (not simulation). The rankings of groups are based on the freely estimated and aligned group factor means, the differences are determined by the significance of the factor mean differences. It is also listed in the standard output, but in a bit different form, as shown in Step 5 “Factor mean comparison”.

Ranking table for MORAL1


Fit  function contribution

Some papers report the Fit Function Contribution from every between-group parameter constraint, that is, how much the model fit improved after applying alignment. I find this statistic problematic because it doesn’t have a clear unit and its comparability across parameters and models is questionable. R² already does this job for you. Still, you can request it by adding OUTPUT: TECH8; to the input.

In the output file, closer to the end, you will find a section which contains very detailed information, so scroll directly to these sections:



Fit Function Loadings Contribution By Variable
Fit Function Intercepts Contribution By Variable

These numbers are those contributions to the fit of the model that came from every parameter in alignment. The order of variables follows the data, so it’s like in my VARIABLE: NAMES statement: prostit homosex abortion divorce.

Estimation fine-tuning

The user has a lot of control over alignment optimization. There are several options that you can add in the  ANALYSIS section to tune the alignment optimization algorithm. The following is a copy from Mplus Guide, version 8 (Muthén & Muthén, 1998-2017):

The ASTARTS option is used to specify the number of random sets of starting values to use for the alignment optimization. The default is 30.

The AITERATIONS option is used to specify the maximum number of iterations in the alignment optimization. The default is 5000.

The ACONVERGENCE option is used to specify the convergence criterion for the derivatives of the alignment optimization. The default is 0.001.

Besides this, it is possible to choose the alignment function itself:

The SIMPLICITY option has two settings: SQRT and FOURTHRT. SQRT is the default. The SQRT setting takes the square root of the weighted component loss function. The FOURTHRT setting takes the double square root of the weighted component loss function. It may in some cases further reduce small significant differences.

The precision of alignment can be boosted by lowering the tolerance value, but you risk losing convergence, i.e. the optimization might end with no solution at all.

The TOLERANCE option is used to specify the simplicity tolerance value of the alignment optimization which must be positive. The default is 0.01.

The METRIC option  is not related to metric invariance! This option identifies a set of constraints applied to identify the model:

The METRIC option is used to specify the factor variance metric of the alignment optimization. The METRIC option has two settings: REFGROUP and PRODUCT. REFGROUP is the default where the factor variance is fixed at one in the reference group. The PRODUCT setting sets the product of the factor variances in all of the groups to one. The PRODUCT setting is not allowed with ALIGNMENT=FIXED.

Categorical indicators

If some or all of the indicators are binary or ordinal, it is still possible to apply alignment, and all the steps above stay the same with minor differences. In the input files for alignment you only need to list the categorical indicators in a new line CATEGORICAL = of the VARIABLE: section and add algorithm = integration; to the ANALYSIS: section. Because it uses integration to estimate the parameters, it can take a substantial amount of time to compute.

In simulations, everything is the same as well, with a couple of additions. As I just mentioned, add a new line CATEGORICAL = to the VARIABLE: section and algorithm = integration; to the ANALYSIS: section. In addition, list all the categorical variables in a new line GENERATE = of the MONTECARLO: section, putting the number of categories minus one in parentheses, something like this: GENERATE = homosex (9) prostit (9); where both variables have 10 categories.
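The "number of categories minus one" rule is easy to get wrong by hand, so here is a small Python sketch that builds the GENERATE line from category counts. The function name is hypothetical and the output is just the text to paste into the MONTECARLO: section.

```python
def generate_line(n_categories):
    """Build a GENERATE line from a {variable: number of categories} mapping."""
    parts = [f"{name} ({k - 1})" for name, k in n_categories.items()]
    return "GENERATE = " + " ".join(parts) + ";"


print(generate_line({"homosex": 10, "prostit": 10}))
# GENERATE = homosex (9) prostit (9);
```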

The automation in R described below simplifies these modifications a lot.


So far, alignment analysis is available only in Mplus software.

The R package “sirt” contains the function “invariance.alignment()”, but it implements a different procedure that was only inspired by the original idea of Muthén and Asparouhov.

Automation in R

I wrote three functions that let you quickly create and run all the models required for the alignment analysis (free, fixed, and simulations).

  model = "Moral BY prostit homosex abortion divorce;", # Formula in Mplus format
  group = "country", # grouping variable
  categorical = NULL, # which indicators are ordinal/binary? supply a character vector
  dat = wvs.s, 
  sim.samples = c(100, 500, 1000), # Group sample sizes for simulation, 
                                   # the length of this vector also determines 
                                   # the number of simulation studies.
  sim.reps = 500,      # The number of simulated datasets in each simulation
  Mplus_com = "Mplus", # Sometimes you don't have direct access to Mplus, so
                       # this argument specifies what to send to a system command line.
  path = getwd(),  # where all the .inp, .out, and .dat files will be stored
  summaries = TRUE # whether extractAlignment() and extractAlignmentSim() should
                   # be run after all the Mplus work is done.

Another function, extractAlignment(), summarizes the alignment output. It has a single argument, the path to an Mplus .out file; it prints a readable summary of the alignment and returns a list with all the alignment info in an R-manageable format.

And finally extractAlignmentSim() function helps with summarizing multiple simulation outputs. It extracts only information described in Step 7.2

All functions can be accessed by running:


See more on GitHub


Original paper that suggested the alignment method:

Asparouhov, T., & Muthén, B. (2014). Multiple-group factor analysis alignment. Structural Equation Modeling: A Multidisciplinary Journal, 21(4), 495-508. (also known as Webnote 18, version 3)

An example with categorical indicators (IRT models):

Muthén, B., & Asparouhov, T. (2014). IRT studies of many groups: the alignment method. Frontiers in Psychology, 5, 978.

Another clarification with an example:

Muthén, B., & Asparouhov, T. (2018). Recent methods for the study of measurement invariance with many groups: alignment and random effects. Sociological Methods & Research, 47(4), 637-664.

Extension of alignment to test for equality of residuals and variances (idk why):

Marsh, H. W., Guo, J., Parker, P. D., Nagengast, B., Asparouhov, T., Muthén, B., & Dicke, T. (2018). What to do when scalar invariance fails: The extended alignment method for multi-group factor analysis comparison of latent means across many groups. Psychological Methods, 23(3), 524-545.

Nice applications

Munck, I., Barber, C., & Torney-Purta, J. (2018). Measurement invariance in comparing attitudes toward immigrants among youth across Europe in 1999 and 2009: The alignment method applied to IEA CIVED and ICCS. Sociological Methods & Research, 47(4), 687-728.

Lomazzi, V. (2018). Using Alignment Optimization to test the measurement invariance of gender role attitudes in 59 Countries. Methods, data, analyses: a journal for quantitative methods and survey methodology (mda), 12(1), 77-103.

Another tutorial

Byrne, B. M., & van de Vijver, F. J. (2017). The maximum likelihood alignment approach to testing for approximate measurement invariance: A paradigmatic cross-cultural application. Psicothema, 29(4).



* One of the authors of the alignment method listed the following reasons for the lack of correspondence between R² and the number of invariant groups:

 Tihomir Asparouhov posted on Friday, December 23, 2016 – 9:55 am
There could be several different reasons.

1. The one threshold that is non-invariant is large (due to non-occurrence of a particular category in one group) and that accounts for the majority of the variability in the threshold.

2. The factor mean variability is small

3. The loading is small

4. It can also be a combination of the above and large standard errors that lean to not being able to establish significant non-invariance


Most likely the issue is due to empty cells in certain groups or very small variation in the factor mean and variance across groups or very dis-balanced group design.