Search Algorithms: BPC
Build Pure Clusters (BPC) is one of the three algorithms in Tetrad designed to build pure measurement/structural models (the others are the MIM Build algorithm and the Purify algorithm).
The goal of Build Pure Clusters is to build a pure measurement model using observed variables from a data set. Observed variables are clustered into disjoint groups, each group representing indicators of a single hidden variable. Variables in one group are not indicators of the hidden variables associated with the other groups. Also, some variables given as input will not be used because they do not fit into a pure measurement model along with the chosen ones.
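As a rough illustration of the kind of output described above, the following Python sketch represents a clustering as a mapping from latent labels to lists of indicators, checks that the groups are disjoint, and lists the input variables that were left out. The variable names, the latent labels, and both helper functions are hypothetical and are not part of Tetrad.

```python
def is_disjoint_clustering(clusters, observed):
    """Return True if no variable appears in two groups and all are observed."""
    seen = set()
    for group in clusters.values():
        for var in group:
            if var in seen or var not in observed:
                return False
            seen.add(var)
    return True


def unused_variables(clusters, observed):
    """Return the observed variables that were not assigned to any group."""
    used = set().union(*clusters.values()) if clusters else set()
    return sorted(set(observed) - used)


# Hypothetical input variables and a hypothetical clustering of them.
observed = ["X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"]
clusters = {
    "L1": ["X1", "X2", "X3"],  # indicators of one hidden variable
    "L2": ["X4", "X5", "X6"],  # indicators of another hidden variable
}

print(is_disjoint_clustering(clusters, observed))
print(unused_variables(clusters, observed))
```

In this made-up example, X7 and X8 play the role of variables that do not fit into the pure measurement model and are therefore dropped.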
The Build Pure Clusters algorithm assumes that the population can be described as a measurement/structural model where observed variables are linear indicators of the unknown latents. Notice that linearity among latents is not necessary (although it will be necessary for the MIM Build algorithm) and latents do not need to be continuous. It is also assumed that the unknown population graph contains a pure subgraph where each latent has at least three indicators. This assumption is not testable and should be evaluated by the plausibility of the final model.
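To make these assumptions concrete, here is a minimal Python sketch that simulates data satisfying them: each observed variable is a linear function of exactly one latent plus noise, each latent has three indicators, and the relation between the two latents is deliberately nonlinear. The loadings, noise levels, variable names, and the function itself are illustrative assumptions, not values or code taken from Tetrad.

```python
import random


def simulate_pure_measurement_model(n_samples, seed=0):
    """Simulate rows from a hypothetical two-latent pure measurement model."""
    rng = random.Random(seed)
    # Hypothetical loadings: each indicator has exactly one latent parent.
    loadings = {
        "L1": {"X1": 1.0, "X2": 0.8, "X3": 1.2},
        "L2": {"X4": 0.9, "X5": 1.1, "X6": 0.7},
    }
    rows = []
    for _ in range(n_samples):
        l1 = rng.gauss(0.0, 1.0)
        # Nonlinear relation among latents, which the assumptions allow.
        l2 = l1 ** 2 + rng.gauss(0.0, 1.0)
        latents = {"L1": l1, "L2": l2}
        row = {}
        for latent, indicators in loadings.items():
            for name, coef in indicators.items():
                # Each indicator is linear in its latent, plus independent noise.
                row[name] = coef * latents[latent] + rng.gauss(0.0, 0.5)
        rows.append(row)
    return loadings, rows


loadings, rows = simulate_pure_measurement_model(500)
print(len(rows), sorted(rows[0]))
```

Only the observed columns in `rows` would be handed to a search; the latents and loadings are kept here solely to show how the data were generated.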
The current implementation of the algorithm accepts only continuous data sets as input. For general information about model building algorithms, consult the Search Algorithms page.
Entering Build Pure Clusters parameters
For example, consider a model with this true graph:
If data is generated using this model and a search is constructed from the data, selecting BPC, the following parameters will be requested:
Upon executing the search, BPC returns a pure measurement model. Because of the internal randomization, outputs may vary from run to run, but one should not expect large differences (and this can actually be used to evaluate whether the assumptions are reasonable for a given set of input variables). In our example, the outcome should be as follows if the sample is representative of the population:
Edges with circles at the endpoints are added only to distinguish latent variables from the indicators. BPC does not make any claims about the causal relationships among latent variables (this is the role of the MIM Build algorithm). The labels given to the latent variables are arbitrary. As part of the analysis, a domain expert should evaluate whether such latents indeed have a physical or abstract meaning, or whether they should be discarded as meaningless. Such reification is domain dependent.
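One informal way to act on the run-to-run variation mentioned above is to compare the clusters returned by repeated runs while ignoring the arbitrary latent labels. The Python sketch below does this with a simple set-overlap score; the function, the scoring rule, and the example runs are hypothetical and are not something Tetrad provides.

```python
def clustering_agreement(run_a, run_b):
    """Share of distinct clusters found in both runs, ignoring latent labels."""
    groups_a = {frozenset(group) for group in run_a.values()}
    groups_b = {frozenset(group) for group in run_b.values()}
    if not groups_a and not groups_b:
        return 1.0
    return len(groups_a & groups_b) / len(groups_a | groups_b)


# Hypothetical outputs of three runs on the same data set.
run_1 = {"L1": ["X1", "X2", "X3"], "L2": ["X4", "X5", "X6"]}
run_2 = {"_L1": ["X4", "X5", "X6"], "_L2": ["X1", "X2", "X3"]}  # same groups, other labels
run_3 = {"L1": ["X1", "X2", "X3"], "L2": ["X4", "X5", "X7"]}  # one group differs

print(clustering_agreement(run_1, run_2))
print(clustering_agreement(run_1, run_3))
```

A score near 1 across runs is consistent with stable output; consistently low scores may suggest that the assumptions are questionable for the chosen input variables.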
Note: If the output graph is hard to read, use the Fruchterman-Reingold layout in the Layout menu to arrange it more readably.