Contextual Analysis of TF Occupancy (CATO) scores

CATO scores predict which genetic variants are likely to affect transcription factor (TF) binding and DNA accessibility.

CATO scores are standardized regression scores from logistic regression models that predict the probability that a variant affects the binding of a TF, using the variables listed in the CATO model below. In the authors' words:

"This approach resulted in a simple scoring scheme, termed contextual analysis of TF occupancy (CATO), that provides a recalibrated probability of affecting the binding of any TF, as well as a quantitatively ranked list of TF families whose binding might be altered" (information here and below from Maurano et al., Nat. Genetics 2015).

CATO model: significant ∼ log(Read depth) + Num. hets.^2 + MCV^2 + CpG Island + 3′ UTR + coding + intron + intergenic + Dist. to TSS^2 + DHS strength^2 + Width of DHS + #nearby binding sites^2 + PhastCons + Footprint presence + Footprint occupancy + log(score)^2 + logodds difference + x2 + ... + xn
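
As a rough illustration of how a logistic regression of this form turns predictors into a probability, here is a minimal Python sketch. It is not the CATO implementation: it uses only four of the terms above, it assumes that "^2" denotes a squared term, and every function name, feature name, coefficient, and the intercept is hypothetical and chosen only to make the example run.

```python
import math


def logistic(z):
    """Map a linear predictor z to a probability, avoiding overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def build_features(read_depth, dhs_strength, in_cpg_island, footprint_present):
    """Encode a small, illustrative subset of the predictors in the model formula."""
    return {
        "log_read_depth": math.log(read_depth),       # log(Read depth)
        "dhs_strength_sq": dhs_strength ** 2,         # DHS strength^2 (assumed squared term)
        "cpg_island": 1.0 if in_cpg_island else 0.0,  # CpG Island indicator
        "footprint_presence": 1.0 if footprint_present else 0.0,  # Footprint presence
    }


def cato_like_score(features, coefficients, intercept=0.0):
    """Return logistic(intercept + sum of coefficient * feature value)."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return logistic(z)


# Hypothetical values for illustration only; these are NOT fitted CATO coefficients.
HYPOTHETICAL_INTERCEPT = -2.0
HYPOTHETICAL_COEFFICIENTS = {
    "log_read_depth": 0.3,
    "dhs_strength_sq": 0.1,
    "cpg_island": -0.2,
    "footprint_presence": 0.8,
}

if __name__ == "__main__":
    variant = build_features(
        read_depth=20, dhs_strength=1.5, in_cpg_island=True, footprint_present=True
    )
    probability = cato_like_score(
        variant, HYPOTHETICAL_COEFFICIENTS, HYPOTHETICAL_INTERCEPT
    )
    print(f"illustrative probability: {probability:.3f}")
```

In a real model the coefficients would be estimated from variants labeled as significant or not, and the resulting probabilities recalibrated as described in the paper.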

For additional information, see the CATO explainer from the paper at: http://www.mauranolab.org/CATO/