Quantile Normalization and Inverse Normal Transform

Dependency

This post demonstrates the following:

  • Quantile normalization, implemented in the Bioconductor package preprocessCore.
  • Rank-based inverse normal transformation (INT), implemented in the R package RNOmni.
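
# biocLite() is deprecated; Bioconductor packages are now installed with BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("preprocessCore")
install.packages("RNOmni")
library(preprocessCore)
library(RNOmni)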

General Principles

Quantile Normalization

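The simplest way to put it: quantile normalization makes different distributions identical by "aligning" their quantiles. Each column is sorted. The mean of the k-th smallest values across all columns then becomes the new k-th smallest value. Finally, every original value is replaced by the new value at its rank within its column. StatQuest has a good video explaining this technique.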
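The steps above can be sketched in base R. This is a minimal illustration and not preprocessCore's implementation. The function name `quantile_normalize` is hypothetical, and ties are broken by order of appearance, which may differ from how `normalize.quantiles` treats ties.

```r
# Minimal base-R sketch of quantile normalization (hypothetical helper,
# not preprocessCore's code). Ties are broken by order of appearance.
quantile_normalize <- function(x) {
  x <- as.matrix(x)
  # Reference distribution: mean of the k-th smallest values across columns
  ref <- rowMeans(apply(x, 2, sort))
  # Replace each value with the reference value at its within-column rank
  out <- apply(x, 2, function(col) ref[rank(col, ties.method = "first")])
  dimnames(out) <- dimnames(x)
  out
}

m <- cbind(A = c(5, 2, 3, 4), B = c(4, 1, 4, 2), C = c(3, 4, 6, 8))
qn <- quantile_normalize(m)
qn[, "A"]  # [1] 5.666667 2.000000 3.000000 4.666667
```

After normalization, every column holds the same set of values (2, 3, 14/3, 17/3). Only the order of those values within each column still reflects the original data.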

Inverse-Normal-Transform

Inverse normal transformation, a.k.a. the rank-based inverse normal transformation (INT), is theoretically complicated. But again, the simplest way to put it: INT increases the "normality" of a distribution by replacing each value with the standard normal quantile that corresponds to its rank. As a result, we can apply statistical models that assume normality.
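Take a vector of \(n\) values with ranks \(r_i\). A common form of INT is \(z_i = \Phi^{-1}\left((r_i - k)/(n - 2k + 1)\right)\), where \(\Phi^{-1}\) is the standard normal quantile function and \(k\) is an offset. The sketch below assumes the Blom offset \(k = 3/8\), which to my knowledge is also the default in `RNOmni::rankNormal`. It is a base-R illustration with a hypothetical function name, not RNOmni's actual code.

```r
# Minimal base-R sketch of the rank-based INT (hypothetical helper).
# Assumptions: Blom offset k = 3/8; ties get average ranks (rank()'s default).
rank_inverse_normal <- function(x, k = 3/8) {
  n <- sum(!is.na(x))                # number of non-missing values
  r <- rank(x, na.last = "keep")     # average ranks; NAs stay NA
  qnorm((r - k) / (n - 2 * k + 1))   # map ranks to standard normal quantiles
}

# Only ranks matter: a heavy-tailed and a light-tailed vector with the same
# ordering are mapped to exactly the same values.
heavy <- c(0.1, 1, 10, 1000, 1e6)
light <- c(1, 2, 3, 4, 5)
all.equal(rank_inverse_normal(heavy), rank_inverse_normal(light))  # TRUE

# Tied values share one averaged rank, and therefore one transformed value.
tied <- c(0, 0, 0, 4, 5)
rank_inverse_normal(tied)
```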

Be Cautious!

Normalization techniques are highly application-specific. Applying them blindly may create as many problems as it solves. If you don't know the "right" one to use, take a few for a test drive and select the one that performs statistically better (e.g., by cross-validation).

Set Plot Parameters

library(RColorBrewer)
qual_col_pals = brewer.pal.info[brewer.pal.info$category == 'qual',]
col_vec = unlist(mapply(brewer.pal, qual_col_pals$maxcolors, rownames(qual_col_pals))) # 74 qualitative colors in RColorBrewer

Load Data

The data in this post is a data.frame in which rows are genes and columns are patients. Each measurement is \(\log_2(\mathrm{RPKM} + 1)\). The data set is straightforward to understand, so I won't bother providing a link to it.

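# Path points to the author's local file; replace it with your own
data = read.csv("/Users/bos/R-workspace/2.eQTL/large500_expr")
expr = as.matrix(data[,-1])  # 1st column is gene id
rownames(expr) = data[, 1]   # keep gene ids as row names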
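The file above is not linked, so the rest of the code cannot be run as-is. If you want to follow along, here is a hypothetical stand-in. It is a simulated genes x patients matrix of \(\log_2(\mathrm{RPKM} + 1)\)-like values, with some undetected (zero) genes, and it can replace the two lines above. All numbers, sizes, and distribution parameters are assumptions chosen only for illustration. None of them come from the author's data.

```r
# Hypothetical stand-in for the unlinked expression file: a genes x patients
# matrix of log2(rpkm + 1)-like values. All numbers are simulated.
set.seed(2018)
n_genes <- 500
n_patients <- 40
expr <- sapply(seq_len(n_patients), function(j) {
  # Patient-specific location so that the column distributions differ
  rpkm <- rlnorm(n_genes, meanlog = runif(1, 0, 2), sdlog = 1.5)
  # Some genes are not detected at all, as is common in RNA-seq
  rpkm[runif(n_genes) < 0.1] <- 0
  log2(rpkm + 1)
})
rownames(expr) <- paste0("gene", seq_len(n_genes))
colnames(expr) <- paste0("patient", seq_len(n_patients))
dim(expr)  # 500 40
```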

Quantile Transformation

Before quantile transformation

# Density of each of the first 40 patients (columns) before normalization
plot(density(expr[, 1]), col = col_vec[1])
for (i in 2:40) lines(density(expr[, i]), col = col_vec[i])

After quantile transformation

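# Force all columns (patients) to share one reference distribution
expr.qt <- preprocessCore::normalize.quantiles(expr)
dimnames(expr.qt) <- dimnames(expr)  # row/column names are not carried over
plot(density(expr.qt[, 1]), col = col_vec[1])
for (i in 2:40) lines(density(expr.qt[, i]), col = col_vec[i])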

Note that most of the transformed distributions now land on top of each other. Near the tails they may not match perfectly. This happens, for example, when columns contain different numbers of tied values (such as the many zeros in RNA-seq data), because tied values are mapped to a common value. That is not necessarily bad, since it keeps some of the original variability.
However, quantile normalization cannot remove batch effects; see Jeff's post.

Inverse-Normal-Transformation

Before

# INT is applied per gene (row), so plot the first 40 genes across patients
plot(density(expr[1, ]), col = col_vec[1])
for (i in 2:40) lines(density(expr[i, ]), col = col_vec[i])

After

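# rankNormal() transforms one vector; apply() over rows returns genes as
# columns, so t() restores the genes x patients layout
expr.int = t(apply(expr, 1, RNOmni::rankNormal))
plot(density(expr.int[1, ]), col = col_vec[1])
for (i in 2:40) lines(density(expr.int[i, ]), col = col_vec[i])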

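As you can see, the distributions are pulled toward the standard normal, although some differences between genes remain in the tails. (Your own plot may look different. My data here are RNA-seq data. Because INT uses only ranks, the curves fail to match a standard normal mainly because of tied values, e.g., patients with zero expression for a gene: tied values receive the same rank and hence the same transformed value. For data with few ties, the curves should be much closer to a standard normal.)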
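The row-wise transform used above can also be sketched without RNOmni. `int_rows` is a hypothetical helper. It assumes the Blom offset k = 3/8 and average ranks for ties, and it is not guaranteed to match `RNOmni::rankNormal` in every detail. The small example shows that rows without ties map to the same set of normal scores, while a row with ties does not.

```r
# Hypothetical base-R alternative to t(apply(expr, 1, RNOmni::rankNormal)).
# Assumptions: Blom offset k = 3/8; ties get average ranks.
int_rows <- function(m, k = 3/8) {
  out <- t(apply(m, 1, function(g) {
    n <- sum(!is.na(g))
    qnorm((rank(g, na.last = "keep") - k) / (n - 2 * k + 1))
  }))
  dimnames(out) <- dimnames(m)  # keep gene/patient names
  out
}

# Small example: 3 genes x 5 patients
m <- rbind(g1 = c(1, 5, 2, 8, 3),
           g2 = c(0, 0, 0, 4, 9),          # three tied zeros
           g3 = c(100, 1e4, 10, 1e6, 1))   # heavy-tailed, no ties
z <- int_rows(m)
rowMeans(z)  # g1 and g3 are ~0; g2 is pulled off by its ties
```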

Session Information

if (!requireNamespace("devtools")) install.packages("devtools")
## Loading required namespace: devtools
devtools::session_info()
## Session info -------------------------------------------------------------
##  setting  value                       
##  version  R version 3.5.0 (2018-04-23)
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  tz       America/Los_Angeles         
##  date     2018-10-20
## Packages -----------------------------------------------------------------
##  package        * version date       source         
##  abind            1.4-5   2016-07-21 CRAN (R 3.5.0) 
##  backports        1.1.2   2017-12-13 CRAN (R 3.5.0) 
##  base           * 3.5.0   2018-04-24 local          
##  blogdown         0.8     2018-07-15 CRAN (R 3.5.0) 
##  bookdown         0.7     2018-02-18 CRAN (R 3.5.0) 
##  codetools        0.2-15  2016-10-05 CRAN (R 3.5.0) 
##  compiler         3.5.0   2018-04-24 local          
##  datasets       * 3.5.0   2018-04-24 local          
##  devtools         1.13.6  2018-06-27 CRAN (R 3.5.0) 
##  digest           0.6.15  2018-01-28 CRAN (R 3.5.0) 
##  evaluate         0.11    2018-07-17 CRAN (R 3.5.0) 
##  foreach          1.4.4   2017-12-12 CRAN (R 3.5.0) 
##  graphics       * 3.5.0   2018-04-24 local          
##  grDevices      * 3.5.0   2018-04-24 local          
##  htmltools        0.3.6   2017-04-28 CRAN (R 3.5.0) 
##  iterators        1.0.10  2018-07-13 CRAN (R 3.5.0) 
##  knitr            1.20    2018-02-20 CRAN (R 3.5.0) 
##  magrittr         1.5     2014-11-22 CRAN (R 3.5.0) 
##  memoise          1.1.0   2017-04-21 CRAN (R 3.5.0) 
##  methods        * 3.5.0   2018-04-24 local          
##  plyr             1.8.4   2016-06-08 CRAN (R 3.5.0) 
##  preprocessCore * 1.42.0  2018-05-01 Bioconductor   
##  RColorBrewer   * 1.1-2   2014-12-07 CRAN (R 3.5.0) 
##  Rcpp             0.12.18 2018-07-23 cran (@0.12.18)
##  rmarkdown        1.10    2018-06-11 CRAN (R 3.5.0) 
##  RNOmni         * 0.4.0   2018-05-16 CRAN (R 3.5.0) 
##  rprojroot        1.3-2   2018-01-03 CRAN (R 3.5.0) 
##  stats          * 3.5.0   2018-04-24 local          
##  stringi          1.2.4   2018-07-20 CRAN (R 3.5.0) 
##  stringr          1.3.1   2018-05-10 CRAN (R 3.5.0) 
##  tools            3.5.0   2018-04-24 local          
##  utils          * 3.5.0   2018-04-24 local          
##  withr            2.1.2   2018-03-15 CRAN (R 3.5.0) 
##  xfun             0.3     2018-07-06 CRAN (R 3.5.0) 
##  yaml             2.2.0   2018-07-25 CRAN (R 3.5.0)
