Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping
ANNALS OF APPLIED STATISTICS, no. 3 (2012): 1095-1117
Abstract
We consider the problem of estimating a sparse multi-response regression function, with an application to expression quantitative trait locus (eQTL) mapping, where the goal is to discover genetic variations that influence gene-expression levels. In particular, we investigate a shrinkage technique capable of capturing a given hierarchical ...
Introduction
Received December 2009; revised February 2012. Supported in part by ONR and NIH grants.
- Recent advances in high-throughput technology for profiling gene expressions and assaying genetic variations at a genome-wide scale have provided researchers an unprecedented opportunity to comprehensively study the genetic causes of complex diseases such as asthma, diabetes, and cancer. Expression quantitative trait locus mapping considers gene-expression measurements, known as gene-expression traits, as intermediate phenotypes, and aims to identify the genetic markers such as single nucleotide polymorphisms (SNPs) that influence the expression levels of genes, which gives rise to the variability in ...
- The lasso estimation in (2.2) is equivalent to selecting relevant covariates for each of the K responses separately, and does not provide any mechanism to enforce a joint selection of common relevant covariates for multiple related responses. In the literature of multi-task learning, an L1/L2 penalty, known as a group lasso penalty [Yuan and Lin (2006)], has been adopted in multivariate-response regression to take advantage of the relatedness of the response variables and recover the union support, the pattern of nonzero regression coefficients shared across all of the responses [Obozinski, Wainwright and Jordan (2008)].
- This method is widely known as the L1/L2-regularized multi-task regression in the machine learning community, and its estimate for the regression coefficients is given in equation (2.3) of the paper.
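The contrast between separate selection (lasso) and joint selection (L1/L2) can be illustrated with the proximal operators of the two penalties. The sketch below is only an illustration under stated assumptions, not the paper's estimator: the coefficient matrix `B`, the threshold `lam`, and the helper names are all hypothetical, and a single shrinkage step stands in for a full regression fit. Rows of `B` are covariates and columns are responses.

```python
import math

def soft_threshold(B, lam):
    """Elementwise soft-thresholding: proximal operator of lam * sum |b_jk| (lasso penalty)."""
    return [[math.copysign(max(abs(b) - lam, 0.0), b) for b in row] for row in B]

def group_soft_threshold(B, lam):
    """Row-wise shrinkage: proximal operator of lam * sum_j ||B[j]||_2 (L1/L2 penalty)."""
    out = []
    for row in B:
        norm = math.sqrt(sum(b * b for b in row))
        # Shrink the whole row toward zero; the row vanishes if its norm is at most lam.
        scale = max(1.0 - lam / norm, 0.0) if norm > 0 else 0.0
        out.append([scale * b for b in row])
    return out

def support(B):
    """0/1 pattern of nonzero coefficients."""
    return [[int(b != 0.0) for b in row] for row in B]

# Hypothetical unpenalized estimates: 3 covariates (rows) x 3 responses (columns).
B = [[0.90, 0.80, 0.10],
     [0.05, 0.02, 0.03],
     [0.10, 0.60, 0.05]]

lasso_support = support(soft_threshold(B, 0.2))
group_support = support(group_soft_threshold(B, 0.2))

print("lasso :", lasso_support)   # entries are kept or dropped one by one
print("L1/L2 :", group_support)   # each covariate is kept or dropped for all responses
```

In this toy case the elementwise rule drops weak entries individually, while the row-wise rule keeps or discards a covariate for all responses at once, which is the joint-selection behavior described above.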
Highlights
- In Section 2 we provide a brief discussion of previous work on sparse regression estimation. In Section 3 we introduce the tree lasso and describe an efficient optimization method based on the smoothing proximal gradient (SPG) approach.
- Let us assume that data are collected for J SNPs and K gene-expression traits over N individuals. Let X denote the N × J matrix of SNP genotypes for covariates, and Y the N × K matrix of gene-expression measurements for responses. In eQTL mapping, each element of X takes values from {0, 1, 2} according to the number of minor alleles at the given locus in each individual. We assume a linear model for the functional mapping from covariates to response variables, given in equation (2.1) of the paper.
- Here ||·||_1 is the matrix L1 norm, and λ is a tuning parameter that controls the amount of sparsity in the solution. Setting λ to a large value leads to a smaller number of nonzero regression coefficients.
- In this article we proposed a novel regularized regression approach, called the tree lasso, that identifies covariates relevant to multiple related responses jointly by leveraging the correlation structure in responses represented as a hierarchical clustering tree. We discussed how this approach can be used in eQTL analysis to learn SNPs with pleiotropic effects that influence the activities of multiple co-expressed genes.
- For optimization, we adopted the smoothing proximal gradient approach that was originally developed for a general class of structured-sparsity-inducing penalties, as the tree-lasso penalty can be viewed as a special case.
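A tree-structured group penalty of the kind described above can be sketched as a weighted sum of L2 norms, one per node of the clustering tree, each taken over the responses under that node. The code below is a minimal sketch under stated assumptions: the tree, the node names, and the weights are hypothetical, and the paper's own weighting scheme is not reproduced here. `beta` holds the coefficients of one covariate across K = 3 responses.

```python
import math

def tree_group_penalty(beta, groups, weights):
    """Sum over tree nodes v of weights[v] * ||beta restricted to groups[v]||_2."""
    total = 0.0
    for node, members in groups.items():
        norm = math.sqrt(sum(beta[k] ** 2 for k in members))
        total += weights[node] * norm
    return total

# Hypothetical tree over 3 responses: three leaves, one internal node, and the root.
groups = {
    "leaf0": [0],
    "leaf1": [1],
    "leaf2": [2],
    "node01": [0, 1],      # responses 0 and 1 are clustered together
    "root": [0, 1, 2],
}

beta = [3.0, 4.0, 0.0]

# Assumed weightings (illustrative only).
uniform = {node: 1.0 for node in groups}
leaves_only = {node: 1.0 if node.startswith("leaf") else 0.0 for node in groups}
root_only = {node: 1.0 if node == "root" else 0.0 for node in groups}

penalty_tree = tree_group_penalty(beta, groups, uniform)
penalty_lasso_like = tree_group_penalty(beta, groups, leaves_only)   # sum of |beta_k|
penalty_group_like = tree_group_penalty(beta, groups, root_only)     # ||beta||_2

print(penalty_tree, penalty_lasso_like, penalty_group_like)
```

With weight only on the leaves the penalty reduces to the lasso's L1 term, and with weight only on the root it reduces to the L1/L2 term for this covariate, so the tree form interpolates between the two.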
Methods
- The authors demonstrate the performance of the method on simulated data sets and the yeast data set of genotypes and gene expressions, and compare the results with those from the lasso and the L1/L2-regularized multi-task regression, which do not assume any structure over responses. In all of the experiments, the authors determine the regularization parameter λ by fitting models on a training set for a range of values of λ, computing the prediction error of each model on a validation set, and then selecting the value of λ that gives the lowest prediction error. The authors evaluate these methods based on two criteria: sensitivity/specificity in detecting true relevant covariates, and prediction errors on test data sets. The authors note that 1 - specificity and sensitivity are equivalent to the type I error rate and 1 - type II error rate, respectively. Test errors are obtained as mean squared differences between the predicted and observed response measurements, based on test data sets that are independent of the training and validation data sets.
- To illustrate the behavior of different methods, the authors fit the lasso, the L1/L2-regularized multi-task regression, and the proposed method to a single data set simulated with the nonzero elements of B set to 0.4, and show the results in Figure 3(c)-(e), respectively.
- Under the L1/L2-regularized multi-task regression, once a covariate is selected as relevant for a response, it gets selected for all of the other responses, and the authors observe vertical stripes of nonzero values in Figure 3(d).
- When the hierarchical clustering structure in Figure 3(a) is available as prior knowledge, it is visually clear from Figure 3(e) that the method is able to suppress false positives, and to recover the true relevant covariates for correlated responses significantly better than other methods.
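The evaluation protocol described above (choose λ by validation error, then report sensitivity, specificity, and test error) can be sketched as follows. This is a hedged illustration, not the authors' code: the function names, the support patterns, and the validation errors are all made-up placeholders, and no model is actually fitted here.

```python
def sensitivity_specificity(true_support, est_support):
    """Compare 0/1 support matrices entry by entry and return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for true_row, est_row in zip(true_support, est_support):
        for t, e in zip(true_row, est_row):
            if t and e:
                tp += 1
            elif t and not e:
                fn += 1
            elif e:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

def select_lambda(validation_errors):
    """Return the candidate lambda with the lowest validation prediction error."""
    return min(validation_errors, key=validation_errors.get)

def mean_squared_error(predicted, observed):
    """Mean squared difference over all entries of two equally shaped matrices."""
    diffs = [(p - o) ** 2 for p_row, o_row in zip(predicted, observed)
             for p, o in zip(p_row, o_row)]
    return sum(diffs) / len(diffs)

# Hypothetical true and estimated supports (covariates x responses).
true_support = [[1, 1, 0], [0, 0, 0], [1, 1, 1]]
est_support = [[1, 1, 1], [0, 0, 0], [0, 1, 1]]
sens, spec = sensitivity_specificity(true_support, est_support)

# Hypothetical validation errors for three candidate values of lambda.
validation_errors = {0.01: 1.30, 0.1: 0.95, 1.0: 1.10}
best_lambda = select_lambda(validation_errors)

test_error = mean_squared_error([[1.0, 2.0]], [[1.5, 2.0]])
print(sens, spec, best_lambda, test_error)
```

Here 1 - `spec` plays the role of the type I error rate and `sens` the role of 1 - type II error rate, matching the equivalence noted in the text.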
Conclusion
- The authors proposed a novel regularized regression approach, called the tree lasso, that identifies covariates relevant to multiple related responses jointly by leveraging the correlation structure in responses represented as a hierarchical clustering tree. The authors discussed how this approach can be used in eQTL analysis to learn SNPs with pleiotropic effects that influence the activities of multiple co-expressed genes.
- The authors' results on both the simulated and yeast data sets showed a clear advantage of the tree lasso in increasing the power of detecting weak signals and reducing false positives.
Tables
- Table 1: Enriched GO categories for genes whose expression levels are influenced by the same SNP in the yeast eQTL data set. The results in columns 1-4 are based on the tree-lasso estimate of regression ...
Reference
- Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 183-202. MR2486527
- Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge Univ. Press, Cambridge. MR2061575
- Chen, Y., Zhu, J., Lum, P. K., Yang, X., Pinto, S., MacNeil, D. J., Zhang, C., Lamb, J., Edwards, S., Sieberts, S. K. et al. (2008). Variations in DNA elucidate molecular networks that cause disease. Nature 452 429-435.
- Chen, X., Lin, Q., Kim, S., Carbonell, J. and Xing, E. P. (2011). Smoothing proximal gradient method for general structured sparse learning. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI) 105-114. AUAI Press, Corvallis, OR.
- Cheung, V., Spielman, R., Ewens, K., Weber, T., Morley, M. and Burdick, J. (2005). Mapping determinants of human gene expression by regional and genome-wide association. Nature 437 1365-1369.
- Emilsson, V., Thorleifsson, G., Zhang, B., Leonardson, A. S., Zink, F., Zhu, J., Carlson, S., Helgason, A., Walters, G. B., Gunnarsdottir, S. et al. (2008). Genetics of gene expression and its effect on disease. Nature 452 423-428.
- Friedman, J., Hastie, T. and Tibshirani, R. (2010). A note on the group lasso and a sparse group lasso. Technical report, Dept. Statistics, Stanford Univ., Stanford, CA.
- Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Stat. 1 302-332. MR2415737
- Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D. and Lander, E. S. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286 531-537.
- Hastie, T., Tibshirani, R., Botstein, D. and Brown, P. (2001). Supervised harvesting of expression trees. Genome Biol. 2 0003.1-0003.12.
- Jacob, L., Obozinski, G. and Vert, J. (2009). Group lasso with overlap and graph lasso. In Proceedings of the 26th International Conference on Machine Learning. ACM, New York.
- Jenatton, R., Audibert, J. and Bach, F. (2009). Structured variable selection with sparsity-inducing norms. Technical report, INRIA.
- Kim, S. and Xing, E. P. (2009). Statistical estimation of correlated genome associations to a quantitative trait network. PLoS Genetics 5 e1000587.
- Kim, S. and Xing, E. P. (2012). Supplement to "Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping." DOI:10.1214/12-AOAS549SUPP.
- Lee, S. I., Pe'er, D., Dudley, A., Church, G. and Koller, D. (2006). Identifying regulatory mechanisms using individual variation reveals key role for chromatin modification. Proc. Natl. Acad. Sci. USA 103 14062-14067.
- Obozinski, G., Taskar, B. and Jordan, M. I. (2010). Joint covariate selection and joint subspace selection for multiple classification problems. Stat. Comput. 20 231-252. MR2610775
- Obozinski, G., Wainwright, M. J. and Jordan, M. I. (2008). High-dimensional union support recovery in multivariate regression. In Advances in Neural Information Processing Systems 21. MIT Press, Cambridge, MA.
- Pujana, M. A., Han, J. J., Starita, L. M., Stevens, K. N., Tewari, M., Ahn, J. S., Rennert, G., Moreno, V., Kirchhoff, T., Gold, B. et al. (2007). Network modeling links breast cancer susceptibility and centrosome dysfunction. Nature Genetics 39 1338-1349.
- Segal, E., Shapira, M., Regev, A., Pe'er, D., Botstein, D., Koller, D. and Friedman, N. (2003). Module networks: Identifying regulatory modules and their condition-specific regulators from gene expression data. Nature Genetics 34 166-178.
- Sørlie, T., Perou, C. M., Tibshirani, R., Aas, T., Geisler, S., Johnsen, H., Hastie, T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., Thorsen, T., Quist, H., Matese, J. C., Brown, P. O., Botstein, D., Lønning, P. E. and Børresen-Dale, A. (2001). Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad. Sci. USA 98 10869-10874.
- Stranger, B., Forrest, M., Clark, A., Minichiello, M., Deutsch, S., Lyle, R., Hunt, S., Kahl, B., Antonarakis, S., Tavaré, S. et al. (2005). Genome-wide associations of gene expression variation in humans. PLoS Genetics 1 695-704.
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. MR1379242
- Wu, T. T., Chen, Y. F., Hastie, T., Sobel, E. and Lange, K. (2009). Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics 25 714-721.
- Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49-67. MR2212574
- Yuan, X. and Yan, S. (2010). Visual classification with multi-task joint sparse representation. In Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society Press, Los Alamitos, CA.
- Zhang, Y. (2010). Multi-task active learning with output constraints. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI). AAAI Press, Menlo Park, CA.
- Zhang, B. and Horvath, S. (2005). A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 4 Art. 17, 45 pp. (electronic). MR2170433
- Zhao, P., Rocha, G. and Yu, B. (2009). The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Statist. 37 3468-3497. MR2549566
- Zhou, Y., Jin, R. and Hoi, S. C. H. (2010). Exclusive lasso for multi-task feature selection. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS). JMLR W&CP.
- Zhu, J., Zhang, B., Smith, E. N., Drees, B., Brem, R. B., Kruglyak, L., Bumgarner, R. E. and Schadt, E. E. (2008). Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks. Nature Genetics 40 854-861.
- Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 301-320. MR2137327