
Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping

ANNALS OF APPLIED STATISTICS, no. 3 (2012): 1095-1117

Cited: 600 | Views: 43

Abstract

We consider the problem of estimating a sparse multi-response regression function, with an application to expression quantitative trait locus (eQTL) mapping, where the goal is to discover genetic variations that influence gene-expression levels. In particular, we investigate a shrinkage technique capable of capturing a given hierarchical ...

Introduction
  • Recent advances in high-throughput technology for profiling gene expressions and assaying genetic variations at a genome-wide scale have provided researchers with an unprecedented opportunity to comprehensively study the genetic causes of complex diseases such as asthma, diabetes, and cancer. Expression quantitative trait locus (eQTL) mapping considers gene expression measurements, known as gene-expression traits, as intermediate phenotypes, and aims to identify the genetic markers, such as single nucleotide polymorphisms (SNPs), that influence the expression levels of genes, which gives rise to the variability in ...

    Received December 2009; revised February 2012. Supported in part by NIH grants.
  • The lasso estimation in (2.2) is equivalent to selecting relevant covariates for each of the K responses separately, and does not provide any mechanism to enforce a joint selection of common relevant covariates for multiple related responses. In the literature on multi-task learning, an L1/L2 penalty, known as a group lasso penalty [Yuan and Lin (2006)], has been adopted in multivariate-response regression to take advantage of the relatedness of the response variables and to recover the union support, that is, the pattern of nonzero regression coefficients shared across all of the responses [Obozinski, Wainwright and Jordan (2008)].
  • This method is widely known as the L1/L2-regularized multi-task regression in the machine learning community, and its estimate for the regression coefficients is given in (2.3).
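The joint-selection behavior of the L1/L2 penalty described above can be illustrated with a minimal pure-Python sketch. It assumes the standard group-lasso form $\hat B = \arg\min_B \tfrac12\|Y - XB\|_F^2 + \lambda \sum_{j=1}^{J} \|\beta^j\|_2$, where $\beta^j$ is row j of B (the coefficients of SNP j across all K traits); the function names `l1_l2_penalty` and `group_soft_threshold` are hypothetical. The sketch shows the penalty and its proximal operator (row-wise group soft-thresholding), not the authors' solver.

```python
import math
from typing import List

Matrix = List[List[float]]  # B[j][k]: coefficient of SNP j for trait k (J x K)


def l1_l2_penalty(B: Matrix) -> float:
    """Sum over SNPs j of the L2 norm of row j of B."""
    return sum(math.sqrt(sum(b * b for b in row)) for row in B)


def group_soft_threshold(B: Matrix, t: float) -> Matrix:
    """Proximal operator of t * l1_l2_penalty, applied row by row.

    Each row is shrunk toward zero as a unit: a row whose L2 norm is at most t
    becomes exactly zero, so a SNP is either dropped for all K traits at once
    or retained as a whole (shrunken) row.
    """
    out: Matrix = []
    for row in B:
        norm = math.sqrt(sum(b * b for b in row))
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out.append([scale * b for b in row])
    return out


if __name__ == "__main__":
    B = [[3.0, 4.0, 0.0],   # row norm 5.0: kept, shrunk by a factor 1 - 1/5
         [0.0, 0.5, 0.0]]   # row norm 0.5: zeroed for every trait
    print(l1_l2_penalty(B))  # 5.5
    print(group_soft_threshold(B, 1.0))
```

Because the thresholding acts on whole rows, every trait shares the same set of selected SNPs; the penalty cannot express that only a subset of correlated traits is affected by a SNP.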
Highlights
  • In Section 2 we provide a brief discussion of previous work on sparse regression estimation. In Section 3 we introduce the tree lasso and describe an efficient optimization method based on the smoothing proximal gradient (SPG) method.
  • Let us assume that data are collected for J SNPs and K gene-expression traits over N individuals. Let X denote the N × J matrix of SNP genotypes for covariates, and Y the N × K matrix of gene-expression measurements for responses. In eQTL mapping, each element of X takes values from {0, 1, 2} according to the number of minor alleles at the given locus in each individual. Then, we assume a linear model for the functional mapping from covariates to response variables: $Y = XB + E$ (2.1), where B is the J × K matrix of regression coefficients and E is the N × K matrix of noise terms.
  • In the lasso estimate (2.2), $\|\cdot\|_1$ is the matrix L1 norm and λ is a tuning parameter that controls the amount of sparsity in the solution. Setting λ to a large value leads to a smaller number of nonzero regression coefficients.
  • For optimization, we adopted the smoothing proximal gradient approach that was originally developed for a general class of structured-sparsity-inducing penalties, as the tree-lasso penalty can be viewed as a special case.
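To make the tree-lasso penalty concrete, the sketch below assumes the overlapping group-lasso form $\sum_{j=1}^{J} \sum_{v \in V} w_v \|\beta^j_{G_v}\|_2$, in which every node v of the hierarchical clustering tree over the responses defines a group $G_v$ (the traits at the leaves under v), and $\beta^j_{G_v}$ holds the coefficients of SNP j for those traits. The node weights $w_v$ are hypothetical inputs, not the weights the paper derives from the tree, and the names `Node`, `leaves`, `all_nodes`, and `tree_lasso_penalty` are illustrative. Only the penalty is evaluated; the SPG optimizer is not shown.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of a hierarchical clustering tree over the K responses (traits)."""
    weight: float                      # group weight w_v (hypothetical value)
    response: int = -1                 # trait index for a leaf; unused otherwise
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[int]:
    """Trait indices under `node`: the response group G_v for this node."""
    if not node.children:
        return [node.response]
    return [k for child in node.children for k in leaves(child)]


def all_nodes(node: Node) -> List[Node]:
    """All nodes of the subtree rooted at `node`, in depth-first order."""
    nodes = [node]
    for child in node.children:
        nodes.extend(all_nodes(child))
    return nodes


def tree_lasso_penalty(B: List[List[float]], root: Node) -> float:
    """Sum over SNP rows j and tree nodes v of w_v * ||B[j] restricted to G_v||_2."""
    groups = [(v.weight, leaves(v)) for v in all_nodes(root)]
    total = 0.0
    for row in B:  # row j holds the coefficients of SNP j for all K traits
        for w, group in groups:
            total += w * math.sqrt(sum(row[k] ** 2 for k in group))
    return total


if __name__ == "__main__":
    # Traits 0 and 1 are clustered together; trait 2 joins only at the root.
    inner = Node(1.0, children=[Node(1.0, response=0), Node(1.0, response=1)])
    root = Node(1.0, children=[inner, Node(1.0, response=2)])
    B = [[3.0, 4.0, 0.0]]  # one SNP affecting traits 0 and 1
    # Groups {0,1,2}, {0,1}, {0}, {1}, {2} contribute 5 + 5 + 3 + 4 + 0.
    print(tree_lasso_penalty(B, root))  # 17.0
```

If only the single-trait leaf groups were kept, the penalty would reduce to the lasso's L1 norm; with only the root group, it would reduce to the L1/L2 penalty. The internal nodes add groups for clusters of correlated traits, which is how the tree encourages SNPs to be shared within a cluster without forcing them onto all traits.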
Methods
  • The authors demonstrate the performance of the tree lasso on simulated data sets and on the yeast data set of genotypes and gene expressions, and compare the results with those from the lasso and the L1/L2-regularized multi-task regression, neither of which assumes any structure over the responses. In all of the experiments, the authors determine the regularization parameter λ by fitting models on a training set for a range of values of λ, computing the prediction error of each model on a validation set, and then selecting the value of the regularization parameter that gives the lowest prediction error. The authors evaluate these methods on two criteria: sensitivity/specificity in detecting the true relevant covariates, and prediction errors on test data sets. The authors note that 1 − specificity and sensitivity are equivalent to the type I error rate and 1 − type II error rate, respectively. Test errors are obtained as mean squared differences between the predicted and observed response measurements on test data sets that are independent of the training and validation data sets.
  • To illustrate the behavior of the different methods, the authors fit the lasso, the L1/L2-regularized multi-task regression, and the tree lasso to a single data set simulated with the nonzero elements of B set to 0.4, and show the results in Figure 3(c)-(e), respectively.
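  • Because the L1/L2-regularized multi-task regression selects covariates jointly across responses, once a covariate is selected as relevant for one response, it gets selected for all of the other responses, and the authors observe vertical stripes of nonzero values in Figure 3(d).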
  • When the hierarchical clustering structure in Figure 3(a) is available as prior knowledge, Figure 3(e) shows visually that the tree lasso is able to suppress false positives and to recover the true relevant covariates for correlated responses markedly better than the other methods.
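The evaluation protocol described above (choosing λ by validation error, then scoring support recovery and test error) can be sketched in pure Python. The sketch assumes that models already fitted on the training set are supplied as a mapping from λ to a prediction function, and that supports are sets of nonzero (SNP, trait) positions; the names `mse`, `select_lambda`, and `support_metrics` and this interface are hypothetical, and the code mirrors the described steps rather than the authors' implementation.

```python
from typing import Callable, Dict, List, Set, Tuple

Matrix = List[List[float]]
Pair = Tuple[int, int]  # (SNP index j, trait index k) of a regression coefficient


def mse(pred: Matrix, obs: Matrix) -> float:
    """Mean squared difference between predicted and observed responses."""
    sq = [(p - o) ** 2 for prow, orow in zip(pred, obs) for p, o in zip(prow, orow)]
    return sum(sq) / len(sq)


def select_lambda(models: Dict[float, Callable[[Matrix], Matrix]],
                  X_val: Matrix, Y_val: Matrix) -> float:
    """Pick the lambda whose training-set fit has the lowest validation error.

    `models` maps each candidate lambda to a predictor already fitted on the
    training set (hypothetical interface; any fitting routine could supply it).
    """
    return min(models, key=lambda lam: mse(models[lam](X_val), Y_val))


def support_metrics(estimated: Set[Pair], truth: Set[Pair],
                    J: int, K: int) -> Tuple[float, float]:
    """Sensitivity and specificity for recovering the true nonzero coefficients."""
    negatives = {(j, k) for j in range(J) for k in range(K)} - truth
    sensitivity = len(estimated & truth) / len(truth)          # 1 - type II error rate
    specificity = len(negatives - estimated) / len(negatives)  # 1 - type I error rate
    return sensitivity, specificity


if __name__ == "__main__":
    Y_val = [[1.0], [2.0]]
    models = {0.1: lambda X: [[1.0], [2.0]],   # perfect validation fit
              1.0: lambda X: [[1.0], [1.0]]}   # over-shrunk fit
    print(select_lambda(models, [[0.0], [1.0]], Y_val))  # 0.1
    print(support_metrics({(0, 0), (1, 1)}, {(0, 0), (0, 1)}, J=2, K=2))  # (0.5, 0.5)
```

`support_metrics` assumes at least one true nonzero and at least one true zero coefficient; otherwise one of the ratios is undefined. The same `mse` function serves for the final test error, evaluated on data held out from both training and validation.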
Conclusion
  • In this article, the authors proposed a novel regularized regression approach, called the tree lasso, that jointly identifies covariates relevant to multiple related responses by leveraging the correlation structure in the responses, represented as a hierarchical clustering tree. The authors discussed how this approach can be used in eQTL analysis to learn SNPs with pleiotropic effects that influence the activities of multiple co-expressed genes.
  • The authors' results on both the simulated and yeast data sets showed a clear advantage of the tree lasso in increasing the power to detect weak signals and in reducing false positives.
Tables
  • Table 1: Enriched GO categories for genes whose expression levels are influenced by the same SNP in the yeast eQTL data set. The results in columns 1-4 are based on the tree-lasso estimate of regression coefficients.
Reference
  • Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 183-202. MR2486527
  • Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge Univ. Press, Cambridge. MR2061575
  • Chen, Y., Zhu, J., Lum, P. K., Yang, X., Pinto, S., MacNeil, D. J., Zhang, C., Lamb, J., Edwards, S., Sieberts, S. K. et al. (2008). Variations in DNA elucidate molecular networks that cause disease. Nature 452 429-435.
  • Chen, X., Lin, Q., Kim, S., Carbonell, J. and Xing, E. P. (2011). Smoothing proximal gradient method for general structured sparse learning. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI) 105-114. AUAI Press, Corvallis, OR.
  • Cheung, V., Spielman, R., Ewens, K., Weber, T., Morley, M. and Burdick, J. (2005). Mapping determinants of human gene expression by regional and genome-wide association. Nature 437 1365-1369.
  • Emilsson, V., Thorleifsson, G., Zhang, B., Leonardson, A. S., Zink, F., Zhu, J., Carlson, S., Helgason, A., Walters, G. B., Gunnarsdottir, S. et al. (2008). Genetics of gene expression and its effect on disease. Nature 452 423-428.
  • Friedman, J., Hastie, T. and Tibshirani, R. (2010). A note on the group lasso and a sparse group lasso. Technical report, Dept. Statistics, Stanford Univ., Stanford, CA.
  • Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Stat. 1 302-332. MR2415737
  • Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D. and Lander, E. S. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286 531-537.
  • Hastie, T., Tibshirani, R., Botstein, D. and Brown, P. (2001). Supervised harvesting of expression trees. Genome Biol. 2 0003.1-0003.12.
  • Jacob, L., Obozinski, G. and Vert, J. (2009). Group lasso with overlap and graph lasso. In Proceedings of the 26th International Conference on Machine Learning. ACM, New York.
  • Jenatton, R., Audibert, J. and Bach, F. (2009). Structured variable selection with sparsity-inducing norms. Technical report, INRIA.
  • Kim, S. and Xing, E. P. (2009). Statistical estimation of correlated genome associations to a quantitative trait network. PLoS Genetics 5 e1000587.
  • Kim, S. and Xing, E. P. (2012). Supplement to "Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping." DOI:10.1214/12-AOAS549SUPP.
  • Lee, S. I., Pe'er, D., Dudley, A., Church, G. and Koller, D. (2006). Identifying regulatory mechanisms using individual variation reveals key role for chromatin modification. Proc. Natl. Acad. Sci. USA 103 14062-14067.
  • Obozinski, G., Taskar, B. and Jordan, M. I. (2010). Joint covariate selection and joint subspace selection for multiple classification problems. Stat. Comput. 20 231-252. MR2610775
  • Obozinski, G., Wainwright, M. J. and Jordan, M. I. (2008). High-dimensional union support recovery in multivariate regression. In Advances in Neural Information Processing Systems 21. MIT Press, Cambridge, MA.
  • Pujana, M. A., Han, J. J., Starita, L. M., Stevens, K. N., Tewari, M., Ahn, J. S., Rennert, G., Moreno, V., Kirchhoff, T., Gold, B. et al. (2007). Network modeling links breast cancer susceptibility and centrosome dysfunction. Nature Genetics 39 1338-1349.
  • Segal, E., Shapira, M., Regev, A., Pe'er, D., Botstein, D., Koller, D. and Friedman, N. (2003). Module networks: Identifying regulatory modules and their condition-specific regulators from gene expression data. Nature Genetics 34 166-178.
  • Sørlie, T., Perou, C. M., Tibshirani, R., Aas, T., Geisler, S., Johnsen, H., Hastie, T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., Thorsen, T., Quist, H., Matese, J. C., Brown, P. O., Botstein, D., Lønning, P. E. and Børresen-Dale, A. (2001). Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad. Sci. USA 98 10869-10874.
  • Stranger, B., Forrest, M., Clark, A., Minichiello, M., Deutsch, S., Lyle, R., Hunt, S., Kahl, B., Antonarakis, S., Tavare, S. et al. (2005). Genome-wide associations of gene expression variation in humans. PLoS Genetics 1 695-704.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. MR1379242
  • Wu, T. T., Chen, Y. F., Hastie, T., Sobel, E. and Lange, K. (2009). Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics 25 714-721.
  • Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49-67. MR2212574
  • Yuan, X. and Yan, S. (2010). Visual classification with multi-task joint sparse representation. In Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society Press, Los Alamitos, CA.
  • Zhang, Y. (2010). Multi-task active learning with output constraints. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI). AAAI Press, Menlo Park, CA.
  • Zhang, B. and Horvath, S. (2005). A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 4 Art. 17, 45 pp. (electronic). MR2170433
  • Zhao, P., Rocha, G. and Yu, B. (2009). The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Statist. 37 3468-3497. MR2549566
  • Zhou, Y., Jin, R. and Hoi, S. C. H. (2010). Exclusive lasso for multi-task feature selection. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS). JMLR W&CP.
  • Zhu, J., Zhang, B., Smith, E. N., Drees, B., Brem, R. B., Kruglyak, L., Bumgarner, R. E. and Schadt, E. E. (2008). Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks. Nature Genetics 40 854-861.
  • Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 301-320. MR2137327