Netflix Prize-Winning Recommendation Algorithms (8)
Published: 2021-06-07
The Algorithms That Won the Netflix Recommendation Prize
Later, we refer to these two models as [PQ4] and [PQ5]. Interestingly, the RMSE=0.8928 result is the best we know by using a pure RBM. If our good experience with postprocessing RBM by kNN [2] is repeatable, one can achieve a further significant RMSE reduction by applying kNN to the residuals. However, we have not experimented with this. Finally, there is a single predictor RBM with 50 hidden units and 50 day-specific hidden units, which ran 70 iterations to produce RMSE=…. Later, we refer to this model as [PQ6].
VII. GBDT BLENDING
A key to achieving highly competitive results on the Netflix data is usage of sophisticated blending schemes, which combine³ the multiple individual predictors into a single final solution. This significant component was managed by our colleagues at the BigChaos team [14]. Still, we were producing a few blended solutions, which were later incorporated as individual predictors in the final blend.
Our blending techniques were applied to three distinct sets of predictors. First is a set of 454 predictors, which represent all predictors of the BellKor's Pragmatic Chaos team for which we have matching Probe and Qualifying results [14]. Second is a set of 75 predictors, which the BigChaos team picked out of the 454 predictors by forward selection [14]. Finally, a set of 24 BellKor predictors for which we had matching Probe and Qualifying results. Details of this set are given at the end of this section.
A. Gradient Boosted Decision Trees
While major breakthroughs in the competition were achieved by uncovering new features underlying the data, those became rare and very hard to get. As we entered the final 30 days of the competition ("last call for grand prize period"), we realized that individual predictors, even if novel and accurate, are unlikely to make a difference to the blend. We speculated that the most impact during a short period of 30 days would be achieved by exploring new blending techniques or improving the existing ones. Blending offers a lower risk path to improvement in a short time. First, unlike individual predictors, better blending is directly connected to the final result. Second, blending simultaneously touches many predictors, rather than improving one at a time. This led to the idea of employing Gradient Boosted Decision Trees, which was raised together with Michael Jahrer and Andreas Töscher. Eventually, it did indeed make a contribution to the blend, though we hoped for a more significant impact.
Gradient Boosted Decision Trees (GBDT) are an additive regression model consisting of an ensemble of trees, fitted to current residuals in a forward step-wise manner. In the traditional boosting framework, the weak learners are generally shallow decision trees consisting of a few leaf nodes. GBDT ensembles are found to work well when there are hundreds of such decision trees. Standard references are [5, 6], and a known implementation is Treenet [16].
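The forward step-wise fitting described above can be sketched in a few lines. This is a minimal toy illustration with depth-1 regression stumps on a single feature, not the Treenet implementation the paper actually used; all data and parameter values here are made up.

```python
# Minimal gradient boosting sketch for squared loss: each stump is
# fitted to the current residuals, then added with a shrinkage factor.

def fit_stump(xs, residuals):
    """Least-squares best single-threshold split on one feature."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + \
              sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x, t=t, l=lmean, r=rmean: l if x <= t else r

def gbdt_fit(xs, ys, n_trees=50, shrinkage=0.2):
    """Ensemble of stumps fitted forward step-wise to residuals."""
    base = sum(ys) / len(ys)            # constant initial model
    trees, preds = [], [base] * len(ys)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        trees.append(stump)
        preds = [p + shrinkage * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + shrinkage * sum(t(x) for t in trees)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.0, 0.2, 0.9, 1.1, 1.0]     # a step-like target
model = gbdt_fit(xs, ys)
```

Shrinkage plays the role of a learning rate: smaller values need more trees but usually generalize better.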
³While we use here the generic term "blending", the more accurate term would be "stacked generalization".
GBDT combine a few advantages, including an ability to find non-linear transformations, ability to handle skewed variables without requiring transformations, computational robustness (e.g., highly collinear variables are not an issue) and high scalability. They also naturally lend themselves to parallelization. This has made them a good choice for several large scale practical problems such as ranking results of a search engine [9, 17], or query-biased summarization of search results [10]. In practice we had found them, indeed, very flexible and convenient. However, their accuracy lags behind that of Neural Network regressors described in [13].
There are four parameters controlling GBDT, which are: (1) number of trees, (2) size of each tree, (3) shrinkage (or, "learning rate"), and (4) sampling rate. Our experiments did not show much sensitivity to any of these parameters (exact choices are described later).
Since GBDT can handle very skewed variables, we added to the list of predictors four additional features: user support (number of rated movies), movie support (number of rating users), frequency and date of rating (number of days passed since earliest rating in the dataset).
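The four side features above can be computed directly from the ratings table. A toy sketch, with a hypothetical `(user, movie, day)` layout and made-up data:

```python
# Compute user support, movie support, frequency, and rating date
# from a list of (user, movie, day) triples, where day counts days
# since the earliest rating in the dataset.
from collections import Counter

ratings = [(1, 10, 0), (1, 11, 3), (1, 12, 3), (2, 10, 5)]

user_support = Counter(u for u, m, d in ratings)   # ratings per user
movie_support = Counter(m for u, m, d in ratings)  # ratings per movie
freq = Counter((u, d) for u, m, d in ratings)      # user's ratings that day

features = [
    (user_support[u], movie_support[m], freq[(u, d)], d)
    for u, m, d in ratings
]
```

These heavily skewed counts can be fed to the trees as-is; no log or rank transformation is needed, which is one of the GBDT advantages noted above.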
We applied GBDT learning on the aforementioned sets of 454 and 75 predictors. The Probe set is used for training the GBDT, which is then applied on the Qualifying set. Parameter settings are: #trees=200, tree-size=20, shrinkage=0.18, and sampling-rate=0.9. The results, which are included in the blend, are of RMSE=0.8603 (454 predictors) and RMSE=0.8606 (75 predictors).
When working with the much smaller set of 24 BellKor predictors, we used the settings: #trees=150, tree-size=20, shrinkage=0.2, and sampling-rate=1.0. The result of RMSE=0.8664 was included in the blend.
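The train-on-Probe, apply-on-Qualifying protocol is the stacked-generalization pattern: the blender is fitted where true ratings are known, then applied to the held-out predictions. A toy sketch, with a simple convex combination of two predictors standing in for the GBDT and all numbers invented:

```python
# Fit a blend weight on Probe-set predictions (truth known), then
# apply the fitted blend to Qualifying-set predictions (truth hidden).

probe_preds = [(3.1, 2.9), (4.2, 3.8), (1.9, 2.3)]  # (pred_A, pred_B)
probe_truth = [3.0, 4.0, 2.0]
qual_preds = [(3.5, 3.3), (2.0, 2.4)]

def blend_rmse(w, preds, truth):
    """RMSE of the blend w*pred_A + (1-w)*pred_B against truth."""
    se = sum(((w * a + (1 - w) * b) - t) ** 2
             for (a, b), t in zip(preds, truth))
    return (se / len(truth)) ** 0.5

# grid-search the weight on the Probe set only
w_best = min((i / 100 for i in range(101)),
             key=lambda w: blend_rmse(w, probe_preds, probe_truth))

# apply the fitted blend to the Qualifying set
qual_blend = [w_best * a + (1 - w_best) * b for a, b in qual_preds]
```

The key point is that the blender never sees Qualifying truth; its only supervision comes from the Probe set, exactly as in the GBDT setup above.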
It is also beneficial to introduce a clustering of users or movies, which will allow GBDT to treat all users (or movies) of a certain kind similarly. In the past [2], we touted splitting users into bins based on their support, and applying an equal blending strategy for all users in the same bin. This is already addressed in the GBDT implementation described above, thanks to adding the user support variable to the blended features. However, we can introduce additional kinds of user relationships to the scheme. For example, a matrix factorization model computes a short vector characterizing each user (a user factor). Like-minded users are expected to get mapped to similar vectors. Hence, adding such vectors to the blended feature sets will effectively allow GBDT to slice and dice the user base into subsets of similar users on which the same blending rules should be applied. The same can be done with movies.
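Mechanically, this amounts to concatenating each user's factor vector onto that user's row of blended predictor scores. A toy sketch with made-up 3-D factors (the paper uses 20-D) and hypothetical identifiers:

```python
# Augment blend feature rows with per-user factor vectors, so a tree
# learner can split the user base into groups of like-minded users.

user_factors = {                  # from a matrix factorization model
    "u1": [0.12, -0.40, 0.31],
    "u2": [0.10, -0.38, 0.29],    # similar to u1 -> similar vector
}

# each row: (user, scores of the individual predictors being blended)
blend_rows = [("u1", [3.1, 2.9]), ("u2", [4.2, 3.8])]

augmented = [scores + user_factors[u] for u, scores in blend_rows]
```

A split on any factor dimension then carves out a coherent subset of similar users, and the blending rule learned below that split applies to all of them at once.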
We included in the blend three forms of this idea, all applied on the set of 24 BellKor predictors. First, we added to the blended predictors features from the timeSVD++ model (16) of dimensionality f=20. This way, all individual bias terms were added as features. In addition, for each movie-user pair u-i, we added the 20-D movie factor $(q_i + q_{i,f_{ui}})$, and the 20-D user factor $p_u(t_{ui}) + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} y_j$. This resulted in RMSE=0.8661.
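The user-factor expression, the SVD++-style representation $p_u(t_{ui}) + |R(u)|^{-1/2} \sum_{j \in R(u)} y_j$, is straightforward to compute once the model's factors are available. A toy 2-D sketch with made-up numbers (the paper uses f=20):

```python
# Compute the SVD++-style user representation added as blend features:
# p_u(t) plus the normalized sum of item feedback factors y_j over the
# set R(u) of movies rated by user u.

p_u_t = [0.3, -0.1]                       # time-dependent user factor p_u(t)
y = {"m1": [0.2, 0.0], "m2": [0.0, 0.4]}  # item feedback factors y_j
R_u = ["m1", "m2"]                        # movies rated by user u

norm = len(R_u) ** -0.5                   # |R(u)|^{-1/2}
user_repr = [p + norm * sum(y[j][k] for j in R_u)
             for k, p in enumerate(p_u_t)]
```

The $|R(u)|^{-1/2}$ normalization keeps the feedback term on a comparable scale for light and heavy raters.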
Second, we used the 20 hidden units of an RBM as a 20-D user representation (in lieu of the timeSVD++ user