Netflix Prize-Winning Recommendation Algorithms (8)
Published: 2021-06-07
The Algorithms That Won the Netflix Recommendation Prize
Later, we refer to these two models as [PQ4] and [PQ5]. Interestingly, the RMSE=0.8928 result is the best we know by using a pure RBM. If our good experience with postprocessing RBM by kNN [2] is repeatable, one can achieve a further significant RMSE reduction by applying kNN to the residuals. However, we have not experimented with this. Finally, there is a single predictor RBM with 50 hidden units and 50 day-specific hidden units, which ran 70 iterations to produce RMSE=…. Later, we refer to this model as [PQ6].
VII. GBDT BLENDING
A key to achieving highly competitive results on the Netflix data is usage of sophisticated blending schemes, which combine³ the multiple individual predictors into a single final solution. This significant component was managed by our colleagues at the BigChaos team [14]. Still, we were producing a few blended solutions, which were later incorporated as individual predictors in the final blend.
Our blending techniques were applied to three distinct sets of predictors. First is a set of 454 predictors, which represent all predictors of the BellKor's Pragmatic Chaos team for which we have matching Probe and Qualifying results [14]. Second is a set of 75 predictors, which the BigChaos team picked out of the 454 predictors by forward selection [14]. Finally, a set of 24 BellKor predictors for which we had matching Probe and Qualifying results. Details of this set are given at the end of this section.
A. Gradient Boosted Decision Trees
While major breakthroughs in the competition were achieved by uncovering new features underlying the data, those became rare and very hard to get. As we entered the final 30 days of the competition ("last call for grand prize period"), we realized that individual predictors, even if novel and accurate, are unlikely to make a difference to the blend. We speculated that the most impact during a short period of 30 days would be achieved by exploring new blending techniques or improving the existing ones. Blending offers a lower risk path to improvement in a short time. First, unlike individual predictors, better blending is directly connected to the final result. Second, blending simultaneously touches many predictors, rather than improving one at a time. This led to the idea of employing Gradient Boosted Decision Trees, which was raised together with Michael Jahrer and Andreas Töscher. Eventually, it did indeed make a contribution to the blend, though we hoped for a more significant impact.
Gradient Boosted Decision Trees (GBDT) are an additive regression model consisting of an ensemble of trees, fitted to current residuals in a forward step-wise manner. In the traditional boosting framework, the weak learners are generally shallow decision trees consisting of a few leaf nodes. GBDT ensembles are found to work well when there are hundreds of such decision trees. Standard references are [5, 6], and a known implementation is Treenet [16].
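The forward step-wise fitting described above can be sketched in a few lines. This is a minimal toy illustration with depth-1 regression stumps on a single feature, not the Treenet implementation the paper actually used; all data and parameter values here are made up.

```python
# Minimal gradient boosting sketch for squared loss: each stump is
# fitted to the current residuals, then added with a shrinkage factor.

def fit_stump(xs, residuals):
    """Least-squares best single-threshold split on one feature."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + \
              sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x, t=t, l=lmean, r=rmean: l if x <= t else r

def gbdt_fit(xs, ys, n_trees=50, shrinkage=0.2):
    """Ensemble of stumps fitted forward step-wise to residuals."""
    base = sum(ys) / len(ys)            # constant initial model
    trees, preds = [], [base] * len(ys)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        trees.append(stump)
        preds = [p + shrinkage * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + shrinkage * sum(t(x) for t in trees)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.0, 0.2, 0.9, 1.1, 1.0]     # a step-like target
model = gbdt_fit(xs, ys)
```

Shrinkage plays the role of a learning rate: smaller values need more trees but usually generalize better.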
³While we use here the generic term "blending", the more accurate term would be "stacked generalization".
GBDT combine a few advantages, including an ability to find non-linear transformations, ability to handle skewed variables without requiring transformations, computational robustness (e.g., highly collinear variables are not an issue) and high scalability. They also naturally lend themselves to parallelization. This has made them a good choice for several large scale practical problems such as ranking results of a search engine [9, 17], or query-biased summarization of search results [10]. In practice we had found them, indeed, very flexible and convenient. However, their accuracy lags behind that of Neural Network regressors described in [13].
There are four parameters controlling GBDT, which are: (1) number of trees, (2) size of each tree, (3) shrinkage (or, "learning rate"), and (4) sampling rate. Our experiments did not show much sensitivity to any of these parameters (exact choices are described later).
Since GBDT can handle very skewed variables, we added to the list of predictors four additional features: user support (number of rated movies), movie support (number of rating users), frequency and date of rating (number of days passed since earliest rating in the dataset).
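The four side features above can be computed directly from the ratings table. A toy sketch, with a hypothetical `(user, movie, day)` layout and made-up data:

```python
# Compute user support, movie support, frequency, and rating date
# from a list of (user, movie, day) triples, where day counts days
# since the earliest rating in the dataset.
from collections import Counter

ratings = [(1, 10, 0), (1, 11, 3), (1, 12, 3), (2, 10, 5)]

user_support = Counter(u for u, m, d in ratings)   # ratings per user
movie_support = Counter(m for u, m, d in ratings)  # ratings per movie
freq = Counter((u, d) for u, m, d in ratings)      # user's ratings that day

features = [
    (user_support[u], movie_support[m], freq[(u, d)], d)
    for u, m, d in ratings
]
```

These heavily skewed counts can be fed to the trees as-is; no log or rank transformation is needed, which is one of the GBDT advantages noted above.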
We applied GBDT learning on the aforementioned sets of 454 and 75 predictors. The Probe set is used for training the GBDT, which is then applied on the Qualifying set. Parameter settings are: #trees=200, tree-size=20, shrinkage=0.18, and sampling-rate=0.9. The results, which are included in the blend, are of RMSE=0.8603 (454 predictors) and RMSE=0.8606 (75 predictors).
When working with the much smaller set of 24 BellKor predictors, we used the settings: #trees=150, tree-size=20, shrinkage=0.2, and sampling-rate=1.0. The result of RMSE=0.8664 was included in the blend.
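The train-on-Probe, apply-on-Qualifying protocol is the stacked-generalization pattern: the blender is fitted where true ratings are known, then applied to the held-out predictions. A toy sketch, with a simple convex combination of two predictors standing in for the GBDT and all numbers invented:

```python
# Fit a blend weight on Probe-set predictions (truth known), then
# apply the fitted blend to Qualifying-set predictions (truth hidden).

probe_preds = [(3.1, 2.9), (4.2, 3.8), (1.9, 2.3)]  # (pred_A, pred_B)
probe_truth = [3.0, 4.0, 2.0]
qual_preds = [(3.5, 3.3), (2.0, 2.4)]

def blend_rmse(w, preds, truth):
    """RMSE of the blend w*pred_A + (1-w)*pred_B against truth."""
    se = sum(((w * a + (1 - w) * b) - t) ** 2
             for (a, b), t in zip(preds, truth))
    return (se / len(truth)) ** 0.5

# grid-search the weight on the Probe set only
w_best = min((i / 100 for i in range(101)),
             key=lambda w: blend_rmse(w, probe_preds, probe_truth))

# apply the fitted blend to the Qualifying set
qual_blend = [w_best * a + (1 - w_best) * b for a, b in qual_preds]
```

The key point is that the blender never sees Qualifying truth; its only supervision comes from the Probe set, exactly as in the GBDT setup above.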
It is also beneficial to introduce a clustering of users or movies, which will allow GBDT to treat all users (or movies) of a certain kind similarly. In the past [2], we touted splitting users into bins based on their support, and applying an equal blending strategy for all users in the same bin. This is already addressed in the GBDT implementation described above, thanks to adding the user support variable to the blended features. However, we can introduce additional kinds of user relationships to the scheme. For example, a matrix factorization model computes a short vector characterizing each user (a user factor). Like-minded users are expected to get mapped to similar vectors. Hence, adding such vectors to the blended feature sets will effectively allow GBDT to slice and dice the user base into subsets of similar users on which the same blending rules should be applied. The same can be done with movies.
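Mechanically, this amounts to concatenating each user's factor vector onto that user's row of blended predictor scores. A toy sketch with made-up 3-D factors (the paper uses 20-D) and hypothetical identifiers:

```python
# Augment blend feature rows with per-user factor vectors, so a tree
# learner can split the user base into groups of like-minded users.

user_factors = {                  # from a matrix factorization model
    "u1": [0.12, -0.40, 0.31],
    "u2": [0.10, -0.38, 0.29],    # similar to u1 -> similar vector
}

# each row: (user, scores of the individual predictors being blended)
blend_rows = [("u1", [3.1, 2.9]), ("u2", [4.2, 3.8])]

augmented = [scores + user_factors[u] for u, scores in blend_rows]
```

A split on any factor dimension then carves out a coherent subset of similar users, and the blending rule learned below that split applies to all of them at once.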
We included in the blend three forms of this idea, all applied on the set of 24 BellKor predictors. First, we added to the blended predictors features from the timeSVD++ model (16) of dimensionality f=20. This way, all individual bias terms were added as features. In addition, for each movie-user pair u-i, we added the 20-D movie factor $(q_i + q_{i,f_{ui}})$, and the 20-D user factor $p_u(t_{ui}) + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} y_j$. This resulted in RMSE=0.8661.
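The user-factor expression, the SVD++-style representation $p_u(t_{ui}) + |R(u)|^{-1/2} \sum_{j \in R(u)} y_j$, is straightforward to compute once the model's factors are available. A toy 2-D sketch with made-up numbers (the paper uses f=20):

```python
# Compute the SVD++-style user representation added as blend features:
# p_u(t) plus the normalized sum of item feedback factors y_j over the
# set R(u) of movies rated by user u.

p_u_t = [0.3, -0.1]                       # time-dependent user factor p_u(t)
y = {"m1": [0.2, 0.0], "m2": [0.0, 0.4]}  # item feedback factors y_j
R_u = ["m1", "m2"]                        # movies rated by user u

norm = len(R_u) ** -0.5                   # |R(u)|^{-1/2}
user_repr = [p + norm * sum(y[j][k] for j in R_u)
             for k, p in enumerate(p_u_t)]
```

The $|R(u)|^{-1/2}$ normalization keeps the feedback term on a comparable scale for light and heavy raters.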
Second, we used the 20 hidden units of an RBM as a 20-D user representation (in lieu of the timeSVD++ user